<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hits or Misses? A Linguistically Explainable Formula for Fanfiction Success</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giulio Leonardi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dominique Brunato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felice Dell'Orletta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Istituto di Linguistica Computazionale “Antonio Zampolli”, ItaliaNLP Lab</institution>
          ,
          <addr-line>Pisa</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pisa</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>This study presents a computational analysis of Italian fanfiction, aiming to construct an interpretable model of successful writing within this emerging literary domain. Leveraging explicit features that capture both linguistic style and semantic content, we demonstrate the feasibility of automatically predicting successful writing in fanfiction, and we identify a set of robust linguistic predictors that maintain their predictive power across diverse topics and time periods, offering insights into the universal aspects of engaging storytelling. This approach not only enhances our understanding of fanfiction as a genre but also offers potential applications in broader literary analysis and content creation.</p>
      </abstract>
      <kwd-group>
<kwd>fanfiction</kwd>
        <kwd>Italian corpus</kwd>
        <kwd>success prediction</kwd>
        <kwd>linguistic features</kwd>
        <kwd>Explainable Boosting Machine</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
<p>The growing proliferation of online literary content has led to the emergence of new genres and storytelling forms, with fanfiction being particularly popular among teens and young adults. Fanfiction consists of stories created by fans (mostly hobby authors) that extend or alter the narrative of existing popular media such as books, movies, comics or games, and represents a significant portion of user-generated content on the web [<xref ref-type="bibr" rid="ref1">1</xref>]. In recent years, the widespread popularity that this genre has assumed has prompted research into the linguistic and stylistic elements that contribute to its success, mirroring studies conducted on more traditional literary genres [<xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>], among others.</p>
      <p>Understanding the elements that contribute to narrative success is a fascinating area of research with implications across various fields, from literary analysis to digital humanities. From a socio-linguistic perspective, it can offer deeper insights into people and culture. It also has significant applications in areas such as personalized content recommendation and educational technology [<xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>]. While personal interests undoubtedly play a crucial role in predicting a reader's engagement with literary content, the way information is presented can also evoke different reactions and levels of interaction, ultimately influencing the narrative's success. In this regard, recent advancements in Natural Language Processing (NLP) and machine learning offer a powerful lens for making explicit the patterns that may explain the complex interplay between reader engagement and content success.</p>
      <p>This paper moves in this field and presents a computational analysis focused on Italian fanfiction, addressing the following research questions: i.) Can the success of Italian fanfiction be automatically predicted using stylistic and lexical features of the texts?; ii.) Which types of features demonstrate the highest predictive capability, and how consistent are these features across different time periods and thematic domains?; iii.) To what extent can these features be explained in terms of their contribution to predicting success?</p>
      <p>Our contributions. i.) We collected a corpus of Italian fanfiction stories enriched with metadata considered as proxies of their success; ii.) We investigated the relationship between stylistic and lexical features of stories and their success from a modeling perspective; iii.) We identified the most influential features in success prediction, showing the key role played by form- and style-related features across time and thematic domains of fanfiction.</p>
      <p>The paper is structured as follows: Section 2 briefly contextualizes our study among relevant literature; Section 3 presents the reference corpus of Italian fanfiction stories that we collected; in Section 4 we provide an overview of the approach we devised, including the description of the features used for classification and the classifiers employed. Section 5 discusses the main findings and offers a fine-grained analysis of the classification results in terms of feature explainability. In Section 6 we summarize the key findings and outline promising directions for future research in this field.</p>
      <p>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04–06, 2024, Pisa, Italy. *Corresponding author. † These authors contributed equally. g.leonardi5@studenti.unipi.it (G. Leonardi); dominique.brunato@ilc.cnr.it (D. Brunato); felice.dellorletta@ilc.cnr.it (F. Dell'Orletta). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
<sec id="sec-2">
      <title>2. Related Work</title>
      <p>The exploration of online content and its engagement levels has increasingly benefited from advancements in NLP and machine learning. Different perspectives have been touched upon, considering different textual domains, typologies of linguistic features, and quantitative metrics to operationalize a very subjective concept like success. The study by Toubia and colleagues [<xref ref-type="bibr" rid="ref7">7</xref>] explores how the structure of narratives, particularly the internal semantic progression measured by features derived from dense word representations, affects the success of stories across different text typologies (movies, TV shows, and academic papers). Berger and colleagues [<xref ref-type="bibr" rid="ref8">8</xref>] examine how the linguistic structure of online content affects user engagement, specifically by modeling sustained attention. This concept goes beyond just attracting a reader with a catchy headline or advertisement; it also encompasses the likelihood that a reader will continue viewing or reading the content. In their analysis of more than 35,000 online contents from heterogeneous sources, they emphasize the role of features related to processing ease and emotional language.</p>
      <p>In the realm of literary works, Ashok et al. [<xref ref-type="bibr" rid="ref2">2</xref>] first leveraged stylometric analysis and machine learning techniques to predict the success of popular English novels from the Gutenberg Project. Their approach demonstrated the potential of these techniques for assessing literary success. Extending these findings, Maharjan et al. [<xref ref-type="bibr" rid="ref9">9</xref>] proposed a multi-task approach to simultaneously evaluating success and genre prediction. Using deep learning representations, in addition to hand-crafted features related to the topic, sentiment, writing style, and readability of books, they obtained better performance than the single-task success prediction approach. Focusing on contemporary English-language literature, the study by Bizzoni and colleagues [<xref ref-type="bibr" rid="ref10">10</xref>] investigates how perceived novel quality is influenced by a broad spectrum of textual features, such as those related to readability and sentiment, and how these perceptions vary depending on the reader's level of expertise.</p>
      <p>The growing volume of online fanfiction has also been the subject of numerous studies, either from the perspective of text mining using NLP or through a qualitative lens via manual examination. A comprehensive survey of analyses in this direction has recently been provided by [<xref ref-type="bibr" rid="ref11">11</xref>]. For example, Milli and Bamman [<xref ref-type="bibr" rid="ref12">12</xref>] explore the relationship between fanfiction and its original canon, offering one of the first empirical analyses of this genre. Similarly, Sourati et al. [<xref ref-type="bibr" rid="ref13">13</xref>] find that the similarity between fanfictions and their original stories, particularly in terms of emotional arcs and character dynamics, correlates significantly with fanfiction's popularity.</p>
      <p>In the context of Italian fanfiction, research using NLP techniques is still limited. Mattei et al. [<xref ref-type="bibr" rid="ref14">14</xref>] employ linguistic profiling to analyze a corpus of Italian fanfiction inspired by the Harry Potter series, with the purpose of identifying linguistic patterns associated with success. Inspired by this previous study, our research aims to extend these findings through a computational modeling approach, investigating the power of linguistic features for predicting fanfiction success and their generalization across different experimental settings.</p>
    </sec>
    <sec id="sec-2-1">
      <title>3. Corpus Construction</title>
      <p>As a first step, we compiled a reference corpus of Italian fanfiction. To this end, we searched available texts on efpfanfic.net, one of the largest Italian websites dedicated to publishing and reading amateur stories, focusing specifically on stories labeled with the fanfiction genre. Using a web scraping system, we extracted fanfictions based on the Harry Potter series, a highly popular fandom on the site, boasting 57,196 stories published between 2003 and 2023. Figure 1 presents the temporal distribution of these fanfictions up to 2020.</p>
      <p>Additionally, we gathered a secondary corpus consisting of 2,441 stories based on The Lord of the Rings series. This secondary corpus served as a test set to assess the influence of thematic domains on the analysis of story success.</p>
      <p>For this study, we focused on the first chapter of each fanfiction to ensure a consistent analysis. While it is widely recognized that thematic units within stories, particularly the beginnings and endings, often differ from the middle sections due to their distinct narrative roles, we observed that the majority of stories (69%) consist of only a single chapter, making them effectively self-contained.</p>
      <p>The efpfanfic portal allows users to review each chapter, with ratings marked as negative, neutral, or positive. Consistent with prior research such as [<xref ref-type="bibr" rid="ref9">9</xref>], we used the absolute number of reviews to define the success of a story, which we consider broadly as popularity. This approach is based on the assumption that a high number of interactions, regardless of their sentiment, reflects strong reader engagement. This is especially confirmed since in our dataset negative reviews represent less than 1% of the total.</p>
      <p>To formulate our success prediction task, we established a review threshold to classify each story as either a success or a failure. After analyzing the distribution of reviews for Harry Potter texts (Figure 2), we decided to exclude stories that fell in the middle of the distribution, those that could not be clearly defined as successes or failures. Consequently, stories with fewer than two reviews (25th percentile) were classified as failures, and those with more than six reviews (75th percentile) as successes. Stories within the interquartile range were excluded from the analysis. We also excluded texts published after 2020, considering them too recent for meaningful comparison.</p>
      <p>As summarized in Table 1, the final corpora, hereafter abbreviated as HP (Harry Potter) and LOTR (The Lord of the Rings), consist of 26,032 and 932 texts, respectively.</p>
    </sec>
    <sec id="sec-2-1-1">
      <title>4.1. Success Predictors</title>
      <p>A comprehensive set of features was extracted for each story in the corpus. These features were categorized into two primary groups: linguistic features, reflecting the text's linguistic style and structure, and lexical features, representing the semantic content of the text.</p>
      <sec id="sec-2-1-2">
        <title>4.1.1. Linguistic Features</title>
        <p>To model the text's linguistic style and structure, we drew inspiration from the linguistic profiling framework, an NLP-based methodology in which a large set of linguistically motivated features automatically extracted from annotated texts is used to obtain a vector-based representation of each text. Such representations can then be compared across texts representative of different textual genres and varieties to identify the peculiarities of each [<xref ref-type="bibr" rid="ref15">15</xref>]. For our study, we relied on Profiling-UD (http://linguistic-profiling.italianlp.it/), a multilingual tool inspired by this framework, which extracts over 130 linguistic features from texts using the Universal Dependencies (UD) annotation formalism. As described in Brunato et al. [<xref ref-type="bibr" rid="ref16">16</xref>], these features encompass a range of linguistic phenomena that can be classified into distinct groups covering, e.g., shallow text features (e.g. document and sentence length, average word length), the distribution of grammatical categories, inflectional morphology, and syntactic properties related to local and global parse tree depth structure.</p>
      </sec>
    </sec>
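<p>The labeling scheme described above can be sketched in a few lines. This is our own illustrative code, not the authors' implementation; the function and constant names are invented, while the thresholds (fewer than two reviews as failure, more than six as success, interquartile range excluded) are the ones reported in the text.</p>
<preformat>
```python
# Label stories by review count, following the thresholds reported in
# Section 3: fewer than 2 reviews (25th percentile) -> failure,
# more than 6 reviews (75th percentile) -> success; stories in the
# interquartile range are excluded from the analysis.
# Names (label_story, build_dataset, SUCCESS_MIN, FAILURE_BELOW) are ours.

FAILURE_BELOW = 2   # strictly below this count -> failure
SUCCESS_MIN = 6     # strictly above this count -> success

def label_story(n_reviews):
    """Return 'success', 'failure', or None (excluded)."""
    if n_reviews > SUCCESS_MIN:
        return "success"
    if n_reviews >= FAILURE_BELOW:
        return None  # interquartile range: excluded from the analysis
    return "failure"

def build_dataset(review_counts):
    """Keep only the stories that are clearly successes or failures."""
    labeled = [(n, label_story(n)) for n in review_counts]
    return [(n, lab) for n, lab in labeled if lab is not None]
```
</preformat>
<p>For example, a story with one review would be labeled a failure, one with ten reviews a success, and one with four reviews would be dropped.</p>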
    <sec id="sec-3">
      <title>4. Methodology</title>
      <p>Based on the newly collected dataset and its internal
distinction, we formulated the task of success prediction
as a binary classification problem, that is: given a story,
the model is asked to predict whether it belongs to the
successful or unsuccessful class, where the two classes
were defined according to the metric based on the number
of reviews received by readers.</p>
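<p>As a concrete illustration of the lexical representation described in Section 4.1.2, the following sketch computes relative frequencies of token n-grams and of word-initial/word-final character sequences. It is a simplified sketch of ours, not the authors' code; lemma n-grams would be computed in the same way from lemmatized tokens.</p>
<preformat>
```python
from collections import Counter

def lexical_features(tokens, max_n=3, max_chars=4):
    """Relative frequencies of token n-grams (n = 1..3) and of character
    sequences at the beginning or end of words (length 1..4), mirroring
    the Forms and Characters groups of the Lexical Model (Sec. 4.1.2)."""
    counts = Counter()
    # Forms: unigrams, bigrams, trigrams of tokens.
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[("form", tuple(tokens[i:i + n]))] += 1
    # Characters: prefixes and suffixes of each word, length 1..4.
    for tok in tokens:
        for k in range(1, max_chars + 1):
            if len(tok) >= k:
                counts[("prefix", tok[:k])] += 1
                counts[("suffix", tok[-k:])] += 1
    total = sum(counts.values())
    return {feat: c / total for feat, c in counts.items()}
```
</preformat>
<p>Calling lexical_features(["la", "storia"]) would, for instance, produce the bigram feature ("la", "storia") together with prefixes such as "st" and suffixes such as "ia", with frequencies summing to one.</p>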
<p>These features have proven effective in tasks related to modeling text form, such as assessing text complexity and identifying the stylistic traits of authors or author groups. In line with our main purpose to construct a model of success, and building on previous research on a similar corpus of fanfiction [<xref ref-type="bibr" rid="ref14">14</xref>], we hypothesize that these features can also distinguish between successful and unsuccessful fanfictions from a modeling perspective.</p>
      <sec id="sec-4-1-2">
        <title>4.1.2. Lexical Features</title>
        <p>The second representation employed is based on lexical information and leverages the relative frequency of n-grams in each document. The choice of n-grams, in contrast to more powerful semantic representations derived from embeddings, is deliberately motivated by the desire to use lexical features that remain completely explicit. The model, henceforth referred to as the Lexical Model, consists of the following features:</p>
        <list list-type="bullet">
          <list-item><p>Forms: unigrams, bigrams, and trigrams of tokens.</p></list-item>
          <list-item><p>Lemmas: unigrams, bigrams, and trigrams of lemmas.</p></list-item>
          <list-item><p>Characters: sequences of characters at the beginning or end of words, ranging from 1 to 4 characters in length.</p></list-item>
        </list>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Classifiers</title>
        <p>In line with our research questions, the explainability of the classification is crucial to evaluate the impact of linguistic and lexical features on the prediction of success. Therefore, two classification algorithms that allow for a precise global explanation of the predictions were selected.</p>
        <p>The first classifier employed is a linear Support Vector Machine (SVM). By fitting a decision hyperplane in the feature space, this method enables the examination of the hyperplane's coefficients to assess the importance of the features.</p>
        <p>The second algorithm employed is the Explainable Boosting Machine (EBM), which belongs to the family of Generalized Additive Models (GAMs). As explained in [<xref ref-type="bibr" rid="ref17">17</xref>], a GAM is a model of the form: g(E[y]) = β0 + Σj fj(xj) (1), where g(.) is called the link function, used to model the output (e.g., the logistic function for classification). Each fj(.) is referred to as a shape function, which is a univariate function modeling the relationship between the feature xj and the target.</p>
        <p>The prediction is thus a sum of non-linear and arbitrarily complex shape functions, generally resulting in better accuracy compared to linear models. Additionally, with a reasonable number of features, the model remains explainable. Each shape function can be visualized as a two-dimensional plot, with the feature value on the x-axis and the score assigned by the shape function on the y-axis. A score greater than 0 indicates a contribution towards the positive class, whereas a score less than 0 indicates a contribution towards the negative class. The final prediction value for a record is simply the sum of the scores obtained from each shape function, potentially transformed by the link function. Beyond analyzing individual shape functions, the average contribution of each feature can be evaluated by taking the mean of the absolute values of the assigned scores.</p>
        <p>There are various algorithms within the family of GAMs, primarily distinguished by the method used to fit the shape functions. In the case of the EBM, standard gradient boosting is used. However, in each boosting iteration, the algorithm sequentially cycles through each feature, constructing each univariate shape function through bagged boosted trees. This method has proven to be one of the most effective for training a GAM.</p>
        <p>For our study, the EBM was employed exclusively for experiments based on linguistic features, due to the excessive dimensionality of the lexical model. This high dimensionality would have rendered the GAM too complex to interpret and too time-expensive to train.</p>
      </sec>
      <sec id="sec-5">
        <title>5. Results and Discussion</title>
        <p>The classification results are summarized in Table 2 for each model and scenario under evaluation. Table 2 reports the classification accuracy (%) of the models, where 'Ling.' and 'Lex.' refer respectively to models trained on linguistic and lexical features, and the baseline corresponds to the majority class label; rows cover the in-domain, out-domain, and average cross-time scenarios, together with the overall average.</p>
        <p>For models using linguistic features, in the in-domain scenario both the SVM and the EBM outperform the majority class baseline, with accuracies of 65.03% and 66.15% respectively, compared to 50.16% for the baseline. This indicates that both classifiers are effectively capturing the linguistic patterns associated with success within the same thematic domain.</p>
        <p>For linguistic models, in the out-domain scenario the performance of the SVM drops significantly, with an accuracy of 59.22%, whereas the EBM experiences a less drastic decline, achieving an accuracy of 64.70%. However, both classifiers still perform better than the baseline, suggesting some degree of ability of the linguistic features to generalize across different thematic domains.</p>
        <p>The lexical model, in the in-domain scenario, achieves an accuracy of 69.56%, outperforming all models with linguistic features and suggesting that lexical features provide a more powerful representation for in-domain success prediction. Nevertheless, in the out-domain scenario, the lexical model does not surpass the baseline, indicating a complete lack of predictive ability. This suggests that lexical features, which are primarily based on the content of the specific fanfiction's narrative universe, perform well within the same thematic domain but lose all significance outside of it. Conversely, linguistic features, which focus on the form of the text, appear to be more adaptable regardless of the theme.</p>
        <p>Figure 3 presents the performance over time for classifiers trained with linguistic features (classification accuracy in the cross-time setting). Additionally, two baselines are shown: "Random Choice", which randomly selects between the two classes, and "Maj. Class", which always assigns the majority class from the corresponding training set (2011 stories), i.e. the positive one. The results of the lexical model in the cross-time scenario were insignificant, as they were very similar to the "Maj. Class" baseline. The classifier, therefore, defaults to assigning the negative class, demonstrating no predictive capability. To avoid confusion, the lexical model results are not included in this figure. In contrast, the cross-time results for models using linguistic features are more meaningful: the results remain stable around an average of 62%, regardless of the dominant class in the tested year and the classifier used (avg. cross-time in Table 2).</p>
        <p>The cross-time scenario further suggests that linguistic features possess greater adaptability beyond their own domain, maintaining a considerable degree of generalization over time. Conversely, lexical features seem functional only within the specific domain of the training set, losing all predictive power for texts from different domains. Overall, the model that performed best on average across the three scenarios, and with the least variance in performance, is the EBM trained with linguistic features. We provide an in-depth analysis of this model in the following section.</p>
        <sec id="sec-5-1">
          <title>5.1. The Model of Success</title>
          <p>To gain a better understanding of the classification results and identify the most influential features for predicting success, we ranked the features according to the absolute value of their weight in the EBM classifier model trained on the entire training set. Table 3 presents an extract of the top 15 features. The analysis reveals that, in addition to basic text features such as the average document length (measured in tokens) and the average word length (in characters), more complex linguistic properties play a crucial role. Among these, features related to verbal predicates and verbal morphology emerge as particularly influential. This suggests that the syntactic and morphological characteristics of verbs, such as tense, mood and person, provide valuable information for the classifier prediction, highlighting the importance of deeper linguistic structures in building a model of successful writing.</p>
          <p>While this ranking highlights the 'global' importance of features, it does not explain their effect on classification. For a more detailed analysis, Figure 4 in Appendix A highlights the threshold values for each of the top 15 ranked features, indicating the point at which the expected classification shifts from one class to the other. Additionally, it provides the number of instances in the training set for each feature value. Interestingly, there are some features which split almost exactly the amount of data into two subsets. For example, the feature representing word length (char_per_tok) has a discriminant threshold of 4.55 characters, which distinguishes successful stories, typically with longer words, from unsuccessful ones, usually with shorter words. Similarly, features related to the (morpho-)syntactic profile of the text, such as the percentage of conjunctions (dep_dist_conj) and non-finite verb forms (verbs_form_dist_Fin), show a similar pattern. For these features, values lower than the discriminant threshold contribute to predicting the negative class, effectively splitting the data into two groups with comparable densities. Regarding verb presence (verbal_head_per_sentence), an increased use of verbs correlates with the unsuccessful class. This finding contradicts the idea that higher readability, typically conveyed by a predominantly verbal prose rather than a nominal one, is a good indicator of writing quality. However, it aligns with observations by Ashok et al. [<xref ref-type="bibr" rid="ref2">2</xref>], who identified similar patterns in canonical literary novels.</p>
          <p>Features related to verbal morphology also show a peculiar trend. For instance, a complementary perspective emerges concerning the use of person morphology. Increasing the use of the second person plural beyond a relatively low threshold (0.4) positively affects the prediction of success, which may indicate an alignment with the Reader-Insert format (https://fanlore.org/wiki/Reader-Insert), a specific type of fanfiction where the reader assumes the role of the protagonist, heavily relying on second-person narration. In contrast, an excessive use of the first person plural is associated with the negative class.</p>
        </sec>
      </sec>
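<p>To make Eq. (1) and the score-based reading of the shape functions concrete, the toy sketch below implements a GAM-style classifier with a logistic link. The shape functions and their score magnitudes are invented for illustration (only the char_per_tok threshold of 4.55 and the direction of the verbal_head_per_sentence effect mirror Section 5.1); it is not the fitted EBM.</p>
<preformat>
```python
import math

# Toy GAM-style scorer in the spirit of Eq. (1): g(E[y]) = b0 + sum_j f_j(x_j).
# Positive scores push towards the "success" class, negative towards "failure".
# The step functions and magnitudes are invented for illustration; only the
# 4.55 char_per_tok threshold mirrors the discriminant value in Sec. 5.1.

def f_char_per_tok(x):
    return 0.8 if x > 4.55 else -0.8   # longer words -> success

def f_verbal_heads(x):
    return -0.5 if x > 2.0 else 0.3    # more verbs -> unsuccessful

SHAPE_FUNCTIONS = {"char_per_tok": f_char_per_tok,
                   "verbal_head_per_sentence": f_verbal_heads}
INTERCEPT = 0.0  # b0

def predict_proba(features):
    """Sum the per-feature scores and apply the logistic link."""
    score = INTERCEPT + sum(f(features[name])
                            for name, f in SHAPE_FUNCTIONS.items())
    return 1.0 / (1.0 + math.exp(-score))

def predict(features):
    return "success" if predict_proba(features) > 0.5 else "failure"
```
</preformat>
<p>The global importance ranking used in Section 5.1 corresponds to averaging the absolute values of the scores each shape function assigns over the training data.</p>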
    </sec>
    <sec id="sec-4">
      <title>6. Conclusion</title>
      <p>Understanding success factors in literary writing is an
evolving area of cross-disciplinary research. This study
on Italian fanfiction demonstrated the feasibility of
predicting success using computational methods and
explainability techniques. Notably, we found that features
related to style and structure of texts show greater
robustness than lexical ones across different domains and
time periods. This suggests that the way a story is crafted
may be more universally appealing than specific word
choices or thematic elements.</p>
      <p>We believe that the implications of this study extend far beyond fanfiction research. On the one hand, it provides new methodologies for analyzing online literary phenomena, offering potential contributions to the digital humanities. On the other, from an NLP perspective, it could inform text generation models, potentially guiding the creation of content that resonates more effectively with readers.</p>
      <p>Future research could explore the generalizability of these findings to other languages and genres, as well as the dynamics of evolving reader preferences over time, also considering alternative measures to gauge success. Additionally, this study does not take into account the importance of the author; a potential future development would be to consider the impact of the author's popularity and productivity on the success of their fanfiction.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hellekson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Busse</surname>
          </string-name>
          ,
          <article-title>Fan fiction and fan communities in the age of the internet: new essays</article-title>
          ,
          <source>McFarland</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V. G.</given-names>
            <surname>Ashok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>Success with style: Using writing style to predict the success of novels</article-title>
          ,
          <source>in: Proceedings of the 2013 conference on empirical methods in natural language processing</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>1753</fpage>
          -
          <lpage>1764</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Brottrager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arslan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Brandes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weitin</surname>
          </string-name>
          ,
          <article-title>Modeling and predicting literary reception. a data-rich approach to literary historical reception</article-title>
          ,
          <source>Journal of Computational Literary Studies</source>
          <volume>1</volume>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.48694/jcls.95.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Algee-Hewitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Allison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gemma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Heuser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Moretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Walser</surname>
          </string-name>
          ,
          <article-title>Canon/archive: large-scale dynamics in the literary field</article-title>
          ,
          <year>2018</year>
          . URL: https://litlab.stanford.edu/LiteraryLabPamphlet11.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <article-title>Reviews matter: How distributed mentoring predicts lexical diversity on fanfiction.net</article-title>
          ,
          <year>2018</year>
          . URL: https://api.semanticscholar.org/CorpusID:265096028.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sauro</surname>
          </string-name>
          ,
          <article-title>Fan fiction and informal language learning</article-title>
          ,
          <source>The handbook of informal language learning</source>
          (
          <year>2019</year>
          )
          <fpage>139</fpage>
          -
          <lpage>151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Toubia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Berger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eliashberg</surname>
          </string-name>
          ,
          <article-title>How quantifying the shape of stories predicts their success</article-title>
          ,
          <source>Proceedings of the National Academy of Sciences of the United States of America</source>
          <volume>118</volume>
          (
          <year>2021</year>
          ). URL: https://api.semanticscholar.org/CorpusID:235648521.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Berger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Moe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Schweidel</surname>
          </string-name>
          ,
          <article-title>What holds attention? linguistic drivers of engagement</article-title>
          ,
          <source>Journal of Marketing</source>
          <volume>87</volume>
          (
          <year>2023</year>
          )
          <fpage>793</fpage>
          -
          <lpage>809</lpage>
          . URL: https://api.semanticscholar.org/CorpusID:255250393.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Maharjan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Arevalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Solorio</surname>
          </string-name>
          ,
          <article-title>A multi-task approach to predict likability of books</article-title>
          ,
          <source>in: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1217</fpage>
          -
          <lpage>1227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bizzoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Moreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. M. S.</given-names>
            <surname>Lassen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Thomsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nielbo</surname>
          </string-name>
          ,
          <article-title>A matter of perspective: Building a multi-perspective annotated dataset for the study of literary quality</article-title>
          , in:
          <string-name><given-names>N.</given-names> <surname>Calzolari</surname></string-name>
          ,
          <string-name><given-names>M.-Y.</given-names> <surname>Kan</surname></string-name>
          ,
          <string-name><given-names>V.</given-names> <surname>Hoste</surname></string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Lenci</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Sakti</surname></string-name>
          ,
          <string-name><given-names>N.</given-names> <surname>Xue</surname></string-name>
          (Eds.),
          <source>Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</source>
          , ELRA and ICCL, Torino, Italia,
          <year>2024</year>
          , pp.
          <fpage>789</fpage>
          -
          <lpage>800</lpage>
          . URL: https://aclanthology.org/2024.lrec-main.71.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zigmond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Glassco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Giabbanelli</surname>
          </string-name>
          ,
          <article-title>Big data meets storytelling: using machine learning to predict popular fanfiction</article-title>
          ,
          <source>Social Network Analysis and Mining</source>
          <volume>14</volume>
          (
          <year>2024</year>
          )
          <fpage>58</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Milli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bamman</surname>
          </string-name>
          ,
          <article-title>Beyond canonical texts: A computational analysis of fanfiction</article-title>
          , in:
          <string-name><given-names>J.</given-names> <surname>Su</surname></string-name>
          ,
          <string-name><given-names>K.</given-names> <surname>Duh</surname></string-name>
          ,
          <string-name><given-names>X.</given-names> <surname>Carreras</surname></string-name>
          (Eds.),
          <source>Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Austin, Texas,
          <year>2016</year>
          , pp.
          <fpage>2048</fpage>
          -
          <lpage>2053</lpage>
          . URL: https://aclanthology.org/D16-1218. doi:10.18653/v1/D16-1218.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sourati Hassan Zadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sabri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bahrak</surname>
          </string-name>
          ,
          <article-title>Quantitative analysis of fanfictions' popularity</article-title>
          ,
          <source>Social Network Analysis and Mining</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <fpage>42</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mattei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brunato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <article-title>The style of a successful story: a computational study on the fanfiction genre</article-title>
          , in:
          <string-name><given-names>J.</given-names> <surname>Monti</surname></string-name>
          ,
          <string-name><given-names>F.</given-names> <surname>Dell'Orletta</surname></string-name>
          ,
          <string-name><given-names>F.</given-names> <surname>Tamburini</surname></string-name>
          (Eds.),
          <source>Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020, Bologna, Italy, March 1-3, 2021</source>
          , volume
          <volume>2769</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2020</year>
          . URL: https://ceur-ws.org/Vol-2769/paper_52.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>van Halteren</surname>
          </string-name>
          ,
          <article-title>Linguistic profiling for authorship recognition and verification</article-title>
          ,
          <source>in: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)</source>
          , Barcelona, Spain,
          <year>2004</year>
          , pp.
          <fpage>199</fpage>
          -
          <lpage>206</lpage>
          . URL: https://aclanthology.org/P04-1026. doi:10.3115/1218955.1218981.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Brunato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Venturi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montemagni</surname>
          </string-name>
          ,
          <article-title>Profiling-UD: a tool for linguistic profiling of texts</article-title>
          , in:
          <string-name><given-names>N.</given-names> <surname>Calzolari</surname></string-name>
          ,
          <string-name><given-names>F.</given-names> <surname>Béchet</surname></string-name>
          ,
          <string-name><given-names>P.</given-names> <surname>Blache</surname></string-name>
          ,
          <string-name><given-names>K.</given-names> <surname>Choukri</surname></string-name>
          ,
          <string-name><given-names>C.</given-names> <surname>Cieri</surname></string-name>
          ,
          <string-name><given-names>T.</given-names> <surname>Declerck</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Goggi</surname></string-name>
          ,
          <string-name><given-names>H.</given-names> <surname>Isahara</surname></string-name>
          ,
          <string-name><given-names>B.</given-names> <surname>Maegaard</surname></string-name>
          ,
          <string-name><given-names>J.</given-names> <surname>Mariani</surname></string-name>
          ,
          <string-name><given-names>H.</given-names> <surname>Mazo</surname></string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Moreno</surname></string-name>
          ,
          <string-name><given-names>J.</given-names> <surname>Odijk</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Piperidis</surname></string-name>
          (Eds.),
          <source>Proceedings of the Twelfth Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>7145</fpage>
          -
          <lpage>7151</lpage>
          . URL: https://aclanthology.org/2020.lrec-1.883.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Caruana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <article-title>Intelligible models for classification and regression</article-title>
          ,
          <source>Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          (
          <year>2012</year>
          ). doi:10.1145/2339530.2339556.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>