=Paper= {{Paper |id=Vol-2769/52 |storemode=property |title=The Style of a Successful Story: a Computational Study on the Fanfiction Genre |pdfUrl=https://ceur-ws.org/Vol-2769/paper_52.pdf |volume=Vol-2769 |authors=Andrea Mattei,Dominique Brunato,Felice Dell'Orletta |dblpUrl=https://dblp.org/rec/conf/clic-it/MatteiBD20 }} ==The Style of a Successful Story: a Computational Study on the Fanfiction Genre== https://ceur-ws.org/Vol-2769/paper_52.pdf
                          The Style of a Successful Story:
                    a Computational Study on the Fanfiction Genre
                    Andrea Mattei• , Dominique Brunato , Felice Dell’Orletta
                                           •
                                             University of Pisa
                               a.mattei3@studenti.unipi.it
             
               Istituto di Linguistica Computazionale “Antonio Zampolli” (ILC–CNR)
                                    ItaliaNLP Lab - www.italianlp.it
            {dominique.brunato, felice.dellorletta}@ilc.cnr.it

Abstract

This paper presents a new corpus for the Italian language representative of the fanfiction genre. It comprises about 55k user-generated stories inspired by the original fantasy saga “Harry Potter” and published on a popular website. The corpus is large enough to support data-driven investigations in many directions, from traditional studies on language variation aimed at characterizing this genre with respect to more established ones, to emerging topics in computational social science such as the identification of factors involved in the success of a story. The latter is the focus of the presented case study, in which a wide set of multi-level linguistic features has been automatically extracted from a subset of the corpus and analysed in order to detect the ones which significantly discriminate successful from unsuccessful stories.

1   Introduction

Computational Sociolinguistics is an emergent interdisciplinary field aimed at exploiting computational approaches to study the relationship between language and society (Nguyen et al., 2016). One of the primary factors driving its foundation is the widespread diffusion of social media and other user-generated data available online, which has promoted massive research on computer-mediated communication from several perspectives. For instance, scholars working in the field of genre and register variation have relied on quantitative approaches to inspect the peculiarities of social media language, with the purpose of providing a characterization of this new genre with respect to more traditional ones (Paolillo, 2001; Herring and Androutsopoulos, 2015). In the NLP community, the writing style of user-generated data has been analyzed through computational stylometry approaches for addressing tasks broadly related to author profiling (Daelemans, 2013), such as gender and age detection (Peersman et al., 2011; Koppel et al., 2002). The vast majority of this work has taken into account contents published on a few microblogging platforms considered more representative of the contemporary user-generated mediascape, e.g. Twitter. More recently, attention has turned to the language used by online communities whose members share a common interest in an object, an activity or, more generally, any area of human interest, allowing scholars to shed light on the growing phenomenon of fandom (Sindoni, 2015). One of the most prominent expressions of fandom is fanfiction (fanfic, fic or FF), i.e. fiction written by fans of a TV series, movie, book etc., using existing characters and situations to develop new plots. In many languages dedicated websites exist where users can publish their own literary works inspired by the original book they are fans of.

From a computational linguistics standpoint, one perspective from which fanfiction has been investigated aimed to infer the relationship between user-generated stories and their original source, e.g. comparing the representation of characters according to their gender, as well as to model reader reactions to stories (Smitha and Bamman, 2016). Inspired by that study, which was based on a large dataset of stories mainly in English, we collect a new corpus of fanfic stories¹, which, to our knowledge, is the first one for the Italian language. We rely on this corpus to carry out an investigation aimed at shedding light on the possibility of computationally modeling the expected success of a fanfic story, based on the assumptions of linguistic profiling and stylometry research.

¹ Terms of service forbid us to distribute this data. However, the tools used to gather it are available at https://github.com/AndreMatte97/Fanfiction

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2   Dataset collection

The corpus comprises texts collected from efpfanfic.net, a portal active since 2001 which allows users to publish stories and to comment on them. The website is made up of two sections: one for original stories and the other for fanfictions. We considered only the latter and we limited the collection to stories based on the fantasy saga by the British writer J.K. Rowling, “Harry Potter”. This choice was motivated by the main purpose of our analysis, i.e. characterizing the success of a novel with respect to its writing style rather than as an effect of the various subject matters it deals with. At the same time, the preference given to a very popular book allowed us to keep a consistent number of potential readers and reviewers across the corpus, while still having a large sample of texts to analyze. The data collection was performed through web scraping, with two spiders written in Python using the open-source Scrapy framework. The first spider crawls the list of stories in the category of choice and extracts their first chapters together with some metadata, including the URLs of the subsequent chapters. The second spider takes these addresses as input and downloads texts and additional information about all the chapters after the first. In the dataset created this way, the record for each chapter includes: ID and Reference ID, identifiers used by the website to locate the webpage of each chapter (we use the ID of the first chapter as a reference to group together records belonging to the same story); Title; Rating, an estimate given by the author of how explicit the themes and scenes contained in the story are; Date of posting; Author’s nickname; Number of chapters in the story; Text; Total number of reviews received by the story, divided into positive, negative and neutral; Number of reviews received by the single chapter, as well as the text of the most recent ones. The crawlers downloaded 54,717 stories, for a total of 197,310 chapters and a mean of approximately 3.6 chapters per story, which is consistent with the mean calculated taking into account every entry on the website. The obtained corpus was divided into folders, each containing stories with the same number of chapters.

3   The success of a fanfiction story: an exploratory study

Based on the newly created dataset, we carried out a computational stylometric analysis aimed at studying whether there is a connection between the success of a fanfic story and its writing style. Such a connection has been demonstrated for more canonical literary works covering novel and movie domains (Ganjigunte et al., 2013; Solorio et al., 2017), showing that stylometry is a viable approach also in scenarios different from authorship attribution and verification.

The methodological framework of our investigation is linguistic profiling (Montemagni, 2013; van Halteren, 2004), an NLP-based approach in which a large set of linguistically-motivated features automatically extracted from text are used to obtain a vector-based representation of it. Such representations can then be compared across texts representative of different textual genres and varieties to identify the peculiarities of each. For the purpose of our analysis, we split the original dataset into two varieties corresponding to “successful” and “unsuccessful” stories. To define success we follow an approach similar to that used by Solorio et al. (2017), which is based on the number of reviews obtained by each story. In this regard, we decided to include all reviews, not only the positive ones, which undoubtedly testify to a favorable attitude of the reader towards the story. Two main reasons motivated our choice: first, we noticed that the overwhelming majority of collected reviews are written to convey appreciation, with just 0.73% among a total of nearly 900k reviews being negative; therefore, from a statistical point of view, we can reasonably drop the distinction between various kinds of reviews and simply take into consideration the overall amount of feedback received. Secondly, even negative feedback proves that a given story has been read and has aroused some interest in the reader. With this in mind, we define as “unsuccessful” those stories that did not receive any reviews, thus being largely ignored by their readers. Conversely, the “successful” category includes all stories whose review count is higher than the average of all stories with the same number of chapters. We also decided to limit the focus of this analysis to single-chapter fanfictions written before 2018, so as to avoid the inclusion of stories not yet concluded. The resulting classes comprise 2101 unsuccessful texts and 14486 successful ones, with a threshold for success amounting to 5 reviews. Table 1 shows an example of stories classified in the two categories.

All texts were pre-processed by means of regular expressions, with the aim of removing errors and inconsistencies in the use of punctuation, capitalization and special characters, in order to increase the reliability of automatic linguistic annotation and the process of feature extraction, which were performed using the Profiling-UD tool (Brunato et al., 2020).

In what follows we first provide an overview of the linguistic features used for our statistical analysis and then we discuss the ones that turned out to be more prominent in successful writing.

3.1   Linguistic Features

The set of features is based on the one described in Brunato et al. (2020) and counts more than 150 features, distributed across distinct levels of linguistic annotation and computed according to the Universal Dependencies (UD) annotation framework. These features have been shown to be effective in a variety of different scenarios, all related to modeling the ‘form’ of a text, rather than the content: e.g., from the assessment of sentence complexity by humans (Brunato et al., 2018) to the identification of the native language of a speaker from his/her productions in a second language (L2) (Cimino et al., 2018). Specifically, they can be grouped into the following main phenomena:

Raw Text Features: Document length computed as the total number of tokens and of sentences (#Tokens, #Sentences in Table 2); average sentence length and token length, calculated in tokens and in characters, respectively (Sent length, Word length).

Lexical Richness: Distribution of words and lemmas belonging to the Basic Italian Vocabulary (De Mauro, 2000) (BIV Tok, BIV Types) and to the internal repertories (i.e. fundamental, high usage and high availability: BIV Fund; BIV High-US; BIV High-AV); Type/Token Ratio, a feature of lexical variety computed as the ratio between the number of lexical types and the number of tokens in the first 100 and 200 words of text (TTR Lemma); Lexical density.

Morpho-Syntactic Information: Distribution of all grammatical categories, with respect to the Universal part-of-speech tagset (UPOS *) and the language specific tagset (XPOS *); Distribution of verbs according to tense, mood and person, both for main and auxiliary verbs (aux *; V *).

Verbal Predicate Structure: Average distribution of verbal roots and of verbal heads for sentences (VerbHead); features related to the arity of verbs (i.e. average number of dependents for verbal head, distribution of verbs by arity).

Global and Local Parsed Tree: Average depth of the syntactic tree (MaxDepth); average depth of embedded complement chains headed by a preposition; average length of dependency links and of the maximum link (Links Len; Max Link Length); relative order of the subject and object with respect to the verb.

Syntactic relations: Distribution of typed UD dependency relations (dep *).

Use of Subordination: Distribution of main and subordinate clauses (Main clause, Subord clause), average length of subordinate chains, distribution of subordinate chains by length.

4   Data Analysis

For each considered feature we calculated the average value and the standard deviation in the two classes. We then assessed whether the variation between mean values is significant using the Wilcoxon rank sum test. We found that 57% (i.e. 126 out of 219) of the features are distributed in a significantly different way between successful and unsuccessful stories. In Table 2 we report an extract of the most interesting ones.

As can be seen, successful stories are on average longer in terms of number of tokens and sentences (1, 2), although these sentences are generally shorter (3), suggesting that readers appreciate a plainer writing style. However, when lexical factors are considered, the preference is given to texts exhibiting less frequent words, as suggested by the slightly lower distribution of words belonging to the Basic Italian Vocabulary (5, 6) and especially to the Fundamental one (7). Inflectional morphology also appears as a domain of variation between the two classes. Successful fanfictions employ second-person verbs more often (15), a feature typical of narrative writing related to direct speech. On the contrary, we observe a higher distribution of third person verbs, specifically auxiliaries, both singular (14) and plural (13), in less successful texts, which can hint at a preference for reported speech.
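
The significance check described in Section 4 can be illustrated with a minimal sketch of the Wilcoxon rank sum test. The paper does not say which implementation was used, so what follows is an assumption: a standard-library version using the normal approximation without tie correction, which is reasonable only for large samples such as the two classes analysed here. The function names and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank sum test (normal approximation, no tie correction).

    Returns the z statistic and the p-value for the hypothesis that the
    two samples come from the same distribution.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                        # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # its standard deviation
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical per-story sentence lengths for the two classes.
unsuccessful = [24.1, 19.8, 31.0, 22.5, 27.3, 18.9]
successful = [16.2, 17.9, 15.4, 18.1, 16.8, 17.0]
z, p = rank_sum_test(unsuccessful, successful)
print(f"z = {z:.3f}, p = {p:.4f}")
```

In the study this kind of test would be repeated for each of the 219 features, comparing the values observed in the unsuccessful and successful classes.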
              Label       Example (Italian)                            Example (English)
            Successful    La città di Edimburgo era sommersa da       The city of Edinburgh was flooded by
                          una cascata d’acqua. Pioveva. Pioveva        a cascade of water. It was raining. It
                          da giorni e giorni, senza sosta. Il cielo    had been raining for days and days, re-
                          era illuminato di lampi e scosso da tuoni.   lentlessly. The sky was lit by lightning
                          Le strade erano vuote. Per la prima          and shaken by thunder. The streets were
                          volta da giorni, allo scoccare della mez-    empty. For the first time in days, at
                          zanotte, la pioggia cessò di colpo. Il      the stroke of midnight, the rain stopped
                          silenzio piombò sui quartieri che sem-      abruptly. Silence fell upon the districts
                          brarono improvvisamente più bui. E in       that suddenly seemed darker. And in
                          quel silenzio penetrante, l’unico rumore     that piercing silence, the only noise that
                          che si riusciva a distinguere era un tac-    could be recognized was a faint and ir-
                          tac-tac leggero e discontinuo. Proveniva     regular tac-tac-tac. It was coming from
                          da una finestra. La finestra di una lus-     a window. The window of a luxurious
                          suosa casa in centro, l’unica luce accesa    house in the city centre, the only light
                          a quell’ora. Joanne era davanti al com-      still on at that time. Joanne was in front
                          puter, fonte di quel tremolio e scriveva.    of the computer, source of that trem-
                          Batteva le dita sulla tastiera per alcuni    bling and was writing. She tapped her
                          istanti, poi si fermava, rileggeva, can-     fingers on the keyboard for a few mo-
                          cellava e riscriveva. Andava avanti così     ments, then stopped, reread, deleted and
                          da giorni. I suoi occhi erano stanchi, ma    rewrote. She had been going on like this
                          la sua mente lavorava frenetica. Man-        for days. Her eyes were tired, but her
                          cava poco².                                  mind was working frantically. Almost
                                                                       there.
           Unsuccessful   Il cielo era tetro cosparso di nuvole che    The sky was bleak strewn with clouds
                          sembravano volere annunciare un ac-          that seemed to want to announce a
                          quazzone, il vento ulula forte facendo       downpour, the wind howls loudly mak-
                          sbattere le finestre violentemente, come     ing the windows slam violently, as if
                          se volesse gridare, liberarsi da una rab-    it wanted to scream, to free itself from
                          bia repressa. La donna dai lunghi capelli    a suppressed anger. The woman with
                          rosso scuro continuava a fissare la devas-   the long dark red hair kept staring at the
                          tazione attraverso il vetro che ora si era   devastation through the glass that was
                          appannato dal suo stesso respiro. Aveva      now clouded by her own breath. Her
                          lo sguardo malinconico non più illumi-      melancholic gaze was no longer lit up
                          nato da quella dolce espressione che il      by that sweet look that laughter gave
                          riso le donava. Una mano le si poggiò       her. A hand rested on her shoulder and
                          sulla spalla e girò pian piano il volto     slowly turned her face towards the loved
                          verso la persona amata che con un ritmo      one who started slowly caressing her
                          lento cominciò ad accarezzarle le gote      cheeks which took on a rosy tone on
                          che assunsero un colorito roseo alla sua     her pale skin. She closed her eyes, as
                          pelle pallida. Chiuse gli occhi come         if to savor that sweet touch that had now
                          per assaporare quel dolce tocco che ora      moved into her hair. “Don’t look beyond
                          si era spostato nei suoi capelli. “Non       the glass anymore” Whispered the voice
                          guardare più oltre il vetro” Mormorò la    with a note of concern, it belonged to
                          voce con una nota di preoccupazione,         James, husband of Lily the woman with
                          apparteneva a James, marito di Lily la       long red hair.
                          donna dai lunghi capelli rossi³.

   Table 1: An extract of a ‘successful’ story (the most reviewed one) and of an ‘unsuccessful’ one.
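
The labelling rule from Section 3 that yields the two classes illustrated in Table 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the record keys (`id`, `chapters`, `reviews`) and the function name are hypothetical, the per-length average is assumed to be computed over all stories of that length (including those with no reviews), and stories with some reviews but not more than the average are assumed to fall outside both classes.

```python
from collections import defaultdict
from statistics import mean


def label_stories(stories):
    """Assign each story to 'unsuccessful', 'successful' or None.

    stories: iterable of dicts with the hypothetical keys
    'id', 'chapters' (number of chapters) and 'reviews' (total reviews).
    """
    stories = list(stories)

    # Average review count among stories with the same number of chapters.
    reviews_by_length = defaultdict(list)
    for story in stories:
        reviews_by_length[story["chapters"]].append(story["reviews"])
    average = {n: mean(counts) for n, counts in reviews_by_length.items()}

    labels = {}
    for story in stories:
        if story["reviews"] == 0:
            labels[story["id"]] = "unsuccessful"   # never reviewed
        elif story["reviews"] > average[story["chapters"]]:
            labels[story["id"]] = "successful"     # above the per-length average
        else:
            labels[story["id"]] = None             # assumed to be left out
    return labels


# Toy records, invented for illustration only.
sample = [
    {"id": "a", "chapters": 1, "reviews": 0},
    {"id": "b", "chapters": 1, "reviews": 10},
    {"id": "c", "chapters": 1, "reviews": 2},
    {"id": "d", "chapters": 2, "reviews": 4},
    {"id": "e", "chapters": 2, "reviews": 4},
]
print(label_stories(sample))
```

Grouping by chapter count before averaging keeps long stories, which accumulate reviews chapter after chapter, from setting the bar for short ones. For the single-chapter subset analysed here, the paper reports a resulting threshold of 5 reviews.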


² The full story can be found at https://efpfanfic.net/viewstory.php?sid=607026&i=1
³ The full story can be found at https://efpfanfic.net/viewstory.php?sid=27412&i=1

Focusing on the distribution of morpho-syntactic categories, there is a significant difference in the usage of the most common punctuation marks, commas (25) and full stops (26), which are more frequent in highly-reviewed fanfictions. These features relate to the previously observed difference in terms of document length, as texts with more sentences necessarily use punctuation marks to divide them. Additionally we can see that balanced marks (24), i.e. parentheses and quotation marks, occur more in successful texts, strengthening our previous claim about a more frequent presence of direct speech in this class. At the syntactic level, dependency relations are slightly shorter in successful texts, both considering the average value of all dependencies (29) and the value of the maximum dependency link (30). In readability assessment studies, longer syntactic dependencies are typically found in complex texts, and the same holds for deeper syntactic trees. Both these features have lower values in highly-reviewed stories, suggesting that the style of successful writing is characterized by a simpler syntactic structure. Interestingly, these results, although preliminary, go in the opposite direction to those reported by Ganjigunte et al. (2013) for successful literary works in English, which were found to be less correlated with text readability scores. Finally, subordinate clauses (33) occur slightly more often than main clauses (32) in unsuccessful texts, while there is a nearly even split between hypotaxis and parataxis in successful ones.

                             Unsucc           Success
 Feature                     Avg (StDev)      Avg (StDev)
                     Raw Text Features
 1. # Tokens                 1401 (1940)      2120 (2718)
 2. # Sentences              78.4 (116.7)     125.1 (153.6)
 3. Sent length              20.18 (12.39)    17.38 (6.43)
 4. Word length              4.50 (.250)      4.52 (.193)
                     Lexical Features
 5. % BIV Tok                85.7 (5.1)       84.8 (4.7)
 6. % BIV Types              73.4 (7)         70.1 (7)
 7. % BIV Fund               61 (7.5)         57.1 (7.7)
 8. * % BIV High-AV          3.1 (1)          3.1 (1)
 9. % BIV High-US            8.5 (2.4)        9.1 (2.5)
 10. Lexical density         .498 (.033)      .503 (.031)
 11. TTR Lemma 100           .560 (.118)      .560 (.112)
 12. TTR Lemma 200           .433 (.114)      .436 (.110)
                     Morpho-Syntactic Features
 13. % Aux 3perPl            13.2 (9)         11.8 (7.6)
 14. % Aux 3perSin           54.4 (17.1)      53.2 (15.6)
 15. % Aux 2perSin           6.3 (8.1)        7.9 (8.5)
 16. % Aux Imperf.           38.5 (24.8)      31.3 (23.6)
 17. % Aux Pres.             52.4 (26)        60 (23.6)
 18. % V Gerund              5.7 (3.8)        6.3 (4)
 19. % upos VERB             12.5 (1.8)       12.3 (1.7)
 20. % upos NOUN             13.8 (2.3)       13.5 (2.1)
 21. % upos ADJ              4.7 (1.4)        4.6 (1.2)
 22. % upos PRON             8.59 (2.24)      8.51 (2)
 23. % upos ADP              10.8 (1.9)       10.4 (1.8)
 24. % xpos FB               1.7 (2)          2.3 (2.4)
 25. % xpos FF               6.5 (2.7)        7.1 (2.8)
 26. % xpos FS               5.5 (2.2)        6.1 (2.1)
 27. % xpos CC               3.1 (.9)         2.9 (.8)
 28. % xpos CS               1.7 (.7)         1.8 (.7)
                     Syntactic Features
 29. Links Len               2.78 (.438)      2.72 (.385)
 30. * Max Link Len          1.19 (2.38)      .687 (1.33)
 31. MaxDepth                3.96 (1.45)      3.58 (.857)
 32. % Main clause           48.8 (9.9)       49.9 (9)
 33. % Subord clause         51.2 (9.9)       50.1 (9)
 34. % Verb Head             2.63 (1.72)      2.26 (.897)
 35. % dep nsubj             4.9 (1.1)        4.7 (1)
 36. * % dep obj             5.3 (1.1)        5.3 (1)
 37. % dep obl               5.5 (1.1)        5.2 (1)
 38. % dep punct             14.2 (3.8)       16 (4.1)
 39. % dep conj              4 (1.3)          3.7 (1.1)
 40. % dep det               10.9 (2)         10.5 (1.8)

Table 2: An extract of linguistic features varying significantly between successful and unsuccessful stories. All differences are significant at p < 0.001, except for features marked with an asterisk, which have p < 0.05.

To deepen our analysis, we also computed the coefficient of variation σ* for all features varying significantly between the two classes, where σ* is the ratio between the standard deviation σ and the mean µ. This allowed us to evaluate the dispersion of values around the average in a standardized way, and thus to compare the stability of features pertaining to data measured on different scales. A feature that is widely scattered in one class of texts and highly stable in the other has a greater chance of being a meaningful representative of the latter.

In Figure 1 we show the average variability in the two classes of the four groups of features distinguished according to the level of annotation they were extracted from. As a whole, we noticed that successful texts display less variability in nearly every considered feature: 117 of them (92%) are more stable in this class. In successful stories, features with greater stability compared to the other class are mainly raw text ones, e.g. number of sentences and number of tokens, and syntactic ones, e.g. verbal heads per sentence and average depth of syntactic trees. Among the few features which are more stable in poorly received texts, we find instead verbal predicate features, such as the distributions of past tenses and of indicative moods, in addition to the frequency of usage of cardinal numbers. The set of lexical features is instead the most stable one for both classes.

Figure 1: Average coefficient of variation in each class of features, both for successful and unsuccessful texts.
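
As a worked illustration of the coefficient of variation σ* = σ / µ, the sketch below applies the ratio to three rows of Table 2 (sentence count, sentence length and tree depth). The means and standard deviations are copied from the table; the function and variable names are hypothetical, and the snippet only shows how the comparison works rather than reproducing the authors' pipeline.

```python
def coefficient_of_variation(mean, stdev):
    """Return sigma* = stdev / mean, a scale-free measure of dispersion."""
    return stdev / mean


def more_stable_class(classes):
    """Return the class name with the lowest coefficient of variation.

    classes: dict mapping a class name to a (mean, stdev) pair.
    """
    return min(classes, key=lambda name: coefficient_of_variation(*classes[name]))


# (mean, stdev) pairs taken from Table 2.
table2_extract = {
    "# Sentences": {"unsucc": (78.4, 116.7), "success": (125.1, 153.6)},
    "Sent length": {"unsucc": (20.18, 12.39), "success": (17.38, 6.43)},
    "MaxDepth": {"unsucc": (3.96, 1.45), "success": (3.58, 0.857)},
}

for feature, classes in table2_extract.items():
    cv = {name: coefficient_of_variation(*pair) for name, pair in classes.items()}
    print(feature, {name: round(value, 3) for name, value in cv.items()},
          "more stable in:", more_stable_class(classes))
```

Dividing by the mean is what makes a count such as the number of sentences comparable with a small-valued feature such as tree depth. For all three rows the ratio is lower in the successful class, in line with the tendency reported above.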
5   Conclusion

In this paper, we presented an NLP-based stylometric analysis of the emerging genre of fanfiction, aimed at characterizing the writing style of a successful story. We collected a new large-scale corpus which, to the best of our knowledge, is the first one of this genre for Italian. We showed that successful stories, defined as those receiving a number of reviews higher than the average, are characterized by a variety of linguistic features at different levels of granularity and that these features are more uniformly distributed within them.

In the future, we would like to broaden the perspective to other genres in order to study whether there are linguistic predictors of successful writing which are constant across different genres, as well as across concepts somehow similar to success, such as virality and engagement.

References

D. Brunato, A. Cimino, F. Dell’Orletta, G. Venturi and S. Montemagni. 2020. Profiling-UD: a Tool for Linguistic Profiling of Texts. Proceedings of The 12th Language Resources and Evaluation Conference, European Language Resources Association, 7145–7151.

D. Brunato, L. De Mattei, F. Dell’Orletta, B. Iavarone and G. Venturi. 2018. Is this Sentence Difficult? Do you Agree? Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), 2018.

A. Cimino, F. Dell’Orletta, D. Brunato and G. Venturi. 2018. Sentences and Documents in Native Language Identification. Proceedings of 5th Italian Conference on Computational Linguistics (CLiC-IT), 1–6, Turin.

W. Daelemans. 2013. Explanation in Computational Stylometry. Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013, Lecture Notes in Computer Science, vol 7817. Springer, Berlin, Heidelberg.

Tullio De Mauro. 2000. Grande dizionario italiano dell’uso (GRADIT). Torino, UTET.

V. Ganjigunte Ashok, S. Feng and Yejin Choi. 2013. Success with style: Using writing style to predict the success of novels. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1753–1764.

S.C. Herring and J. Androutsopoulos. 2015. Computer-mediated discourse 2.0. The handbook of discourse, 2nd ed. Deborah Tannen, Heidi E. Hamilton, Deborah Schiffrin, eds. John Wiley & Sons, 1753–1764.

M. Koppel, S. Argamon and A. Rachel Shimoni. 2002. Automatically Categorizing Written Texts by Author Gender. Lit. Linguistic Comput., 17, 4, 401–412.

S. Montemagni. 2013. Tecnologie linguistico-computazionali e monitoraggio della lingua italiana. Studi Italiani di Linguistica Teorica e Applicata (SILTA), 145–172.

D. Nguyen, A.S. Doğruöz, C.P. Rosé, and F.M.G. de Jong. 2016. Computational Sociolinguistics: A Survey. Computational Linguistics, Vol. 42, No. 3, 537–593.

John Paolillo. 2001. Language variation on Internet Relay Chat: A social network approach. Journal of Sociolinguistics, 5, 180–213.

C. Peersman, W. Daelemans, and L. Van Vaerenbergh. 2011. Predicting Age and Gender in Online Social Networks. Proceedings of the 3rd International Workshop on Search and Mining User-Generated Contents, 37–44.

M.G. Sindoni. 2015. ‘I Really Have No Idea What Non-Fandom People Do with Their Lives.’ A Multimodal and Corpus-Based Analysis of Fanfiction. Lingue e Linguaggi, (13), 2015, 277–300, doi.org/10.1285/i22390359v13p277.

M. Smitha and D. Bamman. 2016. Beyond Canonical Texts: A Computational Analysis of Fanfiction. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016.

T. Solorio, M. Montes-y-Gómez, Suraj Maharjan, J. Ovalle and Fabio A. González. 2017. Multi-task Approach to Predict Likability of Books. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 1217–1227.

H. van Halteren. 2004. Linguistic profiling for author recognition and verification. Proceedings of the Association for Computational Linguistics, 200–207.