<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Unmasking the Wordsmith: Revealing Author Identity through Reader Reviews</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chiara Alzetta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felice Dell'Orletta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chiara Fazzone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Miaschi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giulia Venturi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ItaliaNLP Lab, CNR, Istituto di Linguistica Computazionale 'A.Zampolli'</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Traditional genre-based approaches for book recommendations face challenges due to the vague definition of genres. To overcome this, we propose a novel task called Book Author Prediction, where we predict the author of a book based on user-generated reviews' writing style. To this aim, we first introduce the 'Literary Voices Corpus' (LVC), a dataset of Italian book reviews, and use it to train and test machine learning models. Our study contributes valuable insights for developing user-centric systems that recommend leisure readings based on individual readers' interests and writing styles.</p>
      </abstract>
      <kwd-group>
        <kwd>Book Author Prediction</kwd>
        <kwd>Italian reviews</kwd>
        <kwd>stylistic analysis</kwd>
        <kwd>user-generated book reviews</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Background</title>
      <p>
        Reading for pleasure is currently experiencing a significant decline, as evidenced by surveys indicating that leisure reading has reached an unprecedented low<sup>1</sup>. Book recommender systems have been proposed as a valuable tool to promote the practice of reading for pleasure [<xref ref-type="bibr" rid="ref1">1</xref>]. These systems provide personalized suggestions and aid users in navigating the vast array of available literary works [<xref ref-type="bibr" rid="ref2">2</xref>]. Their integration into e-commerce services has long been explored, as it benefits both sellers and consumers [<xref ref-type="bibr" rid="ref3">3</xref>].
      </p>
      <p>
        Typically integrated with online platforms, book recommender systems rely on the history of users to predict their future interests and provide recommendations based on the literary genre or authors that users have previously engaged with. While recommending the other books by an author that the reader enjoyed is trivial, suggesting books belonging to the same genre remains a complex area of study, particularly concerning literary novels [<xref ref-type="bibr" rid="ref4">4</xref>]. This is mostly due to the fact that the notion of genre represents a quite heterogeneous object of study due to multiple factors [<xref ref-type="bibr" rid="ref5">5</xref>]. In fact, the same book can be assigned to more than one literary genre either on the same reading platform or across diverse platforms. Accordingly, various approaches have been proposed to automatically identify literary genres using book content [<xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>], titles or summaries [<xref ref-type="bibr" rid="ref9">9</xref>], and even cover designs [<xref ref-type="bibr" rid="ref10">10</xref>]. Nevertheless, these models often face challenges when book content is inaccessible due to licensing restrictions.
      </p>
      <p>
        Consequently, an alternative and promising line of research on book recommender systems involves leveraging user reviews as a valuable source of information for generating recommendations. Analyzing reviews allows for a unique perspective on books from the viewpoint of their readers, without requiring access to their content. Reviews offer valuable insights into readers’ opinions and preferences, and they have been effectively utilized to predict trends in the book market [<xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15">11, 12, 13, 14, 15</xref>]. There are few attempts to exploit user reviews also for literary genre identification. These include [<xref ref-type="bibr" rid="ref16">16</xref>] and [<xref ref-type="bibr" rid="ref17">17</xref>] for English and Portuguese book reviews respectively. We have also contributed to this line of research by focusing on Italian book reviews [<xref ref-type="bibr" rid="ref18">18</xref>]. In our previous work, we demonstrated how book reviews published by amateur readers on two social reading platforms, namely Amazon and Goodreads, can be exploited to automatically identify the genre of the reviewed book.
      </p>
      <p>
        Building upon our prior investigations, our current research aims to explore whether the writing style of user-generated reviews, analyzed in terms of lexical and (morpho-)syntactic characteristics, can serve as a reliable source of information also to predict the author of a reviewed book. We started from the assumption that the vague definition of literary genres might make recommendations based on related authors more effective than genre-based approaches. To this end, inspired by the literature on Authorship Attribution [<xref ref-type="bibr" rid="ref19">19</xref>], we introduced a novel task named Book Author Prediction. We tackle the problem as a supervised classification task, where the objective is to predict the author of a given book from a set of potential candidates. It is important to note that, unlike the traditional Authorship Attribution task, our information source consists of user-generated reviews rather than the books authored by the novelists themselves. This distinction adds a layer of complexity to the task, making it particularly challenging and novel in its approach. As a crucial step towards this objective, we introduce a novel dataset of Amazon<sup>2</sup> and Goodreads<sup>3</sup> book reviews, the ‘Literary Voices Corpus’ (LVC). The dataset successfully served in diverse experimental settings we explored in this work aimed at training and testing pre-trained and traditional machine learning models, that use different configurations of lexical and (morpho-)syntactic features, to accomplish the new prediction task.
      </p>
      <p>
        The work presented in this study falls within the context of collective efforts to foster the habit of reading and enlarge the readership across different target audiences<sup>4</sup>. Among these initiatives, LettERE (Letture pER TE) is a project that aims to encourage and promote the practice of reading by creating a reading recommendation system that provides personalised recommendations tailored to the reader’s language skills and interests (see Acknowledgements). In this regard, the research presented in this paper contributes significantly to the LettERE project’s objectives by showing that user-generated reviews can be effectively used to identify readers sharing common interests and ultimately provide personalised book recommendations.
      </p>
      <p>The remainder of the paper is organised as follows. Section 2 presents LVC, the novel collection of Italian book reviews referring to the books of six popular authors. Section 3 introduces the Book Author Prediction task and details the methodology and models exploited in this work to address it. Section 4 presents the results of our experiments. Finally, Section 5 offers conclusions and outlines potential future research directions.</p>
      <p>CLiC-it 2023: 9th Italian Conference on Computational Linguistics, Nov 30 - Dec 02, 2023, Venice, Italy. chiara.alzetta@ilc.cnr.it (C. Alzetta); felice.dellorletta@ilc.cnr.it (F. Dell’Orletta); chiara.fazzone@ilc.cnr.it (C. Fazzone); alessio.miaschi@ilc.cnr.it (A. Miaschi); giulia.venturi@ilc.cnr.it (G. Venturi). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org</p>
      <p><sup>1</sup> See https://www.istat.it/it/archivio/284591, https://literacytrust.org.uk/research-services/annual-literacy-survey/</p>
      <p><sup>2</sup> https://www.amazon.it</p>
      <p><sup>3</sup> https://www.goodreads.com</p>
      <p><sup>4</sup> See for instance: https://www.regione.toscana.it/-/un-patto-per-la-lettura.</p>
      <sec id="sec-1-1">
        <title>2. The Literary Voices Corpus</title>
        <p>We performed our experiments on the ‘Literary Voices Corpus’ (LVC), which encompasses a collection of book reviews in Italian published on two leading platforms for Digital Social Reading (DSR), Amazon Books and Goodreads, and covering the work of several authors of fiction novels.<sup>5</sup> This corpus is a spin-off of the ‘A Good Review’ corpus, which we introduced in [<xref ref-type="bibr" rid="ref18">18</xref>]. The LVC corpus is aimed at being representative of two different approaches to writing book reviews, a diversity specific to the peculiarities of the two platforms. In fact, while Goodreads gathers a large community of amateur readers to exchange opinions and reading recommendations, Amazon has a marked commercial vocation and treats books mainly as a consumer good. Goodreads reviews are typically exploited to predict the orientation of the book market [<xref ref-type="bibr" rid="ref11 ref13">11, 13</xref>], to map reading preferences across various communities of users [<xref ref-type="bibr" rid="ref20">20</xref>], as well as to analyze the linguistic style adopted by readers to describe their reading experiences [<xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>]. Conversely, reviews posted on Amazon Books have mostly been investigated within marketing and buyers’ behaviour studies, often relying on sentiment analysis [<xref ref-type="bibr" rid="ref23 ref24 ref25">23, 24, 25</xref>].</p>
        <p>When building LVC, we first chose popular novelists in order to acquire a diverse but rich collection of reviews from amateur readers. These are J.K. Rowling, Stephen King, J.R.R. Tolkien, Jane Austen, Sarah J. Maas, and Dan Brown.<sup>6</sup> Since literary genre is not a monolithic notion [<xref ref-type="bibr" rid="ref4">4</xref>], the books of these authors traverse multiple genres. For example, King’s repertoire encompasses horror, thriller, and science-fiction, while Maas’s fantasy novels also incorporate a substantial element of romance. Then, we extracted the reviews for their respective books from the ‘A Good Review’ corpus and we integrated the set with new books if necessary using the ISBN number of a book to unambiguously identify it on Amazon and Goodreads and to collect its reviews written in Italian. This was done to reach a minimum of 1,100 reviews per novelist from Goodreads and 800 reviews from Amazon. While we successfully obtained the desired number of reviews for most authors, we encountered challenges for Austen and Maas on Amazon. Nonetheless, the number of reviews collected for these authors can still be considered reasonably comparable to the desired amount. The statistics of the final LVC dataset are reported in Table 1.</p>
        <p>As can be noted, the two portions of the dataset (i.e., Amazon and Goodreads) are quite different in terms of the length of a single review. This difference arises in part from the lower number of reviews collected from Amazon, but mostly from the comparatively greater length of Goodreads reviews in terms of sentences and tokens. Thus, achieving a balanced number of reviews across authors does not correspond to an equal number of tokens. Furthermore, we notice a tendency to produce longer reviews among the readers of certain authors, such as King, Maas, or Austen, on both platforms. This represents one of the first general characterizations of the diversity across literary voices we collected.</p>
        <p><sup>5</sup> The LVC corpus is freely available under request for research purposes.</p>
      </sec>
      <sec id="sec-1-9">
        <title>3. Book Author Prediction</title>
      </sec>
      <sec id="sec-1-14">
        <p>The novel task of Book Author Prediction consists of predicting the author of a book from the readers’ reviews. We explored the performance on the task of a suite of machine learning algorithms that vary with respect to the architecture and features used for training (see Section 3.1). The models leverage a wide spectrum of text properties acquired from the reviews of increasing informativeness, which range from n-grams of words to stylistic features (Section 3.2), up to contextual sentence representations of Neural Language Models. For all models, we adopted a 5-fold cross-validation approach for training and testing. The train and test sets always contain reviews of different books, thus increasing the complexity of the classification tasks. Note that, considering the high discriminative power of proper nouns in this classification scenario, we performed the linguistic analysis of reviews and sanitized the text [<xref ref-type="bibr" rid="ref26">26</xref>] by masking all tokens marked as proper nouns (POS = PROPN).</p>
      </sec>
      <sec id="sec-1-15">
        <p><sup>6</sup> The complete list of books whose reviews in Italian have been included in LVC can be found in Appendix A.</p>
        <table-wrap>
          <label>Table 1</label>
          <table>
            <tbody>
              <tr><td>7</td><td>1,100</td><td>6,224</td><td>180,680</td><td>5.65</td><td>164.25</td></tr>
              <tr><td>6</td><td>800</td><td>2,695</td><td>48,275</td><td>3.36</td><td>60.34</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-1-15-1">
          <title>3.1. Models</title>
          <p><bold>Linear Support Vector Machine.</bold> We define two LinearSVM models, referred to as ‘Profiling’ and ‘Ngrams’ models. The former takes as input the set of linguistic characteristics described in Sec. 3.2. Ngrams exploits lexical information, since its input features are contiguous sequences of n words acquired from the reviews (i.e. n-grams, with n equal to 1, 2, and 3).</p>
          <p><bold>Neural Language Model.</bold> We relied on the Italian pretrained version of the BERT model (12 layers, 768 hidden units) [<xref ref-type="bibr" rid="ref27">27</xref>]<sup>7</sup>, which was pretrained using the Italian Wikipedia and the Italian portion of the OPUS corpus [<xref ref-type="bibr" rid="ref28">28</xref>], a multilingual collection of translated open source documents available on the Internet, and fine-tuned on the Book Author Classification task.</p>
          <p><bold>LinearSVM + NLM.</bold> We combined the previous models into a classifier based on LinearSVM and trained using the internal representations of the BERT model fine-tuned on the author classification tasks. We refer to this model as SVM (BERT). SVM (BERT+Profiling) is an additional LinearSVM model trained using both the fine-tuned representations produced by BERT and Profiling-UD features. The BERT representations used as input features of the SVM model were computed by averaging the embeddings of all the tokens in each review.</p>
          <p><bold>Baselines.</bold> We compared the performance of the above models against a random uniform classifier, i.e. a model that uniformly generates random predictions for each author.</p>
        </sec>
      </sec>
      <sec id="sec-1-18">
        <p>Table 2: Linguistic features acquired from book reviews.</p>
        <table-wrap>
          <label>Table 3</label>
          <caption>
            <p>Classification accuracies for the task of Book Author Prediction.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Model</th><th /><th /><th /><th /></tr>
            </thead>
            <tbody>
              <tr><td>Baseline</td><td>0.16</td><td>0.14</td><td>0.16</td><td>0.16</td></tr>
              <tr><td>Profiling</td><td>0.25</td><td>0.22</td><td>0.26</td><td>0.26</td></tr>
              <tr><td>Ngrams</td><td>0.44</td><td>0.39</td><td>0.44</td><td>0.42</td></tr>
              <tr><td>BERT</td><td>0.74</td><td>0.61</td><td>0.73</td><td>0.61</td></tr>
              <tr><td>SVM (BERT)</td><td>0.56</td><td>0.43</td><td>0.54</td><td>0.46</td></tr>
              <tr><td>SVM (BERT+Profiling)</td><td>0.57</td><td>0.36</td><td>0.52</td><td>0.43</td></tr>
              <tr><td>Average</td><td>0.51</td><td>0.40</td><td>0.50</td><td>0.44</td></tr>
            </tbody>
          </table>
        </table-wrap>
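        <p>The evaluation protocol described in Section 3 (5-fold cross-validation in which train and test sets never share a book, compared against a random uniform baseline) can be illustrated with a minimal sketch. It is not the authors' code: the round-robin assignment of books to folds, the function names and the toy book identifiers are all assumptions made for illustration.</p>
        <preformat>
```python
import random


def book_disjoint_folds(book_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no book is shared."""
    # Assumption: books are shuffled and dealt round-robin into folds;
    # the paper does not describe how its folds were built.
    books = sorted(set(book_ids))
    random.Random(seed).shuffle(books)
    fold_of_book = {book: i % n_folds for i, book in enumerate(books)}
    for fold in range(n_folds):
        test_idx = [i for i, b in enumerate(book_ids) if fold_of_book[b] == fold]
        train_idx = [i for i, b in enumerate(book_ids) if fold_of_book[b] != fold]
        yield train_idx, test_idx


def uniform_baseline_accuracy(gold_authors, seed=0):
    """Accuracy of a classifier that picks an author uniformly at random."""
    authors = sorted(set(gold_authors))
    rng = random.Random(seed)
    hits = sum(rng.choice(authors) == gold for gold in gold_authors)
    return hits / len(gold_authors)


# Toy corpus (hypothetical ids): 6 authors, 5 books each, 200 reviews per book.
authors = ["Rowling", "King", "Tolkien", "Austen", "Maas", "Brown"]
reviews = [(a, f"{a}-book{k}") for a in authors for k in range(5) for _ in range(200)]
gold = [a for a, _ in reviews]
book_ids = [b for _, b in reviews]

folds = list(book_disjoint_folds(book_ids))
baseline = uniform_baseline_accuracy(gold)
```
        </preformat>
        <p>With six equally represented authors, the random baseline is expected to land close to 1/6. Note that this simple split does not stratify by author, so a real setup would also need to ensure that every author appears in each training fold.</p>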
        <sec id="sec-1-18-1">
          <title>3.2. Linguistic Features</title>
          <p>
            To model the linguistic properties of the reviews, we
relied on a set of 150 linguistic features. These features
correspond to specific aspects of the document structure
and were derived using Profiling-UD [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ], a web-based
tool conceived to linguistically profile multilingual texts
by relying on the Universal Dependencies (UD)
formalism [
            <xref ref-type="bibr" rid="ref30">30</xref>
            ]. The features encompass 9 dimensions of
document structure, which are detailed in Table 2. They
range from morpho-syntactic and inflectional properties
to more complex aspects of sentence structure, such as
the depth of the syntactic tree. Other features pertain to
the structure of sub-trees and include the order of
subjects and objects in relation to the verb, as well as the use
of subordination.
          </p>
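          <p>To give a concrete idea of the simplest features of this kind, the sketch below computes three surface properties whose names appear in the feature rankings of Appendices B and C (n_tokens, tokens_per_sent, ttr_form_chunks_100). It is only an approximation under stated assumptions: the type/token ratio is assumed to be computed on lowercased word forms over the first 100 tokens, and the input is assumed to be already split into sentences and tokens. Profiling-UD derives its features from UD-parsed text, and its exact definitions may differ.</p>
          <preformat>
```python
def profile(sentences, chunk_size=100):
    """Compute three surface features from a tokenized review.

    `sentences` is a list of sentences, each a list of word forms.
    """
    tokens = [tok for sent in sentences for tok in sent]
    chunk = [tok.lower() for tok in tokens[:chunk_size]]
    return {
        "n_tokens": len(tokens),
        "tokens_per_sent": len(tokens) / len(sentences) if sentences else 0.0,
        # Assumed definition: distinct forms / tokens in the first chunk.
        "ttr_form_chunks_100": len(set(chunk)) / len(chunk) if chunk else 0.0,
    }


# Hand-tokenized toy review (two sentences, seven tokens, five distinct forms).
review = [["Un", "libro", "bellissimo"], ["Un", "libro", "da", "leggere"]]
features = profile(review)
```
          </preformat>
          <p>For the toy review, the sketch counts 7 tokens, 3.5 tokens per sentence and a type/token ratio of 5/7.</p>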
        </sec>
      </sec>
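      <p>The proper-noun masking mentioned at the beginning of Section 3 relies on the same UD annotation. A minimal sketch of that sanitization step follows, assuming the review is already available as (form, UPOS) pairs; the placeholder string is hypothetical, since the paper does not say what masked tokens are replaced with.</p>
      <preformat>
```python
MASK = "PROPN_MASK"  # hypothetical placeholder, not taken from the paper


def mask_proper_nouns(tagged_tokens, mask=MASK):
    """Replace every token tagged PROPN with a placeholder string."""
    return [mask if upos == "PROPN" else form for form, upos in tagged_tokens]


# (form, UPOS) pairs as a UD tagger might produce them; tags are hand-written here.
tagged = [
    ("Harry", "PROPN"), ("Potter", "PROPN"), ("resta", "VERB"),
    ("il", "DET"), ("maghetto", "NOUN"), ("preferito", "ADJ"),
]
masked = mask_proper_nouns(tagged)
```
      </preformat>
      <p>Masking in this way removes character and author names, which would otherwise let a classifier identify the author without looking at the reviewer's style.</p>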
    </sec>
    <sec id="sec-2">
      <title>4. Results</title>
      <p>Table 3 presents the classification accuracies for the task of Book Author Prediction. Notably, all models outperformed the random uniform baseline on both Amazon and Goodreads. Upon closer examination of the models, we notice that lexical information has more discriminative power than linguistic properties in the task. As proof, consider the global and author-level scores obtained by the Profiling model compared to the Ngram and, most notably, the BERT models. Interestingly, using the fine-tuned BERT representations as input features for the SVM classifier (SVM (BERT)) yielded lower results than simply using pre-trained BERT, and the results are comparable – or lower – when combining contextualized representations with linguistic features (SVM (BERT+Profiling)).</p>
      <p>Comparing the two platforms, Goodreads reviews exhibit on average higher accuracy scores overall. This is possibly due to a typical trait of commercial platforms like Amazon, whose reviews frequently encompass aspects beyond the book’s content, such as parcel delivery or the edition’s book cover. These topics cause the reviews to be quite standardised, thus more difficult to discriminate. Conversely, Goodreads reviews primarily focus on the book’s content, possibly containing a larger amount of stylistic elements which help the automatic classification. This trend holds also when classifying individual authors, except Rowling for the Profiling and Ngrams models.</p>
      <p>When looking at the results obtained for individual authors, Sarah J. Maas turned out to be the most accurately predicted author on both platforms, considering the average scores across all models. However, upon closer inspection of the results obtained with the top-performing model (BERT), we observe that while Maas remains the most accurately identified author in Amazon reviews, the reviews of Jane Austen’s books exhibit the highest level of distinctiveness on Goodreads.</p>
      <sec id="sec-2-1">
        <title>4.1. Discussion</title>
        <p>To take a closer look at the classification results, Fig. 1 reports the confusion matrices with the percentage of the predictions made by all models in the Book Author Prediction task. This complements the classification results by showing which authors are more confusing and which are the most wrongly classified ones.</p>
        <p>In general, we observe that as the model performance improves, the matrices become less sparse, regardless of the platform. This means that when the correct author is predicted most of the time, the erroneous predictions are distributed quite evenly among all possible authors. Consider, for instance, the matrices obtained from the analysis of BERT and compare them with the matrices referring to the Profiling and Ngrams models, which yield the most sparse matrices.</p>
        <p>Notable differences arise in the distribution of predicted authors across the two platforms. For instance, when considering the Profiling model applied to Goodreads reviews, we observe that Maas is the most frequently predicted author, leading to other authors’ books being frequently misclassified as Maas’s works. Notably, the reviews of It by King and of the fourth book from the Harry Potter saga by Rowling are often incorrectly assigned to Maas. The content of these books, at the crossroads between the fantasy and horror genres, may contribute to the model confusion. However, the most influential factor in the Profiling model predictions appears to be the review length. On Goodreads, reviews of King’s and Rowling’s books that are longer than 150 tokens are wrongly classified as referring to Maas in over 40% of cases. On Amazon, we observe an opposite tendency, but for a different author: when a review has fewer than 10 tokens, the model assigns the review to Rowling in around 60% of cases.</p>
        <p>The analysis of the feature rankings<sup>8</sup> produced by the classifiers trained on both Amazon and Goodreads reviews confirms the importance of review length for the Profiling model. Indeed, features that capture structural properties are particularly relevant for the model: the use of subordination (subordinate_dist) is crucial for classifying Rowling’s and King’s reviews on Goodreads, as they exhibit respectively the lowest and highest use of subordinate clauses. Conversely, on Amazon, the average number of verb dependents (verb_edges) and the distribution of function words (namely, conjunctions, auxiliary verbs and determiners) are discriminative for Rowling, Tolkien, and Maas.</p>
        <p>For what concerns the Ngram model, the feature ranking consists of the n-grams employed by the model ordered by relevance for book author classification purposes on Amazon and on Goodreads. Quite expectedly, the analysis of the top 100 most relevant n-grams reveals that, on Amazon, parcel delivery is a highly referenced topic (e.g. ‘tempi previsti’, expected timing, and ‘ben confezionato’, well packaged), especially among the readers of Tolkien and Rowling, which have the most similar n-gram rankings (Spearman correlation score = 0.235, <italic>p</italic> &lt; 0.05). The two authors are the most frequently confused by the model, especially for what concerns the reviews of Tolkien’s ‘The Hobbit’ and ‘The Silmarillion’, wrongly classified as referring to Rowling’s books. Indeed, it is possible that the two authors attract a similar readership interested in books involving intricate mythologies, and that feature multi-dimensional characters with strengths, flaws, and internal struggles. Such closeness between the Amazon reviews of these authors is captured also by the BERT model which, although performing better than other models on the task, seems quite confused by the reviews of the same Tolkien books.</p>
        <p>On Goodreads reviews, where parcel delivery is not relevant, the most impactful n-grams tend to revolve around book appreciation (e.g., ‘ho apprezzato’, I appreciated; ‘lettura piacevole’, pleasant reading; ‘non mi aspettavo’, I did not expect) or plot (‘il maghetto’, the little wizard; ‘signore di’, lord of; ‘chiesa’, church; ‘di epoca’, historical; ‘drago’, dragon; ‘di vampiri’, of vampires). Therefore, it is not surprising to see that King’s reviews are most frequently misclassified as referring to Brown’s work, also by the BERT model. Both authors, despite their differences, are known for building suspense and tension in their narratives and incorporating detailed historical settings and psychological aspects into their work.</p>
        <p>The classifications of Goodreads reviews performed by the SVM (BERT) and SVM (BERT + Profiling) models highlight author commonalities that did not emerge so strongly with other models. The reviews of Rowling’s books, for instance, are frequently wrongly classified as referring to Maas’s work. Both authors are known for their contributions to popular literature, particularly in the genres of fantasy and young adult fiction, which attract a readership interested in exploring themes of personal growth and self-discovery through the characters’ coming-of-age journeys.</p>
        <p>Overall, no particular author appears to be systematically confused by all models. This finding is particularly interesting from our perspective since it shows that using user-generated reviews as an information source makes it possible to address the Book Author Prediction task successfully. It suggests that books authored by different novelists attract readers who are interested in similar topics and also adopt similar communication strategies in their writing. It also implies that the proposed methodology could have a positive impact on the development of user-centric book recommender systems.</p>
      </sec>
      <sec id="sec-2-3">
        <title>5. Conclusions</title>
        <p>This paper has explored an innovative approach that leverages user reviews as a source of information for Book Author Prediction. Building upon our prior work, we introduced a novel dataset of Amazon and Goodreads book reviews, LVC, which has been used for training and evaluating machine learning models addressing the novel book author prediction task.</p>
        <p>Our findings highlight the challenging nature of predicting the author of a novel from a reader’s review. However, the analysis of erroneous predictions pointed us to cases of books sharing a similar readership. This observation supports the intuition that user-generated reviews can effectively serve as a basis for personalized book recommendations. By analyzing reviews, we gained insights into readers’ preferences beyond the writing style of the book’s author, opening up new avenues for more tailored and user-centric recommendations.</p>
        <p>Moving forward, this research could be expanded by investigating the impact of exploiting user judgments as an additional feature for classification. Furthermore, the sentiment expressed by readers about a book, whether positive or negative, could be leveraged to validate and fine-tune personalized recommendations.</p>
      </sec>
      <sec id="sec-2-4">
        <p><sup>8</sup> See Appendix B and C.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>We thank the “Letture pER TE” (LettERE) project (2022-2024) funded by Regione Toscana (Progetti Congiunti di Alta Formazione – POR FSE 2014-2020 Investimenti a favore della crescita e dell’occupazione) in collaboration with M.E.T.A. Srl company.</p>
    </sec>
    <sec id="sec-4">
      <title>A. Books of the Literary Voices Corpus</title>
      <p>Author</p>
    </sec>
    <sec id="sec-5">
      <p>B. Feature ranking Profiling Model (Goodreads)
Feature
ttr_lemma_chunks_100
ttr_form_chunks_100
aux_tense_dist_Pres
ttr_form_chunks_200
ttr_lemma_chunks_200
n_prepositional_chains
n_tokens
upos_dist_AUX
upos_dist_ADP
dep_dist_orphan
upos_dist_DET
aux_mood_dist_Ind
dep_dist_aux
aux_tense_dist_Imp
dep_dist_case
dep_dist_cop
dep_dist_mark
verbs_form_dist_Part
dep_dist_flat:name
aux_num_pers_dist_Sing+3</p>
      <p>Stephen King
Feature
upos_dist_CCONJ
dep_dist_cc
avg_prepositional_chain_len
ttr_form_chunks_200
ttr_lemma_chunks_200
prep_dist_2
prep_dist_1
subordinate_post
dep_dist_orphan
prep_dist_3
subordinate_dist_1
tokens_per_sent
n_tokens
aux_tense_dist_Pres
ttr_lemma_chunks_100
avg_verb_edges
subordinate_pre
verbal_head_per_sent
dep_dist_case
upos_dist_ADP</p>
      <p>C. Feature ranking Profiling Model (Amazon)
Feature
upos_dist_AUX
dep_dist_det
dep_dist_aux
upos_dist_ADV
ttr_lemma_chunks_100
ttr_form_chunks_100
dep_dist_cop
upos_dist_DET
dep_dist_root
dep_dist_advmod
verb_edges_dist_2
verb_edges_dist_3
ttr_lemma_chunks_200
aux_tense_dist_Pres
verb_edges_dist_4
avg_verb_edges
verbs_form_dist_Part
dep_dist_case
ttr_form_chunks_200
verb_edges_dist_1</p>
      <p>Stephen King
Feature
dep_dist_det
dep_dist_cc
upos_dist_CCONJ
upos_dist_DET
ttr_form_chunks_200
ttr_lemma_chunks_200
avg_verb_edges
verbs_form_dist_Part
aux_tense_dist_Pres
verbs_form_dist_Fin
verbs_form_dist_Inf
lexical_density
ttr_lemma_chunks_100
verb_edges_dist_1
dep_dist_root
upos_dist_AUX
dep_dist_aux
principal_proposition_dist
dep_dist_det:poss
dep_dist_flat:foreign</p>
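      <p>Section 4.1 compares per-author rankings by means of a Spearman correlation score. The sketch below shows one way such a score could be obtained for two ranked lists like the ones above. It rests on assumptions that the paper does not state: the correlation is computed only over items that occur in both lists, ranks are re-assigned within that shared subset, and there are no ties or duplicates. No significance test is included, and the two example lists are toy orderings, not the authors' data.</p>
      <preformat>
```python
def spearman_shared(ranking_a, ranking_b):
    """Spearman's rho between two ranked lists, over their shared items."""
    in_a = set(ranking_a)
    in_b = set(ranking_b)
    shared_a = [item for item in ranking_a if item in in_b]
    shared_b = [item for item in ranking_b if item in in_a]
    n = len(shared_a)
    if 2 > n:
        raise ValueError("need at least two shared items")
    # Re-rank within the shared subset so both sides use ranks 0..n-1.
    rank_a = {item: r for r, item in enumerate(shared_a)}
    rank_b = {item: r for r, item in enumerate(shared_b)}
    d_squared = sum((rank_a[item] - rank_b[item]) ** 2 for item in shared_a)
    # Closed form for rankings without ties.
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Toy rankings (hypothetical order); the last item of list_b is not shared.
list_a = ["ttr_form_chunks_200", "n_tokens", "upos_dist_AUX", "dep_dist_case"]
list_b = ["n_tokens", "ttr_form_chunks_200", "upos_dist_AUX", "dep_dist_case",
          "lexical_density"]
rho = spearman_shared(list_a, list_b)
```
      </preformat>
      <p>Restricting the computation to shared items is a design choice: two top-k lists rarely contain exactly the same entries, and the closed form used here requires both sides to rank the same set.</p>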
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Alharthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Inkpen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Szpakowicz</surname>
          </string-name>
          ,
          <article-title>Authorship identification for literary book recommendations</article-title>
          ,
          <source>in: Proceedings of the 27th International Conference on Computational Linguistics (COLING)</source>
          ,
          <source>ACL</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>390</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Alharthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Inkpen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Szpakowicz</surname>
          </string-name>
          ,
          <article-title>A survey of book recommender systems</article-title>
          ,
          <source>Journal of Intelligent Information Systems</source>
          <volume>51</volume>
          (
          <year>2018</year>
          )
          <fpage>139</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Schafer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Konstan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Recommender systems in e-commerce</article-title>
          ,
          <source>in: Proceedings of the 1st ACM conference on Electronic commerce</source>
          ,
          <year>1999</year>
          , pp.
          <fpage>158</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Schaeffer</surname>
          </string-name>
          ,
          <article-title>Qu'est-ce qu'un genre littéraire?</article-title>
          ,
          <source>Seuil</source>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Biber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Conrad</surname>
          </string-name>
          ,
          <source>Register, Genre, and Style</source>
          , Cambridge University Press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Shamir</surname>
          </string-name>
          ,
          <article-title>UDAT: Compound quantitative analysis of text using machine learning</article-title>
          ,
          <source>Digital Scholarship in the Humanities</source>
          <volume>36</volume>
          (
          <year>2020</year>
          )
          <fpage>187</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Rahul</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ayush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vijay</surname>
          </string-name>
          ,
          <article-title>Genre classification using character networks</article-title>
          ,
          <source>in: Proceedings of the 5th International Conference on Intelligent Computing and Control Systems (ICICCS)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>216</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Worsham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kalita</surname>
          </string-name>
          ,
          <article-title>Genre identification and the compositional effect of genre in literature</article-title>
          ,
          <source>in: Proceedings of the 27th International Conference on Computational Linguistics (COLING)</source>
          ,
          <source>ACL</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1963</fpage>
          -
          <lpage>1973</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ozsarfati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sahin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Saul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <article-title>Book genre classification based on titles with comparative machine learning algorithms</article-title>
          ,
          <source>in: Proceedings of 2019 4th International Conference on Computer and Communication Systems (ICCCS)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Buczkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sobkowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kozlowski</surname>
          </string-name>
          ,
          <article-title>Deep learning approaches towards book covers classification</article-title>
          ,
          <source>in: Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods (ICPRAM)</source>
          ,
          <source>SCITEPRESS-Science and Technology Publications</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>309</fpage>
          -
          <lpage>316</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <article-title>Exploring Goodreads reviews for book impact assessment</article-title>
          ,
          <source>Journal of Informetrics</source>
          <volume>13</volume>
          (
          <year>2019</year>
          )
          <fpage>874</fpage>
          -
          <lpage>886</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Aerts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Smits</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Verlegh</surname>
          </string-name>
          ,
          <article-title>How online consumer reviews are influenced by the language and valence of prior reviews: A construal level perspective</article-title>
          ,
          <source>Computers in Human Behavior</source>
          <volume>75</volume>
          (
          <year>2017</year>
          )
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Maity</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panigrahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <article-title>Analyzing social book reading behavior on Goodreads and how it predicts Amazon best sellers</article-title>
          ,
          <source>Influence and Behavior Analysis in Social Networks and Social Media</source>
          (
          <year>2019</year>
          )
          <fpage>211</fpage>
          -
          <lpage>235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zamal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ruths</surname>
          </string-name>
          ,
          <article-title>Goodreads versus Amazon: the effect of decoupling book reviewing and book selling</article-title>
          ,
          <source>in: Proceedings of International AAAI Conference on Web and Social Media (ICWSM)</source>
          , volume
          <volume>9</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>602</fpage>
          -
          <lpage>605</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Thelwall</surname>
          </string-name>
          ,
          <article-title>Reader and author gender and genre in Goodreads</article-title>
          ,
          <source>Journal of Librarianship and Information Science</source>
          <volume>51</volume>
          (
          <year>2019</year>
          )
          <fpage>403</fpage>
          -
          <lpage>430</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Saraswat</surname>
          </string-name>
          ,
          <article-title>Leveraging genre classification with RNN for book recommendation</article-title>
          ,
          <source>International Journal of Information Technology</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Scofield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>de Melo-Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Moro</surname>
          </string-name>
          ,
          <article-title>Book genre classification based on reviews of Portuguese-language literature</article-title>
          ,
          <source>in: Proceedings of the International Conference on Computational Processing of the Portuguese Language (PROPOR)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>188</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>C.</given-names>
            <surname>Alzetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Miaschi</surname>
          </string-name>
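          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Prat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Venturi</surname>
          </string-name>
          ,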
          <article-title>Tell me how you write and I'll tell you what you read: a study on the writing style of book reviews</article-title>
          ,
          <source>Journal of Documentation</source>
          , forthcoming (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <article-title>A survey of modern authorship attribution methods</article-title>
          ,
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>60</volume>
          (
          <year>2009</year>
          )
          <fpage>538</fpage>
          -
          <lpage>556</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bourrier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thelwall</surname>
          </string-name>
          ,
          <article-title>The social lives of books: Reading Victorian literature on Goodreads</article-title>
          ,
          <source>Journal of Cultural Analytics</source>
          <volume>5</volume>
          (
          <year>2020</year>
          )
          <fpage>12049</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>B.</given-names>
            <surname>Driscoll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rehberg Sedo</surname>
          </string-name>
          ,
          <article-title>Faraway, so close: Seeing the intimacy in Goodreads reviews</article-title>
          ,
          <source>Qualitative Inquiry</source>
          <volume>25</volume>
          (
          <year>2019</year>
          )
          <fpage>248</fpage>
          -
          <lpage>259</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L.</given-names>
            <surname>Nuttall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <article-title>Wolfing down the Twilight series: Metaphors for reading in online reviews</article-title>
          ,
          <source>Contemporary Media Stylistics</source>
          (
          <year>2020</year>
          )
          <fpage>35</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Impact of online consumer reviews on Amazon books sales: Empirical evidence from India</article-title>
          ,
          <source>Journal of Theoretical and Applied Electronic Commerce Research</source>
          <volume>16</volume>
          (
          <year>2021</year>
          )
          <fpage>2793</fpage>
          -
          <lpage>2807</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiavetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lo Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pilato</surname>
          </string-name>
          ,
          <article-title>A lexicon-based approach for sentiment classification of Amazon books reviews in Italian language</article-title>
          ,
          <source>in: International Conference on Web Information Systems and Technologies (WEBIST)</source>
          , volume
          <volume>3</volume>
          ,
          SciTePress
          ,
          <year>2016</year>
          , pp.
          <fpage>159</fpage>
          -
          <lpage>170</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K.</given-names>
            <surname>Srujan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nikhil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. Raghav</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Karthik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Harish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. Keerthi</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Classification of Amazon book reviews based on sentiment analysis</article-title>
          ,
          <source>in: Information Systems Design and Intelligent Applications</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>401</fpage>
          -
          <lpage>411</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vasudevan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>John</surname>
          </string-name>
          ,
          <article-title>A review on text sanitization</article-title>
          ,
          <source>International Journal of Computer Applications</source>
          <volume>95</volume>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          , et al.,
          <article-title>Transformers: State-of-the-art natural language processing</article-title>
          ,
          <source>in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <source>ACL</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tiedemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nygaard</surname>
          </string-name>
          ,
          <article-title>The OPUS corpus - parallel and free</article-title>
          ,
          <source>in: Proceedings of the Conference on Language Resources and Evaluation (LREC)</source>
          ,
          <source>ELRA</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>D.</given-names>
            <surname>Brunato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Venturi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montemagni</surname>
          </string-name>
          ,
          <article-title>Profiling-UD: a tool for linguistic profiling of texts</article-title>
          ,
          <source>in: Proceedings of the Conference on Language Resources and Evaluation (LREC)</source>
          ,
          <source>ELRA</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>7147</fpage>
          -
          <lpage>7153</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>de Marneffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nivre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zeman</surname>
          </string-name>
          ,
          <article-title>Universal Dependencies</article-title>
          ,
          <source>Computational Linguistics</source>
          <volume>47</volume>
          (
          <year>2021</year>
          )
          <fpage>255</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>