Datasets and Models for Authorship Attribution on Italian Personal Writings

Gaetana Ruggiero, Albert Gatt, Malvina Nissim
Institute of Linguistics and Language Technology, University of Malta, Malta
Center for Language and Cognition, University of Groningen, The Netherlands
garuggiero@gmail.com, albert.gatt@um.edu.mt, m.nissim@rug.nl

Abstract

Existing research on Authorship Attribution (AA) focuses on texts for which a lot of data is available (e.g. novels), mainly in English. We approach AA via Authorship Verification on short Italian texts in two novel datasets, and analyze the interaction between genre, topic, gender and length. Results show that AV is feasible even with little data, but more evidence helps. Gender and topic can be indicative clues, and if not controlled for, they might overtake more specific aspects of personal style.

1 Introduction and Background

Authorship Attribution (AA) is the task of identifying authors by their writing style. In addition to being a tool for studying individual language choices, AA is useful for many real-life applications, such as plagiarism detection (Stamatatos and Koppel, 2011), multiple accounts detection (Tsikerdekis and Zeadally, 2014), and online security (Yang and Chow, 2014).

Most work on AA focuses on English, on relatively long texts such as novels and articles (Juola, 2015), where personal style could be mitigated due to editorial interventions. Furthermore, in many real-world applications the texts of disputed authorship tend to be short (Omar et al., 2019).

The PAN 2020 shared task was originally meant to investigate multilingual AV in fanfiction, focusing on Italian, Spanish, Dutch and English (Bevendorff et al., 2020). However, the datasets were eventually restricted to English only, to maximize the amount of available training data (Kestemont et al., 2020), emphasizing the difficulty in compiling large enough datasets for less-resourced languages.

AA research in Italian has largely focused on the single case of Elena Ferrante (Tuzzi and Cortelazzo, 2018).[1] The present work seeks a more realistic take, using more diverse, user-generated data, namely web forum comments and diary fragments, thereby introducing two novel datasets for this task: ForumFree and Diaries.

We cast the AA problem as authorship verification (AV). Rather than identifying the specific author of a text (the most common task in AA), AV aims at determining whether two texts were written by the same author or not (Koppel and Schler, 2004; Koppel et al., 2009).

The GLAD system of Hürlimann et al. (2015) was specifically developed to solve AV problems, and has been shown to be highly adaptable to new datasets (Halvani et al., 2018). GLAD uses an SVM with a variety of features, including character-level ones, which have proved to be most effective for AA tasks (Stamatatos, 2009; Moreau et al., 2015; Hürlimann et al., 2015), and is freely available. Moreover, Kestemont et al. (2019) show that many of the best models for authorship attribution are based on Support Vector Machines. Hence we adopt GLAD in the present study.

More specifically, we run GLAD on our datasets and study the interaction of four different dimensions: topic, gender, amount of evidence per author, and genre. In practice, we design intra-topic, cross-topic, and cross-genre experiments, controlling for gender and amount of evidence per author. The focus on cross-topic and cross-genre AV is in line with the PAN 2015 shared task (Stamatatos et al., 2015); this setting has been shown to be more challenging than the task definitions of previous editions (Juola and Stamatatos, 2013; Stamatatos et al., 2014).

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

[1] https://www.newyorker.com/culture/cultural-comment/the-unmasking-of-elena-ferrante
Contributions We advance AA for Italian by introducing two novel datasets, ForumFree and Diaries, which enhance the amount of available Italian data suitable for AA tasks.[2] Running a battery of experiments on personal writings, we show that AV is feasible even with little data, but more evidence helps. Gender and topic can be indicative clues, and if not controlled for, they might overtake more specific aspects of personal style.

2 Data

For the present study, we introduce two novel datasets, ForumFree and Diaries. Although already compiled (Maslennikova et al., 2019), the original ForumFree dataset was not meant for AA. Therefore, we reformat it following the PAN format.[3] The dataset contains web forum comments taken from the ForumFree platform,[4] and the subset used in this work covers two topics, Medicina Estetica ("Aesthetic Medicine") and Programmi Tv ("Tv Programmes"; Celebrities in the original dataset). A third subset, Mix, is the union of the first two. The Diaries dataset is originally assembled for the present study, and contains a collection of diary fragments included in the project Italiani all'estero: i diari raccontano ("Italians abroad: the diaries narrate").[5] For Diaries, no topic classification has been taken into account. Table 1 shows an overview of the datasets.

Subset     F     M     Tot    # Docs    W/A    D/A    W/D
Med Est    33    44    77     56198     63     661    48
Prog TV    78    71    149    153019    32     812    22
Mix        111   115   276    209217    41     791    29
Diaries    77    188   275    1422      462    5      477

Table 1: Overview of the datasets. F/M/Tot = female/male/total authors; W/A = Avg words per author; D/A = Avg docs per author; W/D = Avg words per doc.

2.1 Preprocessing

For the ForumFree dataset, comments which only contained the word up, commonly used on the internet to give new visibility to a post that was written in the past, were removed from the dataset, together with their authors when this was the only text associated with them.

The stories narrated in the diaries are of a very personal nature, which means that many proper nouns and names of locations are used. To avoid relying on these explicit clues, which are strong but not indicative of personal writing style, we perform Named Entity Recognition (NER), using spaCy (Honnibal, 2015). Person names, locations and organizations were replaced by their corresponding labels, namely PER, LOC, ORG. The fourth label used by spaCy, MISC (miscellany), was not considered; dates were also not normalized.

Moreover, a separate set of experiments was performed by bleaching the diary texts prior to their input to the GLAD system. The bleaching method was proposed by van der Goot et al. (2018) in the context of cross-lingual Gender Prediction, and consists of transforming tokens into an abstract representation that masks lexical forms while maintaining key features. We only use 4 of the 6 original features. Shape transforms uppercase letters into 'U', lowercase ones into 'L', digits into 'D', and the rest into 'X'. PunctA replaces emojis with 'J', emoticons with 'E', punctuation with 'P' and one or more alphanumeric characters with a single 'W'. Length represents a word by the number of its characters. Frequency corresponds to the log frequency of a token in the dataset. The features are then concatenated: the word 'House' would be rewritten as 'ULLLL W 05 6'.

[2] Further information about the datasets can be found at https://github.com/garuggiero/Italian-Datasets-for-AV
[3] https://pan.webis.de/clef15/pan15-web/authorship-verification.html
[4] https://www.forumfree.it/
[5] https://www.idiariraccontano.org
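As an illustration, the four bleaching features described above can be sketched as follows. This is a minimal sketch, not the original implementation of van der Goot et al. (2018): the emoji/emoticon handling of PunctA is omitted (such characters fall through to 'P' here), and the log-base-10 rounding used for the Frequency bucket is our assumption.

```python
import math
import re
from collections import Counter

def shape(token):
    # Shape: uppercase -> 'U', lowercase -> 'L', digit -> 'D', anything else -> 'X'
    return "".join(
        "U" if c.isupper() else "L" if c.islower() else "D" if c.isdigit() else "X"
        for c in token
    )

def punct_a(token):
    # PunctA: runs of alphanumeric characters -> 'W', punctuation -> 'P'.
    # Emoji ('J') and emoticon ('E') handling is omitted in this sketch.
    collapsed = re.sub(r"\w+", "W", token)
    return re.sub(r"[^\w]", "P", collapsed)

def length(token):
    # Length: number of characters, zero-padded as in the paper's example ('05')
    return f"{len(token):02d}"

def frequency(token, counts):
    # Frequency: rounded log10 of the token's corpus count
    # (the exact bucketing is an assumption of this sketch)
    return str(round(math.log10(counts[token])) if counts[token] > 0 else 0)

def bleach(token, counts):
    # Concatenate the four features with spaces
    return " ".join([shape(token), punct_a(token), length(token), frequency(token, counts)])

# Under this sketch's bucketing, a corpus where 'House' occurs a million times
# reproduces the paper's example:
counts = Counter({"House": 1_000_000})
assert bleach("House", counts) == "ULLLL W 05 6"
```

The Shape, PunctA and Length components ('ULLLL', 'W', '05') follow directly from the definitions; the final Frequency digit depends on the corpus counts.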
2.2 Reformatting

We reformat both datasets in order to make them suitable for AV. The data is divided into so-called problems: each problem is made of a known and an unknown text of equal length.

To account for the shortness of the texts, and to avoid topic biases that would derive from taking consecutive text as known and unknown fragments, all the documents written by the same author are first shuffled and then concatenated into a single string. The string is split into two spans containing the same number of words, so that the words contained in the unknown span come from subsets of texts which are different from the ones that form the known one. An example of this process is displayed in Figure 1. Rather than being represented by individual productions, each author is therefore represented by a set of texts, whose original sequential order has been altered.

Figure 1: Example of the creation of known and unknown documents for the same author when considering 400 words per author.

Each known text is paired with an unknown text from the same author. To create negative instances, given a dataset with multiple problems, one can (i) make use of external documents (extrinsic approach (Seidman, 2013; Koppel and Winter, 2014)), or (ii) use fragments collated from all authors in the training data, except the target author (intrinsic approach). We create negative instances with an intrinsic approach. More specifically, following Dwyer (2017), the second half of the unknown array is shifted by one, so that the texts of the second half of the known array are paired with a different-author text in the unknown array. In this way, the label distribution is balanced.
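The document construction and pairing just described can be sketched as follows. This is our own minimal illustration, not the released preprocessing code; the function names and the fixed shuffling seed are assumptions of the sketch.

```python
import random

def make_ku_pair(docs, words_per_author, seed=0):
    # Shuffle one author's documents, concatenate them into a single string,
    # and split the first `words_per_author` words into equal known/unknown halves.
    rng = random.Random(seed)
    docs = list(docs)
    rng.shuffle(docs)
    words = " ".join(docs).split()[:words_per_author]
    half = len(words) // 2
    return " ".join(words[:half]), " ".join(words[half:2 * half])

def make_problems(authors_docs, words_per_author):
    # Start with one positive (same-author) KU pair per author...
    pairs = [make_ku_pair(docs, words_per_author) for docs in authors_docs]
    known = [k for k, _ in pairs]
    unknown = [u for _, u in pairs]
    # ...then shift the second half of the unknown array by one, so those known
    # texts are paired with a different author's text (negative instances).
    # Assumes the second half holds at least two authors.
    mid = len(pairs) // 2
    unknown[mid:] = unknown[mid + 1:] + unknown[mid:mid + 1]
    labels = ["Y"] * mid + ["N"] * (len(pairs) - mid)
    return list(zip(known, unknown, labels))
```

With four authors, the first two problems keep their same-author pairing ('Y'), while the last two receive a rotated, different-author unknown text ('N'), yielding a balanced label distribution.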
3 Method

Given a pair of known and unknown fragments (KU pair), the task is to predict whether they are written by the same author or not. In designing our experiments, we control for topic, gender, amount of evidence, and genre. The latter is fostered by the diverse nature of our datasets.

Topic Maintaining the topic roughly constant should allow stylistic features to gain more discriminative value. We design intra-topic (IT) and cross-topic (CT) experiments. In IT, we distinguish same- and different-topic KU pairs. In same-topic, we train and test the system on KU pairs from the same topic. In different-topic, we include the Mix set and the diaries. Since we train and test on a mixture of topics and there can be topic overlap, these are not truly cross-topic, and we do not consider them as such.

Given that no topic classification is available for the diaries, the CT experiments are only performed on the ForumFree dataset. We train the system on Medicina Estetica and test it on Programmi Tv, and vice versa.

Gender Previous work has shown that similarity can be observed in writings of people of the same gender (Basile et al., 2017; Rangel et al., 2017).[6] In order to assess the influence of same vs different gender in AA, we consider three gender settings: only female authors and only male authors (single-gender), and mixed-gender, where the known and unknown document can be either written by two authors of the same gender, or by a male and a female author. In dividing the subsets according to the gender of the authors, we consider gender implicitly. However, we also perform experiments adding gender as a feature to the instance vectors, indicating both the gender of the known and unknown documents' authors and whether or not the gender of the authors is the same.

Evidence Following Feiguina and Hirst (2007), we experiment with KU pairs of different sizes, i.e. with 400, 1 000, 2 000 and 3 000 words per author. Each element of the KU pair is thus made up of 200, 500, 1 000 and 1 500 words respectively. To observe the effect of the different text sizes on the classification, we manipulate the number of instances in training and test, so that the same authors are included in all the different word settings of a single topic-gender experiment.

Genre We perform cross-genre (CG) experiments by training on ForumFree and testing on Diaries, and vice versa.

Splits and Evaluation We train on 70% and test on 30% of the instances. However, since we are controlling for gender and topic, the number of instances contained in the training and test sets varies in each experiment. We keep the test sets stable across IT, CT and CG experiments, so that we can compare results. Following the PAN evaluation settings (Stamatatos et al., 2015), we use three metrics. c@1 takes into account the number of problems left unanswered, and rewards the system when it classifies a problem as unanswered rather than misclassifying it. Probability scores are converted to binary answers: every score greater than 0.5 becomes a positive answer, every score smaller than 0.5 corresponds to a negative answer, and every score which is exactly 0.5 is considered as an unanswered problem. The AUC measure corresponds to the area under the ROC curve (Fawcett, 2006), and tests the ability of the system to rank scores properly, assigning low values to negative problems and high values to positive ones (Stamatatos et al., 2015). The third measure is the product of c@1 and AUC.

[6] Binary gender is a simplification of a much more nuanced situation in reality. Following previous work, we adopt it for convenience.
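The three evaluation measures can be sketched as follows. c@1 uses the standard PAN formulation c@1 = (nc/n) + (nu/n)(nc/n), where nc, nu and n are the correct, unanswered and total problems; the pairwise-ranking AUC here is a minimal stand-in for a full ROC implementation such as scikit-learn's roc_auc_score.

```python
def to_answer(score):
    # PAN-style conversion: > 0.5 same-author, < 0.5 different-author,
    # exactly 0.5 left unanswered
    if score > 0.5:
        return "Y"
    if score < 0.5:
        return "N"
    return "-"

def c_at_1(answers, gold):
    # c@1 = (nc/n) + (nu/n) * (nc/n): leaving a problem unanswered
    # is rewarded over misclassifying it
    n = len(gold)
    nc = sum(a == g for a, g in zip(answers, gold) if a != "-")
    nu = answers.count("-")
    return nc / n + (nu / n) * (nc / n)

def auc(scores, gold):
    # Area under the ROC curve via the pairwise-ranking definition: the
    # fraction of (positive, negative) pairs ranked correctly, ties = 0.5.
    # Assumes both classes are present in the gold labels.
    pos = [s for s, g in zip(scores, gold) if g == "Y"]
    neg = [s for s, g in zip(scores, gold) if g == "N"]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def final_score(scores, gold):
    # Combined score: the product of c@1 and AUC
    answers = [to_answer(s) for s in scores]
    return c_at_1(answers, gold) * auc(scores, gold)
```

For example, with scores [0.9, 0.4, 0.5, 0.7] and gold labels ["Y", "N", "N", "N"], one problem is unanswered and one is wrong, giving c@1 = 2/4 + (1/4)(2/4) = 0.625, while the single positive outscores every negative, giving AUC = 1.0.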
Model We run all experiments using GLAD (Hürlimann et al., 2015). This is an SVM with RBF kernel, implemented using Python's scikit-learn library (Pedregosa et al., 2011) and NLTK (Bird et al., 2009). GLAD was designed to work with 24 different features, which take into account stylometry, entropy and data compression measures. We compare GLAD to a simple baseline which randomly assigns a label from the set of possible labels (i.e. 'YES' or 'NO') to each test instance.

Our choice fell on GLAD for a variety of reasons. As a general observation, even in later challenges, SVMs have proven to be the most effective for AA tasks (Kestemont et al., 2019). More specifically, in a survey of freely available AA systems, GLAD showed the best performance and especially high adaptability to new datasets (Halvani et al., 2018). Lastly, de Vries (2020) explored fine-tuning a pre-trained model for AV in Dutch, a less-resourced language compared to English. He found that fine-tuning BERTje (a Dutch monolingual BERT model (de Vries et al., 2019)) with PAN 2015 AV data (Stamatatos et al., 2015) failed to outperform a majority baseline (de Vries, 2020). He concluded that Transformer-encoder models might not be suitable for AA tasks, since they will likely overfit if the documents contain no reliable clues of authorship (de Vries, 2020).

4 Results and Discussion

The number of experiments is high due to the interaction of the dimensions we consider. Tables 2 and 3 only include the mixed-gender results of the IT experiments on Mix (which corresponds to the entire ForumFree dataset used for this study) and Diaries, respectively. Results concerning all dimensions considered are anyway discussed in the text. We refer to the combined score. Since the baseline results are different for each setting, we do not include them. However, all models perform consistently above their corresponding baseline.

For the Mix topic, we achieved 0.966 with 96 authors in total and 3 000 words (Table 2). For the diaries, we achieved 0.821 with 46 authors in total and 3 000 words each (Table 3).[7] Although the training and test sets are of different sizes for both datasets, more evidence seems to help the model to solve the problem.

In the IT experiments, the highest score for Medicina Estetica is 0.923, with 41 authors in total and 1 000 words per author, and for Programmi Tv 0.944, with 59 authors and 3 000 words each. In the CT setting, the scores stay basically the same in both directions. In CG, when training on the diaries and testing on Mix, we obtain the same score as when training on Mix with 3 000 words. When training on Mix and testing on Diaries, we achieved 0.737 on the same test set, and 0.748 with 1 000 words per instance.

                          # Problems                          Eval
# W/A    # Auth    Train    Test    C     I    U    c@1      AUC      *
400      127       88       39      33    6    0    0.846    0.947    0.801
1 000    109       76       33      30    3    0    0.909    0.926    0.842
2 000    100       70       30      29    1    0    0.967    0.995    0.962
3 000    96        67       29      28    1    0    0.966    1.000    0.966

Table 2: Training and test set configurations and IT evaluation scores on Mix texts written by female and male authors. C, I and U are Correct, Incorrect, Unanswered problems.

                          # Problems                          Eval
# W/A    # Auth    Train    Test    C     I    U    c@1      AUC      *
400      229       160      69      47    21   1    0.691    0.725    0.500
1 000    180       126      54      43    11   0    0.796    0.891    0.709
2 000    98        68       30      25    5    0    0.833    0.905    0.754
3 000    46        32       14      12    2    0    0.857    0.958    0.821

Table 3: Training and test configurations and IT evaluation scores on diaries made of NE-converted text written by both genders. C, I and U are Correct, Incorrect, Unanswered problems.

Discussion When more variables interact in the same subset, as in the mixed-gender sets of the ForumFree and Diaries datasets, we found that the classifier uses the implicit gender information. Indeed, it achieves slightly better scores in mixed-gender settings than in female- and male-only ones, suggesting that the classifier might be using internal clustering of the data rather than writing style characteristics. This also explains why results are higher in Mix than in separate topics, because the classifier can use topic information.

[7] Using a bleached representation of the texts, the score increased by 0.36
We also observe that by adding gender as an explicit feature in topic- and gender-controlled subsets, GLAD uses this information to improve classification, especially in mixed-gender scenarios.

Although previous research demonstrated that CT and CG experiments are harder than IT ones (Sapkota et al., 2014; Stamatatos et al., 2015), in our case the scores for the three settings are comparable. However, since we only performed CT and CG experiments on mixed-gender subsets, the gender-specific information might have also played a role in this process (see above).

Overall, the experiments show that using a higher number of words per author is preferable. Although 3 000 words seems to be optimal for most settings, in the large number of experiments that we carried out (not all included in this paper) we also observed that lower amounts of words led to comparable results. This aspect will require further investigation.

5 Conclusion

We experimented with AV on Italian forum comments and diary fragments. We compiled two datasets and performed experiments which considered the interaction among topic, gender, length and genre. Even when the texts are short and present more individual variation than traditional texts used in AA, AV is a feasible task, but having more evidence per author improves classification. While making the task more challenging, controlling for gender and topic ensures that the system prioritizes authorship over different data clusters.

Although the datasets used are intended for AV problems, they can be easily adapted to other AA tasks. We believe this to be one of the major contributions of our work, as it can help to advance the up-to-now limited AA research in Italian.

Acknowledgments

The ForumFree dataset was a courtesy of the Italian Institute of Computational Linguistics "Antonio Zampolli" (ILC) of Pisa.[8]

[8] http://www.ilc.cnr.it/

References

Angelo Basile, Gareth Dwyer, Maria Medvedeva, Josine Rawee, Hessel Haagsma, and Malvina Nissim. 2017. N-GrAM: New Groningen Author-profiling Model—Notebook for PAN at CLEF 2017. In CEUR Workshop Proceedings, volume 1866.

Janek Bevendorff, Bilal Ghanem, Anastasia Giachanou, Mike Kestemont, Enrique Manjavacas, Martin Potthast, Francisco Rangel, Paolo Rosso, Günther Specht, Efstathios Stamatatos, Benno Stein, Matti Wiegmann, and Eva Zangerle. 2020. Shared Tasks on Authorship Analysis at PAN 2020. In Joemon M. Jose, Emine Yilmaz, João Magalhães, Pablo Castells, Nicola Ferro, Mário J. Silva, and Flávio Martins, editors, Advances in Information Retrieval, pages 508–516, Cham. Springer International Publishing.

Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc.

Wietse de Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli, Gertjan van Noord, and Malvina Nissim. 2019. BERTje: A Dutch BERT Model. arXiv preprint arXiv:1912.09582.

Wietse de Vries. 2020. Language Models are not just English Anymore: Training and Evaluation of a Dutch BERT-based Language Model Named BERTje. Master Thesis in Information Science, University of Groningen, The Netherlands.

Gareth Terence Bryan Dwyer. 2017. Novel Approaches to Authorship Attribution. Master Thesis in Language and Communication Technologies, Information Science, University of Groningen, The Netherlands.

Tom Fawcett. 2006. An Introduction to ROC Analysis. Pattern Recognition Letters, 27(8):861–874.

Olga Feiguina and Graeme Hirst. 2007. Authorship Attribution for Small Texts: Literary and Forensic Experiments. In Proceedings of the SIGIR'07 Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection (PAN 2007).

Oren Halvani, Christian Winter, and Lukas Graner. 2018. Unary and Binary Classification Approaches and their Implications for Authorship Verification. arXiv preprint arXiv:1901.00399.

Matthew Honnibal. 2015. spaCy: Industrial-strength Natural Language Processing (NLP) with Python and Cython.

Manuela Hürlimann, Benno Weck, Esther van den Berg, Simon Suster, and Malvina Nissim. 2015. GLAD: Groningen Lightweight Authorship Detection. In CLEF (Working Notes).

Patrick Juola and Efstathios Stamatatos. 2013. Overview of the Author Identification Task at PAN 2013. CLEF (Working Notes), 1179.

Patrick Juola. 2015. The Rowling Case: A Proposed Standard Analytic Protocol for Authorship Questions. Digital Scholarship in the Humanities, 30(suppl 1):i100–i113.

Mike Kestemont, Efstathios Stamatatos, Enrique Manjavacas, Walter Daelemans, Martin Potthast, and Benno Stein. 2019. Overview of the Cross-domain Authorship Attribution Task at PAN 2019. In CLEF (Working Notes).

Mike Kestemont, Enrique Manjavacas, Ilia Markov, Janek Bevendorff, Matti Wiegmann, Efstathios Stamatatos, Martin Potthast, and Benno Stein. 2020. Overview of the Cross-Domain Authorship Verification Task at PAN 2020. In Linda Cappellato, Carsten Eickhoff, Nicola Ferro, and Aurélie Névéol, editors, CLEF 2020 Labs and Workshops, Notebook Papers. CEUR-WS.org, September.

Moshe Koppel and Jonathan Schler. 2004. Authorship Verification as a One-class Classification Problem. In Proceedings of the Twenty-first International Conference on Machine Learning, page 62.

Moshe Koppel and Yaron Winter. 2014. Determining if Two Documents are Written by the Same Author. Journal of the Association for Information Science and Technology, 65(1):178–187.

Moshe Koppel, Jonathan Schler, and Shlomo Argamon. 2009. Computational Methods in Authorship Attribution. Journal of the American Society for Information Science and Technology, 60(1):9–26.

Aleksandra Maslennikova, Paolo Labruna, Andrea Cimino, and Felice Dell'Orletta. 2019. Quanti anni hai? Age Identification for Italian. In Proceedings of the 6th Italian Conference on Computational Linguistics (CLiC-it), 13-15 November, 2019, Bari, Italy.

Erwan Moreau, Arun Jayapal, Gerard Lynch, and Carl Vogel. 2015. Author Verification: Basic Stacked Generalization Applied to Predictions from a Set of Heterogeneous Learners—Notebook for PAN at CLEF 2015. In Linda Cappellato, Nicola Ferro, Gareth Jones, and Eric San Juan, editors, CLEF 2015 Evaluation Labs and Workshop – Working Notes Papers, 8-11 September, Toulouse, France. CEUR-WS.org.

Abdulfattah Omar, Basheer Ibrahim Elghayesh, and Mohamed Ali Mohamed Kassem. 2019. Authorship Attribution Revisited: The Problem of Flash Fiction, a Morphological-based Linguistic Stylometry Approach. Arab World English Journal (AWEJ), Volume 10.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine Learning in Python. The Journal of Machine Learning Research, 12:2825–2830.

Francisco Rangel, Paolo Rosso, Martin Potthast, and Benno Stein. 2017. Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter. Working Notes Papers of the CLEF, pages 1613–0073.

Upendra Sapkota, Thamar Solorio, Manuel Montes, Steven Bethard, and Paolo Rosso. 2014. Cross-topic Authorship Attribution: Will Out-of-topic Data Help? In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1228–1237.

Shachar Seidman. 2013. Authorship Verification Using the Impostors Method. In CLEF 2013 Evaluation Labs and Workshop – Working Notes Papers, pages 23–26. Citeseer.

Efstathios Stamatatos and Moshe Koppel. 2011. Plagiarism and Authorship Analysis: Introduction to the Special Issue. Language Resources and Evaluation, 45(1):1–4.

Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Martin Potthast, Benno Stein, Patrick Juola, Miguel A. Sanchez-Perez, and Alberto Barrón-Cedeño. 2014. Overview of the Author Identification Task at PAN 2014. In CLEF 2014 Evaluation Labs and Workshop Working Notes Papers, Sheffield, UK, 2014, pages 1–21.

Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Patrick Juola, Aurelio López-López, Martin Potthast, and Benno Stein. 2015. Overview of the Author Identification Task at PAN 2015. CLEF 2015 Evaluation Labs and Workshop, Online Working Notes, Toulouse, France. In CEUR Workshop Proceedings, pages 1–17.

Efstathios Stamatatos. 2009. A Survey of Modern Authorship Attribution Methods. Journal of the American Society for Information Science and Technology, 60(3):538–556.

Michail Tsikerdekis and Sherali Zeadally. 2014. Multiple Account Identity Deception Detection in Social Media Using Nonverbal Behavior. IEEE Transactions on Information Forensics and Security, 9(8):1311–1321.

Arjuna Tuzzi and Michele A. Cortelazzo. 2018. Drawing Elena Ferrante's Profile: Workshop Proceedings, Padova, 7 September 2017. Padova UP.

Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, and Barbara Plank. 2018. Bleaching Text: Abstract Features for Cross-lingual Gender Prediction. arXiv preprint arXiv:1805.03122.

Min Yang and Kam-Pui Chow. 2014. Authorship Attribution for Forensic Investigation with Thousands of Authors. In IFIP International Information Security Conference, pages 339–350. Springer.