<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Datasets and Models for Authorship Attribution on Italian Personal Writings</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Gaetana</forename><surname>Ruggiero</surname></persName>
							<email>garuggiero@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Institute of Linguistics and Language Technology</orgName>
								<orgName type="institution">University of Malta</orgName>
								<address>
									<country key="MT">Malta</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Center for Language and Cognition</orgName>
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Albert</forename><surname>Gatt</surname></persName>
							<email>albert.gatt@um.edu.mt</email>
							<affiliation key="aff0">
								<orgName type="department">Institute of Linguistics and Language Technology</orgName>
								<orgName type="institution">University of Malta</orgName>
								<address>
									<country key="MT">Malta</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Center for Language and Cognition</orgName>
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Malvina</forename><surname>Nissim</surname></persName>
							<email>m.nissim@rug.nl</email>
						</author>
						<title level="a" type="main">Datasets and Models for Authorship Attribution on Italian Personal Writings</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">9E781A6CFB21D4EBD02C8BF9467D9406</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T15:40+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Existing research on Authorship Attribution (AA) focuses on texts for which a lot of data is available (e.g. novels), mainly in English. We approach AA via Authorship Verification (AV) on short Italian texts in two novel datasets, and analyze the interaction between genre, topic, gender and length. Results show that AV is feasible even with little data, but more evidence helps. Gender and topic can be indicative clues, and if not controlled for, they might overtake more specific aspects of personal style.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction and Background</head><p>Authorship Attribution (AA) is the task of identifying authors by their writing style. In addition to being a tool for studying individual language choices, AA is useful for many real-life applications, such as plagiarism detection <ref type="bibr" target="#b25">(Stamatatos and Koppel, 2011)</ref>, multiple-account detection <ref type="bibr" target="#b29">(Tsikerdekis and Zeadally, 2014)</ref>, and online security <ref type="bibr" target="#b32">(Yang and Chow, 2014)</ref>.</p><p>Most work on AA focuses on English and on relatively long texts such as novels and articles <ref type="bibr" target="#b12">(Juola, 2015)</ref>, where personal style may be diluted by editorial intervention. Furthermore, in many real-world applications the texts of disputed authorship tend to be short <ref type="bibr" target="#b20">(Omar et al., 2019)</ref>.</p><p>The PAN 2020 shared task was originally meant to investigate multilingual authorship verification (AV) in fanfiction, focusing on Italian, Spanish, Dutch and English <ref type="bibr" target="#b1">(Bevendorff et al., 2020)</ref>. However, the datasets were eventually restricted to English only, to maximize the amount of available training data <ref type="bibr" target="#b14">(Kestemont et al., 2020)</ref>, which highlights the difficulty of compiling large enough datasets for less-resourced languages.</p><p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p><p>AA research in Italian has largely focused on the single case of Elena Ferrante <ref type="bibr" target="#b30">(Tuzzi and Cortelazzo, 2018)</ref>.<ref type="foot" target="#foot_0">1</ref> 
The present work takes a more realistic approach, using more diverse, user-generated data, namely web forum comments and diary fragments, thereby introducing two novel datasets for this task: ForumFree and Diaries.</p><p>We cast the AA problem as authorship verification (AV). Rather than identifying the specific author of a text (the most common task in AA), AV aims at determining whether two texts were written by the same author or not <ref type="bibr" target="#b15">(Koppel and Schler, 2004;</ref> <ref type="bibr" target="#b17">Koppel et al., 2009)</ref>.</p><p>The GLAD system of <ref type="bibr" target="#b10">Hürlimann et al. (2015)</ref> was specifically developed to solve AV problems, and has been shown to be highly adaptable to new datasets <ref type="bibr" target="#b8">(Halvani et al., 2018)</ref>. GLAD uses an SVM with a variety of features including character-level ones, which have proved to be most effective for AA tasks <ref type="bibr" target="#b28">(Stamatatos, 2009;</ref> <ref type="bibr" target="#b19">Moreau et al., 2015;</ref> <ref type="bibr" target="#b10">Hürlimann et al., 2015)</ref>, and is freely available. Moreover, <ref type="bibr" target="#b13">Kestemont et al. (2019)</ref> show that many of the best models for authorship attribution are based on Support Vector Machines. Hence we adopt GLAD in the present study.</p><p>More specifically, we run GLAD on our datasets and study the interaction of four different dimensions: topic, gender, amount of evidence per author, and genre. In practice, we design intra-topic, cross-topic, and cross-genre experiments, controlling for gender and amount of evidence per author. 
The focus on cross-topic and cross-genre AV is in line with the PAN 2015 shared task <ref type="bibr" target="#b27">(Stamatatos et al., 2015)</ref>; this setting has been shown to be more challenging than the task definitions of previous editions <ref type="bibr" target="#b11">(Juola and Stamatatos, 2013;</ref> <ref type="bibr" target="#b26">Stamatatos et al., 2014)</ref>.</p><p>Contributions. We advance AA for Italian by introducing two novel datasets, ForumFree and Diaries, which increase the amount of available Italian data suitable for AA tasks.<ref type="foot" target="#foot_1">2</ref> Running a battery of experiments on personal writings, we show that AV is feasible even with little data, but more evidence helps. Gender and topic can be indicative clues, and if not controlled for, they might overtake more specific aspects of personal style.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Data</head><p>For the present study, we introduce two novel datasets, ForumFree and Diaries. Although already compiled <ref type="bibr" target="#b18">(Maslennikova et al., 2019)</ref>, the original ForumFree dataset was not meant for AA. Therefore, we reformat it following the PAN format.<ref type="foot" target="#foot_2">3</ref> The dataset contains web forum comments taken from the ForumFree platform,<ref type="foot" target="#foot_3">4</ref> and the subset used in this work covers two topics, Medicina Estetica ("Aesthetic Medicine") and Programmi Tv ("TV Programmes"; Celebrities in the original dataset). A third subset, Mix, is the union of the first two. The Diaries dataset was assembled specifically for the present study, and contains a collection of diary fragments included in the project Italiani all'estero: i diari raccontano ("Italians abroad: the diaries narrate").<ref type="foot" target="#foot_4">5</ref> For Diaries, no topic classification has been taken into account. Table <ref type="table" target="#tab_0">1</ref> shows an overview of the datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Preprocessing</head><p>For the ForumFree dataset, comments which only contained the word up, commonly used on the internet to give new visibility to a post that was written in the past, were removed from the dataset, together with their authors when this was the only text associated with them.</p><p>The stories narrated in the diaries are of a very personal nature, which means that many proper nouns and names of locations are used. To avoid relying on these explicit clues, which are strong but not indicative of personal writing style, we perform Named Entity Recognition (NER), using spaCy <ref type="bibr" target="#b9">(Honnibal, 2015)</ref>. Person names, locations and organizations were replaced by their corresponding labels, namely PER, LOC, ORG. The fourth label used by spaCy, MISC (miscellany), was not considered; dates were also not normalized. Moreover, a separate set of experiments was performed by bleaching the diary texts prior to their input to the GLAD system. The bleaching method was proposed by van der <ref type="bibr" target="#b31">Goot et al. (2018)</ref> in the context of cross-lingual Gender Prediction, and consists of transforming tokens into an abstract representation that masks lexical forms while maintaining key features. We only use four of the six original features. Shape transforms uppercase letters into 'U', lowercase ones into 'L', digits into 'D', and the rest into 'X'. PunctA replaces emojis with 'J', emoticons with 'E', punctuation with 'P' and one or more alphanumeric characters with a single 'W'. Length represents a word by the number of its characters. Frequency corresponds to the log frequency of a token in the dataset. The features are then concatenated. For example, the word 'House' would be rewritten as 'ULLLL W 05 6'.</p></div>
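<div xmlns="http://www.tei-c.org/ns/1.0"><p>The four bleaching features described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation of van der Goot et al. (2018): the function names are hypothetical, emoji and emoticon handling in PunctA is omitted, characters that are neither alphanumeric nor ASCII punctuation are left unchanged, and Frequency is assumed to be the rounded natural logarithm of the token count, since the log base and rounding are not specified in the description.</p>
<p>
```python
import math
import string
from collections import Counter


def shape(token):
    """Map uppercase to 'U', lowercase to 'L', digits to 'D', the rest to 'X'."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("U")
        elif ch.islower():
            out.append("L")
        elif ch.isdigit():
            out.append("D")
        else:
            out.append("X")
    return "".join(out)


def punct_a(token):
    """Collapse each alphanumeric run into one 'W'; map ASCII punctuation to 'P'.

    Simplification: emojis ('J') and emoticons ('E') are not detected here.
    """
    out = []
    for ch in token:
        if ch.isalnum():
            if not out or out[-1] != "W":
                out.append("W")
        elif ch in string.punctuation:
            out.append("P")
        else:
            out.append(ch)  # assumption: any other character is kept as-is
    return "".join(out)


def bleach(token, counts):
    """Concatenate Shape, PunctA, Length and Frequency for one token.

    counts is a Counter of token frequencies in the dataset.
    Assumption: Frequency is round(ln(count)), or 0 for unseen tokens.
    """
    count = counts[token]
    freq = round(math.log(count)) if count > 0 else 0
    length = f"{len(token):02d}"  # zero-padded, as in the '05' of the example
    return " ".join([shape(token), punct_a(token), length, str(freq)])
```
</p><p>Under these assumptions, a token 'House' that occurs 400 times in the dataset is bleached to 'ULLLL W 05 6', the form given in the example above.</p></div>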
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Reformatting</head><p>We reformat both datasets in order to make them suitable for AV. The data is divided into so-called problems: each problem is made of a known and an unknown text of equal length.</p><p>To account for the shortness of the texts and to avoid the topic biases that would derive from taking consecutive texts as known and unknown fragments, all the documents written by the same author are first shuffled and then concatenated into a single string. The string is split into two spans containing the same number of words, so that the words contained in the unknown span come from subsets of texts which are different from the ones that form the known one. An example of this process, for the case of 400 words per author, is displayed in Figure <ref type="figure">1</ref>. Rather than being represented by individual productions, each author is therefore represented by a set of texts whose original sequential order has been altered. Each known text is paired with an unknown text from the same author. To create negative instances, given a dataset with multiple problems, one can (i) make use of external documents (extrinsic approach; Seidman, 2013; Koppel and Winter, 2014), or (ii) use fragments collated from all authors in the training data, except the target author (intrinsic approach). We create negative instances with an intrinsic approach. More specifically, following Dwyer (2017), the second half of the unknown array is shifted by one, so that the texts of the second half of the known array are paired with a different-author text in the unknown array. In this way, the label distribution is balanced.</p></div>
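<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reformatting steps above can be sketched as follows. This is an illustrative sketch rather than the code used in the study: the function names, the fixed random seed, the whitespace tokenization, the truncation to a word budget, and the direction of the one-position shift are all assumptions.</p>
<p>
```python
import random


def make_known_unknown(documents, words_per_author, seed=0):
    """Shuffle one author's documents, concatenate them, and split the
    result into a known and an unknown span of equal length in words."""
    rng = random.Random(seed)
    docs = list(documents)
    rng.shuffle(docs)
    # Assumption: whitespace tokenization, truncated to the word budget.
    words = " ".join(docs).split()[:words_per_author]
    half = len(words) // 2
    return " ".join(words[:half]), " ".join(words[half:2 * half])


def build_problems(known, unknown):
    """Pair known and unknown texts (one of each per author, same order).

    The first half keeps same-author pairs ('YES'). The second half of the
    unknown list is rotated by one position, so each remaining known text
    meets a different author's unknown text ('NO').
    """
    if len(known) != len(unknown):
        raise ValueError("known and unknown must have the same length")
    n = len(known)
    mid = n // 2
    if 2 > n - mid:
        raise ValueError("need at least two authors in the second half")
    tail = unknown[mid:]
    shifted = tail[-1:] + tail[:-1]  # rotate the second half by one
    paired = unknown[:mid] + shifted
    labels = ["YES"] * mid + ["NO"] * (n - mid)
    return list(zip(known, paired, labels))
```
</p><p>With an even number of authors, this yields as many positive as negative problems, which corresponds to the balanced label distribution mentioned above.</p></div>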
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Method</head><p>Given a pair of known and unknown fragments (KU pair), the task is to predict whether they are written by the same author or not. In designing our experiments, we control for topic, gender, amount of evidence, and genre. The latter is made possible by the diverse nature of our datasets.</p><p>Topic. Maintaining the topic roughly constant should allow stylistic features to gain more discriminative value. We design intra-topic (IT) and cross-topic (CT) experiments. In IT, we distinguish same- and different-topic KU pairs. In same-topic, we train and test the system on KU pairs from the same topic. In different-topic, we include the Mix set and the diaries. Since we train and test on a mixture of topics and there can be topic overlap, these are not truly cross-topic, and we do not consider them as such.</p><p>Given that no topic classification is available for the diaries, the CT experiments are only performed on the ForumFree dataset. We train the system on Medicina Estetica and test it on Programmi Tv, and vice versa.</p><p>Gender. Previous work has shown that similarity can be observed in writings of people of the same gender <ref type="bibr" target="#b0">(Basile et al., 2017;</ref> <ref type="bibr" target="#b22">Rangel et al., 2017)</ref>.<ref type="foot" target="#foot_5">6</ref> In order to assess the influence of same vs different gender in AA, we consider three gender settings: only female authors and only male authors (single-gender), and mixed-gender, where the known and unknown document can be either written by two authors of the same gender, or by a male and a female author. In dividing the subsets according to the gender of the authors, we consider gender implicitly. 
However, we also perform experiments adding gender as a feature to the instance vectors, indicating both the gender of the known and unknown documents' authors and whether or not the gender of the authors is the same.</p><p>Evidence. Following <ref type="bibr" target="#b7">Feiguina and Hirst (2007)</ref>, we experiment with KU pairs of different sizes, i.e. with 400, 1 000, 2 000 and 3 000 words per author. Each element of the KU pair is thus made up of 200, 500, 1 000 and 1 500 words respectively. To observe the effect of the different text sizes on the classification, we manipulate the number of instances in training and test, so that the same authors are included in all the different word settings of a single topic-gender experiment.</p><p>Genre. We perform cross-genre experiments (CG) by training on ForumFree and testing on the Diaries, and vice versa.</p></div>
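<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the explicit gender setting, the sketch below encodes the three pieces of information mentioned above (gender of the known author, gender of the unknown author, and whether they match) as numeric features that could be appended to an instance vector. The function name and the 0/1 coding are hypothetical; the encoding actually used in the experiments is not specified here.</p>
<p>
```python
def gender_features(known_gender, unknown_gender):
    """Return [known author's gender, unknown author's gender, same-gender flag].

    Assumption: genders are given as 'F' or 'M' and coded as 0 and 1.
    """
    codes = {"F": 0, "M": 1}
    same = int(known_gender == unknown_gender)
    return [codes[known_gender], codes[unknown_gender], same]


# Words available to each element of a KU pair, per evidence setting.
WORDS_PER_TEXT = {total: total // 2 for total in (400, 1000, 2000, 3000)}
```
</p><p>The dictionary at the end simply restates the evidence settings: each element of a KU pair receives half of the words available for an author.</p></div>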
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Splits and Evaluation</head><p>We train on 70% and test on 30% of the instances. However, since we are controlling for gender and topic, the number of instances contained in the training and test sets varies in each experiment. We keep the test sets stable across IT, CT and CG experiments, so that we can compare results. Following the PAN evaluation settings <ref type="bibr" target="#b27">(Stamatatos et al., 2015)</ref>, we use three metrics. c@1 takes into account the number of problems left unanswered and rewards the system when it classifies a problem as unanswered rather than misclassifying it.</p><p>Probability scores are converted to binary answers: every score greater than 0.5 becomes a positive answer, every score smaller than 0.5 corresponds to a negative answer and every score which is exactly 0.5 is considered as an unanswered problem. The AUC measure corresponds to the area under the ROC curve <ref type="bibr" target="#b6">(Fawcett, 2006)</ref>, and tests the ability of the system to rank scores properly, assigning low values to negative problems and high values to positive ones <ref type="bibr" target="#b27">(Stamatatos et al., 2015)</ref>. The third measure is the product of c@1 and AUC.</p></div>
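<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three measures can be sketched as follows. The c@1 formula used here is the standard definition adopted in the PAN evaluations, c@1 = (n_c + n_u * n_c / n) / n, where n is the number of problems, n_c the number of correct answers and n_u the number of unanswered problems. The function names are hypothetical, and AUC is computed with a simple pairwise comparison rather than with a library routine.</p>
<p>
```python
def to_answer(score):
    """Convert a probability score into 'YES', 'NO', or None (unanswered)."""
    if score > 0.5:
        return "YES"
    if 0.5 > score:
        return "NO"
    return None


def c_at_1(n_correct, n_unanswered, n_total):
    """c@1: unanswered problems are credited at the rate of the accuracy."""
    return (n_correct + n_unanswered * n_correct / n_total) / n_total


def auc(scores, labels):
    """Area under the ROC curve: the probability that a positive problem
    is scored above a negative one, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def final_score(c1, area):
    """Third measure: the product of c@1 and AUC."""
    return c1 * area
```
</p><p>For instance, a test set of 69 problems with 47 correct, 21 incorrect and 1 unanswered gives c@1 = 0.691, while 33 correct answers out of 39 with none unanswered give 0.846.</p></div>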
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Model</head><p>We run all experiments using GLAD <ref type="bibr" target="#b10">(Hürlimann et al., 2015)</ref>. This is an SVM with an RBF kernel, implemented using Python's scikit-learn <ref type="bibr" target="#b21">(Pedregosa et al., 2011)</ref> library and NLTK <ref type="bibr" target="#b2">(Bird et al., 2009)</ref>. GLAD was designed to work with 24 different features, which take into account stylometry, entropy and data compression measures. We compare GLAD to a simple baseline which randomly assigns a label from the set of possible labels (i.e. 'YES' or 'NO') to each test instance.</p><p>We chose GLAD for a variety of reasons. As a general observation, even in later challenges, SVMs have proven to be the most effective for AA tasks <ref type="bibr" target="#b13">(Kestemont et al., 2019)</ref>. More specifically, in a survey of freely available AA systems, GLAD showed the best performance and especially high adaptability to new datasets <ref type="bibr" target="#b8">(Halvani et al., 2018)</ref>. Lastly, de Vries (2020) has explored fine-tuning a pre-trained model for AV in Dutch, a less-resourced language compared to English. He found that fine-tuning BERTje, a Dutch monolingual BERT model <ref type="bibr">(de Vries et al., 2019)</ref>, on PAN 2015 AV data <ref type="bibr" target="#b27">(Stamatatos et al., 2015)</ref> failed to outperform a majority baseline (de Vries, 2020). He concluded that Transformer-encoder models might not be suitable for AA tasks, since they will likely overfit if the documents contain no reliable clues of authorship (de Vries, 2020).</p></div>
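<div xmlns="http://www.tei-c.org/ns/1.0"><p>The random baseline described above amounts to very little code. The sketch below is an assumption about how such a baseline could be written, with a hypothetical function name and a fixed seed added for reproducibility; it is not taken from the GLAD code base.</p>
<p>
```python
import random


def random_baseline(n_instances, labels=("YES", "NO"), seed=0):
    """Assign a uniformly random label to each test instance."""
    rng = random.Random(seed)
    return [rng.choice(labels) for _ in range(n_instances)]
```
</p><p>On a balanced test set, such a baseline is expected to be correct on about half of the problems, which is the reference point against which the GLAD scores are read.</p></div>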
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results and Discussion</head><p>The number of experiments is high due to the interaction of the dimensions we consider.</p><p>Tables <ref type="table" target="#tab_2">2 and 3</ref> only include the mixed-gender results of the IT experiments on Mix (which corresponds to the entire ForumFree dataset used for this study) and Diaries, respectively. Results concerning all dimensions considered are nevertheless discussed in the text. We refer to the combined score. Since the baseline results are different for each setting, we do not include them. However, all models perform consistently above their corresponding baseline.</p><p>For the Mix topic, we achieved 0.966 with 96 authors in total and 3 000 words each (Table <ref type="table" target="#tab_1">2</ref>). For the diaries, we achieved 0.821 with 46 authors in total and 3 000 words each (Table <ref type="table" target="#tab_2">3</ref>).<ref type="foot" target="#foot_6">7</ref> Although the training and test sets are of different sizes for both datasets, more evidence seems to help the model to solve the problem.</p><p>In the IT experiments, the highest score for Medicina Estetica is 0.923, with 41 authors in total and 1 000 words per author, and for Programmi Tv 0.944, with 59 authors and 3 000 words each. In the CT setting, the scores remain essentially the same in both directions. In CG, when training on the diaries and testing on Mix, we obtain the same score as when training on Mix with 3 000 words. When training on Mix and testing on Diaries, we achieved 0.737 on the same test set, and 0.748 with 1 000 words per instance.</p><p>Discussion. When more variables interact in the same subset, as in mixed-gender sets of the ForumFree and Diaries datasets, we found that the classifier uses the implicit gender information. 
Indeed, it achieves slightly better scores in mixed-gender settings than in female- and male-only ones, suggesting that the classifier might be using internal clustering of the data rather than writing style characteristics. This also explains why results are higher in Mix than in separate topics, because the classifier can use topic information. We also observe that by adding gender as an explicit feature in topic- and gender-controlled subsets, GLAD uses this information to improve classification, especially in mixed-gender scenarios.</p><p>Although previous research demonstrated that CT and CG experiments are harder than IT ones <ref type="bibr" target="#b23">(Sapkota et al., 2014;</ref> <ref type="bibr" target="#b27">Stamatatos et al., 2015)</ref>, in our case the scores for the three settings are comparable. However, since we only performed CT and CG experiments on mixed-gender subsets, the gender-specific information might have also played a role in this process (see above).</p><p>Overall, the experiments show that using a higher number of words per author is preferable. Although 3 000 words seems to be optimal for most settings, in the large number of experiments that we carried out (not all included in this paper) we also observed that smaller amounts of words led to comparable results. This aspect will require further investigation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>We experimented with AV on Italian forum comments and diary fragments. We compiled two datasets and performed experiments which considered the interaction among topic, gender, length and genre. Even when the texts are short and present more individual variation than traditional texts used in AA, AV is a feasible task, but having more evidence per author improves classification.</p><p>While making the task more challenging, controlling for gender and topic ensures that the system prioritizes authorship over different data clusters. Although the datasets used are intended for AV problems, they can be easily adapted to other AA tasks. We believe this to be one of the major contributions of our work, as it can help to advance the up-to-now limited AA research in Italian.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1 :</head><label>1</label><figDesc>Example of the creation of known and unknown documents for the same author when considering 400 words per author.</figDesc><graphic coords="3,117.35,62.81,362.85,162.08" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc></figDesc><table><row><cell>Subset</cell><cell cols="3"># Authors</cell><cell># Docs</cell><cell>W/A</cell><cell>D/A</cell><cell>W/D</cell></row><row><cell></cell><cell>F</cell><cell>M</cell><cell>Tot</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Med Est</cell><cell>33</cell><cell>44</cell><cell>77</cell><cell>56198</cell><cell>63</cell><cell>661</cell><cell>48</cell></row><row><cell>Prog TV</cell><cell>78</cell><cell>71</cell><cell>149</cell><cell>153019</cell><cell>32</cell><cell>812</cell><cell>22</cell></row><row><cell>Mix</cell><cell>111</cell><cell>115</cell><cell>276</cell><cell>209217</cell><cell>41</cell><cell>791</cell><cell>29</cell></row><row><cell>Diaries</cell><cell>77</cell><cell>188</cell><cell>275</cell><cell>1422</cell><cell>462</cell><cell>5</cell><cell>477</cell></row></table><note>Overview of the datasets. W/A = Avg words per author; D/A = Avg docs per author; W/D = Avg words per doc.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Training and test set configurations and IT evaluation scores on Mix texts written by female and male authors. C, I and U are Correct, Incorrect, Unanswered problems.</figDesc><table><row><cell></cell><cell></cell><cell cols="2"># Problems</cell><cell cols="6">Eval</cell></row><row><cell># W/A</cell><cell># Auth</cell><cell>Train</cell><cell>Test</cell><cell>C</cell><cell>I</cell><cell>U</cell><cell>c@1</cell><cell>AUC</cell><cell>c@1 × AUC</cell></row><row><cell>400</cell><cell>127</cell><cell>88</cell><cell>39</cell><cell>33</cell><cell>6</cell><cell>0</cell><cell>0.846</cell><cell>0.947</cell><cell>0.801</cell></row><row><cell>1 000</cell><cell>109</cell><cell>76</cell><cell>33</cell><cell>30</cell><cell>3</cell><cell>0</cell><cell>0.909</cell><cell>0.926</cell><cell>0.842</cell></row><row><cell>2 000</cell><cell>100</cell><cell>70</cell><cell>30</cell><cell>29</cell><cell>1</cell><cell>0</cell><cell>0.967</cell><cell>0.995</cell><cell>0.962</cell></row><row><cell>3 000</cell><cell>96</cell><cell>67</cell><cell>29</cell><cell>28</cell><cell>1</cell><cell>0</cell><cell>0.966</cell><cell>1.000</cell><cell>0.966</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Training and test configurations and IT evaluation scores on diaries made of NE converted text written by both genders. C, I and U are Correct, Incorrect, Unanswered problems.</figDesc><table><row><cell></cell><cell></cell><cell cols="2"># Problems</cell><cell cols="6">Eval</cell></row><row><cell># W/A</cell><cell># Auth</cell><cell>Train</cell><cell>Test</cell><cell>C</cell><cell>I</cell><cell>U</cell><cell>c@1</cell><cell>AUC</cell><cell>c@1 × AUC</cell></row><row><cell>400</cell><cell>229</cell><cell>160</cell><cell>69</cell><cell>47</cell><cell>21</cell><cell>1</cell><cell>0.691</cell><cell>0.725</cell><cell>0.500</cell></row><row><cell>1 000</cell><cell>180</cell><cell>126</cell><cell>54</cell><cell>43</cell><cell>11</cell><cell>0</cell><cell>0.796</cell><cell>0.891</cell><cell>0.709</cell></row><row><cell>2 000</cell><cell>98</cell><cell>68</cell><cell>30</cell><cell>25</cell><cell>5</cell><cell>0</cell><cell>0.833</cell><cell>0.905</cell><cell>0.754</cell></row><row><cell>3 000</cell><cell>46</cell><cell>32</cell><cell>14</cell><cell>12</cell><cell>2</cell><cell>0</cell><cell>0.857</cell><cell>0.958</cell><cell>0.821</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.newyorker.com/culture/culturalcomment/the-unmasking-of-elena-ferrante</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Further information about the datasets can be found at https://github.com/garuggiero/Italian-Datasets-for-AV</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://pan.webis.de/clef15/pan15-web/authorshipverification.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://www.forumfree.it/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://www.idiariraccontano.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">Binary gender is a simplification of a much more nuanced situation in reality. Following previous work, we adopt it for convenience.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">Using a bleached representation of the texts, the score increased by 0.36</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The ForumFree dataset was provided courtesy of the Italian Institute of Computational Linguistics "Antonio Zampolli" (ILC) of Pisa (http://www.ilc.cnr.it/).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">N-GrAM: New Groningen Authorprofiling Model-Notebook for PAN at CLEF</title>
		<author>
			<persName><forename type="first">Angelo</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gareth</forename><surname>Dwyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maria</forename><surname>Medvedeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Josine</forename><surname>Rawee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hessel</forename><surname>Haagsma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malvina</forename><surname>Nissim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">1866</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Shared Tasks on Authorship Analysis at PAN</title>
		<author>
			<persName><forename type="first">Janek</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bilal</forename><surname>Ghanem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anastasia</forename><surname>Giachanou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mike</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Enrique</forename><surname>Manjavacas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francisco</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Günther</forename><surname>Specht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Efstathios</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benno</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matti</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eva</forename><surname>Zangerle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval</title>
		<editor>
			<persName><forename type="first">Joemon</forename><forename type="middle">M</forename><surname>Jose</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Emine</forename><surname>Yilmaz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">João</forename><surname>Magalhães</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Pablo</forename><surname>Castells</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Nicola</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Mário</forename><forename type="middle">J</forename><surname>Silva</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Flávio</forename><surname>Martins</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="508" to="516">508-516</biblScope>
			<biblScope unit="volume">8</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">Steven</forename><surname>Bird</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ewan</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edward</forename><surname>Loper</surname></persName>
		</author>
		<title level="m">Natural language processing with Python: analyzing text with the natural language toolkit</title>
				<imprint>
			<publisher>O&apos;Reilly Media, Inc</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">Wietse</forename><surname>de Vries</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andreas</forename><surname>van Cranenburgh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Arianna</forename><surname>Bisazza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tommaso</forename><surname>Caselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gertjan</forename><surname>van Noord</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malvina</forename><surname>Nissim</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1912.09582</idno>
		<title level="m">BERTje: A Dutch BERT model</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Language Models are not just English Anymore: Training and Evaluation of a Dutch BERT-based Language Model Named BERTje</title>
		<author>
			<persName><forename type="first">Wietse</forename><surname>De Vries</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<pubPlace>The Netherlands</pubPlace>
		</imprint>
		<respStmt>
			<orgName>University of Groningen</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master Thesis in Information Science</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Novel approaches to authorship attribution</title>
		<author>
			<persName><forename type="first">Gareth</forename><forename type="middle">Terence Bryan</forename><surname>Dwyer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<pubPlace>The Netherlands</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Information Science, University of Groningen</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master Thesis in Language and Communication Technologies</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">An introduction to ROC analysis</title>
		<author>
			<persName><forename type="first">Tom</forename><surname>Fawcett</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition Letters</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="861" to="874" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Authorship attribution for small texts: Literary and forensic experiments</title>
		<author>
			<persName><forename type="first">Olga</forename><surname>Feiguina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Graeme</forename><surname>Hirst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the SIGIR&apos;07 Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection</title>
				<meeting>the SIGIR&apos;07 Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection<address><addrLine>PAN</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Unary and binary classification approaches and their implications for authorship verification</title>
		<author>
			<persName><forename type="first">Oren</forename><surname>Halvani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christian</forename><surname>Winter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lukas</forename><surname>Graner</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1901.00399</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">spaCy: Industrial-strength natural language processing (NLP) with Python and Cython</title>
		<author>
			<persName><forename type="first">Matthew</forename><surname>Honnibal</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">GLAD: Groningen lightweight authorship detection</title>
		<author>
			<persName><forename type="first">Manuela</forename><surname>Hürlimann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benno</forename><surname>Weck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Esther</forename><surname>Van Den Berg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simon</forename><surname>Suster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malvina</forename><surname>Nissim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF (Working Notes)</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Juola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Efstathios</forename><surname>Stamatatos</surname></persName>
		</author>
		<title level="m">Overview of the Author Identification Task at PAN 2013</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">1179</biblScope>
		</imprint>
		<respStmt>
			<orgName>CLEF</orgName>
		</respStmt>
	</monogr>
	<note>Working Notes</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">The Rowling case: A proposed standard analytic protocol for authorship questions</title>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Juola</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Digital Scholarship in the Humanities</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="100" to="113" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note>suppl</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Overview of the Cross-domain Authorship Attribution Task at PAN 2019</title>
		<author>
			<persName><forename type="first">Mike</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Efstathios</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Enrique</forename><surname>Manjavacas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Walter</forename><surname>Daelemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benno</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>Working Notes</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Overview of the Cross-Domain Authorship Verification Task at PAN 2020</title>
		<author>
			<persName><forename type="first">Mike</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Enrique</forename><surname>Manjavacas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ilia</forename><surname>Markov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Janek</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matti</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Efstathios</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benno</forename><surname>Stein</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">CLEF 2020 Labs and Workshops</title>
				<editor>
			<persName><forename type="first">Linda</forename><surname>Cappellato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Carsten</forename><surname>Eickhoff</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Nicola</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Aurélie</forename><surname>Névéol</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2020-09">September 2020</date>
		</imprint>
	</monogr>
	<note>Notebook Papers</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Authorship verification as a one-class classification problem</title>
		<author>
			<persName><forename type="first">Moshe</forename><surname>Koppel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jonathan</forename><surname>Schler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twenty-First International Conference on Machine Learning</title>
				<meeting>the Twenty-First International Conference on Machine Learning</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page">62</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Determining if two documents are written by the same author</title>
		<author>
			<persName><forename type="first">Moshe</forename><surname>Koppel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yaron</forename><surname>Winter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Association for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="178" to="187" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Computational methods in authorship attribution</title>
		<author>
			<persName><forename type="first">Moshe</forename><surname>Koppel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jonathan</forename><surname>Schler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shlomo</forename><surname>Argamon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Society for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="9" to="26" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Quanti anni hai? Age Identification for Italian</title>
		<author>
			<persName><forename type="first">Aleksandra</forename><surname>Maslennikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Labruna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrea</forename><surname>Cimino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felice</forename><surname>Dell&apos;Orletta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of 6th Italian Conference on Computational Linguistics (CLiC-it)</title>
				<meeting>6th Italian Conference on Computational Linguistics (CLiC-it)<address><addrLine>Bari, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-11-15">13-15 November 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Author verification: basic stacked generalization applied to predictions from a set of heterogeneous learners - Notebook for PAN at CLEF 2015</title>
		<author>
			<persName><forename type="first">Erwan</forename><surname>Moreau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Arun</forename><surname>Jayapal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerard</forename><surname>Lynch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carl</forename><surname>Vogel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2015</title>
		<title level="s">Evaluation Labs and Workshop - Working Notes Papers</title>
		<editor>
			<persName><forename type="first">Linda</forename><surname>Cappellato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Nicola</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Gareth</forename><surname>Jones</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Eric</forename><surname>San Juan</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="8" to="11" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Authorship attribution revisited: The problem of flash fiction. A morphological-based linguistic stylometry approach</title>
		<author>
			<persName><forename type="first">Abdulfattah</forename><surname>Omar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Basheer</forename><forename type="middle">Ibrahim</forename><surname>Elghayesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohamed</forename><forename type="middle">Ali Mohamed</forename><surname>Kassem</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Arab World English Journal (AWEJ)</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine learning in Python</title>
		<author>
			<persName><forename type="first">Fabian</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gaël</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexandre</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vincent</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bertrand</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Olivier</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mathieu</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Peter</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ron</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vincent</forename><surname>Dubourg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Overview of the 5th author profiling task at PAN 2017: Gender and language variety identification in Twitter</title>
		<author>
			<persName><forename type="first">Francisco</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benno</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes Papers of the CLEF</title>
		<idno type="ISSN">1613-0073</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Cross-topic authorship attribution: Will out-of-topic data help?</title>
		<author>
			<persName><forename type="first">Upendra</forename><surname>Sapkota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thamar</forename><surname>Solorio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Manuel</forename><surname>Montes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Steven</forename><surname>Bethard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers</title>
				<meeting>COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1228" to="1237" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Authorship verification using the impostors method</title>
		<author>
			<persName><forename type="first">Shachar</forename><surname>Seidman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2013 Evaluation labs and workshop-Working notes papers</title>
				<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="23" to="26" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Plagiarism and authorship analysis: introduction to the special issue</title>
		<author>
			<persName><forename type="first">Efstathios</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Moshe</forename><surname>Koppel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Language Resources and Evaluation</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="4" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Overview of the author identification task at PAN 2014</title>
		<author>
			<persName><forename type="first">Efstathios</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Walter</forename><surname>Daelemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ben</forename><surname>Verhoeven</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benno</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Juola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Miguel</forename><forename type="middle">A</forename><surname>Sanchez-Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alberto</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2014 Evaluation Labs and Workshop Working Notes Papers</title>
				<meeting><address><addrLine>Sheffield, UK</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1" to="21" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Overview of the author identification task at PAN 2015</title>
		<author>
			<persName><forename type="first">Efstathios</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Walter</forename><surname>Daelemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ben</forename><surname>Verhoeven</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Juola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aurelio</forename><surname>López-López</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benno</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2015 Evaluation Labs and Workshop</title>
		<title level="s">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1" to="17" />
		</imprint>
	</monogr>
	<note>Toulouse, France</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">A survey of modern authorship attribution methods</title>
		<author>
			<persName><forename type="first">Efstathios</forename><surname>Stamatatos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Society for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="538" to="556" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Multiple account identity deception detection in social media using nonverbal behavior</title>
		<author>
			<persName><forename type="first">Michail</forename><surname>Tsikerdekis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sherali</forename><surname>Zeadally</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Information Forensics and Security</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1311" to="1321" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<author>
			<persName><forename type="first">Arjuna</forename><surname>Tuzzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michele</forename><forename type="middle">A</forename><surname>Cortelazzo</surname></persName>
		</author>
		<title level="m">Drawing Elena Ferrante&apos;s Profile: Workshop Proceedings</title>
				<meeting><address><addrLine>Padova</addrLine></address><date when="2017-09-07">7 September 2017</date></meeting>
		<imprint>
			<publisher>Padova UP</publisher>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<title level="m" type="main">Bleaching text: Abstract features for cross-lingual gender prediction</title>
		<author>
			<persName><forename type="first">Rob</forename><surname>Van Der Goot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nikola</forename><surname>Ljubešić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ian</forename><surname>Matroos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malvina</forename><surname>Nissim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Barbara</forename><surname>Plank</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.03122</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Authorship attribution for forensic investigation with thousands of authors</title>
		<author>
			<persName><forename type="first">Min</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kam-Pui</forename><surname>Chow</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IFIP International Information Security Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="339" to="350" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
