<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The Style of a Successful Story: a Computational Study on the Fanfiction Genre</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Andrea</forename><surname>Mattei</surname></persName>
							<email>a.mattei3@studenti.unipi.it</email>
						</author>
						<author>
							<persName><forename type="first">Dominique</forename><surname>Brunato</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Felice</forename><surname>Dell'Orletta</surname></persName>
							<email>felice.dellorletta@ilc.cnr.it</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">University of Pisa</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">Istituto di Linguistica Computazionale &quot;Antonio Zampolli&quot; (ILC-CNR)</orgName>
								<orgName type="laboratory">ItaliaNLP Lab</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">The Style of a Successful Story: a Computational Study on the Fanfiction Genre</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">386492004FC60B76A4A7AD4488CD7B55</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T15:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents a new corpus for the Italian language representative of the fanfiction genre. It comprises about 55k user-generated stories inspired by the original fantasy saga "Harry Potter" and published on a popular website. The corpus is large enough to support data-driven investigations in many directions, from traditional studies on language variation aimed at characterizing this genre with respect to more traditional ones, to emerging topics in computational social science such as the identification of factors involved in the success of a story. The latter is the focus of the presented case study, in which a wide set of multi-level linguistic features has been automatically extracted from a subset of the corpus and analysed in order to detect the ones which significantly discriminate successful from unsuccessful stories.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Computational Sociolinguistics is an emergent interdisciplinary field aimed at exploiting computational approaches to study the relationship between language and society <ref type="bibr" target="#b9">(Nguyen et al., 2016)</ref>. One of the primary factors driving its foundation is the widespread diffusion of social media and other user-generated data available online, which has promoted massive research on computer-mediated communication from several perspectives. For instance, scholars working in the field of genre and register variation have relied on quantitative approaches to inspect the peculiarities of social media language, with the purpose of providing a characterization of this new genre with respect to more traditional ones <ref type="bibr" target="#b10">(Paolillo, 2001;</ref> <ref type="bibr" target="#b6">Herring and Androutsopoulos, 2015)</ref>. In the NLP community, the writing style of user-generated data has been analyzed through computational stylometry approaches for addressing tasks broadly related to author profiling <ref type="bibr" target="#b3">(Daelemans, 2013)</ref>, such as gender and age detection <ref type="bibr" target="#b11">(Peersman et al., 2011;</ref> <ref type="bibr" target="#b7">Koppel et al., 2002)</ref>. The vast majority of this work has taken into account content published on a few microblogging platforms considered more representative of the contemporary user-generated mediascape, e.g. Twitter. More recently, attention has turned to the language used by online communities whose members share a common interest in an object, an activity or, more generally, any area of human interest, allowing scholars to shed light on the growing phenomenon of fandom <ref type="bibr" target="#b12">(Sindoni, 2015)</ref>. One of the most prominent expressions of fandom is fanfiction (fanfic, fic or FF), i.e. fiction written by fans of a TV series, movie, book etc., using existing characters and situations to develop new plots. In many languages dedicated websites exist where users can publish their own literary works inspired by the original book they are fans of.</p><p>From a computational linguistics standpoint, one line of work on fanfiction has aimed to infer the relationship between user-generated stories and their original source, e.g. comparing the representation of characters according to their gender, as well as to model reader reactions to stories <ref type="bibr" target="#b13">(Smitha and Bamman, 2016)</ref>. 
Inspired by that study, which was based on a large dataset of stories mainly in English, we collected a new corpus of fanfic stories<ref type="foot" target="#foot_0">1</ref>, which, to our knowledge, is the first one for the Italian language. We rely on this corpus to carry out an investigation aimed at shedding light on the possibility of computationally modeling the expected success of a fanfic story, based on the assumptions of linguistic profiling and stylometry research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Dataset collection</head><p>The corpus comprises texts collected from efpfanfic.net, a portal active since 2001 which allows users to publish stories and to comment on them. The website is made up of two sections: one for original stories and the other for fanfiction. We considered only the latter and we limited the collection to stories based on the fantasy saga by the British writer J.K. Rowling, "Harry Potter". This choice was motivated by the main purpose of our analysis, i.e. characterizing the success of a novel with respect to its writing style rather than as an effect of the various subject matters it deals with. At the same time, the preference given to a very popular book allowed us to keep a consistent number of potential readers and reviewers across the corpus, while still having a large sample of texts to analyze. The data collection was performed through web scraping, with two spiders written in Python using the open-source Scrapy framework. The first spider crawls the list of stories in the category of choice and extracts their first chapters together with some metadata, including the URLs of the subsequent chapters. The second spider takes these addresses as input and downloads texts and additional information about all the chapters after the first. In the dataset created this way, the record for each chapter includes: ID and Reference ID, combinations used by the website to identify the webpage of each chapter (we use the ID of the first chapter as a reference to group together records belonging to the same story); Title; Rating, an estimate given by the author of the rawness of the themes and scenes contained in the story; Date of posting; Author's nickname; Number of chapters in the story; Text; Total number of reviews received by the story, divided into positive, negative and neutral; Number of reviews received by the single chapter, as well as the text of the most recent ones. 
The crawlers downloaded 54,717 stories, for a total of 197,310 chapters and a mean of approximately 3.6 chapters per story, which is consistent with the mean calculated taking into account every entry on the website. The obtained corpus was divided into folders, each containing stories with the same number of chapters.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The success of a fanfiction story: an exploratory study</head><p>Based on the newly created dataset, we carried out a computational stylometric analysis aimed at studying whether there is a connection between the success of a fanfic story and its writing style. Such a connection has been demonstrated for more canonical literary works covering novel and movie domains <ref type="bibr" target="#b5">(Ganjigunte et al., 2013;</ref> <ref type="bibr" target="#b14">Solorio et al., 2017)</ref>, showing that stylometry is a viable approach also in scenarios different from authorship attribution and verification.</p><p>The methodological framework of our investigation is linguistic profiling <ref type="bibr" target="#b8">(Montemagni, 2013;</ref> <ref type="bibr" target="#b15">van Halteren, 2004</ref>), an NLP-based approach in which a large set of linguistically-motivated features automatically extracted from text are used to obtain a vector-based representation of it. Such representations can then be compared across texts representative of different textual genres and varieties to identify the peculiarities of each. For the purpose of our analysis, we split the original dataset into two varieties corresponding to "successful" and "unsuccessful" stories. To define success we follow an approach similar to that used by <ref type="bibr" target="#b14">Solorio et al. (2017)</ref>, which is based on the number of reviews obtained by each story. In this regard, we decided to include all reviews, not only the positive ones, which undoubtedly testify to a favorable attitude of the reader towards the story. 
Two main reasons motivated our choice: first, we noticed that the overwhelming majority of collected reviews are written to convey appreciation, with just 0.73% of a total of nearly 900k reviews being negative; therefore, from a statistical point of view, we can reasonably disregard the distinction between various kinds of reviews and simply take into consideration the overall amount of feedback received. Secondly, even negative feedback proves that a given story has been read and has aroused some interest in the reader. With this in mind, we define as "unsuccessful" those stories that did not receive any reviews, thus being largely ignored by their readers. Conversely, the "successful" category includes all stories having received a review count higher than the average of all stories with the same number of chapters. We also decided to limit the focus of this analysis to single-chapter fanfictions written before 2018, so as to avoid the inclusion of stories not yet concluded. The resulting classes comprise 2,101 unsuccessful texts and 14,486 successful ones, with a threshold for success amounting to 5 reviews. Table <ref type="table">1</ref> shows an example of stories classified in the two categories.</p><p>All texts were pre-processed by means of regular expressions, with the aim of removing errors and inconsistencies in the use of punctuation, capitalization and special characters, in order to increase the reliability of automatic linguistic annotation and of feature extraction, both of which were performed using the Profiling-UD tool <ref type="bibr" target="#b0">(Brunato et al., 2020)</ref>.</p><p>In what follows we first provide an overview of the linguistic features used for our statistical analysis and then we discuss the ones that turned out to be more prominent in successful writing.</p></div>
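<div xmlns="http://www.tei-c.org/ns/1.0"><p>The labelling rule described above can be sketched as follows. This is only a minimal illustration, not the code used for the study: the record keys (id, n_chapters, n_reviews) and the toy data are hypothetical, and we assume that the per-length average is computed over all stories of that length, including those with no reviews.</p>
<p>
```python
from collections import defaultdict
from statistics import mean


def label_stories(stories):
    """Assign a class to each story from its review count.

    `stories` is a list of dicts with the hypothetical keys
    'id', 'n_chapters' and 'n_reviews'.
    """
    # Average review count among stories with the same number of chapters
    # (assumption: zero-review stories are included in the average).
    by_length = defaultdict(list)
    for story in stories:
        by_length[story["n_chapters"]].append(story["n_reviews"])
    thresholds = {n: mean(counts) for n, counts in by_length.items()}

    labels = {}
    for story in stories:
        if story["n_reviews"] == 0:
            labels[story["id"]] = "unsuccessful"
        elif story["n_reviews"] > thresholds[story["n_chapters"]]:
            labels[story["id"]] = "successful"
        else:
            # Reviewed, but not above average: left out of both classes.
            labels[story["id"]] = None
    return labels


# Toy single-chapter stories, invented for illustration only.
toy = [
    {"id": "a", "n_chapters": 1, "n_reviews": 0},
    {"id": "b", "n_chapters": 1, "n_reviews": 3},
    {"id": "c", "n_chapters": 1, "n_reviews": 12},
]
print(label_stories(toy))
```
</p>
<p>Stories that were reviewed but do not exceed the average fall in neither class, which is consistent with the two classes covering only part of the single-chapter stories.</p></div>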
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Linguistic Features</head><p>The set of features is based on the one described in <ref type="bibr" target="#b0">Brunato et al. (2020)</ref> and counts more than 150 features, distributed across distinct levels of linguistic annotation and computed according to the Universal Dependencies (UD) annotation framework. These features have been shown to be effective in a variety of different scenarios, all related to modeling the 'form' of a text rather than its content: e.g., from the assessment of sentence complexity by humans <ref type="bibr" target="#b1">(Brunato et al., 2018)</ref> to the identification of the native language of a speaker from their productions in a second language (L2) <ref type="bibr" target="#b2">(Cimino et al., 2018)</ref>. Specifically, they can be grouped into the following main phenomena:</p><p>Raw Text Features: Document length computed as the total number of tokens and of sentences (#Tokens, #Sentences in Table <ref type="table">2</ref>); average sentence length and token length, calculated in tokens and in characters, respectively (Sent length, Word length).</p><p>Lexical Richness: Distribution of words and lemmas belonging to the Basic Italian Vocabulary <ref type="bibr" target="#b4">(De Mauro, 2000)</ref> (BIV Tok, BIV Types) and to its internal repertories (i.e. fundamental, high usage and high availability: BIV Fund; BIV High-US; BIV High-AV); Type/Token Ratio, a feature of lexical variety computed as the ratio between the number of lexical types and the number of tokens in the first 100 and 200 words of text (TTR Lemma); Lexical density.</p><p>Morpho-Syntactic Information: Distribution of all grammatical categories, with respect to the Universal part-of-speech tagset (UPOS *) and the language-specific tagset (XPOS *); distribution of verbs according to tense, mood and person, both for main and auxiliary verbs (aux *; V *).</p><p>Verbal Predicate Structure: Average distribution of verbal roots and of verbal heads per sentence (VerbHead); features related to the arity of verbs (i.e. average number of dependents per verbal head, distribution of verbs by arity).</p><p>Global and Local Parsed Tree: Average depth of the syntactic tree (MaxDepth); average depth of embedded complement chains headed by a preposition; average length of dependency links and of the maximum link (Links Len; Max Link Length); relative order of the subject and object with respect to the verb.</p><p>Syntactic relations: Distribution of typed UD dependency relations (dep *).</p><p>Use of Subordination: Distribution of main and subordinate clauses (Main clause, Subord clause), average length of subordinate chains, distribution of subordinate chains by length.</p></div>
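<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of one of the lexical features listed above, the Type/Token Ratio over the first 100 or 200 tokens could be computed along the following lines. This is a simplified sketch under stated assumptions: the function name is hypothetical, lowercased word forms stand in for the lemmas that a UD annotation would provide, and whitespace splitting replaces proper tokenization.</p>
<p>
```python
def type_token_ratio(tokens, window):
    """Ratio of distinct types to tokens over the first `window` tokens.

    Assumption: lowercased forms are used in place of lemmas.
    """
    chunk = [token.lower() for token in tokens[:window]]
    if not chunk:
        return 0.0
    return len(set(chunk)) / len(chunk)


# Invented example sentence; whitespace tokenization only for illustration.
text = "the rain stopped and the silence fell and the night grew darker"
tokens = text.split()
for window in (100, 200):
    print(window, type_token_ratio(tokens, window))
```
</p>
<p>Fixing the window size keeps the ratio comparable across texts of different lengths, since the raw Type/Token Ratio tends to decrease as a text grows longer.</p></div>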
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Data Analysis</head><p>For each considered feature we calculated the average value and the standard deviation in the two classes. We then assessed whether the variation between mean values is significant using the Wilcoxon rank sum test. We found that 126 out of the 219 features (57%) are distributed in a significantly different way between successful and unsuccessful stories. In Table <ref type="table">2</ref> we report an extract of the most interesting ones.</p><p>As can be seen, successful stories are on average longer in terms of number of tokens and sentences (1, 2), although these sentences are generally shorter (3), suggesting that readers appreciate a plainer writing style. However, when lexical factors are considered, the preference is given to texts exhibiting less frequent words, as suggested by the slightly lower distribution of words belonging to the Basic Italian Vocabulary (5, 6) and especially to the Fundamental one (7). Inflectional morphology also appears as a domain of variation between the two classes. Successful fanfictions employ verbs in the second person considerably more often (15), a feature typical of narrative writing related to direct speech. On the contrary, we observe a higher distribution of third-person verbs, specifically auxiliaries, both singular (14) and plural (13), in less successful texts, which can hint at a preference for reported speech.</p><p>For the first time in days, at the stroke of midnight, the rain stopped abruptly. Silence fell upon the districts that suddenly seemed darker. And in that piercing silence, the only noise that could be recognized was a faint and irregular tac-tac-tac. It was coming from a window. The window of a luxurious house in the city centre, the only light still on at that time. Joanne was in front of the computer, source of that trembling and was writing. She tapped her fingers on the keyboard for a few moments, then stopped, reread, deleted and rewrote. She had been going on like this for days. Her eyes were tired, but her mind was working frantically. Almost there.</p><p>Unsuccessful: Il cielo era tetro cosparso di nuvole che sembravano volere annunciare un acquazzone, il vento ulula forte facendo sbattere le finestre violentemente, come se volesse gridare, liberarsi da una rabbia repressa. La donna dai lunghi capelli rosso scuro continuava a fissare la devastazione attraverso il vetro che ora si era appannato dal suo stesso respiro. Aveva lo sguardo malinconico non più illuminato da quella dolce espressione che il riso le donava. Una mano le si poggiò sulla spalla e girò pian piano il volto verso la persona amata che con un ritmo lento cominciò ad accarezzarle le gote che assunsero un colorito roseo alla sua pelle pallida. Chiuse gli occhi come per assaporare quel dolce tocco che ora si era spostato nei suoi capelli. "Non guardare più oltre il vetro" Mormorò la voce con una nota di preoccupazione, apparteneva a James, marito di Lily la donna dai lunghi capelli rossi<ref type="foot" target="#foot_2">3</ref>.</p><p>The sky was bleak strewn with clouds that seemed to want to announce a downpour, the wind howls loudly making the windows slam violently, as if it wanted to scream, to free itself from a suppressed anger. The woman with the long dark red hair kept staring at the devastation through the glass that was now clouded by her own breath. Her melancholic gaze was no longer lit up by that sweet look that laughter gave her. A hand rested on her shoulder and slowly turned her face towards the loved one who started slowly caressing her cheeks which took on a rosy tone on her pale skin. She closed her eyes, as if to savor that sweet touch that had now moved into her hair. "Don't look beyond the glass anymore" Whispered the voice with a note of concern, it belonged to James, husband of Lily the woman with long red hair.</p><p>Table <ref type="table">1</ref>: An extract of a 'successful' story (the most reviewed one) and of an 'unsuccessful' one.</p><p>Table <ref type="table">2</ref>: An extract of linguistic features varying significantly between successful and unsuccessful stories. All differences are significant at p &lt; 0.001, except for features marked with an asterisk, which have p &lt; 0.05.</p><p>Focusing on the distribution of morphosyntactic categories, there is a significant difference in the usage of the most common punctuation marks, commas (25) and full stops (26), which are considerably more frequent in highly-reviewed fanfictions. These features are related to the previously observed difference in terms of document length, as texts with more sentences necessarily use punctuation marks to divide them. Additionally, we can see that balanced marks (24), i.e. parentheses and quotation marks, occur more in successful texts, strengthening our previous claim about a more frequent presence of direct speech in this class. At the syntactic level, dependency relations are slightly shorter in successful texts, both considering the average value of all dependencies (29) and the value of the maximum dependency link (30). In readability assessment studies, longer syntactic dependencies are typically found in complex texts, and the same holds for deeper syntactic trees. Both these features have lower values in highly-reviewed stories, suggesting that the style of successful writing is characterized by a simpler syntactic structure. Interestingly, these results, although preliminary, go in the opposite direction to those reported by <ref type="bibr" target="#b5">Ganjigunte et al. (2013)</ref> for successful literary works in English, which were found to be less correlated with text readability scores. 
Finally, subordinate clauses (33) occur slightly more often than main clauses (32) in unsuccessful texts, while there is a nearly even split between hypotaxis and parataxis in successful ones.</p><p>To deepen our analysis, we also computed the coefficient of variation σ* for all features varying significantly between the two classes, where σ* is the ratio between the standard deviation σ and the mean µ. This allowed us to evaluate the dispersion of values around the average in a standardized way, and thus to compare the stability of features measured on different scales. A feature that is widely scattered in one class of texts and highly stable in the other has a greater chance of being a meaningful representative of the latter.</p><p>In Figure <ref type="figure" target="#fig_0">1</ref> we show the average variability in the two classes of the four groups of features distinguished according to the level of annotation they were extracted from. As a whole, we noticed that successful texts display less variability in nearly every considered feature: 117 of them (92%) are more stable in this class. In successful stories, features with greater stability compared to the other class are mainly raw text ones, e.g. number of sentences and number of tokens, and syntactic ones, e.g. verbal heads per sentence and average depth of syntactic trees. Among the few features which are more stable in poorly received texts, we find instead verbal predicate features, such as the distributions of past tenses and of indicative moods, in addition to the frequency of usage of cardinal numbers. The set of lexical features is instead the most stable one for both classes.</p><p>In this paper, we presented an NLP-based stylometric analysis of the emerging genre of fanfiction aimed at characterizing the writing style of a successful story. We collected a new large-scale corpus which, to the best of our knowledge, is the first one of this genre for Italian. 
We showed that successful stories, defined as those receiving a number of reviews higher than the average, are characterized by a variety of linguistic features at different levels of granularity and that these features are more uniformly distributed within them.</p><p>In the future, we would like to broaden the perspective to other genres in order to study whether there are linguistic predictors of successful writing which are constant across different genres, as well as across concepts somewhat similar to success, such as virality and engagement.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Average coefficient of variation in each class of features, both for successful and unsuccessful texts.</figDesc><graphic coords="5,307.28,543.42,198.42,147.59" type="bitmap" /></figure>
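<div xmlns="http://www.tei-c.org/ns/1.0"><p>The coefficient of variation σ* = σ / µ underlying Figure 1 can be illustrated with a short sketch. The feature names and values below are invented for illustration and are not taken from the corpus; we also assume the population standard deviation, since the text does not state which estimator was used.</p>
<p>
```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return sigma / mu for a sequence of feature values."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    # Assumption: population standard deviation (pstdev).
    return pstdev(values) / mu


# Hypothetical per-story values of two features in each class.
successful = {
    "sent_length": [14.0, 15.0, 16.0],
    "n_tokens": [900.0, 1000.0, 1100.0],
}
unsuccessful = {
    "sent_length": [10.0, 20.0, 30.0],
    "n_tokens": [200.0, 1000.0, 1800.0],
}

for name in successful:
    cv_s = coefficient_of_variation(successful[name])
    cv_u = coefficient_of_variation(unsuccessful[name])
    # The class with the lower coefficient is the more stable one.
    more_stable = "successful" if cv_u > cv_s else "unsuccessful"
    print(name, round(cv_s, 3), round(cv_u, 3), more_stable)
```
</p>
<p>Because the standard deviation is divided by the mean, features measured on very different scales, such as sentence length and document length, can be compared directly.</p></div>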
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Terms of service forbid us from distributing this data. However, the tools used to gather it are available at https://github.com/AndreMatte97/Fanfiction</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">The full story can be found at https://efpfanfic.net/viewstory.php?sid=607026&amp;i=1</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">The full story can be found at https://efpfanfic.net/viewstory.php?sid=27412&amp;i=1</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Profiling-UD: a Tool for Linguistic Profiling of Texts</title>
		<author>
			<persName><forename type="first">D</forename><surname>Brunato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cimino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dell'Orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Venturi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Montemagni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The 12th Language Resources and Evaluation Conference, European Language Resources Association</title>
				<meeting>The 12th Language Resources and Evaluation Conference, European Language Resources Association</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="7145" to="7151" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Is this Sentence Difficult? Do you Agree?</title>
		<author>
			<persName><forename type="first">D</forename><surname>Brunato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>De Mattei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dell'Orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Iavarone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Venturi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Sentences and Documents in Native Language Identification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cimino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dell'Orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Brunato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Venturi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of 5th Italian Conference on Computational Linguistics (CLiC-IT), 1-6</title>
				<meeting>5th Italian Conference on Computational Linguistics (CLiC-IT), 1-6<address><addrLine>Turin</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Explanation in Computational Stylometry</title>
		<author>
			<persName><forename type="first">W</forename><surname>Daelemans</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Linguistics and Intelligent Text Processing</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">A</forename><surname>Gelbukh</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">7817</biblScope>
		</imprint>
	</monogr>
	<note>CICLing 2013</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">Tullio</forename><surname>De Mauro</surname></persName>
		</author>
		<title level="m">Grande dizionario italiano dell&apos;uso (GRADIT)</title>
				<meeting><address><addrLine>Torino, UTET</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Success with style: Using writing style to predict the success of novels</title>
		<author>
			<persName><forename type="first">V</forename><surname>Ganjigunte Ashok</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yejin</forename><surname>Choi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2013 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1753" to="1764" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Computer-mediated discourse 2.0. The handbook of discourse</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C</forename><surname>Herring</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Androutsopoulos</surname></persName>
		</author>
		<editor>Deborah Tannen, Heidi E. Hamilton, Deborah Schiffrin</editor>
		<imprint>
			<date type="published" when="2015">2015</date>
			<publisher>John Wiley &amp; Sons</publisher>
			<biblScope unit="page" from="1753" to="1764" />
		</imprint>
	</monogr>
	<note>2nd ed</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Automatically Categorizing Written Texts by Author Gender</title>
		<author>
			<persName><forename type="first">M</forename><surname>Koppel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Argamon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Rachel</forename><surname>Shimoni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Lit. Linguistic Comput</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="401" to="412" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Tecnologie linguistico-computazionali e monitoraggio della lingua italiana</title>
		<author>
			<persName><forename type="first">S</forename><surname>Montemagni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Studi Italiani di Linguistica Teorica e Applicata</title>
		<imprint>
			<biblScope unit="page" from="145" to="172" />
			<date type="published" when="2013">2013</date>
			<publisher>SILTA</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Computational Sociolinguistics: A Survey</title>
		<author>
			<persName><forename type="first">D</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Dogruöz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">P</forename><surname>Rosé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M G</forename><surname>de Jong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="537" to="593" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Language variation on Internet Relay Chat: A social network approach</title>
		<author>
			<persName><forename type="first">John</forename><surname>Paolillo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Sociolinguistics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="180" to="213" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Predicting Age and Gender in Online Social Networks</title>
		<author>
			<persName><forename type="first">C</forename><surname>Peersman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Daelemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van Vaerenbergh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Workshop on Search and Mining User-Generated Contents</title>
				<meeting>the 3rd International Workshop on Search and Mining User-Generated Contents</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="37" to="44" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">&apos;I Really Have No Idea What Non-Fandom People Do with Their Lives.&apos; A Multimodal and Corpus-Based Analysis of Fanfiction</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Sindoni</surname></persName>
		</author>
		<idno type="DOI">10.1285/i22390359v13p277</idno>
	</analytic>
	<monogr>
		<title level="j">Lingue e Linguaggi</title>
		<imprint>
			<biblScope unit="issue">13</biblScope>
			<biblScope unit="page" from="277" to="300" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Beyond Canonical Texts: A Computational Analysis of Fanfiction</title>
		<author>
			<persName><forename type="first">S</forename><surname>Milli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bamman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016</title>
				<meeting>the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016<address><addrLine>Austin, Texas, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-11-01">November 1-4, 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Multi-task Approach to Predict Likability of Books</title>
		<author>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Suraj</forename><surname>Maharjan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ovalle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fabio</forename><forename type="middle">A</forename><surname>González</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics</title>
				<meeting>the 15th Conference of the European Chapter of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1217" to="1227" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Linguistic profiling for author recognition and verification</title>
		<author>
			<persName><forename type="first">H</forename><surname>Van Halteren</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Association for Computational Linguistics</title>
				<meeting>the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="200" to="207" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
