<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Dynamic Parameter Search for Cross-Domain Authorship Attribution: Notebook for PAN at CLEF 2018</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Benjamin</forename><surname>Murauer</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Universität Innsbruck</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Michael</forename><surname>Tschuggnall</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Universität Innsbruck</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Günther</forename><surname>Specht</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Universität Innsbruck</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Dynamic Parameter Search for Cross-Domain Authorship Attribution: Notebook for PAN at CLEF 2018</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">48FE37C3AB6EF455A4BBA175E7CABA12</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:32+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we present our solution to the PAN workshop challenge of authorship attribution. In multiple sub-problems, the original authors of given documents have to be chosen from a fixed training set. The core of our approach lies in traditional character n-gram analysis in combination with a linear SVM as a classifier. To find optimal values for our model parameters, we combined predefined parameters, determined by a preliminary run on a training set, with dynamically determined parameters from an ad-hoc grid search approach.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The 2018 PAN workshop <ref type="bibr" target="#b11">[12]</ref> features an authorship attribution task <ref type="bibr" target="#b2">[3]</ref> that consists of multiple sub-problems. The main challenge of the task lies in two characteristics:</p><p>First, the training documents for each author cover a different domain than the testing documents. All texts are chosen from different fan fiction domains, in which fans of a specific author, novel, movie, TV show, etc. produce new content while adhering to the original work's environment and atmosphere. Typical examples of such domains, which will be referred to as fandoms in the remainder of this paper, are Harry Potter or Star Wars. The testing part of each dataset contains at least one text from each author, but only covers one fandom. The training part covers multiple fandoms that are distinct from the evaluation fandom, whereby the same number of documents is provided for each author. Table <ref type="table" target="#tab_0">1</ref> shows an exemplary schematic overview of the task. Every sub-problem can have a different number of authors, and each author can have different fandoms (displayed as capital letters) available for training.</p><p>Second, the sub-problems cover a variety of different languages and sizes. Documents within a sub-problem are all in the same language, and the sub-problems themselves can be in English, French, Italian, Polish or Spanish.</p><p>For developing a classification model, the organizers of the challenge have provided a development dataset. The characteristics of this dataset can be seen in Table <ref type="table" target="#tab_1">2</ref>. Each contestant is given a virtual machine on the TIRA web service <ref type="bibr" target="#b5">[6]</ref>, on which the model can run. 
To prevent cheating, evaluation runs must be initiated through a web application, and all external network connections of the virtual machine are cut off once a run has started.</p><p>For evaluation, the macro-F1 score is computed for each sub-problem, and the arithmetic mean of these scores represents the final score of a contestant's model. </p></div>
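<div xmlns="http://www.tei-c.org/ns/1.0"><p>The evaluation measure described above can be illustrated with a minimal sketch. It assumes that the label set of a sub-problem is the union of the true and the predicted authors; the official evaluation script may define the label set differently. The function names and the toy labels are hypothetical and only serve as an illustration.</p>
<p><code lang="python"><![CDATA[
```python
from collections import Counter
from statistics import mean


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-author F1 scores of one sub-problem."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # wrongly attributed to `guess`
            fn[truth] += 1  # missed for the true author
    # Assumption: every author seen in the truth or the predictions counts.
    authors = set(y_true) | set(y_pred)
    return mean(2 * tp[a] / (2 * tp[a] + fp[a] + fn[a]) for a in authors)


def final_score(sub_problems):
    """Arithmetic mean of the macro-F1 scores over all sub-problems."""
    return mean(macro_f1(truth, pred) for truth, pred in sub_problems)


# Toy data (hypothetical): two sub-problems with three and two authors.
problem_1 = (["a1", "a1", "a2", "a3"], ["a1", "a2", "a2", "a3"])
problem_2 = (["b1", "b2"], ["b1", "b2"])
score = final_score([problem_1, problem_2])
```
]]></code></p>
<p>Because every author contributes equally to the macro average, a single misattributed document of an author with few test documents lowers the score of a sub-problem more than it would under a micro average.</p></div>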
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>In 2011 and 2012, similar authorship attribution tasks were held at PAN <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. One of the main differences was that the datasets used consisted of single-topic (or single-domain) documents. Sapkota et al. <ref type="bibr" target="#b7">[8]</ref> show that the cross-topic nature of datasets increases the difficulty of classification tasks for many traditional models. Furthermore, they demonstrate that by increasing the number of topics that are used for training (while still not having the testing topic(s) available), the accuracy of their model can be increased significantly.</p><p>Character n-grams have proven to be an effective feature for authorship attribution in multiple works <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b8">9]</ref>. Variations of this feature (e.g., distorting the text <ref type="bibr" target="#b10">[11]</ref> or distinguishing the relative position of each n-gram inside each word <ref type="bibr" target="#b6">[7]</ref>) can improve a model even further. Markov et al. <ref type="bibr" target="#b4">[5]</ref> showed that character-n-gram-based models can be used effectively for cross-topic authorship attribution, and that preprocessing the corpora can improve the performance of cross-topic models.</p><p>In line with these works, the main focus of this work is to build a general-purpose model based on character n-grams that is able to perform well on different languages and topics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Methodology</head><p>To make the start of the challenge easier for the contestants, a baseline script was provided along with the training dataset. It uses the scikit-learn Python machine learning library <ref type="foot" target="#foot_0">1</ref> and character 3-gram frequencies in combination with a linear SVM. Based on this approach, we implemented several improvements.</p><p>Originally, we wanted to implement a dynamic search for optimal parameters for each sub-problem individually. To this end, we made use of the grid search tool shipped with scikit-learn, which tries all combinations of given parameter ranges in a brute-force fashion. The tested parameter values are listed in Table <ref type="table" target="#tab_2">3</ref>. All runs were performed with 5-fold cross-validation, optimizing the f1_macro target.</p><p>An important limitation of this approach is its computational cost. Even with a small set of possible parameter values, the number of possible combinations increases quickly. For example, given the parameter ranges in Table <ref type="table" target="#tab_2">3</ref> and the 5-fold cross-validation, a model is trained 10,800 times for each sub-problem (2,160 parameter combinations × 5 folds) before one parameter combination is selected for predicting the testing data. This made the approach difficult to apply in the evaluation run, where no information regarding the corpus size was given. Therefore, we limited the complete grid-search procedure to the development dataset and fixed the parameters that showed the most similar values across multiple problems. For example, the parameter strip accents, which removes accents from letters if set, was chosen to be true by the grid search for most of the sub-problems of the development dataset. For the sub-problems with the parameter set to false, we were not able to determine which characteristic of the sub-problem caused the change of the parameter. 
Therefore, a majority vote was performed instead and the parameter was set to true for the entire evaluation set.</p><p>The two remaining parameters (i.e., the minimal document frequency of character n-grams and their size), which did not show a clear majority among the sub-problems, were left in the grid search to be determined dynamically for each part of the evaluation set, yielding 6 × 3 = 18 possible parameter combinations for each sub-problem. The fixed values are displayed in bold in Table <ref type="table" target="#tab_2">3</ref>, whereas the per-problem trained parameters are printed in italics.</p></div>
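<div xmlns="http://www.tei-c.org/ns/1.0"><p>The size of the search space discussed above can be reproduced with a short sketch that uses only the Python standard library. The dictionary keys are hypothetical labels for the rows of Table 3 and are not meant to be scikit-learn parameter names; the range 0-5 of the minimal document frequency is read as six integer values, which is an assumption that matches the factor 6 stated in the text.</p>
<p><code lang="python"><![CDATA[
```python
from itertools import product
from math import prod

# Parameter ranges as listed in Table 3 (key names are illustrative only).
full_grid = {
    "min_df": [0, 1, 2, 3, 4, 5],
    "ngram_order": [3, 4, 5],
    "lowercase": [True, False],
    "use_idf": [True, False],
    "strip_accents": [True, False],
    "norm": [None, "l1", "l2"],
    "svm_C": [0.01, 0.1, 1, 10, 100],
}
CV_FOLDS = 5  # 5-fold cross-validation, as stated in the text


def count_combinations(grid):
    """Number of parameter settings an exhaustive grid search evaluates."""
    return prod(len(values) for values in grid.values())


def iter_combinations(grid):
    """Yield every parameter setting of the grid as a dictionary."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Only these two parameters stay in the search for the evaluation run.
dynamic_grid = {name: full_grid[name] for name in ("min_df", "ngram_order")}

full_combinations = count_combinations(full_grid)        # 2160
full_fits = full_combinations * CV_FOLDS                 # 10800
dynamic_combinations = count_combinations(dynamic_grid)  # 18
```
]]></code></p>
<p>Fixing five of the seven parameters thus reduces the number of settings per sub-problem from 2,160 to 18, a factor of 120.</p></div>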
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Results</head><p>The results for the development dataset can be seen in Table <ref type="table" target="#tab_3">4a</ref>. As expected, the even-numbered, smaller sub-problems (containing fewer authors) are generally easier to classify than the larger ones. No significant differences between the different languages can be detected. For overall evaluation and comparison between the contestants, the arithmetic mean of all macro-F1 scores is used, which is displayed at the bottom of the table.</p><p>While the details of the evaluation dataset are not available at the time of writing this paper, the evaluation results are visible to the authors. In Table <ref type="table" target="#tab_3">4b</ref>, the F1-scores for the evaluation dataset are displayed. The sparse information available suggests that problems 13, 15 and 16 are especially hard for our model and have lowered our total score notably. However, no conclusions can be drawn about the reasons for this performance at this point.</p><p>Table <ref type="table" target="#tab_4">5</ref> shows the final ranking of the contestants. It can be seen that our solution reached the second rank, clearly beating the baseline provided by the task organizers. </p></div>
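<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a small consistency check, the overall evaluation score can be recomputed from the per-problem values reported in Table 4b. The sketch below copies those values verbatim; the variable names are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
from statistics import mean

# Macro-F1 scores of the evaluation dataset, copied from Table 4b.
evaluation_f1 = {
    "p01": 0.73, "p02": 0.689, "p03": 0.8, "p04": 0.83, "p05": 0.55,
    "p06": 0.608, "p07": 0.609, "p08": 0.662, "p09": 0.659, "p10": 0.616,
    "p11": 0.841, "p12": 0.534, "p13": 0.473, "p14": 0.527, "p15": 0.369,
    "p16": 0.432, "p17": 0.631, "p18": 0.771, "p19": 0.783, "p20": 0.75,
}

# Final score: arithmetic mean over all sub-problems.
mean_f1 = mean(evaluation_f1.values())

# The three sub-problems with the lowest scores, lowest first.
weakest = sorted(evaluation_f1, key=evaluation_f1.get)[:3]
```
]]></code></p>
<p>Rounded to three digits, the mean matches the reported average of 0.643, and the three lowest-scoring sub-problems are the ones named in the text (15, 16 and 13).</p></div>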
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>In this paper, a combination of precomputed parameters and a dynamic grid search is used for the task of cross-fandom authorship attribution. Our method relies on traditional character n-gram analysis with parameters that are partly fixed in advance and partly determined for each sub-problem by a standard grid-search approach. Although the methodology is simple, we were able to reach second place on the PAN task leaderboard. Given more time and resources, more parameters could be optimized at runtime rather than fixed to precomputed default values.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Exemplary task structure. Capital letters denote possible fanfic domains.</figDesc><table><row><cell cols="3">Problem 1: English</cell><cell cols="3">Problem 2: English</cell><cell cols="3">Problem 10: Spanish</cell></row><row><cell>author</cell><cell>train</cell><cell>test</cell><cell>author</cell><cell>train</cell><cell>test</cell><cell>author</cell><cell>train</cell><cell>test</cell></row><row><cell>1</cell><cell>A, D</cell><cell>B</cell><cell>1</cell><cell>G, D</cell><cell>A . . .</cell><cell>1</cell><cell>A, D</cell><cell>C</cell></row><row><cell>2</cell><cell>E</cell><cell>B</cell><cell>2</cell><cell>B, E, F</cell><cell>A</cell><cell>2</cell><cell>E</cell><cell>C</cell></row><row><cell>3</cell><cell>A, E</cell><cell>B</cell><cell>3</cell><cell>C, G</cell><cell>A</cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell></cell><cell>4</cell><cell>B</cell><cell>A</cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Training dataset characteristics</figDesc><table><row><cell>Problem</cell><cell>Language</cell><cell>Training docs</cell><cell>Testing docs</cell><cell>Avg. words/doc</cell><cell>Authors</cell></row><row><cell>p01</cell><cell>EN</cell><cell>140</cell><cell>105</cell><cell>777</cell><cell>20</cell></row><row><cell>p02</cell><cell>EN</cell><cell>35</cell><cell>21</cell><cell>782</cell><cell>5</cell></row><row><cell>p03</cell><cell>FR</cell><cell>140</cell><cell>49</cell><cell>774</cell><cell>20</cell></row><row><cell>p04</cell><cell>FR</cell><cell>35</cell><cell>21</cell><cell>782</cell><cell>5</cell></row><row><cell>p05</cell><cell>IT</cell><cell>140</cell><cell>80</cell><cell>787</cell><cell>20</cell></row><row><cell>p06</cell><cell>IT</cell><cell>35</cell><cell>46</cell><cell>807</cell><cell>5</cell></row><row><cell>p07</cell><cell>PL</cell><cell>140</cell><cell>103</cell><cell>807</cell><cell>20</cell></row><row><cell>p08</cell><cell>PL</cell><cell>35</cell><cell>15</cell><cell>788</cell><cell>5</cell></row><row><cell>p09</cell><cell>ES</cell><cell>140</cell><cell>117</cell><cell>829</cell><cell>20</cell></row><row><cell>p10</cell><cell>ES</cell><cell>35</cell><cell>64</cell><cell>851</cell><cell>5</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Parameters tested with grid search. Parameters with results in bold face are fixed for each problem. Parameters written in italics are dynamically adapted.</figDesc><table><row><cell>Feature Parameter</cell><cell>Range tested</cell></row><row><cell>minimal document frequency</cell><cell>0-5</cell></row><row><cell>n-gram order</cell><cell>3, 4, 5</cell></row><row><cell>lowercase</cell><cell>true, false</cell></row><row><cell>use idf normalization</cell><cell>true, false</cell></row><row><cell>strip accents</cell><cell>true, false</cell></row><row><cell>norm</cell><cell>None, L1, L2</cell></row><row><cell>svm C</cell><cell>0.01, 0.1, 1, 10, 100</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>F1-scores of the final solution</figDesc><table><row><cell cols="2">(a) Development dataset</cell><cell cols="4">(b) Evaluation dataset</cell></row><row><cell>Problem</cell><cell>F1-score</cell><cell>Problem</cell><cell>F1-score</cell><cell>Problem</cell><cell>F1-score</cell></row><row><cell>p01</cell><cell>0.565</cell><cell>p01</cell><cell>0.73</cell><cell>p11</cell><cell>0.841</cell></row><row><cell>p02</cell><cell>0.774</cell><cell>p02</cell><cell>0.689</cell><cell>p12</cell><cell>0.534</cell></row><row><cell>p03</cell><cell>0.658</cell><cell>p03</cell><cell>0.8</cell><cell>p13</cell><cell>0.473</cell></row><row><cell>p04</cell><cell>0.777</cell><cell>p04</cell><cell>0.83</cell><cell>p14</cell><cell>0.527</cell></row><row><cell>p05</cell><cell>0.690</cell><cell>p05</cell><cell>0.55</cell><cell>p15</cell><cell>0.369</cell></row><row><cell>p06</cell><cell>0.588</cell><cell>p06</cell><cell>0.608</cell><cell>p16</cell><cell>0.432</cell></row><row><cell>p07</cell><cell>0.549</cell><cell>p07</cell><cell>0.609</cell><cell>p17</cell><cell>0.631</cell></row><row><cell>p08</cell><cell>0.867</cell><cell>p08</cell><cell>0.662</cell><cell>p18</cell><cell>0.771</cell></row><row><cell>p09</cell><cell>0.791</cell><cell>p09</cell><cell>0.659</cell><cell>p19</cell><cell>0.783</cell></row><row><cell>p10</cell><cell>0.827</cell><cell>p10</cell><cell>0.616</cell><cell>p20</cell><cell>0.75</cell></row><row><cell>Average</cell><cell>0.708</cell><cell></cell><cell></cell><cell>Average</cell><cell>0.643</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>Final ranks</figDesc><table><row><cell>Contestant</cell><cell>Mean F1-score</cell></row><row><cell>custodio18</cell><cell>0.685</cell></row><row><cell>murauer18</cell><cell>0.643</cell></row><row><cell>halvani18</cell><cell>0.629</cell></row><row><cell>. . .</cell><cell></cell></row><row><cell>baseline</cell><cell>0.584</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://scikit-learn.org</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of the international authorship identification competition at PAN-2011</title>
		<author>
			<persName><forename type="first">S</forename><surname>Argamon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Juola</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF (Notebook Papers/Labs/Workshop)</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">An Overview of the Traditional Authorship Attribution Subtask</title>
		<author>
			<persName><forename type="first">P</forename><surname>Juola</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF (Online Working Notes/Labs/Workshop)</title>
				<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Overview of the Author Identification Task at PAN-2018: Cross-domain Authorship Attribution and Style Change Detection</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tschuggnall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Daelemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Specht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes Papers of the CLEF 2018 Evaluation Labs</title>
		<title level="s">CEUR Workshop Proceedings, CLEF and CEUR-WS.org</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Cappellato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">Y</forename><surname>Nie</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Soulier</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2018-09">Sep 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">The effect of author set size and data size in authorship attribution</title>
		<author>
			<persName><forename type="first">K</forename><surname>Luyckx</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Daelemans</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Literary and Linguistic Computing</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="35" to="55" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Improving Cross-Topic Authorship Attribution: The Role of Pre-Processing</title>
		<author>
			<persName><forename type="first">I</forename><surname>Markov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing</title>
				<meeting>the 18th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Improving the Reproducibility of PAN&apos;s Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling</title>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gollub</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Information Access Evaluation meets Multilinguality, Multimodality, and Visualization. 5th International Conference of the CLEF Initiative (CLEF 14)</title>
				<editor>
			<persName><forename type="first">E</forename><surname>Kanoulas</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Lupu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Clough</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Sanderson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Hall</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Hanbury</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Toms</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014-09">Sep 2014</date>
			<biblScope unit="page" from="268" to="299" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Not All Character N-grams Are Created Equal: A Study In Authorship Attribution</title>
		<author>
			<persName><forename type="first">U</forename><surname>Sapkota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bethard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2015-06">Jun 2015</date>
			<biblScope unit="page" from="93" to="102" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Cross-Topic Authorship Attribution: Will Out-Of-Topic Data Help?</title>
		<author>
			<persName><forename type="first">U</forename><surname>Sapkota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bethard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference on Computational Linguistics (COLING&apos;</title>
				<meeting>the 25th International Conference on Computational Linguistics (COLING&apos;</meeting>
		<imprint>
			<date type="published" when="2014-08">Aug 2014</date>
			<biblScope unit="page" from="1228" to="1237" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A Survey of Modern Authorship Attribution Methods</title>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Society for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="538" to="556" />
			<date type="published" when="2009-03">March 2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">On the Robustness of Authorship Attribution Based on Character N-Gram Features</title>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Law &amp; Policy</title>
		<imprint>
			<biblScope unit="page" from="421" to="439" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Authorship Attribution Using Text Distortion</title>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL&apos;</title>
				<meeting>the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL&apos;</meeting>
		<imprint>
			<date type="published" when="2017-04">Apr 2017</date>
			<biblScope unit="page" from="1138" to="1149" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Overview of PAN-2018: Author Identification, Author Profiling, and Author Obfuscation</title>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tschuggnall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. 9th International Conference of the CLEF Initiative (CLEF 18)</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Bellot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Trabelsi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mothe</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Murtagh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Nie</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Soulier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Sanjuan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Cappellato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018-09">Sep 2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
