-

VEBAV - A Simple, Scalable and Fast Authorship Verification Scheme

Oren Halvani?

Martin Steinebach

c@1 0 0 Fraunhofer Institute for Secure Information Technology SIT Rheinstrasse 75 , 64295 Darmstadt , Germany

2014

1049 1062

We present VEBAV - a simple, scalable and fast authorship verification scheme for the Author Identification (AI) task within the PAN-2014 competition. VEBAV (VEctor-Based Authorship Verifier), which is a modification of our existing PAN-2013 approach, is an intrinsic one-class-verification method, based on a simple distance function. VEBAV provides a number of benefits as for instance the independence of linguistic resources and tools like ontologies, thesauruses, language models, dictionaries, spellcheckers, etc. Another benefit is the low runtime of the method, due to the fact that deep linguistic processing techniques like POS-tagging, chunking or parsing are not taken into account. A further benefit of VEBAV is the ability to handle more as only one language. More concretely, it can be applied on documents written in Indo-European languages such as Dutch, English, Greek or Spanish. Regarding its configuration VEBAV can be extended or modified easily by replacing its underlying components. These include, for instance the distance function (required for classification), the acceptance criterion, the underlying features including their parameters and many more. In our experiments we achieved regarding a 20%-split of the PAN 2014 AI-training-corpus an overall accuracy score of 65,83% (in detail: 80% for Dutch-Essays, 55% for Dutch-Reviews, 55% for English-Essays, 80% English-Novels, 70% for GreekArticles and 55% for Spanish-Articles).

Authorship verification one-class-classification

Authorship Verification (AV ) is a sub-discipline of Authorship Analysis [4, Page: 3], which itself is an integral part of digital text forensics. It can be applied in many forensic scenarios as for instance checking the authenticity of written contracts, threats, insults, testaments, etc. where the goal of AV remains always the same: Verify if two documents DA and DA? are written by the same author A, or not. An alternative reformulation of the goal is to verify the authorship of DA? , given a set of sample documents of A. From a Machine Learning perspective AV clearly forms an one-class-classification problem [3], due to the fact that A is the only target class to be distinguished among all other possible classes (authors), where their number can be theoretically infinite. ? Corresponding author.

In order to perform AV at least four components are mandatorily required: – The document DA? , which should be verified regarding its alleged authorship. – A training set DA = fD1A; D2A; ...g, where each DiA represents a sample document of A. – A set of features F = ff1; f2; ...g, where each fj (style marker) should help to model the writing style of DA? and each DiA 2 DA. – At least one classification method, which accepts or rejects the given authorship based on F and a predefined or dynamically determined threshold .

The aim of this paper is to provide a simple, scalable and fast AV scheme, which offers a number of benefits as for instance promising detection rates, easy implementation, low runtime, independence of language or linguistic resources as well as easy modifiability and expandability. Our proposed AV scheme, denoted by VEBAV, is based on our earlier approach regarding the PAN 2013 AI-task, which itself formed a modification of the Nearest Neighbor (NN) one-class classification technique, described by Tax in [5, Page: 69]. In a nutshell, VEBAV takes as an input a set of sample documents of a known author (DA) and one document of an unknown author (DA? ). All documents in DA are first concatenated to a big document which is then splitted again into (near) eqal-sized chunks (such that DA includes now only these chunks). Afterwards, feature vectors are constructed from each DiA 2 DA and from DA? according to preselected feature sets. Next, a representative is selected among the training feature vectors (based on two options), which is important to determine the decision regarding the alleged authorship. In the next step distances are calculated between the representative and the remaining training feature vectors, as well as between the representative and the test feature vector. Depending on all calculated distances a twofold decision regarding the alleged authorship is generated, which includes a ternary decision 2 fYes, No, Unansweredg and a probability score that describes the soundness of .

The rest of this paper is structured as follows. In the following section we explain how we extract features from the corresponding linguistic layers and which features have been used in our approach. Afterwards, we describe in section 3 our verification scheme. In section 4 we present the corpus that was used to evaluate our scheme, while in section 5 we show our evaluation results. Finally, we draw our conclusions and the challenges we were faced in section 6 and provide some ideas for future work. 2

Features

Features (style markers) are the core of AV , since they are able to approximate writing styles and thus, can help to judge automatically if two texts have a very similar style. If this holds, it is an indicator that both texts could be written by the same author. In the next subsections we explain which sources exactly quantifiable features can be retrieved from, which tools are required for this extraction process and also what kind of feature sets we used in our approach. 2.1

Linguistic layers From a linguistic point of view, features are extracted from so-called linguistic layers that can be understood as abstract units within a text. In summary, the most important linguistic layers are the following: – Phoneme layer: This layer includes phoneme-based features as for example vowels, consonants or also the more sophisticated supra-segmental or prosodic features. Such features can typically be won out of texts by using (pronouncing) dictionaries (e.g. IPA). – Character layer: This layer includes character-based features as for instance prefixes, suffixes or letter n-grams, which typically are extracted from texts via regular expressions. – Lexical layer: This layer includes token-based features as for instance function words or POS-Tags (Part-Of-Speech Tags). These features can be extracted from texts via tokenizers (which often are based on simple regular expressions). – Syntactic layer: This layer includes syntax-based features as for instance constituents (e.g. nominal phrases) or collocations. Such features can be extracted by sophisticated regular expressions or by natural language processing tools (e.g. POSTagger). However, the latter one is normally bounded to a specific language model and thus, cannot scale to multiple languages. Besides this the runtime of natural language processing tools is much more higher as the runtime caused by pattern matching via regular expressions. – Semantic Layer: This layer includes semantic-based features, e.g. semantic relations (hyponymous, synonymys, meronyms, etc.). Such features require deep linguistic processing, which often rely on external knowledge resources (e.g. WordNet) or complex tools (e.g. parsers, named entity eecognizers, etc.).

In VEBAV we only make use of the Character, Lexical and Syntactic linguistic layers, due to the effectiveness of their underlying features and the low runtime, caused by the feature extraction process via regular expressions. 2.2

Feature sets In this paper we use the term feature set, denoted by F , which represents features belonging to one or more linguistic layers. In Table 1 we list 14 feature sets that have been used in our experiments. Here, one should pay attention that F12, F13 and F14 form mixtures of existing feature sets. The idea behind it was to see if such mixtures can outperform1 single feature sets when it comes to classification. 1 This was in fact the case, as can be observed in the later experiments (section 5.3).

Fi Feature set Description Examples F1 Characters All kind of characters fa,b,1,8,#,<,%,!,: : :g F2 Letters All kind of letters fa,b, , ,ä,ß,ó,á,ñ,: : :g F3 Punctuation marks Symbols that structure sentences f.,:,;-,",(,),: : :g F4 Word k Prefixes The k starting letters of words example fe,ex,exa,exam,: : :g F5 Word k Suffixes The k ending letters of words example fe,le,ple,mple,: : :g F6 Character n-grams Overlapping character-fragments ex-ample fex-,x-a,-am,: : :g F7 Letter n-grams Overlapping letter-fragments ex-ample fexa,xam,amp,: : :g F8 Tokens Segmented character-based units A [sample] text! fA,[sample],text!g F9 Words Segmented letter-based units A [sample] text! fA,sample,textg F10 Token n-grams Overlapping token-fragments Wind and rain! fWind and, and rain!g F11 Word n-grams Overlapping word-fragments Wind and rain! fWind and, and raing F12 Mix1 A mix of three feature sets F1 [ F3 [ F6 F13 Mix2 A mix of three feature sets F1 [ F2 [ F5

F14 Mix3 A mix of four feature sets F3 [ F4 [ F5 [ F8 As can be seen in Table 1, six feature sets can be parameterized by n (n-gram sizes) and/or k (length of prefixes/suffixes). It should be emphasized that adjusting these two parameters can influence the results massively. Hence, we generalized both settings by using constant values (n = k = 3), learned from earlier experiments, which turned out to be reliable also in the current experiments among the diverse PAN subcorpora.

3 Proposed Verification Scheme

In this section we give a detailed description of our AV scheme VEBAV. For overview reasons we divided the entire workflow of the algorithm into six subsections, where we first explain what kind of preprocessing we perform on the data. The other five subsections focus on the algorithm itself. 3.1

Preprocessing In contrast to our approach in PAN-2013 we decided in our current approach neither to apply normalization nor noise reduction techniques. Instead, we treat each text as it is, which turned out to be not only less burdensome but also promising. Our only preprocessing mechanism is restricted to concatenate all DiA 2 DA to a single document DA which, depending on the length, is splitted again into ` (near) equal-sized chunks D1A; D2A; ...; D`A. Note that in our experiments ` is statically set to five chunks if, and only if, the length of DA is above 15,000 characters, otherwise ` is statically set to three chunks. The reason for this decision is that we observed quite better results in contrast by using only one fixed ` for both, short or long texts. 3.2

Vocabulary Generation In order the form a basis for the construction of feature vectors, we need to build a global feature vocabulary, denoted by V. But beforehand, we first need to select at least one (appropriate) feature set F , to know which sort of features should be taken into account. Here, appropriate refers to features that...

– ... are able to model high similarity between DA? and DA (for the case A? = A). – ... are able to discriminate well between the writing style of A and all other possible authors (for the case A? 6= A). – ... appear in all generated vocabularies.

Afterwards, we construct from DA? and each chunk DiA 2 DA the corresponding feature vocabularies VDA? and VD1A ; VD2A ; ...; VD`A . As a last step we apply an intersection among all constructed vocabularies to build the global vocabulary V:

V = VDA? \ VD1A \ VD2A \ ... \ VD`A

Note that in VEBAV we consider case-sensitive features, such that word 2 V and Word 2 V are both valid. 3.3

Constructing feature vectors Once V is generated, the next step is to construct the feature vectors F1A; F2A; ...; F`A from each DiA 2 DA and FA? from DA? . The construction process regarding these vectors is very straightforward. We look up for each fi 2 V how often it occurs in some document D and denote its absolute frequency by i. Next, we normalize i by the length of D, to obtain its relative frequency, which represents a number xi 2 [0 ; 1]. Hence, the feature vector representation regarding D is formally defined by: F = (x1; x2; ...; xn) , with n = jVj . ( 1 ) ( 2 ) 3.4

Representative Selection After constructing all feature vectors, a representative (training-) feature vector Frep must be selected. This step is essential for the later determination of the decision regarding the alleged authorship. In general, VEBAV offers two options to select Frep: 1. Selecting Frep dynamically by using a similarity function (e.g. cosine similarity) between all training feature vectors. Here, Frep is selected according to the feature vector, who is mostly dissimilar from the others. In other terms Frep can be understand as an outlier. 2. Selecting Frep manually (static).

Note that in our experiments we chose the first option. 3.5

Distances Calculations In this step we calculate the distances, needed for the decision determination regarding the alleged authorship. Concretely, we calculate the distances d1; d2; ...; d` between Frep and each F1A; F2A; ...; F`A as well as the the distance d? between Frep and FA?. The calculation of these distances requires a predefined distance function dist(X; Y ) with X = Frep and Y 2 fF1A; F2A; ...; F`Ag. We implemented a broad range of distance functions in VEBAV, where the majority have been taken from [1]. However, for complexity reasons we used only the Minkowski distance function in our experiments, which is defined formally as: dist(X; Y ) = jxi yij 1 , with 2 R+ n f0g ;

As a last step we calculate the average regarding the distances between Frep and each F1A; F2A; ...; F`A as follows: d? =

(d1 + d2 + : : : + d`) n X i=1 The goal of this step is to construct a twofold decision regarding the alleged authorship, which consists of a ternary decision 2 fYes, No, Unansweredg and a probability score . In order to calculate these terms both values are required d? and d?. For the latter one we use the following adjusted form: d 0 = d? + (! ), where ! denotes a weight and a tolerance parameter, calculated from the a standard deviation of d1; d2; ...; d`. The idea behind ! and is to cope with the presence of noisy writing styles, that might be the result of mixing different sample documents of A together in his training set DA. Note that in our experiments we chose a neutral weight ( = 1), since we could not investigate an optimal value for it (even adjustig it by 0.1 can cause a massive drop regarding the classification results). With these we first define the probability score as: ( 3 ) ( 4 ) ( 5 ) (6) Next, we define the ternary decision as follows: = 8Yes; > <

No; >:Unanswered; (d? = d 0) _ (jd? d 0j < ")

The concrete semantic behind is: – Yes: VEBAV accepts the alleged author as the true author (A? = A). – No: VEBAV rejects the alleged author as the true author (A? 6= A). – Unanswered: VEBAV is unable to generate a prediction because d? and d 0 are equal/near-equal or due to another unexpected result. In any case is set to 0.5. Note that depending on how " was chosen (regarding the case that d? and d 0 are near-equal) the number of Unanswered decisions, in the context of a corpus evaluation, can vary considerably. In our experiments we chose " = 0; 001 as this small value restricts the number of Unanswered decisions, where is near 0.5. Regarding our experiments we used the official PAN-2014 Author Identification training corpus, denoted by CPAN-14, which has been released2 by the PAN organizers on April 22, 2014. CPAN-14 consists of 695 problems (in total 2,382 documents), equally distributed regarding true/false authorships. A problem pi forms a tuple (DAi ; DAi ?), where DAi denotes the training set of Ai and DAi ? the questioned document, which is (or not) written by Ai. Each problem belongs to one of four languages (Greek, Spanish, Greek and Spanish and to one of four genres (Essays, Reviews, Novels and Articles). For simplification reasons, CPAN-14 is divided into six subcorpora and thus, can be formulated as CPAN-14 = fCDE; CDR; CEE; CEN ; CGR; CSPg. This makes it easier to treat each subcorpus independently (e.g. in terms of parameterizations). The full name of each C 2 CPAN-14 is given in Table 2.

For our experiments we used a 80% portion of CPAN-14 for training/parameter learning (denoted by CPAN-Train), while the remaining 20% portion was used for testing (denoted by CPAN-Test). Regarding CPAN-Test we used the same structure as CPAN-14 such that CPAN-Test = fCDE; CDR; CEE; CEN ; CGR; CSPg holds, where the number of problems in each C 2 CPAN-Test equals 20% of the problems in each C 2 CPAN-Train. The concrete statistic including the distribution of true/false authorships is given in Table 3. In this section we carry out our evaluation regarding the CPAN-14 = CPAN-Train [ CPAN-Test corpus. First we explain which performance measures were used and secondly, how 2 The corpus can be downloaded from: http://pan.webis.de (accessed on June 26, 2014). (7) (8) (9) (10) the most important parameters were learned from CPAN-Train. Finally, we evaluate our approach on CPAN-Test. In order to evaluate our approach, we used several performance measures, sharing the following variables: m = Number of problems in C mc = Number of correct answers per C mu = Number of unanswered problems answers per C

The first performance measure is Accuracy = mance measure we first need to define two terms: mc , whereas for the second perform m1 Xm i , i=1 1 m AUC = mc + mu mc m

Here, i denotes the probability score regarding its corresponding problem pi, which was defined in section 4. With these two measures we define the second performance measure: AUC c@1. Note that for parameter learning from the CPAN-Train corpus we only used the Accuracy measure, as (in our opinion) this measure is better interpretable. 5.2

Experiment I: Finding an optimal for the Minkowski distance function The intention behind this experiment was to find an optimal parameter, used by the Minkowski distance function. Since has a strong influence on the classification result it must be well-chosen, in order to generalize across the range of all involved corpora and all feature sets. To achieve this generalization we merged all training subcorpora, such that CPAN-Train = CDE [ CDR [ CEE [ CEN [ CGR [ CSP holds. Afterwards, we applied VEBAV on CPAN-Train, where as an input we used all mentioned feature sets in Table 1 and the following 14 predefined values f0:2; 0:4; 0:6; 0:8; 1; 2; ...; 10g. We constructed from the results a table, where the rows represent the feature sets F1; F2; : : : ; F14, while the columns represent the 14 values. Next, we derived a row from this table that includes the medians regarding all columns. This row is illustrated in Figure 1. As can be seen in Figure 1, an optimal range for is [0.2; 1], where 0:6 seems to be the most promising one in terms of robustness, among all involved feature sets. As a consequence of this analysis, we decided to use = 0:6 for the further experiments. 5.3

Experiment II: Determinig the classification stregth of all feature sets In this experiment we wanted to compare the classification stregth of all involved feature sets. For this we applied VEBAV on F1; F2; : : : ; F14 regarding the six subcorpora in CPAN-Train, where this time we used the fixed setting = 0:6. From the resulting table (rows = feature sets, columns = subcorpora) we calculated for each row (containing 100 90 80 70 60 50 40 30 20 10 0

56,42 56,25 the classification results regarding all six subcorpora) the median, which gave us a new column, illustrated in Figure 2. Here, it can be seen that the majority of all feature sets are more or less equally strong, excepting F10 and F11, which seem to be useless for VEBAV (at least for CPAN-Train). Furthermore, it can be observed that using mixed feature sets lead to slightly better classification results, compared with the majority of the non-mixed feature sets.

In order to get a better picture of how VEBAV performs without considering the medians among the six subcorpora, we show in Figure 3 a comparison of the top three performing feature sets for each individual subcorpus. One interesting observation that can be concluded from Figure 3 is that the feature set F1 (characters) is almost as strong as the mixed feature sets, which involve more sophisticated features (such as character n-grams or tokens). This shows that by using only characters as features, it is possible to verify authorships with a classification result, which is even better as a random guess (50%). Another observation, which is worth to be mentioned, is the fact that the greek corpus seems to contain problems that are more difficult to judge, compared to the problems in the other subcorpora, where the classification results are obviously better. 100 90 80 70 60 50 40 30 20 10 0

52,5 DE

We believe that this is not a language-based issue, but more an effort of the organizers to make the task more challenging, as this was also the case in PAN-2013 [2].

Experiment III: Single feature sets vs. feature set combinations In this experiment we were curious to know, if using combinations of feature sets by applying majority-voting can outperform classifications based only on single feature sets. As a setup for this experiment, we picked out the six most promising feature sets fF1; F2; F5; F12; F13; F14g and used them to construct a power set P, which includes 26 = 64 feature set combinations. Next, we removed those subsets Fcombi 2 (P n ;) comprising of an even number of feature set combinations, to enable a fair (non-random based) majority-voting and also to speed the classification process up a little by avoiding unnecessary runs. This led to 25 = 32 suitable combinations Fcomb1; Fcomb2; ...; Fcomb32, where we applied each Fcomb as an input for VEBAV regarding CPAN-Train. Next, we stored all combinations and their corresponding classification results in a list (sorted in descending order) and selected the top five combinations, listed in Table 4. Unfortunately, it can be observed from the comparison between Table 4 and Figure 2 that applying majority-voting on feature set combinations gives only negligible improvements ( 12%) for the most cases. When focussing on F12 in Figure 2 it can be even seen that its median accuracy of 62:50% outperfroms any sort of feature set combination. One possible reason for the unsatisfactory results could be the fact that many single feature sets made identical ternary decisions (Yes, No, Unanswered), such that applying majorityvoting is only effectively in few cases. We observed this phenomenon several times by stepping through the code, using the debugger. Hence, we do not consider the usage of feature set combinations for further evaluations in this paper. 5.5

Experiment IV: Obtaining corpus dependent parameters Due to the fact that the classification scores regarding the Experiments I-III were relatively low, we decided in this experiment to learn individual parameters from each corpus C 2 fCDE; CDR; CEE; CEN ; CGR; CSPg and to thereby improve the classification results. For this we first applied VEBAV on each C to obtain individual scores. Since there where six corpora, we constructed six tables, where the rows denote F1; F2; ...; F14 and the coloumns the 14 values. Then, we picked those six tuples (Fi; j ), which led to the maximum accuracy score in each table. These tuples are listed in Table 5. One can see here that it definitely make sense to use corpus-dependent (or more precisely language/genre-dependent) parameters, instead of using a global setting. However, the price for better results may be expensive in terms of overfitting. In this section we evaluate VEBAV on the test set CPAN-Test, where we used all relevant information, learned from the prior experiments. For a better overview we divide the evaluations into three subsections, where we first show how VEBAV performed with generalized parameters then how it performed with individual parameters and finally how it performed regarding the runtime.

Evaluation results regarding generalized parameters. For the first evaluation we set as an input for VEBAV the generalized parameters = 0:6 and F12 that were learned from Experiments I-II. The results are given in Table 6. As can be observed from this table, using generalized parameters seems not to be the best choice, as achieving optimal results was possible for only one subcorpus, while for two subcorpora the results where even lower than a random guess. Moreover, it can be seen that the AUC c@1 scores are very low, which are not only caused by the low accuracies themselves, but

C also by an inappropriate calculation of , performed through linear scaling. However, further investigations with other scaling methods must show if this hypothesis is valid. Evaluation results regarding individual parameters. In the second evaluation we used the individual parameters, learned in Experiment IV. The results are given in Table 7. By looking on the results in this table, we can conclude once again that using an indiC CDE (F12; 0:6) CDR (F12; 0:8) CEE (F12; 1) CEN (F3; 0:6) CGR (F3; 0:2) CSP (F7; 1) 80% 0; 40248 55% 0; 30569 55% 0; 25801125 80% 0; 4146 70% 0; 362425 55% 0; 28072275 vidual parameter setting over a global setting is much more promising. Thus, individual parameter settings should be (among other things) the subject for further investigations, when applying VEBAV on other corpora beyond those of PAN.

Evaluation results regarding runtime. In the third and last evaluation we applied the entire PAN corpus CPAN-14 = CPAN-Train [ CPAN-Test on VEBAV, where we considered only the runtime needed for the clasification, rather than the clasification results. Regarding this evaluation we used a laptop with the following configuration: Intel R CoreTM i53360M processor (2.80 GHz), 16GB RAM, 256GB SSD hard drive.

The runtime (measured in milliseconds) for each feature set and each subcorpus is given in Table 8. As can be seen in the Median column in Table 8, the fastest classification was performed with the feature set F3 (punctuation marks). The reason for this is that F3 leads to very few feature-lookups ( 20 per document), such that IO-accesses are negligible. In contrast to this, F1 leads to the highest runtime (excepting F12 and F13 since they are mixtures of existing sets), due to the fact that it requires an iteration over each character in a text. Another interesting observation is that token-based features (F8 11) also require very low runtime. This is explained by the fact that there are much less tokens in texts than characters and thus, the number of lookups is also very limited.

The overall runtime for CPAN-14, without considering the subcorpora, is given in Figure 4. An interesting observation that can be made by comparing Figure 4 to Fig13230

20808 1019 2147 2423

F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 ure 2 is that F5 can be seen as an optimal candidate when a trade-off between a high classification result and low runtime must be taken into account. Here, F5 achieves a classification result of 59; 96% by requiring only 2,423 milliseconds. F8 behaves in a similar way (55; 56% accuracy by requiring only 1,830 milliseconds). 70000 60000 50000 40000 20000 10000

0 30000 27303 21708 46140 58424

8048

Conclusion & future work

In this paper we presented a simple, scalable and fast authorship verification scheme for the Author Identification task within the PAN-2014 competition. Our method provides a number of benefits as for example that it is able to handle (even very short) texts written in several languages, across different kinds of genre. Besides this, the method is independent of linguistic resources such as ontologies, thesauruses, language models, etc. A further benefit is the low runtime of the method, since there is no need for deep linguistic processing like POS-tagging, chunking or parsing. Another benefit is that the involved components within the method can be replaced easily as for example the distance function (required for classification), the acceptance-threshold or the feature sets including their parameters. Moreover, the classification itself can be modified easily, e.g. by using an ensemble of several distance functions.

Unfortunately, besides benefits our approach has several pitfalls too. One of the biggest challenges, for example, is the inscrutability of the methods parameter-space, due to the fact that the number of possible configuration settings is near infinite. Such settings include for instance the parameter for the involved distance function, the values for the n and k parameters, the weight (!) and tolerance ( ) parameters that influence the classification quality but also other options such as ` (the number of chunks) or the used feature normalization strategy. Due to the complexity of our scheme we could only perform a small number of experiments to obtain at least an optimal and the most promising feature sets. This, however, was done only for the official PAN corpus but not for other corpora. Thus, it had not been proved if the learned parameters are able to perform satisfactorily on other corpora. Another challenge that remains unsolved is how to optimize the probability score , determined in the decision determination step, as this value has also a strong influence on the resulting AUC c@1 scores, which in our case are very low. Hence, further tests are essential for the applicability of our scheme. Acknowledgments. This work was supported by the CASED Center for Advanced Security Research Darmstadt, Germany funded by the German state government of Hesse under the LOEWE program (www.CASED.de).

1. Cha , S.H. : Comprehensive survey on distance/similarity measures between probability density functions . International Journal of Mathematical Models and Methods in Applied Sciences 1 ( 4 ), 300 - 307 ( 2007 )

2. Juola , P. , Stamatatos , E.: Overview of the Author Identification Task at PAN 2013 . In P. Forner,

Navigli , and D. Tufis (eds) CLEF 2013 Evaluation Labs and Workshop - Working Notes Papers ( 2013 )

3. Koppel , M. , Schler , J.: Authorship Verification as a One-Class Classification Problem . In: Proceedings of the twenty-first international conference on Machine learning . pp. 62 -. ICML '04 , ACM , New York, NY, USA ( 2004 )

4. Stamatatos , E.: A Survey of Modern Authorship Attribution Methods . J. Am. Soc. Inf. Sci. Technol . 60 ( 3 ), 538 - 556 ( Mar 2009 )

5. Tax , D.M.J. : One-Class Classification . Concept Learning In the Absence of Counter-Examples. Ph.D. thesis , Delft University of Technology ( 2001 )