<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the Author Identification Task at PAN 2014</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Efstathios Stamatatos</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Walter Daelemans</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ben Verhoeven</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Potthast</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benno Stein</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick Juola</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel A. Sanchez-Perez</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Barrón-Cedeño</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bauhaus-Universität Weimar</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Duquesne University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Instituto Politécnico Nacional</institution>
          ,
          <country country="MX">Mexico</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Universitat Politècnica de Catalunya</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Antwerp</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of the Aegean</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <fpage>877</fpage>
      <lpage>897</lpage>
      <abstract>
        <p>The author identification task at PAN-2014 focuses on author verification. Similar to PAN-2013 we are given a set of documents by the same author along with exactly one document of questioned authorship, and the task is to determine whether the known and the questioned documents are by the same author or not. In comparison to PAN-2013, a significantly larger corpus was built comprising hundreds of documents in four natural languages (Dutch, English, Greek, and Spanish) and four genres (essays, reviews, novels, opinion articles). In addition, more suitable performance measures are used focusing on the accuracy and the confidence of the predictions as well as the ability of the submitted methods to leave some problems unanswered in case there is great uncertainty. To this end, we adopt the c@1 measure, originally proposed for the question answering task. We received 13 software submissions that were evaluated in the TIRA framework. Analytical evaluation results are presented where one language-independent approach serves as a challenging baseline. Moreover, we continue the successful practice of the PAN labs to examine meta-models based on the combination of all submitted systems. Last but not least, we provide statistical significance tests to demonstrate the important differences between the submitted approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Authorship analysis has attracted much attention in recent years due to both the rapid
increase of texts in electronic form and the need for intelligent systems able to handle
this information. Authorship analysis deals with the personal style of authors and
includes three major areas:</p>
      <p>
        Author identification: Given a set of candidate authors for whom some texts of
undisputed authorship exist, attribute texts of unknown authorship to one of the
candidates. This can be applied mainly to forensic applications and literary
analysis [
        <xref ref-type="bibr" rid="ref13 ref31">13, 31</xref>
        ].
      </p>
      <p>
        Author profiling: The extraction of demographic information such as gender,
age, etc. about the authors. This has significant applications mainly in market
analysis [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ].
      </p>
      <p>
        Author clustering: The segmentation of texts into stylistically homogeneous
parts. This can be applied to distinguish different authors in collaborative
writing, to detect plagiarism without a reference corpus (i.e., intrinsic plagiarism
detection [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]), and to detect changes in the personal style of a certain author by
examining their works chronologically [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Author identification is by far the most prevalent field of authorship analysis in
terms of published studies. The authorship attribution problem can be viewed as a
closed-set classification task where all possible candidate authors are known. This is
suitable in many forensic applications where the investigators of a case can provide a
specific set of suspects based on certain restrictions (e.g., access to specific material,
knowledge of specific facts, etc.). A more general definition of the authorship
attribution problem corresponds to an open-set classification task where the true
author of the disputed texts is not necessarily included in the set of candidate authors.
This setting is much more difficult in comparison to the closed-set attribution
scenario, especially when the size of the candidate author set is small [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Finally,
when the set of candidate authors is a singleton, we get the author verification problem.
This is an even more difficult attribution task.
      </p>
      <p>
        The PAN-2014 evaluation lab continues the practice of PAN-2013 and focuses on
the author verification problem [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. First, this is a fundamental problem in
authorship attribution [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and by studying it we can extract more useful conclusions
about the performance of certain attribution methods. Any author identification task
can be decomposed into a series of author verification problems. Therefore, the ability
of an approach to effectively deal with this task means that it can cope with every
authorship attribution problem. Moreover, in comparison to PAN-2013, we provide a
larger collection of verification problems including more natural languages and
genres. Thus, we can study more reliably the performance of the submitted
approaches under different conditions and test their ability to be adapted to certain
properties of documents. In addition, we define more appropriate performance
measures that are suitable for this cost-sensitive task focusing on the ability of the
submitted approaches to assign confidence scores in their answers as well as their
ability to leave the most uncertain cases unanswered.
      </p>
      <p>
        Based on the successful practice of PAN-2013, we build a meta-classifier to
combine all submitted approaches and examine the performance of this ensemble
model in comparison to the individual participants [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Moreover, we use one
effective model submitted to PAN-2013 as a baseline method. This enables us to have
a more challenging baseline (in comparison to random guess) that reflects and can be
adapted to the difficulty of a certain corpus. Finally, we provide tests of statistical
significance to examine whether there are important differences in the performance of
the submitted methods, the baseline, and the meta-classifier.
      </p>
      <p>In the remainder of this paper, Section 2 reviews previous work in author
verification, Section 3 describes in detail the evaluation setup used at PAN-2014,
and Section 4 presents the evaluation results. A review of the submitted
approaches is included in Section 5, and Section 6 summarizes the main conclusions
and discusses directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Relevant Work</title>
      <p>
        The author verification problem was first discussed in [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. Based on a corpus of
newspaper articles in Greek, they used multiple regression to produce a response
function for a given author and a threshold value to determine whether or not a
questioned document was by that author. False acceptance and false rejection rates
were used to evaluate this model. The same metrics were used by [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] to evaluate an
authorship verification method based on a rich set of linguistic features.
      </p>
      <p>
        Perhaps the best-known approach for author verification, the unmasking method,
was introduced in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The main idea is to build an SVM classifier to distinguish the
questioned document from the set of known documents, then to remove the most
important features and repeat this process. In case the questioned and known
documents are by the same author, the accuracy of the classifier significantly drops
after a small number of repetitions while it remains relatively high when they are not
by the same author. Accuracy and F1 were used to evaluate this method, which is very
effective on long documents but fails when documents are relatively short [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ].
Modifications and additional evaluation tests for the unmasking method can be found
in [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] and [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        Luyckx and Daelemans approximated the author verification problem as a binary
classification task by considering all available texts by other authors as negative
examples [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. They used recall, precision, and F1 to evaluate their approach in a
corpus of student essays in Dutch. Escalante et al. applied particle swarm model
selection to select a suitable classifier for a given author [5]. They used F1 and
balanced error rate (the average of error rates for positive and negative class) to
evaluate their approach on two corpora of English newswire stories and Spanish
poems. More recently, Koppel and Winter proposed an effective method that attempts
to transform authorship verification from a one-class classification task to a
multiclass classification problem by introducing additional authors, the so-called
impostors, using documents found in external sources (e.g., the Web) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Accuracy
and recall-precision graphs were used to evaluate this method.
      </p>
      <p>
        Author verification was included in previous editions of the PAN evaluation lab.
The author identification task at PAN-2011 [1] included 3 author verification
problems, each comprising a number of texts (i.e., email messages) of known
authorship, all by the same author, and a number of questioned texts (either by the
author of the known texts or not). Performance was measured by macro-average
precision, recall and F1. PAN-2013 was exclusively focused on the author verification
problem [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. New training and evaluation corpora were built on three languages (i.e.,
English, Greek, and Spanish) where each verification problem included at most 10
documents by the same author and exactly one questioned document. Beyond a binary
answer for each verification problem, the participants could also produce (optionally)
a probability-like score to indicate the confidence of a positive answer. Recall,
precision, F1 and ROC graphs were used to evaluate the performance of the 18
participants. Moreover, a simple meta-model combining all the submitted methods
achieved the best overall performance. For the first time, software submissions were
requested at PAN-2013 enabling reproducibility of the results and future evaluation
on different corpora.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation Setup</title>
      <p>PAN-2014 focuses on author verification, similar to PAN-2013. Given a set of known
documents all written by the same author and exactly one questioned document, the
task is to determine whether the questioned document was written by that particular
author or not. Similar to the corresponding task at PAN-2013, best efforts were
applied to ensure that all known and questioned documents within a problem are
matched for genre, register, theme, and date of writing. In contrast to PAN-2013, the
number of known documents is limited to at most 5, while a greater variety of
languages and genres is covered. The text length of documents varies from a few
hundred to a few thousand words, depending on the genre.</p>
      <p>The participants were asked to submit their software and consider as input
parameters the language and genre of the documents. For each verification problem,
they should provide a score, a real number in [0,1], corresponding to the probability
of a positive answer (i.e., the known and the questioned documents are by the same
author). In case the participants wanted to leave some verification problems
unanswered, they could assign a probability score of exactly 0.5 to those problems.
</p>
      <p>3.1 Corpus</p>
      <p>The PAN-2014 corpus comprises author verification problems in four languages:
Dutch, English, Greek, and Spanish. For Dutch and English there are two genres in
separate parts of the corpus. An overview of the training and evaluation corpus of the
author identification task is shown in Table 1. As can be seen, beyond language and
genre, the corpora also vary in the number of known texts per problem and in text length. The size of both
training and evaluation corpora is significantly larger than the corresponding corpora
of PAN-2013. All corpora in both training and evaluation sets are balanced with
respect to the number of positive and negative examples.</p>
      <p>
        The Dutch corpus is a transformed version of the CLiPS Stylometry Investigation
(CSI) corpus [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]. This recently released corpus contains documents from two genres:
essays and reviews, which are the two Dutch genres present in the corpus for this task.
All documents were written by language students at the University of Antwerp
between 2012 and 2014. All authors are native speakers of Dutch. The CSI corpus
was developed for use in computational stylometry research (i.e. detection of age,
gender, personality, region of origin, etc.), but has many other purposes as well (e.g.,
deception detection, sentiment analysis). We adapted the CSI corpus to match the
needs of the authorship verification task and ended up with 200 problem sets for the
review genre and 192 problem sets in the essay genre. All verification problems
include 1-5 known texts. The training and evaluation set each contain half of the
problem sets in each genre.
      </p>
      <p>The English essays corpus was derived from a previously existing corpus of
English-as-second-language students. The Uppsala Student English (USE) corpus [2]
was originally intended as a tool for research on foreign language learning. It
consists of essays by full-time university-level students, submitted electronically. In
this kind of text, stylistic awareness is an important writing factor. The USE
corpus clearly separates the writings produced in three
different terms: a, b, and c. Every essay is intended to be written in a personal,
formal, or academic style. A total of 440 authors contributed at least one essay to
the corpus, resulting in 1,489 documents. The average size of an essay is 820 words.
Typically, a student contributed more than one essay, often across
different terms. Taking advantage of the USE corpus meta-information, we defined
two main constraints: every document in the collection, known or questioned, should
contain at least 500 words and the number of known documents in a case must range
between one and five. As a result of the first constraint, only 435 authors were
considered. We also took advantage of the students' background information to set
case-generation rules. Firstly, all the documents in a case must come from students
from the same term (i.e., all were written within the same term a, b, or c). Secondly, we
divided the students into age-based clusters. To form negative verification problems,
based on the fact that the students' age ranged between 18 and 59 years, an author A
was considered as a candidate match for author Aq according to the following rules:
- If Aq is younger than 20 years old, A must be younger than 20 as well;
- If Aq is between 20 and 25 years old, A must be exactly the same age;
- If Aq is between 26 and 30 years old, A must be in the same age range; and
- If Aq is older than 30 years old, A must be older than 30 as well.</p>
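      <p>The age-matching rules above can be read as a simple predicate. The following Python sketch is illustrative only: the function name is_candidate_match and the treatment of the boundary ages (20, 25, 26, and 30 taken as inclusive, with ages given in whole years) are assumptions, not a description of the script actually used to build the corpus.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def is_candidate_match(age_q: int, age_a: int) -> bool:
    """Return True if an author aged age_a may be paired with author Aq aged age_q.

    Hypothetical helper; boundary ages are assumed to be inclusive.
    """
    if age_q < 20:
        # Aq younger than 20: A must be younger than 20 as well.
        return age_a < 20
    if age_q <= 25:
        # Aq between 20 and 25: A must be exactly the same age.
        return age_a == age_q
    if age_q <= 30:
        # Aq between 26 and 30: A must fall in the same age range.
        return 26 <= age_a <= 30
    # Aq older than 30: A must be older than 30 as well.
    return age_a > 30
```
]]></preformat>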
      <p>This combination of age- and term-related constraints allowed us to create cases
where the authors come from similar backgrounds. During our generation process, the
texts as in the USE corpus were slightly modified. Anonymization labels were
substituted by a randomly chosen proper name in English. In order not to provide any
hint about a case, the same name was used both in the questioned and known
documents. One source USE document could be considered at most twice in the
authorship verification corpus: once in a positive case and once in a negative case.</p>
      <p>The English novels used in the PAN-2014 corpus represent an attempt to provide a
narrower focus in terms of both content and writing style than many similar
collections. Instead of simply focusing on a single genre or time period, they focus on
a very small subgenre of speculative and horror fiction known generally as the
“Cthulhu Mythos”. This is specifically a shared-universe genre, based originally on
the writings of the American H.P. Lovecraft (for this reason, the genre is also called
“Lovecraftian horror”), a shared universe with a theme of human ineffectiveness in
the face of a set of powerful named “cosmic horrors”. It is also typically characterized
by extremely florid prose and an unusual vocabulary. Perhaps most significantly,
many of the elements of this genre are themselves unusual terms (e.g.,
unpronounceable proper names of these cosmic horrors such as “Cthulhu”,
“Nyarlathotep”, “Lloigor”, “Tsathoggua”, or “Shub-Niggurath”), thus creating a
strong shared element that is unusual in regular English prose. Similarly, the overall
theme and tone of these stories is strongly negative (many of them, for example, take
the form of classical tragedies and end with the death of the protagonist). For this
reason, we feel that this testbed provides a number of unusual elements that may be
appropriately explored as an example of a tightly controlled genre. The corpus covers
an extended length of time, from Lovecraft's original work to modern fan-fiction.
Documents were gathered from a variety of on-line sources including Project
Gutenberg<sup>1</sup> and FanFiction<sup>2</sup>, and edited for uniformity of format; in some cases
lengthy works were broken down into subsections based on internal divisions such as
chapters or sections.</p>
      <p>The Greek corpus comprises newspaper opinion articles published in the Greek
weekly newspaper TO BHMA<sup>3</sup> from 1996 to 2012. Note that the training corpus in
Greek was formed based on the respective training and evaluation corpora of
PAN-2013. The length of each article is at least 1,000 words while the number of known
texts per problem varies from 1 to 5. In each verification problem, we included
texts that had strong thematic similarities indicated by the occurrence of certain
keywords. In contrast to PAN-2013, there was no stylistic analysis of the texts to
indicate authors with very similar styles or texts of the same author with notable
differences.</p>
      <sec id="sec-3-1">
        <p>Footnotes: <sup>1</sup> http://www.gutenberg.org/ <sup>2</sup> https://www.fanfiction.net/ <sup>3</sup> http://www.tovima.gr</p>
        <p>The Spanish corpus refers to the same genre as the Greek corpus. Newspaper
opinion articles of the Spanish newspaper El Pais<sup>4</sup> were considered and author
verification problems were formed taking into account thematic similarities between
articles as indicated by certain keywords used to index the articles in the website of
this newspaper. All verification problems for this corpus include exactly five known
texts, while the average text length is relatively large, exceeding 1,000 words.
</p>
        <p>3.2 Performance measures</p>
        <p>
The probability scores provided by the participants are used to build ROC curves and
the area under the curve (AUC) is used as a scalar evaluation measure. This is a
well-known evaluation technique for binary classifiers [6]. In addition, the performance
measures used in this task should be able to take unanswered problems into account.
Similarly to other tasks, like question answering, it is preferred to leave the problem
unanswered rather than responding incorrectly when there is great uncertainty. The
measures of recall and precision used at PAN-2013 were not able to reward
submissions that left problems unanswered while maintaining high accuracy in given
answers.</p>
        <p>
          In the current evaluation setup we adopted the c@1 measure, originally proposed
for question answering tasks, which explicitly extends accuracy based on the number
of problems left unanswered [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. More specifically, to use this measure we first
transform probability scores to binary answers. Every score greater than 0.5 is
considered as a positive answer (i.e., the known and questioned documents are by the
same author), every score lower than 0.5 is considered as a negative answer (i.e., the
known and questioned documents are by different authors) while all scores equal to
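0.5 correspond to unanswered problems. Then, c@1 is defined as follows:
<disp-formula><tex-math>\mathrm{c@1} = \frac{1}{n}\left(n_c + \frac{n_u \cdot n_c}{n}\right)</tex-math></disp-formula>
where n is the number of problems, n<sub>c</sub> is the number of correct answers, and n<sub>u</sub> is the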
number of problems left unanswered. If a participant would provide an answer
different from 0.5 for all problems, then c@1 will be equal to accuracy. If all
problems are left unanswered, then c@1 will be zero. If only some problems are left
unanswered, this measure will be increased as if these problems were answered with
the same accuracy as the rest of the problems. Therefore, this measure rewards
participants that maintain a high number of correct answers, for which there is great
confidence, and decrease the number of incorrect answers, for uncertain cases, by
leaving them unanswered.
        </p>
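        <p>As an illustration, the computation of c@1 from probability scores can be sketched in Python as follows. The function name c_at_1 and the representation of the ground truth as a list of booleans (True for same-author problems) are assumptions made for this sketch; it is not the official evaluation script.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def c_at_1(scores, labels):
    """Compute c@1 = (n_c + n_u * n_c / n) / n.

    scores: probability-like scores in [0, 1]; exactly 0.5 means "unanswered".
    labels: ground truth, True if known and questioned documents share an author.
    """
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    n_correct = 0
    n_unanswered = 0
    for score, same_author in zip(scores, labels):
        if score == 0.5:
            n_unanswered += 1          # problem left unanswered
        elif (score > 0.5) == bool(same_author):
            n_correct += 1             # binary answer agrees with ground truth
    return (n_correct + n_unanswered * n_correct / n) / n
```
]]></preformat>
        <p>For example, with four problems of which two are answered correctly, one incorrectly, and one is left unanswered, the sketch yields (2 + 1 * 2 / 4) / 4 = 0.625, whereas plain accuracy would be 0.5 if the unanswered problem were counted as an error.</p>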
        <p>To provide a final rank of participants, AUC and c@1 are combined in the final
score which is merely the product of these two measures. In addition, the efficiency of
the submitted methods is measured in terms of elapsed runtime.</p>
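        <p>The ranking score can likewise be sketched in a few lines. The code below is a minimal, assumption-laden illustration: the names roc_auc and final_score are hypothetical, AUC is computed through its pairwise-comparison interpretation (the fraction of positive/negative problem pairs that are ordered correctly, with ties counted as one half), and the c@1 value is taken as an input rather than recomputed.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative problem")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0    # positive problem ranked above negative problem
            elif p == q:
                wins += 0.5    # ties count as half (an assumption of this sketch)
    return wins / (len(positives) * len(negatives))


def final_score(auc, c_at_1_value):
    """Final ranking score: the product of AUC and c@1."""
    return auc * c_at_1_value
```
]]></preformat>
        <p>Because the two measures are multiplied, a submission must do well on both the ranking of its scores (AUC) and its thresholded answers (c@1) to obtain a high final score.</p>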
      </sec>
      <sec id="sec-3-2">
        <p>Footnote: <sup>4</sup> http://elpais.com</p>
        <p>3.4 Baseline</p>
        <p>
The author verification task has a random guess baseline of 0.5 for both AUC and
c@1. However, this baseline is not challenging. What we need is a baseline that
corresponds to a standard method so that we know what submissions are really better
than the state of the art. Moreover, since the evaluation corpus comprises several
languages and genres, we need a baseline that can reflect and adapt to the difficulty of
a specific corpus.</p>
        <p>
          Based on the submissions of the author identification task at PAN-2013, it is
possible to use state-of-the-art methods (in particular, the PAN-2013 winners) and
apply them to the PAN-2014 corpus. However, since the PAN-2014 task comprises more
languages, we need a language-independent approach. In addition, we need a method
that can provide both binary answers and probability scores (the latter was optional at
PAN-2013). Based on these requirements, we selected the approach of [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] to serve
as baseline. More specifically, this approach has the following characteristics:
- It is language-independent.
- It can provide both binary answers and real scores.
- The real scores are already calibrated to probability-like scores for a positive
answer (i.e., all scores greater than 0.5 correspond to a positive answer).
- It was the winner of PAN-2013 in terms of overall AUC scores.</p>
        <p>It should be noted that this baseline method has not been specifically trained on the
corpora of PAN-2014, so its performance is not optimized. It can only be viewed as a
general method that can be applied to any corpus. Moreover, this approach does not
leave problems unanswered, so it cannot take advantage of the new performance
measures.
</p>
        <p>3.5 Meta-classifier</p>
        <p>
Following the practice of PAN-2013, we examine the performance of a meta-model
that combines all answers given by the participants for each problem. We define a
straightforward meta-classifier that calculates the average of the probability scores
provided by the participants for each problem. It can be seen as a heterogeneous
ensemble model that combines base classifiers corresponding to different approaches.
Note that the average of all the provided answers is not likely to be exactly 0.5; hence,
this meta-model very rarely leaves problems unanswered. This meta-model can be
naturally extended by allowing all answers with a score between 0.5-a and 0.5+a to
become equal to 0.5. However, since the parameter a should be tuned to an arbitrary
predefined value or be optimized for each language/genre, we decided not to perform
such an extension.</p>
        <p>We received 13 submissions from research teams in Australia, Canada (2), France,
Germany (2), India, Iran, Ireland, Mexico (2), United Arab Emirates, and United
Kingdom. The participants submitted and evaluated their author verification software
within the TIRA framework [8]. A separate run for each corpus corresponding to each
language and genre was performed.</p>
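        <p>The averaging meta-classifier described in Section 3.5 can be sketched as follows. This is a hedged illustration rather than the organizers' code: the function name meta_scores, the dictionary layout of the input, and the optional band parameter are assumptions. The band parameter corresponds to the parameter a of the extension discussed above, which was not applied in the evaluation; its default of 0.0 reproduces plain averaging.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from statistics import mean


def meta_scores(submissions, band=0.0):
    """Average the participants' probability scores problem by problem.

    submissions: dict mapping a participant name to a list of scores in [0, 1],
                 all lists ordered by the same sequence of verification problems.
    band: hypothetical half-width a; averages within [0.5 - a, 0.5 + a] are
          mapped to 0.5 (unanswered). The default 0.0 gives plain averaging.
    """
    runs = list(submissions.values())
    if not runs or len({len(run) for run in runs}) != 1:
        raise ValueError("all participants must score the same problems")
    # zip(*runs) groups the scores that different participants gave to one problem.
    averaged = [mean(column) for column in zip(*runs)]
    return [0.5 if abs(score - 0.5) <= band else score for score in averaged]
```
]]></preformat>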
        <p>
          The overall results of the task concerning the performance of the submitted
approaches in the whole evaluation corpus are shown in Table 2. These evaluation
scores are the result of micro-averaging over the set of 796 verification problems. In
other words, each verification problem has the same weight in this analysis, so the
language and genre information is not taken into account. As can be seen, the overall
winner method of Khonji and Iraqi [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] achieved the best results in terms of AUC and
was also very effective in terms of c@1. On the other hand, it was one of the least
efficient methods, requiring about 21 hours to process the whole evaluation
corpus. The second best submission by Frery et al. [7] was much more efficient and
achieved the best c@1 score. In general, most of the submitted methods outperformed
the baseline. It has to be emphasized that the best five participants were able to leave
some problems unanswered. In total 4 out of the 13 participants answered all
problems. Moreover, one participant provided binary answers instead of probability
scores [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] and one participant did not process the Greek corpus [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. With respect to
the meta-classifier, which is averaging the answers of all 13 participants, its
performance is significantly better than each individual system, achieving a final
score greater than 0.5.
        </p>
        <p>[Tables 2-8: evaluation scores and runtimes per submission. The cell values were flattened during extraction and their row and column assignments are not recoverable.]</p>
        <p>
          Tables 3-8 present the evaluation results on each of the six corpora separately. In
all tables, the best performing submission (excluding the meta-classifier and the
baseline method) is in boldface. In terms of average performance of all submitted
approaches, the corpus of Dutch essays seems to be the easiest while the corpus of
Dutch reviews seems to be the hardest. The latter can be partially explained by the fact
that the corpus provides only one known document per problem and that it contains
only short texts. Moreover, the availability of multiple relatively long known
documents seems to assist the submitted systems to achieve a better average
performance on the Greek and Spanish corpora compared to the English corpora of
essays and novels. There is a different winner for each corpus with the exception of
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] who won on both Greek and Spanish corpora. This might indicate a better tuning
of their approach for newspaper opinion articles rather than essays, reviews or novels.
However, the performance of this submission on all corpora is notable since it is
usually among the three best-performing methods, with the exception of the
English essays, where it is ranked 6th (excluding the meta-classifier).
        </p>
        <p>
          The performance of the baseline method varies. On the English and Spanish corpora
it is relatively low. On the Dutch and Greek corpora it is a strong competitor,
outperforming almost half of the participants. In addition, the meta-classifier is very
effective on all corpora, although it is outperformed by some individual participants
on three of them. Another interesting observation is that the problems left unanswered by
most participants are not evenly distributed across the corpora. The majority of the
problems left unanswered by Castillo et al. [4] belong to the Dutch reviews (possibly
reflecting the difficulty of this corpus). Similarly, Moreau et al. [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] did not answer
many problems of the Dutch essays, while most of the unanswered problems of Frery et
al. [7] belong to the English essays and Greek articles. On the other hand, Mayor et al.
[
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] left at least one problem unanswered in each corpus.
        </p>
        <p>
          The ROC curves of the best performing participants on the whole evaluation
corpus are shown in Figure 1. More specifically, the figure shows the convex hull of all
submitted approaches together with the curves of the participants that are part of it.
The overall winning approach of Khonji and Iraqi [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and the second-best
method of Frery et al. [7] dominate the convex hull when false positive and false
negative errors have the same cost [6]. At low values of FPR in the ROC space, where
the cost of false positives is considered higher than the cost of false negatives, the
approach of Modaresi and Gross [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] is the best. On the other hand, at high values of FPR in the ROC
space, where false negatives have a larger cost than false positives,
the approach of Moreau et al. [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] is the most effective. Note also that the
submission by Castillo et al. [4], ranked 3rd in the overall results (see
Table 2), is not part of the convex hull, meaning that this approach is always
outperformed by another approach regardless of the cost of false positives and false
negatives.
        </p>
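        <p>The role of the convex hull can be illustrated with a short sketch. The Python function below is a minimal illustration and not the procedure used to produce Figure 1: it assumes that each system is summarized by a single hypothetical (FPR, TPR) operating point and returns the upper convex hull between (0, 0) and (1, 1). A point that does not appear on the hull is dominated for every ratio of error costs. The function names and the sample points are assumptions made for this example.</p>
        <preformat><![CDATA[
```python
def cross(o, a, b):
    """Z-component of the cross product of the vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Return the upper convex hull of (FPR, TPR) points, left to right.

    The trivial classifiers (0, 0) and (1, 1) are always included.
    """
    candidates = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for point in candidates:
        # Drop the previous point while it lies on or below the line
        # from the point before it to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], point) >= 0:
            hull.pop()
        hull.append(point)
    return hull


# Hypothetical operating points of four systems (not taken from the paper).
systems = [(0.125, 0.5), (0.25, 0.75), (0.5, 0.625), (0.5, 0.875)]
print(roc_convex_hull(systems))
```
]]></preformat>
        <p>In this toy input the point (0.5, 0.625) is left out of the hull because (0.5, 0.875) achieves a higher TPR at the same FPR.</p>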
        <p>In addition, Figure 1 depicts the ROC curves of the baseline method and the
meta-classifier. The baseline is clearly less effective than the best participants. It
outperforms only Frery et al. [7], and only at very low values of FPR. On the other hand, the
meta-classifier clearly outperforms the convex hull of all the submitted methods over the
whole range of the curve. This means that the meta-classifier is more effective than
any individual submission for any given cost of false positives and false negatives.</p>
        <p>Figure 1: ROC curves (TPR against FPR) of the best performing participants, the baseline, the meta-classifier, and the convex hull of all submitted approaches. Legend entries recovered from the figure: Frery et al., Mayor et al., Baseline, Convex Hull.</p>
        <sec id="sec-3-2-4">
          <p>
            We computed the statistical significance of performance differences between systems
using approximate randomization testing [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ]; we used the implementation by Vincent Van Asch, available from the CLiPS website
(http://www.clips.uantwerpen.be/scripts/art). As noted by [
            <xref ref-type="bibr" rid="ref39">39</xref>
            ] among others,
statistical significance tests that are frequently used for comparing classifier outputs, such
as paired t-tests, make assumptions that do not hold for precision scores and F-scores.
Approximate randomization testing does not make these assumptions and can handle
complicated distributions. We performed a pairwise comparison of the accuracy of all systems
based on this method; the results are shown in Table 9. The null hypothesis is that
there is no difference in the output of two systems. When the p-value, i.e., the probability of
observing a difference at least as large under the null hypothesis, is below 0.05, we consider
the systems to be significantly different
from each other. When p &lt; 0.001 the difference is highly significant, when 0.001 &lt; p
&lt; 0.01 the difference is very significant, and when 0.01 &lt; p &lt; 0.05 the difference is
significant.
          </p>
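          <p>The testing procedure described above can be sketched as follows. This is a generic illustration of approximate randomization for paired accuracy differences, not the CLiPS script used in the evaluation. It assumes that each system's output has been reduced to a list of 0/1 correctness values per problem; the function names, the number of rounds, and the "=" label for non-significant differences are assumptions made for this example.</p>
          <preformat><![CDATA[
```python
import random


def approximate_randomization(correct_a, correct_b, rounds=10000, seed=0):
    """Estimate the p-value for the accuracy difference of two systems.

    correct_a and correct_b hold 1 (correct) or 0 (wrong) per problem,
    aligned so that index i refers to the same problem in both lists.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must answer the same problems")
    rng = random.Random(seed)
    # With a shared denominator, comparing counts equals comparing accuracies.
    observed = abs(sum(correct_a) - sum(correct_b))
    at_least_as_extreme = 0
    for _ in range(rounds):
        total_a = total_b = 0
        for a, b in zip(correct_a, correct_b):
            # Under the null hypothesis the two outputs are exchangeable,
            # so swap them with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            total_a += a
            total_b += b
        if abs(total_a - total_b) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (rounds + 1)


def significance_label(p_value):
    """Map a p-value to the three significance levels used in the text."""
    if p_value < 0.001:
        return "***"  # highly significant
    if p_value < 0.01:
        return "**"   # very significant
    if p_value < 0.05:
        return "*"    # significant
    return "="        # no significant difference
```
]]></preformat>
          <p>Because only the pairing of outputs is shuffled, the test makes no assumption about the distribution of the scores, which is the property highlighted above.</p>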
          <p>Table 9: Pairwise significance tests of accuracy for all systems (=: not significantly different; *: significant; **: very significant; ***: highly significant). Systems compared: META-CLASSIFIER, Khonji &amp; Iraqi, Frery et al., Castillo et al., Moreau et al., Mayor et al., Zamani et al., Satyam et al., Modaresi &amp; Gross, Jankowska et al., Halvani &amp; Steinebach, BASELINE, Vartapetiance &amp; Gillam, Layton, Harvey.</p>
        </sec>
        <sec id="sec-3-2-22">
          <p>
            This analysis shows that there are no significant differences between
systems of neighboring rank. The winning submission of [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] is either very
significantly or highly significantly better than the rest of the approaches (with the
exception of the second-ranked submission [7]). In addition, the meta-classifier is highly
significantly better than all the participants except for the two top-ranked ones.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Survey of Submissions</title>
      <p>
        Of the 13 participating approaches, 7 were submitted by teams that had also
participated in the PAN-2013 competition. Some of them attempted to improve the method
they proposed in 2013 [
        <xref ref-type="bibr" rid="ref12 ref21 ref36 ref9">9, 12, 21, 36</xref>
        ] and others presented new models [
        <xref ref-type="bibr" rid="ref23 ref25">4, 23, 25</xref>
        ].
      </p>
      <p>
        All the submitted approaches can be described according to some basic properties.
First, an author verification method is either intrinsic or extrinsic. For each
verification problem, intrinsic methods analyze only the known texts and the unknown
text of that problem to decide whether they are by the same
author or not. They do not make use of any texts by other authors. The majority
of the submitted approaches fall into this category [
        <xref ref-type="bibr" rid="ref10 ref12 ref21 ref24 ref25 ref29 ref36 ref9">4, 7, 9, 10, 12, 21, 24, 25, 29, 36</xref>
        ]. On
the other hand, extrinsic methods attempt to transform author verification from a
one-class classification task (where the known texts are the positive examples and there
are no negative examples) to a binary classification task (where documents by other
authors play the role of the negative examples). To this end, for each verification
problem, extrinsic methods need additional documents by other authors found in
external resources. The approaches of [
        <xref ref-type="bibr" rid="ref17 ref23">17, 23, 40</xref>
        ] belong to this category. The winning
submission of PAN-2014 by [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] is a modification of the Impostors method [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ],
as was the winner of PAN-2013 [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], where a corpus of external documents for each
language/genre was used.
      </p>
      <p>
        Another important characteristic of a verification method is its type of learning.
In lazy approaches, the training phase is nearly omitted and all necessary
processing is performed when a decision has to be made about a new verification
problem. Most of the submitted approaches follow this idea [
        <xref ref-type="bibr" rid="ref10 ref12 ref17 ref21 ref23 ref29 ref36 ref9">4, 9, 10, 12, 17, 21, 23,
29, 36, 40</xref>
        ]. On the other hand, eager methods attempt to build a general model based
on the training corpus. For example, [7] build a decision tree for each corpus, [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]
apply a genetic algorithm to find the characteristics of the verification model for each
corpus, and [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] use fuzzy C-means clustering to extract a general description of each
corpus. Since eager methods perform most of the necessary calculations in the
training phase, they are generally more efficient in terms of runtime.
      </p>
      <p>
        With respect to the features used for text representation, the majority of the
participating methods focused on low-level measures. More specifically, most of the
proposed features are either character measures (e.g., punctuation mark counts,
prefix/suffix counts, and character n-grams) or lexical measures (e.g., vocabulary
richness measures, sentence/word length counts, stopword frequency, n-grams of
words/stopwords, and word skip-grams). There were a few attempts to incorporate
syntactic features, namely POS tag counts [
        <xref ref-type="bibr" rid="ref17 ref25">17, 25, 40</xref>
        ], while one approach was
exclusively based on that type of information [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
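      <p>As an illustration of the low-level character measures mentioned above, the following minimal sketch computes the relative frequencies of character n-grams in a text. It does not reproduce the feature extraction of any particular participant; the function name and the default n-gram length are assumptions made for this example.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def char_ngram_profile(text, n=3):
    """Return the relative frequency of every character n-gram in text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        # The text is shorter than n characters.
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


# Toy input: the bigrams of "abab" are "ab", "ba", "ab".
print(char_ngram_profile("abab", n=2))
```
]]></preformat>
      <p>Profiles of this kind can be computed for the known and the unknown documents of a problem and then compared with any similarity or distance measure.</p>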
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>The author identification task at PAN-2014 focused on the author verification
problem. The task definition was practically the same as in PAN-2013. However, this
year we substantially enlarged both the training and evaluation corpora and enriched them
to include several languages and genres. In this way, we enabled participants to study
how they can adapt and fine-tune their approaches to a given language and
genre. Another important novelty was the use of different performance measures that
emphasize both the appropriate ranking of the provided answers in terms of
confidence (AUC) and the ability of the submitted systems to leave some
problems unanswered when there is great uncertainty (c@1). We believe that this
combination of performance measures is more appropriate for author verification, a
cost-sensitive task.</p>
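      <p>The two measures can be sketched in a few lines of Python. The sketch assumes that every answer is a score in [0, 1], that a score of exactly 0.5 marks an unanswered problem, and that scores above 0.5 mean "same author". It uses the usual definition c@1 = (n_c + n_u * n_c / n) / n, where n is the number of problems, n_c the number of correct answers, and n_u the number of unanswered problems, and it computes AUC as the probability that a positive problem is scored higher than a negative one. These conventions and the function names are assumptions made for this example, not a copy of the official evaluation script.</p>
      <preformat><![CDATA[
```python
def c_at_1(scores, truths):
    """c@1: accuracy that gives partial credit for unanswered problems."""
    n = len(scores)
    n_unanswered = sum(1 for s in scores if s == 0.5)
    n_correct = sum(
        1 for s, t in zip(scores, truths) if s != 0.5 and (s > 0.5) == t
    )
    return (n_correct + n_unanswered * n_correct / n) / n


def auc(scores, truths):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for s, t in zip(scores, truths) if t]
    negatives = [s for s, t in zip(scores, truths) if not t]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical answers of one system on four problems.
scores = [0.9, 0.5, 0.2, 0.6]
truths = [True, True, False, False]
print(c_at_1(scores, truths), auc(scores, truths))
```
]]></preformat>
      <p>When no problem is left unanswered, c@1 reduces to plain accuracy. A system that abstains on uncertain problems is rewarded in proportion to its accuracy on the problems it does answer.</p>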
      <p>
        As in PAN-2013, the overall winner was a modification of the Impostors
method [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The performance of this approach was notably stable across all six
corpora despite the fact that it did not leave many problems unanswered. This
demonstrates the great potential of extrinsic verification methods. In addition, the
significantly larger training corpus allowed participants to explore, for the first time,
the use of eager learning methods in the author verification task. Such an approach
can be both effective and efficient, as demonstrated by the overall performance
and runtime of the second-ranked submission [7].
      </p>
      <p>
        We received 13 software submissions, fewer than the 18
submissions at PAN-2013, possibly due to the greater difficulty of the task. Moreover,
this year the evaluation of the submitted systems was performed by the participants
themselves using the TIRA framework [8]. Seven participants from PAN-2013
submitted their approaches again this year. It is notable that the teams that
only slightly modified their existing approach did not achieve high performance [
        <xref ref-type="bibr" rid="ref12 ref21 ref36 ref9">9, 12,
21, 36</xref>
        ]. On the other hand, the teams that radically changed their approach, including
adding the ability to leave some problems unanswered, achieved very good results [
        <xref ref-type="bibr" rid="ref23 ref25">4, 23,
25</xref>
        ].
      </p>
      <p>Based on the software submissions at PAN-2013, we were able to define a
challenging baseline method that is better than random guessing and can reflect the
difficulty of the examined corpus. In many cases, the baseline method was ranked in
the middle of the list of participants, which makes it easy to identify the approaches
with notable performance. Given the enhanced set of methods for author verification collected at
PAN-2013 and PAN-2014, we think that it will be possible to further improve the
quality of the baseline methods in future competitions. Moreover, following the
successful practice of PAN-2013, we examined the performance of a meta-model that
combines all submitted systems in a heterogeneous ensemble. This meta-classifier
was better than each individual submitted method, and its ROC curve clearly
outperformed the convex hull of all submitted approaches. This demonstrates the
great potential of heterogeneous models in author verification, a practically
unexplored area.</p>
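      <p>A heterogeneous ensemble of this kind can be sketched with the simplest possible combination rule, the mean of the scores that the individual systems assign to each problem. The combination rule, the function name, and the sample scores below are assumptions made for illustration; the sketch does not claim to reproduce the meta-classifier evaluated in this paper.</p>
      <preformat><![CDATA[
```python
def average_ensemble(system_scores):
    """Average the per-problem scores of several systems.

    system_scores maps a system name to its list of scores in [0, 1],
    one score per verification problem, in the same problem order.
    """
    score_lists = list(system_scores.values())
    if not score_lists:
        raise ValueError("at least one system is required")
    n_problems = len(score_lists[0])
    if any(len(scores) != n_problems for scores in score_lists):
        raise ValueError("all systems must answer the same problems")
    # zip(*...) regroups the scores problem by problem.
    return [sum(column) / len(column) for column in zip(*score_lists)]


# Hypothetical scores of two systems on two problems.
combined = average_ensemble({
    "system_a": [0.75, 0.25],
    "system_b": [0.25, 0.25],
})
print(combined)
```
]]></preformat>
      <p>In the example, the two systems disagree on the first problem, so the combined score is 0.5, which corresponds to leaving that problem unanswered under the c@1 convention.</p>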
      <p>For the first time, we applied statistical significance tests to the results of the
submitted methods to highlight the real differences between them. According to these
tests, there is no significant difference between systems ranked in neighboring
positions. However, there are highly significant differences between the winning
approach and the rest of the submissions (with the exception of the second-ranked one).
We believe that such significance tests are necessary to draw reliable
conclusions, and we are going to adopt them in future evaluation labs.</p>
      <p>One of our ambitions in this task was to involve experts from forensic linguistics
so that they could manually (or semi-automatically) analyze the same corpora and
submit their answers. This could serve as another very interesting baseline approach
that would enable the comparison of fully automated systems with traditional human
expert methods. Unfortunately, this attempt was not successful. So far, we have not been
able to find experts in forensic linguistics willing to participate or to devote the
necessary time to solve a large number of author verification problems under certain
time constraints. We are still working in this direction.</p>
      <p>We believe that the focus of PAN-2013 and PAN-2014 on the author verification
task has produced significant progress in this field, both in the development of
new corpora and new methods and in the definition of an appropriate evaluation
framework. Clearly, author verification is far from being a solved task, and there are
many variations that can be explored in future evaluation labs, including cross-topic
and cross-genre verification (i.e., where the known and the questioned documents do
not match in terms of topic/genre) and very short text verification (e.g., where the
documents are tweets or SMS messages).</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>This work was partially supported by the WIQ-EI IRSES project (Grant No. 269180)
within the FP7 Marie Curie action and by grant OCI-1032683 from the United States
National Science Foundation. The work of the last author is funded by the Spanish
Ministry of Education and Science (TACARDI project, TIN2012-38523-C02-00).</p>
      <p>40. H. Zamani, H. Nasr, P. Babaie, S. Abnar, M. Dehghani, and A. Shakery.
Authorship Identification Using Dynamic Selection of Features from
Probabilistic Feature Set. In Proc. of the 5th International Conference of the
CLEF Initiative, 2014.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Argamon</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Juola</surname>
          </string-name>
          .
          <article-title>Overview of the International Authorship Identification Competition at PAN-2011</article-title>
          . In V. Petras,
          <string-name>
            <given-names>P.</given-names>
            <surname>Forner</surname>
          </string-name>
          , P.D. Clough (eds.) CLEF Notebook Papers/Labs/Workshop,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Axelsson</surname>
          </string-name>
          .
          <article-title>USE - The Uppsala Student English Corpus: An instrument for needs analysis</article-title>
          ,
          <source>ICAME Journal</source>
          ,
          <volume>24</volume>
          :
          <fpage>155</fpage>
          -
          <lpage>157</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Halvey</surname>
          </string-name>
          , and W. Kraaij (eds.).
          <article-title>CLEF 2014 Labs and Workshops, Notebook Papers</article-title>
          .
          <source>CEUR Workshop Proceedings (CEUR-WS.org)</source>
          ,
          <source>ISSN 1613-0073</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Castillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cervantes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vilariño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pinto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>León</surname>
          </string-name>
          .
          <article-title>Unsupervised Method for the Authorship Identification Task - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>H.J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y-Gómez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Villaseñor-Pineda</surname>
          </string-name>
          .
          <article-title>Particle Swarm Model Selection for Authorship Verification</article-title>
          .
          <source>In Proceedings of the 14th Iberoamerican Conference on Pattern Recognition</source>
          , pages
          <fpage>563</fpage>
          -
          <lpage>570</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Fawcett</surname>
          </string-name>
          .
          <article-title>An Introduction to ROC Analysis</article-title>
          .
          <source>Pattern Recognition Letters</source>
          ,
          <volume>27</volume>
          (
          <issue>8</issue>
          ):
          <fpage>861</fpage>
          -
          <lpage>874</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Fréry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Largeron</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Juganaru-Mathieu</surname>
          </string-name>
          .
          <article-title>UJM at CLEF in Author Identification - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Forner</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Paredes</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
          </string-name>
          , and B. Stein (eds),
          <source>Information Access Evaluation meets Multilinguality, Multimodality, and Visualization. 4th International Conference of the CLEF Initiative</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>O.</given-names>
            <surname>Halvani</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Steinebach</surname>
          </string-name>
          .
          <article-title>VEBAV - A Simple, Scalable and Fast Authorship Verification Scheme - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Harvey</surname>
          </string-name>
          .
          <article-title>Author Verification Using PPM with Parts of Speech Tagging - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>M. Jankowska</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Kešelj</surname>
            , and
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Milios</surname>
          </string-name>
          .
          <article-title>Proximity based One-class Classification with Common N-Gram Dissimilarity for Authorship Verification Task - Notebook for PAN at CLEF 2013</article-title>
          . In P. Forner,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          , and D. Tufis (eds).
          <article-title>CLEF 2013 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>M. Jankowska</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Kešelj</surname>
            , and
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Milios</surname>
          </string-name>
          .
          <article-title>Ensembles of Proximity-Based One-Class Classifiers for Author Verification - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>P.</given-names>
            <surname>Juola</surname>
          </string-name>
          .
          <source>Authorship Attribution. Foundations and Trends in IR</source>
          ,
          <volume>1</volume>
          :
          <fpage>234</fpage>
          -
          <lpage>334</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>P.</given-names>
            <surname>Juola</surname>
          </string-name>
          .
          <article-title>An Overview of the Traditional Authorship Attribution Subtask</article-title>
          .
          <source>In Proc. of CLEF'12</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>P.</given-names>
            <surname>Juola</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          .
          <article-title>Overview of the Author Identification Task at PAN-2013</article-title>
          . In P. Forner,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          , and D. Tufis (eds).
          <article-title>CLEF 2013 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>M. Kestemont</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Luyckx</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Daelemans</surname>
            , and
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Crombez</surname>
          </string-name>
          .
          <article-title>Cross-Genre Authorship Verification Using Unmasking</article-title>
          .
          <source>English Studies</source>
          ,
          <volume>93</volume>
          (
          <issue>3</issue>
          ):
          <fpage>340</fpage>
          -
          <lpage>356</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>M.</given-names>
            <surname>Khonji</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Iraqi</surname>
          </string-name>
          .
          <article-title>A Slightly-modified GI-based Author-verifier with Lots of Features (ASGALF) - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>M. Koppel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Schler</surname>
            , and
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Argamon</surname>
          </string-name>
          .
          <article-title>Authorship Attribution in the Wild</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>45</volume>
          :
          <fpage>83</fpage>
          -
          <lpage>94</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>M. Koppel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Schler</surname>
            , and
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Bonchek-Dokow</surname>
          </string-name>
          .
          <article-title>Measuring Differentiability: Unmasking Pseudonymous Authors</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>8</volume>
          :
          <fpage>1261</fpage>
          -
          <lpage>1276</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>M.</given-names>
            <surname>Koppel</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Winter</surname>
          </string-name>
          .
          <article-title>Determining if Two Documents are by the Same Author</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>65</volume>
          (
          <issue>1</issue>
          ):
          <fpage>178</fpage>
          -
          <lpage>187</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>R.</given-names>
            <surname>Layton</surname>
          </string-name>
          .
          <article-title>A Simple Local n-gram Ensemble for Authorship Verification - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>K.</given-names>
            <surname>Luyckx</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Daelemans</surname>
          </string-name>
          .
          <article-title>Authorship Attribution and Verification with Many Authors and Limited Data</article-title>
          .
          In
          <source>Proceedings of the Twenty-Second International Conference on Computational Linguistics (COLING)</source>
          , pages
          <fpage>513</fpage>
          -
          <lpage>520</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>C.</given-names>
            <surname>Mayor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Toledo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ledesma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fuentes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Meza</surname>
          </string-name>
          .
          <article-title>A Single Author Style Representation for the Author Verification Task - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>P.</given-names>
            <surname>Modaresi</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Gross</surname>
          </string-name>
          .
          <article-title>A Language Independent Author Verifier Using Fuzzy C-Means Clustering - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>E.</given-names>
            <surname>Moreau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jayapal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Vogel</surname>
          </string-name>
          .
          <article-title>Author Verification: Exploring a Large set of Parameters using a Genetic Algorithm - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>E. W.</given-names>
            <surname>Noreen</surname>
          </string-name>
          .
          <article-title>Computer Intensive Methods for Testing Hypotheses: An Introduction</article-title>
          . Wiley,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>A.</given-names>
            <surname>Peñas</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodrigo</surname>
          </string-name>
          .
          <article-title>A Simple Measure to Assess Nonresponse</article-title>
          .
          In
          <source>Proc. of the 49th Annual Meeting of the Association for Computational Linguistics</source>
          , Vol.
          <volume>1</volume>
          , pages
          <fpage>1415</fpage>
          -
          <lpage>1424</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koppel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Inches</surname>
          </string-name>
          .
          <article-title>Overview of the Author Profiling Task at PAN 2013</article-title>
          . In P. Forner,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          , and D. Tufis (eds.),
          <source>Working Notes Papers of the CLEF 2013 Evaluation Labs</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Satyam</surname>
          </string-name>
          ,
          <string-name>
            <surname>Anand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Dawn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Saha</surname>
          </string-name>
          .
          <article-title>A Statistical Analysis Approach to Author Identification Using Latent Semantic Analysis - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <given-names>S.</given-names>
            <surname>Seidman</surname>
          </string-name>
          .
          <article-title>Authorship Verification Using the Impostors Method - Notebook for PAN at CLEF 2013</article-title>
          . In P. Forner,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          , and D. Tufis (eds.),
          <source>CLEF 2013 Evaluation Labs and Workshop - Working Notes Papers</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          .
          <article-title>A Survey of Modern Authorship Attribution Methods</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>60</volume>
          :
          <fpage>538</fpage>
          -
          <lpage>556</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fakotakis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Kokkinakis</surname>
          </string-name>
          .
          <article-title>Automatic Text Categorization in Terms of Genre and Author</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>26</volume>
          (
          <issue>4</issue>
          ):
          <fpage>471</fpage>
          -
          <lpage>495</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <given-names>C.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Guenter</surname>
          </string-name>
          .
          <article-title>Short Text Authorship Attribution via Sequence Kernels, Markov Chains and Author Unmasking: An Investigation</article-title>
          .
          In
          <source>Proceedings of the International Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>482</fpage>
          -
          <lpage>491</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lipka</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Meyer zu Eissen</surname>
          </string-name>
          .
          <article-title>Meta Analysis within Authorship Verification</article-title>
          .
          In
          <source>Proceedings of the 19th International Conference on Database and Expert Systems Applications</source>
          , pages
          <fpage>34</fpage>
          -
          <lpage>39</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lipka</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          .
          <article-title>Intrinsic Plagiarism Analysis</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>45</volume>
          , pages
          <fpage>63</fpage>
          -
          <lpage>82</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <given-names>A.</given-names>
            <surname>Vartapetiance</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Gillam</surname>
          </string-name>
          .
          <article-title>A Trinity of Trials: Surrey's 2014 Attempts at Author Verification - Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato, et al. [
          <volume>3</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <given-names>H.</given-names>
            <surname>van Halteren</surname>
          </string-name>
          .
          <article-title>Linguistic Profiling for Author Recognition and Verification</article-title>
          .
          In
          <source>Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <given-names>B.</given-names>
            <surname>Verhoeven</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Daelemans</surname>
          </string-name>
          .
          <article-title>CLiPS Stylometry Investigation (CSI) Corpus: A Dutch Corpus for the Detection of Age, Gender, Personality, Sentiment and Deception in Text</article-title>
          .
          In
          <source>Proc. of the 9th Int. Conf. on Language Resources and Evaluation (LREC)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <given-names>A.</given-names>
            <surname>Yeh</surname>
          </string-name>
          .
          <article-title>More Accurate Tests for the Statistical Significance of Result Differences</article-title>
          .
          In
          <source>Proceedings of the 18th Conference on Computational Linguistics</source>
          , Volume
          <volume>2</volume>
          , pages
          <fpage>947</fpage>
          -
          <lpage>953</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>