<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Clustering by Authorship Within and Across Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Efstathios Stamatatos</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Tschuggnall</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ben Verhoeven</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Walter Daelemans</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Günther Specht</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benno Stein</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Potthast</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bauhaus-Universität Weimar</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Antwerp</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Innsbruck</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of the Aegean</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The vast majority of previous studies in authorship attribution assume the existence of documents (or parts of documents) labeled by authorship to be used as training instances in either closed-set or open-set attribution. However, in several applications it is not easy or even possible to find such labeled data and it is necessary to build unsupervised attribution models that are able to estimate similarities/differences in personal style of authors. The shared tasks on author clustering and author diarization at PAN 2016 focus on such unsupervised authorship attribution problems. The former deals with single-author documents and aims at grouping documents by authorship and establishing authorship links between documents. The latter considers multi-author documents and attempts to segment a document into authorial components, a task strongly associated with intrinsic plagiarism detection. This paper presents an overview of the two tasks including evaluation datasets, measures, results, as well as a survey of a total of 10 submissions (8 for author clustering and 2 for author diarization).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Authorship attribution is the attempt to reveal the authors behind texts based on a
quantitative analysis of their personal style. The most common scenario adopted in the vast
majority of previous studies in this field assumes the existence of either a closed or an
open set of candidate authors and samples of their writing [
        <xref ref-type="bibr" rid="ref19 ref49">19, 49</xref>
        ]. These samples are
then used as labeled data usually in a supervised classification task where a disputed text
is assigned to one of the candidate authors. However, there are multiple cases where
authorship information of documents either does not exist or is not reliable. In such a case
unsupervised authorship attribution should be applied where no labeled samples are
available [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ].
      </p>
      <p>
        Assuming all documents in a given document collection are single-authored, an
obvious task is to group them by their author [
        <xref ref-type="bibr" rid="ref18 ref28 ref46">18, 28, 46</xref>
        ]. We call this task author
clustering and it is useful in multiple applications where authorship information is
either missing or not reliable [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For example, consider a collection of novels
published anonymously or under an alias, a collection of proclamations by different
terrorist groups, or a collection of product reviews by users whose aliases may correspond
to the same person. By performing effective author clustering and combining this with
other available meta-data of the collection (such as date, aliases etc.) we can extract
interesting conclusions, such as that a collection of novels published anonymously is
by a single person, that some proclamations belonging to different terrorist groups are
by the same authors, that different aliases of users that publish product reviews actually
correspond to the same person, etc. Author clustering is strongly related to authorship
verification [
        <xref ref-type="bibr" rid="ref25 ref51 ref52">25, 52, 51</xref>
        ]. Any clustering problem can be decomposed into a series of
verification problems where the task is to determine whether any possible pair of
documents is by the same author or not. However, some of these verification problems are
strongly correlated and this information can be used to enhance the verification
accuracy. For example, in a document collection of three documents d1, d2, and d3, we can
decompose the clustering task into three verification problems: d1 vs. d2, d1 vs. d3, and
d2 vs. d3. However, if we manage to estimate that d1 and d2 are by the same author,
this information can be used to enhance the verification model for both d2 vs. d3 and
d1 vs. d3.
      </p>
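      <p>To make this decomposition concrete, the following sketch (with hypothetical helper names, not taken from any submitted system) enumerates the verification problems of a small collection and propagates positive same-author decisions transitively, which is one simple way to exploit the correlation between verification problems:</p>
      <preformat>
from itertools import combinations

def decompose_into_verification_problems(doc_ids):
    """Enumerate every document pair as an authorship verification problem."""
    return list(combinations(doc_ids, 2))

def propagate_same_author(positive_decisions):
    """Close a set of positive same-author decisions under transitivity using a
    small union-find structure, so that d1~d2 and d2~d3 also imply d1~d3."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in positive_decisions:
        parent[find(a)] = find(b)
    groups = {}
    for d in list(parent):
        groups.setdefault(find(d), set()).add(d)
    return list(groups.values())

print(decompose_into_verification_problems(["d1", "d2", "d3"]))
# [('d1', 'd2'), ('d1', 'd3'), ('d2', 'd3')]
print(propagate_same_author([("d1", "d2"), ("d2", "d3")]))
# one three-document group containing d1, d2, and d3
      </preformat>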
      <p>
        On the other hand, if the assumption of single-author documents does not hold,
then unsupervised authorship attribution should attempt to decompose a given
document into its authorial components [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], for example, the identification of individual
contributions in collaboratively written student theses, scientific papers, or Wikipedia
articles. To identify the individual authors in such and similar multi-author documents,
an analysis that quantifies similarities and differences in personal style within a
document should be performed to build authorship clusters (each cluster comprising the text
fragments a specific author wrote). A closely related topic is the problem of plagiarism
detection. In order to reveal plagiarized text fragments, algorithms have to be designed that
can deal with huge datasets to search for possible sources. This is done by so-called
external plagiarism detectors [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ] by pre-collecting data, storing it in (local) databases
and/or even by performing Internet searches on the fly [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. In addition, intrinsic
plagiarism detection algorithms [
        <xref ref-type="bibr" rid="ref53 ref57">53, 57</xref>
        ] sidestep the problem of huge datasets and costly
comparisons by inspecting solely the document in question. Here, plagiarized sections
have to be identified by analyzing the writing style so as to spot specific text
fragments that exhibit significantly different characteristics compared to their surroundings,
which may indicate cases of text reuse or plagiarism. Although the detection
performance of intrinsic approaches is still inferior to that of external
approaches [
        <xref ref-type="bibr" rid="ref36 ref37">36, 37</xref>
        ], intrinsic methods are still important to plagiarism detection
overall, e.g., to limit or pre-order the search space, or to investigate older documents where
potential sources are not digitally available.
      </p>
      <p>
        This paper reports on the PAN 2016 shared tasks on unsupervised authorship
attribution, focusing on both clustering across documents (author clustering) and clustering
within documents (author diarization1). The next section presents related work in these
areas. Sections 3 and 4 describe and analyze the tasks, evaluation datasets, evaluation
measures, results, and present a survey of submissions for author clustering and author
diarization, respectively. Finally, Section 5 discusses the main conclusions and some
directions for future research.
1 The term “diarization” originates from the research field of speaker diarization, where
approaches try to automatically identify, cluster, and extract different (parallel) speakers from an
audio speech signal like a telephone conversation or a political debate [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>This section briefly reviews the related work on author clustering and author diarization.</p>
      <p>2.1 Author Clustering</p>
      <p>
        Related work on author clustering, as defined in this paper, is limited. In a pioneering
study, Holmes and Forsyth [
study, Holmes and Forsyth [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] performed cluster analysis on the Federalist Papers. At
that time, other early authorship attribution studies only depicted texts in two-dimensional
plots based on a principal components analysis to provide visual inspection of clusters
rather than actually performing automated clustering [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        In an attempt to indicate similarities and differences between authors, Luyckx et
al. [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] applied centroid clustering to a collection of literary texts. However, since only
one sample of text per author was considered, their clusters comprised texts by different
authors with similar style rather than different texts by the same author. Almishari and
Tsudik [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] explored the linkability of anonymous reviews found on a popular review
site. However, they treat this problem as a classification task rather than a clustering
task.
      </p>
      <p>
        Iqbal et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] presented an author clustering method applied to a collection of
email messages in order to extract a unique writing style from each cluster and identify
anonymous messages written by the same author. Layton et al. [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] propose a
clustering ensemble for author clustering that was able to estimate the number of authors in
a document collection using the iterative positive Silhouette method. In another study,
they demonstrate the use of the positive Silhouette coefficient for validating author clustering
analyses [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. Samdani et al. [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ] applied an online clustering method to both
author-based clustering and topic-based clustering of postings in an online discussion forum
and found that the order of the items was not significant for author-based clustering in
contrast to the topic-based clustering.
      </p>
      <p>2.2 Author Diarization</p>
      <p>
        The term diarization, as it is used throughout this paper, covers both intrinsic plagiarism
detection and within-document clustering problems. Although they are closely related,
many approaches exist covering the former and only a few tackling the latter. Most of
the existing intrinsic plagiarism detection approaches adhere to the following scheme:
(1) splitting the text into chunks, (2) calculating metrics for all chunks and, if needed,
for the whole document, (3) detecting outliers and (4) applying post-processing steps.
Chunks are usually created by collecting a predefined number of characters,
words, or sentences, often enumerating all possible chunks by using sliding windows
of different lengths. Then, each text fragment is stylistically analyzed by quantifying
different characteristics (i.e., features). Typical computations to build stylistic
fingerprints include lexical features like character n-grams (e.g., [
        <xref ref-type="bibr" rid="ref24 ref50">24, 50</xref>
        ]), word
frequencies (e.g., [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]), and average word/sentence lengths (e.g., [60]); syntactic features like
part-of-speech (POS) tag frequencies/structures (e.g., [
        <xref ref-type="bibr" rid="ref54">54</xref>
        ]); and structural features like
average paragraph lengths or indentation usages (e.g., [60]). Moreover, traditional IR
measures like tf-idf are often applied (e.g., [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]).
      </p>
      <p>
        To subsequently reveal plagiarism, outliers have to be detected. This is done either
by comparing features of each chunk with those of the whole document using
different distance metrics (e.g., [
        <xref ref-type="bibr" rid="ref34 ref50 ref56">34, 50, 56</xref>
        ]), by building chunk clusters and assuming each
cluster to correspond to a different author (e.g., [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]), and by using statistical methods
like the Gaussian normal distribution (e.g., [
        <xref ref-type="bibr" rid="ref55">55</xref>
        ]). In most of the scenarios thresholds are
needed, which separate non-suspicious chunks from suspicious chunks. Finally,
postprocessing steps include grouping, filtering, and unifying suspicious chunks.
      </p>
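      <p>A minimal sketch of this generic scheme is shown below; the character trigram profiles, the cosine distance, and the mean-plus-z-standard-deviations threshold are illustrative choices for steps (1) to (3) and are not taken from any particular cited approach, while step (4), the post-processing of flagged chunks, is omitted:</p>
      <preformat>
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Step (2): a simple stylistic profile based on character n-gram counts."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_distance(p, q):
    dot = sum(p[g] * q.get(g, 0) for g in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - (dot / norm if norm else 0.0)

def suspicious_chunks(document, chunk_size=400, step=200, z=1.5):
    """Steps (1)-(3): chunk the document with a sliding window, compare each
    chunk profile to the whole-document profile, and flag chunks whose distance
    is more than z standard deviations above the mean distance."""
    doc_profile = char_ngrams(document)
    starts = range(0, max(len(document) - chunk_size, 1), step)
    chunks = [document[s:s + chunk_size] for s in starts]
    dists = [cosine_distance(char_ngrams(c), doc_profile) for c in chunks]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [i for i, d in enumerate(dists) if d > mean + z * std]
      </preformat>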
      <p>
        By comparison, there is hardly any related work on within-document clustering,
while many approaches exist to separate a document into distinguishable parts. The
aim of the latter is to split paragraphs by topics, which is generally referred to as text
segmentation or topic segmentation [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ]. In this domain, the algorithms often perform
vocabulary analysis in various forms like word stem repetitions [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] or building word
frequency models [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ], where “methods for finding the topic boundaries include sliding
window, lexical chains, dynamic programming, agglomerative clustering, and divisive
clustering” [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Probably one of the first approaches that uses stylometry to automatically detect
boundaries of authors in collaboratively written text was proposed by Glover and
Hirst [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Their main intention is not to expose authors or to gain insight into the
work distribution, but to provide a methodology for collaborative authors to equalize
their style in order to achieve better readability. Graham et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] also tried to
divide a collaborative text into different single-author paragraphs. Several stylometric
features were employed, processed by neural networks and cosine distances, revealing
that letter-bigrams led to the best results. A mathematical approach that splits a
multi-author document into single-author paragraphs is presented by Giannella [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the first
step of which is to divide a document into subsequences of consecutive sentences that
are written by the same author. Roughly speaking, this is done with a stochastic
generative model on the occurrences of words, where the maximum (log-joint) likelihood is
computed by applying Dijkstra’s algorithm to find paths.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Author Clustering</title>
      <p>In this section we report on the results of the author clustering task. In particular, we
describe the experimental setup including the task definition, evaluation datasets,
evaluation measures, a survey of the submitted approaches and baselines, and finally the
evaluation results are analytically presented.</p>
      <p>The requirements of an author clustering tool differ according to the application. In
many cases, complete information about all available clusters may be necessary.
However, especially when very large document collections are given, it may be sufficient
to extract the most likely authorship links (pairs of documents linked by authorship).
The latter case may be viewed as a retrieval task where we need to provide a list of
document pairs and rank the list based on their probability to be true authorship links.
In particular, we aim to study the following two application scenarios:
</p>
      <p>[Figure 1: Illustration of the two application scenarios: complete author clustering (left) and authorship-link ranking (right).]</p>
      <p>
– Complete author clustering. This scenario requires a detailed analysis where the
number of different authors (k) found in the collection should be identified and
each document should be assigned to exactly one of the k clusters (each cluster
corresponds to a different author). In the illustrating example of Figure 1 (left), four
different authors are found and the color of each document indicates its author.
– Authorship-link ranking. This scenario views the exploration of the given
document collection as a retrieval task. It aims at establishing authorship links between
documents and provides a list of document pairs ranked according to a confidence
score (the score shows how likely it is that a document pair is by the same author). In the
example of Figure 1 (right), four document pairs with similar authorship are found
and then these authorship-links are ranked according to their similarity.
In more detail, given a collection of (up to 100) documents, the task is to (1) identify
groups of documents by the same author, and (2) provide a ranked list of authorship
links (pairs of documents by the same author). All documents within the collection are
single-authored, in the same language, and belong to the same genre. However, the topic
or text-length of documents may vary. The number of distinct authors whose documents
are included in the collection is not given.</p>
      <p>Let N be the number of documents in a given collection and k the number of
distinct authors in this collection. Then, k corresponds to the number of clusters in that
collection and the ratio r = k/N indicates the percentage of single-document clusters
as well as the number of available authorship links. If r is high then most documents in
the collection belong to single-document clusters and the number of authorship links is
low. If r is low then most of the documents in the collection belong to multi-document
clusters and the number of authorship-links is high. In our evaluation, we examine the
following selection of cases:
– r ≈ 0.9: only a few documents belong to multi-document clusters and it is unlikely
to find authorship links.
– r ≈ 0.7: the majority of documents belong to single-document clusters and it is
likely to find authorship links.
– r ≈ 0.5: less than half of the documents belong to single-document clusters and
there are plenty of authorship links.
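As a hypothetical worked example (the counts are illustrative and not taken from Table 1): a collection of N = 70 documents written by k = 35 distinct authors yields \( r = k/N = 35/70 = 0.5 \), i.e., the case with relatively many authorship links.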
</p>
      <p>The evaluation datasets comprise clustering problems in three languages (Dutch, English, and Greek) and two genres (articles and
reviews). Each part of the dataset has been constructed as follows:
– Dutch articles: this is a collection of opinion articles from the Flemish daily
newspaper De Standaard and weekly news magazine Knack. The training dataset was
based on a pool of 216 articles while the test dataset was based on a separate set of
214 articles.
– Dutch reviews: this is a collection of reviews taken from the CLiPS Stylometry
Investigation (CSI) corpus [59]. These are both positive and negative reviews about
both real and fictional products from the following categories: smartphones,
fastfood restaurants, books, artists, and movies. They are written by language students
from the University of Antwerp.
– English articles: this is a collection of opinion articles published in The Guardian
UK daily newspaper (http://www.theguardian.com). Each article is tagged with several thematic labels. The
training dataset was based on articles about politics and the UK while the evaluation dataset
was based on articles about society.
– English reviews: this is a collection of book reviews published in The Guardian UK
daily newspaper. All downloaded book reviews were assigned to the thematic area
of culture.
– Greek articles: this is a collection of opinion articles published in the online forum
Protagon (http://www.protagon.gr). Two separate pools of documents were downloaded, one about politics
and another about economy. The former was used to build the training dataset and
the latter was used for the evaluation dataset.
– Greek reviews: this is a collection of restaurant reviews downloaded from the website
Ask4Food (https://www.ask4food.gr).</p>
      <sec id="sec-3-4">
        <p>For each of the above collections, we constructed three instances of clustering problems
corresponding to r ≈ 0.9, r ≈ 0.7, and r ≈ 0.5 for both training and test datasets.
Detailed dataset statistics on each clustering problem in the training and evaluation
datasets are provided in Table 1. As can be seen, there are significant differences in the
produced instances. In cases where r ≈ 0.9, the number of authorship links is small
(less than 20), and when r ≈ 0.5, there are more than 35 authorship links. In some
cases a low r corresponds to a relatively high maximum cluster size (maxC). English
book reviews and Dutch newspaper articles are relatively long texts (about 1,000 words
on average) while Dutch reviews are short texts (about 150 words on average). In all
other cases average text lengths vary between 400 and 800 words. The largest collections
correspond to Dutch articles (100 documents per problem instance); in all other
cases the size of collections ranges between 50 and 80 documents.</p>
        <p>3.3 Survey of Submissions</p>
        <p>
We received 8 submissions from research teams coming from Bulgaria [61], India [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ],
Iran [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ], New Zealand [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], Switzerland (2) [
          <xref ref-type="bibr" rid="ref12 ref22">12, 22</xref>
          ], and the UK (2) [
          <xref ref-type="bibr" rid="ref47 ref58">47, 58</xref>
          ]. Two
of them have not submitted a notebook paper to describe their approach [
          <xref ref-type="bibr" rid="ref12 ref26">12, 26</xref>
          ]. The
6 remaining submissions present models that fall into two major categories. Top-down
approaches first attempt to form clusters using a typical clustering algorithm (k-means)
and then transform clusters into authorship links, assigning a score to each link [
          <xref ref-type="bibr" rid="ref31 ref47">31, 47</xref>
          ].
A crucial decision in such methods is the appropriate estimation of k, the number of
clusters (authors) in a given collection. Sari and Stevenson use a k value that optimizes
the Silhouette coefficient [
          <xref ref-type="bibr" rid="ref47">47</xref>
          ].
        </p>
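        <p>The following sketch illustrates this kind of k estimation; the TF-IDF character trigram representation and the candidate range for k are illustrative assumptions and not the settings used by the participants:</p>
        <preformat>
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_by_silhouette(documents, k_max=None, random_state=0):
    """Top-down clustering sketch: represent documents (TF-IDF over character
    trigrams as an illustrative choice), run k-means for each candidate k, and
    keep the k that maximizes the Silhouette coefficient."""
    X = TfidfVectorizer(analyzer="char", ngram_range=(3, 3)).fit_transform(documents)
    k_max = k_max or len(documents) - 1
    best_k, best_score, best_labels = None, -1.0, None
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels
        </preformat>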
        <p>
          Bottom-up approaches, on the other hand, first estimate the pairwise distance of
documents, estimating the scores for authorship links, and then use this information to
form clusters [
          <xref ref-type="bibr" rid="ref22 ref58 ref6">6, 22, 58, 61</xref>
          ]. The distance measure that attempts to capture the stylistic
similarities between documents in some cases is a modification of an authorship
verification approach [
          <xref ref-type="bibr" rid="ref58 ref6">6, 58</xref>
          ]. The effectiveness of the submission of Bagnall [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], especially
in the authorship-link ranking task, indicates that exploiting author verification methods
is a promising direction. Another idea used by Zmiycharov et al. [61] is to transform
the estimation of authorship link scores to a supervised learning task by exploiting the
training dataset. Given that the amount of true authorship links in the training dataset
is very limited in comparison to the amount of false links, this learning task suffers
from the class imbalance problem. Bottom-up approaches do not explicitly estimate
the number of authors in the collection (k) but they form clusters according to certain
criteria. Kocher [
          <xref ref-type="bibr" rid="ref22">22</xref>
] groups texts into one cluster when they are connected by a path of
authorship links with significantly high scores. A very modest strategy is used by
Bagnall [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] which practically forbids clusters with more than two items to be formed. An
interesting discussion on more sophisticated methods is also provided by Bagnall [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
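        <p>The sketch below illustrates the bottom-up idea under simplifying assumptions: document pairs are scored by an arbitrary pairwise similarity function, ranked as candidate authorship links, and grouped into clusters as connected components of the links above a threshold, which is one simple variant of grouping by paths of high-scoring links rather than the exact procedure of any submission:</p>
        <preformat>
from itertools import combinations

def bottom_up_clustering(documents, similarity, threshold):
    """Score every document pair, rank the pairs as candidate authorship links,
    and form clusters as connected components of the links whose score is at
    least `threshold`. `similarity` is any pairwise stylistic similarity."""
    n = len(documents)
    links = sorted(((similarity(documents[i], documents[j]), i, j)
                    for i, j in combinations(range(n), 2)), reverse=True)
    parent = list(range(n))                  # union-find over document indices
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for score, i, j in links:
        if score >= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values()), links    # clusters and the ranked link list
        </preformat>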
        <p>
          With respect to the stylometric features used by participants, there is no significant
novelty. Some approaches are exclusively based on character-level information [
          <xref ref-type="bibr" rid="ref47 ref6">6, 47</xref>
          ],
some on very frequent terms [
          <xref ref-type="bibr" rid="ref22 ref58">22, 58</xref>
          ]. Others attempt to combine traditional features
like sentence length and type-token ratio with word frequencies and syntactic features
like part-of-speech tag frequencies and distributions [
          <xref ref-type="bibr" rid="ref31">31, 61</xref>
          ]. Evaluation results indicate
that approaches that use homogeneous features are more effective. Sari and
Stevenson [
          <xref ref-type="bibr" rid="ref47">47</xref>
          ] report that they also examined word embeddings but finally dropped these
features since preliminary results on the training dataset were not encouraging.</p>
        <p>To be able to estimate the contribution of each submission we provide several
baseline methods. First, a baseline based on random guessing (BASELINE-Random) is
employed where the number of authors in a collection is randomly guessed and each
document is randomly assigned to one author. In addition, an authorship link is
established for any two documents belonging to the same randomly-formed cluster and a
random score is assigned to that link. We provide average scores corresponding to 50
repetitions of this baseline for each clustering problem. This baseline may be seen as
the lower limit of performance in the author clustering task. Moreover, for the
complete author clustering task, we provide a simple baseline that considers all documents
in a given collection as belonging to different authors. Thus, it forms singleton
clusters (BASELINE-Singleton). Such a baseline is very hard to beat when only a few
multi-item clusters exist in a large collection of documents. Actually, it guarantees a
BCubed precision of 1 while BCubed recall depends on the size of clusters. When the
ratio r is high, the BCubed F-score of BASELINE-Singleton will also be high. For
the authorship-link ranking task, we provide another baseline method based on cosine
similarity between documents. In more detail, each document is represented using the
normalized frequencies of all words appearing at least 3 times in the collection (each
clustering problem) and the cosine similarity between any two documents is used to
estimate the score of each possible authorship link. This baseline method
(BASELINE-Cosine) would be affected by topical similarities between documents.</p>
        <p>There are multiple evaluation measures available for clustering tasks. In general, a
clustering evaluation measure can be intrinsic (when the true labels of data are not available)
or extrinsic (when true labels of data are available). Given that the information about the
true authors of the documents is available, our task fits the latter case. Among a variety of
extrinsic clustering evaluation metrics, we opted to use BCubed Precision, Recall, and
the F-score. The latter has been found to satisfy several formal constraints including
cluster homogeneity, cluster completeness, and the rag bag criterion (where multiple
unrelated items are merged into a single cluster) [
          <xref ref-type="bibr" rid="ref2">2</xref>
]. Let di be a document in a
collection (i = 1, ..., N). Let C(di) be the cluster di is put into by a clustering model and
A(di) be the true author of di. Then, given two documents of the collection, di and dj,
a correctness function can be defined as follows:
\[ \mathrm{correct}(d_i, d_j) = \begin{cases} 1 &amp; \text{if } A(d_i) = A(d_j) \wedge C(d_i) = C(d_j) \\ 0 &amp; \text{otherwise.} \end{cases} \]
        </p>
        <p>The BCubed precision of a document di is the proportion of documents in the cluster
of di (including itself) by the same author as di. Moreover, the BCubed recall of di is the
proportion of documents by the author of di that are found in the cluster of di (including
itself). Let Ci be the set of documents in the cluster of di and Ai be the set of documents
in the collection by the author of di. The BCubed precision and recall of di are then defined
as follows:
\[ \mathrm{precision}(d_i) = \frac{\sum_{d_j \in C_i} \mathrm{correct}(d_i, d_j)}{|C_i|}, \qquad
   \mathrm{recall}(d_i) = \frac{\sum_{d_j \in C_i} \mathrm{correct}(d_i, d_j)}{|A_i|}. \]
Finally, the overall BCubed precision and recall for one collection are the averages of the
precision and recall of the documents in the collection, whereas the BCubed F-score is the
harmonic mean of BCubed precision and recall:
\[ \mathrm{BCubed\ precision} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{precision}(d_i), \qquad
   \mathrm{BCubed\ recall} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{recall}(d_i), \]
\[ \mathrm{BCubed\ F} = \frac{2 \cdot \mathrm{BCubed\ precision} \cdot \mathrm{BCubed\ recall}}{\mathrm{BCubed\ precision} + \mathrm{BCubed\ recall}}. \]</p>
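        <p>A compact sketch of these definitions (a hypothetical helper assuming dictionaries that map document ids to predicted cluster labels and to true authors):</p>
        <preformat>
def bcubed_scores(clusters, authors):
    """Compute BCubed precision, recall, and F-score as defined above."""
    docs = list(authors)
    precisions, recalls = [], []
    for di in docs:
        C_i = [d for d in docs if clusters[d] == clusters[di]]   # predicted cluster of di
        A_i = [d for d in docs if authors[d] == authors[di]]     # documents by di's author
        correct = sum(1 for dj in C_i if authors[dj] == authors[di])
        precisions.append(correct / len(C_i))
        recalls.append(correct / len(A_i))
    p = sum(precisions) / len(docs)
    r = sum(recalls) / len(docs)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy example: three documents, two true authors, all placed in one cluster.
print(bcubed_scores(clusters={"d1": 0, "d2": 0, "d3": 0},
                    authors={"d1": "A", "d2": "A", "d3": "B"}))
        </preformat>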
          <p>Regarding the authorship-link ranking task, we use average precision (AP) to
evaluate submissions. This is a standard scalar evaluation measure for ranked retrieval results.
Given a ranked list of authorship links for a document collection, average precision is
the average of non-interpolated precision values at all ranks where true authorship links
were found. Let L be the set of ranked links provided by a submitted system and T the
set of true links for a given collection. If li is the authorship link at the i-th position of L,
then a relevance function, precision at cutoff i in the ranked list, and AP are defined as
follows:
\[ \mathrm{relevant}(i) = \begin{cases} 1 &amp; \text{if } l_i \in T \\ 0 &amp; \text{otherwise,} \end{cases} \qquad
   \mathrm{precision}(i) = \frac{\sum_{j=1}^{i} \mathrm{relevant}(j)}{i}, \qquad
   \mathrm{AP} = \frac{\sum_{i=1}^{|L|} \mathrm{precision}(i) \cdot \mathrm{relevant}(i)}{|T|}. \]</p>
          <p>It is important to note that AP does not punish verbosity, i.e., every true link counts
even if it is at a very low rank. Therefore, by providing all possible authorship links one
can attempt to maximize AP. In order to show how effective a system is in top-ranked
predictions, we also provide R-precision (RP) and P@10, which are defined as follows:
\[ \mathrm{R\text{-}precision} = \frac{\sum_{i=1}^{R} \mathrm{relevant}(i)}{R}, \qquad
   \mathrm{P@10} = \frac{\sum_{i=1}^{10} \mathrm{relevant}(i)}{10}, \]
where R is the number of true authorship links. Focusing on either the top R or the top
10 results, these metrics ignore all other answers.</p>
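          <p>The ranking measures can be computed directly from these definitions; the following hypothetical helper assumes that links are given as unordered document pairs:</p>
          <preformat>
def ranking_scores(ranked_links, true_links, cutoff=10):
    """Average precision, R-precision, and P@10 as defined above."""
    true = {frozenset(l) for l in true_links}
    relevant = [1 if frozenset(l) in true else 0 for l in ranked_links]
    hits, ap = 0, 0.0
    for i, rel in enumerate(relevant, start=1):
        hits += rel
        if rel:
            ap += hits / i          # precision at rank i, counted only at true links
    ap /= len(true)
    R = len(true)
    r_precision = sum(relevant[:R]) / R
    p_at_cutoff = sum(relevant[:cutoff]) / cutoff
    return ap, r_precision, p_at_cutoff
          </preformat>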
          <p>For multiple instances of author clustering problems, mean scores of all the above
measures are used to evaluate the overall performance of submissions in all available
collections. Finally, submissions are ranked according to Mean F-score (MF) and Mean
Average Precision (MAP) for complete author clustering and authorship-link ranking,
respectively.</p>
          <p>[Tables 2-4 (evaluation results of the author clustering task) could not be recovered from the extraction; only the row labels survived. Apparently ranked by overall BCubed F-score: Bagnall, Kocher, BASELINE-Singleton, Sari &amp; Stevenson, Zmiycharov et al., Gobeill, BASELINE-Random, Kuttichira, Mansoorizadeh et al., Vartapetiance &amp; Gillam. Apparently ranked by MAP: Bagnall, Gobeill, BASELINE-Cosine, Kocher, Sari &amp; Stevenson, Vartapetiance &amp; Gillam, Mansoorizadeh et al., Zmiycharov et al., BASELINE-Random, Kuttichira.]</p>
          <p>
            The participants submitted their software to the TIRA experimentation platform
where they were also able to run their
software on training and evaluation datasets [
            <xref ref-type="bibr" rid="ref13 ref39">13, 39</xref>
            ]. We then reviewed the participants’
runs and provided feedback in cases where a software did not complete its run
successfully. Although the participants could examine various versions of their software,
only one run was considered in the final evaluation. Table 2 shows the overall results
for both complete clustering and authorship-link ranking on the evaluation dataset. All
evaluation measures are averaged over the 18 evaluation problems. The runtime of each
submission is also provided.
          </p>
          <p>A more detailed view of the results for the complete clustering task is also provided,
ranking submissions and baselines by their overall F-score, while partial results are also given for the available genres (articles or reviews),
languages (English, Dutch, or Greek), and the (approximate) value of r (0.9, 0.7, or 0.5).</p>
          <p>[Table 5. Left table: number of clusters formed by each participant per evaluation problem;
the number of documents (N) and authors (k) per problem are also given. Right table: number of authorship
links detected by each participant in the evaluation dataset; the number of true links and maximum
links per problem are also given.]</p>
          <p>
            As can be seen, the BASELINE-Singleton method is only narrowly beaten by two
submissions. Both of these submissions were better than BASELINE-Singleton in
handling reviews and Greek documents and, quite predictably, when r is lower than 0.9.
The approach of Bagnall [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] is slightly better than Kocher’s [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ] (overall F-score 0.8223
vs. 0.8218). In terms of efficiency, Kocher’s approach is much faster than
Bagnall’s. In general, when r decreases (i.e., more multi-item clusters are available), the
performance of all submissions is negatively affected.
          </p>
          <p>
            On the other side of the table, there are 3 submissions with overall F-score less than
BASELINE-Random mainly because they failed to accurately predict the number of
clusters in each problem. Table 5 (left) shows the number of clusters formed by each
participant per problem together with the number of documents and number of true
clusters (authors) per problem. As can be seen, the approach of Kuttichira et al. [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ]
always guesses the same number of clusters while the approaches of Mansoorizadeh
et al. [
            <xref ref-type="bibr" rid="ref31">31</xref>
            ] and Vartapetiance and Gillam [
            <xref ref-type="bibr" rid="ref58">58</xref>
            ] tend to predict 20 and 1 clusters per
problem, respectively. On the other hand, the successful approaches of Bagnall [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] and
Kocher [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ] resemble BASELINE-Singleton by being modest in forming clusters with
more than one document.</p>
          <p>
            A similar detailed view is provided for the authorship-link ranking task, where submissions and
baseline methods are ranked by their overall MAP score while partial results for genre,
language, and r value are also given. Roughly half of participants achieve better results
on articles and the other half perform better on reviews. The Greek part seems to be
easier in comparison to the English and Dutch parts. Finally, the performance of all
submissions is improved when r decreases and more authorship links are available.
Only two approaches are better than BASELINE-Cosine. This is surprising given that
this baseline approach is not sophisticated and does not attempt to explore stylistic
information. On the other side of the table, a couple of submissions were less effective
than or very close to BASELINE-Random.
          </p>
          <p>
            Table 5 (right) shows the number of authorship links detected by each submission
per evaluation problem. Some participants chose to report all possible authorship links
attempting to maximize MAP. However, it should be noted that the approaches of
Bagnall [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] and Gobeill [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ], which achieve the best MAP score, are also the winners with
respect to the measures RP and P@10 as shown in Table 2. It is also remarkable that
the submission of Sari and Stevenson [
            <xref ref-type="bibr" rid="ref47">47</xref>
            ] detects only a few authorship links but still
achieves a relatively high P@10 score.
          </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Author Diarization</title>
      <p>
        This section presents the task of author diarization. More specifically, it includes task
definitions and evaluation datasets, a survey of submissions and the baselines, and it
describes the evaluation results in detail.</p>
      <p>
        The author diarization task continues the previous PAN tasks from 2009-2011 on
intrinsic plagiarism detection [
        <xref ref-type="bibr" rid="ref36 ref37 ref43">43, 36, 37</xref>
        ]. As already pointed out, the task is extended and
generalized by introducing within-document author clustering problems. Following the
methodology of intrinsic approaches, any comparison with external sources is
disallowed for all subtasks. In particular, the author diarization task consists of the following
three subtasks:
A) Traditional intrinsic plagiarism detection. Assuming a major author who wrote at
least 70% of a document, the goal of this subtask is to find the remaining text portions written
by one or several others.
      </p>
      <p>B) Diarization with a given number of authors. The basis for this subtask is a document
which has been composed by a known number of authors; the goal is to group the
individual text fragments by author, i.e., to build author clusters.</p>
      <p>C) Unrestricted diarization. As a tightening variant of the previous scenario, the
number of collaborating authors is not given as an input variable for this subtask.
Thus, before and while analyzing and attributing the text, the correct number of
clusters, i.e., writers, also has to be estimated.</p>
      <p>
        To ensure consistency throughout the subtasks and also to emphasize the similarity
between them, Task A also requires the construction of author clusters. In this special case,
exactly two clusters exist: one for the main author and one for the intrusive fragments.
The participants were free to create more than one cluster for the latter, e.g., to create
a cluster for each intrusive text fragment. For all three subtasks, training datasets (see
Section 4.2) were provided in order to allow for adjusting and tuning the developed
algorithms prior to the submission.</p>
      <p>
        For all subtasks, distinct training and test datasets have been provided, which are all
based on the Webis Text Reuse Corpus 2012 (Webis-TRC-12) [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ]. The original corpus
contains essays on 150 topics used at the TREC Web Tracks 2009-2011 (e.g., see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]).
The essays were written by (semi-)professional writers hired via crowdsourcing. For
each essay, a writer was assigned a topic (e.g., “Barack Obama: write about Obama’s
family”), then asked to use the ChatNoir search engine [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ] to retrieve relevant sources
of information, and to compose an essay from the search results, reusing text from the
retrieved web pages. All sources of the resulting document were annotated, so that the
origin of each text fragment is known.
      </p>
      <p>From these documents, assuming that each distinct source represents a different
author, the respective datasets for all subtasks have been randomly generated by
varying several parameters as shown in Table 6. Besides the number of authors and words,
authorship boundary types have also been varied between the sentence and paragraph
level: i.e., authors may switch either after or even within sentences, or only after whole
paragraphs (separated by one or more line breaks). For the diarization datasets, the
authorship distribution has been configured to be either uniformly distributed (each
author contributed approximately the same amount) or randomly distributed (resulting in
contributions like: authors (A, B, C, D) → (94, 3, 2, 1)%). As the original corpus has
already been partly used and published, the test documents are created from previously
unpublished documents only. Table 7 shows statistics of the generated datasets.
      </p>
      <p>4.3 Survey of Submissions</p>
      <p>We received software submissions from two teams, both solving all three subtasks. This
section summarizes the main principles of these approaches.</p>
      <p>
        Approach of Kuznetsov et al. [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. The authors describe an algorithm that
operates on all three subtasks with only slight modifications. At first, the text is split into
sentences, and for each sentence selected stylometric features, including word and
n-gram frequencies (n ∈ {1, 3, 4}), are calculated. A relational frequency is calculated
by comparing the features of the selected sentence with the global document features,
which results in three features for each measure (and each n): the 5%, 50%, and 95%
percentiles. Additionally, features such as sentence length, punctuation symbol counts, and
selected part-of-speech (POS) tag frequencies are calculated. The extracted features
serve as input for a classifier, namely an implementation of Gradient Boosting
Regression Trees [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which outputs a model, i.e., an author style function. For the prediction
of the label of a sentence (plagiarized, non-plagiarized), nearby sentences are also
included in the decision. Finally, outliers are detected by defining a threshold on the
degree of mismatch with the main author’s style, as calculated by the
classifier. To train the classifier, the PAN 2011 dataset for intrinsic plagiarism detection
was used [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ].
      </p>
      <p>
        To solve the diarization task for known numbers of authors, the algorithm is slightly
modified. Instead of finding outliers on a threshold basis, a segmentation is calculated
by using a Hidden Markov Model with Gaussian emissions [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Finally, the number
of authors is estimated to solve Task C by computing segmentations for numbers of authors
ranging from 2 to 20 and measuring the clusterings’ discrepancy.
      </p>
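      <p>A minimal sketch of the diarization step for a known number of authors is given below; it assumes per-sentence stylometric feature vectors and uses hmmlearn’s GaussianHMM as one off-the-shelf implementation of an HMM with Gaussian emissions, which is not necessarily the implementation of [20] used by the authors:</p>
      <preformat>
import numpy as np
from hmmlearn.hmm import GaussianHMM

def segment_by_author(sentence_features, n_authors, random_state=0):
    """Fit an HMM with Gaussian emissions on per-sentence stylometric feature
    vectors and read the most likely hidden state of each sentence as its
    author cluster (sketch for Task B, where n_authors is known)."""
    X = np.asarray(sentence_features, dtype=float)   # shape: (n_sentences, n_features)
    model = GaussianHMM(n_components=n_authors, covariance_type="diag",
                        n_iter=100, random_state=random_state)
    model.fit(X)
    return model.predict(X)                          # one author label per sentence
      </preformat>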
      <p>
        Approach of Sittar et al. [
        <xref ref-type="bibr" rid="ref48">48</xref>
        ]. Reflecting the similarity between all three subtasks,
the submitted algorithm of this team represents an “all-in-one” solution for all three
tasks by calculating clusters. Like the previous approach, it is based on analyzing
features on individual sentences. A total of 15 lexical metrics are extracted, including
character and word counts, average word length, and ratios of digits, letters, or spaces. With
these features, a distance between every pair of sentences is calculated using the
ClustDist metric [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Using these distances, i.e., a feature vector consisting of the distances
to all other sentences, K-Means is applied to generate clusters.
      </p>
      <p>For tackling the respective subtasks, the only modification is the predefined number
of clusters that is given to K-Means. In case of the plagiarism detection task, the number
of clusters is set to two (one for the main author and one for the intrusive authors). For
the diarization tasks, the number of clusters is set to the given number of authors
(Task B) or randomly assigned (Task C). As a final optimization step, the grouping
of sentences has also been evaluated: the distances are not calculated for single sentences
only, but also for sentence groups. The authors report that the best results on the
provided training dataset were achieved by using sentence groups of size 7 (Task A) and 5
(Tasks B and C). Consequently, this configuration has been used for the test dataset as
well.</p>
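      <p>A rough sketch of this all-in-one procedure is shown below; the handful of lexical measurements is an illustrative subset of the 15 metrics mentioned above, and a plain Euclidean distance stands in for the ClustDist metric of [15]:</p>
      <preformat>
import numpy as np
from sklearn.cluster import KMeans

def sentence_features(sentence):
    """A few simple lexical measurements per sentence (illustrative subset)."""
    chars = len(sentence)
    words = sentence.split()
    return np.array([
        chars,
        len(words),
        np.mean([len(w) for w in words]) if words else 0.0,
        sum(c.isdigit() for c in sentence) / max(chars, 1),
        sum(c.isspace() for c in sentence) / max(chars, 1),
    ])

def diarize(sentences, n_clusters, random_state=0):
    """Represent each sentence by its vector of distances to every other
    sentence and group these vectors with k-means."""
    F = np.stack([sentence_features(s) for s in sentences])
    D = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)   # pairwise distances
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=random_state).fit_predict(D)
      </preformat>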
      <p>[Figure 2: Syntax trees (parse trees) of the example sentences S1 and S2.]</p>
      <p>
To quantify the performance of the submitted approaches, baselines for each subtask
have been computed. The baseline for Task A is the PQPlagInn approach [
        <xref ref-type="bibr" rid="ref56">56</xref>
        ]. It is the
best working variant of the PlagInn algorithm [
        <xref ref-type="bibr" rid="ref55">55</xref>
        ] and operates solely on the grammar
syntax of authors. The main idea is that authors differ in their way of constructing
sentences, and that these differences can be used as style markers to identify plagiarism.
For example, Figure 2 shows the syntax trees (parse trees) of the Einstein quote
“Insanity: doing the same thing over and over again and expecting different results” (S1)
and the slightly modified version “It is insane to expect different results when doing the
same thing over and over again” (S2). It can be seen that the trees differ significantly,
although the semantic meaning is the same. To quantify such differences of parse trees,
the concept of pq-grams is used [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In a nutshell, pq-grams can be seen as “n-grams
for trees” since they represent structural parts of the tree, where p defines how many
nodes are included vertically, and q defines the number of nodes to be considered
horizontally. The set of possible pq-grams serve as the feature vectors that are compared
with the global document’s features by using sliding windows and a selected distance
metric. Finally, suspicious sentences are found using (several) thresholds and applying
a filtering/grouping algorithm [
        <xref ref-type="bibr" rid="ref55">55</xref>
        ]. As a baseline for Task A, the PQPlagInn algorithm
is used in an unoptimized version, i.e., optimized for the PAN 2011 dataset [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] and not
considering the specifications of the current dataset. For example, the facts that the main
author’s contribution is at least 70% and that it can implicitly be assumed that no
document is plagiarism-free are disregarded. Thus, the performance of PQPlagInn should
give a stable orientation, but still provide room for improvement.
      </p>
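      <p>To illustrate the pq-gram concept, the following sketch enumerates the pq-grams of a parse tree given as nested (label, children) tuples; it is an illustrative re-implementation of the general definition in [4], not the PQPlagInn code:</p>
      <preformat>
def pq_grams(tree, p=2, q=3, star="*"):
    """Enumerate the pq-grams of a tree: each pq-gram combines the labels of p
    ancestors (including the current node) with a window of q consecutive
    sibling labels, padded with a dummy symbol at the borders."""
    grams = []
    def visit(node, ancestors):
        label, children = node
        anc = (ancestors + [label])[-p:]
        stem = [star] * (p - len(anc)) + anc
        if not children:
            grams.append(tuple(stem + [star] * q))
            return
        window = [star] * (q - 1) + [c[0] for c in children] + [star] * (q - 1)
        for i in range(len(children) + q - 1):
            grams.append(tuple(stem + window[i:i + q]))
        for child in children:
            visit(child, anc)
    visit(tree, [])
    return grams

# Tiny example: pq-gram profile of a miniature parse tree.
toy = ("S", [("NP", [("PRP", [])]), ("VP", [("VBZ", []), ("ADJP", [])])])
profile = pq_grams(toy)
      </preformat>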
      <p>
        As author diarization is tackled for the first time at PAN 2016 and there exist, to the best
of our knowledge, no comparable algorithms for this specific task, a random baseline
has been created for Tasks B and C. This has been done by dividing the document into
n parts of equal length, and assigning each part to a different author. Here, n is set to
the exact number of authors for Task B, and randomly chosen for Task C.</p>
      <p>
        The participants submitted their approaches as executable software to TIRA [
        <xref ref-type="bibr" rid="ref13 ref39">13, 39</xref>
        ],
where they were executed against the test datasets hosted there. Performances on the
provided training data were visible immediately to participants, whereas results on the
test data were revealed only after the submission deadline. This section details the
evaluation results of the submitted approaches for each subtask.
      </p>
      <p>
        Task A: Intrinsic Plagiarism Detection Results The performance of the intrinsic
plagiarism detection subtask has been measured with the metrics proposed by
Potthast et al. [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ]. The proposed micro-averaged variants of the metrics incorporate the
length of each plagiarized section, whereas the macro-averaged variants do not. To
illustrate the difference, consider the following situation: a document contains two
plagiarized sections, but the second one is very short compared to the first one. If an
algorithm finds 100% of the first section, but misses the second one, the macro-F would be
only 50%, whereas the micro-F can easily exceed 90% (depending on section lengths).
Favoring coverage of different sources and conforming with the previous PAN
plagiarism detection tasks, the final ranking is based on the macro-averaged scores.
      </p>
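      <p>The effect of the two averaging schemes on the example above can be made explicit with a small calculation; the sketch below considers recall only and ignores the other components of the full measures of [42]:</p>
      <preformat>
def recall_micro_macro(case_lengths, detected_lengths):
    """`case_lengths` are the lengths of the true plagiarized sections,
    `detected_lengths` the correctly detected portion of each section."""
    micro = sum(detected_lengths) / sum(case_lengths)                        # length-weighted
    macro = sum(d / c for d, c in zip(detected_lengths, case_lengths)) / len(case_lengths)
    return micro, macro

# Two plagiarized sections: the long one fully found, the short one missed.
print(recall_micro_macro(case_lengths=[1000, 50], detected_lengths=[1000, 0]))
# micro recall stays around 0.95, macro recall drops to 0.5
      </preformat>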
      <p>
        Table 8 shows the final results. Kuznetsov et al. [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] exceed the baseline,
achieving a macro-F of 0.17. Interestingly, the approach of Sittar et al. [
        <xref ref-type="bibr" rid="ref48">48</xref>
        ] achieves better
macro-averaged scores than micro-averaged ones, whereas this is the other way around
for the other approaches which is closer to our expectation. In what follows, detailed
analyses depending on different dataset parameters of the test datasets are presented.
Figure 3 (top left) shows the F-scores over ranges of documents lengths in terms of
number of words. While there are no notable differences for documents with less than
2000 words, Kuznetsov et al. achieve a peak performance of over 0.4 for longer
documents (2000-2500 words). The results with respect to the authorship border types are
depicted in Figure 3 (bottom left). As expected, all approaches perform better on
documents having intrusive sections only on paragraph boundaries, and not between or
even within sentences. In Figure 3 (top right), the results per percentage of intrusive
text is shown. With the exception of the two documents containing 20-25% intrusive
text—which are obviously difficult to identify—the chart reveals a steady increase
of performance with the percentage of intrusive text. For documents with a very high
percentage of intrusive text, Kuznetsov et al. achieved an F-score of nearly 0.6, and
also the other approaches perform best on those documents. These performances
correspond to the diarization results presented later, and emphasize once again that the
latter tasks can also be seen as diarization tasks, e.g., with authorship contributions like
(A, B) → (70, 30)%. As can be seen in Figure 3 (bottom right), the performances of
Sittar et al. and the baseline do not change significantly with the number of intrusive
sections.</p>
      <p>[Figure 3: Macro-F scores of the approaches on Task A, broken down by document length (top left), percentage of intrusive text (top right), authorship border type (bottom left), and number of intrusive sections (bottom right).]</p>
      <p>Solely Kuznetsov et al. achieves a peak performance on the three documents
having seven or eight intrusive sections. Finally, Table 9 shows the three best results on
individual problem instances. Kuznetsov et al. achieve a top performance of 0.94 with
perfect precision. Sittar et al. reach an F-score of 0.58 on the problem-9 document,
which is also among the best three for all approaches.</p>
      <p>
        Tasks B and C: Diarization Results The diarization subtasks have been measured
with the BCubed clustering metrics [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], as they reflect the clustering nature on the one
hand, and are also used for the evaluations of the PAN 2016 across-document clustering
problems on the other hand (see Section 3). Table 10 shows the respective results for the
Tasks B and C. They reveal that the tasks are hard to tackle, as none of the participants
surpasses the random baseline, neither for Task B nor C. Nevertheless, Kuznetsov et al.
achieves the highest precision for both subtasks, meaning that characters grouped
together really belong together (with an accuracy significantly beyond random guessing).
      </p>
      <p>As expected, the results for an unknown number of authors (Task C) are slightly
below the results where the exact number of authors are known beforehand (Task B).
An exception is Sittar et al.’s approach, whose results are the opposite of the
expectation. To investigate possible upsides and downsides of the approaches, detailed
evaluations depending on parameters of the test datasets have been conducted. Figure 4
(top) shows the performance scores with respect to document length. While
performances for Task B are quite stable, it can be seen that they decrease with the
number of words when the number of authors had to be estimated. The results with
respect to the number of corresponding authors are presented in Figure 4 (middle). Here
also the scores follow no recognizable pattern for Task B, but become lower with the
number of contributing authors in Task C. Remarkably, Kuznetsov et al.
significantly exceeds the baseline for documents with two to four authors. As depicted in
Figure 4 (bottom), Tasks B and C reveal similar results depending on the
distribution of the corresponding authors. The results for randomly distributed contributions
(e.g., (A, B, C) → (80, 7, 13)%) are generally better than those for uniformly
distributed contributions (e.g., (A, B, C) → (33, 32, 35)%). An explanation for this
outcome may be that the submitted approaches are designed for, or originate from intrinsic
plagiarism detection, focusing on finding outliers. In case of the diarization problems,
this seems not to be a good choice, especially when the contributions among authors are
equally distributed, i.e., when there are no “outliers”. Finally, also for these subtasks,
the borders between authorships have been altered, i.e., either within sentences, at the
end of sentences, or after paragraphs only. In contrast to Task A, there were no
significant differences in performances for Tasks B and C with respect to this parameter.
      </p>
      <p>[Table 10 (BCubed results of the baseline, Kuznetsov et al., and Sittar et al. for a known (Task B) and unknown (Task C) number of authors) and Figure 4 (scores by document length, number of authors per document, and authorship distribution) could not be recovered from the extraction.]</p>
      <p>Although the baseline could not be exceeded on the whole dataset, Table 11 underlines
that the approaches nevertheless produce very good results on individual instances. On
problem-12, Kuznetsov et al. achieve a BCubed F-score of 0.88 for Task B.
Remarkably, the score of a document containing ten different authors is among the best
three, with an F-score of 0.78 at a precision of 0.93. The best result of Sittar et al. is on
problem-13, with a BCubed F-score of 0.61 on Task C, the presumably more difficult subtask.</p>
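      <p>The random baseline referred to above can be read as assigning each sentence of a document to one of the known (or guessed) authors uniformly at random. The Python sketch below is only one plausible form of such a baseline and is an assumption on our part, not the official PAN 2016 implementation:</p>
      <preformat>
import random

def random_diarization_baseline(sentences, num_authors, seed=0):
    # Hypothetical sketch of a random baseline: every sentence is
    # assigned to one of num_authors authors uniformly at random.
    rng = random.Random(seed)
    labels = [rng.randrange(num_authors) for _ in sentences]
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    return clusters
      </preformat>
      <p>Outperforming such a baseline in BCubed terms requires the predicted groups to align with the true authorial contributions more often than chance, which, as discussed above, neither submission achieved over the whole dataset.</p>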
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>For the first time, PAN 2016 focused on unsupervised authorship attribution, an
underexplored line of research that is associated with important applications. Two main
problems were studied: clustering by authorship across documents and clustering by
authorship within documents. In general, these are quite challenging tasks that are hard to
model, and the performance of the submitted approaches, which in many cases is very close to or
below that of simple baseline methods, indicates that there is considerable room for improvement.</p>
      <p>
        The author clustering task introduced authorship-link ranking as a separate retrieval
problem in unsupervised authorship attribution. This problem is useful when huge
document collections are available and the main task is to help human experts closely
examine specific cases, namely the most probable authorship links. The best results were achieved
by a modification of the winning approach of the PAN 2015 authorship verification
task [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This indicates that authorship verification and author clustering are strongly
related tasks and that the expertise gained in one field can help provide reliable solutions
to the other. Moreover, we introduced the ratio r, which reflects both the quantity of
authorship links and the number of single-item clusters. It has been shown that when
r is high, a naive baseline approach that assigns each document to a separate cluster is
hard to beat. It is expected that if a method is able to estimate the r value in a given
collection, then it is more likely to provide reliable answers. The author clustering problem
can become even more challenging if we drop the assumption that all documents within
a collection belong to the same genre. In that case, it would be extremely difficult to
separate stylistic similarities and differences caused by genre from those caused by the
personal style of the authors. In addition, in most of the clustering problems provided at PAN 2016,
the documents within a collection fall into the same general thematic area. Although the
specific topics of the documents differ, it would be even more challenging if the thematic
area of the documents varied as well.
      </p>
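      <p>To see why this baseline is hard to beat when r is high: assigning every document to its own cluster always attains a BCubed precision of 1, so its F-score is governed entirely by recall, which approaches 1 as the true clusters shrink towards single documents. A minimal sketch of this baseline follows; the output format is hypothetical and chosen only for illustration:</p>
      <preformat>
def singleton_baseline(document_ids):
    # Naive baseline: each document forms its own cluster and no
    # authorship links are reported (an empty ranking).
    clustering = [[doc] for doc in document_ids]
    links = []
    return clustering, links

# Under the bcubed() sketch above, this baseline's precision is
# always 1.0, since every predicted cluster contains only the item
# itself; a submission can therefore only win on recall, which is
# hardest exactly when most true clusters hold a single document.
      </preformat>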
      <p>The author diarization task focused on the problem of clustering by authorship
within documents. A traditional intrinsic plagiarism detection subtask was used as an
entry point, keeping up with previous PAN events. Moreover, to generalize the problem,
designated subtasks have been added that deal with the decomposition of multi-author
documents, where the number of corresponding authors was either given or had to be
estimated. Both submitted approaches tackle all subtasks and rely on an analysis at the
sentence level, extracting lexical and syntactic features and feeding them into different
machine learning techniques. One of the approaches exceeds the baseline for intrinsic
plagiarism detection, whereas the random baseline for the novel subtasks focusing on
clustering text by authors could not be outperformed. The results of the diarization
task underline once again that intrinsic plagiarism detection represents a difficult
problem, and that clustering by authorship within documents seems to be even harder. A
possible explanation of the low scores for the latter problem is that the approaches only
modify intrinsic plagiarism detection algorithms. It can be assumed that by tailoring
algorithms to author clustering within documents, results can be improved significantly.
Moreover, as the author diarization task was held for the first time at PAN 2016 and received
only two submissions, future PAN labs may attract more participants who help
narrow the gap.
      </p>
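      <p>To make the described sentence-level strategy concrete, a toy pipeline might vectorize each sentence and cluster the sentences into the given number of authors. The sketch below uses character trigram tf-idf features and k-means and assumes scikit-learn is available; it is an illustration only, not a reconstruction of either participant's system:</p>
      <preformat>
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def diarize(sentences, num_authors):
    # Toy sketch of sentence-level author clustering in the Task B
    # setting, where the number of authors is known in advance.
    features = TfidfVectorizer(analyzer="char", ngram_range=(3, 3)).fit_transform(sentences)
    labels = KMeans(n_clusters=num_authors, n_init=10, random_state=0).fit_predict(features)
    # Group sentence indices by the author cluster they were assigned to.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    return clusters
      </preformat>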
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Almishari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsudik</surname>
          </string-name>
          , G.:
          <article-title>Exploring linkability of user reviews</article-title>
          .
          <source>In: Computer Security, ESORICS</source>
          <year>2012</year>
          , pp.
          <fpage>307</fpage>
          -
          <lpage>324</lpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Amigó</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artiles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verdejo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A comparison of extrinsic clustering evaluation metrics based on formal constraints</article-title>
          .
          <source>Information Retrieval</source>
          <volume>12</volume>
          (
          <issue>4</issue>
          ),
          <fpage>461</fpage>
          -
          <lpage>486</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Amigó</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artiles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verdejo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A comparison of extrinsic clustering evaluation metrics based on formal constraints</article-title>
          .
          <source>Information retrieval 12</source>
          (
          <issue>4</issue>
          ),
          <fpage>461</fpage>
          -
          <lpage>486</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Augsten</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Böhlen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gamper</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The pq-Gram Distance between Ordered Labeled Trees</article-title>
          .
          <source>ACM Transactions on Database Systems (TODS)</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Baayen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Halteren</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tweedie</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Outside the cave of shadows: Using syntactic annotation to enhance authorship attribution</article-title>
          .
          <source>Literary and Linguistic Computing</source>
          <volume>11</volume>
          (
          <issue>3</issue>
          ),
          <fpage>121</fpage>
          -
          <lpage>132</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Bagnall</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Authorship Clustering Using Multi-headed Recurrent Neural Networks</article-title>
          .
          <source>In: CLEF 2016 Working Notes. CEUR Workshop Proceedings, CLEF and CEUR-WS.org</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>F.Y.</given-names>
          </string-name>
          :
          <article-title>Advances in Domain Independent Linear Text Segmentation</article-title>
          . In:
          <article-title>Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference</article-title>
          . pp.
          <fpage>26</fpage>
          -
          <lpage>33</lpage>
          . Association for Computational Linguistics (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Craswell</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soboroff</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.M.:</given-names>
          </string-name>
          <article-title>Overview of the TREC 2009 web track</article-title>
          .
          <source>Tech. rep., DTIC Document</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>Greedy function approximation: a gradient boosting machine</article-title>
          . Annals of statistics pp.
          <fpage>1189</fpage>
          -
          <lpage>1232</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Giannella</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>An improved algorithm for unsupervised decomposition of a multi-author document</article-title>
          .
          <source>Technical Papers</source>
          ,
          <source>The MITRE Corporation (February</source>
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Glover</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hirst</surname>
          </string-name>
          , G.:
          <article-title>Detecting stylistic inconsistencies in collaborative writing</article-title>
          .
          <source>In: The New Writing Environment</source>
          , pp.
          <fpage>147</fpage>
          -
          <lpage>168</lpage>
          . Springer (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Gobeill</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Submission to the Author Clustering Task at PAN-2016</article-title>
          . http://www.uni-weimar.de/medien/webis/events/pan-16 (
          <year>2016</year>
          ), HES-SO
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burrows</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          : Ousting Ivory Tower Research:
          <article-title>Towards a Web Framework for Providing Experiments as a Service</article-title>
          . In: Hersh,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Callan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Maarek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Sanderson</surname>
          </string-name>
          , M. (eds.) 35th
          <source>International ACM Conference on Research and Development in Information Retrieval (SIGIR 12)</source>
          . pp.
          <fpage>1125</fpage>
          -
          <lpage>1126</lpage>
          . ACM (Aug
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Graham</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hirst</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marthi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Segmenting documents by stylistic character</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>11</volume>
          (
          <issue>04</issue>
          ),
          <fpage>397</fpage>
          -
          <lpage>415</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Guthrie</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Unsupervised Detection of Anomalous Text</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Sheffield (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          :
          <article-title>The evolution of stylometry in humanities scholarship</article-title>
          .
          <source>Literary and Linguistic Computing</source>
          <volume>13</volume>
          (
          <issue>3</issue>
          ),
          <fpage>111</fpage>
          -
          <lpage>117</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Holmes</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forsyth</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>The federalist revisited: New directions in authorship attribution</article-title>
          .
          <source>Literary and Linguistic Computing</source>
          <volume>10</volume>
          (
          <issue>2</issue>
          ),
          <fpage>111</fpage>
          -
          <lpage>127</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Iqbal</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Binsalleeh</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fung</surname>
            ,
            <given-names>B.C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debbabi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Mining writeprints from anonymous e-mails for forensic investigation</article-title>
          .
          <source>Digital Investigation</source>
          <volume>7</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>56</fpage>
          -
          <lpage>64</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Juola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : Authorship Attribution.
          <source>Foundations and Trends in Information Retrieval</source>
          <volume>1</volume>
          ,
          <fpage>234</fpage>
          -
          <lpage>334</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Keogh</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hart</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pazzani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Segmenting time series: A survey and novel approach</article-title>
          .
          <source>Data mining in time series databases 57</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luyckx</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Intrinsic Plagiarism Detection Using Character Trigram Distance Scores</article-title>
          .
          <source>In: Notebook Papers of the 5th Evaluation Lab on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN)</source>
          . Amsterdam, The Netherlands (
          <year>September 2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Kocher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : UniNE at CLEF 2016:
          <article-title>Author Clustering</article-title>
          .
          <source>In: CLEF 2016 Working Notes. CEUR Workshop Proceedings, CLEF and CEUR-WS.org</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akiva</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dershowitz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dershowitz</surname>
          </string-name>
          , N.:
          <article-title>Unsupervised decomposition of a document into authorial components</article-title>
          . In: Lin,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Matsumoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <string-name>
            <surname>R</surname>
          </string-name>
          . (eds.)
          <article-title>Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics</article-title>
          . pp.
          <fpage>1356</fpage>
          -
          <lpage>1364</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Argamon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Computational methods in authorship attribution</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>60</volume>
          (
          <issue>1</issue>
          ),
          <fpage>9</fpage>
          -
          <lpage>26</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winter</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Determining if two documents are written by the same author</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>65</volume>
          (
          <issue>1</issue>
          ),
          <fpage>178</fpage>
          -
          <lpage>187</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Kuttichira</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krishnan</surname>
            ,
            <given-names>K.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pooja</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahalakshmi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Submission to the Author Clustering Task at PAN-2016</article-title>
          . http://www.uni-weimar.de/medien/webis/events/pan-16 (
          <year>2016</year>
          ), Amrita Vishwa Vidyapeetham
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motrenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsova</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strijov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Methods for Intrinsic Plagiarism Detection and Author Diarization</article-title>
          .
          <source>In: Working Notes Papers of the CLEF 2016 Evaluation Labs. CEUR Workshop Proceedings, CLEF and CEUR-WS.org (Sep</source>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Layton</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watters</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dazeley</surname>
          </string-name>
          , R.:
          <article-title>Automated unsupervised authorship analysis using evidence accumulation clustering</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>19</volume>
          ,
          <fpage>95</fpage>
          -
          <lpage>120</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Layton</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watters</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dazeley</surname>
          </string-name>
          , R.:
          <article-title>Evaluating authorship distance methods using the positive silhouette coefficient</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>19</volume>
          ,
          <fpage>517</fpage>
          -
          <lpage>535</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Luyckx</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoutte</surname>
          </string-name>
          , E.:
          <article-title>Stylogenetics: Clustering based stylistic analysis of literary corpora</article-title>
          .
          <source>In: Workshop Toward Computational Models of Literary Analysis</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Mansoorizadeh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aminiyan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahguy</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eskandari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>: Multi Feature Space Combination for Authorship Clustering</article-title>
          .
          <source>In: CLEF 2016 Working Notes. CEUR Workshop Proceedings, CLEF and CEUR-WS.org</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Miro</surname>
            ,
            <given-names>X.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bozonnet</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fredouille</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedland</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Speaker diarization: A review of recent research</article-title>
          . Audio, Speech, and Language Processing,
          <source>IEEE Transactions on 20(2)</source>
          ,
          <fpage>356</fpage>
          -
          <lpage>370</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Niezgoda</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Way</surname>
            ,
            <given-names>T.P.</given-names>
          </string-name>
          :
          <article-title>Snitch: A software tool for detecting cut and paste plagiarism</article-title>
          .
          <source>In: Proceedings of the 37th Technical Symposium on Computer Science Education (SIGCSE)</source>
          . pp.
          <fpage>51</fpage>
          -
          <lpage>55</lpage>
          . ACM, Houston, Texas, USA (March
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Oberreuter</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>L'Huillier</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ríos</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velásquez</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Approaches for Intrinsic and External Plagiarism Detection</article-title>
          .
          <source>In: Notebook Papers of the 5th Evaluation Lab on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN)</source>
          . Amsterdam, The Netherlands (
          <year>September 2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Ponte</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
          </string-name>
          , W.B.:
          <article-title>Text Segmentation by Topic</article-title>
          .
          <source>In: Research and Advanced Technology for Digital Libraries</source>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>125</lpage>
          . Springer (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiselt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the 2nd International Competition on Plagiarism Detection</article-title>
          . In: Braschler,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Harman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Pianta</surname>
          </string-name>
          , E. (eds.)
          <source>Working Notes Papers of the CLEF 2010 Evaluation Labs (Sep</source>
          <year>2010</year>
          ), http://www.clef-initiative.eu/publication/working-notes
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiselt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the 3rd International Competition on Plagiarism Detection</article-title>
          .
          <source>In: Notebook Papers of the 5th Evaluation Lab on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN)</source>
          . Amsterdam, The Netherlands (
          <year>September 2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiesel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oberländer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tippmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeno</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , et al.:
          <article-title>Overview of the 5th international competition on plagiarism detection</article-title>
          .
          <source>In: Notebook Papers of the 9th Evaluation Lab on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN)</source>
          . Valencia, Spain
          (
          <year>September 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Improving the Reproducibility of PAN's Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling</article-title>
          . In: Kanoulas,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Sanderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Toms</surname>
          </string-name>
          , E. (eds.)
          <article-title>Information Access Evaluation meets Multilinguality, Multimodality, and Visualization</article-title>
          .
          <source>5th International Conference of the CLEF Initiative (CLEF 14)</source>
          . pp.
          <fpage>268</fpage>
          -
          <lpage>299</lpage>
          . Springer, Berlin Heidelberg New York (
          <year>Sep 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graßegger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tippmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welsch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>ChatNoir: A Search Engine for the ClueWeb09 Corpus</article-title>
          . In: Hersh,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Callan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Maarek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Sanderson</surname>
          </string-name>
          , M. (eds.) 35th
          <source>International ACM Conference on Research and Development in Information Retrieval (SIGIR 12)</source>
          . p.
          <fpage>1004</fpage>
          . ACM
          (Aug
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Völske</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Crowdsourcing Interaction Logs to Understand Text Reuse from the Web</article-title>
          . In: Fung,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Poesio</surname>
          </string-name>
          , M. (eds.)
          <article-title>Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 13)</article-title>
          . pp.
          <fpage>1212</fpage>
          -
          <lpage>1221</lpage>
          . Association for Computational Linguistics (
          <year>Aug 2013</year>
          ), http://www.aclweb.org/anthology/P13-1119
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>An evaluation framework for plagiarism detection</article-title>
          .
          <source>In: Proceedings of the 23rd international conference on computational linguistics: Posters</source>
          . pp.
          <fpage>997</fpage>
          -
          <lpage>1005</lpage>
          . Association for Computational Linguistics (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiselt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the 1st International Competition on Plagiarism Detection</article-title>
          . In: Stein,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Koppel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Agirre</surname>
          </string-name>
          , E. (eds.) SEPLN 09 Workshop on Uncovering Plagiarism, Authorship, and
          <source>Social Software Misuse (PAN 09)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . CEUR-WS.org (Sep
          <year>2009</year>
          ), http://ceur-ws.org/Vol-502
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <surname>Reynar</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          :
          <article-title>Topic segmentation: Algorithms and applications</article-title>
          .
          <source>IRCS Technical Reports Series</source>
          p.
          <volume>66</volume>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <surname>Reynar</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          :
          <article-title>Statistical Models for Topic Segmentation</article-title>
          .
          <source>In: Proc. of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics</source>
          . pp.
          <fpage>357</fpage>
          -
          <lpage>364</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <surname>Samdani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>A discriminative latent variable model for online clustering</article-title>
          .
          <source>In: Proceedings of The 31st International Conference on Machine Learning</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <surname>Sari</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevenson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Exploring Word Embeddings and Character N-Grams for Author Clustering</article-title>
          .
          <source>In: CLEF 2016 Working Notes. CEUR Workshop Proceedings, CLEF and CEUR-WS.org</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <surname>Sittar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iqbal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nawab</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Author Diarization Using Cluster-Distance Approach</article-title>
          .
          <source>In: Working Notes Papers of the CLEF 2016 Evaluation Labs. CEUR Workshop Proceedings, CLEF and CEUR-WS.org (Sep</source>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A Survey of Modern Authorship Attribution Methods</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>60</volume>
          ,
          <fpage>538</fpage>
          -
          <lpage>556</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Intrinsic Plagiarism Detection Using Character n-gram Profiles</article-title>
          .
          <source>In: Notebook Papers of the 5th Evaluation Lab on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN)</source>
          . Amsterdam, The Netherlands (
          <year>September 2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Juola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>López-López</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the author identification task at PAN 2015</article-title>
          . In: Working Notes of CLEF 2015 -
          <article-title>Conference and Labs of the Evaluation forum</article-title>
          , Toulouse, France, September 8-
          <issue>11</issue>
          ,
          <year>2015</year>
          . (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Juola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sánchez-Pérez</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Overview of the author identification task at PAN 2014</article-title>
          .
          <source>In: Working Notes for CLEF 2014 Conference</source>
          , Sheffield, UK, September 15-18, 2014
          . pp.
          <fpage>877</fpage>
          -
          <lpage>897</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lipka</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Intrinsic plagiarism analysis</article-title>
          .
          <source>Language Resources and Evaluation</source>
          <volume>45</volume>
          (
          <issue>1</issue>
          ),
          <fpage>63</fpage>
          -
          <lpage>82</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [54]
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Countering Plagiarism by Exposing Irregularities in Authors' Grammar</article-title>
          .
          <source>In: Proceedings of the European Intelligence and Security Informatics Conference (EISIC)</source>
          . pp.
          <fpage>15</fpage>
          -
          <lpage>22</lpage>
          . IEEE, Uppsala, Sweden (
          <year>August 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [55]
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Detecting Plagiarism in Text Documents Through Grammar-Analysis of Authors</article-title>
          .
          <source>In: Proceedings of the 15th Fachtagung des GI-Fachbereichs Datenbanksysteme für Business, Technologie und Web (BTW)</source>
          . pp.
          <fpage>241</fpage>
          -
          <lpage>259</lpage>
          . LNI, GI, Magdeburg, Germany (March
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [56]
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Using Grammar-Profiles to Intrinsically Expose Plagiarism in Text Documents</article-title>
          .
          <source>In: Proc. of the 18th Conf. of Natural Language Processing and Information Systems (NLDB)</source>
          . pp.
          <fpage>297</fpage>
          -
          <lpage>302</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [57]
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Automatic decomposition of multi-author documents using grammar analysis</article-title>
          .
          <source>In: Proceedings of the 26th GI-Workshop on Grundlagen von Datenbanken. CEUR-WS</source>
          , Bozen, Italy (
          <year>October 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          [58]
          <string-name>
            <surname>Vartapetiance</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gillam</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>A Big Increase in Known Unknowns: from Author Verification to Author Clustering</article-title>
          .
          <source>In: CLEF 2016 Working Notes. CEUR Workshop Proceedings, CLEF and CEUR-WS.org</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>