<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UniNE at CLEF 2016: Author Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mirco Kocher</string-name>
          <email>Mirco.Kocher@unine.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Neuchâtel</institution>
          <addr-line>rue Emile Argand 11 2000 Neuchâtel</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>This paper describes and evaluates an effective unsupervised author clustering and authorship linking model called SPATIUM-L1. The suggested strategy can be adapted without difficulty to different languages (such as Dutch, English, and Greek) and different genres (e.g., newspaper articles and reviews). As features, we suggest using the m most frequent terms of each text (isolated words and punctuation symbols, with m at most 200). Applying a simple distance measure, we determine whether there is enough evidence that two texts were written by the same author. The evaluations are based on six test collections (PAN AUTHOR CLUSTERING task at CLEF 2016).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The authorship attribution problem is of interest in computational
linguistics but also in applied areas such as criminal investigation and historical studies,
where knowing the author of a document (such as a ransom note) may help save
lives. With Web 2.0 technologies, the number of anonymous or pseudonymous
texts is increasing, and in many cases one person writes in different places about
different topics (e.g., multiple blog posts written by the same author). Therefore,
an effective algorithm for the authorship problem is of real interest. In
this case, the system must regroup all texts by the same author (possibly written in
different genres) into the same group or cluster. A justification supporting the proposed
answer and a probability that the given answer is correct can be provided to increase the
confidence attached to the response
        <xref ref-type="bibr" rid="ref8">(Savoy, 2016)</xref>
        .
      </p>
      <p>This author clustering task is more demanding than the classical authorship
attribution problem. Given a document collection, the task is to group documents
written by the same author such that each cluster corresponds to a different author. The
number of distinct authors whose documents are included is not given. This task can
also be viewed as establishing authorship links between documents and is related to the
PAN 2015 task of authorship verification.</p>
      <p>This paper is organized as follows. The next section presents the test collections and
the evaluation methodology used in the experiments. The third section explains our
proposed algorithm called SPATIUM-L1. In the fourth section, we evaluate the proposed
scheme and compare it to the best performing schemes using six different test
collections. A conclusion summarizes the main findings of this study.</p>
      <p>
        To evaluate the effectiveness of a clustering algorithm, the number of tests must be
large and run on a common test set. To create such benchmarks, and to promote studies
in this domain, the PAN CLEF evaluation campaign was launched
        <xref ref-type="bibr" rid="ref11">(Stamatatos et al.,
2016)</xref>
        . Multiple research groups with different backgrounds from around the world
have participated in the PAN CLEF 2016 campaign. Each team has proposed a
clustering strategy that has been evaluated using the same methodology. The
evaluation was performed using the TIRA platform, which is an automated tool for
deployment and evaluation of the software
        <xref ref-type="bibr" rid="ref4">(Gollub et al., 2012)</xref>
        . Data access is
restricted such that during a software run the system is encapsulated, thus ensuring
that there is no data leakage back to the task participants
        <xref ref-type="bibr" rid="ref7">(Potthast et al., 2014)</xref>
        . This
evaluation procedure also offers a fair evaluation of the time needed to produce an
answer.
      </p>
      <p>During the PAN CLEF 2016 evaluation campaign, six collections were built, each
containing six problems (three for training and three for testing). In each problem, all the texts are in the
same language and the same genre and are single-authored, but they may differ in
length and can be cross-topic. The number of distinct authors is not given. In this
context, a problem is defined as:</p>
      <p>Given a collection of up to 100 documents, identify authorship
links and groups of documents by the same author.</p>
      <p>The six collections are a combination of one of three languages (English, Dutch, or
Greek) and one of two genres (newspaper articles or reviews). An overview of these
collections is given in Table 1. The training set is used to evaluate our approach
and the test set is used to compare our results with those of the
PAN CLEF 2016 campaign.</p>
      <sec id="sec-1-1">
        <title>Corpus</title>
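        <list list-type="simple">
          <list-item><p>English Newspaper</p></list-item>
          <list-item><p>English Reviews</p></list-item>
          <list-item><p>Dutch Newspaper</p></list-item>
          <list-item><p>Dutch Reviews</p></list-item>
          <list-item><p>Greek Newspaper</p></list-item>
          <list-item><p>Greek Reviews</p></list-item>
        </list>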
        <p>For each benchmark we have three problems in the training dataset containing the
same number of texts, with the exact number given under the label
“Texts”. The number of distinct authors for each problem is indicated in the column
“Authors”, and the number of authors with only a single document under the label
“Single”. For example, with the English newspaper collection (training set), 50 texts
are written by 35 authors and in this text subset we can find 27 authors who wrote only
a single article. These figures are not available for the test corpora because the
datasets remained undisclosed thanks to the TIRA system. We only know that the same
combinations of language and genre are present.</p>
        <p>When inspecting the training collection of Dutch reviews, the number of words
available is rather small (on average 130 words per document). Overall, there are
many authors who only wrote a single text, so the number of authors per problem is
rather large. This means we should only cluster two documents if there is enough
evidence of a single authorship.</p>
        <p>During the PAN CLEF 2016 campaign, a system must return two outputs in a JSON
structure. First, the detected groups have to be written to a file indicating the author
clustering. Each document has to belong to exactly one cluster, thus the clusters have
to be non-overlapping. Second, a list of document pairs with a probability of having
the same author has to be written to another file representing the authorship links.</p>
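        <p>As a minimal sketch of these two outputs, the following Python snippet serialises a clustering and a ranked list of links. The file names (clustering.json, ranking.json), the key names (document, document1, document2, score), and the helper write_outputs are assumptions made for illustration; the exact schema should be taken from the task definition.</p>
        <preformat><![CDATA[
```python
import json
import os
import tempfile


def write_outputs(clusters, links, out_dir):
    """Write the two result files for one problem.

    clusters: list of lists of document names (non-overlapping groups).
    links: list of (doc_a, doc_b, score) tuples with score in [0, 1].
    File names and key names are illustrative assumptions.
    """
    seen = [doc for group in clusters for doc in group]
    if len(seen) != len(set(seen)):
        raise ValueError("each document must belong to exactly one cluster")

    clustering = [[{"document": doc} for doc in group] for group in clusters]
    # Strongest links first, so the list can be read as a ranking.
    ranking = [
        {"document1": a, "document2": b, "score": float(score)}
        for a, b, score in sorted(links, key=lambda link: link[2], reverse=True)
    ]
    with open(os.path.join(out_dir, "clustering.json"), "w", encoding="utf-8") as fh:
        json.dump(clustering, fh, indent=2)
    with open(os.path.join(out_dir, "ranking.json"), "w", encoding="utf-8") as fh:
        json.dump(ranking, fh, indent=2)
    return clustering, ranking


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_outputs([["d1", "d3"], ["d2"]], [("d1", "d3", 0.8)], tmp)
```
]]></preformat>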
        <p>Two evaluation measures were used during the PAN CLEF
campaign. The first performance measure is the BCubed F-Score <xref ref-type="bibr" rid="ref1">(Amigo et al., 2009)</xref>
to evaluate the clustering output. This value is the harmonic mean of the precision and
recall associated with each document. The document precision is the fraction of
documents in the same cluster that are written by the same author. Symmetrically, the recall
associated with one document is the fraction of documents from that author that appear
in its cluster.</p>
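        <p>The following Python sketch illustrates how such a score could be computed for one problem. It assumes that the F-Score is the harmonic mean of the averaged document precision and the averaged document recall; the function name bcubed and the dictionary-based input are illustrative choices, not the official evaluator.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def bcubed(predicted, truth):
    """Return (precision, recall, f_score) for one clustering.

    predicted: dict mapping document -> cluster label
    truth: dict mapping document -> author label
    """
    docs = list(truth)
    cluster_size = Counter(predicted[d] for d in docs)
    author_size = Counter(truth[d] for d in docs)
    # Number of documents sharing both a cluster and an author.
    overlap = Counter((predicted[d], truth[d]) for d in docs)

    precision = sum(
        overlap[(predicted[d], truth[d])] / cluster_size[predicted[d]] for d in docs
    ) / len(docs)
    recall = sum(
        overlap[(predicted[d], truth[d])] / author_size[truth[d]] for d in docs
    ) / len(docs)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


truth = {"d1": "A", "d2": "A", "d3": "B", "d4": "C"}
predicted = {"d1": 0, "d2": 0, "d3": 0, "d4": 1}
p, r, f = bcubed(predicted, truth)
print(round(p, 4), round(r, 4), round(f, 4))
```
]]></preformat>
        <p>In this toy example, document d3 is wrongly merged with the two texts of author A, which lowers precision while recall stays at its maximum.</p>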
        <p>
          As another measure, the PAN CLEF campaign adopts the mean average precision
(MAP) measure for the authorship links between document pairs
          <xref ref-type="bibr" rid="ref6">(Manning et al.,
2008)</xref>
          . This evaluation measure provides a single-figure measure of quality across
recall levels. The MAP is roughly the average area under the precision-recall curve for
a set of problems. Therefore, this measure places more emphasis on the first positions,
and a misclassification with a lower probability is penalized less.
        </p>
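        <p>To make the effect of the ranking concrete, the sketch below computes the average precision of a ranked list of document pairs and averages it over several problems. It follows the textbook definition; the function names and the representation of links as frozensets are assumptions, and details such as tie handling in the official evaluator are not covered.</p>
        <preformat><![CDATA[
```python
def average_precision(ranked_pairs, true_links):
    """Average precision of one ranked list of document pairs.

    ranked_pairs: pairs sorted by decreasing score.
    true_links: set of frozensets holding the truly linked pairs.
    """
    hits = 0
    precision_sum = 0.0
    for rank, pair in enumerate(ranked_pairs, start=1):
        if frozenset(pair) in true_links:
            hits += 1
            precision_sum += hits / rank
    # True links that never appear in the ranking contribute zero.
    return precision_sum / len(true_links) if true_links else 0.0


def mean_average_precision(problems):
    """problems: list of (ranked_pairs, true_links) tuples."""
    return sum(average_precision(r, t) for r, t in problems) / len(problems)


true_links = {frozenset(("a", "b")), frozenset(("c", "d"))}
error_last = [("a", "b"), ("c", "d"), ("a", "c")]   # wrong pair ranked last
error_first = [("a", "c"), ("a", "b"), ("c", "d")]  # wrong pair ranked first
print(average_precision(error_last, true_links))
print(average_precision(error_first, true_links))
```
]]></preformat>
        <p>The same wrong pair costs nothing when it is ranked last but reduces the score noticeably when it is ranked first.</p>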
        <p>Considering the six benchmarks as a whole, we have 18 problems to solve and 18
problems to train (pre-evaluate) our system. Because there are many authors with only
a single document, we can compare our approach with a naïve baseline, which places
each text in its own cluster. This means the document precision is always 100%.
The document recall is lower, but should still be competitive due to the small size
of the expected clusters. Furthermore, random scores are assigned to all document pairs in
the authorship links.</p>
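        <p>The clustering part of this baseline can be worked out in closed form. With singleton clusters, a document whose author wrote n texts has a recall of 1/n, so each author contributes exactly 1 to the sum over documents and the average recall equals the number of authors divided by the number of texts. The sketch below applies this to the English newspaper example mentioned earlier (50 texts, 35 authors); it assumes the F-Score is the harmonic mean of averaged precision and recall, and the function name is illustrative.</p>
        <preformat><![CDATA[
```python
def singleton_baseline_bcubed(n_texts, n_authors):
    """BCubed scores when every text forms its own cluster."""
    precision = 1.0                # a singleton cluster is always pure
    recall = n_authors / n_texts   # each author contributes n * (1 / n) = 1
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


precision, recall, f_score = singleton_baseline_bcubed(n_texts=50, n_authors=35)
print(precision, recall, round(f_score, 4))
```
]]></preformat>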
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3 Simple Clustering Algorithm</title>
      <p>
        To solve the clustering problem, we suggest an unsupervised approach based on a
simple feature extraction and distance measure called SPATIUM-L1 (Latin word
meaning distance). The selected stylistic features correspond to the top m most frequent
terms (isolated words without stemming but with the punctuation symbols). For
determining the value of m, previous studies have shown that a value between 200 and
300 tends to provide the best performance
        <xref ref-type="bibr" rid="ref3 ref8">(Burrows, 2002; Savoy 2016)</xref>
        . Some
documents were rather short, so we further excluded the words appearing only once in
the text. This filtering decision was taken to prevent overfitting to single occurrences.
The effective number of terms m was therefore at most 200 and in most cases
well below. With this reduced number, the justification of the decision is simpler
to understand because it is based on words instead of letters, bigrams of letters, or
combinations of several representation schemes or distance measures.
      </p>
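        <p>A minimal sketch of this feature selection step is given below. The tokeniser (a regular expression separating words from single punctuation symbols), the lower-casing, and the alphabetical tie-breaking are assumptions introduced for the example; only the frequency ranking, the removal of terms occurring once, and the cap of m = 200 come from the description above.</p>
        <preformat><![CDATA[
```python
import re
from collections import Counter

# Assumed tokeniser: a run of word characters, or one punctuation symbol.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split a text into word and punctuation tokens (lower-cased by assumption)."""
    return TOKEN_PATTERN.findall(text.lower())


def top_terms(tokens, m=200):
    """Return up to m most frequent terms, ignoring terms that occur only once."""
    counts = Counter(tokens)
    frequent = [(term, tf) for term, tf in counts.items() if tf > 1]
    # Sort by decreasing frequency; ties are broken alphabetically.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in frequent[:m]]


tokens = tokenize("The cat, the dog, the bird. A cat!")
print(top_terms(tokens))
```
]]></preformat>
        <p>For a short text the returned list is usually much shorter than m, which matches the observation that the effective number of terms is often well below 200.</p>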
      <p>To measure the distance between one document A and another text B, SPATIUM-L1
uses the L1-norm as follows:
        <disp-formula>
          <label>(1)</label>
          <tex-math><![CDATA[\Delta(A, B) = \Delta_{AB} = \sum_{i=1}^{m} \left| P_A[t_i] - P_B[t_i] \right|]]></tex-math>
        </disp-formula>
where m indicates the number of terms (words or punctuation symbols), and P<sub>A</sub>[t<sub>i</sub>] and
P<sub>B</sub>[t<sub>i</sub>] represent the estimated occurrence probability of the term t<sub>i</sub> in the first text A and
in the other text B respectively. To estimate these probabilities, we divide the term
occurrence frequency (tf<sub>i</sub>) by the length in tokens of the corresponding text (n), Prob[t<sub>i</sub>]
= tf<sub>i</sub> / n, without smoothing and therefore accepting a 0.0 probability.</p>
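        <p>The distance in Equation (1) can be sketched in a few lines of Python. The function names and the token-list inputs are assumptions for illustration; the term list is expected to come from the first text, and the probabilities are plain relative frequencies without smoothing, as described above.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def relative_frequencies(tokens, terms):
    """Estimate Prob[t] = tf / n for each selected term, without smoothing."""
    if not tokens:
        raise ValueError("cannot estimate probabilities for an empty text")
    counts = Counter(tokens)
    n = len(tokens)
    return {t: counts[t] / n for t in terms}  # absent terms get 0.0


def spatium_l1(tokens_a, tokens_b, terms):
    """L1 distance between two texts over the terms selected from text A."""
    p_a = relative_frequencies(tokens_a, terms)
    p_b = relative_frequencies(tokens_b, terms)
    return sum(abs(p_a[t] - p_b[t]) for t in terms)


text_a = "a b a b a c".split()
text_b = "a a a d".split()
print(spatium_l1(text_a, text_b, terms=["a", "b"]))
```
]]></preformat>
        <p>Because the term list depends on which text is treated as A, calling the function with the roles of the two texts swapped (and the term list taken from the other text) can give a different value.</p>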
      <p>To verify whether the resulting ∆<sub>AB</sub> value is small or rather large, we need a
comparison. To achieve this, the distance from A to all other k texts from the current
problem was calculated. If this ∆<sub>AB</sub> value is at least 2.0 standard deviations below the average
of all distances, then this is a first indication of an author link. Since the m terms are
always selected from the first text, the ∆<sub>AB</sub> value might be different from the ∆<sub>BA</sub> value.
We therefore calculate the distance of text B with all other k texts and if this ∆<sub>BA</sub> value
is also at least 2.0 standard deviations below the average of all distances, then this is our
second indication of an author link. The exact difference to the mean divided by the
standard deviation is used to calculate how much the indication weighs, where a higher
number means more evidence of a shared authorship.</p>
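        <p>The thresholding step can be sketched as follows. The use of the population standard deviation, the handling of identical distances, and the function names are assumptions; the text only specifies the 2.0 threshold and the ratio of the difference to the standard deviation. The function would be applied once with A as the reference text and once with B.</p>
        <preformat><![CDATA[
```python
from statistics import mean, pstdev


def z_below_mean(value, average, deviation):
    """Number of standard deviations by which value lies below average."""
    return (average - value) / deviation


def indication(distances, target, threshold=2.0):
    """Evidence that `target` shares an author with the reference text.

    distances: dict mapping every other document to its distance from
    the reference text. Returns the score when it reaches the
    threshold, otherwise None.
    """
    values = list(distances.values())
    deviation = pstdev(values)  # population deviation: an assumption
    if deviation == 0:
        return None  # all distances are equal, nothing stands out
    score = z_below_mean(distances[target], mean(values), deviation)
    return score if score >= threshold else None


# One close document among nine distant ones.
distances = {"b": 1.0}
distances.update({"x%d" % i: 5.0 for i in range(9)})
print(indication(distances, "b"))
print(indication(distances, "x0"))
```
]]></preformat>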
      <sec id="sec-2-1">
        <p>For example, in the second English review problem we cluster document 9 together with document 50. The ∆<sub>9;50</sub>
value is 32, while the average distance from document 9 to all other texts is 45 with a standard
deviation of 5.6, which results in a first indication of (45 − 32)/5.6 = 2.3. A higher
value means more evidence of a shared authorship.</p>
        <p>For the grouping stage we follow the transitivity rule. If we have enough indication
that the texts A and B are written by the same author and we also have indication that
the documents B and C have a single authorship, then we will group A, B, and C
together even if we do not have enough evidence that A and C have the same writer.</p>
        <p>For the author link, on the other hand, we only report A-B and B-C as having the
same author in this scenario, while leaving out A-C due to the absence of any
sign of a single authorship for this pair. Furthermore, since this step allows a ranked listing of the
author links, we assign the highest probability to the text pair for which we have the most
evidence. A rather low probability is attributed to document pairs for which we only have
a partial indication of a shared authorship.</p>
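        <p>The grouping and link-reporting steps described above can be sketched as follows. Clusters are the connected components of the accepted links (here computed with a small union-find structure), while the link list contains only the directly supported pairs. The mapping from evidence to a score (division by the largest evidence value) is purely an assumption, since the text does not give a formula; the function names are illustrative.</p>
        <preformat><![CDATA[
```python
def group_documents(documents, links):
    """Cluster documents by the transitive closure of the accepted links.

    links: dict mapping (doc_a, doc_b) -> evidence (larger means stronger).
    """
    parent = {d: d for d in documents}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for d in documents:
        clusters.setdefault(find(d), []).append(d)
    return sorted(clusters.values())


def rank_links(links):
    """Return only the directly supported pairs, strongest evidence first.

    Scaling by the maximum evidence is an illustrative assumption.
    """
    if not links:
        return []
    top = max(links.values())
    ranked = sorted(links.items(), key=lambda item: item[1], reverse=True)
    return [(a, b, evidence / top) for (a, b), evidence in ranked]


links = {("A", "B"): 2.3, ("B", "C"): 4.6}
print(group_documents(["A", "B", "C", "D"], links))
print(rank_links(links))
```
]]></preformat>
        <p>In this example A, B, and C end up in one cluster through transitivity, yet the pair A-C does not appear in the ranked link list.</p>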
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4 Evaluation</title>
      <p>Since our system is based on an unsupervised approach, we were able to evaluate it
directly on the training set. In Table 2, we report the same performance
measures applied during the PAN CLEF campaign, namely the BCubed F-Score and the
MAP. Each collection consists of three problems and we report the average over
them. The final score is the mean of the two reported metrics.</p>
      <p>The algorithm returns the best results for the Greek Review collection with a final
score of 0.5590, followed by the Greek Article and Dutch Article corpora. The worst
results are achieved with the two English collections, which are slightly worse than the
Dutch Review corpus. For the two Dutch collections we can clearly see the difference
in text length reflected in the final score, as the newspaper corpus contains almost 10
times more words and achieves a noticeably higher value. Our approach achieves an
F-Score that is slightly higher than that of the naïve baseline, but a significantly
higher MAP.</p>
      <p>The test set is then used to rank the performance of all 7 participants in this task.
Based on the same evaluation methodology, we achieve the results depicted in Table 3
corresponding to the six test corpora.</p>
      <p>As we can see, the final score with the Greek Review corpus is the highest, as
expected from the training set. The results we achieved for the two English collections
are as low as in the training set. On the other hand, the Greek result achieved for the
newspaper part is only slightly worse than the estimation from the training set.
Generally, we see a very similar performance when comparing it with the training set.
Therefore, the system seems to perform stably, independent of the underlying text
collection, and is not over-fitted to the data.</p>
      <p>To put those values in perspective, Table 4 shows our result in comparison
with the top three of all participants, using macro-averaging for the effectiveness
measures and showing the total runtime. We have also added our naïve baseline as
described above. As in the training collections, our approach achieves an F-Score that
is slightly higher than that of the naïve baseline, but a significantly higher MAP.
This indicates that some documents were wrongly clustered together, which decreases the
document precision part of the BCubed F-Score. But we cluster many documents
correctly together (which increases document recall) and assign them a high score for their
authorship link (which increases MAP). Overall, this is beneficial and we are ranked second
out of eight approaches.</p>
      <sec id="sec-3-1">
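        <table-wrap>
          <label>Table 4</label>
          <table>
            <thead>
              <tr><th>Rank</th><th>User</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>bagnall16</td></tr>
              <tr><td>2</td><td>kocher16</td></tr>
              <tr><td>3</td><td>Naïve Baseline</td></tr>
              <tr><td>4</td><td>sari16</td></tr>
              <tr><td>…</td><td>…</td></tr>
            </tbody>
          </table>
        </table-wrap>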
        <p>The runtime only shows the actual time spent classifying the test set. On TIRA it
was possible to first train the system using the training set, which had no influence
on the final runtime. Since our system is unsupervised, it did not need to train any
parameters, but this possibility might have been used by other participants. Overall,
we achieve excellent results using a rather simple and fast approach in comparison with
the other solutions<sup>1</sup>.</p>
        <p>In text categorization studies, we are convinced that a deeper analysis of the
evaluation results is important to obtain a better understanding of the advantages and
drawbacks of a suggested scheme. By just focusing on overall performance measures,
we only observe a general behavior or trend without being able to acquire a better
explanation of the proposed assignment. To achieve this deeper understanding, we
could analyze some problems extracted from the English corpus. Usually, the relative
frequency (or probability) differences with very frequent words such as when, is, in,
that, to, or it can explain the decision.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusion</title>
      <p>
        This paper proposes a simple unsupervised technique to solve the author clustering
problem. As features to discriminate between the proposed author and different
candidates, we propose using at most the top 200 most frequent terms (words and
punctuation symbols). This choice was found effective for other related tasks such as
authorship attribution
        <xref ref-type="bibr" rid="ref3">(Burrows, 2002)</xref>
        . Moreover, compared to various feature
selection strategies used in text categorization
        <xref ref-type="bibr" rid="ref10">(Sebastiani, 2002)</xref>
        , the most frequent
terms tend to select the most discriminative features when applied to stylistic studies
        <xref ref-type="bibr" rid="ref9">(Savoy, 2015)</xref>
        . In order to take the author linking decision, we propose using a simple
distance measure called SPATIUM-L1 based on the L1 norm.
      </p>
      <sec id="sec-4-1">
        <p><sup>1</sup> http://www.tira.io/task/author-clustering/</p>
        <p>
          The proposed approach tends to perform very well in three different languages
(Dutch, English, and Greek) and in two genres (newspaper articles and reviews, but
keeping the same genre inside a given test collection). Such a classifier strategy can be
described as having a high bias but a low variance
          <xref ref-type="bibr" rid="ref5">(Hastie et al., 2009)</xref>
          . Changing the
training data does not change the decision much. However, the suggested approach
ignores other significant information such as mean sentence length, POS (part of
speech) distribution, or topical terms. Even if the proposed system cannot capture all
possible stylistic features (bias), changing the available data does not
significantly modify the overall performance (variance).
        </p>
        <p>It is common to fix some parameters (such as time period, size, genre, or length of
the data) to minimize the possible source of variation in the corpus. However, our goal
was to present a simple and unsupervised approach without many predefined
arguments.</p>
        <p>With SPATIUM-L1 the proposed clustering can be clearly explained because it is
based on a reduced set of features on the one hand and, on the other, those features are
words or punctuation symbols. Thus the interpretation for the final user is clearer than
when working with a huge number of features, when dealing with n-grams of letters, or
when combining several similarity measures. The SPATIUM-L1 decision can be explained
by large differences in relative frequencies of frequent words, usually corresponding to
functional terms.</p>
        <p>To improve the current classifier, we will investigate the effect of
smoothing techniques, other distance measures, and different feature
selection strategies. In the latter case, we want to maintain a reduced number of terms.
A better feature selection scheme could take account of the underlying text genre,
for example the more frequent use of personal pronouns in narrative texts. As
another possible improvement, we can ignore specific topical terms or character names
appearing frequently in an author profile, and other terms that can be selected in the feature
set without being useful in discriminating between authors.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The author wants to thank the task coordinators for their valuable effort to promote
test collections in author clustering. This research was supported, in part, by the NSF
under Grant #200021_149665/1.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Amigo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artiles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Verdejo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>A comparison of Extrinsic Clustering Evaluation Metrics based on Formal Constraints</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <volume>12</volume>
          (
          <issue>4</issue>
          ),
          <fpage>461</fpage>
          -
          <lpage>486</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Burrows</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>Delta: A Measure of Stylistic Difference and a Guide to Likely Authorship</article-title>
          .
          <source>Literary and Linguistic Computing</source>
          ,
          <volume>17</volume>
          (
          <issue>3</issue>
          ),
          <fpage>267</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Burrows</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Ousting Ivory Tower Research: Towards a Web Framework for Providing Experiments as a Service</article-title>
          . In:
          <string-name><surname>Hersh</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Callan</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Maarek</surname>, <given-names>Y.</given-names></string-name>, &amp;
          <string-name><surname>Sanderson</surname>, <given-names>M.</given-names></string-name>
          (eds.)
          <source>SIGIR. The 35th International ACM</source>
          ,
          <fpage>1125</fpage>
          -
          <lpage>1126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Hastie</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tibshirani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>The Elements of Statistical Learning</article-title>
          .
          <source>Data Mining, Inference, and Prediction</source>
          . Springer-Verlag: New York (NY).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2008</year>
          . Introduction to Information Retrieval. Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Improving the Reproducibility of PAN's Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling</article-title>
          . In:
          <string-name><surname>Kanoulas</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Lupu</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Clough</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Sanderson</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Hall</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Hanbury</surname>, <given-names>A.</given-names></string-name>, &amp;
          <string-name><surname>Toms</surname>, <given-names>E.</given-names></string-name>
          (eds.)
          <source>CLEF. Lecture Notes in Computer Science</source>
          , vol.
          <volume>8685</volume>
          ,
          <fpage>268</fpage>
          -
          <lpage>299</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Estimating the Probability of an Authorship Attribution</article-title>
          .
          <source>Journal of American Society for Information Science &amp; Technology</source>
          ,
          <volume>67</volume>
          (
          <issue>6</issue>
          ),
          <fpage>1462</fpage>
          -
          <lpage>1472</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Comparative Evaluation of Term Selection Functions for Authorship Attribution</article-title>
          .
          <source>Digital Scholarship in the Humanities</source>
          ,
          <volume>30</volume>
          (
          <issue>2</issue>
          ),
          <fpage>246</fpage>
          -
          <lpage>261</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>Machine Learning in Automatic Text Categorization</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <volume>34</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Clustering by Authorship Within and Across Documents</article-title>
          .
          <source>In Working Notes Papers of the CLEF 2016 Evaluation Labs, CEUR Workshop Proceedings, CEUR-WS.org.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>