<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Open-Vocabulary Approach to Authorship Attribution</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Independent Researcher</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>The PAN 2019 Authorship Attribution shared task poses the challenge of the open-set condition: given a text and a set of candidate authors, we must predict the true author, with no guarantee that she is among the candidates. In this paper we present our participation in this shared task. Our best-performing system is a linear model using sparse features, which we describe in detail in this notebook. We found that including rare words as features helps our model. Furthermore, we present a series of models that did not outperform the submitted system. On the official test set we achieved an open-set F1-score of 0.613.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Authorship analysis goes back at least to the 15th century, when the Italian humanist
Lorenzo Valla showed by means of a linguistic analysis that the Donation of
Constantine was a forgery. Today it has several applications: history [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], history of philosophy
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], intelligence [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]; [
        <xref ref-type="bibr" rid="ref21 ref8">8,21</xref>
        ] provide an exhaustive overview on the field and on the
methods.
      </p>
      <p>Authorship analysis covers different tasks. Given a text, one can a) predict the
demographics of its author (author profiling), b) verify whether it was written by a specific
author (author verification), c) compare its style to other texts (plagiarism detection)
and d) if the author of the text is unknown, attempt to identify them (authorship
attribution). We focus on this last task.</p>
      <p>
        In this paper we present our participation in the Cross-domain Authorship
Attribution shared task [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], part of the PAN Evaluation Forum [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This edition presents
several challenges, as it includes texts written in four different languages (English,
Italian, French and Spanish) and frames the task as an open-set problem, allowing for the
possibility that the true author of a document is not among the candidates.
Furthermore, the task organisers designed a cross-domain scenario by sampling the known
and unknown texts for each problem from two different sources. All these challenges
combined led us to design a simple, lexical, profile-based model using sparse features.
      </p>
      <p>We also experimented, without success, with different stylometric features. In this
paper we present our system together with an overview of a series of failed attempts to
outperform it.</p>
    </sec>
    <sec id="sec-2">
      <title>Data</title>
      <p>Table 1: Overview of the dataset (columns: problem, language, known texts, vocabulary size, unknown texts).</p>
      <p>The data released by the organisers of the task consists of fanfiction literature
written in four languages (English, French, Italian and Spanish). Table 1 shows an
overview of the dataset. Fanfiction is a literary genre consisting of writings inspired
by certain well-known authors or works, known as fandoms. The task challenges
participants to predict the author of a text belonging to a given fandom, given
only development texts belonging to different fandoms, thus creating a cross-domain
(or cross-fandom) condition. For the development of our system we only used the data
released by the organisers.</p>
    </sec>
    <sec id="sec-3">
      <title>Submitted System</title>
      <p>
        Our best system consists of a linear Support Vector Machine fed with word and
character 5-gram features, including all words regardless of their frequency. We apply no
pre-processing to the data. Instances assigned a probability lower than 0.1 are
classified as written by an unknown author. The features are normalised using TF-IDF.
The value of the C hyper-parameter of the SVM is 1. For the implementation we used
scikit-learn [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], starting from the baseline released by the task organisers.
Although extremely simple, this system was not outperformed by more complex ones, as
described in Section 4. We note that by allowing all the words occurring in the corpus to
be part of the feature space, we outperform by 0.2 F1 points an identical system using
only words occurring at least five times in the corpus.
This task presents several challenges which, combined, led us to choose a
simple system for the official submission. First, the evaluation platform limits the
experimentation with large pre-trained neural models which are dependent on a GPU for
running in a reasonable time. Considering the reproducibility issues involved with
neural models, as described in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], this is not necessarily a negative aspect for a
shared task. Second, this being a multi-lingual task, we hypothesised that a system
relying on linguistic knowledge would have been too dependent on the availability of
specific resources (e.g. POS taggers, parsers, etc.). Third, considering that there is no
overlap between the set of authors present in the development corpus and those present
in the evaluation corpus, we could not rely on traditional techniques for fine-tuning the
system.
      </p>
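      <p>As a rough illustration of the submitted system described at the start of this section, the following Python sketch combines TF-IDF-weighted word and character 5-gram features with a linear SVM (C = 1) in scikit-learn. Several details are assumptions rather than the author's code: the names build_model and predict_open_set and the UNK label are hypothetical; probabilities are obtained here by wrapping LinearSVC in CalibratedClassifierCV; lowercasing is disabled as one way to honour "no pre-processing"; and the 0.1 threshold is read literally as a cut-off on the top-class probability, which may differ from how the organisers' baseline applies it.</p>
      <preformat>
```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

UNKNOWN = "UNK"  # hypothetical label for "author not among the candidates"


def build_model(cv=2):
    """Linear SVM over TF-IDF word and character 5-gram features (a sketch)."""
    features = FeatureUnion([
        # min_df=1 keeps every word, however rare (open vocabulary).
        ("word", TfidfVectorizer(analyzer="word", min_df=1, lowercase=False)),
        ("char5", TfidfVectorizer(analyzer="char", ngram_range=(5, 5),
                                  min_df=1, lowercase=False)),
    ])
    # Assumption: calibration is used only to obtain class probabilities.
    svm = CalibratedClassifierCV(LinearSVC(C=1.0, random_state=0), cv=cv)
    return make_pipeline(features, svm)


def predict_open_set(model, texts, threshold=0.1):
    """Return the most probable author, or UNKNOWN below the threshold."""
    predictions = []
    for row in model.predict_proba(texts):
        best = row.argmax()
        if row[best] >= threshold:
            predictions.append(str(model.classes_[best]))
        else:
            predictions.append(UNKNOWN)
    return predictions
```
      </preformat>
      <p>In this sketch, raising min_df to 5 in the word vectoriser would give the restricted-vocabulary variant that the text compares against.</p>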
      <p>Given all these constraints, we experimented with language-neutral methods, mostly
leveraging frequency- and surface-based features.</p>
      <p>
        Compression. A first language-independent method is the compression-based method
described in [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]; we used the implementation based on [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        Impostors. A second language-independent method, also used in an implementation
released by the organisers of this shared task, is the Impostors method [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>Ensemble. We experimented with a majority-voting ensemble system, built by combining
the submitted system with the Compression and Impostors systems.</p>
      <p>
        Readability metrics. Following [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], we tried
using the readability of a text as a proxy for its true author: we used a battery of
readability metrics [
        <xref ref-type="bibr" rid="ref10 ref13 ref2 ref20 ref4 ref7">7,20,10,4,13,2</xref>
        ], using the computed scores as features in isolation
and in combination with word n-grams. Both approaches failed.
      </p>
      <p>
        Bleaching. As shown in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], frequency and surface-level features can be useful for cross-lingual author profiling
tasks, which are loosely related to authorship attribution. Table 3 shows an illustration
of the bleaching feature abstraction method.
      </p>
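      <p>To make the compression-based idea mentioned above more concrete, the following Python sketch attributes a text to the candidate whose known writings help compress it most. It is only an approximation under stated assumptions: the cited method relies on PPM-style compression models, whereas this sketch swaps in zlib from the Python standard library, and the function names are hypothetical.</p>
      <preformat>
```python
import zlib


def compressed_size(text):
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def attribute_by_compression(known_texts_by_author, unknown_text):
    """Pick the author whose profile makes unknown_text cheapest to encode.

    The cost for an author is the number of extra bytes needed to compress
    the unknown text once it is appended to that author's known texts.
    Note: zlib stands in for PPM here, which is an assumption of this sketch.
    """
    costs = {}
    for author, texts in known_texts_by_author.items():
        profile = "\n".join(texts)
        joint = profile + "\n" + unknown_text
        costs[author] = compressed_size(joint) - compressed_size(profile)
    best_author = min(costs, key=costs.get)
    return best_author, costs
```
      </preformat>
      <p>Because zlib only looks back over a limited window, very long author profiles would need to be truncated or sampled for this approximation to remain meaningful.</p>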
      <p>Table 3: Illustration of the bleaching feature abstraction (labels recovered from the original layout: token, feature, example).</p>
      <p>
        We experimented with the bleaching approach of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], resulting in lower
performance than the submitted system.
      </p>
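      <p>The following Python sketch gives a flavour of what bleaching a token means: the word itself is discarded and replaced by abstract surface descriptions. It is a simplified illustration loosely inspired by the bleaching approach, not a reproduction of it: the three features shown (shape, vowel and consonant pattern, length) and all function names are assumptions chosen for this example.</p>
      <preformat>
```python
import itertools

VOWELS = set("aeiouAEIOU")


def shape(token):
    """Map characters to U (upper), L (lower), D (digit) or X (other),
    then collapse runs of the same class into a single symbol."""
    classes = []
    for ch in token:
        if ch.isupper():
            classes.append("U")
        elif ch.islower():
            classes.append("L")
        elif ch.isdigit():
            classes.append("D")
        else:
            classes.append("X")
    return "".join(key for key, _ in itertools.groupby(classes))


def vowel_pattern(token):
    """Map vowels to V, other letters to C and everything else to O."""
    pattern = []
    for ch in token:
        if ch in VOWELS:
            pattern.append("V")
        elif ch.isalpha():
            pattern.append("C")
        else:
            pattern.append("O")
    return "".join(pattern)


def bleach(token):
    """Return abstract, language-neutral features for a single token."""
    return {
        "shape": shape(token),
        "vowels": vowel_pattern(token),
        "length": str(len(token)).zfill(2),
    }
```
      </preformat>
      <p>For instance, bleach("Hello!") yields the shape ULX, the pattern CVCCVO and the length 06, none of which reveals the lexical content of the token.</p>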
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>
        The official evaluation is conducted on the TIRA Platform [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], using an F1 metric
modified in order to account for the open-set scenario [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Here we present i) the
results obtained on the development data released by the organisers and
ii) the official results obtained on the test set on the TIRA platform.
      </p>
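      <p>One plausible reading of an F1 metric adapted to the open-set scenario is a macro-averaged F1 computed over the known authors only, so that the unknown class is not itself averaged but wrong attributions of unknown-author texts still reduce precision. The Python sketch below implements that reading with scikit-learn; whether it matches the official evaluator in every detail is an assumption, and the UNK label and function name are hypothetical.</p>
      <preformat>
```python
from sklearn.metrics import f1_score

UNKNOWN = "UNK"  # hypothetical label for texts by non-candidate authors


def open_set_macro_f1(true_labels, predicted_labels, unknown=UNKNOWN):
    """Macro F1 averaged over known authors only (a sketch).

    Texts whose true label is `unknown` still count as false positives
    when they are attributed to a known author.
    """
    known_authors = sorted(set(true_labels) - {unknown})
    return f1_score(true_labels, predicted_labels,
                    labels=known_authors, average="macro")
```
      </preformat>
      <p>With true labels a, a, b, UNK and predictions a, UNK, b, b, both known authors obtain an F1 of 2/3 (one through a missed text, the other through a false attribution), so the macro average is 2/3.</p>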
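      <p>Development Results</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>model</th><th>open-f1 score</th></tr>
          </thead>
          <tbody>
            <tr><td>submitted</td><td>0.619</td></tr>
            <tr><td>compressor</td><td>0.554</td></tr>
            <tr><td>impostor</td><td>0.449</td></tr>
            <tr><td>ensemble</td><td>0.618</td></tr>
            <tr><td>readability</td><td>0.078</td></tr>
            <tr><td>bleaching</td><td>0.133</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>On the official test set, our submitted system scored 0.613 open-set macro F1, about 0.08 points less than the
best-performing system, which scored an average 0.69 open-set F1.</p>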
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper we presented our participation in the PAN Authorship Attribution shared
task. We showed that linear models combined with sparse features work well for this
task, at least under the constraints of a) limited computing power, b) language
independence and c) out-of-domain data. We report that combining different systems into an
ensemble model does not improve performance. We show that word and
character n-grams are good features for this task, even though they may be subject to interference
from topic effects. For reproducibility, we release all the code used in this paper. Our
official submission achieved an open-set F1 score of 0.613; we note that our system is the fastest
one among the submitted runs on the official test set (00:17:08).</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>The author is thankful to the three anonymous reviewers who helped improve the
quality of this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abbasi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
          </string-name>
          , H.:
          <article-title>Applying authorship analysis to extremist-group web forum messages</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>20</volume>
          (
          <issue>5</issue>
          ),
          <fpage>67</fpage>
          -
          <lpage>75</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Lix and rix: Variations on a little-known readability index</article-title>
          .
          <source>Journal of Reading</source>
          <volume>26</volume>
          (
          <issue>6</issue>
          ),
          <fpage>490</fpage>
          -
          <lpage>496</lpage>
          (
          <year>1983</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brandwood</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The chronology of Plato's dialogues</article-title>
          . Cambridge University Press (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Coleman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liau</surname>
            ,
            <given-names>T.L.</given-names>
          </string-name>
          :
          <article-title>A computer readability formula designed for machine scoring</article-title>
          .
          <source>Journal of Applied Psychology</source>
          <volume>60</volume>
          (
          <issue>2</issue>
          ),
          <fpage>283</fpage>
          (
          <year>1975</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manjavacas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zangerle</surname>
          </string-name>
          , E.: Overview of PAN 2019:
          <article-title>Author Profiling, Celebrity Profiling, Cross-domain Authorship Attribution and Style Change Detection</article-title>
          . In: Crestani,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Braschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Savoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Rauber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Heinatz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          , N. (eds.)
          <source>Proceedings of the Tenth International Conference of the CLEF Association (CLEF</source>
          <year>2019</year>
          ). Springer (Sep
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>van der Goot</surname>
          </string-name>
          , R.,
          <string-name>
            <surname>Ljubesic</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matroos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plank</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Bleaching text: Abstract features for cross-lingual gender prediction</article-title>
          .
          <source>In: ACL</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gunning</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>The fog index after twenty years</article-title>
          .
          <source>Journal of Business Communication</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
          <fpage>3</fpage>
          -
          <lpage>13</lpage>
          (
          <year>1969</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Juola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , et al.:
          <article-title>Authorship attribution</article-title>
          .
          <source>Foundations and Trends® in Information Retrieval</source>
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <fpage>233</fpage>
          -
          <lpage>334</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manjavacas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the Cross-domain Authorship Attribution Task at PAN 2019</article-title>
          . In: Cappellato,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Müller</surname>
          </string-name>
          , H. (eds.)
          <article-title>CLEF 2019 Labs and Workshops, Notebook Papers</article-title>
          .
          <source>CEUR-WS.org (Sep</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kincaid</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fishburne</surname>
            <given-names>Jr</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.P.</given-names>
            ,
            <surname>Rogers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.L.</given-names>
            ,
            <surname>Chissom</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.S.:</surname>
          </string-name>
          <article-title>Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel (</article-title>
          <year>1975</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winter</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Determining if two documents are written by the same author</article-title>
          .
          <source>JASIST</source>
          <volume>65</volume>
          ,
          <fpage>178</fpage>
          -
          <lpage>187</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>López-Anguita</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montejo-Ráez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Díaz-Galiano</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Complexity Measures and POS N-grams for Author Identification in Several Languages-Notebook for PAN at CLEF 2018</article-title>
          . In: Cappellato,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.Y.</given-names>
            ,
            <surname>Soulier</surname>
          </string-name>
          ,
          <string-name>
            <surname>L</surname>
          </string-name>
          . (eds.)
          <article-title>CLEF 2018 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers, 10-14 September, Avignon, France.
          <source>CEUR-WS.org (Sep</source>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>McLaughlin</surname>
            ,
            <given-names>G.H.</given-names>
          </string-name>
          :
          <article-title>Clearing the smog</article-title>
          .
          <source>J Reading</source>
          (
          <year>1969</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mendes-Junior</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          R., de Souza, R.M.,
          <string-name>
            <surname>de Oliveira</surname>
            <given-names>Werneck</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.V.</given-names>
            ,
            <surname>Pazinato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.V.</given-names>
            ,
            <surname>de Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.R.</given-names>
            ,
            <surname>Penatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.A.B.</given-names>
            ,
            <surname>da Silva</surname>
          </string-name>
          <string-name>
            <surname>Torres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Rocha</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Nearest neighbors distance ratio open-set classifier</article-title>
          .
          <source>Machine Learning</source>
          <volume>106</volume>
          ,
          <fpage>359</fpage>
          -
          <lpage>386</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mosteller</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Applied Bayesian and classical inference: the case of the Federalist papers</article-title>
          . Springer Science &amp; Business Media
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>VanderPlas</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
          </string-name>
          , E.:
          <article-title>Scikit-learn: Machine learning in python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Braun</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duffhauss</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gülzow</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Köhler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lötzsch</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          , et al.:
          <article-title>Who wrote the web? revisiting influential author identification research applicable to information retrieval</article-title>
          .
          <source>In: European Conference on Information Retrieval</source>
          . pp.
          <fpage>393</fpage>
          -
          <lpage>407</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>TIRA Integrated Research Architecture</article-title>
          . In: Ferro,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <surname>C</surname>
          </string-name>
          . (eds.)
          <article-title>Information Retrieval Evaluation in a Changing World - Lessons Learned from 20 Years of</article-title>
          CLEF. Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Reimers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Why comparing single performance scores does not allow to draw conclusions about machine learning approaches</article-title>
          . arXiv preprint arXiv:1803.09578
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senter</surname>
          </string-name>
          , R.:
          <source>Automated readability index. AMRL-TR. Aerospace Medical Research Laboratories (US)</source>
          pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          (
          <year>1967</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Stamatatos</surname>
          </string-name>
          , E.:
          <article-title>A survey of modern authorship attribution methods</article-title>
          .
          <source>Journal of the American Society for information Science and Technology</source>
          <volume>60</volume>
          (
          <issue>3</issue>
          ),
          <fpage>538</fpage>
          -
          <lpage>556</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Teahan</surname>
            ,
            <given-names>W.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harper</surname>
            ,
            <given-names>D.J.:</given-names>
          </string-name>
          <article-title>Using compression-based language models for text categorization</article-title>
          . In:
          <article-title>Language modeling for information retrieval</article-title>
          , pp.
          <fpage>141</fpage>
          -
          <lpage>165</lpage>
          . Springer (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>