<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bootstrap Distance Imposters: High precision authorship verification with improved interpretability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ben Nagy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Polish Language, Polish Academy of Sciences (IJP PAN)</institution>
          <addr-line>al. Mickiewicza 31, Kraków</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <fpage>482</fpage>
      <lpage>493</lpage>
      <abstract>
        <p>This paper describes an update to the open-source Python implementation of the General Imposters method of authorship verification by Mike Kestemont et al. The new algorithm, called Bootstrap Distance Imposters (henceforth BDI), incorporates a key improvement introduced by Potha and Stamatatos, as well as introducing a novel method of bootstrapping that has several attractive properties when compared to the reference algorithm. Initially, we supply an updated version of the Kestemont et al. code (for Python 3.x) which incorporates the same basic improvements. Next, the two approaches are benchmarked using the problems from the multi-lingual PAN 2014 author identification task, as well as the more recent PAN 2021 task. Additionally, the interpretability advantages of BDI are showcased via real-world verification studies. When operating as a summary verifier, BDI tends to be more conservative in its positive attributions, particularly when applied to difficult problem sets like the PAN2014 en_novels. In terms of raw performance, the BDI verifier outperforms all PAN2014 entrants and appears slightly stronger than the improved Kestemont GI according to the PAN metrics for both the 2014 and 2021 problems, while also offering superior interpretability.</p>
      </abstract>
      <kwd-group>
        <kwd>authorship verification</kwd>
        <kwd>stylometry</kwd>
        <kwd>bootstrapping</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Motivation and Design</title>
      <p>The Imposters method of authorship verification examines a question text in order to determine whether
the text was written by a candidate author. To do this, it performs bootstrap comparisons of the question
text to candidate texts and several (perhaps very many) imposter texts, which should ideally be chosen
to be similar in genre, topic and register. If the question text is markedly more similar to the candidate
author’s ‘style’ than to any of the imposters then we infer, with some level of statistical likelihood, that
it was authored by the candidate. The method of Imposters can be used with any features that reflect
style, although it is most commonly applied to distances between character n-gram or word frequency
vectors.</p>
      <p>Note that while I speak of ‘statistical likelihood’ I am carefully avoiding the words ‘probability’ and
‘confidence’. The GI method is, in machine learning terms, an ensemble classifier. These classifiers
regularise well, but while they produce a real-valued output, it is problematic to interpret this number
as a probability. Many factors can make the verification results less reliable—the lengths of the texts,
the language in which they are written, the closeness of the imposter texts (dissimilar texts make a
positive attribution less convincing), and the amount of available data (a lack of comparison data risks
bias). Verification problems in the real world are seldom under ideal conditions, and there is no magical
formula by which the uncertainties imposed by the problem setting can be convincingly rendered as
a forensic probability. Nevertheless, the Imposters method has a well-deserved reputation for robust,
understandable results, even in the face of severe limitations (for example in the length of samples or
the availability of suitable imposters).</p>
      <p>Figure 1 attempts an intuitive explanation of the basic operation of standard GI and the key
modification used in BDI. The output of the Kestemont GI classifier is a percentage of binarized ‘votes’ (the
number of times a candidate text was closer than an imposter). In contrast, the raw output from the BDI
algorithm is a bootstrapped distribution of differences. At each step, the distances (with a bootstrapped
feature set) from the question text to the candidates and to the imposters are recorded, using any vector distance measure
<inline-formula><tex-math>d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}</tex-math></inline-formula>. If the candidates are further, the difference between the distances is negative; if closer,
it is positive. If these individual distances follow a Gaussian distribution (which is a reasonable prior
expectation) then their difference is also Gaussian. Expressing the results this way has some advantages.
The first is that we can differentiate a negative result (not the candidate) as either ‘none of the above’
or ‘more like an imposter’ (the true author is in the imposters set). A ‘none of the above’ result would
have a statistically expected distance of zero (equally unlike the candidate and the imposters), and so
we would see a distribution centred around 0.<fn><label>1</label><p>Note carefully that this is a one-way implication: a true author that is neither the candidate nor one of the imposters should
have a distance distribution centred around zero, but not all such distributions guarantee that the true author is not among
the imposters.</p></fn> On the other hand, ‘more like an imposter’ results show
distributions centred around a negative value (examples of this can be seen in Section 4 below). The
other advantage is that for strong positives, we have additional data about the match. Distributions
centred around larger positive numbers are better matches, but distributions with high variance show
more feature dependence (since the strength of the match varies greatly depending on the bootstrap
feature sets). In summary, positive matches (with most or all of the probability mass above zero) can
be much more meaningfully compared.</p>
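      <p>As an illustration only (a minimal sketch, not the code from the repository), the bootstrap step described above might look like the following. The function names, the default of 500 iterations, and the choice to compare the closest candidate with the closest imposter at each step are assumptions made for this example; cosine distance stands in for any distance measure.</p>
      <preformat>
```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance between two 1-D vectors."""
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def bootstrap_differences(question, candidates, imposters,
                          n_iter=500, feature_frac=0.33, seed=0):
    """Return one (imposter distance - candidate distance) per iteration.

    Assumption for this sketch: every iteration draws a random feature
    subset and compares the closest candidate with the closest imposter.
    Positive values mean the question text is closer to the candidate.
    """
    rng = np.random.default_rng(seed)
    n_features = question.shape[0]
    k = max(1, int(feature_frac * n_features))
    diffs = np.empty(n_iter)
    for i in range(n_iter):
        # Bootstrapped feature set: k feature indices without replacement.
        idx = rng.choice(n_features, size=k, replace=False)
        q = question[idx]
        d_cand = min(cosine_distance(q, c[idx]) for c in candidates)
        d_imp = min(cosine_distance(q, m[idx]) for m in imposters)
        diffs[i] = d_imp - d_cand
    return diffs
```
      </preformat>
      <p>A histogram of the returned array gives the kind of distribution discussed above: mass above zero favours the candidate, mass below zero favours an imposter, and the spread reflects sensitivity to the sampled features.</p>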
      <p>It is worth noting here that the overall best performing method at PAN 2014 by Khonji and Iraqi [8]
also modified the classic GI algorithm to utilize the distance between vectors (in that case the relative
distance of the test vector to candidates vs imposters was considered as part of the decision function for
a ‘standard’ voting-based classifier), so this paper is not the first to recognise the value of this additional
information.</p>
      <sec id="sec-2-1">
        <title>2.1. Binary Classification</title>
        <p>
          Since the BDI algorithm outputs a distribution, it is useful to have a summary
statistic that can be interpreted as evidence for authorship verification tasks. For this paper I used a
simple approach that considers the amount of probability mass that lies above 0. If every test is closer
to a candidate than an imposter then the result will be 1, if every test is more like an imposter, it will
be 0, etc. This is implemented simply as the inverse percentile of (a distance of) 0. Thus armed with a
method that outputs a ‘probability-like’ result in [0, 1], I wrapped the code in a classifier that follows
scikit-learn [
          <xref ref-type="bibr" rid="ref11">13</xref>
          ] conventions like fit() and predict_proba() and evaluated the BDI classifier
directly against the updated Kestemont GI Order2Verifier using the PAN shared tasks from 2014 and
2021. This provided a convenient benchmark, and also an opportunity to compare the results against
a number of other verification approaches.
        </p>
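        <p>A minimal sketch of this summary statistic is shown below. The function name is hypothetical, and treating a difference of exactly zero as ‘not above zero’ is an assumption of the example rather than a documented property of the BDI code.</p>
        <preformat>
```python
import numpy as np


def mass_above_zero(differences):
    """Fraction of bootstrap differences that are strictly positive.

    This is the share of the empirical distribution lying above 0,
    giving a 'probability-like' value in [0, 1].
    """
    differences = np.asarray(differences, dtype=float)
    return float(np.mean(differences > 0))


# Three of the four differences are positive.
print(mass_above_zero([0.2, 0.1, -0.05, 0.3]))  # 0.75
```
        </preformat>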
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Score Shifting</title>
        <p>The C@1 metric introduced in PAN 2014 rewards (or at least penalises less harshly) classifiers that
choose not to answer some problems. This leads naturally to algorithms that use the training data
to define classifier output ranges that will be assigned to 0.5 (indicating an unanswered problem). In
the case of classic GI, this means that classifier scores (vote percentages) within certain ranges will be
rectified to precisely 0.5, while positive and negative classifications are shifted to the ranges above and
below that value. This (hopefully) improves the C@1 score as compared to basic accuracy. In the PAN
2020–21 competitions, a similar measure called F<sub>0.5u</sub> was used (introduced in [2]).</p>
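        <p>For readers unfamiliar with the metric, the sketch below computes C@1 as it is usually defined: unanswered problems are credited at the accuracy achieved on the answered ones. The function name is hypothetical, and the code is not taken from the official PAN evaluation scripts; it assumes that a score of exactly 0.5 means ‘unanswered’ and that scores above 0.5 are positive decisions.</p>
        <preformat>
```python
def c_at_1(truth, scores):
    """C@1 = (n_correct + n_unanswered * n_correct / n) / n.

    truth  : iterable of 0/1 labels (1 = same author)
    scores : iterable of verifier outputs in [0, 1]; 0.5 = unanswered
    """
    truth = list(truth)
    scores = list(scores)
    n = len(truth)
    n_unanswered = sum(1 for s in scores if s == 0.5)
    n_correct = sum(
        1 for t, s in zip(truth, scores)
        if s != 0.5 and (s > 0.5) == bool(t)
    )
    return (n_correct + n_unanswered * n_correct / n) / n


# Two correct answers, one wrong answer, one unanswered problem.
print(c_at_1([1, 1, 0, 0], [0.9, 0.5, 0.2, 0.7]))  # 0.625
```
        </preformat>
        <p>In this toy case plain accuracy, counting the unanswered problem as wrong, would be 0.5, so declining to answer is penalised less harshly than a wrong answer.</p>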
        <p>The score shifting method implemented in Kestemont GI attempts to produce something more like
a probability by linearly scaling the output values. The code defines a lower and an upper bound for the
unanswered region, p1 and p2. The scaling code (in Python) looks like this:</p>
        <preformat>for score in list(scores):
    if score &lt;= p1:
        new_scores.append(rescale(score, min(scores), max(scores), 0.0, p1))
    elif score &gt;= p2:
        new_scores.append(rescale(score, min(scores), max(scores), p2, 1.0))
    else:
        new_scores.append(0.5)</preformat>
        <p>Scores below p1 are scaled to [0, p1), scores above p2 are scaled to (p2, 1], and the rest are coerced
to 0.5. There is an issue with this scaling algorithm, however. Since p1 and p2 are chosen by grid
search to maximise the PAN score, the score shifter sometimes fits values for p2 that are well below 0.5.
This can lead to decisions that are defined as positive (since they are above p2) being scaled to below
0.5, where they are evaluated by the scoring metrics as a negative result (and scored as such). In the
updated code, I modified the shifting code to scale more simply to [0, 0.5) (negative), 0.5 (unanswered),
and (0.5, 1] (positive). This does not retain the global distributional properties of the original results (as
implemented in [6]).</p>
        <preformat>for score in scores:
    if score &lt;= p1:
        # negative decisions map into [0, 0.499]
        new_scores.append(rescale(score, orig_min=0, orig_max=p1, new_min=0.0, new_max=0.499))
    elif score &gt;= p2:
        # positive decisions map into [0.501, 1]
        new_scores.append(rescale(score, orig_min=p2, orig_max=1, new_min=0.501, new_max=1.0))
    else:
        # everything between p1 and p2 is left unanswered
        new_scores.append(0.5)</preformat>
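        <p>The snippets above rely on a rescale() helper that is not shown. The following self-contained sketch supplies one plausible linear version and wraps the simplified shifting logic in a function, so that the behaviour can be checked on small inputs. The helper body, the wrapper name shift_scores and the handling of a zero-width input range are assumptions for illustration, not the repository code.</p>
        <preformat>
```python
def rescale(value, orig_min, orig_max, new_min, new_max):
    """Linearly map value from [orig_min, orig_max] to [new_min, new_max]."""
    span = orig_max - orig_min
    if span == 0:
        # Degenerate input range: fall back to the lower output bound.
        return new_min
    return new_min + (value - orig_min) * (new_max - new_min) / span


def shift_scores(scores, p1, p2):
    """Map raw scores to [0, 0.499], exactly 0.5, or [0.501, 1]."""
    shifted = []
    for score in scores:
        if p1 >= score:
            # At or below p1: negative decision.
            shifted.append(rescale(score, 0.0, p1, 0.0, 0.499))
        elif score >= p2:
            # At or above p2: positive decision.
            shifted.append(rescale(score, p2, 1.0, 0.501, 1.0))
        else:
            # Between p1 and p2: unanswered.
            shifted.append(0.5)
    return shifted


# Even with p2 fitted well below 0.5, a positive decision stays above 0.5.
print(shift_scores([0.05, 0.2, 0.35], p1=0.1, p2=0.3))
```
        </preformat>
        <p>With p1 = 0.1 and p2 = 0.3, the raw score 0.35 counts as positive and is mapped above 0.5, which is the property the original scaling could violate.</p>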
        <p>Based on the evaluation problems, the BDI algorithm is not as sensitive to this score shifting, deriving
only a modest benefit from fitting. The fitting process raises natural questions about the
representativeness of the training data, and also causes some problems in domains that suffer from limited data
availability (where it can be hard to sacrifice data for training). In these circumstances, BDI works well
with manual score shifting, allowing the user to choose a confidence level empirically.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Changes to Kestemont GI</title>
        <p>As is the nature of software, the code in the repository documenting the GI algorithm and for the related
work on the Caesarian corpus no longer ran. I reworked the code slightly, and made the following small
changes, which are available in my own repository [11].</p>
        <p>
          • Update the code to work with Python 3 (these minimal changes have been incorporated into the
original repository based on a PR);
• Implement a fast ‘nini’ metric (fuzzy Jaccard similarity) as described in [12];
• Implement the Potha &amp; Stamatatos ‘ranking’ improvement for the consensus score<sup>2</sup> described in
[
          <xref ref-type="bibr" rid="ref12">14</xref>
          ];
• Remove most non-core code;
• Modify the score-shifting algorithm, as described above.
        </p>
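        <p>The ‘ranking’ improvement in the list above replaces the binary vote with a score based on the ordinal rank of the closest candidate. A minimal sketch for a single bootstrap iteration follows; the function name is hypothetical and the treatment of ties (an imposter at exactly the same distance does not outrank the candidate) is an assumption.</p>
        <preformat>
```python
def rank_score(candidate_distances, imposter_distances):
    """Score one iteration as 1 / rank of the closest candidate.

    The rank is 1 when no imposter is closer than the best candidate,
    2 when exactly one imposter is closer, and so on.
    """
    best_candidate = min(candidate_distances)
    closer_imposters = sum(
        1 for d in imposter_distances if best_candidate > d
    )
    return 1.0 / (1 + closer_imposters)


print(rank_score([0.3, 0.5], [0.4, 0.6]))  # candidate is closest: 1.0
print(rank_score([0.3], [0.2, 0.6]))       # candidate is second-closest: 0.5
```
        </preformat>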
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation</title>
      <p>The BDI classifier was compared head-to-head with the updated Kestemont Order2Verifier on the
full PAN 2014 evaluation corpus, which is a challenging set of varied authorial styles across four
languages.<fn><label>3</label><p>There is a small inconsistency that I was unable to resolve: the only copy of the verification problems I could find was
archived in the Kestemont repository, but it is apparently missing 50 of the ‘Dutch Reviews’ evaluation problems, so
there are a total of 746, versus 796 reported in the PAN 2014 wrapup report.</p></fn> Additionally, both verifiers were evaluated below using the PAN 2021 problems, since that
competition included some deep-learning approaches, discussed further below. For this evaluation I
used character n-grams, since that feature universe is a generally reliable and uncontroversial choice
for modern stylometric work. There may be feature universes that perform better, or features that
work better for a specific task, but character n-grams are ‘solid if boring’. Likewise, the n-gram
frequencies are z-scaled (based only on the training variances) since this is the standard approach. I tested
two n-gram configurations, 2,3,4-grams and 2,3,4,5-grams, 10,000 max features, with fitted and manual
score shifting. For the Order2Verifier, bootstrapping was performed at 50% (5000 random features per
bootstrap iteration) since that is the configuration used in [6]; for BDI, bootstrapping was performed
at 33%. Finally, for the distance metric used to determine ‘closeness’ at each step I tested the cosine
distance (the most traditional choice) and the minmax (Ružička) metric promoted by Kestemont et al.
<fn><label>2</label><p>Instead of a strict 1 (candidate closest) or 0 (candidate not closest), Potha &amp; Stamatatos proposed a score improvement based
on the ordinal rank of the closest candidate, so if a candidate was the second-closest, the score for that iteration would be 1/2.
The same paper proposed a distance-based culling method to select more relevant imposters, but this was not implemented
because of poor Big-O complexity.</p></fn></p>
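      <p>For reference, the minmax (Ružička) distance mentioned above is one minus the ratio of the summed element-wise minima to the summed element-wise maxima. The sketch below assumes non-negative input vectors (for example raw relative frequencies); the function name and the convention of returning 0 for two all-zero vectors are choices made for this example.</p>
      <preformat>
```python
import numpy as np


def minmax_distance(a, b):
    """Ruzicka (minmax) distance between two non-negative vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.maximum(a, b).sum()
    if denom == 0:
        # Both vectors are all zeros: treat them as identical.
        return 0.0
    return float(1.0 - np.minimum(a, b).sum() / denom)


print(minmax_distance([2, 0, 1], [1, 1, 1]))  # 1 - 2/4 = 0.5
```
      </preformat>
      <p>Identical vectors give a distance of 0 and vectors with no overlapping non-zero features give 1, so the measure is bounded in the same way as cosine distance on non-negative data.</p>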
      <p>[…] both updated Kestemont GI and the winning PAN 2014 entrant. It must be noted, however, that the
individual corpus results are not dominated by either verifier but vary considerably according to evaluation
strategy and corpus. The strength of the BDI classifiers, however, is twofold: first, they do not really
require any training corpus, and second, the BDI approach is high precision (it yields very few false
positives), making it a conservative verifier whose positive results are reliable (at the cost of more false
negatives). This can be seen clearly in Figure 2, in which the best performing GI classifier is compared
to a manually fitted BDI classifier (results between 11% and 89% are left unanswered) using the same
features and metrics. This is not the best performing BDI verifier: it was chosen because these verifiers
have an almost identical overall PAN14 score but quite different detailed characteristics. The results for
each subcorpus are broken down in more detail in Tables 2 &amp; 3. In those tables, the respective results
for the en_novels set are of particular interest. This sub-corpus was extremely challenging, causing
problems for all entrants due to the shared-genre nature of Lovecraftian horror and the unifying force
of its pervasive style (explained in more detail in [16, p. 882]). BDI underperformed here according to
the PAN metrics, but did so conservatively, with a perfect precision score of 1.0.</p>
      <sec>
        <title>3.2. Testing vs PAN 2021</title>
        <p>Since the state of the art in authorship verification is now exploring the possibilities offered by deep
learning, the BDI verifier (and the Order2Verifier) were also evaluated using the data from the PAN
2020–2021 shared task. The final evaluation report from that task is available in [7]. The 2020–21 tasks
used English-language fan-fiction, which is available in vast amounts. Although the huge amount of
training data and the limitation to English make the problems somewhat less interesting, it is
still instructive to know how much we ‘lose’ by using simple machine learning approaches that can be widely
applied as compared to cutting-edge techniques applied to best-case scenarios. The large amount of
training data (the large training set contained 176,000 total text pairs, and even these were
synthetically augmented by some teams!), as well as the more recent date, meant that the results here were
dominated by complex deep learning models, with the winning entry being an extremely impressive
siamese network using four subcomponents [3]. It is not possible to provide a strict ‘apples to apples’
comparison, since the entrants were required to assess each problem as an atomic pairwise
determination […]</p>
      </sec>
      <sec id="sec-3-1">
        <title>Fitting Results: PAN 2021 (small training corpus), 1000 problems</title>
        <p>[…] ‘trained’ per se, both the score shifting parameters for the fitted shifters and the variances used
in z-scaling were derived from a subset of the small training corpus. This is in keeping with the spirit of the real
PAN competition, where the final evaluation dataset was fully blinded, unlike in previous years. The
amount of training data employed was very small, using just 4256 texts from the ‘small’ training set,
which contained 106,000 texts. The final evaluation used texts from the 2021 evaluation set (19999 pairs)
which contained no texts from authors in the training sets. To fairly assess AUC, a roughly even split of
positive and negative determinations is required. In addition, since the 2021 shared task was focused
on previously-unseen authors, unseen authors were included as noise in both the comparison set
(candidate profiles) and the evaluation set (verification problems). The evaluation set that was eventually
used contained 10,152 problems, and was broken down as follows:
• 5076 problems from authors with at least one other text, tested against the true label;
• 2538 problems from authors with at least one other text, tested against a randomly chosen false
label;
• 2538 problems from unseen authors, tested against a random false label.</p>
        <p>The candidate/comparison set is composed as follows:
• 1692 authors who are never seen again, as noise;
• 7451 authors with one comparison sample (who might appear against a true or false label);
• 905 authors with 5 comparison samples.
As can be seen, the large majority of authors have only one comparison candidate. The unnatural
distribution of counts for texts per author is an artifact of the PAN data, and is presumably related to
the corpus compilation process. The evaluation process used all of the evaluation texts by authors with
two or more total samples, but the bulk of the singleton texts were not used. The full evaluation process
is available as commented Jupyter notebooks at the accompanying repository.</p>
        <p>Overall, the best performers in training were fitted versions of the 2,3,4-gram Kestemont GI and
BDI verifiers, once again with the minmax metric. The full results can be seen in Table 4. Since the</p>
      </sec>
      <sec id="sec-3-2">
        <title>Evaluation Results: PAN 2021, 10,152 test problems</title>
        <p>bootstrapping process was extremely time consuming on the full evaluation set, only the 2,3,4-gram
vectorizer was used. The best performing BDI verifier posted a final overall score of 0.889, using the
PAN 2021 metric (the mean of several accuracy measures, calculated using the official evaluation code).
This would place it near the bottom of the four “strong runner ups” [7, p. 7], still comfortably
outperforming the machine-learning benchmarks and 5 human teams. It seems reasonable to assume that
a more sophisticated feature vector, such as the one used in weerasinghe21 [17], would improve the
performance of both Order2Verifier and BDI. By comparison, the final score for boenninghoff21 was
0.9545. Once again, it is clear from Tables 4 and 5 that the fitting is nice to have in terms of the overall
PAN21 score (which uses several balanced accuracy measures), but in fact reduces raw precision, which
is relevant where research problems require the smallest chance of false positives.</p>
        <p>As further exploration, some differential / ablation results are included in Table 5. The results under
the shifter method ‘fitted (optimal)’ calculate the optimal post-hoc p1 and p2 (as discussed above) based
on the results from the evaluation itself—in other words it shows the advantage if the training regime
were to perfectly represent the probability distribution of the true problems. As can be seen, this is
quite minor (an overall score of 0.896 vs 0.889), which is encouraging. However, the relatively strong
performance of the manual shifter with the BDI verifier, as well as its higher precision, seems to suggest
that the small increase in some of the measures does not warrant the methodological uncertainty (in
terms of representativeness and bias) added by the fitting process in general. Additionally, it can be
seen that the score shifting optimisation is very sensitive to the evaluation measure, incurring quite a
large precision penalty—not a component of the target metric—in order to improve the overall score.
Finally, the new addition of ‘ranking’ was evaluated by ablation. As in the PAN 2014 results (Table
1, PAN14-U), the ranking regularisation appears to offer a small but consistent benefit to AUC and
F-measures, which is impressive considering how few of the authors (about 10%) had more than one
comparison sample. The unranked verifiers, however, showed even higher precision which may need
to be considered for some applications.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Showcase</title>
      <p>This section refers to two verification studies in which BDI has been applied, one of which is, at the
time of writing, still in press. These figures are not full summaries of the research, but simply illustrate
some of the features of the BDI method that I believe to be useful. As mentioned above, the output of
the BDI algorithm is a distribution of distances, not a summary statistic. These examples attempt to
show that examining the full distribution conveys extra information and can improve our intuition and
confidence in the analysis.
</p>
      <p>[…] provide evidence for the genuineness of the Nux, but of more interest in this context is the analysis of the
Consolatio ad Liviam. The Consolatio was once considered to be a genuine work of Ovid, but is now
accepted by most scholars to be a first-century imitation. By using BDI we attempted to show that
metrical technique was a powerful enough stylistic feature to disambiguate even deliberate imitation
from genuine works. In Figure 3, we see the value of visualising distributions where all of the distances
are positive (closer to the candidate author than an imposter), which would be summarised as a
‘probability’ of 1.0. This figure measures similarity in terms of lexico-grammatical features, operationalized
as character n-grams. In fact, as can be seen, although the chunks from the Consolatio are much more
like Ovid than they are like any of the distractor poets, they are not as much like Ovid as most of the
candidate comparison works. This kind of comparability between strong matches is very difficult with
the standard GI approach. However, in Figure 4, which measures metrical features, the difference is
clear: the sections from the Consolatio are centered around 0 (or near enough) as compared to the other
works where at least 90% of the distribution mass is above 0, supporting a positive attribution. This
result suggests that the Consolatio is not Ovidian, but also that it is not a good stylistic match for any
of the distractor poets (Tibullus, Propertius, and Catullan elegy). This is consistent with the current
(weak) consensus that the Consolatio is a late first-century imitation by an unknown author.</p>
      <p>
        Figures 5 &amp; 6 are from an analysis of translator style, examining medieval translations from Greek to
Latin. Translator style (as opposed to authorial style) offers a unique set of challenges, covered in much
more detail in the full paper [1]. In this study, a small set of function words was used, in accordance
with the well-known theory that closed-class words are used more unconsciously, and thus are more
indicative of individual preferences than nouns, verbs, and adjectives (which are highly affected by
genre and topic). Overall, this was found to be an effective approach. Here, however, I note two more
useful properties of the BDI method. The first is that by splitting the work into smaller chunks, and
visualising the distribution for each chunk, we are able to see the degree of stylistic variation in a single
translator. It is also clear from Figure 5 that some passages are less ‘stylistically clear’, showing much
more pronounced spread; this can be interpreted as greater sensitivity to the individual feature subsets.
Overall, Figure 5 is centred around a negative value, indicating that it is significantly more similar to
one of the imposter translators than to Bartholemew. In Figure 6, we performed the same process for
a different text that is a translation of the same work (Aristotle’s Rhetoric) generally accepted to be by
William of Moerbeke. In the latter case we see the expected result: almost all of the chunks are fairly
strongly centred around a positive value. The strength of the match is not as clear as in the Ovidian
figures, but this is perhaps to be expected, since the amount of style that a translator brings to a work
can be reasonably assumed to be less than that brought by an author (this is a well studied field; see for
example [
        <xref ref-type="bibr" rid="ref13">15</xref>
        ] with references).
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Future Work</title>
      <p>
        As can be seen from Figure 2, predictions from the BDI classifier (after shifting) cluster strongly at the
extremes, with most mis-predictions being high-confidence false negatives. The Kestemont GI
classifier shows the most mis-predictions in the central band (near 0.5), which is intuitive if the outputs are
interpreted as probabilities. The current link function from the BDI distributions to a ‘probability’ in
[0, 1] is a fairly simple idea, and can almost certainly be improved to produce a smoother and
more statistically informed distribution across the output range (perhaps logistic regression, or even empirical
distributional statistics). This is left for future work.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>The most common goal in authorship verification work is to positively attribute works to authors. In
this context, although balanced accuracy is not unimportant, precision (fewer false positives) is often
more important than recall. The balanced metrics used for the PAN 2014 / 2021 authorship verification
competitions balance overall AUC (false positives and false negatives) with the ability for a classifier
to degrade gracefully when the result is unclear. In general, this is a useful innovation, particularly
in comparison to standard machine-learning classifiers which are obliged to assign each problem to a
discrete class (even if the true author is not one of the available answers). While the
widely-used General Imposters method still performs extremely well, it seems wasteful to discard the detailed distance
information that is calculated in any case during the bootstrap / voting process.</p>
      <p>BDI attempts to address these issues by outputting a full distance distribution which can be manually
inspected. As demonstrated in Section 4, this can be very useful when comparing results that are all
strong matches. When operating as a summary verifier, BDI tends to be conservative in its positive
attributions, particularly when applied to very difficult problem sets like the PAN2014 en_novels. In
terms of raw performance, the BDI verifier appears slightly stronger than the improved Kestemont GI
according to the PAN metrics for both the 2014 and 2021 problems, while also offering superior
interpretability. The advantage of the BDI verifier is even clearer when score shifting is not used. Overall,
the BDI approach seems to be a strong choice, especially where training data is limited and/or reliable
positive results are more important than balanced performance metrics.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Availability of Data and Code</title>
      <p>The preprint may be found at https://github.com/bnagy/bdi-paper. All code and data are available under
CC-BY, except where restricted by upstream licenses. The code repository includes full reproduction
data and code for the evaluation, as well as various supplemental figures and explanations.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was supported by project 2020/39/O/HS2/02931, funded by Poland's National Science Centre.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Pieter</given-names>
            <surname>Beullens</surname>
          </string-name>
          , Wouter Haverals, and Ben Nagy. “
          <article-title>The Elementary Particles: A Computational Stylometric Inquiry into the Mediaeval Greek-Latin Aristotle</article-title>
          ”. In:
          <source>Mediterranea. International Journal on the Transfer of Knowledge</source>
          <volume>9</volume>
          (
          <year>2024</year>
          ), pp.
          <fpage>385</fpage>
          -
          <lpage>408</lpage>
          . doi: 10.21071/mijtk.v9i.16723.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Janek</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          et al. “
          <article-title>Generalizing Unmasking for Short Texts</article-title>
          ”. In:
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          . Ed. by Jill Burstein, Christy Doran, and Thamar Solorio. Minneapolis, Minnesota: Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>654</fpage>
          -
          <lpage>659</lpage>
          . doi: 10.18653/v1/N19-1068.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Benedikt</given-names>
            <surname>Boenninghof</surname>
          </string-name>
          ,
          <string-name>
            <surname>Robert M. Nickel</surname>
          </string-name>
          , and Dorothea Kolossa. “O2D2:
          <article-title>Out-Of-Distribution Detector to Capture Undecidable Trials in Authorship Verification-Notebook for PAN at CLEF 2021”</article-title>
          . In:
          <article-title>CLEF 2021 Labs and Workshops</article-title>
          , Notebook Papers. Ed. by Guglielmo Faggioli et al.
          <source>CEURWS.org</source>
          ,
          <year>2021</year>
          . url:http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2936</volume>
          /paper-158.pd.f Maciej Eder, Jan Rybicki, and Mike Kestemont. “
          <article-title>Stylometry with R: a package for computational text analysis”</article-title>
          .
          In:
          <source>R Journal</source>
          <volume>8</volume>.<issue>1</issue>
          (
          <year>2016</year>
          ), pp.
          <fpage>107</fpage>
          -
          <lpage>121</lpage>
          . url: https://journal.r-project.org/archive/2016/RJ-2016-007/index.html.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Mike</given-names>
            <surname>Kestemont</surname>
          </string-name>
          . Ružička: Authorship Verification in Python.
          <year>2015</year>
          . url: https://github.com/mikekestemont/ruzicka.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Mike</given-names>
            <surname>Kestemont</surname>
          </string-name>
          et al. “
          <article-title>Authenticating the writings of Julius Caesar”</article-title>
          .
          In:
          <source>Expert Systems with Applications</source>
          <volume>63</volume>
          (
          <year>2016</year>
          ), pp.
          <fpage>86</fpage>
          -
          <lpage>96</lpage>
          . doi: https://doi.org/10.1016/j.eswa.2016.06.029.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Mike</given-names>
            <surname>Kestemont</surname>
          </string-name>
          et al. “
          <article-title>Overview of the Cross-Domain Authorship Verification Task at PAN 2021”</article-title>
          .
          In:
          <source>CLEF 2021 Workshop Proceedings</source>
          . CEUR-WS.org
          ,
          <year>2021</year>
          . url: https://ceur-ws.org/Vol-2936/paper-147.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Mahmoud</given-names>
            <surname>Khonji</surname>
          </string-name>
          and
          <string-name>
            <given-names>Youssef</given-names>
            <surname>Iraqi</surname>
          </string-name>
          .
          <article-title>“A slightly-modified GI-based author-verifier with lots of features (ASGALF) - Notebook for PAN at CLEF 2014”</article-title>
          .
          In:
          <source>CLEF 2014 Working Notes</source>
          <volume>1180</volume>
          (
          <year>2014</year>
          ), pp.
          <fpage>977</fpage>
          -
          <lpage>983</lpage>
          . url: https://ceur-ws.org/Vol-1180/CLEF2014wn-Pan-KonijEt2014.pdf.
          [9] Moshe Koppel and Yaron Winter. “
          <article-title>Determining If Two Documents Are Written by the Same Author”</article-title>
          .
          In:
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>65</volume>
          (
          <year>2014</year>
          ). doi: 10.1002/asi.22954.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Ben</given-names>
            <surname>Nagy</surname>
          </string-name>
          .
          <article-title>Preprint: Bootstrap Distance Imposters: High precision authorship verification with improved interpretability</article-title>
          .
          <year>2024</year>
          . url: https://github.com/bnagy/bdi-paper.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Ben</given-names>
            <surname>Nagy</surname>
          </string-name>
          . Ružička: Authorship Verification in Python.
          <year>2023</year>
          . url: https://github.com/bnagy/ruzicka.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Nini</surname>
          </string-name>
          .
          <article-title>A Theory of Linguistic Individuality for Authorship Analysis</article-title>
          .
          <source>Elements in Forensic Linguistics</source>
          . Cambridge University Press,
          <year>2023</year>
          . doi: 10.1017/9781108974851.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          et al. “
          <article-title>Scikit-learn: Machine Learning in Python”</article-title>
          .
          In:
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          ), pp.
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Nektaria</given-names>
            <surname>Potha</surname>
          </string-name>
          and
          <string-name>
            <given-names>Efstathios</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          . “
          <article-title>An Improved Impostors Method for Authorship Verification”</article-title>
          .
          In:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction: 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September 11-14, 2017, Proceedings 8</source>
          .
          <year>2017</year>
          , pp.
          <fpage>138</fpage>
          -
          <lpage>144</lpage>
          . doi: 10.1007/978-3-319-65813-1_14.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Rybicki</surname>
          </string-name>
          . “
          <article-title>The great mystery of the (almost) invisible translator”</article-title>
          . In:
          <source>Quantitative Methods in Corpus-Based Translation Studies: A practical guide to descriptive translation research</source>
          . Ed. by Michael P. Oakes and
          <string-name>
            <given-names>Meng</given-names>
            <surname>Ji</surname>
          </string-name>
          . Amsterdam: John Benjamins Publishing Company,
          <year>2012</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Efstathios</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          et al. “
          <article-title>Overview of the author identification task at PAN 2014”</article-title>
          .
          In:
          <source>CEUR Workshop Proceedings</source>
          <volume>1180</volume>
          (
          <year>2014</year>
          ), pp.
          <fpage>877</fpage>
          -
          <lpage>897</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Janith</given-names>
            <surname>Weerasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rhia</given-names>
            <surname>Singh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Rachel</given-names>
            <surname>Greenstadt</surname>
          </string-name>
          . “
          <article-title>Feature Vector Difference based Authorship Verification for Open-World Settings</article-title>
          .” In:
          <article-title>CLEF 2021 Labs and Workshops</article-title>
          , Notebook Papers. Ed. by Guglielmo Faggioli et al.
          <source>CEUR-WS.org</source>
          ,
          <year>2021</year>
          . url: https://ceur-ws.org/Vol-2936/paper-197.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>