<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cross-Domain Authorship Verification Based on Topic Agnostic Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oren Halvani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukas Graner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roey Regev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer Institute for Secure Information Technology SIT</institution>
          ,
          <addr-line>Rheinstrasse 75, 64295 Darmstadt</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>1</fpage>
      <lpage>1</lpage>
      <abstract>
        <p>Authorship verification (AV) is a research branch of digital text forensics that deals with the problem of determining whether two documents were written by the same author. Research activity in AV has steadily increased in recent years, which has led to a variety of approaches to this problem. Many of these approaches, however, make use of features that are related to or influenced by the topic of the documents. As a result, their verification results may inadvertently be based not on the writing style alone (the actual focus of AV), but on the topic of the documents. To address this problem, we propose, in the context of the AV shared task at the PAN 2020 workshop, an alternative approach that considers only topic-agnostic features in its classification decision. On the official test set, our approach was ranked third out of all submitted approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>With the constant increase of documents worldwide, more and more forms of
identity misuse are emerging. One example of such identity abuse is “CEO
Fraud” – a sophisticated email scam – in which an attacker sends an email to an
employee on behalf of a CEO, asking for a specific action (e.g., transferring money or
sending confidential company information). Another form of identity abuse occurs in
the context of compromised accounts, where the attacker distributes messages in the
name of the victim. In addition, identity abuse can occur in fake reviews in which, for
example, an attempt is made on behalf of an alleged person to positively advertise a
product or service provider. A countermeasure for these scenarios is to compare
the writing style of the questioned documents with the writing style of those documents
for which the true author A is known. This makes it possible to answer (with a
certain degree of probability) the question of whether the unknown document was also written by A.
The comparison of documents based on their writing style is particularly relevant if
no other metadata are available to clarify the identity of the unknown author.</p>
      <p>Authorship verification (AV), which is a branch of digital text forensics, has been dealing with
this question for over two decades. Technically, AV represents a similarity detection
problem, where for an unknown document D<sub>U</sub> and a known document D<sub>A</sub> it has to be
determined whether both were written by the same author A. The focus of the similarity
determination in the context of AV is on the writing style and not on other factors such
as the topic or genre. Therefore, if D<sub>U</sub> and D<sub>A</sub> share the same topic but were written
by different authors, a naive AV method might erroneously assume a high degree of
similarity and thus clearly fail to achieve its intended goal.</p>
      <p>
        A large number of existing AV methods including [
        <xref ref-type="bibr" rid="ref21 ref22 ref25 ref26 ref4 ref6">4,6,21,22,25,26</xref>
        ] make use of
character n-grams (overlapping character sequences), which are known to be closely
associated with particular content words and, therefore, can be problematic when dealing
with authorship [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Style analysis, however, must abstract from content and focus
on content-independent formal properties of linguistic expressions in a text [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In
light of this conclusion, we propose an alternative approach which, by design, considers
only such text units that reflect valid stylistic markers. Our contribution in this paper is
twofold: First, we propose a number of topic-agnostic feature categories that effectively
quantify the writing style of documents. Second, we propose a transparent AV method
that can be applied to challenging AV tasks. These include cases where D<sub>U</sub> and D<sub>A</sub>
consist of only a few sentences or cases where both differ thematically.
      </p>
      <p>
        The rest of the paper<sup>1</sup> is organized as follows. Section 2 discusses previous work in
the context of AV. In Section 3, we propose a number of feature categories, which
will be used by our AV method introduced in Section 4. Afterwards, we present our
experimental evaluation in Section 5 and, finally, in Section 6 we conclude the work
and provide ideas for future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Previous Work</title>
      <p>
        The core of every AV method is a classification model that aims to decide whether a
questioned document D<sub>U</sub> was written by a certain author A, for whom a set 𝔻<sub>A</sub> =
{D<sub>1</sub>, D<sub>2</sub>, …} of reference documents is given. With regard to their classification
models, we have identified three categories of AV methods in our previous research work
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which are summarized below.
      </p>
      <p>
        The first category comprises unary AV methods, which determine their classification model
solely on the basis of 𝔻<sub>A</sub>. A unary AV method assumes D<sub>U</sub> to be written by A if
it is stylistically similar to the documents in 𝔻<sub>A</sub>. The second category comprises
binary-intrinsic AV methods, which determine their classification model on the basis of a given
training corpus. This corpus consists of a number of verification cases with a ground
truth regarding the classes Y (same-author) and N (different-author). A binary-intrinsic
AV method treats the unknown and known documents as a single unit X (for
example, a feature vector). If X is more similar to the Y-cases, the method accepts A as the
author of D<sub>U</sub>. If, on the other hand, X is more similar to the N-cases, D<sub>U</sub> is assumed
to be written by another author. In any case, the decision is made solely on the basis
of X and the learned model (hence, intrinsic). The third category comprises binary-extrinsic
AV methods, which determine their classification model on the basis of external (so-called
impostor [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]) documents which, for example, are gathered by using a search engine.
In this context, the documents in 𝔻<sub>A</sub> represent samples of the true author A, while the
impostor documents act as samples of an author different from A. Binary-extrinsic AV
methods assume D<sub>U</sub> to be written by A if it is stylistically similar to the documents in
𝔻<sub>A</sub>. However, if D<sub>U</sub> is more similar to the impostor documents, it is assumed to have
been written by an author other than A.
        <fn id="fn1">
          <label>1</label>
          <p>Portions of this paper are based on our published work [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. We therefore kindly ask the
interested reader who would like to cite this article to use this reference. Note that a video
presentation of our approach is available at http://bit.ly/TAVeer</p>
        </fn>
      </p>
      <p>
        Over the last two decades, numerous AV approaches have been proposed that can be
assigned to one of the above categories. An approach that we refer to as AVIF and that
belongs to the category of unary AV methods was developed by Neal et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] for the
purpose of continuous verification. Their method is based on an isolation forest
classifier, which, like many other AV methods, considers character n-grams as underlying
features. AVIF achieved a high recognition accuracy using very small training
samples of 50 and 100-character blocks. However, in their study the authors explain that
the method was only evaluated on positive samples (in other words, instances of the
Y-class). Therefore, it is not clear how well AVIF performs under realistic conditions
where both cases (Y and N) are present.
      </p>
      <p>
        A well-known binary-intrinsic AV approach, which we denote by the name ProfAV,
was proposed by Potha and Stamatatos [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Their method represents two documents
D<sub>U</sub> and D<sub>A</sub> as character n-gram profiles and measures their relative differences using
a predefined dissimilarity function. If the resulting dissimilarity score falls below a certain
threshold (derived from the distribution of Y/N-samples in a given training corpus), D<sub>U</sub>
is assumed to be written by A. Potha and Stamatatos [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] demonstrated that ProfAV
was able to outperform every single AV method submitted to the first AV-competition
as a part of the PAN shared tasks [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        One of the most influential and successful binary-extrinsic AV approaches is the
Impostors Method (IM) proposed by Koppel and Winter [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], which laid the foundations for
many subsequent AV approaches including [
        <xref ref-type="bibr" rid="ref17 ref18 ref19 ref25 ref28">28,17,18,19,25</xref>
        ]. IM can be broken down
into two steps. First, appropriate impostor documents have to be collected according to
a predefined strategy (for example, using a search engine). In the second step, a
feature selection technique based on character n-grams is applied iteratively to measure
the similarity between pairs of documents. If, given this measure, a suspect is picked
out from among the impostor set with sufficient salience, then the suspect is assumed
to be the author of D<sub>U</sub> [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. The IM variants of Khonji and Iraqi [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and Seidman
[
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] were the best-performing approaches in the first and second PAN-AV competitions
[
        <xref ref-type="bibr" rid="ref16 ref29">16,29</xref>
        ]. Another strong approach that belongs to the category of binary-extrinsic AV
methods is the so-called NNCD method proposed by Veenman and Li [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. In
contrast to IM, their method delegates the entire feature engineering procedure to a
state-of-the-art compression algorithm. Here, D<sub>U</sub> is assumed to be written by A if the
compressed representation of D<sub>U</sub> is dissimilar to those of the impostor documents. Both
NNCD [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] and GenIM, the IM variant of Seidman [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] were the best-performing approaches in the first PAN AV
competition [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Feature Categories</title>
      <p>In this section, we propose a number of feature categories that are used by our AV
approach to capture the writing style of documents. Some of these are derived from
feature categories used in previous studies. The remaining feature categories,
however, have not been considered so far in the context of AV, at least to the best of our
knowledge. All feature categories are summarized in Table 1 along with a number of
examples. In the following subsections, we first introduce all feature categories in detail.
Afterwards, we explain which design decisions we made with regard to their
hyperparameters. Finally, we describe the scope from which all proposed features are extracted and
how we normalized them.</p>
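      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>All 11 feature categories considered by TAVeer (feature categories with the TA-prefix
are proposed by us). The third column shows the output for the sample sentence: "However,
there is an opposing view to this." Note that for the n-gram-based feature
categories, each setting of n results in an individual feature category.</p>
        </caption>
        <table>
          <thead>
            <tr><th>ID</th><th>Feature category</th><th>Sample output</th><th>Range</th></tr>
          </thead>
          <tbody>
            <tr><td>F1–3</td><td>Punctuation n-grams</td><td>{(, .)}</td><td>n ∈ {1, 2, 3}</td></tr>
            <tr><td>F4</td><td>TA sentence and clause starters</td><td>{(however), (, there)}</td><td>–</td></tr>
            <tr><td>F5</td><td>TA sentence endings</td><td>{(this)}</td><td>–</td></tr>
            <tr><td>F6–9</td><td>TA token n-grams</td><td>{(however , there), (, there is), (there is an), (to this .)}</td><td>n ∈ {1, 2, 3, 4}</td></tr>
            <tr><td>F10–11</td><td>TA masked token n-grams</td><td>{(is an #), (# to this)}</td><td>n ∈ {3, 4}</td></tr>
          </tbody>
        </table>
      </table-wrap>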
      <sec id="sec-3-1">
        <title>Topic-Agnostic Words and Phrases</title>
        <p>
          Function words are arguably the most common choice in the field of authorship
analysis when it comes to selecting topic-agnostic features. However, in the literature it often
remains unclear what exactly is meant by the term “function
words”. In many existing studies (for example, [
          <xref ref-type="bibr" rid="ref15 ref34 ref7">7,15,34</xref>
          ]) no detailed explanation is
provided as to which specific function word categories (or at least
which specific words) were taken into account. Another peculiarity of
the literature is the varying number of function words considered. For example,
Chandrasekaran [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], Binongo [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], Srinivasa [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] and Zhao and Zobel [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] make use of 24,
50, 150 and 365 function words, respectively. In view of these different numbers, the
question arises why only individual subsets are considered rather than the entire
spectrum of function words. Instead of making use of non-structured and incomplete
lists, Varela et al. [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] and Pavelec et al. [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] follow a different approach, in which they
consider subcategories of function words such as pronouns, conjunctions, subclasses of
adverbs and other word forms. This gives better insight into
which specific types of function words were actually taken into account.
        </p>
        <p>
          Motivated by this idea, we opted for a similar but more systematic approach, in which
we consider all existing categories of function words along with other carefully
selected topic-agnostic (hereafter abbreviated as TA) categories. First, we assemble a
comprehensive list L<sub>TA</sub> consisting of words and phrases that belong to these
categories (cf. Table 2). Based on L<sub>TA</sub>, we then derive different TA feature categories
(described below) that can be used to model the writing style of documents across
different linguistic layers. For the construction of L<sub>TA</sub>, we use a variety of words and
phrases classified into 20 categories including function words, empty verbs,
contractions, generic adverbs as well as transitional words and phrases. All considered words
and phrases, which are known in the literature [
          <xref ref-type="bibr" rid="ref23 ref3 ref30">23,3,30</xref>
          ] to be content and topic
independent, have been collected from different sources, in particular linguistic books
and stylometry papers. The transitional phrases cover a number of categories including
causation, contrast, similarity, clarification, conclusion, purpose and summary. With
regard to the verbs, we also take the respective tenses<sup>2</sup> into account (for example, give
→ {gives, giving, gave, given}) in order to enrich L<sub>TA</sub>. All categories of
words and phrases contained in L<sub>TA</sub> are summarized in Table 2 along with a number of
examples. Note that, due to the ambiguities occurring in the English language, a
number of function words appear in multiple categories. For example, "but" and "for"
are both prepositions and conjunctions, whereas "few" represents a pronoun and a
quantifier. However, regarding the features in L<sub>TA</sub>, we do not differentiate between the
different meanings of these homographs<sup>3</sup>. Based on L<sub>TA</sub>, we derive additional feature
categories, which are described in the following.
        </p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Categories of words and phrases contained in L<sub>TA</sub>, along with a number of examples.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Category</th><th>Examples</th></tr>
            </thead>
            <tbody>
              <tr><td>Conjunctions</td><td>{and, as, because, but, either, for, hence, however, if, neither, ...}</td></tr>
              <tr><td>Determiners</td><td>{a, an, both, each, either, every, no, other, our, some, ...}</td></tr>
              <tr><td>Prepositions</td><td>{above, after, among, below, beside, between, beyond, inside, ...}</td></tr>
              <tr><td>Pronouns</td><td>{all, another, any, anyone, anything, everything, few, he, her, ...}</td></tr>
              <tr><td>Quantifiers</td><td>{any, certain, each, either, few, less, lots, many, more, most, ...}</td></tr>
              <tr><td>Auxiliary verbs</td><td>{can, could, might, must, ought, shall, will, ...}</td></tr>
              <tr><td>Delexicalised verbs</td><td>{get, go, take, make, do, have, give, set, ...}</td></tr>
              <tr><td>Empty verbs</td><td>{do, did, does, got, have, had, had, gives, giving, gave, ...}</td></tr>
              <tr><td>Helping verbs</td><td>{am, is, are, was, were, be, been, will, should, would, could, ...}</td></tr>
              <tr><td>Contractions</td><td>{i’m, i’d, i’ll, i’ve, he’s, it’s, we’d, she’s, it’ll, we’re, ...}</td></tr>
              <tr><td>Adverbs of degree</td><td>{almost, enough, hardly, just, nearly, quite, simply, so, too, ...}</td></tr>
              <tr><td>Adverbs of frequency</td><td>{again, always, never, normally, rarely, seldom, sometimes, ...}</td></tr>
              <tr><td>Adverbs of place</td><td>{below, everywhere, here, in, inside, into, nowhere, out, ...}</td></tr>
              <tr><td>Adverbs of time</td><td>{already, during, immediately, just, recently, still, then, yet, ...}</td></tr>
              <tr><td>Pronominal adverbs</td><td>{hereafter, hereby, thereafter, thereby, therefore, therein, ...}</td></tr>
              <tr><td>Focusing adverbs</td><td>{especially, mainly, particularly, generally, only, simply, ...}</td></tr>
              <tr><td>Conjunctive adverbs</td><td>{likewise, meanwhile, moreover, namely, nonetheless, otherwise, ...}</td></tr>
              <tr><td>Transition words</td><td>{besides, furthermore, generally, hence, thus, however, ...}</td></tr>
              <tr><td>Transitional phrases</td><td>{of course, as a result, in addition, because of, in contrast, ...}</td></tr>
              <tr><td>Phrasal prepositions</td><td>{as opposed to, in regard to, in relation to, in spite of, out of, ...}</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
          <fn id="fn2">
            <label>2</label>
            <p>For this we used pattern [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] available at https://github.com/clips/pattern.</p>
          </fn>
          <fn id="fn3">
            <label>3</label>
            <p>Homographs are words with the same spelling but different meaning.</p>
          </fn>
        </p>
        <p>
          Punctuation n-Grams (F1–3). Punctuation marks represent syntactic features that
quantify the grammatical structures an author uses and, thus, are content and topic
independent [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. We define a punctuation n-gram as a sequence of n consecutive
punctuation marks, where letters, digits and other non-punctuation characters are skipped (cf.
Table 1). Among others, punctuation n-grams capture specific symbols that occur at
the word-internal level, such as hyphens or apostrophes used in contractions (e.g., we’ve
or they’re). Furthermore, they make it possible to recognize unusual punctuation habits
reflecting the individual writing style of an author, such as combinations of question and
exclamation marks (e.g., ?!? or !?!), which occur in informal documents. In total,
we consider three punctuation n-gram feature categories (F1–3) that do not depend
on the list L<sub>TA</sub>. However, the feature categories F6–11 make use of F1 (punctuation
unigrams).
        </p>
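        <p>The definition of punctuation n-grams can be illustrated with a minimal sketch. It is not the authors' implementation: the function name punctuation_ngrams and the use of Python's string.punctuation as the inventory of punctuation marks are assumptions made for illustration only.</p>
        <preformat>
```python
import string


def punctuation_ngrams(sentence, n):
    """Return the punctuation n-grams of one sentence as a list of tuples.

    Assumption: string.punctuation (ASCII only) defines what counts as a
    punctuation mark; typographic quotes or dashes would have to be added.
    """
    # Keep only punctuation marks; letters, digits and whitespace are skipped.
    marks = [ch for ch in sentence if ch in string.punctuation]
    # Slide a window of length n over the remaining marks.
    return [tuple(marks[i:i + n]) for i in range(len(marks) - n + 1)]


sample = "However, there is an opposing view to this."
print(punctuation_ngrams(sample, 1))  # [(',',), ('.',)]
print(punctuation_ngrams(sample, 2))  # [(',', '.')]
```
        </preformat>
        <p>For the sample sentence of Table 1, the 2-gram setting yields the single feature (, .), while n = 3 yields no feature because the sentence contains only two punctuation marks.</p>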
        <p>TA Sentence and Clause Starters (F4). Words or phrases that appear at the beginning
of sentences or clauses can reflect one aspect of an author’s writing style. We therefore
consider such sentence and clause starters as a distinct feature category. However, since
our focus lies on TA-based features, we make sure that a word or phrase appearing at
the beginning of a sentence or a clause is included in L<sub>TA</sub>. Note that in the case of clauses,
we consider the preceding punctuation mark (comma or semicolon) together with the
subsequent word or phrase as a whole feature (cf. Table 1).</p>
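        <p>A possible way to extract such starters is sketched below. The set L_TA is a hypothetical excerpt covering only the sample sentence, the regular-expression tokenizer is an assumption, and the sketch handles single-word starters only (multi-word phrases from the list would require an additional phrase lookup).</p>
        <preformat>
```python
import re

# Hypothetical excerpt of the list L_TA; the full list is not reproduced here.
L_TA = {"however", "there", "is", "an", "to", "this"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word and punctuation tokens."""
    return re.findall(r"[a-z']+|[^\sa-z']", sentence.lower())


def starters(sentence):
    """Return TA sentence and clause starters (single words only)."""
    tokens = tokenize(sentence)
    found = []
    # Sentence starter: the first token, provided it is topic-agnostic.
    if tokens and tokens[0] in L_TA:
        found.append((tokens[0],))
    # Clause starter: a comma or semicolon followed by a topic-agnostic word;
    # the punctuation mark is kept as part of the feature.
    for left, right in zip(tokens, tokens[1:]):
        if left in {",", ";"} and right in L_TA:
            found.append((left, right))
    return found


print(starters("However, there is an opposing view to this."))
# [('however',), (',', 'there')]
```
        </preformat>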
        <p>TA Sentence Endings (F5). Words or phrases that appear at the end of sentences might
also reflect a stylistic habit of authors. We therefore consider such features as a distinct
feature category and make sure (analogous to F4) that they are included in L<sub>TA</sub>.</p>
        <p>TA Token n-Grams (F6–9). These feature categories are a form of standard token
n-grams with the restriction that each token t<sub>i</sub> in a token n-gram (t<sub>1</sub>, t<sub>2</sub>, …, t<sub>n</sub>) represents
either a punctuation mark or a word appearing in L<sub>TA</sub> (cf. Table 1). Note that for n = 1, the
respective feature category F6 is essentially the list L<sub>TA</sub>, which is obtained by merging
all categories listed in Table 2.</p>
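        <p>The restriction on TA token n-grams can be sketched as a simple filter over ordinary token n-grams. The names L_TA, PUNCTUATION, tokenize and ta_token_ngrams are illustrative assumptions, and L_TA is again only a small hypothetical excerpt of the real list.</p>
        <preformat>
```python
import re

# Hypothetical excerpt of L_TA, sufficient for the sample sentence.
L_TA = {"however", "there", "is", "an", "to", "this"}
# Assumed inventory of punctuation tokens.
PUNCTUATION = set(".,;:!?")


def tokenize(sentence):
    """Lowercase the sentence and split it into word and punctuation tokens."""
    return re.findall(r"[a-z']+|[^\sa-z']", sentence.lower())


def ta_token_ngrams(sentence, n):
    """Return the token n-grams in which every token is a punctuation mark
    or a word contained in L_TA."""
    tokens = tokenize(sentence)
    grams = []
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if all(t in L_TA or t in PUNCTUATION for t in gram):
            grams.append(gram)
    return grams


for gram in ta_token_ngrams("However, there is an opposing view to this.", 3):
    print(gram)
```
        </preformat>
        <p>With n = 3, the sketch prints the four features listed for F6–9 in Table 1: (however , there), (, there is), (there is an) and (to this .).</p>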
        <p>TA Masked Token n-Grams (F10–11). These feature categories also represent a form
of token n-grams with the restriction that n<sub>1</sub> tokens in a token n-gram (t<sub>1</sub>, t<sub>2</sub>, …, t<sub>n</sub>)
are either punctuation marks or words appearing in L<sub>TA</sub>. The remaining n<sub>2</sub> tokens,
on the other hand, represent topic-related words, which are then masked by the
non-punctuation character #. The intention here is to enable the detection of contexts
surrounding or adjacent to topic-agnostic words (cf. Table 1).</p>
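        <p>The masking step can be sketched as follows. This is an illustration under a loudly stated assumption: only n-grams with exactly one masked token are kept, a rule chosen because it reproduces the sample output of Table 1 for n = 3; the exact counting rule is not confirmed by the text. L_TA, PUNCTUATION and the function names are hypothetical.</p>
        <preformat>
```python
import re

# Hypothetical excerpt of L_TA, sufficient for the sample sentence.
L_TA = {"however", "there", "is", "an", "to", "this"}
# Assumed inventory of punctuation tokens.
PUNCTUATION = set(".,;:!?")


def tokenize(sentence):
    """Lowercase the sentence and split it into word and punctuation tokens."""
    return re.findall(r"[a-z']+|[^\sa-z']", sentence.lower())


def ta_masked_token_ngrams(sentence, n, mask="#"):
    """Return token n-grams in which topic-related tokens are replaced by a mask."""
    tokens = tokenize(sentence)
    grams = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        masked = tuple(
            t if (t in L_TA or t in PUNCTUATION) else mask for t in window
        )
        # Assumption: keep only n-grams that contain exactly one masked token.
        if masked.count(mask) == 1:
            grams.append(masked)
    return grams


print(ta_masked_token_ngrams("However, there is an opposing view to this.", 3))
# [('is', 'an', '#'), ('#', 'to', 'this')]
```
        </preformat>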
      </sec>
      <sec id="sec-3-2">
        <title>Feature Category Ranges</title>
        <p>
          In previous AV works (e.g., [
          <xref ref-type="bibr" rid="ref14 ref24 ref4">24,14,4</xref>
          ]) n-gram-based feature categories have been
treated as a single concept, where the most suitable n was chosen on the basis of a
hyperparameter optimization procedure. In contrast to this, we treat n-gram-based feature
categories independently so that, for example, punctuation 2- and 3-grams represent
two individual feature categories. There is a simple justification for this decision: If we
restricted ourselves to a specific n optimized on a training corpus, we might miss
important features occurring in the unseen data (test corpus) that can only be captured
with an alternative setting of n. Allowing multiple settings of n for the same feature
category can therefore help counteract a mismatch of existing features between training
and test data.
        </p>
        <p>In the following, we explain the considerations behind the ranges of the n-gram-based
feature categories listed in Table 1. For the punctuation n-grams, we set n = 1 as a
lower limit, which is useful in cases where sentences comprise only a single punctuation
mark (e.g., a full stop, question mark or exclamation mark). As an upper limit, we set n = 3, as it
can be expected that longer punctuation sequences shared by the unknown and known
documents will be scarce (more on this in the next subsection). Regarding TA token
n-grams, we set n = 1 and n = 4 as the lower and upper limit, respectively. With the
former, we aim to capture at least single words in the documents. Here, we expect that
in most cases some of these features will be present in both documents. With
regard to longer sequences, we aim to capture specific phrases that can be relevant
for individual authors. However, sequences with more than four tokens are less likely
to appear, especially in short documents, so that n = 4 can be seen as a good
compromise. For the TA masked token n-grams, we set n = 3 as a lower limit: since
one of our intentions is to capture (masked) topic words surrounded by topic-agnostic
words, n = 3 is the minimum. As an upper limit, we set n = 4 for the same
reason mentioned for TA token n-grams.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Scope of Feature Extraction</title>
        <p>In existing AV studies it is often not mentioned which scope is considered when extracting
n-gram-based features. Here, the scope might be the entire text, paragraphs, sentences,
clauses, phrases or tokens. Depending on the considered scope, the dimension of the
generated feature space may vary which, in turn, may affect the verification results.
For example, extracting token n-grams from single sentences results in a smaller
number of features than extracting them from the whole text. This is because
token n-grams extracted from the whole text can cross sentence boundaries, whereas such
cross-sentence features are not taken into account at the sentence level. Despite the smaller
number of available features, we have
decided, with regard to our AV approach, to extract all n-gram-based features exclusively
from the sentence level of the documents. The reason for this is that in practice short
text fragments (e.g., social media posts or email text bodies) are often concatenated to
obtain a sufficient document length, so that one sentence might not always have a
connection to a subsequent sentence. Hence, if we extracted n-gram-based features from the
entire text, we would erroneously create artificial cross-sentence features that may not
occur in texts of a particular author. Note that for feature extraction we convert all text to
lower case in order to merge all possible case variants (for example, "The", "the"
or "THE"), which can occur especially in informal texts.</p>
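        <p>The effect of the extraction scope can be made concrete with a small sketch that compares token 2-grams taken from a whole text with those taken sentence by sentence. The naive sentence splitter and all function names are assumptions made for illustration; the text does not state which sentence segmenter is used.</p>
        <preformat>
```python
import re


def split_sentences(text):
    """Naive sentence splitter (assumption): a sentence ends with '.', '!' or '?'."""
    parts = re.findall(r"[^.!?]+[.!?]*", text)
    return [p.strip() for p in parts if p.strip()]


def tokens_of(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"[a-z']+|[^\sa-z']", text.lower())


def token_ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_level_ngrams(text, n):
    """Extract n-grams from each sentence separately, never across sentences."""
    grams = []
    for sentence in split_sentences(text):
        grams.extend(token_ngrams(tokens_of(sentence), n))
    return grams


text = "Thanks a lot. See you soon!"
whole_text = token_ngrams(tokens_of(text), 2)
per_sentence = sentence_level_ngrams(text, 2)
# The only 2-gram lost at sentence level is the artificial cross-sentence one.
print(sorted(set(whole_text) - set(per_sentence)))  # [('.', 'see')]
```
        </preformat>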
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Verification Method</title>
      <p>
        In this section, we present our AV approach TAVeer<sup>4</sup>, which is inspired by the
methodology of biometric recognition systems. These aim to recognize individuals based on a
variety of physiological characteristics and behavioral features obtained from the hand,
vein, fingerprint, face, eye, ear or voice. Here, the "Equal Error Rate" (EER)
is a statistic used to report biometric performance in the context of a verification
task. Essentially, the EER corresponds to the point on a ROC curve where the false
acceptance rate is equal to the false rejection rate. Given a questioned document D<sub>U</sub> and a
document D<sub>A</sub> from a known author A, the goal of our method is to determine whether
D<sub>U</sub> was also written by A. To achieve this goal, TAVeer employs an ensemble of m
distance-based classifiers, each of which aims to accept or reject the questioned
authorship of D<sub>U</sub>. Each classifier is provided with a category of stylistic features extracted
from an individual linguistic layer (in each document). In this context, the EER serves as
a thresholding mechanism in which erroneous verification predictions in either direction
are treated equally. This is different from other AV methods such as the
approach of Bevendorff et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which heavily prioritizes precision over recall.
      </p>
      <p>
        TAVeer can essentially be divided into two phases: training and inference. In the
training phase, a model M has to be “learned” on the basis of a given training corpus
C = (c<sub>1</sub>, c<sub>2</sub>, …, c<sub>n</sub>). Here, each c denotes a verification case for which the ground truth
(Y or N) is known. In the inference phase, the generated model M is applied to an unseen
verification case in order to accept or reject the questioned authorship. In the following,
we first describe the preliminaries for TAVeer and then the two phases.
      </p>
      <sec id="sec-4-1">
        <title>Preliminaries</title>
        <p>Before describing our approach in detail, we first explain what exactly is considered as
an input, how this input is represented and on which basic functionality the approach depends in
order to measure the (dis)similarity between the documents.</p>
        <p>
          Document Input. TAVeer follows the profile-based paradigm that, to the best of our
knowledge, was first described by Potha and Stamatatos [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] in the context of AV. If a set of reference documents
𝔻<sub>A</sub> = {D<sub>1</sub>, D<sub>2</sub>, …} is provided for a known author A,
the idea behind the profile-based approach is to concatenate all documents in 𝔻<sub>A</sub> into
a single document D<sub>A</sub>. Thus, a verification case c is transformed from (D<sub>U</sub>, 𝔻<sub>A</sub>) to
(D<sub>U</sub>, D<sub>A</sub>), which represents the document input for TAVeer.
        </p>
        <p>Document Representation. As a document representation technique, we consider a
bag-of-features model, in which all involved features are treated independently of
each other. Let 𝔽 = {F<sub>1</sub>, F<sub>2</sub>, …, F<sub>m</sub>} be the m proposed feature categories (cf.
Table 1). We define a function
<inline-formula><tex-math>f : \mathbb{D} \times \mathbb{D} \times \mathbb{F} \to \bigcup_{k \in \mathbb{N}} \mathbb{R}^k \times \mathbb{R}^k</tex-math></inline-formula>,
which transforms D<sub>U</sub> and D<sub>A</sub> according to a given feature category F into two real-valued vectors, where
k denotes the dimension of the feature space spanned by F. Consider for example F<sub>1</sub>
as a feature category, which describes a set of punctuation marks {"-", ";", "?",
...}. Applying f to D<sub>U</sub> and D<sub>A</sub> collects all punctuation marks that exist in at least
one of the documents and adds them to a list V = (v<sub>1</sub>, v<sub>2</sub>, …, v<sub>k</sub>). Then, two vectors
X = (x<sub>1</sub>, x<sub>2</sub>, …, x<sub>k</sub>) and Y = (y<sub>1</sub>, y<sub>2</sub>, …, y<sub>k</sub>) are created, where x<sub>j</sub> and y<sub>j</sub>
represent the absolute frequency of the corresponding punctuation mark v<sub>j</sub> ∈ V in
the respective document. As a final step, we normalize each vector by its
Manhattan norm ‖·‖<sub>1</sub>, so that all contained features are scaled into the (real) interval [0, 1] and
sum up to one. This procedure holds for all m feature categories.
          <fn id="fn4">
            <label>4</label>
            <p>TAVeer stands for "Topic-agnostic Authorship Verifier based on equal error rate".</p>
          </fn>
        </p>
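        <p>The representation step can be sketched in a few lines. The function names represent and l1_normalize, the sorted vocabulary order and the handling of an all-zero vector are assumptions made for illustration; the inputs are the feature occurrences already extracted from each document for one feature category.</p>
        <preformat>
```python
from collections import Counter


def l1_normalize(vector):
    """Scale a vector by its Manhattan norm so that its entries sum to one."""
    total = sum(abs(value) for value in vector)
    # Assumption: an all-zero vector (no features found) is returned unchanged.
    return [value / total for value in vector] if total else list(vector)


def represent(features_u, features_a):
    """Map the features of two documents to normalized frequency vectors
    over a shared vocabulary V."""
    counts_u, counts_a = Counter(features_u), Counter(features_a)
    # V: every feature that occurs in at least one of the two documents.
    vocabulary = sorted(set(counts_u) | set(counts_a))
    x = [counts_u[v] for v in vocabulary]  # absolute frequencies in D_U
    y = [counts_a[v] for v in vocabulary]  # absolute frequencies in D_A
    return vocabulary, l1_normalize(x), l1_normalize(y)


# Punctuation unigrams (F1) found in a questioned and a known document.
vocabulary, x, y = represent(["-", ";", ";", "?"], [";", "?", "?", "?"])
print(vocabulary)  # ['-', ';', '?']
print(x)           # [0.25, 0.5, 0.25]
print(y)           # [0.0, 0.25, 0.75]
```
        </preformat>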
        <p>Distance Function. To measure the (dis)similarity between two generated feature
vectors X and Y, we use a distance function dist(X, Y). For this, we have chosen the
well-known Manhattan metric, defined by:
          <disp-formula id="eq1">
            <label>(1)</label>
            <tex-math>\mathrm{dist}(X, Y) = \lVert X - Y \rVert_1 = \sum_{r=1}^{k} \lvert X_r - Y_r \rvert</tex-math>
          </disp-formula>
which has been used in a number of previous stylometry studies (for example, [
          <xref ref-type="bibr" rid="ref1 ref5">1,5</xref>
          ]).
The Manhattan metric benefits from its simplicity and also from the fact that it allows
easy interpretation<sup>5</sup> of which specific features have contributed to the prediction.
        </p>
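        <p>Equation 1 translates directly into code. The following sketch (function name and example vectors are illustrative assumptions) also prints the per-feature contributions, which hints at why the metric is easy to interpret.</p>
        <preformat>
```python
def manhattan(x, y):
    """Manhattan distance: the sum of absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(a - b) for a, b in zip(x, y))


x = [0.25, 0.5, 0.25]
y = [0.0, 0.25, 0.75]
print(manhattan(x, y))  # 1.0
# Per-feature contributions show which features drive the distance.
print([abs(a - b) for a, b in zip(x, y)])  # [0.25, 0.25, 0.5]
```
        </preformat>
        <p>For two vectors that are each normalized to sum to one and share no feature, for example [1.0, 0.0] and [0.0, 1.0], the distance reaches its maximum of 2.</p>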
      </sec>
      <sec id="sec-4-2">
        <title>Model Learning</title>
        <p>Given the training corpus C and the set of the m feature categories 𝔽 = {F<sub>1</sub>, F<sub>2</sub>, …, F<sub>m</sub>},
the objective of this step is to construct a model M, which represents the optimal
combination of feature categories obtained on C. In the following, we describe the necessary
sub-steps to create M.</p>
        <p>
          Computing Thresholds. In this sub-step, we compute the individual thresholds
<inline-formula><tex-math>\Theta = (\theta_{F_1}, \theta_{F_2}, \ldots, \theta_{F_m})</tex-math></inline-formula> for the m feature categories. Using Equation 1, we calculate
for each verification case <inline-formula><tex-math>c_j = (D_{A,j}, D_{U,j}) \in C</tex-math></inline-formula> and each feature category F<sub>i</sub> the
respective distance <inline-formula><tex-math>d_{i,j} = \mathrm{dist}(f(D_{A,j}, D_{U,j}, F_i))</tex-math></inline-formula>. As a thresholding technique, we
select the equal error rate (EER), which describes the point where the false positive
rate is equal to the false negative rate. Since all corpora used in our experimental setting
are balanced, a threshold that results in an EER can be obtained by calculating
the median of the distances over all cases in the corpus. Consequently, for all m feature
categories, we obtain the corresponding thresholds as follows:
          <disp-formula id="eq2">
            <label>(2)</label>
            <tex-math>\Theta = (\theta_{F_1}, \theta_{F_2}, \ldots, \theta_{F_m}), \quad \text{with } \theta_{F_i} = \mathrm{median}(d_{i,1}, d_{i,2}, \ldots, d_{i,n})</tex-math>
          </disp-formula>
Note that in cases where an exact EER is not feasible (for example, when multiple
distance values are equal), the median provides the closest approximation of the EER.
          <fn id="fn5">
            <label>5</label>
            <p>We refer the interested reader to our extended version of this paper [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], in which we explain
in detail how the interpretation can be performed.</p>
          </fn>
        </p>
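        <p>The distance computation (Equation 1) and the median-based thresholding (Equation 2) can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not the TAVeer implementation: the helper names l1_normalize, manhattan and compute_thresholds are hypothetical, and all counts and distances are toy values.</p>
        <preformat>
```python
from statistics import median


def l1_normalize(counts):
    """Scale non-negative feature counts so that they sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("feature vector has no mass")
    return [c / total for c in counts]


def manhattan(x, y):
    """Equation 1: sum of the absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def compute_thresholds(distances_per_category):
    """Equation 2: one median-based threshold per feature category.

    distances_per_category maps a (hypothetical) category name to the
    distances d_{i,1}, ..., d_{i,n} measured over all n training cases.
    """
    return {name: median(ds) for name, ds in distances_per_category.items()}


# Toy feature counts for one known and one unknown document.
x = l1_normalize([3, 1, 0])
y = l1_normalize([1, 1, 2])
d = manhattan(x, y)

# Toy distances over a tiny training corpus, two feature categories.
thresholds = compute_thresholds({
    "F1": [0.2, 0.4, 0.9, 1.1],
    "F2": [0.5, 0.7, 1.5],
})
print(d, thresholds)
```
        </preformat>
        <p>Because both vectors are normalized to sum to 1, the distance in this sketch can never exceed 2, which is the upper bound used for calibration in the next sub-step.</p>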
        <p>Similarity Function The distance function introduced above (cf. Equation 1) allows us to
compute the distance between two feature vectors $X$ and $Y$. However, the
resulting distances are not calibrated with respect to the individual thresholds from the
previous sub-step. Therefore, we designed a similarity function $\mathrm{sim}(\cdot)$ that takes as
input a distance $d$, a threshold $\theta_F$ and the upper bound $d_{\max}$ of the chosen distance
function (in our case, the Manhattan metric). Recall that in the context of our approach,
all feature vectors are normalized using the Manhattan norm $\lVert \cdot \rVert_1$. Consequently, all
features in each vector sum up to 1. Based on this fact, the lower and upper bounds of
$\mathrm{dist}(X, Y)$ follow from the triangle inequality:
\[ 0 \leq \lVert X - Y \rVert_1 \leq \lVert X \rVert_1 + \lVert Y \rVert_1 = 2 \]
such that $d_{\max} = 2$ holds. An important requirement regarding our similarity function is
that the resulting score $s$ is calibrated in a way that 0.5 represents the decision boundary.
One possible definition for a function $\mathrm{sim}(\cdot)$ that transforms a distance $d$ into the range
$[0, 1]$ and simultaneously calibrates the resulting similarity score $s$ with respect to this
“natural” decision boundary is:
\[ \mathrm{sim}(d, d_{\max}, \theta_F) = \ldots \qquad (3) \]
With this definition, one can easily substitute the Manhattan metric with any other distance function, as long as its
respective upper bound $d_{\max}$ is known. Furthermore, note that any other
definition of $\mathrm{sim}(\cdot)$ that fulfills the same requirement can be used instead.</p>
        <p>
Classification Function The similarity function $\mathrm{sim}(\cdot)$ from the previous sub-step can
already calculate a calibrated similarity value for a given distance $d$ and the threshold of
a single feature category. However, the idea behind TAVeer is to determine whether a
questioned authorship between two documents holds based on multiple feature
categories. Let $\mathbb{F} = \{(F_i, \theta_{F_i}) \mid i \in \{1, 2, \ldots, m\}\}$ denote the set of pairs
of feature categories and their associated thresholds, and $\mathcal{P}(\mathbb{F})$ the power set
(without the empty set) holding all possible combinations of these pairs. We call a single
$\mathcal{E} \in \mathcal{P}(\mathbb{F})$ an ensemble. Furthermore, we call an ensemble comprising a
single pair, $\{(F, \theta_F)\} \subseteq \mathcal{E}$, an atomic ensemble. To compute a similarity value with
respect to $\mathcal{E}$, we define an aggregated similarity function $\mathrm{sim}_{\mathcal{E}}(\cdot)$ as follows:
\[ \mathrm{sim}_{\mathcal{E}}(D_U, D_A, d_{\max}, \mathcal{E}) = \mathrm{median}(S), \quad \text{with } S = \{ \mathrm{sim}(\mathrm{dist}(f(D_U, D_A, F)), d_{\max}, \theta_F) \mid (F, \theta_F) \in \mathcal{E} \} \qquad (4) \]
To obtain a binary prediction (Y/N) for a single verification case $c$ based on $\mathrm{sim}_{\mathcal{E}}(\cdot)$, we
further define a classification function:
\[ \mathrm{clf}(D_U, D_A, d_{\max}, \mathcal{E}) = \begin{cases} \mathrm{Y}, &amp; \text{if } \mathrm{sim}_{\mathcal{E}}(D_U, D_A, d_{\max}, \mathcal{E}) &gt; 0.5 \\ \mathrm{N}, &amp; \text{otherwise} \end{cases} \qquad (5) \]
        </p>
        <sec id="sec-4-2-1">
          <p>Selecting Optimal Ensemble In this last sub-step, the goal is to determine the optimal
ensemble on the basis of the training corpus $\mathcal{C}$, which will serve as the model $\mathcal{M}$ for
the inference stage. To achieve this goal, we use Equation 5 to classify all verification
cases $c_1, c_2, \ldots, c_n$ in $\mathcal{C}$ for each possible ensemble $\mathcal{E} \in \mathcal{P}(\mathbb{F})$. As a result, we obtain
$|\mathcal{P}(\mathbb{F})|$ predictions for each $c_i$. Based on the predictions and the ground truth provided
for $\mathcal{C}$, we can now calculate the accuracy of each ensemble to find the optimal one
that will represent $\mathcal{M}$. One way to obtain an optimal ensemble would be to select the
one that leads to the maximum accuracy on $\mathcal{C}$. In practice, however, this approach is
not always reasonable, as several ensembles can share the maximum accuracy. For this
reason, we decided to consider additional criteria to obtain an optimal ensemble. Based
on the power set $\mathcal{P}(\mathbb{F})$, we sort all resulting ensembles successively according to
the following three criteria (each in descending order):
1. Accuracy of an ensemble $\mathcal{E}$ (calculated for $\mathcal{C}$)
2. Number of feature categories an ensemble $\mathcal{E}$ contains
3. Median accuracy regarding all atomic ensembles in $\mathcal{E}$ (calculated for $\mathcal{C}$)
With these criteria, it is unlikely that multiple ensembles share the same rank.
Finally, we select the first ensemble from the sorted list, which will serve as the
final model $\mathcal{M}$.</p>
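          <p>The sub-steps above can be sketched as follows. The sketch rests on several assumptions of our own: distances per feature category are taken as precomputed toy values, the function names are hypothetical, and the piecewise-linear sim shown here is only one possible calibration that maps 0 to 1, the threshold to 0.5 and the upper bound to 0. It is not claimed to be the definition given in Equation 3.</p>
          <preformat>
```python
from itertools import combinations
from statistics import median

D_MAX = 2.0  # upper bound of the Manhattan distance for L1-normalized vectors


def sim(d, d_max, theta):
    """Assumed calibration: d = 0 -> 1, d = theta -> 0.5, d = d_max -> 0."""
    if not (theta > 0 and d_max > theta):
        raise ValueError("sketch requires a threshold strictly between 0 and d_max")
    if d > theta:
        return 0.5 * (d_max - d) / (d_max - theta)
    return 1.0 - 0.5 * d / theta


def sim_ensemble(case_distances, ensemble, d_max=D_MAX):
    """Equation 4: median of the per-category similarities."""
    return median(sim(case_distances[name], d_max, theta) for name, theta in ensemble)


def clf(case_distances, ensemble, d_max=D_MAX):
    """Equation 5: 'Y' if the aggregated similarity exceeds 0.5, else 'N'."""
    return "Y" if sim_ensemble(case_distances, ensemble, d_max) > 0.5 else "N"


def accuracy(cases, labels, ensemble):
    hits = sum(clf(case, ensemble) == label for case, label in zip(cases, labels))
    return hits / len(labels)


def select_ensemble(cases, labels, thresholds):
    """Rank all non-empty ensembles by the three criteria, each descending."""
    pairs = sorted(thresholds.items())
    atomic_acc = {name: accuracy(cases, labels, [(name, theta)]) for name, theta in pairs}
    ranked = []
    for size in range(1, len(pairs) + 1):
        for ensemble in combinations(pairs, size):
            key = (
                accuracy(cases, labels, ensemble),                 # criterion 1
                len(ensemble),                                     # criterion 2
                median(atomic_acc[name] for name, _ in ensemble),  # criterion 3
            )
            ranked.append((key, ensemble))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked[0][1]


# Toy training corpus: per-case distances for two feature categories.
cases = [
    {"F1": 0.2, "F2": 0.4},
    {"F1": 0.3, "F2": 1.2},
    {"F1": 0.9, "F2": 1.0},
    {"F1": 1.1, "F2": 0.6},
]
labels = ["Y", "Y", "N", "N"]
thresholds = {"F1": 0.6, "F2": 0.8}  # medians of the toy distances above

best = select_ensemble(cases, labels, thresholds)
print([name for name, _ in best], accuracy(cases, labels, best))
```
          </preformat>
          <p>In this toy corpus, the atomic ensemble for F1 and the two-category ensemble reach the same accuracy, so the second criterion (number of feature categories) decides in favor of the larger ensemble.</p>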
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Inference</title>
        <p>In contrast to the training phase, the inference phase is much more compact. Here,
TAVeer takes the model $\mathcal{M}$ resulting from the training phase and performs the
following steps to classify an unseen verification case $c_? = (D_U, D_A)$. Using Equation 4,
TAVeer first computes the similarity value $s_?$ between the unknown and known
documents $D_U$ and $D_A$. Afterwards, a binary prediction regarding the questioned authorship
of $D_U$ is obtained by comparing $s_?$ against the decision boundary 0.5 (cf. Equation 5).
If $s_? &gt; 0.5$ holds, $c_?$ is classified as Y ($D_U$ and $D_A$ are assumed to be written
by the same author), otherwise as N (both documents are probably written by different
authors).
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Experimental Evaluation</title>
      <p>This section gives a brief description of our experimental evaluation. At the time we
developed TAVeer, we had no access to the official test corpus and the respective ground
truth of the underlying verification cases. To train and evaluate our approach, we
therefore split the official training<sup>6</sup> data set provided by the PAN organizers into a
training and a validation corpus. In the following, we first explain how the initial training
data set was partitioned, summarize its key statistics and mention several relevant
observations we made regarding the verification cases in the corpus. Afterwards,
we describe which alternative performance measure we chose to evaluate TAVeer
on the validation corpus. Finally, we present the results on this corpus as well as on the
official test corpus.
      </p>
      <sec id="sec-5-1">
        <title>5.1 Corpora</title>
        <p>From the official training data set, we reserved 5,000 cases to train<sup>7</sup>
and 47,590 cases to evaluate TAVeer. Table 3 summarizes the statistics of both
partitions. During our examination of the documents within the corpus, we made some
observations worth mentioning. In a number of verification cases, the known and unknown
documents are written in different languages. For example, within the verification cases
2225c14b-e691-5c6b-833f-0eea70a8be9c
a5bf996f-0fd1-57c0-9953-5c99155e4a47
831efc2b-edab-56a6-8a38-a8b18273363f
one document is written in English while the other is written in Spanish, Swedish and
French, respectively. In one further case, the unknown and known documents are identical, and in another case,
one document contains valid natural-language text, while the other consists
almost entirely of repetitions of the same word. Besides these manually inspected verification
cases, we also performed an automated analysis of all documents
contained in the training and validation corpora. Here, we noticed that a large fraction of
the documents contain an excessive number of quotes. While trying to remove these
quotes, we found that they made up about half of the texts and also that apostrophes
and quotation marks had been normalized to the same character ("), which further
complicated removing the quotes. In view of these observations, we left the documents
in their original form, so that no cleaning was carried out at all (this also applies to
the test corpus).</p>
        <p><sup>6</sup> Note that we used the "small" version of the official training corpus.</p>
        <p><sup>7</sup> Note that the submitted version of our approach was only trained on this partition. In other
words, we have not retrained TAVeer on the entire training data set.</p>
      </sec>
    </sec>
    <sec id="sec-6">
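      <p>[Table 3. Statistics of the two partitions $\mathcal{C}_{\mathrm{PAN}}$ (train) and $\mathcal{C}_{\mathrm{PAN}}$ (validation). Columns: $|\mathcal{C}|$, distribution (Y/N), $|D_A|$, avg $|D_A|$, avg $|D_U|$.]</p>
      <p>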
To assess the performance of our approach on the validation corpus, we selected
balanced accuracy (BAC) as an alternative performance measure. Despite its robustness
and suitability especially for imbalanced corpora, BAC has not yet been considered in
the field of AV, to the best of our knowledge. In contrast to $F_1$ and the newly proposed
measure $F_{0.5u}$ [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], BAC considers all four confusion matrix outcomes: true positives
(TP), false negatives (FN), false positives (FP) and true negatives (TN). This is
preferable<sup>8</sup> in realistic forensic cases where two opposing goals are faced:
1. verify that an alleged authorship is indeed correct, or
2. falsify an alleged authorship correctly,
so that both TP and TN can be measured reliably at the same time.
      </p>
      <sec id="sec-6-1">
        <p>
          BAC is defined as the arithmetic mean of sensitivity (true positive rate, TPR) and
specificity (true negative rate, TNR):
\[ \mathrm{BAC} = \frac{1}{2} \left( \mathrm{TPR} + \mathrm{TNR} \right) = \frac{1}{2} \left( \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} + \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \right) \]
When dealing with imbalanced corpora, we have observed that BAC is easier to
interpret than other recommended measures such as Cohen’s $\kappa$ [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. For balanced corpora,
on the other hand, BAC offers another benefit that makes it a reliable performance measure:
it is equal to ordinary accuracy. If we consider an AV method that (due to a weak
calibration) predicts nothing but Y (same-author) or nothing but N (different-author), the resulting
BAC value will always be 0.5. $F_1$, in contrast, behaves asymmetrically regarding
one-sided Y- and N-predictions: on a balanced corpus, the resulting score is either $2/3 \approx 0.66$ or 0, respectively.
        </p>
        <p><sup>8</sup> This is at least true for the real-world forensic cases we have worked on in our research
department at Fraunhofer SIT.
        </p>
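        <p>The behavior described above can be checked with a short Python sketch. The function names are our own, hypothetical choices; the first set of counts is the row for the Reddit test partition and the Reddit model in Table 6, and the one-sided predictors are evaluated on an assumed balanced corpus of 600 same-author and 600 different-author cases.</p>
        <preformat>
```python
def balanced_accuracy(tp, fn, fp, tn):
    """Arithmetic mean of sensitivity (TPR) and specificity (TNR)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return (tpr + tnr) / 2


def f1_score(tp, fn, fp, tn):
    """Harmonic mean of precision and recall; tn is accepted but ignored."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Confusion matrix outcomes (TP, FN, FP, TN) taken from Table 6.
reddit_row = (455, 145, 88, 512)
bac_row = balanced_accuracy(*reddit_row)
f1_row = f1_score(*reddit_row)

# One-sided predictors on an assumed balanced corpus (600 Y, 600 N).
all_yes = (600, 0, 600, 0)  # every case predicted as Y
all_no = (0, 600, 0, 600)   # every case predicted as N

print(round(bac_row, 3), round(f1_row, 3))
print(balanced_accuracy(*all_yes), f1_score(*all_yes))
print(balanced_accuracy(*all_no), f1_score(*all_no))
```
        </preformat>
        <p>Since $F_1$ never uses TN, it rewards the all-Y predictor and penalizes the all-N predictor, whereas BAC treats both one-sided predictors identically.</p>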
        <p>In addition to BAC, we also report the confusion matrix outcomes to allow a better
comparability of the results obtained by TAVeer, as well as the four measures (AUC,
c@1, $F_{0.5u}$ and $F_1$) considered by the PAN organizers for the official evaluation
results.</p>
        <p>After training TAVeer on the partition with the 5,000 verification cases, we applied the
trained model to the imbalanced validation corpus comprising 47,590 cases. The results
for the validation corpus are shown in Table 4, while the official results<sup>9</sup> with regard to
the test corpus are listed in Table 5. A comparison of the results on the validation and
test corpora shows that TAVeer generalizes well, with only minimal losses
on the test corpus. However, since the PAN organizers did not report the
four confusion matrix outcomes for the test corpus, we cannot infer from the single-number
metrics more fine-grained information regarding the individual classification
predictions of TAVeer or the other submitted AV approaches. Furthermore, we cannot
provide any analysis regarding the test corpus, since at the time this paper was written
we had no access to it.</p>
        <p>
          Since we cannot make any statement regarding the test corpus, we decided to use
two self-compiled corpora $\mathcal{C}_{\mathrm{Reddit}}$ and $\mathcal{C}_{\mathrm{Amazon}}$ in order to get a better understanding
of the cross-domain capability of TAVeer. $\mathcal{C}_{\mathrm{Reddit}}$ contains (partially very
colloquial) comments from the well-known Reddit platform, while $\mathcal{C}_{\mathrm{Amazon}}$ contains product
reviews from the Amazon platform. Both corpora, which differ in topic and genre, are
described in detail in our paper [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] along with their respective corpus statistics. In what
follows, we therefore focus only on the cross-domain experiment. The question we are
seeking to answer is to what extent a model $\mathcal{M}_X$ learned on a training corpus from a
domain $X$ can be applied to a test corpus from a domain $Y$ (with $X \neq Y$) and vice
versa.
        </p>
        <p>Using the procedure described in Section 4.2, we first learn the two models $\mathcal{M}_{\mathrm{Reddit}}$
and $\mathcal{M}_{\mathrm{Amazon}}$ (cf. Table 7) on the training partitions of the corpora $\mathcal{C}_{\mathrm{Reddit}}$ and $\mathcal{C}_{\mathrm{Amazon}}$.
Based on $\mathcal{M}_{\mathrm{Reddit}}$, we then apply TAVeer to the test partition of $\mathcal{C}_{\mathrm{Amazon}}$. Afterwards,
we apply $\mathcal{M}_{\mathrm{Amazon}}$ to the test partition of $\mathcal{C}_{\mathrm{Reddit}}$. The results are shown in Table 6. If
we focus on the performance deviations between the models applied to the original and
cross-domain corpora, we can see in this table a slight loss of -0.005 in terms of BAC
and a small gain of +0.002 in terms of AUC for the $\mathcal{C}_{\mathrm{Reddit}}$ test partition. Similarly,
for the test partition of $\mathcal{C}_{\mathrm{Amazon}}$, we can observe a slightly greater loss of -0.032 and
-0.012 in terms of BAC and AUC, respectively.</p>
        <p><sup>9</sup> The results have been taken from the PAN website https://pan.webis.de/clef20/pan20-web/author-identification.html</p>
        <p>The small deviations can be explained by the fact that the majority of
the feature categories (more precisely, $F_1$, $F_2$, $F_4$, $F_6$ and $F_{11}$) are present in both
models $\mathcal{M}_{\mathrm{Reddit}}$ and $\mathcal{M}_{\mathrm{Amazon}}$, as can be seen in Table 7. Furthermore, their respective
thresholds are very similar to each other. Consequently, both models are
interchangeable without major performance losses, so that (at least on these corpora) TAVeer can
be considered robust with respect to the different domains.</p>
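        <p>The overlap between the two models can be verified directly from the thresholds listed in Table 7. The following short Python sketch uses hypothetical variable names of our own; the numbers are copied from that table.</p>
        <preformat>
```python
# Thresholds per feature category, copied from Table 7.
m_reddit = {"F1": 0.343, "F2": 0.757, "F4": 1.181, "F6": 0.641,
            "F8": 1.956, "F9": 1.996, "F10": 1.671, "F11": 1.869}
m_amazon = {"F1": 0.349, "F2": 0.801, "F4": 1.108, "F6": 0.680,
            "F7": 1.622, "F11": 1.862}

# Feature categories present in both models, ordered by category index.
shared = sorted(set(m_reddit).intersection(m_amazon), key=lambda name: int(name[1:]))

# Absolute threshold difference for each shared category.
diffs = {name: abs(m_reddit[name] - m_amazon[name]) for name in shared}

print(shared)
print({name: round(value, 3) for name, value in diffs.items()})
```
        </preformat>
        <p>Five categories are shared, and the largest threshold difference among them (for $F_4$) is 0.073 on a distance scale that ranges from 0 to 2.</p>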
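        <sec id="sec-6-1-1">
          <table-wrap>
            <label>Table 6</label>
            <caption>
              <p>Cross-domain evaluation results for our two self-compiled corpora $\mathcal{C}_{\mathrm{Reddit}}$ and $\mathcal{C}_{\mathrm{Amazon}}$. Note that since both corpora are balanced, the BAC and c@1 values are equal.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Corpus</th><th>Model</th><th>BAC</th><th>AUC</th><th>c@1</th><th>$F_{0.5u}$</th><th>$F_1$</th><th>TP</th><th>FN</th><th>FP</th><th>TN</th></tr>
              </thead>
              <tbody>
                <tr><td>$\mathcal{C}_{\mathrm{Reddit}}$ (test)</td><td>$\mathcal{M}_{\mathrm{Reddit}}$</td><td>0.806</td><td>0.861</td><td>0.806</td><td>0.821</td><td>0.796</td><td>455</td><td>145</td><td>88</td><td>512</td></tr>
                <tr><td>$\mathcal{C}_{\mathrm{Reddit}}$ (test)</td><td>$\mathcal{M}_{\mathrm{Amazon}}$</td><td>0.801</td><td>0.863</td><td>0.801</td><td>0.810</td><td>0.794</td><td>462</td><td>138</td><td>101</td><td>499</td></tr>
                <tr><td>$\mathcal{C}_{\mathrm{Amazon}}$ (test)</td><td>$\mathcal{M}_{\mathrm{Amazon}}$</td><td>0.842</td><td>0.912</td><td>0.842</td><td>0.851</td><td>0.838</td><td>982</td><td>218</td><td>161</td><td>1039</td></tr>
                <tr><td>$\mathcal{C}_{\mathrm{Amazon}}$ (test)</td><td>$\mathcal{M}_{\mathrm{Reddit}}$</td><td>0.810</td><td>0.900</td><td>0.810</td><td>0.813</td><td>0.808</td><td>959</td><td>241</td><td>216</td><td>984</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6 Conclusion and Future Work</title>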
      <table-wrap>
        <label>Table 7</label>
        <caption>
          <p>Model analysis: Each row (starting with column two) represents a model $\mathcal{M}$ learned on the respective training partition. Recall that $F_i$ represents a feature category and $\theta_{F_i}$ its corresponding threshold.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Corpus</th><th>($F_1$, $\theta_{F_1}$)</th><th>($F_2$, $\theta_{F_2}$)</th><th>($F_4$, $\theta_{F_4}$)</th><th>($F_6$, $\theta_{F_6}$)</th><th>($F_7$, $\theta_{F_7}$)</th><th>($F_8$, $\theta_{F_8}$)</th><th>($F_9$, $\theta_{F_9}$)</th><th>($F_{10}$, $\theta_{F_{10}}$)</th><th>($F_{11}$, $\theta_{F_{11}}$)</th></tr>
          </thead>
          <tbody>
            <tr><td>$\mathcal{C}_{\mathrm{Reddit}}$</td><td>($F_1$, 0.343)</td><td>($F_2$, 0.757)</td><td>($F_4$, 1.181)</td><td>($F_6$, 0.641)</td><td></td><td>($F_8$, 1.956)</td><td>($F_9$, 1.996)</td><td>($F_{10}$, 1.671)</td><td>($F_{11}$, 1.869)</td></tr>
            <tr><td>$\mathcal{C}_{\mathrm{Amazon}}$</td><td>($F_1$, 0.349)</td><td>($F_2$, 0.801)</td><td>($F_4$, 1.108)</td><td>($F_6$, 0.680)</td><td>($F_7$, 1.622)</td><td></td><td></td><td></td><td>($F_{11}$, 1.862)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>We have presented a simple but effective distance-based authorship verification (AV)
approach called TAVeer for the AV shared task of the PAN 2020 competition, where
the task was to determine for a pair of documents whether both texts were written by the
same author. Our approach relies solely on topic-agnostic
feature categories based on punctuation marks, function words, contractions, transitional
phrases as well as several subclasses of verbs and adverbs. In this respect, the method differs
from many existing approaches that rely on implicitly defined feature categories such
as character n-grams. Using such feature categories, in particular in the context of AV,
is problematic, as one has no control over the features that are actually captured. In the
worst case, the prediction of an AV method may be based on topic-related words rather
than on stylistic features, so that the method will miss its true purpose. The core of
TAVeer is a distance function (Manhattan metric), which in combination with a
thresholding procedure (based on equal error rate) acts as the underlying classifier.
      </p>
      <p>To assess our approach, we split the training data set into a training and a validation
set, where for the former only 5,000 verification cases were used (in other words, less
than 10% of the entire data set). This model was submitted for the final evaluation on the
official test set. From the official evaluation results and those obtained on our validation
corpus, it can be concluded that TAVeer is able to generalize well across both corpora,
with minimal losses on the test corpus. Besides the official training and test corpora, we
have further performed a cross-domain experiment on two self-compiled
corpora. In this context, we have demonstrated that TAVeer performs robustly even though
the trained models and the test corpora come from two different domains.
      </p>
      <p>Nevertheless, our AV method leaves room for further improvements. Currently, TAVeer
does not take into account misspelled words, which can lead to a loss of potentially
relevant features, especially in connection with informal texts. We therefore leave for future
work the investigation of effective possibilities to semantically match misspelled words
with respect to their common entity. One idea, for example, is to use back-translation
services that can handle difficult spelling mistakes, which cannot be corrected by
standard spell checkers. Another direction for future work is to investigate alternative
feature categories not yet considered in this paper. In this context, one idea is to
experiment with interjections (e.g., "lol" or "aha") or topic-agnostic abbreviations
(for example, "e.g." or "etc."), which represent important idiosyncratic stylistic
markers.
      </p>
    </sec>
    <sec id="sec-8">
      <title>7 Acknowledgments</title>
      <p>This research work has been funded by the German Federal Ministry of Education and
Research and the Hessen State Ministry for Higher Education, Research and the Arts
within their joint support of the National Research Center for Applied Cybersecurity
ATHENE.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ahmed</surname>
          </string-name>
          , H.:
          <article-title>The Role of Linguistic Feature Categories in Authorship Verification</article-title>
          .
          <source>Procedia Computer Science</source>
          <volume>142</volume>
          ,
          <fpage>214</fpage>
          -
          <lpage>221</lpage>
          (
          <year>2018</year>
          ), Arabic Computational Linguistics
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bevendorff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Generalizing Unmasking for Short Texts</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <fpage>654</fpage>
          -
          <lpage>659</lpage>
          . Association for Computational Linguistics, Minneapolis, Minnesota (Jun
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Binongo</surname>
            ,
            <given-names>J.N.G.</given-names>
          </string-name>
          :
          <article-title>Who Wrote the 15th Book of Oz? An Application of Multivariate Analysis to Authorship Attribution</article-title>
          .
          <source>CHANCE</source>
          <volume>16</volume>
          (
          <issue>2</issue>
          ),
          <fpage>9</fpage>
          -
          <lpage>17</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Brocardo</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Traore</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woungang</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Authorship Verification for Short Messages Using Stylometry</article-title>
          . In: 2013 International Conference on Computer,
          <source>Information and Telecommunication Systems (CITS)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          (May
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Burrows</surname>
          </string-name>
          , J.:
          <article-title>Delta: a Measure of Stylistic Difference and a Guide to Likely Authorship</article-title>
          .
          <source>Literary and Linguistic Computing</source>
          <volume>17</volume>
          (
          <issue>3</issue>
          ),
          <fpage>267</fpage>
          -
          <lpage>287</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Castro Castro</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adame Arcia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelaez Brioso</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muñoz Guillena</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Authorship Verification, Average Similarity Analysis</article-title>
          .
          <source>In: Proceedings of the International Conference Recent Advances in Natural Language Processing</source>
          . pp.
          <fpage>84</fpage>
          -
          <lpage>90</lpage>
          . INCOMA Ltd. Shoumen, Bulgaria
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chandrasekaran</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manimannan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Use of Generalized Regression Neural Network in Authorship Attribution</article-title>
          .
          <source>International Journal of Computer Applications</source>
          <volume>62</volume>
          (
          <issue>4</issue>
          ),
          <fpage>7</fpage>
          -
          <lpage>10</lpage>
          (
          <year>January 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>De Smedt</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Pattern for Python</article-title>
          .
          <source>J. Mach. Learn. Res</source>
          .
          <volume>13</volume>
          (
          <issue>1</issue>
          ),
          <fpage>2063</fpage>
          -
          <lpage>2067</lpage>
          (
          <year>Jun 2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gamon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Linguistic Correlates of Style: Authorship Classification with Deep Linguistic Analysis Features</article-title>
          .
          <source>In: Proceedings of Coling</source>
          <year>2004</year>
          . pp.
          <fpage>611</fpage>
          -
          <lpage>617</lpage>
          . International Conference on Computational Linguistics (
          <year>August 2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Halvani</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graner</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Rethinking the Evaluation Methodology of Authorship Verification Methods</article-title>
          . In: Bellot, P., Trabelsi, C., Mothe, J., Murtagh, F., Nie, J.Y., Soulier, L., SanJuan, E., Cappellato, L., Ferro, N. (eds.)
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          . pp.
          <fpage>40</fpage>
          -
          <lpage>51</lpage>
          . Springer International Publishing (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Halvani</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graner</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Regev</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A Step Towards Interpretable Authorship Verification</article-title>
          .
          <source>CoRR</source>
          abs/2006.12418 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Halvani</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graner</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Regev</surname>
          </string-name>
          , R.:
          <article-title>TAVeer: An Interpretable Topic-Agnostic Authorship Verification Method</article-title>
          . In: Volkamer,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Wressnegger</surname>
          </string-name>
          , C. (eds.)
          <source>ARES 2020: The 15th International Conference on Availability, Reliability and Security</source>
          , Virtual Event, Ireland, August 25-28,
          <year>2020</year>
          . pp.
          <volume>41</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>41</lpage>
          :
          <fpage>10</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Halvani</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winter</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graner</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Assessing the Applicability of Authorship Verification Methods</article-title>
          .
          <source>In: Proceedings of the 14th International Conference on Availability, Reliability and Security, ARES 2019</source>
          , Canterbury, UK, August 26-29,
          <year>2019</year>
          . pp.
          <volume>38</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          :
          <fpage>10</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Jankowska</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Milios</surname>
            ,
            <given-names>E.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keselj</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Author Verification Using Common N-Gram Profiles of Text Documents</article-title>
          . In:
          <string-name>
            <surname>Hajic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsujii</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (eds.)
          <source>COLING 2014, 25th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers</source>
          , August 23-29,
          <year>2014</year>
          , Dublin, Ireland. pp.
          <fpage>387</fpage>
          -
          <lpage>397</lpage>
          . ACL (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Juola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noecker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stolerman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ryan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brennan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greenstadt</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Towards Active Linguistic Authentication</article-title>
          . In:
          <string-name>
            <surname>Peterson</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shenoi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Digital Forensics IX</source>
          . pp.
          <fpage>385</fpage>
          -
          <lpage>398</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Juola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Overview of the Author Identification Task at PAN 2013</article-title>
          . In:
          <source>Working Notes for CLEF 2013 Conference</source>
          , Valencia, Spain, September 23-26,
          <year>2013</year>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Khonji</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iraqi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A Slightly-Modified GI-Based Author-Verifier with Lots of Features (ASGALF)</article-title>
          . In:
          <source>Working Notes for CLEF 2014 Conference</source>
          , Sheffield, UK, September 15-18
          ,
          <year>2014</year>
          . pp.
          <fpage>977</fpage>
          -
          <lpage>983</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kocher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Unine at CLEF 2015 author identification: Notebook for PAN at CLEF 2015</article-title>
          . In: CLEF (Working Notes).
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1391</volume>
          . CEUR-WS.org
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Kocher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A Simple and Efficient Algorithm for Authorship Verification</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>68</volume>
          (
          <issue>1</issue>
          ),
          <fpage>259</fpage>
          -
          <lpage>269</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Argamon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Computational Methods in Authorship Attribution</article-title>
          .
          <source>JASIST</source>
          <volume>60</volume>
          (
          <issue>1</issue>
          ),
          <fpage>9</fpage>
          -
          <lpage>26</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winter</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Determining if Two Documents are Written by the Same Author</article-title>
          .
          <source>JASIST</source>
          <volume>65</volume>
          (
          <issue>1</issue>
          ),
          <fpage>178</fpage>
          -
          <lpage>187</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Neal</surname>
            ,
            <given-names>T.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sundararajan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woodard</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Exploiting Linguistic Style as a Cognitive Biometric for Continuous Verification</article-title>
          . In:
          <source>2018 International Conference on Biometrics, ICB 2018</source>
          , Gold Coast, Australia, February 20-23,
          <year>2018</year>
          . pp.
          <fpage>270</fpage>
          -
          <lpage>276</lpage>
          . IEEE (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Pavelec</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Justino</surname>
            ,
            <given-names>E.J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batista</surname>
            ,
            <given-names>L.V.</given-names>
          </string-name>
          :
          <article-title>Using conjunctions and adverbs for author verification</article-title>
          .
          <source>J. UCS</source>
          <volume>14</volume>
          (
          <issue>18</issue>
          ),
          <fpage>2967</fpage>
          -
          <lpage>2981</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Potha</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A Profile-Based Method for Authorship Verification</article-title>
          . In:
          <source>Artificial Intelligence: Methods and Applications: 8th Hellenic Conference on AI, SETN 2014</source>
          , Ioannina, Greece, May 15-17,
          <year>2014</year>
          . Proceedings. pp.
          <fpage>313</fpage>
          -
          <lpage>326</lpage>
          . Springer International Publishing (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Potha</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>An Improved Impostors Method for Authorship Verification</article-title>
          . In:
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawless</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (eds.)
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction - 8th International Conference of the CLEF Association, CLEF 2017</source>
          , Dublin, Ireland, September 11-14,
          <year>2017</year>
          , Proceedings. Lecture Notes in Computer Science, vol.
          <volume>10456</volume>
          , pp.
          <fpage>138</fpage>
          -
          <lpage>144</lpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Potha</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Improved Algorithms for Extrinsic Author Verification</article-title>
          .
          <source>Knowledge and Information Systems</source>
          (Oct
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>O.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raju</surname>
            ,
            <given-names>N.V.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          :
          <article-title>Authorship attribution on imbalanced English editorial corpora</article-title>
          .
          <source>International Journal of Computer Applications</source>
          <volume>169</volume>
          (
          <issue>1</issue>
          ),
          <fpage>44</fpage>
          -
          <lpage>47</lpage>
          (Jul
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Seidman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Authorship Verification Using the Impostors Method Notebook for PAN at CLEF 2013</article-title>
          . In:
          <source>Working Notes for CLEF 2013 Conference</source>
          , Valencia, Spain, September 23-26,
          <year>2013</year>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Juola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sánchez-Pérez</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrón-Cedeño</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Overview of the Author Identification Task at PAN 2014</article-title>
          . In:
          <source>Working Notes for CLEF 2014 Conference</source>
          , Sheffield, UK, September 15-18
          ,
          <year>2014</year>
          . pp.
          <fpage>877</fpage>
          -
          <lpage>897</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Stolerman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Authorship Verification</article-title>
          .
          <source>Ph.D. thesis</source>
          (
          <year>2015</year>
          ), UMI Dissertations Publishing 2015
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Varela</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Justino</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares de Oliveira</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Verbs and Pronouns for Authorship Attribution</article-title>
          (01
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Veenman</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Authorship Verification with Compression Features</article-title>
          . In:
          <source>Working Notes for CLEF 2013 Conference</source>
          , Valencia, Spain, September 23-26
          ,
          <year>2013</year>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zobel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Effective and Scalable Authorship Attribution Using Function Words</article-title>
          . In:
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamada</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Myaeng</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (eds.)
          <source>Information Retrieval Technology. Lecture Notes in Computer Science</source>
          , vol.
          <volume>3689</volume>
          , pp.
          <fpage>174</fpage>
          -
          <lpage>189</lpage>
          . Springer Berlin Heidelberg (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zobel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vines</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Using relative entropy for authorship attribution</article-title>
          . In:
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>H.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leong</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>M.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (eds.)
          <source>Information Retrieval Technology</source>
          . pp.
          <fpage>92</fpage>
          -
          <lpage>105</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>