<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Will Longformers PAN Out for Authorship Verification?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juanita Ordoñez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Rivera Soto</string-name>
          <email>riverasoto1@llnl.gov</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barry Y. Chen</string-name>
          <email>chen52@llnl.gov</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lawrence Livermore National Laboratory</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>Authorship verification, the task of identifying if two text excerpts are from the same author, is an important part of evaluating the veracity and authenticity of writings and is one of the challenges for this year's PAN @ CLEF 2020 event. In this paper, we describe our PAN authorship verification submission system, a neural network that learns useful features for authorship verification from fanfiction texts and their corresponding fandoms. Our system uses the Longformer, a variant of state-of-the-art transformer models, that is pre-trained on large amounts of text. This model combines global self-attention and local self-attention to enable efficient processing of long text inputs (like the fanfiction data used for PAN @ CLEF 2020), and we augment the pre-trained Longformer model with additional fully-connected layers and fine-tune it to learn features that are useful for author verification. Finally, our model incorporates fandom information via the use of a multi-task loss function that optimizes for both authorship verification and topic correspondence, allowing it to learn useful fandom features for author verification indirectly. On a held-out subset of the PAN-provided “large training” set, our Longformer-based system attained a 0.963 overall verification score, outperforming the PAN text compression baseline by 32.8% relative. However, on the official PAN test set, our system attained a 0.685 overall score, underperforming the PAN text compression baseline by 7.6% relative.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        As more of us rely on online sources for our news and information, it becomes
increasingly important to vet their veracity and authenticity. One key component of this
vetting process is identifying the authorship of the information. Knowing the author of
the information can help us better ascertain its trustworthiness which is critical in light
of the growing amount of online misinformation propagated by so-called trolls, bots,
and other online agitators. There has been a lot of prior work on computational and
statistical methods for determining the authorship of text writings based on writing style:
word choice, punctuation usage, idiosyncratic grammatical errors, and in more recent
digital texts the use of emoticons. One vibrant community in which these
computational approaches to authorship identification are developed and evaluated is PAN [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        PAN hosts scientific evaluations for digital text forensics and stylometry, and one of
this year’s tasks in PAN @ CLEF 2020 is that of authorship verification [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], where the
goal is to determine whether two separate text excerpts come from the same author or
not. This notebook paper describes our team’s final submission system and some of the
experiments and alternative systems that we developed for the authorship verification
challenge.
      </p>
      <p>
        PAN @ CLEF 2020 builds off of earlier evaluations in Authorship Identification
tasks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] that used fanfiction [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] as the source material for the challenge. Milli et
al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] describes fanfiction as “fan-created fiction based on a previously existing,
original work of literature.” Fanfiction is derived from the original work but extends or
changes certain aspects of the fiction like providing more development of minor
characters, adding additional characters, modifying the relationships between characters,
altering endings, etc. While fans may strive to write in the style of the original work’s
author, it is interesting to see if subtle writing style differences still make it possible
to distinguish between the original and fan authors as well as between different fan
authors. Our goal is to develop a system that can take two excerpts of fanfiction and
determine whether they come from the same fan or not. In addition to the text in these
excerpts, we are also given the fandom from which these excerpts derive.
      </p>
      <p>
        Our approach is to use neural networks to learn text and fandom embeddings that
are useful for authorship verification. Our final submission system is built using a
variant of state-of-the-art transformer models [26] that have recently been setting the pace
on a wide variety of natural language processing tasks such as translation,
question answering, cloze tasks, and language modeling [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We use the Longformer model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
pre-trained on about 6.5 billion tokens of text from multiple online sources including
English Wikipedia, real news outlets, book versions of movies, and a subset of the
CommonCrawl dataset. The Longformer is a computationally efficient transformer model
for long text excerpts like the ones found in fanfiction. Our system augments the
Longformer with additional fully-connected layers for the authorship verification task as well
as a complementary fandom correspondence task.
      </p>
      <p>
        The key contributions that we will cover in our PAN Notebook paper are the
following:
1. We train our neural network models to incorporate auxiliary fandom information
using a multi-task loss function that combines both authorship and fandom
correspondence classification losses.
2. We investigate the effectiveness of large-scale text pre-training for authorship
verification by building a system using the state-of-the-art Longformer model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a
transformer-based model [26] that efficiently models long text excerpts such as the
fanfiction data in the PAN 2020 Author Verification Challenge.
3. We compare our word-based Longformer system with two PAN-provided baselines
and a character-based convolutional neural network with self-attention, (CN)2.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>Authorship verification and authorship attribution are related subfields within the larger
field of Forensic Authorship Analysis. Authorship verification seeks to determine if
two different writings are from the same author, while authorship attribution’s goal is
to identify who wrote a given writing. Both of these fields rely on the extraction of
useful text features for discriminating between different authors. Traditionally, researchers
have relied on linguistic style or stylometric features, such as the counts and frequencies
of function words, average sentence length, parts of speech, characters, punctuation,
whitespace usage, and other low-level features [25].</p>
      <p>
        More recently, with the advent of many successful end-to-end deep learning systems
in computer vision, speech recognition, and natural language processing, researchers
have begun exploring the idea of learning what features are most useful for both
verification and attribution. Many researchers have explored using convolutional neural
networks (CNNs) and have successfully used them to extract text features [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] that
are helpful for attribution [
        <xref ref-type="bibr" rid="ref10 ref2">10,23,24,2</xref>
        ], where the CNNs learn author-discriminative
n-grams of words and characters. In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], researchers learn embeddings for both words
and parts-of-speech (POS) tags and show improved generalization performance.
Word-level features are good at capturing an author’s word usage style and oft-used phrases,
but they ignore other writing nuances such as punctuation, whitespace, abbreviations,
and emoticons. Character-based CNNs excel at modeling these aspects; [23] shows that
character-based CNNs perform especially well for large-scale authorship attribution as
the number of authors increases. One of the systems that we explored for PAN, our
Character-based Convolutional Neural Network (CN)2, extends the work on
character-level CNNs by combining CNNs with self-attention [26] layers to hone in on the most
discriminative combinations of character-level n-grams.
      </p>
      <p>
        Computational and neural network-based approaches to Natural Language
Processing (NLP) have surged in popularity, largely owing to their effectiveness
across a wide range of NLP tasks. Simple word-embedding
techniques like Word2Vec [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and GloVe [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], learn to “embed” or project words into
continuous feature vector spaces such that related words are proximal in feature space.
Subsequent classifiers can then use these pre-trained embeddings for NLP tasks such as
sentiment analysis, syntax parsing, semantic role labeling, etc. More recently,
sophisticated language models, some built using recurrent neural networks, like ELMO [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ],
and others derived from the self-attention-based Transformer model [26], which are
well-suited to efficient training on massive amounts of data, have attained
state-of-the-art performance on diverse sets of NLP tasks [
        <xref ref-type="bibr" rid="ref16 ref3 ref8 ref9">9,8,16,3</xref>
        ]. These models, trained on
increasingly large “Internet Scale” data (as in the case of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]), learn features that can
represent long sequences of text, and these features are then used in downstream NLP tasks.
For PAN, we wanted to see if these state-of-the-art language models pre-trained on large
amounts of data could be used for the authorship verification task. In Section 3.4, we
describe our word-based system, which uses the Longformer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a Transformer variant developed specifically to model long text excerpts such as fanfics
efficiently; it is available through Hugging Face’s excellent Transformers repository [27].
      </p>
      <p>
        Many authorship verification systems have explored architectures other than CNNs
and Transformers. Some have sought to learn features using autoencoder-inspired
architectures from a variety of word and character n-grams and POS [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Recurrent
Neural Networks (RNNs) have also been used successfully. One particularly promising
approach is the work of Boenninghoff, et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], who use a Siamese Network setup for
transforming pairs of text excerpts to extract authorship features that can be compared
for determining whether the two excerpts are from the same author. Their Hierarchical
Recurrent Siamese Neural Network uses two Long Short-Term Memory (LSTM)
layers trained using a modified contrastive loss function that seeks to project text from the
same author to nearby locations in feature space.
      </p>
      <p>
        Finally, this year’s PAN baseline systems are two simple, yet effective approaches.
The “TFIDF” baseline computes the term frequency-inverse document frequency
normalized counts of character tetragrams of the pair of text excerpts, and then uses the
cosine similarities between them as an authorship verification score. The “Compress”
baseline is an adaptation of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to the verification task and uses a text compression
technique based on Prediction by Partial Matching to compute cross-entropies between the
text pair for attributing authorship. According to PAN [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], “The mean and absolute
difference of the two cross-entropies are used by a logistic regression model to estimate a
verification score in [0, 1]”. This technique follows similar work using text compression
for building authorship profiles in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
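      <p>The core idea of the “TFIDF” baseline can be sketched in a few lines of Python. This illustrative version (the function names are ours) scores a text pair by the cosine similarity of raw character tetragram counts; it omits the TF-IDF weighting over the corpus that the actual baseline applies.</p>
      <preformat>
```python
from collections import Counter
import math

def char_ngrams(text, n=4):
    """Count character n-grams (tetragrams by default) in a text."""
    return Counter(text[i:i + n] for i in range(max(0, len(text) - n + 1)))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def verification_score(text1, text2):
    """Score in [0, 1]: higher means more likely the same author."""
    return cosine_similarity(char_ngrams(text1), char_ngrams(text2))
```
      </preformat>
      <p>A score near 1 indicates highly similar character-level style; the real baseline additionally normalizes the counts by inverse document frequency estimated over the whole corpus.</p>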
    </sec>
    <sec id="sec-3">
      <title>Methodologies</title>
      <p>Our work explores neural networks as a means for extracting discriminative features
directly from the text without any explicit feature engineering. Towards this end, we
explore two models: Character Convolutional Neural Network (CN)2, and Longformer.
More detail about each model will be given in Section 3.3 and Section 3.4.</p>
      <p>At a high level, there are three components to our approach: text feature extraction,
topic embedding, and final classification. Of these three, only the text feature extraction
changes depending on which of the two models is being used. To embed the fandoms,
we apply an embedding layer Etopic that maps each fandom identifier to a
512-dimensional vector. The topic embeddings are then combined with the text features and
passed to two multi-layer perceptrons, Mauthor and Mtopic, for authorship verification
and topic correspondence respectively. The whole process is outlined in Figure 2 for
the (CN)2 model.</p>
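      <p>To make the data flow concrete, the following pure-Python sketch mocks up the topic-embedding and classification stages. The class and helper names are our own, the weights are randomly initialized rather than learned, and the layer sizes are assumptions for illustration; our actual implementation uses standard deep-learning tooling.</p>
      <preformat>
```python
import math
import random

random.seed(0)

def make_linear(n_in, n_out):
    """Random weights and zero biases for a dense layer (illustrative init)."""
    w = [[random.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
    return w, [0.0] * n_out

def linear(layer, x):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

class MultiTaskHead:
    """Topic embedding plus two 2-layer MLPs, one per task."""
    def __init__(self, n_fandoms, text_dim=512, topic_dim=512, hidden=256):
        # Etopic: one topic_dim vector per fandom identifier
        self.topic_table = [[random.gauss(0, 0.02) for _ in range(topic_dim)]
                            for _ in range(n_fandoms)]
        comp = text_dim + 2 * topic_dim  # width of the composite feature F
        self.m_author = (make_linear(comp, hidden), make_linear(hidden, 1))
        self.m_topic = (make_linear(comp, hidden), make_linear(hidden, 1))

    def forward(self, text_features, fandom1, fandom2):
        # Concatenate the text features with both fandom embeddings
        f = text_features + self.topic_table[fandom1] + self.topic_table[fandom2]
        same_author = sigmoid(
            linear(self.m_author[1], relu(linear(self.m_author[0], f)))[0])
        same_topic = sigmoid(
            linear(self.m_topic[1], relu(linear(self.m_topic[0], f)))[0])
        return same_author, same_topic
```
      </preformat>
      <p>Note that concatenating a 512-dimensional text feature with two 512-dimensional fandom embeddings yields a composite feature vector three embedding-widths wide, matching the feature dimension used in the loss of Section 3.4.</p>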
      <sec id="sec-3-1">
        <title>Tokenization</title>
        <p>Depending on which model is being used for text feature extraction, our tokenization
differs:
1. (CN)2 - This model looks at the text from the standpoint of characters, including
all punctuation and whitespace as tokens, and can thus focus more on the
syntactic quirks of each author than on the semantic meaning. The size of the
character vocabulary is 2,500 for the small dataset and 4,800 for the large dataset.
When a character in the validation set is unseen, we map it to an
“unknown” token. (In this paper we refer to fandoms as “topics” interchangeably, since
the fandom of a piece of writing can roughly be considered to be its topic.)</p>
        <p>
          2. Longformer - This model looks at the text from the standpoint of words and can
be thought of as placing more emphasis on the semantic meaning and particular
word combinations. Here, we tokenize our documents using RoBERTa’s tokenizer,
which has a vocabulary of 50,265 tokens. More detail about this tokenizer can
be found in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
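        <p>The unknown-token fallback for the character vocabulary can be sketched as follows; this is an illustrative mock-up, and the helper names and the “[UNK]” placeholder string are our own:</p>
        <preformat>
```python
def build_char_vocab(training_texts):
    """Map each character seen in training to an integer id;
    id 0 is reserved for the unknown token."""
    vocab = {"[UNK]": 0}
    for text in training_texts:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab

def encode(text, vocab):
    """Characters unseen at training time fall back to the unknown id."""
    return [vocab.get(ch, vocab["[UNK]"]) for ch in text]
```
        </preformat>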
      </sec>
      <sec id="sec-3-2">
        <title>Model Input</title>
        <p>Because the average fanfiction excerpt is about 20,000 characters long,
making it difficult for our models and their intermediate feature representations to fit in
GPU memory, we chose to use multiple random windows from each excerpt as input.
The number and length of the random windows vary depending on the model being used:
1. (CN)2 - Uses 10 random windows, 5 from each document in the pair, each composed of
1,000 characters.
2. Longformer - Uses 511 words from each document in the pair; these are separated by the
special &lt;SEP&gt; token, and the classification token &lt;CLS&gt; is prepended to learn the
document features that are most beneficial for authorship verification. See Figure 3
and Section 3.4 for more information.</p>
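        <p>The window sampling described above can be sketched as follows; the helper name is ours, and the handling of documents shorter than the window (returning a truncated slice) is an assumption:</p>
        <preformat>
```python
import random

def random_windows(tokens, n_windows, window_len, rng=random):
    """Sample n_windows fixed-length windows uniformly at random from a
    token sequence (characters for (CN)2, word tokens for the Longformer)."""
    last_start = max(0, len(tokens) - window_len)
    windows = []
    for _ in range(n_windows):
        start = rng.randint(0, last_start)  # inclusive on both ends
        windows.append(tokens[start:start + window_len])
    return windows

# (CN)2: 5 windows of 1,000 characters per document
# Longformer: 1 window of 511 word tokens per document
```
        </preformat>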
        <p>Figure 1 shows an example of how we prepare the fanfiction documents for input to
our authorship verification systems.
</p>
      </sec>
      <sec id="sec-3-3">
        <title>Character Convolutional Neural Network + Recursive Self Attention</title>
        <p>As mentioned previously, the (CN)2 model looks at the document from the
standpoint of characters. For input, we take in a set of 10 random windows, 5 from each
document. When each random window is embedded, we stack them on top of each
other along the channel dimension. Then, 1-D convolutions are applied over n-grams
of size one through five after which we maxpool over the length dimension. For each
n-gram, there are 512 different convolutional filters, and these filters essentially learn
the character n-grams that are most useful for authorship verification. Letting the output
of each 1-D convolution and maxpool be called C1, C2, …, C5, where C1 is the output
of the 1-D convolution and maxpool that looks at unigrams and C2, …, C5 are the ones
from higher-order n-grams, we define an operation we call Recursive Self-Attention:</p>
        <p>out = SA(C5 ⊕ SA(C4 ⊕ SA(C3 ⊕ SA(C2 ⊕ C1)))) (1)</p>
        <p>Here, ⊕ denotes the concatenation operation and SA is the self-attention operation.
This allows the network to learn features in a hierarchical manner starting from the
smallest n-grams to the highest. Finally, the output of this process is passed through
a fully connected layer to generate the final documents’ feature vector of size 512.
The documents’ features are then concatenated with the fandom embeddings forming
a composite feature vector F, which is then sent to two separate two-layer multilayer
perceptrons: one for learning the probability that the two documents are from the same
author and the other for the probability that they come from the same topic (i.e., the
fandom of the fanfic excerpt). The system architecture is shown in Figure 2.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Longformer</title>
        <p>
          The Longformer [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is an extension of the Transformer [26] which has achieved
state-of-the-art results in many natural language tasks. The Longformer has the
advantage of being pre-trained on approximately 6.5 billion word tokens and sets
state-of-the-art results on WikiHop and TriviaQA [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Whereas the original Transformer
architecture computes self-attention in O(n²) time, where n is the length of the input
sequence, the Longformer architecture introduces sliding-window self-attention, which
scales linearly instead of quadratically. The main insight is to compute the self-attention
locally instead of globally: given a fixed window size w, each token attends to w/2 tokens on
each side, resulting in an operation that can be computed in O(nw) time. Both the global
self-attention mechanism and the sliding window self-attention can be combined so as
to integrate both global and local context into the computation. This allows the
Longformer to take inputs that are much larger than those previously possible. See Figure 3
for a diagram of the architecture.
        </p>
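        <p>The combined attention pattern can be illustrated by enumerating, for each token position, the set of positions it attends to. This sketch is our own, ignores details such as attention dilation, and assumes position 0 is the CLS token with global attention; the total number of attended positions grows as O(nw) rather than O(n²).</p>
        <preformat>
```python
def attention_positions(n, w, global_tokens=(0,)):
    """For each token i in a sequence of length n, return the set of positions
    it attends to under sliding-window self-attention (w/2 on each side),
    combined with global attention for the tokens in global_tokens."""
    half = w // 2
    pattern = []
    for i in range(n):
        if i in global_tokens:
            attends = set(range(n))  # global attention: CLS sees every token
        else:
            lo = max(0, i - half)
            hi = min(n - 1, i + half)
            attends = set(range(lo, hi + 1))
            attends.update(global_tokens)  # every token also sees CLS
        pattern.append(attends)
    return pattern
```
        </preformat>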
        <p>
          We continue training from the weights of the RoBERTa model released by Beltagy,
et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The weights of the embedding layer and the weights of the first ten encoder
stacks are frozen, leaving only the last two encoder stacks for fine-tuning. We use a
local attention pattern with w = 512 for every token except the CLS token for which
we use a global attention pattern as in the original Transformer architecture. We take a
random window consisting of 511 words from each author; these words are then
tokenized, and the input to the model is “&lt;CLS&gt; &lt;Excerpt 1 Tokens&gt; &lt;SEP&gt; &lt;Excerpt 2
Tokens&gt;”, where CLS is the classification token and SEP is the separator token found
in the original BERT [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] architecture. Finally, the embedding of the CLS token, of size
N × 768, is taken as our documents’ features, and the rest of the model flow follows as
in the (CN)2 approach. Our Longformer system is shown in Figure 3.
During training, our models optimize two objectives: authorship verification and topic
correspondence. The joint loss function may be written as follows:
        </p>
        <p>L(F, a, t) = (1/N) Σ_{i=1}^{N} [ BCE(σ(Mauthor(Fi)), ai) + BCE(σ(Mtopic(Fi)), ti) ] (2)</p>
        <p>where N is the number of samples, F ∈ R^(N×3D) are the features extracted from the
model, a ∈ {0, 1}^N are the author labels, and t ∈ {0, 1}^N are the topic labels. The label ai
is 1 if the text pair is written by the same author and 0 otherwise; t is analogous to a but
for topic correspondence. Given the features F, we map them to authorship verification
scores and topic correspondence scores using separate two-layer multilayer perceptrons
Mauthor and Mtopic respectively. These scores are then bounded between zero and one
through the use of the sigmoid function σ.</p>
        <p>BCE is the binary cross-entropy loss function, here defined for one sample:
BCE(x, y) = −y log(x) − (1 − y) log(1 − x) (3)</p>
        <p>where x is a probability and y ∈ {0, 1} is the label.</p>
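        <p>Equations (2) and (3) can be checked with a small pure-Python sketch; the function names are ours, and a real implementation would use a deep-learning framework’s numerically stable loss primitives:</p>
        <preformat>
```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(x, y):
    """Binary cross-entropy for one sample: x is a probability, y is 0 or 1."""
    eps = 1e-12  # clamp to avoid log(0)
    x = min(max(x, eps), 1.0 - eps)
    return -y * math.log(x) - (1.0 - y) * math.log(1.0 - x)

def joint_loss(author_logits, topic_logits, author_labels, topic_labels):
    """Multi-task loss: mean over samples of author BCE plus topic BCE."""
    n = len(author_labels)
    total = 0.0
    for za, zt, a, t in zip(author_logits, topic_logits,
                            author_labels, topic_labels):
        total += bce(sigmoid(za), a) + bce(sigmoid(zt), t)
    return total / n
```
        </preformat>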
        <p>After training, the outputs of the neural networks Mauthor and Mtopic approximate the
posterior probabilities that the two excerpts are from the same author and from
the same topic, respectively. We use the same-author posterior probability, the output of
Mauthor, as our authorship verification score; it can be thresholded at 0.5 to decide
whether the pair of fanfic excerpts is from the same author or not.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Setup and Results</title>
      <p>
        To evaluate our model, we used the script provided by the PAN organizers. This script
implements four metrics: AUC, F1-Score, C@1, and F_0.5u. C@1 [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and F_0.5u [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
are relatively new and have different properties. C@1 is a version of the F1-score that
rewards the system for leaving difficult problems unanswered, i.e., for setting the
verification probability to exactly 0.5. When correct answers and non-decisions are balanced,
this metric approximates the traditional accuracy computation. If the model is unsure
about too many samples, the score tends towards zero.
      </p>
      <p>F_0.5u places more importance on documents that come from the same author (i.e.,
true positives). This is in contrast to C@1 which treats both positive and negative
examples with equal importance. Because of this, in F_0.5u the unanswered samples are
treated as false negatives resulting in worse performance for models that leave a lot of
samples unanswered. Finally, F_0.5u weights true positives higher compared to false
positives and false negatives. Although both C@1 and F_0.5u interact with unanswered
samples differently, we didn’t map any scores to 0.5 and simply used the raw
same-author posterior probability as our verification score.</p>
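      <p>For reference, C@1 can be computed as follows, under the common formulation in which unanswered problems (score exactly 0.5) are credited in proportion to the accuracy on the answered ones; this is our reading of the metric, not the PAN evaluation script itself:</p>
      <preformat>
```python
def c_at_1(scores, labels, threshold=0.5):
    """C@1: accuracy that rewards leaving hard problems unanswered.
    Unanswered problems earn credit proportional to overall accuracy."""
    n = len(scores)
    n_correct = 0
    n_unanswered = 0
    for s, y in zip(scores, labels):
        if s == threshold:
            n_unanswered += 1
        elif (s > threshold) == (y == 1):
            n_correct += 1
    return (n_correct + n_unanswered * n_correct / n) / n
```
      </preformat>
      <p>When nothing is left unanswered, C@1 reduces to plain accuracy; when everything is unanswered, it is zero.</p>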
      <sec id="sec-4-1">
        <title>Dataset</title>
        <p>The PAN Authorship Verification 2020 datasets come in two flavors: small and large.
Each dataset was built by scraping fanfiction writings from fanfiction.net. Pairs of
fanfictions along with their respective topic (fandom) were provided for training with the
classification of positive (same author) or negative (different authors) as ground truth.
For training our models, we used both the small and large versions of the dataset with
our final submission being based upon a model that was calibrated on the large dataset.</p>
        <p>The differences between the two datasets are highlighted in Table 1. There are 52,655
samples in the small dataset and 278,169 in the large dataset; thus the large dataset
contains about 5.28 times more samples. In terms of document length, the average number
of characters is approximately 21,400, and the split between negative and positive
samples is close to 50% in both datasets. Finally, both the small and the large dataset share
the same 1,600 fandoms.</p>
        <p>To validate our approaches, we split both the small and large dataset into training
and validation splits. To construct these splits, we performed the following steps:
1. Separate the positive and negative samples from each set.
2. Randomly choose 70% of the negative samples for the training set, and the
remaining 30% for the validation set.
3. Randomly pick 15% of the authors and use their positive samples in the
validation set. For authors with two or more positive pairs, half were used for the training
set and the rest for the validation set.
4. The positive samples of the remaining 85% of authors are used in the training set.
</p>
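        <p>The split procedure above can be sketched as follows; this simplified mock-up (names and sample representation are our own) assigns each selected author’s positives wholly to the validation set rather than applying step 3’s half-and-half rule for multi-pair authors:</p>
        <preformat>
```python
import random

def split_dataset(samples, rng=random):
    """Sketch of the train/validation split. Each sample is a tuple
    (author_id, is_same_author). Negatives: 70/30 random split.
    Positives: 15% of authors go to validation (simplified)."""
    negatives = [s for s in samples if not s[1]]
    positives = [s for s in samples if s[1]]
    rng.shuffle(negatives)
    cut = int(0.7 * len(negatives))
    train, val = negatives[:cut], negatives[cut:]
    authors = sorted({a for a, _ in positives})
    rng.shuffle(authors)
    val_authors = set(authors[:int(0.15 * len(authors))])
    for s in positives:
        (val if s[0] in val_authors else train).append(s)
    return train, val
```
        </preformat>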
      </sec>
      <sec id="sec-4-2">
        <title>Results</title>
        <p>Tables 2 and 3 show our evaluation results of the baseline systems, (CN)2 and
Longformer, on the validation set for the small and large datasets respectively. Comparing
(CN)2 and Longformer against the Compress Baseline on the small dataset (Table 2),
both models see an improvement on every metric except (CN)2 on AUC. Overall, (CN)2
outperforms the baseline by 5% absolute, while the Longformer model outperforms it
by 14.5%. On the large dataset (Table 3), the improvement is even more pronounced.
(CN)2 achieves an improvement of 12.7% while Longformer achieves an improvement
of 23.8%. Both (CN)2 and Longformer benefit from having more data available which
highlights the possibility of further improving results by the addition of more data.</p>
        <p>
          Given these results, we chose the Longformer system trained on the large dataset as
our submission for the TIRA evaluation system [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Unfortunately, as can be seen in
Table 4, the model didn’t generalize well to the test dataset.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>From the results we see that Longformer clearly outperforms both (CN)2 and the PAN
baselines on our own held-out validation set. We surmise that there are three reasons
why the Longformer architecture won out over (CN)2 on this held-out validation set:
1. It was pre-trained on about 6.5 billion tokens from multiple online sources.
2. Its deep architecture allows the model to learn more distant relationships of
elements in the text than just n-grams. In contrast, (CN)2 is limited to exploiting
relationships of n-grams of size 1 through 5.
3. Because the grammar of fanfictions is relatively clean, the word-based
Longformer is better suited to the task, as it exploits semantic relationships between the
words and the structure of the sentences. (CN)2, on the other hand, focuses more
on syntactic features; we therefore hypothesize that character-level features would
fare better on a dataset with shorter, more informal excerpts where words aren’t
necessarily spelled correctly.</p>
      <p>As of the writing of this paper, we’re still unsure as to why the Longformer model
didn’t generalize on the test set as can be seen in Table 4. We conjectured that our
validation set may not have sufficiently exhibited “the significant shift in the relation
between authors and fandoms” in the test set from those seen during training. To test
this hypothesis, we created a new dataset split such that the (author, fandom) pairs seen
in the validation set are unseen in the training set. For example, if an author wrote in
5 fandoms, we used his/her writings from 4 fandoms in the train set and reserved the
last for the validation set. This new validation set allowed us to test the most extreme
form of (author, fandom) shifts where there is no overlap between (author, fandom)
pairs between training and validation sets. Using this new train/validation data split, the
Longformer model achieved an overall score of 93.6% on the validation set. While this
score is less than that achieved in our previous validation set, it is still high, and thus
fandom shift does not fully explain our Longformer model’s failure to generalize on the
official test set.
</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this paper, we described our approach to the authorship verification task in the PAN
@ CLEF 2020 challenge based on using neural networks for learning discriminative
features from the text as well as the fandom from which the text derives. We compared
two different neural network architectures. The first architecture, (CN)2, uses a
convolutional stack followed by a recursive self-attention stack to simultaneously learn useful
character n-grams and their combinations that are most useful for determining whether
the excerpt pair is from the same author. The second architecture leverages the
state-of-the-art text features learned by the Longformer model, pre-trained on text with about
6.5 billion words, and we fine-tune the last two encoder stacks to learn useful features
for authorship verification. By leveraging a powerful text modeling architecture like
the Longformer, we sought to investigate the effectiveness of a transfer learning
approach that has shown great results for other NLP tasks. Both systems learn a separate
set of embeddings for the fandoms of both fanfic excerpts. The text based features and
the fandom features are concatenated and used to predict authorship and fandom
correspondences. We used a multi-task loss function to simultaneously optimize for both
authorship verification and topic correspondence, allowing it to learn useful fandom
features for author verification indirectly.</p>
      <p>For our validation testing, we partitioned each of the PAN provided “small” and
“large” training sets into our own training and validation sets. We evaluated our (CN)2
and Longformer systems along with the two simple PAN baseline systems. Both of our
systems outperformed each baseline, but our Longformer system was the winner by a
wide margin, attaining a 0.963 overall verification score which is a 32.8% relative
improvement over the best baseline system. Thus, we chose to submit our Longformer
model trained on the “large” training set as our submission system for PAN.
Unfortunately, this system failed to generalize on the official PAN test set, only attaining a
0.685 overall score and failing to outperform the baseline systems. Without access to
the PAN test set, we have started to diagnose the source of the generalization failure.
An initial experiment on a new train/validation split where (author, fandom) pairs do
not overlap between train and validation sets has shown that an extreme shift in
(author, fandom) pairs may not be the most significant cause of performance degradation.
Our future work will further investigate this issue and seek to improve the Longformer
system’s generalization performance.
</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgment</title>
      <p>This work was performed under the auspices of the U.S. Department of Energy by
Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. This
document may contain research results that are experimental in nature, and neither the
United States Government, any agency thereof, Lawrence Livermore National Security,
LLC, nor any of their respective employees makes any warranty, express or implied,
or assumes any legal liability or responsibility for the accuracy, completeness, or
usefulness of any information, apparatus, product, or process disclosed, or represents that
its use would not infringe privately owned rights. Reference to any specific commercial
product, process, or service by trade name, trademark, manufacturer, or otherwise does
not constitute or imply an endorsement or recommendation by the U.S. Government or
Lawrence Livermore National Security, LLC. The views and opinions of authors
expressed herein do not necessarily reflect those of the U.S. Government or Lawrence
Livermore National Security, LLC and will not be used for advertising or product
endorsement purposes.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. PAN:
          <article-title>A series of scientific events and shared tasks on digital text forensics and stylometry</article-title>
          . https://pan.webis.de/, accessed: 2020-07-03
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Andrews</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bishop</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Learning invariant representations of social media users</article-title>
          . ArXiv abs/1910.04979 (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Beltagy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Longformer: The long-document transformer</article-title>
          . ArXiv abs/2004.05150 (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bevendorff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Generalizing unmasking for short texts</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <fpage>654</fpage>
          -
          <lpage>659</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bischoff</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deckers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schliebs</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thies</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The Importance of Suppressing Domain Style in Authorship Analysis</article-title>
          . CoRR abs/2005.14714 (May <year>2020</year>), https://arxiv.org/abs/2005.14714
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bobicev</surname>
          </string-name>
          , V.:
          <article-title>Authorship detection with PPM notebook for PAN at CLEF</article-title>
          2013 (<year>2013</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Boenninghoff</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nickel</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeiler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolossa</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Similarity learning for authorship verification in social media</article-title>
          .
          <source>In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          . pp.
          <fpage>2457</fpage>
          -
          <lpage>2461</lpage>
          . IEEE (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Brown</surname>
          </string-name>
          , T.B.,
          <string-name>
            <surname>Mann</surname>
            ,
            <given-names>B.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ryder</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subbiah</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaplan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dhariwal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neelakantan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shyam</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sastry</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Askell</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herbert-Voss</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krüger</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henighan</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Child</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramesh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ziegler</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winter</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hesse</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sigler</surname>
            ,
            <given-names>E.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Litwin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chess</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCandlish</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amodei</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Language models are few-shot learners</article-title>
          . ArXiv abs/2005.14165 (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . ArXiv abs/1810.04805 (<year>2018</year>)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hitschler</surname>
          </string-name>
          , J., van den Berg, E.,
          <string-name>
            <surname>Rehbein</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Authorship attribution with convolutional neural networks and POS-eliding</article-title>
          .
          <source>In: Proceedings of the Workshop on Stylistic Variation</source>
          . pp.
          <fpage>53</fpage>
          -
          <lpage>58</lpage>
          . Copenhagen, Denmark (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hosseinia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukherjee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Experiments with neural networks for small and large scale authorship verification</article-title>
          . ArXiv abs/1803.06456 (<year>2018</year>)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manjavacas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bevendorff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the Cross-Domain Authorship Verification Task at PAN 2020</article-title>
          . In: Cappellato,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Eickhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Névéol</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . (eds.)
          <article-title>CLEF 2020 Labs and Workshops, Notebook Papers</article-title>
          .
          <source>CEUR-WS.org</source> (Sep <year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of the author identification task at PAN-2018: cross-domain authorship attribution and style change detection</article-title>
          . In: Cappellato,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Soulier</surname>
          </string-name>
          ,
          <string-name>
            <surname>L</surname>
          </string-name>
          . (eds.) Working Notes of CLEF 2018 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          (<year>2018</year>)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Khmelev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teahan</surname>
            ,
            <given-names>W.:</given-names>
          </string-name>
          <article-title>A repetition based measure for verification of text collections and for text categorization</article-title>
          .
          <source>In: ACM SIGIR</source>
          <year>2003</year>
          . pp.
          <fpage>104</fpage>
          -
          <lpage>110</lpage>
          (
          <year>2003</year>
          ). https://doi.org/10.1145/860435.860456
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          ArXiv abs/1408.5882 (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          . ArXiv abs/1907.11692 (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          . In: Burges,
          <string-name>
            <given-names>C.J.C.</given-names>
            ,
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Welling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Ghahramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            ,
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.Q</surname>
          </string-name>
          . (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>26</volume>
          , pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Milli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bamman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Beyond canonical texts: A computational analysis of fanfiction</article-title>
          .
          <source>In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>2048</fpage>
          -
          <lpage>2053</lpage>
          . Austin, Texas (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Peñas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A simple measure to assess non-response</article-title>
          .
          <source>In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : GloVe:
          <article-title>Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In: Proc. of NAACL</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>TIRA Integrated Research Architecture</article-title>
          . In: Ferro,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <surname>C</surname>
          </string-name>
          . (eds.)
          <article-title>Information Retrieval Evaluation in a Changing World</article-title>
          . Springer (Sep
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23. Ruder, S., Ghaffari, P., Breslin, J.G.:
          <article-title>Character-level and multi-channel convolutional neural networks for large-scale authorship attribution</article-title>
          . ArXiv abs/1609.06686 (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24. Shrestha, P., Sierra, S., González, F.A., y Gómez, M.M., Rosso, P., Solorio, T.:
          <article-title>Convolutional neural networks for authorship attribution of short texts</article-title>
          . <source>In: EACL</source> (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25. Stamatatos, E.:
          <article-title>A survey of modern authorship attribution methods</article-title>
          . <source>Journal of the American Society for Information Science and Technology</source> 60(3), 538-556 (<year>2009</year>)
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.:
          <article-title>Attention is all you need</article-title>
          . ArXiv abs/1706.03762 (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Brew, J.:
          <article-title>Huggingface’s transformers: State-of-the-art natural language processing</article-title>
          . ArXiv abs/1910.03771 (<year>2019</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>