<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Adapter fusion for check-worthiness detection - combining a task adapter with a NER adapter</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Inna Vogel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pauline Möhle</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meghana Meghana</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Steinebach</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ATHENE - National Research Center for Applied Cybersecurity</institution>,
          <addr-line>Rheinstrasse 75, Darmstadt, 64295, Germany</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer Institute for Secure Information Technology SIT</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Detecting check-worthy statements aims to facilitate manual fact-checking efforts by detecting claims that fact-checkers should prioritize first. It can also be considered the first step of a fact-checking system. In this paper, we present an adapter fusion model that combines a task adapter with a NER adapter, achieving state-of-the-art results on two challenging check-worthiness benchmarks. Adapters are a resource-efficient alternative to fully fine-tuning transformer models. Our best performing model obtains an F1 score of 0.92 on the CheckThat! Lab 2023 dataset. Additionally, we interpret the fusion attentions, demonstrating the effectiveness of our approach. The quantitative analysis of the fusion attentions shows that named entities contribute significantly to the prediction of the adapter fusion model.</p>
      </abstract>
      <kwd-group>
        <kwd>check-worthiness detection</kwd>
        <kwd>fact-checking</kwd>
        <kwd>adapter fusion</kwd>
        <kwd>NER</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Fact-checking online content is essential to ensure the reliability of information shared through
various online communication channels, such as news websites and social media platforms.
Fact-checkers and journalists are constantly working to identify and correct misinformation and
communicate their work as quickly as possible. But with the amount of information published
every day online and the limited resources available to journalists and fact-checkers, it is almost
impossible to keep up with this critical work.</p>
      <p>The fact-checking process usually consists of three main steps. The first step involves
identifying statements or claims in a text that are worth fact-checking, as not all claims are
equally important or contain relevant information that needs to be fact-checked. These can
be false allegations, statistics or other objectively verifiable false information. Fact-checkers
prioritize claims for verification based on their potential impact, the claim’s factual coherence
or the public interest in the claim. Once a claim has been selected, the second step is to gather
trustworthy evidence to confirm or disprove it by researching reliable sources. These sources
can include academic journals, official reports, reputable news organizations, subject matter
experts and primary sources such as original documents or statistics. To ensure consistency
and accuracy, fact-checkers and journalists compare information from multiple sources. The
main challenge is that the vast majority of the fact-checker’s work is still done manually.
Therefore, there is a need to develop technologies that facilitate, speed up and improve
the work of fact-checkers and journalists in detecting fake news and misinformation.</p>
      <p>The first step in the fact-checking pipeline, the automatic identification of check-worthy
statements, could facilitate the work of fact-checkers or journalists by identifying and
highlighting statements within a text that require further verification. This could streamline the
fact-checking process and reduce the potential for human bias in claim selection.</p>
      <p>
        We consider the check-worthiness detection task as a binary classification task.
Check-worthy sentences or statements are usually those that contain factual information such as dates,
definitions, statistics or descriptions of events or laws. They are usually of interest to or likely to
affect the general public [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Nakov et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] extend the definition and add that such statements
can also be potentially damaging to society, public figures or a company. Non-check-worthy
statements, on the other hand, contain subjective opinions and beliefs rather than factual
claims [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>
        In this paper, we propose an adapter fusion approach that combines a task adapter with a
NER (Named Entity Recognition) adapter. Adapters are a resource-efficient alternative to fully
fine-tuned transformer models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. They are lightweight and modular neural networks, learning
tasks with fewer parameters and transferring knowledge across tasks and languages [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We
first trained a task adapter to effectively detect check-worthy statements. As we noticed that
check-worthy claims often contain facts in the form of named entities, such as personal
names, dates, financial and percentage values or names of events or historical occurrences
(e.g., "World War II" or "the Great Depression"), we fused the task adapter with a NER adapter.
Our approach achieves state-of-the-art results on two challenging check-worthiness detection
benchmarks.
      </p>
      <p>Our contributions are summarized as follows:
• We are the first to propose an adapter fusion model that combines a task adapter with a NER adapter.
• Our model achieves state-of-the-art performance by a substantial margin on challenging check-worthiness benchmarks.
• We use an explainability tool to interpret the classification results, demonstrating the effectiveness of our approach.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The first methods of check-worthiness detection were based on the extraction of meaningful
features from the text. Given a US presidential election transcript, ClaimBuster [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] predicts
check-worthiness by using a support vector machine and extracting a set of 6,615 features in
total (such as word count, sentiment, tf-idf weighted bag-of-words, part-of-speech tags or entity
type). Gencheva et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] extended the work of Hassan et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] by including contextual features
such as the position of the sentence, the size of a segment belonging to a speaker, topics or word
embeddings. A MAP of 0.427 was achieved using a neural network with all features combined.
      </p>
      <p>
        Meng et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] used both the ClaimBuster dataset and the CLEF CheckThat! Lab 2019 dataset
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for the detection of check-worthy factual claims using adversarial training on transformer
models. Their model improved the F1 score by 4.7 points compared to other
models on these datasets.
      </p>
      <p>
        The CheckThat! Lab has organised multilingual check-worthiness detection tasks since 2020,
supporting more languages every year for multimodal and multigenre content [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The aim of the
CheckThat! Lab challenge in 2023 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] was to determine whether the information contained in
the political debate is reliable and worthy of further fact-checking. Sawiński et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] were
the best performing team in English. They experimented with fine-tuning a variety of BERT
models and found that fine-tuning DeBERTaV3 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] yielded near-identical performance to
GPT-3, achieving an F1 score of 0.89. Frick et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] came second in the competition with
an F1 score of 0.87 by fine-tuning BERT three times, each starting with a different seed for model
initialisation, resulting in three models. They combined these models into an ensemble using
a model souping technique that adaptively adjusts the influence of each model based on its
performance.
      </p>
      <p>
        Schlicht et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] investigated cross-training of adapter fusion models on world languages
(such as Arabic, English and Spanish) to detect check-worthiness in multiple languages.
To this end, they used mBERT and XLM-R with adapter fusion models (combining task and language
adapters) within a transformer as well as fully fine-tuned transformers. They showed that
these models could perform better than monolingual task adapters and fully fine-tuned models. For
the detection of English check-worthy claims, an F1 score of 0.51 was achieved using the
multilingual dataset from the CLEF CheckThat! Lab challenge 2022 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and 2021 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] applying
XLM-R in combination with a fully fine-tuned transformer.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset Description</title>
      <p>
        We used the CheckThat! Lab 2023 dataset [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and the ClaimBuster dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for our
experiments. The CheckThat! Lab 2023 English dataset consists of political debates collected from the
US presidential general election debates [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The aim of the CheckThat! Lab 2023 task was
to predict whether a text snippet from a political debate needs to be assessed manually by an
expert by estimating its check-worthiness. Examples from the dataset are shown in Table 1.
The dataset is divided into four subsets. The "Train" subset consists of 16,876 entries. Each
entry is labelled either "Yes" (check-worthy) or "No" (not check-worthy). The
development set "Dev" contains 5,625 statements, the development test set "Dev Test" contains
1,032 entries and the test set for the final evaluation "Test" contains 318 statements. The label
distributions and dataset splits are shown in Table 2.
      </p>
      <p>
        While the first three partitions primarily use the ClaimBuster dataset described in Arslan et al.
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], there have been some updates made by the CheckThat! Lab 2023 organizers to improve
the quality of the annotations. The test set includes sentences that were not featured in the
ClaimBuster dataset.
      </p>
      <p>
        The ClaimBuster dataset was labelled by 101 annotators over a period of 26 months. The
following three classes were annotated. 1. Check-worthy factual sentences: These sentences
contain statements of fact that the general public will have an interest in finding out whether
they are true or not. Journalists and fact-checkers look for these kinds of statements to check
their veracity. 2. Unimportant factual sentences: These are factual statements, but they are not
verifiable or the general public is not interested in knowing whether they are true
or false. Fact-checkers do not consider these sentences worth checking. 3. Non-factual
sentences: These sentences contain no factual claims. Subjective sentences such as opinions
and beliefs fall into this category [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        To compare our results with the work of Meng et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the ClaimBuster dataset was used
as a baseline. The dataset consists of 9,674 sentences, of which 6,910 are non-check-worthy
and 2,764 are check-worthy [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The authors excluded the class of non-check-worthy factual
sentences, having observed that this class was not particularly useful and could negatively affect the
performance of models. To compare our results, we used the same split as the authors: 67.5%
of the dataset was used for training, 7.5% for validation and 25% for testing our model.
      </p>
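As a small arithmetic illustration, the subset sizes implied by these percentages can be computed as follows (a hedged sketch with our own helper name and rounding policy; the actual experiments use a stratified split, not this naive arithmetic):

```python
# Illustrative helper reproducing the 67.5% / 7.5% / 25% split sizes for the
# 9,674 ClaimBuster sentences. The rounding policy is ours.

def split_sizes(n_total, val_frac=0.075, test_frac=0.25):
    """Return (train, val, test) sizes; any rounding remainder goes to train."""
    n_val = round(n_total * val_frac)
    n_test = round(n_total * test_frac)
    n_train = n_total - n_val - n_test
    return n_train, n_val, n_test

sizes = split_sizes(9674)
```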
      <p>Both datasets are highly unbalanced, with about a quarter of the sentences being
check-worthy. This reflects the fact that check-worthy sentences occur less frequently in the
text than non-check-worthy sentences.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <sec id="sec-4-1">
        <title>4.1. Adapter Fusion - Combining a Task Adapter with a NER Adapter</title>
        <p>
          Transformer models, pre-trained on massive amounts of text data and then fine-tuned on target
tasks, have led to significant advances in NLP, achieving state-of-the-art results in a variety of
tasks. However, models such as RoBERTa [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and BERT [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] consist of millions of parameters,
making it prohibitively expensive to share and distribute fully tuned models for each individual
downstream task. Adapters are a lightweight alternative to full model fine-tuning, consisting of
only a tiny set of newly initialised weights at each layer of the pre-trained model [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. These
new weights are updated during fine-tuning while the parameters of the
pre-trained model remain frozen. This means that adapters are parameter-efficient, speed up training
iterations, and, due to their modularity and compact size, can be shared and combined without
compromising the performance of the model. Adapters have been shown to work on par with
full fine-tuning by adapting the representations at each layer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>Adapter fusion is a method of combining the knowledge of multiple pre-trained adapters
trained for different tasks. Adapter fusion consists of an attention module that learns how to
dynamically combine knowledge from different task adapters, fusing the
information learned by the individual adapters into a coherent representation. Different fusion
strategies can be used, such as weighted summation, gating mechanisms, or attention mechanisms.
The goal is to capture the synergies between different tasks and adapters. First, we trained
a task adapter to efficiently detect and classify the sentences worth checking by journalists
and fact-checkers. This model serves as our baseline. In a quantitative analysis counting the
frequencies of named entities in the two classes, we found that check-worthy sentences tend to
contain more named entities than non-check-worthy sentences. Table 3 shows the distribution
of named entities in the CheckThat! Lab 2023 dataset.</p>
        <p>
          Since the dataset contains fewer check-worthy sentences (5,759) than non-check-worthy
sentences (18,092), we reduced the number of sentences in the negative class so that the dataset
used for the analysis is balanced, i.e. 5,759 sentences per class. To count the frequencies of the
named entities, we used the four-class English NER model "Flair", which achieves an F1
score of 0.93 on the CoNLL 2003 dataset [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>The four named entity classes are: Person Name (PER), Location Name (LOC), Organisation
Name (ORG) and Miscellaneous (MISC). The distribution of the classes shows that all
named entity classes are more frequent in the sentences that are worth fact-checking. We assume
that named entities, as in journalism, are essential carriers of information. The "Five Ws" in
journalism, "Who?, What?, Where?, When?, and Why?", are essential questions that aim to
provide a complete understanding of a situation. The questions "Who?, Where?, When?" and
partly also "What?" (e.g. "30% of those vaccinated") can be detected by a NER system.</p>
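The frequency analysis described above can be sketched as follows; the mini-corpus and its entity tags are hypothetical stand-ins for the Flair tagger output:

```python
from collections import Counter

# Hypothetical sketch of the quantitative analysis: count how often each NER
# class (PER, LOC, ORG, MISC) occurs per label on a class-balanced sample.
# In the paper the tags come from the four-class Flair model; here they are
# hard-coded toy examples.
tagged = [
    ("check-worthy",     ["PER", "ORG", "MISC"]),
    ("check-worthy",     ["PER", "LOC"]),
    ("non-check-worthy", []),                      # opinions often carry no entities
    ("non-check-worthy", ["PER"]),
]

def entity_distribution(samples):
    """Map each class label to a Counter of NER tags seen in that class."""
    dist = {}
    for label, tags in samples:
        dist.setdefault(label, Counter()).update(tags)
    return dist

dist = entity_distribution(tagged)
```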
        <p>This motivated us to experiment with the fusion of a trained task adapter with a NER adapter.
The architecture of our final model is illustrated in Figure 1. The input to the architecture is a
statement while the output is a probability score that determines the check-worthiness of the
given input statement. The adapter fusion component takes as input the representations of
multiple adapters trained on different tasks and learns a parameterized mixer of the encoded
information.</p>
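The parameterized mixer can be illustrated numerically as follows (a minimal sketch of the AdapterFusion attention step with toy dimensions and randomly initialised projections; not the library's actual internals):

```python
import numpy as np

# Toy sketch of the AdapterFusion mixing step: a query derived from the
# transformer layer output attends over the outputs of N adapters and mixes
# their values into one fused representation.
rng = np.random.default_rng(0)
d = 8                                   # hidden size (illustrative)
layer_out = rng.normal(size=d)          # transformer layer output (query source)
adapter_outs = rng.normal(size=(2, d))  # outputs of the task and NER adapters

W_q = rng.normal(size=(d, d)) * 0.1     # learned query projection
W_k = rng.normal(size=(d, d)) * 0.1     # learned key projection
W_v = rng.normal(size=(d, d)) * 0.1     # learned value projection

q = layer_out @ W_q
k = adapter_outs @ W_k                  # one key per adapter
v = adapter_outs @ W_v                  # one value per adapter

scores = k @ q                          # one relevance score per adapter
weights = np.exp(scores - scores.max())
weights /= weights.sum()                # softmax: the "fusion attention"

fused = weights @ v                     # weighted mix of the adapter values
```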
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Implementation Details</title>
        <p>
          The task adapter model "AF-TA" and the adapter fusion model "AF-TA-NER" were trained
on the CheckThat! Lab 2023 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] dataset. For this, adapter transformers from the "Adapter
Hub" repository of pre-trained adapter modules were employed [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. We used the pre-trained
RoBERTa model as it showed significant improvements in various NLP benchmarks compared
to the original BERT model [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>Our models were trained on the "Train" dataset with 16,876 instances, while the performance
of the models during training was evaluated on the "Dev" set with 5,625 instances. Finally, the
"Dev Test" set of 1,032 samples was used to compare the performance of the different trained
models. The overall best performing model was evaluated on the "Test" dataset (Table 2) using
the F1 score over the positive (check-worthy) class. We tried a number of different
hyperparameters and report the ones that performed best. The task adapter model "AF-TA" was
trained for 6 epochs with a learning rate of 1e-4 and a batch size of 32, using a maximum
sequence length of 512.</p>
        <p>
          To train our "AF-TA-NER" model, we fused the task adapter model "AF-TA" and the fine-tuned
version of the DistilRoBERTa [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]-based NER model. The NER model was trained and evaluated
on the CoNLL 2003 dataset and achieves an F1 score of 0.92 [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. The adapter fusion model
"AF-TA-NER", which combines the task adapter and the NER adapter model, performed best
when the number of epochs was set to 5, the learning rate to 5e-5, the batch size to 8 and the
maximum sequence length to 512.
        </p>
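For reference, the reported hyperparameters of both models, collected as plain config dicts (the variable names are ours, not from the authors' code):

```python
# Training configurations reported in Section 4.2, as plain dicts.
AF_TA = {        # task adapter model
    "epochs": 6,
    "learning_rate": 1e-4,
    "batch_size": 32,
    "max_seq_length": 512,
}
AF_TA_NER = {    # adapter fusion model (task adapter + NER adapter)
    "epochs": 5,
    "learning_rate": 5e-5,
    "batch_size": 8,
    "max_seq_length": 512,
}
```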
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Baselines</title>
      <p>
        To compare the performance of our adapter fusion model "AF-TA-NER", three different baselines
were used. As the first baseline, we chose the best performing system of the CheckThat! Lab
2023 challenge "OpenFact" [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Its highest F1 score of 0.89 was obtained by the GPT-3 Curie
model fine-tuned on approximately 7,690 examples selected on label quality criteria.
      </p>
      <p>
        Meng et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed a method that applies adversarial perturbations to
the BERT architecture. In order to detect verifiable factual claims, they used the ClaimBuster
dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To validate their model, they performed 4-fold cross-validation, selecting the best
model from each fold using the weighted F1 score calculated on the validation set. Following the
authors, we split the data into 25% test, 7.5% validation and 67.5% training and applied stratified
4-fold cross-validation in order to compare the performance of our models. The
reported F1 score is based on classifications across all folds. Their best performing model
"CB-BBA" achieves an average F1 score of 0.83 on the positive (check-worthy) class.
      </p>
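The stratified folding can be sketched in a few lines (a hedged toy implementation that preserves the class ratio per fold; the actual experiments presumably rely on a library routine):

```python
from collections import defaultdict

# Toy stratified k-fold assignment: distribute the indices of each class
# round-robin over the folds so every fold keeps the overall class ratio.
def stratified_folds(labels, k=4):
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    for idxs in by_class.values():
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)
    return folds

labels = [1] * 8 + [0] * 24     # ~1:3 positive ratio, as in the datasets
folds = stratified_folds(labels, k=4)
```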
      <p>To determine whether the proposed NER adapter fusion model "AF-TA-NER" can improve
classification results, we used the task adapter model "AF-TA" as a third baseline. The
implementation details are given in Section 4.2.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Evaluation Results</title>
      <p>
        Table 4 shows the performance results of the "AF-TA-NER" model and two baselines, the "AF-TA"
model as well as the best performing system in the CheckThat! Lab 2023 challenge, "OpenFact"
by Sawiński et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        The "AF-TA" task adapter model achieves an F1 score of 0.87, while the GPT-3 Curie model
used in "OpenFact" achieves an F1 score of 0.89. The proposed "AF-TA-NER" outperforms both
models by achieving an F1 score of 0.92 on the positive class (check-worthy). On the negative
class (not check-worthy), it achieves an F1 score of 0.96.
Table 5 shows the classification results of our two proposed models compared to the approach
presented by Meng et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Our models outperform "CB-BBA" (F1 score 0.84) by a substantial
margin. The "AF-TA-NER" model (F1 score 0.89) performs slightly better than the "AF-TA"
model (F1 score 0.88). Compared to the "CB-BBA" model, our best performing model achieves
a 5 point improvement in F1 score. The "AF-TA-NER" model outperforms all three baselines
and achieves state-of-the-art performance on different benchmarks. The confusion matrix of
our "AF-TA-NER" model is shown in Figure 2.
      </p>
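For reference, the per-class F1 scores reported here follow directly from confusion-matrix counts; a minimal helper (the example counts in the test are illustrative, not the values in Figure 2):

```python
# F1 for one class from its confusion-matrix counts:
# precision = tp / (tp + fp), recall = tp / (tp + fn),
# F1 = harmonic mean of precision and recall.
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```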
    </sec>
    <sec id="sec-7">
      <title>7. Interpretation of the Fusion Attentions</title>
      <p>In this section, we present the interpretation of the fusion attentions using the "Transformers
Interpret"1 library. The core attribution methods on which Transformers Interpret is built
are "Integrated Gradients" and a variant of them, "Layer Integrated Gradients". The feature
attribution score is a summary or average of the attributions from each layer, explaining which
features were most important in a model’s prediction for a given input.</p>
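The Integrated Gradients idea can be sketched on a toy differentiable model (a hedged illustration of the attribution method itself, not of the library's transformer-specific implementation):

```python
import numpy as np

# Integrated Gradients: attribution_i = (x_i - baseline_i) times the gradient
# of the model output, averaged along the straight path from baseline to input.
def model(x):                 # toy scalar "model": f(x) = sum(x^2)
    return (x ** 2).sum()

def grad(x):                  # its analytic gradient: 2x (broadcasts over a path)
    return 2 * x

def integrated_gradients(x, baseline, steps=100):
    alphas = (np.arange(steps) + 0.5) / steps           # midpoint rule
    path = baseline + alphas[:, None] * (x - baseline)  # straight-line path
    avg_grad = grad(path).mean(axis=0)
    return (x - baseline) * avg_grad

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline)
# Completeness axiom: attributions sum to model(x) - model(baseline)
```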
      <p>The aim of the interpretation of the classification results is to analyze whether our proposed
adapter fusion model is capable of reliably classifying check-worthy statements in a text. This
analysis can also be useful for journalists and fact-checkers to check the basis on which the
model has made its decision. The following quantitative and qualitative analysis is based on the
classification results of our trained adapter fusion model "AF-TA-NER".</p>
      <p>To determine which named entity class contributes the most to the classification, we chose
a NER model for the quantitative analysis that can recognize the highest number of classes.
Therefore, we used spaCy’s2 NLP pipeline model "en_core_web_sm". The model provides
a NER system that can identify named entities and classify them into 18 predefined categories.
1 Transformers Interpret: https://github.com/cdpierse/transformers-interpret.
2 spaCy: https://spacy.io/.</p>
      <p>The aim of the quantitative analyses was to investigate whether NER features were important
for the predictions of the "AF-TA-NER" model.</p>
      <p>Table 6 lists the classes that contributed most to the classification of the check-worthy class.
For comparison, we also show how the respective NER class relates to the negative class. Positive
attribution scores indicate that a word contributes positively to the predicted class, while
negative scores indicate the opposite.
The quantitative analysis shows that named entities contribute significantly to the prediction of
the adapter fusion model. The NER class "Money" contributes most positively to the positive
class (e.g. "I paid $38 million one year..."), while at the same time it has little relevance for the
classification of the negative class. The same holds true for the "Percent", "Cardinal" and "Date"
classes, each of them contributing significantly to the positive class (e.g. "When we were in office,
there was 15% less violence in America..."), while contributing negatively to the non-check-worthy
class.</p>
      <p>It is interesting to note that the classes "Person" (0.18), "Place" (0.04) and "Organisation"
(0.14), although also relevant for the classification of the check-worthy class, are less significant
than the other mentioned NER classes in Table 6. Even the class "Event" (0.27), which refers to
mentions of events in the text such as hurricanes, battles, wars or sports events, contributes
more to the model’s attention. This suggests that the model would still give good classification
results even if the names of people and places were removed from the dataset, e.g., for data
protection reasons.</p>
      <p>Using the Transformers Interpret heat map visualization, we analyzed the contribution of
tokens to the model’s prediction. This type of visualization is particularly useful for
understanding and interpreting the decisions made by complex transformer models. Each token in the
input text is colour-coded based on its contribution score. The colour intensity represents the
magnitude of the contribution. Our qualitative analysis shows that action verbs (or dynamic
verbs), which describe the action that the subject of a sentence performs (e.g. "run", "fight",
"sleep"), contribute to the prediction of the check-worthy class. Examples are shown in Figure 7.
The colour intensity shows that words like "paid", "released", "reduced", and "beat" contribute
positively to the prediction of the adapter fusion model.</p>
      <p>As there are only two examples of False Positive (FP) classifications (Figure 2), no reliable
analysis can be made. However, we suspect that these two examples shown in Figure 4 were
incorrectly labelled as not check-worthy by the human annotators. In our opinion, these
examples contain relevant facts that should be fact-checked. The same applies to False Negative
cases. In the following example, we found no evidence for a verifiable statement: "Yes, you said
that". However, the annotation is difficult to evaluate, as it is also subjective.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion and Future Work</title>
      <p>In this paper, we presented our work on detecting check-worthy factual claims employing an
adapter fusion approach that combines a task adapter with a NER adapter. We first trained a
task adapter to effectively detect check-worthy statements and used it as our first baseline. As
our analysis showed that check-worthy claims often contain facts in the form of named entities,
we fused the task adapter with a NER adapter. The goal was to capture the synergies between
different tasks and adapters. Our approach achieves state-of-the-art results on two challenging
benchmarks. Our best "AF-TA-NER" adapter fusion model achieves an F1 score of 0.92 on the
CheckThat! Lab 2023 dataset and an F1 score of 0.89 on the ClaimBuster dataset.</p>
      <p>Additionally, we used an explainability tool to interpret the fusion attentions, demonstrating
the effectiveness of our approach. The quantitative analysis showed that named entities
contribute significantly to the prediction of the adapter fusion model. To determine which NER
class contributes the most to the classification results, we used a NER system that can identify
18 predefined categories of named entities. By analysing the attention weights, we found that
it is not the NER classes "Person", "Location" or "Organization" that contribute most to the
positive class, but rather the classes "Money", "Percent", "Cardinal" and "Date". In the future, we
therefore plan to employ a NER adapter that can classify more than four classes. Additionally,
we want to investigate the fusion of different task adapters and interpret how fusion attention
differs across adapters and fusion models.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>This work was supported by the German Federal Ministry of Education and Research (BMBF)
and the Hessian Ministry of Higher Education, Research, Science and the Arts within their joint
support of "ATHENE – CRISIS" and "Lernlabor Cybersicherheit" (LLCS).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Arslan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tremayne</surname>
          </string-name>
          ,
          <article-title>A benchmark dataset of check-worthy factual claims</article-title>
          , in: 14th
          <source>International AAAI Conference on Web and Social Media</source>
          , AAAI,
          <year>2020</year>
          . URL: https://api.semanticscholar.org/CorpusID:216870066.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          , G. Da San Martino, T. Elsayed,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Mansour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hamdan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Kartal</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction: 12th International Conference of the CLEF Association, CLEF 2021, Virtual Event, September 21-24, 2021, Proceedings</source>
          , Springer-Verlag, Berlin, Heidelberg,
          <year>2021</year>
          , pp.
          <fpage>264</fpage>
          -
          <lpage>291</lpage>
          . URL: https://doi.org/10.1007/978-3-030-85251-1_19. doi:10.1007/978-3-030-85251-1_19.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Houlsby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Giurgiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jastrzebski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Morrone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>de Laroussilhe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gesmundo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Attariyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gelly</surname>
          </string-name>
          ,
          <article-title>Parameter-efficient transfer learning for NLP</article-title>
          , in: K. Chaudhuri, R. Salakhutdinov (Eds.), ICML, volume
          <volume>97</volume>
          of
          <source>Proceedings of Machine Learning Research</source>
          , PMLR,
          <year>2019</year>
          , pp.
          <fpage>2790</fpage>
          -
          <lpage>2799</lpage>
          . URL: http://dblp.uni-trier.de/db/conf/icml/icml2019.html#HoulsbyGJMLGAG19.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rücklé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Poth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kamath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Vulić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>AdapterHub: A framework for adapting transformers</article-title>
          , in:
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020): Systems Demonstrations</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>54</lpage>
          . URL: https://www.aclweb.org/anthology/2020.emnlp-demos.7.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Arslan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tremayne</surname>
          </string-name>
          ,
          <article-title>Toward automated fact-checking: Detecting check-worthy factual claims by claimbuster</article-title>
          ,
          <source>Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gencheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Màrquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Koychev</surname>
          </string-name>
          ,
          <article-title>A context-aware approach for detecting worth-checking claims in political debates</article-title>
          , in:
          <source>Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017</source>
          , INCOMA Ltd., Varna, Bulgaria,
          <year>2017</year>
          , pp.
          <fpage>267</fpage>
          -
          <lpage>276</lpage>
          . URL: https://doi.org/10.26615/978-954-452-049-6_037. doi:10.26615/978-954-452-049-6_037.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jimenez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Arslan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Devasier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Obembe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Gradient-based adversarial training on transformer networks for detecting check-worthy factual claims</article-title>
          , ArXiv abs/2002.07725 (
          <year>2020</year>
          ). URL: https://api.semanticscholar.org/CorpusID:211146392.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karadzhov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohtarami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2019 CheckThat! lab: Automatic identification and verification of claims. Task 1: Check-worthiness</article-title>
          , in: L. Cappellato, N. Ferro, D. E. Losada, H. Müller (Eds.),
          <source>Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Lugano, Switzerland, September 9-12, 2019</source>
          , volume
          <volume>2380</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2019</year>
          . URL: https://ceur-ws.org/Vol-2380/paper_269.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I. B.</given-names>
            <surname>Schlicht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Flek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Multilingual detection of check-worthy claims using world languages and adapter fusion</article-title>
          , in: J. Kamps, L. Goeuriot, F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>118</fpage>
          -
          <lpage>133</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Cheema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hakimov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaghouani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2023 CheckThat! lab task 1 on check-worthiness in multimodal and multigenre content</article-title>
          , in:
          <source>Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</source>
          , CLEF '2023, Thessaloniki, Greece,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sawiński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Węcel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Księżniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stróżyna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lewoniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stolarski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Abramowicz</surname>
          </string-name>
          ,
          <article-title>OpenFact at CheckThat!-2023: Head-to-head GPT vs. BERT - a comparative study of transformers language models for the detection of check-worthy claims</article-title>
          ,
          in:
          <source>Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2023</year>
          . URL: https://api.semanticscholar.org/CorpusID:264441775.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing</article-title>
          ,
          in:
          <source>The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023</source>
          , OpenReview.net,
          <year>2023</year>
          . URL: https://openreview.net/pdf?id=sE7-XhLxHA.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Frick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Vogel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>Fraunhofer SIT at CheckThat!-2023: Enhancing the detection of multimodal and multigenre check-worthiness using optical character recognition and model souping</article-title>
          , in: M. Aliannejadi, G. Faggioli, N. Ferro, M. Vlachos (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th to 21st, 2023</source>
          , volume
          <volume>3497</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2023</year>
          , pp.
          <fpage>337</fpage>
          -
          <lpage>350</lpage>
          . URL: https://ceur-ws.org/Vol-3497/paper-029.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaghouani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Kartal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Beltrán</surname>
          </string-name>
          ,
          <article-title>The CLEF-2022 CheckThat! lab on fighting the COVID-19 infodemic and fake news detection</article-title>
          , in: M. Hagen, S. Verberne, C. Macdonald, C. Seifert, K. Balog, K. Nørvåg, V. Setty (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>416</fpage>
          -
          <lpage>428</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Mansour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2021 CheckThat! lab task 2 on detecting previously fact-checked claims in tweets and political debates</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maistro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum</source>
          , Bucharest, Romania, September 21st to 24th,
          <year>2021</year>
          , volume
          <volume>2936</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2021</year>
          , pp.
          <fpage>393</fpage>
          -
          <lpage>405</lpage>
          . URL: https://ceur-ws.org/Vol-2936/paper-29.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          , ArXiv abs/1907.11692 (
          <year>2019</year>
          ). URL: https://api.semanticscholar.org/CorpusID:198953378.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , in: J. Burstein, C. Doran, T. Solorio (Eds.),
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Akbik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blythe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vollgraf</surname>
          </string-name>
          ,
          <article-title>Contextual string embeddings for sequence labeling</article-title>
          ,
          in:
          <source>COLING 2018, 27th International Conference on Computational Linguistics</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1638</fpage>
          -
          <lpage>1649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <article-title>DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</article-title>
          , ArXiv abs/1910.01108 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Akbik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blythe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rasul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schweter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vollgraf</surname>
          </string-name>
          ,
          <article-title>FLAIR: An easy-to-use framework for state-of-the-art NLP</article-title>
          , in:
          <source>NAACL 2019, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>