<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>University of Regensburg at CheckThat! 2021: Exploring Text Summarization for Fake News Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Philipp Hartl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Udo Kruschwitz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Regensburg</institution>
          ,
          <addr-line>Universitätsstraße 31, 93053 Regensburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We present our submission to the CLEF 2021 CheckThat! challenge. More specifically, we took part in Task 3a, multi-class fake news detection of news articles. The conceptual idea of our work is that (a) transformer-based approaches represent a strong foundation for a broad range of NLP tasks including fake news detection, and that (b) compressing the original input documents into some form of automatically generated summary before classifying them is a promising approach. The official results indicate that this is indeed an interesting direction to explore. They also confirm that oversampling to address the class imbalance was effective in further improving the results. We also note that both abstractive and extractive summarization approaches score substantially better when we do not apply hyperparameter tuning, suggesting that the small scale of the test collection leads to overfitting.</p>
      </abstract>
      <kwd-group>
        <kwd>Fake News Detection</kwd>
        <kwd>Text Summarization</kwd>
        <kwd>Abstractive / Extractive Summarization</kwd>
        <kwd>CLEF</kwd>
        <kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Fake news, misinformation and disinformation is by no means a recent phenomenon, but instead
has been around since classical antiquity when manipulated information was used to discredit
political opponents or alter battle courses [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. What did change over time, though, was the scale
and extent of the problem: initially, dissemination happened verbally, but the invention
of the printing press marked a major milestone, as easy access to and distribution of information,
combined with increasing literacy, enabled more people to consume and create information.
The advent of social media, with its freedom to publish, marks the birth of yet another era
altogether [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The term fake news has been particularly prevalent in the mainstream media
since the 2016 US election, when a large amount of intentionally false news was spread through
social media during the campaign [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. These platforms operate with a non-restrictive content
policy by design and provide various ways of automation, which eases the spread of mis- and
disinformation. Combined with their enormous user bases (e.g. Facebook, with 2.8 billion active
users in December 2020), information can reach many people in a very short period
of time. In an age of information pollution (irrelevant, redundant, unsolicited and low-value
information [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) it is therefore important to (semi-)automatically identify such claims and
minimize their harm – in particular as humans appear not to be very skilled at identifying
disinformation, with typical recognition rates only slightly better than chance [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        CheckThat! Lab [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is an evaluation campaign which is part of the 2021 Cross-Language
Evaluation Forum (CLEF) conference and contains three tasks related to fact-checking or fake
news detection with two subtasks each. Our team participated in this year’s Task 3a, whose
goal is to create a system to identify fake news in a multi-class scenario. We built four models
based on fine-tuned BERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a highly popular bidirectional transformer architecture, combined with
abstractive and extractive summarization techniques [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. Our best submitted model
(abstractive summarization) was ranked 8th among all 25 participating teams in the lab for this
task. Post-hoc runs reveal, though, that the same runs without hyperparameter tuning lead
to substantially improved results (placing our best run 3rd in the ranked list). In this paper, we
describe our participation in Task 3a at CLEF 2021 in detail.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Traditionally, fake news detection is modelled as a classification problem but often with varying
class numbers. While datasets like FakeNewsNet [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], MM-COVID [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] or ReCOVery [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
provide only two labels and hence treat fake news detection as binary classification, there
also exist several datasets with multiple labels, such as FEVER [13], NELA-GT-2019 [14] or
the dataset provided by the organizers of this task (see Section 3). Unfortunately, generating
comprehensive datasets still takes a lot of work, as the ground-truth labels often need to be
assigned by, e.g., journalists or domain experts. Fake news detection systems typically adopt
one of three general approaches, or a combination of them. The most commonly used approach
is based on the news content, which can be linguistic, auditory (e.g., attached voice
recordings) or visual (e.g., images or videos) [15]. This rests on the assumption that real
and fabricated statements differ in content style and quality [16]. Therefore, it is possible to
successfully differentiate claims solely on their content with either hand-engineered features
[17] or deep learning methods [18]. However, approaches which only focus on the news content
might miss valuable context information. Hence, feedback-based solutions target secondary
information such as user engagement [19] and dissemination networks [20]. These approaches
are often used in combination with content-based methods to increase performance [21]. While
contextual information can be useful when available, it is often not available, or only partially so
(as reflected by common benchmark collections for fake news detection [22, 13]). While both
methods discussed above are limited to a snapshot of features present at the time of training,
intervention-based methods try to dynamically interpret real-time dissemination data. These
are arguably the least common approaches at the moment because they are difficult
to evaluate [
        <xref ref-type="bibr" rid="ref13">23</xref>
        ]. When used, though, they try to intervene in the process of fake news spreading
by, e.g., injecting true news into social networks [
        <xref ref-type="bibr" rid="ref14">24</xref>
        ] or user intervention [
        <xref ref-type="bibr" rid="ref15 ref16">25, 26</xref>
        ]. In
this work we use a solely content-based approach, simply because the dataset provided for this
challenge has no additional context data. Additionally, gathering some of the context data was
explicitly forbidden, as described in Section 4, so we decided to focus on a text-based solution.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Description</title>
      <p>
        This year there have been a total of three CheckThat! tasks with two subtasks each [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We
participated in Task 3a: Multi-class fake news detection of news articles, which is a part of Task
3: Fake News Detection. The goal is to “given the text of a news article, determine whether the
main claim made in the article is true, partially true, false, or other”. The data used in this task
is only available in English. As this task is designed as a four-class classification problem, the
official evaluation metric introduced by the organizers is the F1-macro score. The F1-macro
score is simply the unweighted mean of the class-wise F1 scores:
      </p>
      <p>
        F1_c = 2 · precision_c · recall_c / (precision_c + recall_c) (1)
        F1-macro = (1/N) · Σ_{c=0}^{N−1} F1_c (2)
where N = 4 is the number of classes and precision_c and recall_c are computed per class c.
Up to five runs were permitted for each team. We submitted three competitive configurations
and one baseline run to compare against our own approaches. Further details on all tasks can
be found in the task overview [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
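      <p>As an illustration, the F1-macro metric can be computed as follows. This is a pure-Python sketch (the function name macro_f1 is ours); scikit-learn's f1_score with average="macro" yields the same value.</p>

```python
# Macro-averaged F1: the unweighted mean of the per-class F1 scores.
def macro_f1(y_true, y_pred, n_classes):
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        # Per-class F1 (equation 1), guarded against empty classes.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
        scores.append(f1)
    # Equation (2): plain average over classes, regardless of class size.
    return sum(scores) / n_classes
```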
    </sec>
    <sec id="sec-4">
      <title>4. Dataset</title>
      <p>
        As this work is part of this year’s CLEF CheckThat! Lab [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] Task 3a, we used a modified version
of the dataset by Shahi [
        <xref ref-type="bibr" rid="ref17">27</xref>
        ] provided by the organizers. This dataset likewise has four different
classes to predict, as defined in [
        <xref ref-type="bibr" rid="ref18">28</xref>
        ]. The distribution of each class in the provided training and
test data can be seen in Table 1. The dataset was given in .csv format with four columns:
• public_id — unique identifier of the news article
• title — title/heading of the news article
• text — text content of the news article
• our rating — class of the news article (either false, partially false, true or other)
      </p>
      <p>The training set contains 950 data points, including the 50 sample data points released before
both batches of data. The provided test set contains 364 data points without labels. We received
the ground-truth labels separately after the competition had finished (see Table 1). Each group
had to submit a .csv file with their predictions on Codalab
(https://competitions.codalab.org/competitions/31238). Additionally, under a data sharing
agreement, it was forbidden to identify individuals and the original entries on
the fact-checking websites. Therefore, we refrained from retrieving this information, although it
would have been useful for classification purposes, as demonstrated on a similar task [17].</p>
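      <p>For illustration, reading the provided .csv into the four fields listed above can be sketched with Python's standard csv module (the helper load_articles is ours, not part of the task tooling):</p>

```python
# Sketch: parse the task's .csv into (public_id, title, text, rating) tuples.
# csv.DictReader handles quoted article text spanning commas.
import csv
import io

def load_articles(csv_text):
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(r["public_id"], r["title"], r["text"], r["our rating"]) for r in rows]
```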
    </sec>
    <sec id="sec-5">
      <title>5. Methodology</title>
      <p>In the following section we provide an overview of how we prepared the data, the models
we used, and the training and evaluation process. Everything has been implemented in
Python and is available on GitHub (https://github.com/phHartl/CheckThatLab_2021).</p>
      <sec id="sec-5-1">
        <title>5.1. Data preparation</title>
        <p>
          We started our preprocessing by converting all labels to numeric values: 0
for true, 1 for false, 2 for partially false and 3 for other. As seen in Table 1, the four classes
are not equally distributed. We therefore applied random oversampling to all classes except
the majority class using the imbalanced-learn package [
          <xref ref-type="bibr" rid="ref19">29</xref>
          ] with the aim of training a better
classifier. Additionally, we generated abstractive and extractive summaries (we did this offline,
as the generation of abstractive summaries in particular was time-consuming). Before feeding
the texts into our models we also tokenized and normalized them.
        </p>
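          <p>The label mapping and random oversampling described above can be sketched as follows; this is a minimal pure-Python stand-in for imbalanced-learn's RandomOverSampler, and the helper names are ours:</p>

```python
# Sketch: map labels to ids, then duplicate random minority-class samples
# until every class matches the majority-class count.
import random
from collections import Counter

LABEL2ID = {"true": 0, "false": 1, "partially false": 2, "other": 3}

def oversample(texts, labels, seed=0):
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # majority-class count
    out_x, out_y = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y
```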
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Model architecture</title>
        <p>
          All models used are fine-tuned variants of Google’s BERT [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and use the bert-base-uncased
implementation provided by Wolf et al. [
          <xref ref-type="bibr" rid="ref20">30</xref>
          ] in conjunction with a linear layer on top to predict
the output. We chose BERT because it has already shown good performance in various
text classification tasks [
          <xref ref-type="bibr" rid="ref21">31</xref>
          ] as well as in fake news detection [
          <xref ref-type="bibr" rid="ref22">32</xref>
          ]. Due to limited computational
resources we could not use a more sophisticated BERT model like RoBERTa [
          <xref ref-type="bibr" rid="ref23">33</xref>
          ]. One of the
main drawbacks of BERT-based models is the limited sequence length each model is able to
process, which is at most 512 tokens (word pieces) for BERT. Unfortunately, fake news
articles are often longer than this [
          <xref ref-type="bibr" rid="ref24">34</xref>
          ]. In the provided dataset the mean token length is
806, with at least 55% of texts exceeding the 512-token limit. As these values are calculated with
nltk [
          <xref ref-type="bibr" rid="ref25">35</xref>
          ] and word pieces do not exactly match tokens, the real ratio is even higher (all other
token counts reported are calculated similarly). By default, BERT-based models simply truncate
the text to the desired input length (or apply padding if it is too short). This leads to the loss of
potentially important information in the input text. To circumvent this issue we propose three
different solutions, all aimed at compressing the original text:
• Modified hierarchical transformer representation
• Extractive summarization
• Abstractive summarization
Hierarchical transformer representations have been introduced by Pappagari et al. [
          <xref ref-type="bibr" rid="ref26">36</xref>
          ]. In their
work they suggest splitting the input text into smaller text segments with overlapping parts
(stride) to represent the structure of the text. In our model we split the text into parts of 500
tokens with a stride length of 50. After obtaining the BERT embeddings for each text segment,
we calculated the dimension-wise mean representation and fed this into BERT. The output of
BERT is then used to classify the input text. Mean embeddings have been successfully used
before by Mulyar et al. [
          <xref ref-type="bibr" rid="ref27">37</xref>
          ].
        </p>
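        <p>The overlapping segmentation can be sketched as follows (an illustrative helper of ours; in our runs size=500 and stride=50):</p>

```python
# Sketch: cut a token sequence into windows of `size` tokens where
# consecutive windows overlap by `stride` tokens.
def split_with_stride(tokens, size=500, stride=50):
    step = size - stride
    segments = []
    for start in range(0, max(1, len(tokens) - stride), step):
        segments.append(tokens[start:start + size])
    return segments
```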
        <sec id="sec-5-2-1">
          <p>
            Another possible solution is to use automatic summarization to get a more condensed text
representation. Deep learning models such as BART [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], XLNet [
            <xref ref-type="bibr" rid="ref28">38</xref>
            ] or ALBERT [
            <xref ref-type="bibr" rid="ref29">39</xref>
            ] perform
exceptionally well on benchmarks like SQuAD [
            <xref ref-type="bibr" rid="ref30">40</xref>
            ] or ELI5 [
            <xref ref-type="bibr" rid="ref31">41</xref>
            ] – sometimes even
surpassing humans. These models can reduce the text length by a significant amount
if desired, which is ideal for the problem with BERT described above. In our work we use the extractive
summarization technology implemented by [
            <xref ref-type="bibr" rid="ref32">42</xref>
            ]. Note that while this method is also based on
BERT, it has no maximum sequence length. To ensure better summarization quality while
keeping the running time reasonable, we activated co-reference handling (for better
contextualization) and used distilBERT [
            <xref ref-type="bibr" rid="ref33">43</xref>
            ] as the underlying model. In contrast to [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], we are interested in
long sequences and not only the first two sentences for classification. After manually inspecting
different configurations, we settled on a summarization ratio of 0.40.
          </p>
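            <p>To make ratio-based extractive selection concrete, here is a toy sketch that ranks sentences by a naive word-frequency score and keeps roughly 40% of them in document order. It is only a stand-in for the BERT-based summarizer of [42], which scores sentences by their embeddings instead:</p>

```python
# Toy extractive selector: score each sentence by the average corpus
# frequency of its words and keep the top `ratio` fraction, in order.
from collections import Counter

def extract(sentences, ratio=0.40):
    words = Counter(w.lower() for s in sentences for w in s.split())
    def score(s):
        toks = s.split()
        return sum(words[w.lower()] for w in toks) / max(1, len(toks))
    k = max(1, round(len(sentences) * ratio))
    top = sorted(sentences, key=score, reverse=True)[:k]
    return [s for s in sentences if s in top]  # preserve document order
```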
          <p>
            Apart from an extractive approach we also implemented an abstractive technique based on
BART. This model is specifically well suited for text generation, outperforming similar ones on
summarization tasks like SQuAD 1.1 [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. The Huggingface transformers library [
            <xref ref-type="bibr" rid="ref20">30</xref>
            ] provides
an easy way to use BART-models for sequence generation. Because of the repetitive nature of
greedy and beam search [
            <xref ref-type="bibr" rid="ref34 ref35">44, 45</xref>
            ] we used Top-K [
            <xref ref-type="bibr" rid="ref36">46</xref>
            ] and Top-p sampling [
            <xref ref-type="bibr" rid="ref37">47</xref>
            ] for our summaries.
The exact model we used is sshleifer/distilbart-cnn-12-6
(https://huggingface.co/sshleifer/distilbart-cnn-12-6), a smaller BART model trained
on the news summarization dataset by Hermann et al. [
            <xref ref-type="bibr" rid="ref38">48</xref>
            ]. In our final configuration we used
the 100 (Top-K) most likely words and a probability (Top-p) of 95%. Like BERT, BART has a
sequence limit, in this case 1024 tokens. Therefore, if the input text was longer than 1000 tokens we
used our first approach to ensure all parts of the text are taken into consideration when being
summarized. We also aimed for a summarization ratio of roughly 40% for better comparability
to the extractive approach. However, as both approaches are not deterministic, this cannot
always be guaranteed (also, as noted, both approaches take quite a while to execute, so we
saved the results to files once generated). Additionally, due to the late release of the dataset we
could not try out many configurations, but instead had to rely on suggested configurations.
          </p>
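            <p>Combined Top-K and Top-p (nucleus) filtering can be sketched as follows (an illustrative helper of ours; in practice the Huggingface generation code applies this filtering internally at each decoding step):</p>

```python
# Sketch: keep the K most likely tokens, then the smallest prefix of those
# whose cumulative probability reaches p, and renormalize before sampling.
def top_k_top_p(probs, k=100, p=0.95):
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)[:k]
    kept, total = [], 0.0
    for idx, pr in ranked:
        kept.append((idx, pr))
        total += pr
        if total >= p:  # nucleus reached
            break
    return {idx: pr / total for idx, pr in kept}
```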
          <p>Finally, the submitted models all use the hierarchical text representation (even when using
text summaries). There is one model for each type of input text, i.e. no summary, extractive
summary or abstractive summary. We also submitted a run without oversampling for better
comparability.</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Experimental setup</title>
        <p>
          For training, we represented each input as [CLS] + title + [SEP] + text, where text is either the
original text or one of the two summaries produced and [CLS] is a classification token and
[SEP] is a separator token between the two segments. For training, we use an 80/20
training/validation split and optimize hyperparameters based on the loss on the validation
set. We used the same initial random state and split for all configurations for better
comparability. We used a batch size of 8, an initial learning rate of 5e-5, a weight decay of 0.01
with 500 warm-up steps and three training epochs with an AdamW [
          <xref ref-type="bibr" rid="ref39">49</xref>
          ] optimizer. Everything
was trained on a single RTX 2080 Ti with 11 GB VRAM using the Huggingface library.
        </p>
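        <p>The input representation can be sketched as follows (illustrative only: a real run uses the HuggingFace tokenizer and word pieces, while simple whitespace splitting stands in here):</p>

```python
# Sketch: [CLS] + title + [SEP] + text, truncated to BERT's 512-token limit.
MAX_TOKENS = 512

def build_input(title, text):
    tokens = ["[CLS]"] + title.split() + ["[SEP]"] + text.split()
    return tokens[:MAX_TOKENS]
```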
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>We report three sets of results – (a) official results for all four of our runs, and for comparison
we also present results obtained on (b) the development set as well as (c) the test set without
hyperparameter tuning (not submitted to the challenge).</p>
      <p>First of all, in Table 2 we present the official results as returned to us by the shared task
organizers. We marked the best-performing model for each metric in bold. Recall that
the hierarchical transformer representation is applied to the source text in all of our runs, i.e. the
term "original texts" refers to text that has been created this way but without subsequently
applying abstractive or extractive summarization, respectively.</p>
      <p>To contextualise the official results better (and also because at this point we
do not have official baseline results to compare against), we also report the results on the
validation set (see Table 3). The configuration is the same as described in Section 5.3 but without
hyperparameter tuning (using an 80/20 split of the training data).</p>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion</title>
      <p>First of all, we observe that the non-fine-tuned model and the model trained
without oversampling the minority classes perform worst in all setups. This is in line with
expectations.</p>
      <sec id="sec-7-1">
        <p>Note: because of the extremely close values in Table 2 we report additional fractional digits.
The value for extractive summarization was calculated with the official evaluation script afterwards, as
there was a problem when uploading the file.</p>
        <p>It gets more complicated, however, when comparing the other models. The official runs
suggest that BERT w/ abstractive summaries wins overall by a narrow margin, but is on par with BERT
w/ original texts (i.e. the original articles hierarchically transformed but without applying
summarization). Given that this makes it into 8th place of 25 submissions, and the fact that
abstractive summarization is becoming more and more competitive, we see this as a clear signal
that our general conceptual idea is a promising one.</p>
        <p>Looking at the official results for BERT w/ extractive summaries and BERT w/o
oversampling, both models are still reasonably well placed in the rankings. They would have
ranked 16th and 18th respectively, showing how well a vanilla BERT is pre-trained already.</p>
        <p>
          Looking beyond the official results, though, we observe wide variation in scores. While
BERT w/ extractive summaries performs better than the other approaches when hyperparameter
tuning is not used (see Table 4), it scores considerably worse when hyperparameter tuning is in place (Table
2). In fact, not applying hyperparameter tuning would rank the system in 3rd position of the
ranked list of 25 runs, with an F1-macro of 0.508. This seems to be an indication of overfitting.
The validation set in general seems not well suited to learn from, as
all tuned models perform better when applied to the test dataset directly (this is also the
case when the training set is exactly the same). All this raises some concerns about the size,
robustness and generalisability of the test collection. This is by no means a novel finding, and
some researchers go as far as to call the current (commonly applied) NLP evaluation approach
broken [
          <xref ref-type="bibr" rid="ref40">50</xref>
          ]. We conclude that we will have to test our methodology on a wide range of
additional collections to gain a better understanding of its strengths and weaknesses.
        </p>
        <p>One last point to note: there seems to be only little difference in performance between
BERT w/ original texts and BERT w/ abstractive summaries. Interestingly, the respective models
achieve very similar performance independently of the dataset and experimental setup used.</p>
        <sec id="sec-7-1-1">
          <title>7.1. Limitations</title>
          <p>
            Due to the nature of such challenges there was not much time to try different experimental
setups. Abstractive summary generation in particular has a lot of different parameters to
work with. Unfortunately, one iteration for those alone takes about half a day of computing time
on our system. While we always tried to use the recommended configurations where possible,
we could only use BERT with a maximum batch size of 8. It would have been interesting to see
whether batch sizes of 16 or greater make a significant difference in performance; previous work
on parameter tuning of BERT suggests they do [
            <xref ref-type="bibr" rid="ref41">51</xref>
            ]. While BERT itself is a very sophisticated system,
an approach using an even stronger model like RoBERTa [
            <xref ref-type="bibr" rid="ref23">33</xref>
            ] or XLNet [
            <xref ref-type="bibr" rid="ref28">38</xref>
            ] could outperform it,
as has already been shown in their respective papers on other NLP tasks. The substantial
difference in performance between the official results (Table 2) and our reruns on the test set
(Table 4) indicates that the chosen experimental setup might either not have been ideal for
this task, or the datasets were simply too small. While hyperparameter tuning is often useful,
in this case we achieve better results without it. However, this could also be due to the
validation/dev set we used: as seen in Table 3, all models perform worse there than on
the actual test set, which indicates an unlucky seed for the validation split we optimized on. Also, the
summarization ratio of 0.40 was picked quite arbitrarily, which might or might not restrict the
full potential of the summaries.
          </p>
        </sec>
        <sec id="sec-7-1-2">
          <title>7.2. Future Work</title>
          <p>In the future it would certainly be interesting to explore more configurations and applications
of automatic summarization. We believe summarization has the potential to enable more
transferable knowledge. This could be useful for a variety of classification tasks, as many models
only work well in a certain domain. It would therefore be interesting to train models
on automatic summaries and compare their performance across different domains,
with summarization acting as a kind of “normalization” technique. We also expect summarization of texts to limit
overfitting. With the results of Table 4 in mind, we hypothesize that there is still a lot
of room for improvement. We plan to apply our approaches to more datasets in
the future and to optimize the tuning further.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusions</title>
      <p>We presented an approach to fake news detection that is based on the powerful paradigm of
transformer-based embeddings and utilises text summarization as the main text transformation
step before classifying a document. The results suggest that this is indeed a worthwhile direction
of work, and in future work we plan to explore it further. We note that using oversampling
has a strong positive effect on system performance. We also observed that the
performance obtained on different datasets and under different hyperparameter-tuning regimes varied
substantially. One way forward is to apply our framework to larger datasets to see how robust
extractive and abstractive summarisation are for the task at hand.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements</title>
      <p>This work was supported by the project COURAGE: A Social Media Companion Safeguarding
and Educating Students, funded by the Volkswagen Foundation, grant number 95564.</p>
      <p>on Information &amp; Knowledge Management, Association for Computing Machinery, New
York, NY, USA, 2020, pp. 3205–3212. URL: https://doi.org/10.1145/3340531.3412880.
[13] J. Thorne, A. Vlachos, C. Christodoulopoulos, A. Mittal, FEVER: a Large-scale Dataset for
Fact Extraction and VERification, in: Proceedings of the 2018 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language
Technologies, Volume 1 (Long Papers), Association for Computational Linguistics, New
Orleans, Louisiana, 2018, pp. 809–819. URL: https://www.aclweb.org/anthology/N18-1074.
doi:10.18653/v1/N18-1074.
[14] M. Gruppi, B. D. Horne, S. Adalı, NELA-GT-2019: A Large Multi-Labelled News Dataset
for The Study of Misinformation in News Articles, arXiv:2003.08444 [cs] (2020). URL:
http://arxiv.org/abs/2003.08444, arXiv: 2003.08444.
[15] X. Zhou, J. Wu, R. Zafarani, SAFE: Similarity-Aware Multi-modal Fake News Detection,
in: H. W. Lauw, R. C.-W. Wong, A. Ntoulas, E.-P. Lim, S.-K. Ng, S. J. Pan (Eds.), Advances
in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, Springer
International Publishing, Cham, 2020, pp. 354–367. doi:10.1007/978-3-030-47436-2_
27.
[16] U. Undeutsch, Beurteilung der glaubhaftigkeit von aussagen, Handbuch der psychologie
11 (1967) 26–181.
[17] C. Yuan, Q. Ma, W. Zhou, J. Han, S. Hu, Early Detection of Fake News by
Utilizing the Credibility of News, Publishers, and Users based on Weakly Supervised
Learning, in: Proceedings of the 28th International Conference on Computational
Linguistics, International Committee on Computational Linguistics, Barcelona, Spain
(Online), 2020, pp. 5444–5454. URL: https://www.aclweb.org/anthology/2020.coling-main.475.
doi:10.18653/v1/2020.coling-main.475.
[18] L. Cui, S. Wang, D. Lee, SAME: sentiment-aware multi-modal embedding for detecting fake
news, in: Proceedings of the 2019 IEEE/ACM International Conference on Advances in
Social Networks Analysis and Mining, ASONAM ’19, Association for Computing
Machinery, New York, NY, USA, 2019, pp. 41–48. URL: https://doi.org/10.1145/3341161.3342894.
doi:10.1145/3341161.3342894.
[19] K. Shu, X. Zhou, S. Wang, R. Zafarani, H. Liu, The role of user profiles for fake news
detection, in: Proceedings of the 2019 IEEE/ACM International Conference on Advances in
Social Networks Analysis and Mining, ASONAM ’19, Association for Computing
Machinery, New York, NY, USA, 2019, pp. 436–439. URL: https://doi.org/10.1145/3341161.3342927.
doi:10.1145/3341161.3342927.
[20] K. Shu, D. Mahudeswaran, S. Wang, H. Liu, Hierarchical Propagation Networks for Fake
News Detection: Investigation and Exploitation, Proceedings of the International AAAI
Conference on Web and Social Media 14 (2020) 626–637. URL: https://ojs.aaai.org/index.
php/ICWSM/article/view/7329.
[21] K. Shu, L. Cui, S. Wang, D. Lee, H. Liu, dEFEND: Explainable Fake News Detection, in:
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery
&amp; Data Mining, ACM, Anchorage AK USA, 2019, pp. 395–405. URL: https://dl.acm.org/doi/
10.1145/3292500.3330935. doi:10.1145/3292500.3330935.
[22] W. Y. Wang, “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News
Detection, in: Proceedings of the 55th Annual Meeting of the Association for Computational</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Burkhardt</surname>
          </string-name>
          , History of Fake News,
          <source>Library Technology Reports</source>
          <volume>53</volume>
          (
          <year>2017</year>
          )
          <fpage>5</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ribeiro-Neto</surname>
          </string-name>
          (Eds.),
          <source>Modern Information Retrieval</source>
          , 2nd ed.,
          <source>Addison-Wesley</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Allcott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gentzkow</surname>
          </string-name>
          ,
          <article-title>Social media and fake news in the 2016 election</article-title>
          ,
          <source>Journal of Economic Perspectives</source>
          <volume>31</volume>
          (
          <year>2017</year>
          )
          <fpage>211</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Orman</surname>
          </string-name>
          ,
          <article-title>Fighting Information Pollution with Decision Support Systems</article-title>
          ,
          <source>Journal of Management Information Systems</source>
          <volume>1</volume>
          (
          <year>1984</year>
          )
          <fpage>64</fpage>
          -
          <lpage>71</lpage>
          . URL: https://doi.org/10.1080/07421222.1984.11517704. doi:10.1080/07421222.1984.11517704, publisher: Routledge.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V. L.</given-names>
            <surname>Rubin</surname>
          </string-name>
          ,
          <article-title>On deception and deception detection: Content analysis of computer-mediated stated beliefs</article-title>
          ,
          <source>Proceedings of the American Society for Information Science and Technology</source>
          <volume>47</volume>
          (
          <year>2010</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . Publisher: Wiley Online Library.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D. S.</given-names>
            <surname>Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          , T. Mandl,
          <article-title>The CLEF-2021 checkthat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Moens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Perego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          (Eds.),
          <source>Advances in Information Retrieval - 43rd European Conference on IR Research</source>
          , ECIR 2021, Virtual Event, March 28 – April 1, 2021, Proceedings, Part II, volume
          <volume>12657</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>639</fpage>
          -
          <lpage>649</lpage>
          . URL: https://doi.org/10.1007/978-3-030-72240-1_75. doi:10.1007/978-3-030-72240-1_75.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          , in:
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          , Association for Computational Linguistics, Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://www.aclweb.org/anthology/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          , L. Zettlemoyer,
          <article-title>BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension</article-title>
          , in:
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>7871</fpage>
          -
          <lpage>7880</lpage>
          . URL: https://www.aclweb.org/anthology/2020.acl-main.703. doi:10.18653/v1/2020.acl-main.703.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Connecting the Dots Between Fact Verification and Fake News Detection</article-title>
          ,
          in:
          <source>Proceedings of the 28th International Conference on Computational Linguistics</source>
          , International Committee on Computational Linguistics, Barcelona, Spain (Online),
          <year>2020</year>
          , pp.
          <fpage>1820</fpage>
          -
          <lpage>1825</lpage>
          . URL: https://www.aclweb.org/anthology/2020.coling-main.165. doi:10.18653/v1/2020.coling-main.165.
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mahudeswaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          , H. Liu,
          <article-title>FakeNewsNet: A Data Repository with News Content, Social Context, and Spatiotemporal Information for Studying Fake News on Social Media</article-title>
          ,
          <source>Big Data</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>171</fpage>
          -
          <lpage>188</lpage>
          . URL: https://www.liebertpub.com/doi/abs/10.1089/big.2020.0062. doi:10.1089/big.2020.0062, publisher: Mary Ann Liebert, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shu</surname>
          </string-name>
          , H. Liu,
          <article-title>MM-COVID: A Multilingual and Multidimensional Data Repository for Combating COVID-19 Fake News</article-title>
          , arXiv:2011.04088 [cs] (
          <year>2020</year>
          ). URL: http://arxiv.org/abs/2011.04088, arXiv:2011.04088 version: 1.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mulay</surname>
          </string-name>
          , E. Ferrara, R. Zafarani,
          <article-title>ReCOVery: A Multimodal Repository for COVID-19 News Credibility Research</article-title>
          , in:
          <source>Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ruchansky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Y. Liu,
          <article-title>Combating Fake News: A Survey on Identification and Mitigation Techniques</article-title>
          , arXiv:1901.06437 [cs, stat] (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1901.06437, arXiv:1901.06437.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Farajtabar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Trivedi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Khalil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zha</surname>
          </string-name>
          ,
          <article-title>Fake News Mitigation via Point Process Based Intervention</article-title>
          , arXiv:1703.07823 [cs] (
          <year>2017</year>
          ). URL: http://arxiv.org/abs/1703.07823, arXiv:1703.07823.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Papanastasiou</surname>
          </string-name>
          ,
          <article-title>Fake News Propagation and Detection: A Sequential Model</article-title>
          ,
          <source>Management Science</source>
          <volume>66</volume>
          (
          <year>2020</year>
          )
          <fpage>1826</fpage>
          -
          <lpage>1846</lpage>
          . URL: https://pubsonline.informs.org/doi/10.1287/mnsc.2019.3295. doi:10.1287/mnsc.2019.3295, publisher: INFORMS.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tabibian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gomez-Rodriguez</surname>
          </string-name>
          ,
          <article-title>Leveraging the Crowd to Detect and Reduce the Spread of Fake News and Misinformation</article-title>
          , in:
          <source>Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining</source>
          , WSDM '18, Association for Computing Machinery, New York, NY, USA,
          <year>2018</year>
          , pp.
          <fpage>324</fpage>
          -
          <lpage>332</lpage>
          . URL: https://doi.org/10.1145/3159652.3159734. doi:10.1145/3159652.3159734.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <article-title>AMUSED: An annotation framework of multi-modal social media data</article-title>
          , arXiv preprint arXiv:2010.00502 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dirkson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Majchrzak</surname>
          </string-name>
          ,
          <article-title>An exploratory study of covid-19 misinformation on twitter</article-title>
          ,
          <source>Online Social Networks and Media</source>
          <volume>22</volume>
          (
          <year>2021</year>
          )
          <fpage>100104</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lemaître</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Aridas</surname>
          </string-name>
          ,
          <article-title>Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          <volume>18</volume>
          (
          <year>2017</year>
          )
          <fpage>559</fpage>
          -
          <lpage>563</lpage>
          . Publisher: JMLR.org.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <article-title>HuggingFace's Transformers: State-of-the-art natural language processing</article-title>
          , arXiv preprint arXiv:1910.03771 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ostendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bourgonje</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Berger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Moreno-Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rehm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <article-title>Enriching BERT with Knowledge Graph Embeddings for Document Classification</article-title>
          , arXiv:1909.08402 [cs] (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1909.08402, arXiv:1909.08402.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>BERT-Based Mental Model, a Better Fake News Detector</article-title>
          ,
          <source>in: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , pp.
          <fpage>396</fpage>
          -
          <lpage>400</lpage>
          . URL: https://doi.org/10.1145/3404555.3404607.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          , arXiv preprint arXiv:1907.11692 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>W.</given-names>
            <surname>Souma</surname>
          </string-name>
          , I. Vodenska,
          <string-name>
            <given-names>H.</given-names>
            <surname>Aoyama</surname>
          </string-name>
          ,
          <article-title>Enhanced news sentiment analysis using deep learning methods</article-title>
          ,
          <source>Journal of Computational Social Science</source>
          <volume>2</volume>
          (
          <year>2019</year>
          )
          <fpage>33</fpage>
          -
          <lpage>46</lpage>
          . Publisher: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>E.</given-names>
            <surname>Loper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <article-title>Nltk: The natural language toolkit</article-title>
          ,
          <source>arXiv preprint cs/0205028</source>
          (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pappagari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Żelasko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Villalba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Carmiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dehak</surname>
          </string-name>
          ,
          <article-title>Hierarchical Transformers for Long Document Classification</article-title>
          , arXiv:1910.10781 [cs, stat] (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1910.10781, arXiv:1910.10781.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mulyar</surname>
          </string-name>
          , E. Schumacher,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rouhizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dredze</surname>
          </string-name>
          ,
          <article-title>Phenotyping of Clinical Notes with Improved Document Classification Models Using Contextualized Neural Language Models</article-title>
          , arXiv:1910.13664 [cs] (
          <year>2020</year>
          ). URL: http://arxiv.org/abs/1910.13664, arXiv:1910.13664.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          , J. Carbonell, R. Salakhutdinov,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>XLNet: Generalized autoregressive pretraining for language understanding</article-title>
          , arXiv preprint arXiv:1906.08237 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Soricut</surname>
          </string-name>
          ,
          <article-title>ALBERT: A lite BERT for self-supervised learning of language representations</article-title>
          ,
          <source>arXiv preprint arXiv:1909.11942</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rajpurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lopyrev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>SQuAD: 100,000+ questions for machine comprehension of text</article-title>
          ,
          <source>arXiv preprint arXiv:1606.05250</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Grangier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Auli</surname>
          </string-name>
          ,
          <article-title>ELI5: Long form question answering</article-title>
          ,
          <source>arXiv preprint arXiv:1907.09190</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>D.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Leveraging BERT for extractive text summarization on lectures</article-title>
          ,
          <source>arXiv preprint arXiv:1906.04165</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <article-title>DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</article-title>
          ,
          <source>arXiv preprint arXiv:1910.01108</source>
          (
          <year>2020</year>
          ). URL: http://arxiv.org/abs/1910.01108.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Vijayakumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cogswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Selvaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Crandall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <article-title>Diverse beam search: Decoding diverse solutions from neural sequence models</article-title>
          ,
          <source>arXiv preprint arXiv:1610.02424</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>L.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gouws</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Britz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goldie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Strope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kurzweil</surname>
          </string-name>
          ,
          <article-title>Generating high-quality and informative conversation responses with sequence-to-sequence models</article-title>
          ,
          <source>arXiv preprint arXiv:1701.03185</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          ,
          <article-title>Hierarchical neural story generation</article-title>
          ,
          <source>arXiv preprint arXiv:1805.04833</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Buys</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Forbes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>The curious case of neural text degeneration</article-title>
          ,
          <source>arXiv preprint arXiv:1904.09751</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Hermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kočiský</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grefenstette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Espeholt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Suleyman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Blunsom</surname>
          </string-name>
          ,
          <article-title>Teaching machines to read and comprehend</article-title>
          ,
          <source>arXiv preprint arXiv:1506.03340</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Decoupled weight decay regularization</article-title>
          ,
          <source>arXiv preprint arXiv:1711.05101</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Dahl</surname>
          </string-name>
          ,
          <article-title>What will it take to fix benchmarking in natural language understanding?</article-title>
          , in:
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rumshisky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hakkani-Tür</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bethard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cotterell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021</source>
          , Association for Computational Linguistics,
          <year>2021</year>
          , pp.
          <fpage>4843</fpage>
          -
          <lpage>4855</lpage>
          . URL: https://www.aclweb.org/anthology/2021.naacl-main.385/.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>M.</given-names>
            <surname>Guderlei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aßenmacher</surname>
          </string-name>
          ,
          <article-title>Evaluating Unsupervised Representation Learning for Detecting Stances of Fake News</article-title>
          , in:
          <source>Proceedings of the 28th International Conference on Computational Linguistics</source>
          , International Committee on Computational Linguistics, Barcelona, Spain (Online),
          <year>2020</year>
          , pp.
          <fpage>6339</fpage>
          -
          <lpage>6349</lpage>
          . URL: https://www.aclweb.org/anthology/2020.coling-main.558. doi:10.18653/v1/2020.coling-main.558.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>