<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Role of Images for Analyzing Claims in Social Media</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>L3S Research Center, Leibniz University Hannover</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leibniz Information Centre for Science and Technology</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Fake news is a severe problem in social media. In this paper, we present an empirical study on visual, textual, and multimodal models for the tasks of claim detection, claim check-worthiness detection, and conspiracy detection, all of which are related to fake news detection. Recent work suggests that images are more influential than text and often appear alongside fake text. To this end, several multimodal models have been proposed in recent years that use images along with text to detect fake news on social media sites like Twitter. However, the role of images is not well understood for claim detection, specifically when using transformer-based textual and multimodal models. We investigate state-of-the-art models for images, text (transformer-based), and multimodal information on four different datasets across two languages to understand the role of images in the tasks of claim and conspiracy detection.</p>
      </abstract>
      <kwd-group>
        <kwd>Fake News Detection</kwd>
        <kwd>Claim Detection</kwd>
        <kwd>Conspiracy Detection</kwd>
        <kwd>Multimodal Analysis</kwd>
        <kwd>Multilingual NLP</kwd>
        <kwd>Computer Vision</kwd>
        <kwd>Transformers</kwd>
        <kwd>COVID-19</kwd>
        <kwd>5G</kwd>
        <kwd>Twitter</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Social media platforms have become an integral part of our everyday lives: we use them to connect with people, consume news and entertainment, and buy or sell products. In the last decade, social media has seen exponential growth, with more than a couple of billion users and an increasing presence of prominent people such as politicians and celebrities (also called influencers), organizations, and political parties. On the one hand, this allows influential people or organizations to reach millions of users directly; on the other hand, it also allows fake and unverified information to arise and spread faster [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ] due to the nature of social media. To deal with misinformation and false claims on online platforms, several independent fact-checking projects like Snopes, Alt News, and Our.News have been launched that manually fact-check news and publish their outcomes for public use. Although more such initiatives are emerging worldwide, they cannot keep up with the rate of news and information production on online platforms. Therefore, fake news detection has gathered much interest in computer science with the aim of developing automated methods that scale to the continuous, fast-streaming social media data. (Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).)
      </p>
      <p>
        As social media is inherently multimodal in nature, fact-checking initiatives and computational methods consider not only text but also image content [
        <xref ref-type="bibr" rid="ref14 ref21 ref39 ref43">14,21,39,43</xref>
        ], as it can be easily fabricated and manipulated due to the availability of free image and video editing tools. In this paper, we investigate the role of images in the context of claim and conspiracy detection. Claim detection is one of the first vital steps in identifying fake news: the purpose is to flag a statement if it contains check-worthy facts and information, while the claim itself may be true or false. In conspiracy detection, by contrast, a statement that includes a conspiracy theory is fake news and consists of manipulated facts. Although fake news on social media has recently been explored from a multimodal perspective, images have hardly been considered for claim detection, except in recent work by Zlatkova et al. [
        <xref ref-type="bibr" rid="ref48">48</xref>
        ]. There, the meta-information of images is treated as features, and reverse image search is performed to compare against the claim text. However, the image's semantic information is not considered, and the authors highlight that images are more influential than text and appear alongside fake text or unverified news.
      </p>
      <p>
        Since we are interested in the impact of using images in a multimodal framework, and to keep our models simple, we focus on extracting only semantic or contextual features from text and do not consider its structure or syntactic information. To this end, we mainly use the deep transformer model BERT (Bidirectional Encoder Representations from Transformers) to extract contextual embeddings and use them along with image embeddings. Taking inspiration from recent work by Cao et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we extract image sentiment features, which are widely applied for image credibility and fake news detection, in addition to object and scene information for the semantic overlap with textual information.
      </p>
      <p>
        To carry out this study, we experiment with four Twitter datasets on binary classification tasks. Two of them are from the recent CLEF-CheckThat! 2020 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], one in English [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] and the other in Arabic [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The third one is an English dataset from MediaEval 2020 [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] on conspiracy detection, and the last one is a recent claim detection dataset (English) from Gupta et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] on COVID-19 tweets. Four examples for claim and conspiracy detection are shown in Figure 1. Our code (https://github.com/cleopatra-itn/image_text_claim_detection) and dataset (https://zenodo.org/record/4592249) are publicly available. To train our unimodal and multimodal models, we use Support Vector Machines (SVM) [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ] and Principal Component Analysis (PCA) [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ] for dimensionality reduction due to the small datasets and the large size of the combined features. We also fine-tune BERT models on the text input to see the extent of the unimodal models' performance on limited-sized datasets and use different pre-trained BERT models to examine the effect of the domain gap. Furthermore, we investigate the recently proposed transformer-based ViLBERT [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] (Vision-and-Language BERT) model that learns semantic features via co-attention on image and textual inputs. Just like with the BERT models, we perform fixed embedding and fine-tuning experiments using ViLBERT to see if a large transformer-based multimodal model can learn meaningful representations and perform better on small-sized datasets.
      </p>
      <p>The remainder of the paper is organized as follows. Section 2 briefly discusses related work on fake news detection and the sub-problems of claim and conspiracy detection. Section 3 presents details of the image, text, and multimodal features as well as the fine-tuned and applied models. Section 4 describes the experimental setup and results and summarizes our findings. Section 5 concludes the paper with future research directions.</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>There is a wide body of work on fake news detection that goes well beyond this paper's scope. Therefore, we restrict this section to multimodal fake news, claim detection, and conspiracy detection.</p>
      <sec id="sec-3-1">
        <title>Unimodal Approaches</title>
        <p>
          The earliest claim detection works go back a decade. In their pioneering work, Rosenthal et al. [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] extracted claims from Wikipedia discussion forums and classified them via logistic regression using sentiment, syntactic, and lexical features such as POS (part-of-speech) tags and n-grams, as well as other statistical features over text. Since then, researchers have proposed context-dependent [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], context-independent [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], cross-domain [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], and in-domain approaches for claim detection. Recently, transformer-based models [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] have replaced structure-based claim detection approaches due to their success in several downstream natural language processing (NLP) tasks.
        </p>
        <p>
          For claim detection on social media in particular, CLEF-CheckThat! 2020 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] recently hosted a challenge to detect check-worthy claims in COVID-19 related English tweets and on several other topics in Arabic. The challenge attracted several models, with the top submissions [
          <xref ref-type="bibr" rid="ref32 ref44 ref7">7,32,44</xref>
          ] all using some version of transformer-based models like BERT [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and RoBERTa [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] along with tweet meta-data and lexical features. Outside of the CLEF challenges, some works [
          <xref ref-type="bibr" rid="ref12 ref27">12,27</xref>
          ] have also conducted detailed studies on detecting check-worthy tweets in U.S. politics and proposed real-time systems to monitor and filter them. Taking inspiration from [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], Gupta et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] address the limitations of current methods in cross-domain claim detection by proposing a generalized claim detection model called LESA (Linguistic Encapsulation and Semantic Amalgamation). Their model combines contextual transformer features with learnable POS and dependency relation embeddings via transformers to achieve impressive results on several datasets. For conspiracy detection, MediaEval 2020 [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] saw interesting methods to automatically detect 5G and Coronavirus conspiracies in tweets. The top submissions used BERT [
          <xref ref-type="bibr" rid="ref28 ref8">8,28</xref>
          ] pre-trained on COVID Twitter data, tweet meta-data, graph network data, and RoBERTa models [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] along with Graph Convolutional Networks (GCNs).
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Multimodal Approaches</title>
        <p>
          For multimodal fake news in general, several benchmark datasets have been proposed in the last few years, generating interest in developing multimodal visual and textual models. In one of the relatively early works, Jin et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] explored rumor detection on Twitter using text, social context (emoticons, URLs, hashtags), and the image by learning a joint representation with attention from LSTM outputs over image features. The authors observed the benefit of using the image and social context in addition to text, improving the detection of fake news on Twitter and Weibo datasets. Later, Wang et al. [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ] proposed an improved model that learns a multi-task model to detect fake news as one task and discriminate events as another task in order to learn event-invariant representations. Since then, improvements have been proposed using multimodal variational autoencoders [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] and transfer learning [
          <xref ref-type="bibr" rid="ref15 ref39">15,39</xref>
          ] with transformer-based text and deep visual CNN models. Recently, Nakamura et al. [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] proposed a fake news dataset, r/Fakeddit, mined from Reddit with over 1 million samples, which includes text, images, meta-data, and comments. The data is labeled through distant supervision into 2-way, 3-way, and 6-way classification categories. In addition to our different tasks, another difference from the approaches mentioned above is that the size of those datasets is moderate (several thousand) to large (millions) in comparison to the few hundred to a couple of thousand samples in our four datasets for claim and conspiracy detection.
        </p>
        <p>In this section, we provide details of the different image (Section 3.1), textual (Section 3.2), and multimodal (Section 3.3) models and their feature encoding processes, and describe how the classification models (Section 3.4) are built. An overview of the classification models is presented in Figure 2.</p>
        <p>
          The purpose of the image models is to encode the presence of different objects, the scene, place, or background, and affective image content. When learning a multimodal model or a classifier, specific overlapping patterns between image and text can act as discriminatory features for claim detection.
        </p>
        <p>
          Object Features (I_o): In order to encode objects and the overall image content, we extract features from a ResNet [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] model pre-trained on the ImageNet [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] dataset. Pre-trained models have been shown to boost performance over low-level features in several computer vision tasks. We use the widely recognized ResNet-152 and extract features from its last convolutional layer instead of the object categories (final layer). The last convolutional layer outputs 2048 feature maps, each of size 7×7, which are then pooled with a global average to get a 2048-dimensional vector.
        </p>
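        <p>The global average pooling step described above can be sketched as follows (a minimal NumPy sketch; the feature maps here are random stand-ins for the actual ResNet-152 activations):</p>

```python
import numpy as np

# Stand-in for the output of ResNet-152's last convolutional layer:
# 2048 feature maps, each of spatial size 7x7 (random values for illustration).
feature_maps = np.random.rand(2048, 7, 7)

# Global average pooling: average each 7x7 map to a single scalar,
# yielding one 2048-dimensional image embedding.
embedding = feature_maps.mean(axis=(1, 2))
```

        <p>The same pooling is applied to the Places365 and hybrid ResNet variants described below; only the backbone changes.</p>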
        <p>
          Place and Scene Features (I_p): In order to encode the scene information in an image, we extract features from a ResNet [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] model pre-trained on the Places365 [
          <xref ref-type="bibr" rid="ref47">47</xref>
          ] dataset. In this case, we use ResNet-101 and follow the same encoding process as described for the object features.
        </p>
        <p>Hybrid Object and Scene Features (I_h): We also experiment with a hybrid model trained on both the ImageNet and Places365 datasets that encodes object and scene information in a single model. To extract these features, we again use a ResNet-101 model and follow the same encoding process.</p>
        <p>
          Image Sentiment (I_s): To encode the image sentiment, we use a pre-trained model [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ] that is trained on three million images using weak supervision from the sentiment labels of the tweet text. Although these image labels are noisy, the model has shown superior performance on unseen Twitter test datasets. We use their best CNN model, which is based on VGG-19 [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ]. The image sentiment embeddings (I_se) are extracted from the last layer of the model and are 4096-dimensional vectors. Additionally, we extract the image sentiment predictions (I_sp) from the classification layer, which outputs a three-dimensional vector corresponding to the probabilities of the three sentiment classes (negative, neutral, and positive).
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Textual Models (T)</title>
        <p>
          Since the context and semantics of a sentence have been shown [
          <xref ref-type="bibr" rid="ref2 ref6">2,6</xref>
          ] to be important for claim detection, we use the transformer-based BERT-Base [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] (T_BB) to extract contextual word embeddings and employ different pooling strategies to get a single embedding for the tweet. As different layers of BERT capture different kinds of information, we experiment with four combinations: 1) concatenating the last four hidden layers, 2) summing the last four hidden layers, 3) the last hidden layer, and 4) the second-to-last hidden layer. We finally take an average over the word embeddings to obtain a single vector.
        </p>
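        <p>The four pooling strategies can be sketched as follows (a minimal NumPy sketch with random stand-ins for BERT's hidden states; the layer count and dimension follow BERT-base):</p>

```python
import numpy as np

num_layers, num_tokens, dim = 12, 20, 768  # BERT-base shapes (illustrative)
hidden = np.random.rand(num_layers, num_tokens, dim)  # stand-in hidden states

# 1) concatenate the last four hidden layers (per token)
concat4 = np.concatenate([hidden[-4], hidden[-3], hidden[-2], hidden[-1]], axis=-1)
# 2) sum of the last four hidden layers
sum4 = hidden[-4:].sum(axis=0)
# 3) the last hidden layer and 4) the second-to-last hidden layer
last, second_last = hidden[-1], hidden[-2]

# Finally, average over the word (token) embeddings to get one tweet vector.
tweet_embedding = sum4.mean(axis=0)
```

        <p>Each strategy yields one vector per token; the final average over tokens produces the single tweet embedding used by the SVM.</p>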
        <p>
          To reduce the domain gap for our English Twitter datasets, we experiment with two further BERT models. The first variant, called BERTweet [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] (T_BT), is a BERT-base model that is further pre-trained on 850 million English tweets; the second one, called COVID-Twitter-BERT [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] (T_CT), is a BERT-large model trained on 97 million English tweets on the topic of COVID-19. For Arabic tweets, we experiment with AraBERT [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] (T_AB), which is trained on the Arabic news corpus OSIAN [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ] and the 1.5 billion words Arabic corpus [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. We also perform two experiments, one with raw tweets and the other with tweets pre-processed by AraBERT's language-specific text processing method.
        </p>
        <p>
          For English text with the vanilla BERT-base model, we pre-process the text by following the steps mentioned in Cheema et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], using the publicly available text processing tool Ekphrasis [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. We also report the performance of vanilla BERT-base on raw tweets (T_BB^Raw) to reflect its sensitivity towards text pre-processing (T_BB^Clean). For both BERTweet and COVID-Twitter-BERT, we follow their own pre-processing steps, which normalize the text and additionally replace user mentions, emails, and URLs with special keywords.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Multimodal Models (M)</title>
        <p>
          ViLBERT (Vision-and-Language BERT): We use ViLBERT [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], one of the recent multimodal transformer architectures, which processes image and text inputs through two separate transformer-based streams and combines them through transformer layers with co-attention. It eventually outputs co-attended image and text features that can be combined (added, multiplied, or concatenated) to learn a classifier for vision-and-language tasks. The authors proposed visual grounding as a self-supervised pre-training task on the large Conceptual Captions dataset [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ]. They used the model for various downstream tasks involving vision and language, such as visual question answering, visual commonsense reasoning, and caption-based image retrieval.
        </p>
        <p>
          For the image branch, ViLBERT uses the state-of-the-art object detection model Mask R-CNN [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and extracts the top 100 region proposals (boxes) and their corresponding features. These features are fed as a sequence through a 5-layer image transformer, which outputs the image region embeddings. For the text branch, it uses a BERT-base model to get the contextual word embeddings. A 6-layer transformer block with co-attention follows the individual streams and outputs the co-attended image and text embeddings.
        </p>
        <p>Feature Extraction: In our fixed embedding experiments with an SVM, we experiment with the outputs of the pooling and last layers of the image and text branches. With the pooling layers, we directly concatenate (M_CAT^pool) the image and text outputs. With the last-layer outputs, we average the image region embeddings and the word embeddings to get a single embedding per modality and then concatenate them (M_CAT^avg). From the pooling layers, each modality's embedding is a 1024-dimensional vector, while the last-layer averages give 1024- and 768-dimensional vectors for image and text, respectively. For fine-tuning, we follow ViLBERT's downstream task approach, where the pooling layer outputs are either added (M_ADD^pool) or multiplied (M_MUL^pool) and passed to a classifier. For Arabic text, we use Google Translate to convert the text into English because all ViLBERT models are trained on English text.</p>
        <p>
          ViLBERT is fine-tuned on several downstream tasks that can be relevant for encapsulating the image-text relationship in our claim detection problem. Therefore, we experiment with four different pre-trained models, namely, conceptual captions, image retrieval (Image-Ret), grounding referring expressions (localizing an image region given a natural language reference) (RefCOCO), and a multi-task model [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] that is trained on 12 different tasks.
        </p>
      </sec>
      <sec id="sec-3-5">
        <title>Classification of Tweets</title>
        <p>For our fixed embedding experiments, we train SVM models with each type of image and text embedding for the binary classification of tweets, as shown in Figure 2 (a). For fine-tuning the textual models (Figure 2 (b)), given that we have relatively small-sized datasets, we only experiment with fine-tuning the last two and the last four layers of the transformer models for each dataset. For the multimodal fixed embedding experiments, we concatenate the image and text features and train an SVM model over them for classification.</p>
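        <p>The multimodal fixed embedding fusion amounts to a simple concatenation of the two modalities, with the final embedding normalized to unit l2-norm before SVM training (cf. Section 4.2). A NumPy sketch with random stand-ins for the actual feature vectors:</p>

```python
import numpy as np

image_features = np.random.rand(2048)  # e.g., object features I_o (stand-in)
text_features = np.random.rand(768)    # e.g., pooled BERT embedding (stand-in)

# Early fusion: concatenate the modalities into one feature vector,
# then normalize it so the l2-norm of the vector is 1.
fused = np.concatenate([image_features, text_features])
fused = fused / np.linalg.norm(fused)
```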
        <p>In the case of ViLBERT (Figure 2 (c)), we again train an SVM over the extracted pooled image and text outputs for classification. For fine-tuning, we fix the individual transformer branches and experiment with fine-tuning the last two and the last four co-attention layers to activate the interaction between the modalities. This enables us to see the effect of the attention mechanism alone, which can show the benefit of using both image and text in claim detection. On top of the ViLBERT outputs, we use a simple classifier as recommended by the authors of ViLBERT, which includes a linear layer for down-projecting the outputs to 128 dimensions, followed by a ReLU (Rectified Linear Unit) non-linear activation function, a normalization layer, and finally a binary classification layer. Dropout is used to avoid overfitting, and the fine-tuning is performed by minimizing the cross-entropy loss.</p>
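        <p>The classifier head described above can be sketched as a forward pass (a minimal NumPy sketch with random, untrained stand-in weights; in the actual model the weights are learned and dropout is applied after the first linear layer during training only):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Untrained stand-in weights, for illustration only.
W1 = rng.standard_normal((1024, 128)) * 0.02  # down-projection to 128-d
W2 = rng.standard_normal((128, 2)) * 0.02     # binary classification layer

def head_forward(x):
    h = np.maximum(0.0, x @ W1)              # linear layer + ReLU
    h = (h - h.mean()) / (h.std() + 1e-6)    # normalization layer (sketch)
    logits = h @ W2
    e = np.exp(logits - logits.max())
    return e / e.sum()                       # softmax over the two classes

probs = head_forward(rng.standard_normal(1024))
```

        <p>During fine-tuning, the cross-entropy loss over these class probabilities is minimized.</p>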
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <p>In this section, we describe all the datasets and their statistics, the training details and hyper-parameters, the model details, and the experimental results, and we discuss the results obtained by the different models mentioned in Section 3.</p>
      <sec id="sec-4-1">
        <title>Datasets</title>
        <p>
          We selected the following four publicly available Twitter datasets with high-quality annotations (which excludes [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], besides its focus on fake news), three of which are on claim detection and one on conspiracy detection. The number of tweets in the original datasets is four to fifteen times higher, as they were mined for text-based fake news detection; we only selected tweets that have an image.
          CLEF-En [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] - Released as a part of the CLEF-CheckThat! 2020 challenge, the purpose is to identify COVID-19 related tweets that are check-worthy claims vs. not check-worthy claims. Only 281 English tweets in the dataset include images, whereas the original dataset included 964 tweets.
        </p>
        <p>
          CLEF-Ar [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] - Released in the same challenge, the dataset consists of 15 topics related to the Middle East, including COVID-19, and the purpose is to identify check-worthy claims. It consists of 2571 Arabic tweets and corresponding images.
          MediaEval [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] - Released in the MediaEval 2020 workshop [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] challenge on identifying 5G and Coronavirus conspiracy tweets. The original dataset has three classes: 5G and Corona conspiracy, other conspiracies, and no conspiracy. To make the problem consistent with the other datasets in this paper, we combine the conspiracy classes (Corona and others) and treat it as a binary classification problem. It consists of 1724 tweets and images.
        </p>
        <p>
          LESA [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] - This is a recently proposed dataset of COVID-19 related tweets for the problem of claim detection. Here, the problem is identifying whether a tweet is a claim or not, rather than claim check-worthiness as in CLEF-En. The original dataset consists of 10 000 tweets in English, out of which only 1395 include images.
        </p>
      <p>We applied 5-fold cross-validation to overcome the issue of the low number of samples in each dataset. We used a ratio of around 72:10:18 for training, validation, and testing in each data split. Next, we report the experimental results for different model configurations. The reported results are averaged across the five splits of each dataset. We report accuracy and the weighted-F1 measure to account for label imbalance in all the datasets.</p>
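        <p>Since weighted-F1 is central to our evaluation, the measure can be made concrete in a few lines (a minimal pure-Python sketch; the function name and example labels are illustrative, not from our codebase):</p>

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 scores averaged with weights proportional to class support,
    so that larger classes contribute more, accounting for label imbalance."""
    total = len(y_true)
    support = Counter(y_true)
    score = 0.0
    for c in support:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += f1 * support[c] / total
    return score
```

        <p>For instance, with labels [1, 1, 1, 0], the F1 of the majority class is weighted three times as heavily as that of the minority class.</p>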
      </sec>
      <sec id="sec-4-2">
        <title>Setup and Hyper-parameters</title>
        <p>SVM hyper-parameters: We perform a grid search over the conserved PCA energy (%), the regularization parameter C, and the RBF kernel's gamma. The parameter range for PCA varies from 100% (original features) to 95% in decrements of 1. The exponents for C and gamma vary between −1 and 1 on a log scale with 15 steps. For the experiments on the CLEF-En dataset only, we use the exponent range between −2 and 0 for C and gamma, as the number of samples is very low and needs aggressive regularization. We normalize the final embedding so that the l2-norm of the vector is 1.</p>
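        <p>The resulting search grid can be sketched as follows (a NumPy sketch; the variable names are illustrative and the exponent ranges follow the description above):</p>

```python
import numpy as np

# Grid-search ranges as described above: PCA retained energy from 100% down
# to 95%, and C/gamma values log-spaced between the stated exponent bounds.
pca_energy = list(range(100, 94, -1))   # [100, 99, 98, 97, 96, 95]
c_values = np.logspace(-1, 1, 15)       # 15 steps between 10^-1 and 10^1
gamma_values = np.logspace(-1, 1, 15)
clef_en_c = np.logspace(-2, 0, 15)      # tighter range used for CLEF-En only
```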
        <p>Fine-tuning BERT and ViLBERT: We use a batch size of 4 for CLEF-En and 16 for the other datasets. We train all the models for 6 epochs with a starting learning rate of 5e−5 and a linear decay. A dropout with ratio 0.2 is applied after the first linear layer in the classifier for regularization during fine-tuning.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Results</title>
        <p>Table 1 and Table 2 show the unimodal and multimodal models' performance on all four datasets, based on the type of features and on feature combinations, respectively.</p>
        <p>Unimodal Results - In Table 1, it can be seen that all the visual features perform poorly in comparison to the textual features. This is expected, as visual information on its own cannot indicate whether a social media post makes a claim unless it contains text or is a video. Among the four types of visual models, the object (I_o) and hybrid (I_h) features are slightly better, probably because the place or scene information (lowest F1 for all datasets) on its own is not a useful indicator for claim detection. With textual features, BERT models that are further pre-trained on tweets (T_BT, T_BT^†) and COVID-related data (T_CT, T_CT^†) perform better than vanilla BERT (T_BB^Clean, T_BB^Clean†) on at least three datasets. This suggests that the tweets' structure and the domain gap are better captured and reduced, respectively, in models pre-trained on Twitter corpora. Further, normalizing the tweet text (T_BB^Clean) delivers better performance than using the raw text (T_BB^Raw). In SVM training, we observed that summing the last four layers of BERT to compute the embeddings performs better than the other pooling combinations. This indicates that downstream tasks can benefit from the diverse information in the different layers of BERT. Similarly, fine-tuning the last four layers instead of two (marked with ²) gives better performance across all the datasets with BERT-base (T_BB^Clean†), COVID-Twitter-BERT (T_CT^†), and AraBERT (T_AB^†).</p>
        <p>Multimodal Results - In Table 2, we can see the effect of combining visual features with textual features, both by simple concatenation in the SVM and with the multimodal co-attention transformer ViLBERT. Although we do not see any benefit from using the image sentiment embeddings (I_se) in the unimodal models, here we instead use the image sentiment predictions (I_sp), which perform better than or on par with the other visual features. For instance, in the case of CLEF-Ar, the sentiment predictions I_sp with AraBERT (T_AB^†) give the best fixed embedding performance. Similarly, combining the hybrid features (I_h) with BERT-base (T_BB^Clean†) and the object features with COVID-Twitter-BERT (T_CT^†) in the case of LESA and MediaEval improves the metrics by 1% over the textual SVM models.</p>
        <p>With ViLBERT, it is interesting to see that, even with fixed visual and textual branches, it can capture some information from image and text with co-attention to boost the performance in the case of LESA and MediaEval. It is worth mentioning that the best unimodal textual models for English and Arabic are pre-trained models further trained on Twitter and language-specific data corpora. In the case of ViLBERT, there is a wider domain gap, and for Arabic, the translation process loses quite a bit of information, which results in a drop in performance. The different pooling operations applied to the pre-trained ViLBERT models show more difference in the fixed-embedding SVM experiments, where average pooling (M_CAT^avg) yields considerable performance, which we also observed in the unimodal SVM experiments. We observed that the pre-training tasks (the best two are reported in Table 2) also matter: the image retrieval (Image-Ret) and language reference grounding (RefCOCO) features perform much better on all the datasets. This is explainable, since both tasks require capturing complex relationships and linking text to specific regions in the image, enabling them to perform better on our tasks.</p>
        <p>We can summarize the findings of our experiments as follows: 1) Domain-specific language models should be preferred for downstream tasks such as claim detection or fake news detection, where the underlying meaning and context of certain words (like COVID) is essential. 2) Multimodality certainly helps, as seen with the multimodal transformer models, where activating the interaction through co-attention layers between fixed unimodal embeddings improves the performance on two datasets. 3) To further understand the underlying multimodal dynamics, it might be better to explicitly model multimodal relationships, for instance, the importance of the image or the correlation between image and text, in addition to claim detection. 4) Certain pre-training tasks in ViLBERT are better suited for downstream tasks and need further introspection on larger datasets. 5) Visual models need to be better adapted to social media images; for instance, the models used here are not sufficient for diagrams or images with large text, which constitute around 30-40% of the LESA and MediaEval datasets.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we have investigated the role of images and tweet text for two
problems related to fake news: claim and conspiracy detection. For this purpose, we
combined several state-of-the-art CNN features for images with BERT features
for text. We observed the performance improvement over unimodal models in
two out of four Twitter datasets over two languages. We also experimented with
the recently proposed multimodal co-attention transformer ViLBERT and
observed a promising performance using both image and text even with relatively
small-sized datasets. In future work, we will look into other ways to include
external knowledge in domain-independent claim detection models without
relying on different domain-specific language models. Second, we plan to investigate
multimodal transformers in more detail and analyze if the performance does
scale with more data in similar tasks. Finally, to address the limitation of visual
models, we will consider models that can deal with text and graphs in images
and extract suitable features.</p>
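      <p>The co-attention mechanism referred to above can be illustrated with a minimal single-head sketch (an illustrative toy under assumed shapes, not ViLBERT's implementation): text token embeddings attend over image-region features, and the attended vectors are average-pooled into a fixed-size multimodal embedding.</p>
      <preformat>
```python
# Toy single-head co-attention: text queries attend over image-region
# keys/values, then average pooling yields a fixed-size embedding.
# Shapes (12 tokens, 36 regions, dim 64) are hypothetical.
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 64
text = rng.normal(size=(12, d))   # 12 token embeddings (stand-in)
image = rng.normal(size=(36, d))  # 36 region features (stand-in)

# scaled dot-product attention: text queries, image keys/values
scores = (text @ image.T) / np.sqrt(d)       # (12, 36)
attended = softmax(scores, axis=-1) @ image  # (12, 64)

pooled = attended.mean(axis=0)               # average pooling to (64,)
print(pooled.shape)  # (64,)
```
      </preformat>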
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work was funded by European Union's Horizon 2020 research and
innovation programme under the Marie Sklodowska-Curie grant agreement no 812997.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Antoun</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baly</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajj</surname>
          </string-name>
          , H.:
          <article-title>AraBERT: Transformer-based model for Arabic language understanding</article-title>
          .
          <source>In: Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools</source>
          ,
          <article-title>with a Shared Task on Offensive Language Detection</article-title>
          . pp.
          <volume>9</volume>
          –
          <fpage>15</fpage>
          .
          <string-name>
            <surname>European Language Resource Association</surname>
          </string-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Barron-Cedeno</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Da San Martino, G.,
          <string-name>
            <surname>Hasanain</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suwaileh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haouari</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Babulkov</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamdan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Overview of CheckThat! 2020: Automatic identification and verification of claims in social media</article-title>
          .
          <source>In: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          . pp.
          <volume>215</volume>
          –
          <fpage>236</fpage>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baziotis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelekis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doulkeridis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>DataStories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis</article-title>
          .
          <source>In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval2017)</source>
          . pp.
          <volume>747</volume>
          –
          <fpage>754</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sheng</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Exploring the role of visual content in fake news detection</article-title>
          . Disinformation, Misinformation, and Fake News in Social Media pp.
          <volume>141</volume>
          –
          <issue>161</issue>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . (eds.): Working Notes of CLEF 2020 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          , Thessaloniki, Greece,
          <source>September 22-25</source>
          ,
          <year>2020</year>
          , CEUR Workshop Proceedings, vol.
          <volume>2696</volume>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chakrabarty</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hidey</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McKeown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>IMHO fine-tuning improves claim detection</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <volume>558</volume>
          –
          <fpage>563</fpage>
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cheema</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hakimov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ewerth</surname>
          </string-name>
          , R.:
          <article-title>Check_square at CheckThat! 2020: Claim detection in social media via fusion of transformer and syntactic features</article-title>
          .
          <source>In: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece. CEUR Workshop Proceedings</source>
          , vol.
          <volume>2696</volume>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Cheema</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hakimov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ewerth</surname>
          </string-name>
          , R.:
          <article-title>TIB's visual analytics group at MediaEval '20: Detecting fake news on corona virus and 5G conspiracy</article-title>
          .
          <source>MediaEval 2020 Workshop</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Claveau</surname>
          </string-name>
          , V.:
          <article-title>Detecting fake news in tweets from text and propagation graph: IRISA's participation in the FakeNews task at MediaEval 2020</article-title>
          . In: MediaEval 2020 Workshop (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Daxenberger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eger</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Habernal</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stab</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>What is the essence of a claim? Cross-domain claim identification</article-title>
          .
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          2055–2066
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics</source>
          . pp.
          <volume>4171</volume>
          –
          <issue>4186</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Dogan</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , et al.:
          <article-title>Detecting Real-time Check-worthy Factual Claims in Tweets Related to US Politics</article-title>
          .
          <source>Ph.D. thesis</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>El-Khair</surname>
            ,
            <given-names>I.A</given-names>
          </string-name>
          .:
          <article-title>1.5 billion words arabic corpus</article-title>
          .
          <source>arXiv abs/1611.04033</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Leveraging emotional signals for credibility detection</article-title>
          .
          <source>In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <string-name>
            <surname>SIGIR</surname>
          </string-name>
          <year>2019</year>
          , Paris, France. pp.
          <volume>877</volume>
          –
          <fpage>880</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Zhang, G.,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Multimodal multi-image fake news detection</article-title>
          .
          <source>In: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA)</source>
          . pp.
          <volume>647</volume>
          –
          <fpage>654</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sundriyal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhtar</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakraborty</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>LESA: Linguistic encapsulation and semantic amalgamation based generalised claim detection from online content</article-title>
          .
          <source>arXiv preprint arXiv:2101.11891</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Hasanain</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haouari</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suwaileh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamdan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , Barron-Cedeno, A., Da San Martino, G.,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : Overview of CheckThat! 2020 Arabic:
          <article-title>Automatic identification and verification of claims in social media</article-title>
          .
          <source>In: Cappellato et al. [5]</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkioxari</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
          </string-name>
          , R.B.:
          <article-title>Mask R-CNN</article-title>
          .
          <source>In: IEEE International Conference on Computer Vision</source>
          , ICCV 2017, Venice, Italy. pp.
          <volume>2980</volume>
          –
          <fpage>2988</fpage>
          . IEEE Computer Society (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , J.:
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016</source>
          , Las Vegas, NV, USA. pp.
          <volume>770</volume>
          –
          <fpage>778</fpage>
          . IEEE Computer Society (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , Zhang, Y., Luo, J.:
          <article-title>Multimodal fusion with recurrent neural networks for rumor detection on microblogs</article-title>
          .
          <source>In: Proceedings of the 25th ACM international conference on Multimedia</source>
          . pp.
          <volume>795</volume>
          –
          <issue>816</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Khattar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goud</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varma</surname>
          </string-name>
          , V.:
          <article-title>MVAE: multimodal variational autoencoder for fake news detection</article-title>
          .
          <source>In: The World Wide Web Conference, WWW</source>
          <year>2019</year>
          , San Francisco, CA, USA. pp.
          <volume>2915</volume>
          –
          <fpage>2921</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hershcovich</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aharoni</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slonim</surname>
          </string-name>
          , N.:
          <article-title>Context dependent claim detection</article-title>
          .
          <source>In: Proceedings of COLING</source>
          <year>2014</year>
          ,
          <source>the 25th International Conference on Computational Linguistics: Technical Papers</source>
          . pp.
          <volume>1489</volume>
          –
          <fpage>1500</fpage>
          . Dublin City University and Association for Computational Linguistics (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Lippi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torroni</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Context-independent claim detection for argument mining</article-title>
          .
          <source>In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence</source>
          ,
          <source>IJCAI</source>
          <year>2015</year>
          , Buenos Aires, Argentina. pp.
          <volume>185</volume>
          –
          <fpage>191</fpage>
          . AAAI Press (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          .
          <source>arXiv preprint arXiv:1907.11692</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parikh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems</source>
          <year>2019</year>
          , NeurIPS
          <year>2019</year>
          , Vancouver, BC, Canada. pp.
          <volume>13</volume>
          –
          <issue>23</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goswami</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rohrbach</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parikh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>12-in-1: Multi-task vision and language representation learning</article-title>
          .
          <source>In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020</source>
          , Seattle, WA, USA. pp.
          <volume>10434</volume>
          –
          <fpage>10443</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name><surname>Majithia</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Arslan</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Lubal</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Jimenez</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Arora</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Caraballo</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Li</surname>, <given-names>C.</given-names></string-name>:
          <article-title>ClaimPortal: Integrated monitoring, searching, checking, and analytics of factual claims on Twitter</article-title>.
          <source>In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>. pp.
          <fpage>153</fpage>–<lpage>158</lpage>. Association for Computational Linguistics (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name><surname>Manh Duc Tuan</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Quang Nhat Minh</surname>, <given-names>P.</given-names></string-name>:
          <article-title>Fakenews detection using pre-trained language models and graph convolutional networks</article-title>.
          <source>In: MediaEval 2020 Workshop</source> (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name><surname>Müller</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Salathé</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Kummervold</surname>, <given-names>P.E.</given-names></string-name>:
          <article-title>COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter</article-title>.
          arXiv preprint arXiv:2005.07503 (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name><surname>Nakamura</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Levy</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Wang</surname>, <given-names>W.Y.</given-names></string-name>:
          <article-title>Fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection</article-title>.
          <source>In: Proceedings of the 12th Language Resources and Evaluation Conference</source>. pp.
          <fpage>6149</fpage>–<lpage>6157</lpage>. European Language Resources Association (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name><surname>Nguyen</surname>, <given-names>D.Q.</given-names></string-name>,
          <string-name><surname>Vu</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Tuan Nguyen</surname>, <given-names>A.</given-names></string-name>:
          <article-title>BERTweet: A pre-trained language model for English tweets</article-title>.
          <source>In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>. pp.
          <fpage>9</fpage>–<lpage>14</lpage>. Association for Computational Linguistics (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name><surname>Nikolov</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Da San Martino</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Koychev</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Nakov</surname>, <given-names>P.</given-names></string-name>:
          <article-title>Team Alex at CheckThat! 2020: Identifying check-worthy tweets with transformer models</article-title>.
          <source>In: Cappellato et al. [5]</source>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name><surname>Pogorelov</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Schroeder</surname>, <given-names>D.T.</given-names></string-name>,
          <string-name><surname>Burchard</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Moe</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Brenner</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Filkukova</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Langguth</surname>, <given-names>J.</given-names></string-name>:
          <article-title>FakeNews: Corona virus and 5G conspiracy task at MediaEval 2020</article-title>.
          <source>In: MediaEval 2020 Workshop</source> (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name><surname>Rosenthal</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>McKeown</surname>, <given-names>K.</given-names></string-name>:
          <article-title>Detecting opinionated claims in online discussions</article-title>.
          <source>In: 2012 IEEE Sixth International Conference on Semantic Computing</source>. pp.
          <fpage>30</fpage>–<lpage>37</lpage>. IEEE (<year>2012</year>)
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name><surname>Russakovsky</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Deng</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Su</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Krause</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Satheesh</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Ma</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Huang</surname>, <given-names>Z.</given-names></string-name>,
          <string-name><surname>Karpathy</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Khosla</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Bernstein</surname>, <given-names>M.</given-names></string-name>, et al.:
          <article-title>ImageNet large scale visual recognition challenge</article-title>.
          <source>International Journal of Computer Vision 115(3)</source>,
          <fpage>211</fpage>–<lpage>252</lpage> (<year>2015</year>)
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name><surname>Shaar</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Nikolov</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Babulkov</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Alam</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Barrón-Cedeño</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Elsayed</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Hasanain</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Suwaileh</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Haouari</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Da San Martino</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Nakov</surname>, <given-names>P.</given-names></string-name>:
          <article-title>Overview of CheckThat! 2020 English: Automatic identification and verification of claims in social media</article-title>.
          <source>In: Cappellato et al. [5]</source>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name><surname>Sharma</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Ding</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Goodman</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Soricut</surname>, <given-names>R.</given-names></string-name>:
          <article-title>Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning</article-title>.
          <source>In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>. pp.
          <fpage>2556</fpage>–<lpage>2565</lpage>. Association for Computational Linguistics (<year>2018</year>)
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name><surname>Simonyan</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Zisserman</surname>, <given-names>A.</given-names></string-name>:
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>.
          <source>In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, Conference Track Proceedings</source> (<year>2015</year>)
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name><surname>Singhal</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Shah</surname>, <given-names>R.R.</given-names></string-name>,
          <string-name><surname>Chakraborty</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Kumaraguru</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Satoh</surname>, <given-names>S.</given-names></string-name>:
          <article-title>SpotFake: A multi-modal framework for fake news detection</article-title>.
          <source>In: 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM)</source>. pp.
          <fpage>39</fpage>–<lpage>47</lpage>. IEEE (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name><surname>Suykens</surname>, <given-names>J.A.</given-names></string-name>,
          <string-name><surname>Vandewalle</surname>, <given-names>J.</given-names></string-name>:
          <article-title>Least squares support vector machine classifiers</article-title>.
          <source>Neural Processing Letters 9(3)</source>,
          <fpage>293</fpage>–<lpage>300</lpage> (<year>1999</year>)
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name><surname>Vadicamo</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Carrara</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Cimino</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Cresci</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Dell'Orletta</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Falchi</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Tesconi</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Cross-media learning for image sentiment analysis in the wild</article-title>.
          <source>In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW)</source>. pp.
          <fpage>308</fpage>–<lpage>317</lpage> (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name><surname>Vosoughi</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Roy</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Aral</surname>, <given-names>S.</given-names></string-name>:
          <article-title>The spread of true and false news online</article-title>.
          <source>Science</source> <volume>359</volume>(<issue>6380</issue>),
          <fpage>1146</fpage>–<lpage>1151</lpage> (<year>2018</year>)
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          43.
          <string-name><surname>Wang</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Ma</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Jin</surname>, <given-names>Z.</given-names></string-name>,
          <string-name><surname>Yuan</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Xun</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Jha</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Su</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Gao</surname>, <given-names>J.</given-names></string-name>:
          <article-title>EANN: Event adversarial neural networks for multi-modal fake news detection</article-title>.
          <source>In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD 2018, London, UK</source>. pp.
          <fpage>849</fpage>–<lpage>857</lpage>. ACM (<year>2018</year>)
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          44.
          <string-name><surname>Williams</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Rodrigues</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Novak</surname>, <given-names>V.</given-names></string-name>:
          <article-title>Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models</article-title>.
          <source>In: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece. CEUR Workshop Proceedings</source>, vol.
          <volume>2696</volume>. CEUR-WS.org (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          45.
          <string-name><surname>Wold</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Esbensen</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Geladi</surname>, <given-names>P.</given-names></string-name>:
          <article-title>Principal component analysis</article-title>.
          <source>Chemometrics and Intelligent Laboratory Systems 2(1-3)</source>,
          <fpage>37</fpage>–<lpage>52</lpage> (<year>1987</year>)
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          46.
          <string-name><surname>Zeroual</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Goldhahn</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Eckart</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Lakhouaja</surname>, <given-names>A.</given-names></string-name>:
          <article-title>OSIAN: Open source international Arabic news corpus - preparation and integration into the CLARIN infrastructure</article-title>.
          <source>In: Proceedings of the Fourth Arabic Natural Language Processing Workshop</source>. pp.
          <fpage>175</fpage>–<lpage>182</lpage>. Association for Computational Linguistics (<year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          47.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lapedriza</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khosla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliva</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torralba</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Places: A 10 million image database for scene recognition</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          48.
          <string-name><surname>Zlatkova</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Nakov</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Koychev</surname>, <given-names>I.</given-names></string-name>:
          <article-title>Fact-checking meets fauxtography: Verifying claims about images</article-title>.
          <source>In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>. pp.
          <fpage>2099</fpage>–<lpage>2108</lpage>. Association for Computational Linguistics (<year>2019</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>