=Paper=
{{Paper
|id=Vol-3878/114_main_short
|storemode=property
|title=Multimodal Online Manipulation: Empirical Analysis of Fact-Checking Reports
|pdfUrl=https://ceur-ws.org/Vol-3878/114_main_short.pdf
|volume=Vol-3878
|authors=Olga Uryupina
|dblpUrl=https://dblp.org/rec/conf/clic-it/Uryupina24
}}
==Multimodal Online Manipulation: Empirical Analysis of Fact-Checking Reports==
Olga Uryupina
Department of Information Engineering and Computer Science, University of Trento
Abstract
This paper presents an in-depth exploratory quantitative study of the interaction between multimedia and textual components in online manipulative content. We discuss relations between content layers (such as proof or support) as well as unscrupulous techniques compromising visual content. The study is based on fakes reported and analyzed by PolitiFact and comprises documents from Facebook, Twitter and Instagram. We identify several pervasive phenomena currently affecting the impact of manipulative content on the reader and the possible strategies for effective debunking actions, and discuss possible research directions.
Keywords
fact checking, multimodal, annotation
CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04–06, 2024, Pisa, Italy
uryupina@gmail.com (O. Uryupina)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Manipulative online content (fake news, propaganda, among others) is growing at an alarming rate, hindering our access to truthful and unbiased information and thus threatening the principles of democratic society. The problem has been addressed by professional journalists, who – with the help of crowd-workers – fight a never-ending battle to prevent information contamination. To enable a large-scale response to the misinformation threat, the AI community has invested considerable effort into building competitive models for identifying non-transparent content, such as false claims or altered videos (deep fakes). However, we still lack a thorough understanding of manipulative content and the multiple aspects affecting its perception and impact on the reader. This paper aims at an in-depth analysis of one such aspect, namely, the interaction between different (multimedia) layers of the manipulative message. More specifically, we study the semantics underlying the relation between the multimedia and textual parts of fake news. Our study is based on around 800 fakes from January till September 2022, as identified and analysed by PolitiFact.¹

¹ PolitiFact (https://www.politifact.com/) is an independent journalistic agency and one of the most experienced fact-checking organizations, providing detailed analytics for non-transparent online content since 2007.

Multimedia content, such as videos, reels, photos, screenshots or images, is becoming increasingly popular in social media: it is an appealing and powerful way of expressing and/or enhancing one's message. Nevertheless, as a scientific community, we still have little understanding of the way authors integrate multimedia into their content: most research so far has focused on a specific component and not on their interplay. Our study aims at identifying the role of the multimedia part of manipulative messages.

Figure 1 shows some examples of potential fakes analyzed by PolitiFact. We observe different relations between the text and the image. In particular, in (1a), the video is supposed to prove the claim by providing direct evidence, whereas in (1b), the image provides support (appeal to authority). In (1c), the image is a visual paraphrase of the claim, enhancing its appeal but not providing extra proof, support or informational material. Finally, in (1d), the photo is an illustration that, while depicting the discussed person, does not aim at being relevant to the claim's veracity or impact. While understanding the relation between the image and the text is interesting from a scientific perspective, it is also a crucial prerequisite for an efficient and meaningful fact-checking response. For example, if a supposed proof is a compromised photo, the response should highlight this fact (e.g., the video in (1a) has been cropped, misrepresenting the quote, which should be highlighted in the fact-checking report). On the contrary, if a compromised photo is used as a mere illustration, the effective fact-checking report should focus on the textual claim per se.

Another important angle is the issue with the multimedia part. In our example, the video in (1a) is cropped. On the contrary, (1b) represents an authentic screenshot, yet it has been miscaptioned by the claim: older content, irrelevant to the current events/topics, has been repurposed.

The current paper focuses on these two aspects to analyze empirically the interplay between multimedia and textual components in fake news, as identified by PolitiFact. To this end, we reannotate the PolyFake dataset [1] with fine-grained labels reflecting multimedia aspects.

(a) Biden to teachers: "They're not somebody else's children. They're yours when you're in the classroom." (VIDEO)
(b) Now you know why there's suddenly "a formula shortage". The new age robber barrons have conveniently invested in some unholy breast milk made from human organs.
(c) In honor of #TaxDay, I remind you that Governor Evers wanted to increase your taxes by $1 billion just for heating your homes. Instead, Republicans cut your taxes by more than $2 billion.
(d) Italian football agent Mino Raiola has died after suffering from an illness. RIP

Figure 1: Different uses of layered/multimedia content

2. Related Work

While fact checking has been receiving an increasing amount of attention recently, both from the NLP and Vision communities, only very few studies focus on the interaction between different modalities.

A breakthrough approach by Vempala and Preoţiuc-Pietro [2] focuses on two dimensions of the relationship between text and image on Twitter: whether the text is represented in the image and whether the image adds extra content to the textual message. Cheema et al. [3] propose a dataset of multimodal tweets, annotated for visual relevancy and checkworthiness. Finally, Biamby et al. [4] propose a larger-scale dataset of multimodal tweets, where "falsified" claims have been added synthetically to address the image repurposing problem.

These studies have paved the way for evaluation campaigns and benchmarking resources, for example, [5]. Yet, these studies rely on rather straightforward annotation guidelines to reduce the per-claim cost. Moreover, the annotators are not professional fact-checkers: while they can assess some aspects of the compromised content, they still can get deceived by more challenging cases – after all, the manipulative content has been created on purpose to influence and bias the reader.

In a recent survey, Akhtar et al. [6] highlight the importance of an interdisciplinary approach to fact-checking, proposing a framework to model different axes of online manipulation and, most importantly, fusing textual and visual fact-checking; they also survey benchmarks and models developed by the respective communities. Our study is built upon the same motivation – and our main goal is to study empirically the interplay between different modalities, based on real-world (i.e., not simulated or synthesized) fake data.

Our study aims at an in-depth exploratory analysis of multimodal online content. To this end, we focus on more specific labels to describe the relationship between different layers/modalities. We extend the scope of our study to cover all three major platforms (Facebook, Instagram and Twitter). Moreover, our input is not only the claim per se, but the professionally created fact-checking report from PolitiFact. In our experience, PolitiFact reports contain a wealth of information about online manipulation: as opposed to the 2-3 binary labels of common NLP fact-checking benchmarks, PolitiFact characterizes each claim with 1-3 pages of analytics. This analytics, however, comes in a free textual form. While it might still be impossible for the NLP community to encode these reports for building high-quality fact-checking systems, we believe that we should at least learn from them to get better insights, stop trivializing the task and highlight understudied, yet impactful, subtasks.

3. Analyzing Multimedia Content

3.1. PolyFake

Our study is based on the PolyFake dataset [1] covering fake news from 2022, as analyzed by professional fact-checkers from the PolitiFact agency.² The current study is based on the first nine months of PolyFake (818 entries). Each entry has been re-assessed by two annotators, with further adjudication by the supervisor. The original PolyFake labels are binary and encode more generic properties of fake news (e.g. whether the reasoning is fallacious or whether the document triggers emotions). For the present study, we have designed and iteratively refined annotation guidelines for labelling multimedia aspects of manipulative content.

² PolyFake annotation guidelines cover a wide range of phenomena related to online manipulation: from fallacious/propaganda reasoning to emotive appeals, factual veracity etc. The current study aims at an in-depth analysis of a specific angle. The Appendix discusses the distribution of veracity labels across PolyFake documents.

The annotation process is based on consulting jointly not only the original content, but the PolitiFact report as well. This way we make use of the wealth of analytics provided by experienced professional fact-checkers, encoding it in more structured annotation labels.

PolyFake covers fakes from different social media (Twitter, Facebook, Instagram, TikTok, Threads and YouTube). Note that manipulative content often gets propagated across platforms through re-posts, sharing, linking or just copying. For example, a large proportion of Facebook videos originates from TikTok (in this case, PolitiFact typically analyzes the Facebook message, hence the low number of TikTok entries in the table). In the following study, we omit TikTok, YouTube and Telegram as largely underrepresented categories with rather straightforward patterns.

Layer      | Facebook    | Twitter    | Instagram  | TikTok    | YouTube  | Total
none       | 64 (12.7%)  | 80 (41.9%) | 4 (3.9%)   | -         | -        | 149 (18.2%)
video      | 195 (38.6%) | 25 (13.1%) | 40 (38.9%) | 11 (100%) | 6 (100%) | 277 (33.9%)
photo      | 92 (18.8%)  | 31 (16.2%) | 10 (9.7%)  | -         | -        | 133 (16.3%)
screenshot | 114 (22.5%) | 19 (9.9%)  | 45 (43.7%) | -         | -        | 178 (21.8%)
link       | 29 (5.7%)   | 15 (7.8%)  | -          | -         | -        | 44 (5.4%)
image      | 14 (2.8%)   | 6 (3.1%)   | 6 (5.8%)   | -         | -        | 26 (3.2%)
thread     | -           | 17 (8.9%)  | -          | -         | -        | 17 (2.1%)
total      | 506 (100%)  | 191 (100%) | 103 (100%) | 11 (100%) | 6 (100%) | 818 (100%)

Table 1
Types of layered content.

3.2. Multimedia and Layered Content

Layer Types. Table 1 shows the distribution of different media types for each platform. We have identified several types of layered content: parts of the message rendered together with the initial post. The most common ones are videos (including reels), photos and screenshots (typically, complex visual objects combining textual content with photos/images and referring the reader to a different source). We have also observed images (infographics, maps or drawings), links (this content is typically rendered with a photo/stillshot, yet it explicitly points to a different online location, for example, a promotion website) or threads (characteristic of Twitter, this type of layering helps to contextualize the message). On rare occasions, social media posts might contain more than
one extra layer (e.g., videos and photos).

Most importantly, only 18% of PolyFake documents are purely textual: adhering to the popular adage that a picture is worth a thousand words, manipulative content creators use visuals for a variety of purposes, from increasing outreach to improving credibility. Moreover, the prevalence of multimedia content is far more critical for Facebook and Instagram – the two platforms not typically addressed by NLP practitioners. This alone suggests that we need to pay much more attention to joint models and start with a deeper understanding of the relevant phenomena.

A large percentage of documents re-use or spread already existing information. This is true for screenshots (21% in total) and links (5%), but also for many videos – only very few videos represent original content. While there exist some studies on identifying previously fact-checked claims, they are restricted to textual content. We believe that a more complex multimodal approach would be beneficial here.

For presentation purposes, in what follows we merge the underrepresented categories link, image and thread with the roughly functionally similar major categories screenshot, photo and screenshot, respectively.

Layer Roles. Table 2 shows the different roles multimedia layers play in PolyFake documents.

role     | video      | photo+     | screenshot+
content  | 66 (23.8%) | 19 (12.0%) | 114 (48.1%)
anchor   | 62 (22.4%) | 46 (29.1%) | 16 (6.8%)
proof    | 86 (31.0%) | 36 (22.8%) | 39 (16.5%)
support  | 14 (5.1%)  | 4 (2.5%)   | 16 (6.8%)
paraphr. | 30 (10.8%) | 6 (3.8%)   | 23 (9.7%)
context  | 8 (2.9%)   | 3 (1.9%)   | 21 (8.9%)
illustr. | 1 (0.4%)   | 55 (34.8%) | 9 (3.8%)
action   | 3 (1.1%)   | 1 (0.6%)   | 14 (5.9%)
other    | 28 (10.1%) | -          | 2 (0.8%)
total    | 277        | 158        | 237

Table 2
Role of multimedia layers, per content type (photo+ includes photos and images; screenshot+ includes screenshots, links and threads/retweets); purely textual documents discarded.

We distinguish between the following roles: content (the essential part of the content is presented on the multimedia layer, whereas the textual layer just adds minor details or suggests opinions), proof (the multimedia layer offers a physical proof – cf. Example (1a)), support (the multimedia layer provides some material to support the claim, from a reputable source – cf. Example (1b)), paraphrase (the multimedia layer paraphrases the claim without adding any extra angle – cf. Example (1c)), context (while the textual claim is generally self-contained, it cannot be interpreted without the context given by the multimedia part, e.g., the claim contains pronouns and the image presents their referents), illustration (the multimedia layer shows some objects/persons mentioned in the claim without any connection to its semantics – cf. Example (1d)) and action (the multimedia layer suggests an appropriate reaction to the claim, for example, a scam website). Finally, a rather common role for videos and photos is anchor: in such cases, the textual claim is about the multimedia itself (for example, "the sharpest image of the sun ever recorded."); here, the multimedia is not compromised per se and the textual claim contains no falsehoods about the world, yet the combination might be very misleading.

In more than half of the documents, multimodal layers provide essential content. This is true for all the media types (videos, photos and screenshots). We have observed several possible factors contributing to this effect: in general, social media users tend to repost existing "fancy" content and not create their own texts. Even in authentic self-created posts, the message is often put in a visual, whereas only some emotions are added in a text. We believe that there is a wide variety of potential reasons for this behaviour (e.g., videos and photos get more likes, whereas texts are mostly ignored by peers), requiring a more specialized study.

Almost one third of multimedia layers, especially videos, supposedly present proofs. Such compromised proofs are out of reach for modern evidence-based automatic fact-checking: while a fact-checking model can provide extensive evidence to refute a claim, the user would still trust the video/photo and not the model. Human fact-checkers address such proofs from a different, more promising, perspective: they try to explicitly attack and debunk the proof. We believe that this is a very important and largely unaddressed research direction.

Issues with multimedia layers. Finally, we have identified the most common unscrupulous techniques relevant for multimedia layers. Those include: crop (essential part(s) of the original message are omitted to render it out of context – cf. Example (1a)); miscaption (while the image/video is authentic, the textual claim misleads w.r.t. some crucial details, e.g. events or timeline – cf. Example (1b)); altered/fake (the image/video has been altered – beyond cropping – with specialized software, including deep fakes); misperception (the image/video is – deliberately or not – deceiving because of its low quality, unclear angle, optical effects etc); noproof (the – typically long – video does not contain any components relevant for the claim); falsehood (the video/image is authentic, yet its content is untrue – i.e., the textual claim spreads the original fake generated by the video/image); and explain (the textual part explains – misleadingly – what we are supposed to see in the video, often of rather low quality).

Table 3 summarizes the distribution of problematic issues across the three main multimedia types, showing several trends.

Issue         | video      | photo+     | screenshot+
falsehood     | 93 (33.6%) | 16 (10.1%) | 130 (54.9%)
crop          | 12 (4.3%)  | -          | 1 (0.4%)
miscaption    | 60 (21.7%) | 47 (29.7%) | 15 (6.3%)
altered/fake  | 17 (6.1%)  | 15 (9.5%)  | 29 (12.2%)
misperception | 7 (2.5%)   | 5 (3.2%)   | -
noproof       | 27 (9.7%)  | 3 (1.9%)   | 5 (2.1%)
explain       | 26 (9.4%)  | 6 (3.8%)   | 12 (5.1%)
none          | 13 (4.7%)  | 58 (36.7%) | 43 (18.1%)
total         | 277        | 158        | 237

Table 3
Types of manipulative content for different multimedia layers.

First, video layers provide more possibilities for unscrupulous content generators: cropped, otherwise altered or low quality videos are pervasive in manipulative content. While most of the research focuses on images, they do not exhibit such a variety of manipulative strategies. Screenshots – authentic or fake – are largely used to disseminate falsehoods. At the same time, an increasing amount of authentic videos, mostly originating from TikTok, is created to spread falsehoods and promote "critical thinking" (i.e., conspiracy theories as opposed to rational argumentation). These remain largely understudied, despite their large impact on the audience. Another rather unstudied area is explanatory claims: authentic videos/photos accompanied by misleading explanations of what we see and what it means; in such cases, the factual component might be non-compromised, yet the biased explanation makes the whole message an impactful and hard-to-debunk propaganda tool. Finally, unlike videos and screenshots, most photos represent true authentic information – the textual claims either rely on them as illustrations or use them as building blocks to support fallacious argumentation.

4. Conclusion

We have presented an in-depth analysis of the interaction between textual and multimedia components of compromised social media documents. We have identified several high-impact issues, insufficiently studied by the community at the moment. These include the interaction between different modalities, the role of the multimedia part and its impact on selecting a successful fact-checking strategy, the differences between platforms and media types (current NLP studies predominantly focus on Twitter and images) and the importance of a more principled approach to content re-use. We hope that this study, motivated by human fact-checking expertise, can spark a meaningful discussion and improve automatic modeling.

Acknowledgments

We thank the Autonomous Province of Trento for the financial support of our project via the AI@TN initiative.

References

[1] Anonymous, PolyFake: Fine-grained multi-perspective annotation of fact-checking reports, in: Accepted for publication, 2024.
[2] A. Vempala, D. Preoţiuc-Pietro, Categorizing and inferring the relationship between the text and image of Twitter posts, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 2830–2840. URL: https://aclanthology.org/P19-1272. doi:10.18653/v1/P19-1272.
[3] G. S. Cheema, S. Hakimov, A. Sittar, E. Müller-Budack, C. Otto, R. Ewerth, MM-Claims: A dataset for multimodal claim detection in social media, in: Findings of the Association for Computational Linguistics: NAACL 2022, Association for Computational Linguistics, Seattle, United States, 2022, pp. 962–979. URL: https://aclanthology.org/2022.findings-naacl.72. doi:10.18653/v1/2022.findings-naacl.72.
[4] G. Biamby, G. Luo, T. Darrell, A. Rohrbach, Twitter-COMMs: Detecting climate, COVID, and military multimodal misinformation, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 1530–1549. URL: https://aclanthology.org/2022.naacl-main.110. doi:10.18653/v1/2022.naacl-main.110.
[5] A. Bondielli, P. Dell'Oglio, A. Lenci, F. Marcelloni, L. Passaro, Dataset for multimodal fake news detection and verification tasks, Data in Brief 54 (2024) 110440. URL: https://www.sciencedirect.com/science/article/pii/S2352340924004098. doi:10.1016/j.dib.2024.110440.
[6] M. Akhtar, M. Schlichtkrull, Z. Guo, O. Cocarascu, E. Simperl, A. Vlachos, Multimodal automated fact-checking: A survey, 2023. arXiv:2305.13507.

A. True vs. Fake content and multimedia layers

Our dataset by construction contains mostly untrue claims: even though PolitiFact occasionally fact-checks statements that turn out to be true, most of their materials are "false", "mostly false" or even "pants on fire". Moreover, even true claims often exhibit signs of user manipulation. In this appendix, we show statistics for fake vs. true content in PolitiFact reports (Table 4).

FC label      | Facebook    | Twitter    | Instagram  | TikTok    | YouTube   | Total
pants-on-fire | 95 (18.6%)  | 18 (9.4%)  | 29 (28.2%) | 2 (18.2%) | 2 (33.3%) | 146
false         | 353 (69.8%) | 97 (50.8%) | 64 (62.1%) | 9 (81.8%) | 2 (33.3%) | 526
mostly false  | 34 (6.7%)   | 36 (18.8%) | 6 (5.8%)   | -         | 1 (16.7%) | 77
half true     | 17 (3.3%)   | 18 (9.4%)  | -          | -         | 1 (16.7%) | 36
mostly true   | 6 (1.2%)    | 11 (5.7%)  | 3 (2.9%)   | -         | -         | 20
true          | 1 (0.2%)    | 10 (5.2%)  | 1 (1.0%)   | -         | -         | 12
total         | 506 (100%)  | 191 (100%) | 103 (100%) | 11 (100%) | 6 (100%)  | 818

Table 4
Manipulative content on social media fact-checked (FC) and reported by PolitiFact (Jan-Sept 2022).
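The verdict distributions discussed in this appendix can be cross-checked with a few lines of code. The snippet below is an illustrative sketch only: the dictionary transcribes the Facebook column of Table 4, and `share` is a hypothetical helper we introduce for the example, not part of any released PolyFake tooling.

```python
# Facebook column of Table 4: PolitiFact verdicts for 506 documents.
# Counts transcribed from the paper; illustrative sketch, not the
# released PolyFake data format.
facebook_verdicts = {
    "pants-on-fire": 95,
    "false": 353,
    "mostly false": 34,
    "half true": 17,
    "mostly true": 6,
    "true": 1,
}

def share(labels, counts):
    """Percentage of documents carrying any of the given verdict labels."""
    total = sum(counts.values())
    return 100.0 * sum(counts[label] for label in labels) / total

# Share of clearly untrue content, cf. the appendix discussion.
untrue = share(["pants-on-fire", "false", "mostly false"], facebook_verdicts)
print(f"untrue content on Facebook: {untrue:.1f}%")  # untrue content on Facebook: 95.3%
```

Recomputing the shares this way also makes it easy to spot small rounding inconsistencies in manually typeset tables.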