=Paper=
{{Paper
|id=Vol-3878/114_main_short
|storemode=property
|title=Multimodal Online Manipulation: Empirical Analysis of Fact-Checking Reports
|pdfUrl=https://ceur-ws.org/Vol-3878/114_main_short.pdf
|volume=Vol-3878
|authors=Olga Uryupina
|dblpUrl=https://dblp.org/rec/conf/clic-it/Uryupina24
}}
==Multimodal Online Manipulation: Empirical Analysis of Fact-Checking Reports==
Olga Uryupina
Department of Information Engineering and Computer Science, University of Trento
Abstract
This paper presents an in-depth exploratory quantitative study of the interaction between multimedia and textual components in online manipulative content. We discuss relations between content layers (such as proof or support) as well as unscrupulous techniques compromising visual content. The study is based on fakes reported and analyzed by PolitiFact and comprises documents from Facebook, Twitter and Instagram. We identify several pervasive phenomena currently affecting the impact of manipulative content on the reader and the possible strategies for effective debunking actions, and discuss possible research directions.
Keywords
fact checking, multimodal, annotation
CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04–06, 2024, Pisa, Italy
uryupina@gmail.com (O. Uryupina)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Manipulative online content (fake news, propaganda, among others) is growing at an alarming rate, hindering our access to truthful and unbiased information and thus threatening the principles of democratic society. The problem has been addressed by professional journalists, who – with the help of crowd-workers – fight a never-ending battle to prevent information contamination. To enable a large-scale response to the misinformation threat, the AI community has invested considerable effort into building competitive models for identifying non-transparent content, such as false claims or altered videos (deep fakes). However, we still lack a thorough understanding of manipulative content and the multiple aspects affecting its perception and impact on the reader. This paper aims at an in-depth analysis of one such aspect, namely, the interaction between different (multimedia) layers of the manipulative message. More specifically, we study the semantics underlying the relation between the multimedia and textual parts of fake news. Our study is based on around 800 fakes from January till September 2022, as identified and analysed by PolitiFact.¹

¹ PolitiFact (https://www.politifact.com/) is an independent journalistic agency and one of the most experienced fact-checking organizations, providing detailed analytics for non-transparent online content since 2007.

Multimedia content, such as videos, reels, photos, screenshots or images, is becoming increasingly popular in social media: it is an appealing and powerful way of expressing and/or enhancing one's message. Nevertheless, as a scientific community, we still have little understanding of the way authors integrate multimedia into their content: most research so far has focused on a specific component and not on their interplay. Our study aims at identifying the role of the multimedia part of manipulative messages.

Figure 1 shows some examples of potential fakes analyzed by PolitiFact. We observe different relations between the text and the image. In particular, in (1a), the video is supposed to prove the claim by providing direct evidence, whereas in (1b), the image provides support (appeal to authority). In (1c), the image is a visual paraphrase of the claim, enhancing its appeal but not providing extra proof, support or informational material. Finally, in (1d), the photo is an illustration that, while depicting the discussed person, does not aim at being relevant to the claim's veracity or impact. While understanding the relation between the image and the text is interesting from a scientific perspective, it is also a crucial prerequisite for an efficient and meaningful fact-checking response. For example, if a supposed proof is a compromised photo, the response should highlight this fact (e.g., the video in (1a) has been cropped, misrepresenting the quote, which should be highlighted in the fact-checking report). On the contrary, if a compromised photo is used as a mere illustration, the effective fact-checking report should focus on the textual claim per se.

Another important angle is the issue with the multimedia part. In our example, the video in (1a) is cropped. On the contrary, (1b) represents an authentic screenshot, yet it has been miscaptioned by the claim: older content, irrelevant to the current events/topics, has been repurposed.

The current paper focuses on these two aspects to analyze empirically the interplay between multimedia and textual components in fake news, as identified by PolitiFact. To this end, we reannotate the PolyFake dataset [1] with fine-grained labels reflecting multimedia aspects.

(a) Biden to teachers: "They're not somebody else's children. They're yours when you're in the classroom." (VIDEO)
(b) Now you know why there's suddenly "a formula shortage". The new age robber barrons have conveniently invested in some unholy breast milk made from human organs.
(c) In honor of #TaxDay, I remind you that Governor Evers wanted to increase your taxes by $1 billion just for heating your homes. Instead, Republicans cut your taxes by more than $2 billion.
(d) Italian football agent Mino Raiola has died after suffering from an illness. RIP

Figure 1: Different uses of layered/multimedia content

2. Related Work

While fact checking has been receiving an increasing amount of attention recently, both from the NLP and Vision communities, only very few studies focus on the interaction between different modalities.

A breakthrough approach by Vempala and Preoţiuc-Pietro [2] focuses on two dimensions of the relationship between text and image on Twitter: whether the text is represented in the image and whether the image adds extra content to the textual message. Cheema et al. [3] propose a dataset of multimodal tweets, annotated for visual relevancy and checkworthiness. Finally, Biamby et al. [4] propose a larger-scale dataset of multimodal tweets, where "falsified" claims have been added synthetically to address the image repurposing problem.

These studies have paved the way for evaluation campaigns and benchmarking resources, for example, [5]. Yet, these studies rely on rather straightforward annotation guidelines to reduce the per-claim cost. Moreover, the annotators are not professional fact-checkers: while they can assess some aspects of the compromised content, they still can get deceived by more challenging cases – after all, the manipulative content has been created on purpose to influence and bias the reader.

In a recent survey, Akhtar et al. [6] highlight the importance of an interdisciplinary approach to fact-checking, proposing a framework to model different axes of online manipulation and, most importantly, fusing textual and visual fact-checking; they also survey benchmarks and models developed by the respective communities. Our study is built upon the same motivation – and our main goal is to study empirically the interplay between different modalities, based on real-world (i.e., not simulated or synthesized) fake data.

Our study aims at an in-depth exploratory analysis of multimodal online content. To this end, we focus on more specific labels to describe the relationship between different layers/modalities. We extend the scope of our study to cover all three major platforms (Facebook, Instagram and Twitter). Moreover, our input is not only the claim per se, but the professionally created fact-checking report from PolitiFact. In our experience, PolitiFact reports contain a wealth of information about online manipulation: as opposed to the 2-3 binary labels of common NLP fact-checking benchmarks, PolitiFact characterizes each claim with 1-3 pages of analytics. This analytics, however, comes in a free textual form. While it might still be impossible for the NLP community to encode these reports for building high-quality fact-checking systems, we believe that we should at least learn from them to get better insights, stop trivializing the task and highlight understudied, yet impactful, subtasks.

3. Analyzing Multimedia Content

3.1. PolyFake

Our study is based on the PolyFake dataset [1] covering fake news from 2022, as analyzed by professional fact-checkers from the PolitiFact agency.² The current study is based on the first nine months of PolyFake (818 entries). Each entry has been re-assessed by two annotators, with further adjudication by the supervisor. The original PolyFake labels are binary and encode more generic properties of fake news (e.g. whether the reasoning is fallacious or whether the document triggers emotions). For the present study, we have designed and iteratively refined annotation guidelines for labelling multimedia aspects of manipulative content.

² PolyFake annotation guidelines cover a wide range of phenomena related to online manipulation: from fallacious/propaganda reasoning to emotive appeals, factual veracity etc. The current study aims at an in-depth analysis of a specific angle. The Appendix discusses the distribution of veracity labels across PolyFake documents.

The annotation process is based on consulting jointly not only the original content, but the PolitiFact report as well. This way we make use of the wealth of analytics provided by experienced professional fact-checkers, encoding it in more structured annotation labels.

PolyFake covers fakes from different social media (Twitter, Facebook, Instagram, TikTok, Threads and YouTube). Note that manipulative content often gets propagated across platforms through re-posts, sharing, linking or just copying. For example, a large proportion of Facebook videos originates from TikTok (in this case, PolitiFact typically analyzes the Facebook message, hence the low number of TikTok entries in the table). In the following study, we omit TikTok, YouTube and Telegram as largely underrepresented categories with rather straightforward patterns.

Layer      | Facebook    | Twitter    | Instagram  | TikTok    | YouTube  | Total
none       | 64 (12.7%)  | 80 (41.9%) | 4 (3.9%)   | -         | -        | 149 (18.2%)
video      | 195 (38.6%) | 25 (13.1%) | 40 (38.9%) | 11 (100%) | 6 (100%) | 277 (33.9%)
photo      | 92 (18.8%)  | 31 (16.2%) | 10 (9.7%)  | -         | -        | 133 (16.3%)
screenshot | 114 (22.5%) | 19 (9.9%)  | 45 (43.7%) | -         | -        | 178 (21.8%)
link       | 29 (5.7%)   | 15 (7.8%)  | -          | -         | -        | 44 (5.4%)
image      | 14 (2.8%)   | 6 (3.1%)   | 6 (5.8%)   | -         | -        | 26 (3.2%)
thread     | -           | 17 (8.9%)  | -          | -         | -        | 17 (2.1%)
total      | 506 (100%)  | 191 (100%) | 103 (100%) | 11 (100%) | 6 (100%) | 818 (100%)

Table 1
Types of layered content.

3.2. Multimedia and Layered Content

Layer Types. Table 1 shows the distribution of different media types for each platform. We have identified several types of layered content: parts of the message rendered together with the initial post. The most common ones are videos (including reels), photos and screenshots (typically, complex visual objects combining textual content with photos/images and referring the reader to a different source). We have also observed images (infographics, maps or drawings), links (this content is typically rendered with a photo/stillshot, yet it explicitly points to a different online location, for example, a promotion website) or threads (characteristic of Twitter, this type of layering helps to contextualize the message). On rare occasions, social media posts might contain more than
one extra layer (e.g., videos and photos).

Most importantly, only 18% of PolyFake documents are purely textual: adhering to the popular adage that a picture is worth a thousand words, manipulative content creators use visuals for a variety of purposes, from increasing outreach to improving credibility. Moreover, the prevalence of multimedia content is far more critical for Facebook and Instagram – the two platforms not typically addressed by NLP practitioners. This alone suggests that we need to pay much more attention to joint models and start with a deeper understanding of the relevant phenomena.

A large percentage of documents re-use or spread already existing information. This is true for screenshots (21% in total) and links (5%), but also for many videos – only very few videos represent original content. While there exist some studies on identifying previously fact-checked claims, they are restricted to textual content. We believe that a more complex multimodal approach would be beneficial here.

For presentation purposes, in what follows we merge the underrepresented categories link, image and thread with the roughly functionally similar major categories screenshot, photo and screenshot, respectively.

Layer Roles. Table 2 shows the different roles multimedia layers play in PolyFake documents.

role     | video      | photo+     | screenshot+
content  | 66 (23.8%) | 19 (12.0%) | 114 (48.1%)
anchor   | 62 (22.4%) | 46 (29.1%) | 16 (6.8%)
proof    | 86 (31.0%) | 36 (22.8%) | 39 (16.5%)
support  | 14 (5.1%)  | 4 (2.5%)   | 16 (6.8%)
paraphr. | 30 (10.8%) | 6 (3.8%)   | 23 (9.7%)
context  | 8 (2.9%)   | 3 (1.9%)   | 21 (8.9%)
illustr. | 1 (0.4%)   | 55 (34.8%) | 9 (3.8%)
action   | 3 (1.1%)   | 1 (0.6%)   | 14 (5.9%)
other    | 28 (10.1%) | -          | 2 (0.8%)
total    | 277        | 158        | 237

Table 2
Role of multimedia layers, per content type (photo+ includes photos and images; screenshot+ includes screenshots, links and threads/retweets); purely textual documents discarded.

We distinguish between the following roles: content (the essential part of the content is presented on the multimedia layer, whereas the textual layer just adds minor details or suggests opinions), proof (the multimedia layer offers a physical proof – cf. Example (1a)), support (the multimedia layer provides some material to support the claim, from a reputable source – cf. Example (1b)), paraphrase (the multimedia layer paraphrases the claim without adding any extra angle – cf. Example (1c)), context (while the textual claim is generally self-contained, it cannot be interpreted without the context given by the multimedia part, e.g., the claim contains pronouns and the image presents their referents), illustration (the multimedia layer shows some objects/persons mentioned in the claim without any connection to its semantics – cf. Example (1d)) and action (the multimedia layer suggests an appropriate reaction to the claim, for example, a scam website). Finally, a rather common role for videos and photos is anchor: in such cases, the textual claim is about the multimedia itself (for example, "the sharpest image of the sun ever recorded."); here, the multimedia is not compromised per se and the textual claim contains no falsehoods about the world, yet the combination might be very misleading.

In more than half of the documents, multimodal layers provide essential content. This is true for all the media types (videos, photos and screenshots). We have observed several possible factors contributing to this effect: in general, social media users tend to repost existing "fancy" content and not create their own texts. Even in authentic self-created posts, the message is often put in a visual, whereas only some emotions are added in a text. We believe that there is a wide variety of potential reasons for this behaviour (e.g., videos and photos get more likes, whereas texts are mostly ignored by peers), requiring a more specialized study.

Almost one third of multimedia layers, especially videos, supposedly present proofs. Such compromised proofs are out of reach for modern evidence-based automatic fact-checking: while a fact-checking model can provide extensive evidence to refute a claim, the user would still trust the video/photo and not the model. Human fact-checkers address such proofs from a different, more promising, perspective: they try to explicitly attack and debunk the proof. We believe that this is a very important and largely unaddressed research direction.

Issues with multimedia layers. Finally, we have identified the most common unscrupulous techniques relevant for multimedia layers. Those include: crop (essential part(s) of the original message are omitted to render it out of context – cf. Example (1a)); miscaption (while the image/video is authentic, the textual claim misleads w.r.t. some crucial details, e.g. events or timeline – cf. Example (1b)); altered/fake (the image/video has been altered – beyond cropping – with specialized software, including deep fakes); misperception (the image/video is – deliberately or not – deceiving because of its low quality, unclear angle, optical effects etc); noproof (the – typically long – video does not contain any components relevant for the claim); falsehood (the video/image is authentic, yet its content is untrue – i.e., the textual claim spreads the original fake generated by the video/image); and explain (the textual part explains – misleadingly – what we are supposed to see in the video, often of rather low quality).

Table 3 summarizes the distribution of problematic issues across the three main multimedia types, showing several trends.

Issue         | video      | photo+     | screenshot+
falsehood     | 93 (33.6%) | 16 (10.1%) | 130 (54.9%)
crop          | 12 (4.3%)  | -          | 1 (0.4%)
miscaption    | 60 (21.7%) | 47 (29.7%) | 15 (6.3%)
altered/fake  | 17 (6.1%)  | 15 (9.5%)  | 29 (12.2%)
misperception | 7 (2.5%)   | 5 (3.2%)   | -
noproof       | 27 (9.7%)  | 3 (1.9%)   | 5 (2.1%)
explain       | 26 (9.4%)  | 6 (3.8%)   | 12 (5.1%)
none          | 13 (4.7%)  | 58 (36.7%) | 43 (18.1%)
total         | 277        | 158        | 237

Table 3
Types of manipulative content for different multimedia layers.

First, video layers provide more possibilities for unscrupulous content generators: cropped, otherwise altered or low quality videos are pervasive in manipulative content. While most of the research focuses on images, they do not exhibit such a variety of manipulative strategies. Screenshots – authentic or fake – are largely used to disseminate falsehoods. At the same time, an increasing amount of authentic videos, mostly originating from TikTok, is created to spread falsehoods and promote "critical thinking" (i.e., conspiracy theories as opposed to rational argumentation). These remain largely understudied, despite their large impact on the audience. Another rather unstudied area is explanatory claims: authentic videos/photos accompanied by misleading explanations of what we see and what it means; in such cases, the factual component might be non-compromised, yet the biased explanation makes the whole message an impactful and hard-to-debunk propaganda tool. Finally, unlike videos and screenshots, most photos represent true authentic information – the textual claims either rely on them as illustrations or use them as building blocks to support fallacious argumentation.

4. Conclusion

We have presented an in-depth analysis of the interaction between textual and multimedia components of compromised social media documents. We have identified several high-impact issues, insufficiently studied by the community at the moment. These include the interaction between different modalities, the role of the multimedia part and its impact on selecting a successful fact-checking strategy, the differences between platforms and media types (current NLP studies predominantly focus on Twitter and images) and the importance of a more principled approach to content re-use. We hope that this study, motivated by human fact-checking expertise, can spark a meaningful discussion and improve automatic modeling.

Acknowledgments

We thank the Autonomous Province of Trento for the financial support of our project via the AI@TN initiative.

References

[1] Anonymous, PolyFake: Fine-grained multi-perspective annotation of fact-checking reports, in: Accepted for publication, 2024.
[2] A. Vempala, D. Preoţiuc-Pietro, Categorizing and inferring the relationship between the text and image of Twitter posts, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 2830–2840. URL: https://aclanthology.org/P19-1272. doi:10.18653/v1/P19-1272.
[3] G. S. Cheema, S. Hakimov, A. Sittar, E. Müller-Budack, C. Otto, R. Ewerth, MM-Claims: A dataset for multimodal claim detection in social media, in: Findings of the Association for Computational Linguistics: NAACL 2022, Association for Computational Linguistics, Seattle, United States, 2022, pp. 962–979. URL: https://aclanthology.org/2022.findings-naacl.72. doi:10.18653/v1/2022.findings-naacl.72.
[4] G. Biamby, G. Luo, T. Darrell, A. Rohrbach, Twitter-COMMs: Detecting climate, COVID, and military multimodal misinformation, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 1530–1549. URL: https://aclanthology.org/2022.naacl-main.110. doi:10.18653/v1/2022.naacl-main.110.
[5] A. Bondielli, P. Dell'Oglio, A. Lenci, F. Marcelloni, L. Passaro, Dataset for multimodal fake news detection and verification tasks, Data in Brief 54 (2024) 110440. URL: https://www.sciencedirect.com/science/article/pii/S2352340924004098. doi:10.1016/j.dib.2024.110440.
[6] M. Akhtar, M. Schlichtkrull, Z. Guo, O. Cocarascu, E. Simperl, A. Vlachos, Multimodal automated fact-checking: A survey, 2023. arXiv:2305.13507.

A. True vs. Fake content and multimedia layers

Our dataset by construction contains mostly untrue claims: even though PolitiFact occasionally fact-checks statements that turn out to be true, most of their materials are "false", "mostly false" or even "pants on fire". Moreover, even true claims often exhibit signs of user manipulation. In this appendix, we show statistics for fake vs. true content in PolitiFact reports (Table 4).

FC label      | Facebook    | Twitter    | Instagram  | TikTok    | YouTube   | Total
pants-on-fire | 95 (18.6%)  | 18 (9.4%)  | 29 (28.2%) | 2 (18.2%) | 2 (33.3%) | 146
false         | 353 (69.8%) | 97 (50.8%) | 64 (62.1%) | 9 (81.8%) | 2 (33.3%) | 526
mostly false  | 34 (6.7%)   | 36 (18.8%) | 6 (5.8%)   | -         | 1 (16.7%) | 77
half true     | 17 (3.3%)   | 18 (9.4%)  | -          | -         | 1 (16.7%) | 36
mostly true   | 6 (1.2%)    | 11 (5.7%)  | 3 (2.9%)   | -         | -         | 20
true          | 1 (0.2%)    | 10 (5.2%)  | 1 (1.0%)   | -         | -         | 12
total         | 506 (100%)  | 191 (100%) | 103 (100%) | 11 (100%) | 6 (100%)  | 818

Table 4
Manipulative content on social media fact-checked (FC) and reported by PolitiFact (Jan-Sept 2022).
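The verdict distributions discussed in this appendix can be cross-checked with a few lines of code. The snippet below is an illustrative sketch only: the dictionary transcribes the Facebook column of Table 4, and `share` is a hypothetical helper we introduce for the example, not part of any released PolyFake tooling.

```python
# Facebook column of Table 4: PolitiFact verdicts for 506 documents.
# Counts transcribed from the paper; illustrative sketch, not the
# released PolyFake data format.
facebook_verdicts = {
    "pants-on-fire": 95,
    "false": 353,
    "mostly false": 34,
    "half true": 17,
    "mostly true": 6,
    "true": 1,
}

def share(labels, counts):
    """Percentage of documents carrying any of the given verdict labels."""
    total = sum(counts.values())
    return 100.0 * sum(counts[label] for label in labels) / total

# Share of clearly untrue content, cf. the appendix discussion.
untrue = share(["pants-on-fire", "false", "mostly false"], facebook_verdicts)
print(f"untrue content on Facebook: {untrue:.1f}%")  # untrue content on Facebook: 95.3%
```

Recomputing the shares this way also makes it easy to spot small rounding inconsistencies in manually typeset tables.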