=Paper= {{Paper |id=Vol-3052/paper11 |storemode=property |title=PROVENANCE: An Intermediary-Free Solution for Digital Content Verification |pdfUrl=https://ceur-ws.org/Vol-3052/paper11.pdf |volume=Vol-3052 |authors=Bilal Yousuf,,M. Atif Qureshi,,Brendan Spillane,,Gary Munnelly,,Oisin Carroll,,Matthew Runswick,,Kirsty Park,,Eileen Culloty,,Owen Conlan,,Jane Suiter |dblpUrl=https://dblp.org/rec/conf/cikm/YousufQSMCRPCS21 }} ==PROVENANCE: An Intermediary-Free Solution for Digital Content Verification== https://ceur-ws.org/Vol-3052/paper11.pdf
PROVENANCE: An Intermediary-Free Solution for Digital
Content Verification
Bilal Yousuf¹,², M. Atif Qureshi¹,², Brendan Spillane¹, Gary Munnelly¹, Oisin Carroll¹,
Matthew Runswick¹, Kirsty Park³, Eileen Culloty³, Owen Conlan¹ and Jane Suiter³

¹ ADAPT Centre, Trinity College Dublin
² ADAPT Centre, Technological University Dublin
³ Institute for Future Media, Democracy and Society, Dublin City University


Abstract
The threat posed by misinformation and disinformation is one of the defining challenges of the 21st century. Provenance is designed to help combat this threat by warning users when the content they are looking at may be misinformation or disinformation. It is also designed to improve media literacy among its users and, ultimately, to reduce susceptibility to the threat among vulnerable groups within society. The Provenance browser plugin checks the content that users see on the Internet and social media and provides warnings in their browser or social media feed. Unlike similar plugins, which require human experts to provide evaluations and can only provide simple binary warnings, Provenance's state-of-the-art technology requires no human input; it analyses seven aspects of the content users see and provides warnings where necessary.

Keywords
Misinformation, Disinformation, Fake News, Social Media, Plugin, Browser Extension



Fourth Workshop On Knowledge-Driven Analytics And Systems Impacting Human Quality Of Life (KDAH-CIKM-2021), November 01–05, 2021, Gold Coast, Queensland, Australia

bilal.yousuf@adaptcentre.ie (B. Yousuf); muhammad.qureshi@adaptcentre.ie (M. A. Qureshi); brendan.spillane@adaptcentre.ie (B. Spillane); gary.munnelly@adaptcentre.ie (G. Munnelly); oisin.carroll@adaptcentre.ie (O. Carroll); matthew.runswick@adaptcentre.ie (M. Runswick); kirsty.park@dcu.ie (K. Park); eileen.culloty@dcu.ie (E. Culloty); owen.conlan@scss.tcd.ie (O. Conlan); jane.suiter@dcu.ie (J. Suiter)

ORCID: 0000-0001-6024-9084 (B. Yousuf); 0000-0003-4413-4476 (M. A. Qureshi); 0000-0001-5893-1340 (B. Spillane); 0000-0002-7757-6142 (G. Munnelly); 0000-0001-9398-9388 (O. Carroll); 0000-0002-0848-931X (M. Runswick); 0000-0001-7960-8462 (E. Culloty); 0000-0002-9054-9747 (O. Conlan); 0000-0002-2747-8069 (J. Suiter)

© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Provenance is an intermediary-free solution for digital content verification to combat misinformation and disinformation on the Internet and social media. As per [1], it is designed to aid users by providing them with warning notifications in their browser or social media feed when viewing content that may be dangerous or problematic. The detailed warning notifications inform users which of the seven criteria Provenance's state-of-the-art technology has detected an issue with, and why. It significantly improves upon all known similar solutions in two ways. Firstly, existing solutions do not analyse the content the user is viewing and are thus limited to providing users with warnings based on a news agency's historical publication record and behaviour. Secondly, existing browser plugins only provide a single broad-spectrum warning about the content users are viewing, whereas Provenance is capable of evaluating content under seven criteria and providing individual warnings for each. Provenance's warning notifications are also educational and designed to inspire users to be more cautious and critical of the information they consume. Thus, it will improve media literacy among users and make them less susceptible to the influence of misinformation and disinformation by making them more critical and reflective of the content they consume.

There are significant research challenges in the design and development of Provenance. The main challenges include the huge volume of news and other content published each day, the combination of multimedia formats in each article or story, the high churn rate and short shelf-life of news, and the fact that news content is often republished from wire services or from other publishers. These are compounded by the fact that misinformation and disinformation are often designed to masquerade as real news. Many disinformation sources share characteristics with the Lernaean Hydra of Greek mythology: they re-post problematic content through multiple easy-to-set-up websites or social media groups and reappear under different guises when they are identified and shut down.

There are also a range of individual challenges within components of the Provenance platform. These include deriving a system to assign accurate writing quality scores for each piece of textual content, detecting when new facts introduced in a news article are indicative of disinformation or an evolution in an unfolding story, detecting image and video manipulations, or developing
a system that can differentiate between anger and fear in disinformation and anger and fear in opinion news articles. There is also some difficulty in differentiating between news articles from alternative and independent agencies and news articles from disinformation sources, due to the often lower quality of writing, more emotive content, and the reuse of images and videos.

This paper provides an update on the ongoing progress of developing Provenance. The remainder of this paper is organised as follows. Section 2 Motivation and Background delves into the impetus for this project and situates it within other recent EU disinformation projects. Section 3 Related Work provides a detailed overview of similar browser plugins and describes how Provenance advances the state of the art. Section 4 Architecture Overview contains system architecture diagrams and descriptions of each component in the Provenance platform. Section 5 Provenance in Action provides a detailed explanation of how the Provenance browser plugin provides warnings to the user. Section 6 Use Cases presents two use cases for the Provenance plugin to show in what scenarios we envision it being used. Section 7 Evaluation briefly describes plans to evaluate the tool. Finally, Section 8 Conclusions completes the paper with closing remarks.

2. Motivation and Background

The proliferation of misinformation and disinformation on social media has been described as a strategic threat to democracy and society in the European Union (EU) [2, 3]. A recent EU study on the issue found that the common narratives of society "are being splintered by filter bubbles, and further ruined by micro-targeting." [4]. The report points out that, like a virus, misinformation and disinformation spread throughout society through social media and other platforms, in open and closed groups, to the detriment of democratic systems. This occurs when "Susceptible users become weaponized as instruments for disseminating disinformation and propaganda" [4].

The Presidents of the European Council, Commission and Parliament have all made increasingly public calls for concerted efforts to do more to combat the scourge of fake news to protect democracy. The President of the European Parliament has been the most forthright in this, with a recent announcement that: "We must nurture our democracy & defend our institutions against the corrosive power of hate speech, disinformation, fake news & incitement to violence." [5]. As a result, the EU has funded a range of FP7, H2020 and other projects to combat misinformation and disinformation, including WeVerify [6, 7], SocialTruth [8], PHEME [9, 10], EUNOMIA [11], Fandango [12, 13] and the European Digital Media Observatory (EDMO) [14]. Many other international organisations have also identified misinformation and disinformation as a threat and have increased efforts to combat it. These include the United Nations, through its Verified platform [15], and the World Health Organisation [16]. More can be read about these initiatives in the Poynter Institute's guide to national and international efforts to combat misinformation and disinformation around the world [17].

Provenance is a H2020 project¹; however, it differs from many of the above as it is a user-orientated, intermediary-free solution to help consumers identify misinformation and disinformation as they browse the Internet and social media. It is also designed to improve media literacy skills by equipping consumers with the tools, knowledge and know-how to face this challenge now and into the future.

¹ https://cordis.europa.eu/project/id/825227

3. Related Work

This review of related work will focus on comparable browser plugins which are designed to provide users with warning notifications about disinformation or other problematic content and which are currently active or maintained. The purpose of this review is to establish how Provenance advances the state of the art.

NewsGuard [18] provides 'nutrition' labels for news websites based on nine journalistic criteria. What differentiates it from many of the other fake news and bias detection browser plugins is that it does not use automated algorithms to assess news websites but rather relies on a team of journalists to conduct reviews. It comes as standard with Microsoft Edge, but a subscription is needed for other Internet browsers. Its notification icons appear as a browser extension in the upper right corner and within third-party search engines and social media platforms. Clicking on its browser icon opens a nutrition label pane where users can quickly see whether the news website passes or fails any of the nine criteria. A link is also available for users to see a more detailed report. Visually, NewsGuard employs simple but effective iconography, a white ✓ on a green shield and a red ✗, to denote when a website has passed or failed. NewsGuard's transparent methodology has resulted in their datasets being used for research [19]. While expert-led analysis has its merits, it also has issues with scalability, personal biases, and response times. Aker also maintains that much of the credibility and transparency scoring provided by NewsGuard could be automated [20].

Décodex [21], created by Le Monde, originally started as an online search facility for users to check URLs against a list of known websites which spread misinformation and disinformation. They have since released a Facebook bot for users to directly chat to and a browser plugin that provides red, orange or blue notifications to
denote whether a website regularly disseminates false information, whether its reliability is doubtful, or whether it is a parody website. When installed, the Décodex icon becomes active when the website being viewed is listed in their database. It also produces a colour-coded popup with one of three standard warnings. Users cannot access detailed information about warnings, nor does it appear to be integrated with well-known search engines, social media platforms or discussion boards. Décodex's allow/deny list approach means that scalability is difficult, and the warnings it provides are based on the historical publication record of the website, not the content currently being viewed. Transparency is also limited. While still available, its development appears to be in stasis.

Media Bias Fact Check (MBFC)² [22] is an extensive media bias resource curated by a small team of journalists and lay researchers who have undertaken detailed assessments of over 4000 media outlets. A transparent assessment methodology means that their datasets have been used for several research projects [23, 20]. Their team of researchers undertake in-depth analyses of news organisations and assess them using a standardised methodology, with some subjective judgement, to calculate a left/right bias score using their published formula. They also calculate scores for factual reporting and credibility. These reports are published on their website and updated from time to time. Each news website in their database is categorised as: left bias, left-centre bias, least biased, right-centre bias, right bias, pro-science, conspiracy-pseudoscience, fake news, or satire. While their browser extension conveys limited details, further information about each news source is available on their website. It draws on this dataset to inform users, when they click on the notification icon, as to which of these nine categories the news website they are viewing belongs to, including a brief explanation of the category. It also provides a link to the detailed MBFC report. The browser extension also provides Facebook and Twitter support by displaying a visual left/right bias scale on news articles that appear in users' feeds, with links to the MBFC detailed report and Factual Search³ so that the user can investigate the topic further. While a valuable resource with considerable detail, MBFC's expert evaluations are based on the historical publication record of the news website and not an evaluation of the content the user is looking at. It is also a labour-intensive and time-consuming process.

Stopaganda Plus⁴ [24] is a browser extension that adds accuracy and bias decals to Facebook, Twitter, Reddit, DuckDuckGo and Google. These visual indicators extend the functionality of MBFC (who determine the scores) to these common information portals so that users may more easily choose high-quality information resources. It should be noted that this extension is not designed to provide users with detailed warning notifications when viewing a news website and thus is not directly comparable to the other systems or Provenance. It is included here due to its use of MBFC, the fact that it conveys limited visual information/warnings before the user visits an information source, and for completeness.

3.1. No Longer Active

Many other projects and services related to this work, which have been reviewed in the literature, cf. [25, 26, 27, 11, 28, 29, 30], now no longer appear to be active or working. This is concerning because, despite the fact that misinformation and disinformation have been recognised as a threat to democracy and social cohesion, and the fact that browser plugins are one of the few citizen-orientated direct interventions which can help solve the problem at source while increasing long-term media literacy, very few of the proposed solutions have been actively promoted or maintained. The main reason for this appears to be that many of these plugins were developed by individuals or small teams, or even as part of a hackathon, and thus lacked the resources to be actively maintained or updated to deal with changing technology, such as browser updates, or the rapidly evolving threats posed by misinformation and disinformation. The following presents those related projects found in the literature which now no longer appear to be actively maintained, though some are still available to install. URLs have been included for posterity where possible, as many do not have peer-reviewed publications.

B.S. Detector⁵ relied on matching the URLs of content in the news feed to a known allow/deny list of sources of fake news and misinformation.

AreYouFakeNews.com⁶ utilised Natural Language Processing (NLP) and deep learning to identify patterns of bias on websites.

Fake News Detector AI⁷ claimed to use a neural network to detect similarity between submitted URLs and known fake news websites.

Fake News Detector⁸ was designed to learn from webpages flagged by users to detect other similar fake news webpages.

Trusted News⁹ was a browser plugin designed to assess the objectivity of news articles. Its functionality was limited to 'long form' news articles and it does not work with social media content.

² https://mediabiasfactcheck.com/
³ https://factualsearch.news
⁴ https://browserextension.dev/blog/stopagandaplus-helps-understanding-media-biases/
⁵ https://www.producthunt.com/posts/b-s-detector
⁶ https://github.com/N2ITN/are-you-fake-news
⁷ https://www.fakenewsai.com/
⁸ https://fakenewsdetector.org/
⁹ https://trusted-news.com/
Fake News Guard¹⁰ claimed to combine linguistic and network analysis techniques to identify fake news; however, this can no longer be verified.

FiB¹¹ was a browser extension built in a hackathon which was reviewed several times in the literature as a comparable system [31].

TrustedNews¹² used AI to help users evaluate news articles by scoring their objectivity [32]. However, it does not work on social media and has issues with analysing webpages that require scrolling.

Trusty Tweet [26] was designed to help users deal with fake news tweets and to increase media literacy. Its transparent approach is designed to prevent reactance and increase trust. Early user evaluations showed promise.

Check-It [33] was designed to analyse a range of signals to identify fake news. It was focused on user privacy, with computation undertaken locally. Its approach used a combination of linguistic models, fact checking, and website and social media user allow/deny lists.

¹⁰ http://fakenewsguard.com/
¹¹ https://projectfib.azurewebsites.net/
¹² https://trusted-news.com/

3.2. Out of Scope Approaches

Some misinformation and disinformation detection tools which have been reviewed in other papers have not been included in this literature review. This is because they are not a browser plugin or are a paid-for B2B service (Fakebox [34]; AreYouFakeNews [35]), they are focused on an aligned but separate issue, e.g., detection of bias or detection of reused and/or manipulated images (Ground.News [36]; SurfSafe [37]), they are specifically for fact checking (BRENDA [38], CredEye [39]), they have pivoted into a B2B platform (FightHoax [40]), they are not user orientated (Credible News [41, 42]), or they are research systems that have not been made available to the public [30, 43]. While relevant to combating disinformation, these are not directly comparable to Provenance.

3.3. Advancing the State of the Art

This review demonstrates that browser plugins are a common user-orientated approach to combating misinformation and disinformation. However, Provenance adopts a significantly more advanced and granular methodology than current or previous efforts in the domain. The warnings provided by earlier plugins are often based on the news website's history of publishing misinformation and disinformation. Thus, they are limited to providing a coarse-grained retrospective analysis of the news website's publication history. In contrast, Provenance's fine-grained approach is designed to analyse the content of the news webpage or users' social media feeds and, where necessary, provide an easy-to-understand warning to the user when the content they are viewing may be problematic or symptomatic of disinformation. In the cases where linguistic analysis or other machine learning approaches have been utilised, the results are not presented to the user in an explainable or transparent way. Some of these methods have also proven susceptible to adversarial attacks, whereby text may be augmented slightly to fool pretrained models [44, 45].

Two factors differentiating Provenance from the plugins described above are their limited reach and scalability. Many of the above plugins do not provide any information for some heavily trafficked news websites such as the LA Times, Al Jazeera, and the Independent.co.uk. This is likely due to the limiting factors of time and labour involved in including humans in the disinformation judgement process. While no one doubts the benefits of highly trained expert judgement, the size and nature of the rapidly evolving media landscape, especially in regard to misinformation and disinformation, in which publishers are prone to rapid growth, failure and re-branding, means that providing human ratings is a never-ending game of whack-a-mole. Current solutions are only partially succeeding in providing judgements of some news agencies. None have attempted to analyse the millions of pieces of content they publish daily. Unlike each of the plugins described above, Provenance does not require a human-in-the-loop, nor does it need to be backed by human-generated allow/deny lists. Its architecture supports fully automated and intermediary-free analysis of news content.

The ability to evaluate news articles against seven criteria and provide users with visual notifications and deeper explanations is also a significant advancement on the state of the art and a direct benefit to users in three ways. First, and most importantly, users will be made aware of individual issues with the content they are consuming and can thus decide whether they will continue viewing it or look for alternative sources. Second, it will help develop users' media literacy skills by making them aware of the different caution-worthy indicators and how to check them, making them less susceptible to misinformation and disinformation in the future. Third, the closed nature of the systems described above means that they cannot be properly examined. In contrast, a full description of Provenance's system architecture is provided below. It is also currently undergoing evaluation and testing, and the results will be published in time.

4. Architecture Overview

The system architecture for Provenance is shown in Figure 1. The components and services use REST APIs serving JSON for easy, reliable, and fast data exchanges across internal subsystems.
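The paper does not publish the Provenance API, but the REST/JSON exchanges mentioned above could look roughly like the following plugin-side sketch. The endpoint path, field names, and signal labels here are illustrative assumptions, not the project's actual schema:

```python
import json
from urllib.parse import quote

# Hypothetical endpoint for the Provenance Query Service; the real
# service is internal to the platform and its API is not published.
QUERY_ENDPOINT = "https://example.org/provenance/api/v1/analysis"

def build_query_url(page_url: str) -> str:
    """URL the plugin might request to fetch the stored analysis
    of the page the user is currently viewing."""
    return f"{QUERY_ENDPOINT}?url={quote(page_url, safe='')}"

# An illustrative JSON payload: one verdict per analytical component
# (all field names are assumptions).
sample_response = json.loads("""
{
  "url": "https://news.example.com/story",
  "analysed": true,
  "signals": {
    "image_reverse_search": "ok",
    "image_manipulation": "warning",
    "text_similarity": "ok",
    "text_tone": "warning",
    "writing_quality": "ok"
  }
}
""")

def warnings_for(response: dict) -> list[str]:
    """Names of the criteria the plugin should warn the user about."""
    if not response.get("analysed"):
        return []
    return [name for name, verdict in response["signals"].items()
            if verdict == "warning"]
```

A plugin would issue one such GET request per page visit and render one notification per returned warning; authentication, caching and error handling are omitted from the sketch.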
Figure 1: Provenance System Architecture: Dashed lines denote REST API calls, solid lines denote local access.
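The orchestration that Figure 1 depicts, splitting an ingested item into assets and fanning them out to the analytical components, could be sketched roughly as follows. Asset kinds, the fingerprint format, and the component names are illustrative assumptions; in the real platform these are separate services reached over REST, not in-process functions:

```python
from dataclasses import dataclass

@dataclass
class Asset:
    kind: str              # "text", "image", or "video"
    payload: str           # the content, or a reference to it
    fingerprint: str = ""

def register_fingerprint(asset: Asset) -> Asset:
    """Stand-in for registration with the Asset Fingerprinter."""
    asset.fingerprint = f"{asset.kind}:{len(asset.payload):06d}"
    return asset

# Which analytical components apply to each asset kind (illustrative).
ANALYSERS = {
    "text": ["text_similarity", "text_tone", "writing_quality"],
    "image": ["image_reverse_search", "image_manipulation"],
    "video": ["video_reverse_search", "video_manipulation"],
}

def dispatch(assets: list[Asset]) -> dict[str, list[str]]:
    """Map each fingerprinted asset to the analysers it is routed to;
    a real handler would invoke the services and forward their combined
    output to the Knowledge Graph."""
    return {register_fingerprint(a).fingerprint: ANALYSERS[a.kind]
            for a in assets}
```

Keeping the routing table separate from the dispatch loop mirrors the loose coupling the Asset Workflow Handler is said to provide: components can be added or replaced without changing the orchestration logic.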


Data in the form of webpages or social media content is ingested by Provenance either through the Social Network Monitor or by a Trusted Content Analyst (e.g., a journalist or fact checker). The Social Network Monitor service discovers content using NewsWhip's¹³ social network monitoring platform. The introduced asset is enriched with social engagement data (e.g., likes and shares) and is forwarded to the Asset Workflow Handler service.

¹³ https://www.newswhip.com

The Asset Workflow Handler separates the incoming data (e.g., a news webpage) into individual assets such as images, video, and text. These assets are registered with the Asset Fingerprinter before being disseminated to the analytical components (Video/Image Reverse-searcher, Video/Image Manipulation Detector, Text Similarity Detector, Text Tone Detector, and Writing Quality Detector) to determine if they exhibit any features which normally characterise misleading, questionable, or unsubstantiated information. The output of each analytical service and the initial data passed from the Social Network Monitor are combined and sent to the Knowledge Graph, where they are stored.

The Knowledge Graph may be queried by the Provenance Query Service to retrieve the results of analysis for a given webpage. The Provenance plugin, installed in the user's browser, leverages this query service to retrieve information about webpages that a user is currently viewing. If the webpage has been analysed by Provenance and exhibits questionable features, the plugin will issue a warning to the user, indicating that they may want to further investigate the claims made in the article's content. The Personalised Companion Service is used to determine how this information should be presented for an individual user.

4.1. Key Components

4.1.1. Social Network Monitor

The Social Network Monitor communicates with NewsWhip's Social Network API to identify assets which should be ingested by Provenance. Finding assets involves querying NewsWhip's API with a parameterised search request. The call to NewsWhip's Social Network API is automatically invoked periodically to maintain an updated record of trending news articles and social media posts. Assets detected by NewsWhip are enriched through social scoring. The URL, titles, summaries, images and videos (if any), along with the enrichment data, are extracted from the article and provided to Provenance. Assets composed only of text, for example, are registered in fragments consisting of the news feed/article title, the summary, and user engagement data.

4.1.2. Asset Registration

A dedicated Asset Registration web interface also allows Trusted Content Analysts to add assets into the Asset Workflow Handler. Trusted Content Analysts are stakeholders such as journalists and other representatives of news agencies and wire services, fact checkers, debunkers, and original content creators who may want to register their multimedia content assets. In future, this facility will be made more widely available to allow the general public to send content directly to Provenance. It may also be integrated with news publication platforms and content management systems so that content is automatically added. The primary task of this component is to enable third parties to register assets that have not been discovered by the Social Network Monitor.

4.1.3. Asset Workflow Handler

The Asset Workflow Handler is the component of the Provenance Verification Layer that is responsible for orchestrating the components and data within the layer. This component's primary task is to distribute assets to different components for further processing. It invokes the service interfaces and handles the data flow between the services. By utilising the Asset Workflow Handler, components are loosely coupled, thus mitigating direct

search operation for videos and images.

4.1.5. Video/Image Manipulation Detector

The Provenance Video/Image Manipulation Detector identifies if an image or video has been manipulated in comparison to its source. This work is based on the PIZZARO¹⁴ project. It utilises recent developments in deep learning-based methods to enable instant detection of manipulations in visual content. In addition, the use of the latest technologies based on convolutional networks will lead to tangible enhancements in integrity verification for visual content. The Video/Image Manipulation Detector increases trust and improves governance. The solution is designed to build a web-based system to assess visual content in a real-world setting. The Video/Image Manipulation Detector will further support the development of users' skills in detecting false visual information themselves by providing world-class image forensic technology. The Video/Image Manipulation Detector has
component-to-component communications. This will en-           a special focus on developing a solution that will be intu-
able Provenance to work with the variety of APIs exposed       itive and easy to understand and interpret for end-users,
from the existing tools/components. Moreover, the APIs         thereby increasing its uptake by the public and its impact
can be adjusted to meet Provenance’s specific needs. Due       on the information system. This component’s primary
to this modular design, new components can be easily           task is to detect if the image and video are manipulated
added to the Provenance Verification Layer (e.g., detection    by comparing them with previously registered images
of bias [46], tabloidization [47], and hate speech [48]),      and videos in the system.
and connected to the Asset Workflow Handler.
                                                               4.1.6. Asset Fingerprinter and Asset Registry
4.1.4. Video/Image Reverse Searcher
                                                               The Asset Fingerprinter and Asset Registry provide trace-
The Video/Image Reverse Searcher is a key component            ability of registered content. It is based on Blockchain
for creating a large-scale annotated dataset for detect-       technology, making content immutable and enabling the
ing manipulated visual content. The dataset consists of        verification of the sources and alterations to the content.
three distinct parts. The first part includes 45,000 images,   Registered assets are handed to the Asset Fingerprinter
each captured by a unique device (i.e., 45,000 different       via the Asset Workflow Handler. Due to the General Data
cameras have been used). Half of these images are real,        Protection Regulation (GDPR) and the size of some assets,
and the other half has been digitally manipulated by ap-       the hash of the data is stored on Blockchain. Azure Stor-
plying a random image processing operation to a local          age is used as the Blockchain, and the assets themselves,
area of the image. Since the sensor pattern noise present      including large files, are stored using an off-line storage
in images is unique to each sensor (i.e., camera), this        service available to store multimedia files. Blockchain is
dataset introduces large diversity, such as noise. The         used due to its innate data integrity which is important
second part of the dataset uses imaging software in cam-       to prove the traceability of registered content if the tool
eras to introduce a large diversity of artefacts in images.    was ever targeted as part of a combined disinformation
Commonly available camera brands and models were               and hacking campaign. This component’s primary task
identified and used to collect a dataset of 50,000 images.     is the traceability of registered content via Blockchain.
Half of these images were digitally manipulated using
an advanced image editing method based on Generative           4.1.7. Text Similarity Detector
Adversarial Networks (GAN) [49]. Finally, the third part
of the dataset consists of 2,000 images downloaded from        News is regularly republished nationally and locally
the Internet representing “real-life” (uncontrolled) ma-       from international wire services such as Reuters, Agence
nipulated images created by random people. For all of          France-Presse (AFP) and Associated Press (AP). In a bid
the manipulated samples collected for the third part of        to lower costs, many news agencies who are not in com-
the data, the matching unmanipulated image was also            petition negotiate deals to republish each other’s content.
collected. This component’s primary task is to enable
                                                                  14
                                                                       http://zoi.utia.cas.cz/node/180/0459504
Similarly, less trustworthy news outlets often put 'spins' on existing articles, where correct articles are modified to contain false information.
   To combat this, the Text Similarity Detector in Provenance attempts to verify the textual content of an article by comparing it to similar articles published elsewhere. A backlog of trustworthy articles is stored in an Elasticsearch database with a BM25 similarity index [50]. As BM25 under-performs with very long documents [51], only the title and first 10 sentences are used in the index. Once similar articles have been found, the component searches for the facts given in the query article in the similar ones. Facts in an article are found by taking sentences with low subjectivity according to TextBlob's sentiment analysis model [52]. The similarity of two facts is the cosine similarity of the vector embeddings of both, which are provided by Google's multilingual text model [53]. If enough of the article's factual content cannot be verified, the plugin displays a warning.

4.1.8. Text Tone Detector

Intuitively, one would expect impartial news sources to use impartial, unemotive language to convey the facts of a story. Recent research has shown that emotions such as fear, anger, sadness and doubt, and the absence of joy and happiness, are indicative of misinformation and disinformation [54, 55, 56]. Provenance's Text Tone Detector is designed to identify emotions in text which may indicate that the news source is unreliable. Threshold values are used to determine whether caution should be shown, and the degree of caution is determined by how far the calculated value deviates from the threshold value.

4.1.9. Writing Quality Detector

Provenance's Writing Quality Detector computes a writing quality score (WQS) for the textual content the user is viewing and provides a warning when it falls below a threshold value. Writing quality is closely related to cohesion and coherence [57]. Within the context of news, high quality writing is indicative of paid professional journalism from mainstream, independent and, to a lesser degree, alternative news agencies, whereas low quality writing is indicative of amateur or unprofessional news production processes [58]. This high/low quality differentiation is also apparent in other domains such as academia, publishing, commerce, and blogs and information websites. While NLP techniques exist to derive writing quality [59], and others have called for it to be used to identify misinformation and disinformation [60, 61], only two examples of systems could be found in the literature which actually calculate writing quality [62, 63].
   To calculate WQSs for Provenance, a dataset of news articles, blog posts, and other website content, much of which had characteristics symptomatic of disinformation, was annotated in a crowdsourced study to identify terms and phrases indicative of low quality writing. A WQS for each piece of content was then derived using a standard formula. This was subject to testing and expert evaluation to ensure the WQS the formula produced accurately reflected each piece of content. Models were then trained on the dataset, which showed that the WQS could be automatically generated with a high degree of accuracy. These models and the overall process are currently undergoing formal evaluation.

4.1.10. Knowledge Graph and Knowledge Graph Builder

The Provenance Knowledge Graph stores a record of all the articles introduced to Provenance via the Social Network Monitor service or via Asset Registration by a Trusted Content Analyst. It is also a record of all analysis performed on those assets.
   The content is organised according to concepts, categories and topics. For example, a news article discussing politics can be categorised according to the left/right political spectrum, followed by the topics discussed, as shown in Figure 2. Each node at the article level is split according to text, image and video.
   The output of the Video/Image Reverse Searcher includes the N most similar images/videos, distance measures and geometric validation results. The data from the Video/Image Manipulation Detector includes the probability of manipulations and the area of polygons. These are sent as JSON objects to the Knowledge Graph, where they are stored as entities in a triplestore.
   Modelling of Provenance data is achieved using a combination of the RDF Data Cube vocabulary [64], to store statistical information such as the outputs from the various analytical components, and the Dublin Core/BIBO vocabularies [65], to model bibliographic information about the assets themselves. Some use is also made of the FOAF15 vocabulary to model information such as content publishers, which are naturally represented as foaf:Agent entities.
   The Knowledge Graph Builder is responsible for exposing a REST API, which the Asset Workflow Handler may use to upload assets as JSON, and for transforming the JSON into triples which are stored in a triplestore. In Provenance, this is achieved using JOPA [66], a Java library which can be used to map POJOs to triples. Using Spring Boot16, a REST API accepting JSON is exposed. The uploaded JSON is deserialized into POJOs using Spring Boot's built-in version of Jackson. JOPA is then used to serialize the triples out to an RDF4J17 instance.

   15 http://xmlns.com/foaf/spec/
   16 https://spring.io/projects/spring-boot
   17 https://rdf4j.org/
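The JSON-to-triples step performed by the Knowledge Graph Builder can be illustrated with a short sketch. In Provenance this is done in Java with JOPA and Spring Boot; the Python below is only a simplified stand-in for the general idea, and its namespace URI, field names, and helper functions are hypothetical, not the project's actual schema.

```python
import json

# Hypothetical namespace; the real schema uses the RDF Data Cube
# and Dublin Core/BIBO vocabularies.
NS = "http://example.org/provenance/"

def asset_to_triples(payload):
    """Map an uploaded JSON asset to (subject, predicate, object) triples."""
    asset = json.loads(payload)
    subject = NS + "asset/" + str(asset["id"])
    return [(subject, NS + field, str(value))
            for field, value in asset.items() if field != "id"]

def to_ntriples(triples):
    """Serialise the triples as N-Triples lines (string literals only)."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)

payload = json.dumps({"id": 42, "title": "Example article", "tone_score": 0.7})
print(to_ntriples(asset_to_triples(payload)))
```

A real builder would also type the literals and handle nested objects; the sketch shows only the direction of the mapping from an uploaded JSON asset to triplestore statements.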
Figure 2: Knowledge Graph categorisations of assets.

   The same serialization process works in reverse, allowing the Provenance Query Service to expose both a JSON REST endpoint, which produces JSON objects from the results of canned SPARQL queries via a Spring Boot REST endpoint, and a much lower level raw SPARQL endpoint on the triplestore, for those who want a high level of control over their queries.

4.1.11. Provenance Query Service

The Provenance Query Service is the interface to the Verification Layer and offers external trusted services the means to request verification information about a webpage or article. It will also give trusted services a means to identify the relatedness of content (through similarity and the Knowledge Graph) and to determine whether content has been modified. As the results of all analysis are stored in the Knowledge Graph, the Provenance Query Service is effectively a proxy between the user-facing front-end and the query interface to whatever storage medium is used to implement the Knowledge Graph.
   As mentioned in Section 4.1.10, the Provenance Query Service exposes both a raw SPARQL endpoint and a REST API which provides endpoints for a number of canned SPARQL queries that return JSON objects. It is envisioned that the vast majority of use cases will be covered by the REST API, making it easier for developers to access data that is helpful to users. However, it is worthwhile to allow lower level access to the Knowledge Graph's contents in the event of unforeseen requirements being placed on it.

4.1.12. Personalised Companion Service

The Personalised Companion Service manages the Provenance verification indicator, the minimal user model, and user scrutability and control. The verification indicator is implemented as a Chrome extension and works on the Facebook and Twitter platforms and with articles published by news agencies. The Personalised Companion Service uses the user's interests, domain knowledge, digital literacy, and the warning preferences stored in the Minimal User Model to determine whether to highlight caution or to show the verification indicator without caution. It uses the data provided by the Asset Fingerprinter, the Video/Image Reverse Searcher and Video/Image Manipulation Detector, and the Text Similarity, Tone and Writing Quality Detector components to create the set of icons that are presented to users, who can explore the levels of verification presented through this visual iconography.


5. Provenance in Action

The Provenance browser plugin is designed to provide users with easy to understand, granular and cautionary warnings about the content they are consuming. These warnings are provided via an in-browser icon beside the address bar when the user is browsing the Internet, or within their Facebook and Twitter social media feeds beside the content they are viewing. Figures 3-6 show how Provenance and its visual warnings appear, within their Facebook feed, to a user who has the Provenance plugin installed. The Provenance icon appears as a small blue square with a white P above each content item that it has checked. When the icon background turns red (with a small exclamation mark), it indicates to the user that the content item is worthy of a cautionary warning. The following presents the four main states of Provenance which a user will see.
   Figure 3 shows the Facebook feed of a user who has the Provenance browser plugin installed. The Provenance icon is visible at the top of each news article in the user's feed. In this image, the icon is blue, which indicates that there are no warnings for this particular news item.
   In Figure 4, the background of the Provenance icon within the user's news feed has turned red to indicate that this news item is worthy of one or more cautionary warnings. A small black exclamation mark has been added to the top right of the icon for colour blind users.
   In Figure 5, the user has clicked on the red Provenance icon. A window has appeared beneath the Provenance icon to show the user which of the seven criteria the news article was checked against, and with which Provenance has detected an issue. In this example, the red background and exclamation mark beneath the Writing Quality icon indicate that this aspect of the news article is worthy of caution. The user may click on the downward arrow beneath each icon for further information. In this example, the Tone icon is greyed out, indicating that this criterion could not be assessed by Provenance in this instance.
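The blue and red icon states described above can be summarised in a small sketch. This is not the plugin's actual logic; it is an illustrative rule which assumes, hypothetically, that each of the seven criteria reports "ok", "warn" or "unavailable" once the analysis results have been retrieved from the Provenance Query Service.

```python
# Illustrative only: criterion names and result values are assumed,
# not taken from the actual plugin.
def icon_state(criteria):
    """Derive the overall icon state for one checked content item."""
    results = list(criteria.values())
    if any(r == "warn" for r in results):
        return "red"        # at least one criterion is worthy of caution
    if results and all(r == "unavailable" for r in results):
        return "grey"       # nothing could be assessed
    return "blue"           # checked, no warnings

checks = {"writing_quality": "warn", "tone": "unavailable",
          "similarity": "ok", "fingerprint": "ok"}
print(icon_state(checks))  # -> red
```

In the real plugin, the Personalised Companion Service would also consult the warning preferences in the Minimal User Model before deciding whether to highlight caution at all.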
Figure 3: A user's Facebook feed showing the Provenance icon in blue, indicating that there are no warnings.

Figure 4: The Provenance icon in red (with exclamation mark), indicating that this article has one or more issues which are worthy of caution.

Figure 5: An initial explanation pane appears when the user clicks on the Provenance icon in their social media feed.

Figure 6: A detailed explanation pane appears when the user clicks on any of the seven categories under which Provenance analyses the news item.

   Figure 6 shows a detailed explanation of the Writing Quality warning after the user clicked on the option to expand it. It contains further information about how the Writing Quality score is calculated and why low quality writing is indicative of misinformation and disinformation.


6. Use Cases: Provenance Plugin

6.1. Social Media Timeline

On the recommendation of a friend, Mary installed the Provenance browser plugin due to increased concerns
about the spread of misinformation and disinformation. The instructional video on the Provenance Chrome Extension webpage explained that Provenance uses seven criteria to verify digital content on the Internet and social media feeds. After installing the Provenance plugin, she notices that the news items in her Facebook timeline now display the Provenance icon beside the publisher's name. For most of the news stories, the Provenance icon shows a white P inside a white circle on a blue background. When she clicks on the blue Provenance icon, it opens a notification pane showing the seven verification criteria, all of which display a green background with a white ✓.
   She is able to click on each of the seven verification icons to read a detailed explanation of each criterion, why failing the criterion is an indication that the webpage or social media post may be misinformation or disinformation, and how the warning is derived. As all of the icons are green, she is reassured about the origin, veracity and overall quality of the news article. For some news items displayed on her timeline, she notices that the blue background of the Provenance icon has turned red. When she clicks on it, the same information pane displaying the same verification criteria appears, except that one or more of the seven criteria now display a red background with an exclamation mark beneath. When she clicks on these, an additional detailed explanation pane appears underneath them to explain why the criterion has failed. Reading through each warning, including its detailed description, she gains a better understanding of how to identify misinformation and disinformation. In both instances, Mary has become more aware of the need to critically check the news she consumes and more aware of good media literacy habits in general.

6.2. News Websites

Mary regularly visits news websites to inform herself of current affairs. Usually, the Provenance icon, which is visible to the right of her browser's address bar, displays a white P inside a white circle on a blue background. However, recently, when she was visiting news websites to read more about a story relating to Covid-19 vaccination, she noticed that the background of the Provenance icon would sometimes turn red. When she clicked on the icon, the verification criteria information pane showed that Provenance had detected a problem with the image used in the news article she was reading. Clicking on the arrow to open the drop-down explanation pane, she reads that Provenance has detected that the image has been used before in another article. The image in question shows a picture taken at a conference of the World Health Organisation. Looking closely, she sees a credit to the Associated Press (AP). She knows that AP is an international news wire service, and that local and national news agencies republish their articles, including the images. As this is just an image of a press conference, she is confident that its use by multiple news agencies is not an issue.


7. Evaluation

Provenance is under development and will shortly be undergoing human evaluation. Currently, five of the seven news analysis functions have been implemented and integrated with the platform. These are undergoing technical evaluation while the final two analysis tools are being completed. When the tool is fully complete, a series of technical tests and human evaluation tests will be undertaken to evaluate basic functionality and to ensure that it is providing the right warnings at the appropriate time. Following this, a series of experiments will be undertaken to evaluate its effect on user behaviour. This will include the likelihood of reading and sharing news articles that have cautionary warnings beside them. We will also be analysing unintended effects of the tool. Finally, a series of long-term studies are planned to evaluate its effect on users' media literacy.


8. Conclusions

Misinformation and disinformation are significant issues that have negatively affected public discourse, politics and social cohesion. The Internet, and especially social media, are the primary conduits for their growth and spread. Existing user-orientated browser plugins have limited capabilities and only provide users with a historical rating of a website's propensity to publish misinformation and disinformation. They are also not capable of detailed analysis of the content of news webpages or social media feeds. The Provenance browser plugin significantly improves upon existing user-orientated solutions by providing intermediary-free analysis of webpage and social media content using seven criteria and, where necessary, providing cautionary warnings to users. The user can then check the detailed explanatory warning notifications to make their own judgement. This will improve users' media literacy and reduce susceptibility to misinformation and disinformation in the long term.


9. Acknowledgements

This work has been supported by the PROVENANCE project, which has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 825227, and with the financial support of Science Foundation Ireland under Grant Agreement No. 13/RC/2106_P2 at the ADAPT SFI Research Centre.
     in: K. Saeed, R. Chaki, V. Janev (Eds.), Computer      [40] FightHoax, Fighthoax - unlock your programmatic
     Information Systems and Industrial Management,              advertising, 2021. URL: http://34.253.212.69/.
     Lecture Notes in Computer Science, Springer Inter-     [41] M. Hardalov, I. Koychev, P. Nakov, In search of cred-
     national Publishing, 2019, p. 144–151. doi:10.1007/         ible news, in: C. Dichev, G. Agre (Eds.), Artificial
     978-3-030-28957-7_13.                                       Intelligence: Methodology, Systems, and Applica-
[28] A. Školkay, J. Filin, A comparison of fake news de-         tions, Lecture Notes in Computer Science, 2016.
     tecting and fact-checking ai based solutions, Studia        doi:10.1007/978-3-319-44748-3_17.
     Medioznawcze 20 (2019) 365–383.                        [42] M. Hardalov, mhardalov/news-credibility, 2019.
[29] K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake            URL:          https://github.com/mhardalov/news-
     news detection on social media: A data mining per-          credibility.
     spective, ACM SIGKDD Explorations Newsletter           [43] X. Zhou, A. Jain, V. V. Phoha, R. Zafarani, Fake news
     19 (2017) 22–36. doi:10.1145/3137597.3137600.               early detection: A theory-driven model, Digital
[30] A. Hanselowski, A. PVS, B. Schiller, F. Caspel-             Threats: Research and Practice 1 (2020) 12:1–12:25.
     herr, D. Chaudhuri, C. M. Meyer, I. Gurevych, A             doi:10.1145/3377478.
     retrospective analysis of the fake news challenge      [44] W. E. Zhang, Q. Z. Sheng, A. Alhazmi, C. Li, Adver-
     stance-detection task, in: Proceedings of the 27th          sarial attacks on deep learning models in natural
     International Conference on Computational Lin-              language processing: A survey, arXiv:1901.06796
     guistics, Association for Computational Linguistics,        [cs] (2019). URL: http://arxiv.org/abs/1901.06796,
     2018, p. 1859–1874. URL: https://www.aclweb.org/            arXiv: 1901.06796.
     anthology/C18-1158.                                    [45] Z. Zhou, H. Guan, M. M. Bhat, J. Hsu, Fake
[31] A. Goel, ProjectFib - GitHub Repo, 2016. URL: https:        news detection via nlp is vulnerable to adver-
     //github.com/anantdgoel/ProjectFib.                         sarial attacks,       Proceedings of the 11th In-
[32] Eyeo, 2020. URL: https://chrome.google.com/                 ternational Conference on Agents and Artifi-
     webstore/detail/trusted-news/                               cial Intelligence (2019) 794–800. doi:10.5220/
     nkkghpncidknplmlkgemdoekpckjmlok?hl=en.                     0007566307940800, arXiv: 1901.09657.
[33] D. Paschalides, C. Christodoulou, R. Andreou,          [46] B. Spillane, S. Lawless, V. Wade, The impact of
     G. Pallis, M. D. Dikaiakos, A. Kornilakis,                  increasing and decreasing the professionalism of
     E. Markatos, Check-it: A plugin for detecting               news webpage aesthetics on the perception of bias
     and reducing the spread of fake news and misin-             in news articles, in: Proceedings of the 22nd
     formation on the web, in: 2019 IEEE/WIC/ACM                 International Conference On Human-Computer
     International Conference on Web Intelligence (WI),          Interaction, Lecture Notes in Computer Science,
     2019, p. 298–302.                                           Springer, 2020. doi:https://doi.org/10.1007/
[34] V. Inc, Fakebox, 2021. URL: https://machinebox.io/.         978-3-030-49059-1_50.
[35] Z. A. Estela, N2ITN/are-you-fake-news, 2021. URL:      [47] B. Spillane, I. Hoe, M. Brady, V. Wade, S. Lawless,
     https://github.com/N2ITN/are-you-fake-news.                 Tabloidization versus credibility: Short term
[36] 2021. URL: https://ground.news/.                            gain for long term pain, in: CHI ’20: The ACM
[37] A.       Bhat,        SurfSafe,     2021.       URL:        Conference on Human Factors in Computing
     https://chrome.google.com/webstore/                         Systems, ACM, 2020. URL: https://dl.acm.org/
     detail/surfsafe-join-the-fight-a/                           doi/abs/10.1145/3313831.3376388.            doi:http:
     hbpagabeiphkfhbboacggckhkkipgdmh?hl=en.                     //dx.doi.org/10.1145/3313831.3376388.
[38] B. Botnevik, E. Sakariassen, V. Setty, Brenda:         [48] A. Schmidt, M. Wiegand, A survey on hate speech
     Browser extension for fake news detection, in:              detection using natural language processing, in:
     Proceedings of the 43rd International ACM SIGIR             Proceedings of the Fifth International Workshop
     Conference on Research and Development in Infor-            on Natural Language Processing for Social Media,
     mation Retrieval, Association for Computing Ma-             Association for Computational Linguistics, 2017,
     p. 1–10. URL: https://aclanthology.org/W17-1101.            spread of fake news via third-person perception,
     doi:10.18653/v1/W17-1101.                                   Human Communication Research 47 (2021) 1–24.
[49] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu,           doi:10.1093/hcr/hqaa010.
     D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio,    [59] V. Klyuev, Fake news filtering: Semantic ap-
     Generative adversarial nets, in: Advances in                proaches, in: 2018 7th International Conference
     Neural Information Processing Systems, vol-                 on Reliability, Infocom Technologies and Optimiza-
     ume 27, Curran Associates, Inc., 2014. URL:                 tion (Trends and Future Directions) (ICRITO), 2018,
     https://proceedings.neurips.cc/paper/2014/hash/             p. 9–15. doi:10.1109/ICRITO.2018.8748506.
     5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html.        [60] M. Spradling, J. Straub, J. Strong, Protection from
[50] S. E. Robertson, S. Walker, Some simple effective           ‘fake news’: The need for descriptive factual label-
     approximations to the 2-poisson model for proba-            ing for online content, Future Internet 13 (2021)
     bilistic weighted retrieval, in: SIGIR’94, Springer,        142. doi:10.3390/fi13060142.
     1994, pp. 232–241.                                     [61] N. Fuhr, A. Giachanou, G. Grefenstette, I. Gurevych,
[51] Y. Lv, C. Zhai,       When documents are very               A. Hanselowski, K. Jarvelin, R. Jones, Y. Liu,
     long, bm25 fails!,        in: Proceedings of the            J. Mothe, W. Nejdl, et al., An information nutritional
     34th international ACM SIGIR conference on                  label for online documents, ACM SIGIR Forum 51
     Research and development in Information Re-                 (2018) 46–66. doi:10.1145/3190580.3190588.
     trieval, SIGIR ’11, Association for Computing          [62] C. Fan, Classifying fake news, 2017. URL:
     Machinery, 2011, p. 1103–1104. URL: https:                  https://www.conniefan.com/wp-content/uploads/
     //doi.org/10.1145/2009916.2010070. doi:10.1145/             2017/03/classifying-fake-news.pdf, connie Fan.
     2009916.2010070.                                       [63] E. S. Jo, A. Muhamed, S. Nuthakki, A. Singhania,
[52] S. Loria, textblob documentation (2020). URL:               DeepNews: Detecting Quality in News, 2018.
     https://buildmedia.readthedocs.org/media/pdf/          [64] W. W. W. Consortium, et al., The rdf data cube
     textblob/latest/textblob.pdf, release 0.16.0.               vocabulary (2014).
[53] Y. Yang, D. Cer, A. Ahmad, M. Guo, J. Law,             [65] D. C. M. Initiative, et al., Dublin core metadata
     N. Constant, G. H. Abrego, S. Yuan, C. Tar, Y.-H.           element set, version 1.1 (2012).
     Sung, B. Strope, R. Kurzweil, Multilingual univer-     [66] M. Ledvinka, P. Kremen, Jopa: Accessing ontologies
     sal sentence encoder for semantic retrieval, 2019.          in an object-oriented way., in: ICEIS (2), 2015, pp.
     arXiv:1907.04307.                                           212–221.
[54] S. B. Parikh, V. Patil, P. K. Atrey, On the origin,
     proliferation and tone of fake news, in: 2019 IEEE
     Conference on Multimedia Information Processing
     and Retrieval (MIPR), IEEE, 2019, p. 135–140. URL:
     https://ieeexplore.ieee.org/document/8695387/.
     doi:10.1109/MIPR.2019.00031.
[55] J. Paschen, Investigating the emotional appeal of
     fake news using artificial intelligence and human
     contributions, Journal of Product & Brand Manage-
     ment 29 (2019) 223–233. doi:10.1108/JPBM-12-
     2018-2179.
[56] X. Zhang, J. Cao, X. Li, Q. Sheng, L. Zhong,
     K. Shu, Mining dual emotion for fake news
     detection,       Proceedings of the Web Con-
     ference 2021 (2021) 3465–3476. doi:10.1145/
     3442381.3450004, arXiv: 1903.01728 version: 1.
[57] I. Singh, D. P., A. K., On the coherence of fake
     news articles, in: I. Koprinska, M. Kamp, A. Ap-
     pice, C. Loglisci, L. Antonie, A. Zimmermann,
     R. Guidotti, O. Özgöbek, R. P. Ribeiro, R. Gavaldà,
     et al. (Eds.), ECML PKDD 2020 Workshops, Com-
     munications in Computer and Information Science,
     Springer International Publishing, 2020, p. 591–607.
     doi:10.1007/978-3-030-65965-3_42.
[58] M. Chung, N. Kim, When i learn the news is
     false: How fact-checking information stems the