KI2TE: Knowledge-Infused InterpreTable Embeddings for COVID-19 Misinformation Detection

William Shiao, University of California Riverside, wshia002@ucr.edu
Evangelos E. Papalexakis, University of California Riverside, epapalex@cs.ucr.edu
ABSTRACT
As COVID-19 continues to spread across the world, concerns regarding the spread of misinformation about it are also growing. In this work, we propose a preliminary novel method to identify fake articles and claims by using information from the CORD-19 academic paper dataset. Our method uses the similarity between articles and reference manuscripts in a shared embedding space to classify the articles. This also provides an explanation for each classification decision that links a particular article or claim to a small number of research manuscripts that influence the decision. We collect 90K real articles and 20K fake articles about the coronavirus, as well as over 700 human-labelled claims from the Google FactCheck API, and evaluate our method's performance on these datasets. We also evaluate its performance on MM-COVID [13], a recent COVID-19 news dataset. We demonstrate the explainability of our model and discuss its limitations.

[Figure 1: An illustration of the classification stages. Articles/claims and reference documents are embedded into a shared embedding space by an embedding model, producing article/claim embeddings (EA) and document embeddings (ED). Their pairwise similarities/distances form the distance matrix M, which a classifier uses to label each article as real or fake.]

1 INTRODUCTION
The novel coronavirus (SARS-CoV-2) has caused an unprecedented outbreak in most countries across the world. With millions of people stuck at home and accessing information via social media platforms, there is increasing concern about the spread of misinformation regarding the pandemic.

In recent years, we have experienced the proliferation of websites and outlets that publish and perpetuate misinformation. However, with the pandemic and the 2020 US presidential elections, it has become a larger problem than ever. The most effective method to counter this is human fact-checking. However, this often requires domain expertise and can be prohibitively expensive. Domain expertise was an especially large issue during the early stages of the pandemic, when information about COVID-19 was limited and when conspiracy theories and snake oil "cures" propagated quickly.

Fake news was a large issue even before the start of the pandemic. For example, misinformation was widespread on Twitter during events like Hurricane Sandy [10] and the Boston Marathon bombings [9]. Studies have also shown that humans are bad at detecting misinformation: the mean accuracy of 1,000 participants, averaged over 100 runs, was only 54% [20]. Furthermore, it has been shown that fake news spreads faster than real news [23], making it even more important that we combat its spread.

On top of this, the recent spread of misinformation about COVID-19 poses some new issues. Information about the virus has been sparse, especially during the start of the pandemic. This makes it harder for the average person to differentiate between true and false information. Information about the virus also evolves fairly quickly.

Many different approaches to fake news classification have been proposed. One class of approaches revolves around checking whether or not statements are likely to be connected in a knowledge graph [5–8]. The downside to this approach is that it requires the user to either create a new knowledge graph for the task or use an existing one. Creating a new knowledge graph is often difficult, and such graphs are usually built with some human supervision [25]. However, Wang et al. [25] show that deep language models like BERT [4] and the GPT models [2, 17] can be used to build knowledge graphs directly. This suggests that these models retain much of the knowledge acquired from training on their datasets.

Several recent state-of-the-art fake news detection models rely on a BERT architecture for processing text [12, 15, 27, 28]. While BERT tends to perform well for this task, a common issue is the lack of explainability in its classification decisions.

In this work, we present a preliminary model that uses S-BERT [19] embeddings to construct a similarity matrix against a set of reference documents. This allows us to explain classification decisions as a function of an article's similarity to specific documents if we train an interpretable classifier like a random forest or logistic regression.

While this model is relatively simple and can be further refined, we believe that this approach provides an interesting and useful step towards interpretable high-performance models.

An overview of our contributions:
• Novel embedding scheme: We propose KI2TE, a novel embedding scheme built on top of other embedding models.
• Dataset collection: We gather over 100K news articles with coarse labels.
• Extensive evaluation: We evaluate the performance and explainability of KI2TE on 3 different datasets.


KnOD'21 Workshop - April 14, 2021
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


2 PROBLEM FORMULATION & PROPOSED METHOD

2.1 Problem Definition
Given
– a set A of labelled articles/claims about COVID-19.
– a set D of credible reference documents.
Classify each article/claim a ∈ A as real or fake.
Explain the classification decision as a function of D.

2.2 Proposed Method
We first embed each article/claim in A and each document in D into a shared embedding space. We found that using Sentence-BERT (SBERT) [19] for this step led to the best results, but we also evaluate the performance of our method using FastText [11]. We then calculate the pairwise similarity between each article/claim and each reference document, which gives us a distance matrix M. Each row of M can be thought of as a new embedding for the corresponding article in A. We then train a classifier on M. These steps are described as pseudocode in Algorithm 1 below. We evaluate our method using logistic regression and a random forest, both of which offer a good balance between performance and interpretability.

Algorithm 1 Given a set of articles and reference documents, returns KI2TE embeddings.
 1: procedure KI2TE(A, D)
 2:   EA ← ComputeEmbeddings(A)
 3:   ED ← ComputeEmbeddings(D)
 4:   for ai ∈ EA do
 5:     for dj ∈ ED do
 6:       Mi,j ← dist(ai, dj)
 7:     end for
 8:   end for
 9:   return M
10: end procedure

2.3 Model Explainability
When trained with an interpretable classifier, this approach allows us to explain classification decisions on an article with supporting documents D. We evaluated our approach using two models: logistic regression and a random forest.

Logistic regression trains a weight vector w and bias b such that the cross-entropy is minimized. The magnitude of a weight wj corresponds to the importance of a feature Mi,j in article Ai. We can find the importance of that feature in a classification decision with wj × Mi,j. Since Mi,j corresponds to the distance to document Dj, we can see how much each document contributes to the classification decision.

A random forest involves training a set of decision trees on random samples of the training dataset. The classification result is the mode of the classification results of the individual trees in the forest. The prediction function of a random forest can be written out in terms of the sum of feature contributions [21]. This allows us to see which documents led to a specific classification decision in a random forest.

2.4 Compared to KNN
At first glance, this approach may appear similar to a K-nearest-neighbors (KNN) classifier trained on the reference embedding matrix and used to classify articles. However, they are different, and there are several key advantages to our approach:
(1) The reference data can be one-class data, as in our use case, where all of the CORD articles are considered to be accurate.
(2) In KNN, each of the k nearest neighbors is considered to be of equal importance, but each neighbor is assigned a different weight in our approach.
(3) In KNN, only the k nearest neighbors are considered for classification, but we consider all of the data points in our approach.

However, one advantage of KNN over our approach is that KNN scales better when there are more reference documents, especially if a database that supports approximate nearest-neighbor search is used. We discuss this limitation, as well as ways to reduce its impact, in Section 3.5.

3 EXPERIMENTAL EVALUATION
We evaluate the performance of our method along three aspects:
(1) The classification accuracy and F1 score on the Google FactCheck claims, the MM-COVID [13] dataset, and our gathered set of news articles.
(2) The explainability of our method.
(3) The sensitivity of our model with respect to the number of reference documents.

3.1 Classification Performance
We evaluate the accuracy and F1 score of our model and similar baseline models on the 3 datasets described in Section 3.4. We also evaluate them on 3 different subsets of the news dataset, as described in Section 3.2.

3.2 Explainability
In Fig. 2 we show four classification results, with the top contributors to each decision. Due to space and copyright considerations, we provide only the titles of the articles and manuscripts. The results are taken from a model trained on subsets of the news articles focused on vaccine and transmission news.

The reason for this is that only a small portion of the news articles contain information also present in CORD-19 documents. Below are the titles of 5 articles that have poor explainability in our model:
1) "Kevin Ferris: John Prine, thanks for the many blessings you shared through your life and music"
2) "San Bernardino County reports 4 more coronavirus deaths, 146 new cases"
3) "Trump indicates he no longer has the coronavirus, says he is 'immune'"
4) "Gary Neville slams EPL teams: Clubs are frightened"
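The pipeline of Sections 2.2 and 2.3 can be sketched in a few lines of plain Python. This is an illustrative sketch only: the toy 3-dimensional vectors stand in for real SBERT/FastText embeddings, and the cosine distance, function names, and hand-set weights are our assumptions, not part of a released implementation.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def ki2te_embeddings(article_vecs, doc_vecs, dist=cosine_distance):
    """Algorithm 1: M has one row per article/claim and one column per
    reference document, with M[i][j] = dist(a_i, d_j)."""
    return [[dist(a, d) for d in doc_vecs] for a in article_vecs]

def top_contributors(weights, m_row, k=3):
    """Rank reference documents by |w_j * M[i][j]|, the per-document
    contribution to a logistic-regression decision (Section 2.3)."""
    contribs = [(j, w * x) for j, (w, x) in enumerate(zip(weights, m_row))]
    return sorted(contribs, key=lambda c: abs(c[1]), reverse=True)[:k]

# Toy vectors standing in for the embeddings of one article and
# three reference documents.
articles = [[1.0, 0.0, 0.0]]
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
M = ki2te_embeddings(articles, docs)

# Hypothetical learned weights; documents 2 and 1 dominate the decision.
w = [0.9, -0.1, 0.4]
top = top_contributors(w, M[0], k=2)
```

Substituting real SBERT or FastText embeddings for the toy vectors, and coefficients from a trained logistic regression for the hand-set weights, yields the explanation scheme described above.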


5) "Coronavirus: Indian takeaway offering free toilet rolls with orders over £20"

We can see that (1) is about the death of a celebrity from coronavirus, and it is unlikely that any CORD-19 document would have a reference to it. (2) is about a relatively small area in the U.S. and would likely not have any references to it in CORD-19. (3) is political news and also likely does not have many CORD-19 references. (4) is primarily sports news and does not contain information about the virus itself. (5) is about a specific restaurant and will not have any related information in CORD-19.

However, KI2TE still maintains accuracy similar to our baseline models. This is because the document distances also serve as a proxy for the raw embeddings, allowing it to retain much of the information from the original BERT/FastText embeddings. However, the explainability of our model suffers in this case. To resolve this, we extract 3 versions of the news dataset.

We extract a filtered set, which has articles with sports teams and popular cities/countries removed, and refer to it as the "Filtered News" dataset. We also extract only articles that contain the word "vaccine" and call this the "Vaccine News" dataset. Finally, we extract only articles that contain the word "transmission" and name it the "Transmission News" dataset. The purpose of the last two datasets is to provide a smaller sample with articles focusing more on attributes of the virus, rather than on other topics (like those shown above). The performance of our models on these datasets is shown in Table 1.

                         News                           Claims
Model               Acc.            F1             Acc.            F1
BERT + KI2TE + RF   0.729 ± 0.003   0.786 ± 0.002  0.921 ± 0.02    0.524 ± 0.479
BERT + RF           0.757 ± 0.004   0.809 ± 0.004  0.926 ± 0.015   0.03 ± 0.074
BERT + KI2TE + LR   0.742 ± 0.003   0.714 ± 0.054  0.913 ± 0.005   0.5 ± 0.498
BERT + LR           0.791 ± 0.004   0.802 ± 0.036  0.904 ± 0.026   0.211 ± 0.064
FT + KI2TE + RF     0.714 ± 0.004   0.607 ± 0.008  0.912 ± 0.014   0.499 ± 0.5
FT + RF             0.788 ± 0.004   0.804 ± 0.045  0.922 ± 0.012   0.051 ± 0.079
FT + KI2TE + LR     0.773 ± 0.004   0.810 ± 0.003  0.907 ± 0.016   0.474 ± 0.52
FT + LR             0.755 ± 0.004   0.724 ± 0.056  0.901 ± 0.02    0.0 ± 0.0

                         MM-COVID                       Filtered News
Model               Acc.            F1             Acc.            F1
BERT + KI2TE + RF   0.89 ± 0.011    0.778 ± 0.018  0.834 ± 0.01    0.906 ± 0.006
BERT + RF           0.922 ± 0.005   0.948 ± 0.003  0.839 ± 0.01    0.908 ± 0.006
BERT + KI2TE + LR   0.918 ± 0.008   0.846 ± 0.017  0.843 ± 0.008   0.91 ± 0.005
BERT + LR           0.943 ± 0.002   0.962 ± 0.001  0.847 ± 0.01    0.909 ± 0.007
FT + KI2TE + RF     0.853 ± 0.008   0.681 ± 0.021  0.828 ± 0.018   0.903 ± 0.011
FT + RF             0.901 ± 0.004   0.935 ± 0.003  0.831 ± 0.012   0.904 ± 0.008
FT + KI2TE + LR     0.899 ± 0.01    0.806 ± 0.019  0.826 ± 0.008   0.903 ± 0.005
FT + LR             0.864 ± 0.003   0.912 ± 0.004  0.822 ± 0.009   0.902 ± 0.005

                         Vaccine News                   Transmission News
Model               Acc.            F1             Acc.            F1
BERT + KI2TE + RF   0.865 ± 0.002   0.925 ± 0.001  0.79 ± 0.006    0.865 ± 0.005
BERT + RF           0.866 ± 0.003   0.926 ± 0.002  0.805 ± 0.007   0.875 ± 0.006
BERT + KI2TE + LR   0.869 ± 0.003   0.926 ± 0.002  0.797 ± 0.014   0.866 ± 0.009
BERT + LR           0.886 ± 0.002   0.934 ± 0.001  0.833 ± 0.013   0.889 ± 0.009
FT + KI2TE + RF     0.86 ± 0.002    0.923 ± 0.001  0.766 ± 0.01    0.855 ± 0.007
FT + RF             0.876 ± 0.002   0.931 ± 0.001  0.807 ± 0.015   0.881 ± 0.01
FT + KI2TE + LR     0.873 ± 0.004   0.929 ± 0.002  0.751 ± 0.006   0.847 ± 0.004
FT + LR             0.853 ± 0.005   0.919 ± 0.003  0.725 ± 0.017   0.839 ± 0.011

Table 1: Top: Results on the news, claims, and MM-COVID [13] datasets. Bottom: Results on samples of the news dataset. RF stands for random forest, LR for logistic regression, and FT for FastText [11].

[Figure 2: Four sample classification results of real articles (only titles shown) with the top contributors (only titles shown) to the decision in a random forest classifier. The model was trained on the vaccine and transmission subsets of the news articles, as described in Section 3.2. The articles and their top contributing manuscripts are:
– "Fauci says likely some degree of aerosol transmission of new coronavirus": "This really is nothing like flu"; "Using the mask – do's and don'ts in the COVID-19 scenario"; "Face mask use in the general population and optimal resource allocation during the COVID-19 pandemic"; "A biography of coronaviruses from ibv to SARS-CoV-2, with their evolutionary paradigms and pharmacological challenges"
– "What you need to know about coronavirus, plus prevention tips to slow spread": "Arboviruses and their vectors"; "Public awareness in Egypt about COVID-19 spread in the early phase of the pandemic"; "War on Terror Cells: Strategies to Eradicate 'Novel Coronavirus' Effectively"; "Wild birds as reservoirs for diverse and abundant gamma- and deltacoronaviruses"
– "How to prepare and stock up for the coronavirus pandemic": "The Food Systems in the Era of the Coronavirus"; "Arboviruses and their vectors"; "Dietary recommendations during the COVID-19 pandemic"; "COVID-19 and Food Safety: Risk Management and Future Considerations"
– "You don't need to obsessively disinfect your groceries, and other coronavirus tips from experts": "Viewpoint: Sars-cov-2 (the cause of covid-19 in…"; "Sports balls as potential SARS-CoV-2 transmission vectors"; "Edible insects unlikely to contribute to transmission of coronavirus SARS-CoV-2"; "Possibility of Faecal-Oral Transmission of Novel Coronavirus (SARS-CoV-2) via Consumption of Contaminated Foods of Animal Origin: A Hypothesis"]

3.3 Sensitivity to Number of Reference Manuscripts
We evaluate the accuracy and F1 score of KI2TE as the number of reference documents increases; the results can be seen in Fig. 3. Generally, as the number of reference documents increases, the accuracy and F1 score of KI2TE increase, although with diminishing returns. Interestingly, the FastText-based models exhibit a large dip in F1 score after about 1,000 documents, but it recovers and continues to increase.

[Figure 3: Accuracy (left) and F1 score (right) as the number of reference documents increases, for the BERT + LR, BERT + RF, FT + LR, and FT + RF models. The x-axis (# of CORD Documents) ranges from 10² to 10⁴.]

3.4 Datasets
In this section, we describe the steps involved in collecting and filtering the news articles for analysis. We used five datasets for this work.
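The keyword-based subsets described in Section 3.2 ("Vaccine News", "Transmission News"), like the COVID-keyword filter applied during crawling, reduce to a simple case-insensitive membership test. A minimal sketch, where the record fields are our assumption rather than the paper's actual schema:

```python
def extract_subset(articles, keyword):
    """Keep only articles whose text mentions the keyword, case-insensitively."""
    kw = keyword.lower()
    return [a for a in articles if kw in a["text"].lower()]

# Hypothetical records; the real pipeline stores Newspaper3k-extracted text.
articles = [
    {"title": "Trial update", "text": "New vaccine trial results announced."},
    {"title": "Match report", "text": "Local sports team wins again."},
]
vaccine_news = extract_subset(articles, "vaccine")
```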


We chose to crawl our own news datasets because we were unable to find any up-to-date fake news datasets at the time of writing.

3.4.1 CORD-19. The first dataset we used was the COVID-19 Open Research Dataset (CORD-19) [26], a growing collection of scientific papers prepared by the White House in partnership with leading research groups, characterizing the wide range of literature related to coronaviruses.

It consists of over 200,000 documents, of which 100,000 have a PDF parse of their full text. Although not all of these documents have undergone peer review, and the collection includes preprints from sites like bioRxiv, we still consider this to be a relatively credible source of information about the virus.

3.4.2 Fake News Dataset. We crawled sites from NewsGuard's Misinformation Tracking Center1 for our fake news dataset. NewsGuard is an organization that rates the trustworthiness of websites that share information online, based on their credibility and transparency. We crawled only the sites based in the United States to ensure that we crawled only English-language sites. We also crawled only the sites with sitemaps to ensure that all of the crawled pages were in fact news articles, not other pages like store pages.

We also made the assumption that all articles on any of those sites were fake news. While this is a very strong assumption, we could not come up with a better method for labeling individual articles. We used the Newspaper3k2 Python library to extract article metadata and content.

We chose to scrape only COVID-19-related news articles by filtering the crawled articles by keywords like "COVID" or "coronavirus". We also removed duplicate lines (where a line is an HTML tag, not a sentence) from the plain text of the articles. This helps prevent pages with fixed headers or taglines from appearing in the document text. Otherwise, articles with mentions of keywords in the header or footer would also be included in the crawl.

Certain properties of these sites made them difficult to crawl. Some sites mixed in abstracts of academic papers in with their

subsample from this set of articles when training our models to reduce the class imbalance.

3.4.4 Google FactCheck Dataset. We also downloaded COVID-19-related claims from the Google FactCheck API4. These claims are gathered from a variety of fact-checking companies and are checked by humans. Each claim consists of a single sentence (or, rarely, several sentences) and a rating from a fact-checking agency. This rating does not necessarily follow any particular format and can range from "Fake" to other, less clear ratings like "Needs Context" or "Missing Context". We chose to exclude such ambiguous claims from the dataset. This led to a total of 739 claims, of which 97 are true and 642 are false/misleading. While there is a heavy class imbalance and it is a small dataset, we chose to include it in our evaluation to test our model's performance on small datasets with accurate labels.

3.4.5 MM-COVID Dataset. We also used the Multilingual and Multidimensional COVID-19 Fake News Data Repository (MM-COVID) dataset [13], which contains fake news in 6 different languages. However, we focus only on the English portion of the dataset. The news articles are labelled by Snopes5 and Poynter6, both of which are fact-checking organizations that use human fact-checkers. The MM-COVID dataset also includes tweets and replies to those tweets, but we only use the text of the articles in the dataset.

3.5 Limitations
One limitation of this method is that a new feature is added for each new reference document. This can significantly reduce the performance of the classifier and greatly increase the distance matrix calculation time when the number of reference documents is large. The simplest way to mitigate this would be to use a random sample of the reference documents, but very similar reference documents may be selected, which would not improve the performance or interpretability of the model. Another way to mitigate it would be to use standard feature selection methods (like Lasso [22]), but this still requires the calculation of the distance
articles to lend credibility. Other sites mixed in articles from Reuters      matrix across all reference documents.
or the Associated Press (AP), both of which we consider reliable                 One solution for this is to run k-means++ [1] on ED and set 𝑘
sources. The Newspaper3k library also tended to perform worse                 to the number of reference documents we want to use. Then, we
at extracting the content of the articles, likely because the library         can select the nearest neighbor to each of the 𝑘 centroids of the
was mainly tested on more mainstream news websites.                           clusters. This leaves us with 𝑘 reference documents, each of which
   Many of these sites also had other purposes in addition to provid-         theoretically represents a different part of the embedding space.
ing news articles. Some of them sold alternative medicinal products           This helps reduce the chance of similar documents being selected.
like colloidal silver. Others also had videos in addition to their text
articles. We did our best to clean this data, but it is possible that         4     RELATED WORK
some of these issues are still present in the data. After cleaning, we
                                                                              There has been a lot of work in the area of fake news detection
were left with around 20K fake news articles.
                                                                              and models use a variety of different methods. These methods can
3.4.3 Real News Dataset. We used the list from the B.S. Detector              generally be grouped into four categories: knowledge-based, style-
Chrome extension3 to pick the reputable sites. We then collected ar-          based, propagation-based, and source-based models [29].
ticles from all of the matching sites from the Common Crawl News                 Knowledge-based models often attempt to compare the claims
archive [16]. After that, the HTML for each article was processed             in a news article against facts stored in a knowledge base (KB) or
in the same manner as the fake news dataset. We gathered over                 knowledge graph (KG). Knowledge graphs are commonly repre-
95K articles that mention the novel coronavirus, but we randomly              sented as a set of subject-predicate-object triples, where the subject
1 https://www.newsguardtech.com/coronavirus-misinformation-tracking-center/   4 https://toolbox.google.com/factcheck/apis
2 https://github.com/codelucas/newspaper                                      5 https://www.snopes.com/
3 https://gitlab.com/bs-detector/bs-detector                                  6 https://www.poynter.org/
KI2 TE: Knowledge-Infused InterpreTable Embeddings for COVID-19 Misinformation Detection                                     KnOD’21 Workshop, April 14, 2021, Virtual Event


and object map to entities, which are typically represented as nodes.                   [3] Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R Hr-
These models often predict the probability of triples existing in this                      uschka, and Tom M Mitchell. [n.d.]. Toward an Architecture for Never-Ending
                                                                                            Language Learning. Technical Report. www.aaai.org
graph and use that to determine the accuracy of a statement [5–                         [4] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT:
7]. These knowledge graphs can be single-source, which uses a                               Pre-training of Deep Bidirectional Transformers for Language Understanding.
                                                                                            (10 2018). http://arxiv.org/abs/1810.04805
knowledge graph from a single source, or open-source, where the                         [5] Xin Luna Dong, Christos Faloutsos, Xian Li, Subhabrata Mukherjee, and Prashant
knowledge graph is created by merging data from multiple sources                            Shiralkar. 2018. 3-FactCheckingGraph - Google Slides. https://docs.google.com/
[29]. The downside of a knowledge-graph-based approach is that                              presentation/d/1JudymfQC14vpGdQY6nOodIcmOpec1vSnMa38FnSZiVY/edit#
                                                                                            slide=id.g3fc8173fac_2_79
it requires a knowledge graph, which is non-trivial to construct.                       [6] Valeria Fionda and Giuseppe Pirrò. 2017. Fact Checking via Evidence Patterns.
Many existing public knowledge graphs, like Wikidata [24], YAGO                             Technical Report.
[18], and NELL [3], required some level of human supervision to                         [7] Mohamed H Gad-Elrab, Daria Stepanova, Jacopo Urbani, and Gerhard Weikum.
                                                                                            2019. Tracy: Tracing Facts over Knowledge Graphs and Text. (2019). https:
construct.                                                                                  //doi.org/10.1145/nnnnnnn.nnnnnnn
    Style-based models attempt to look at the style with which the                      [8] Matthew Gardner, Tom Mitchell, William Cohen, Christos Faloutsos, and Antoine
                                                                                            Bordes. [n.d.]. Reading and Reasoning with Knowledge Graphs. Technical Report.
article was written to assess the intentions of the author, with the                        www.lti.cs.cmu.edu
assumption that fake news articles are written differently than                         [9] Aditi Gupta, Hemank Lamba, and Ponnurangam Kumaraguru. 2013. $1.00 per RT
authentic news articles. Propagation-based models look at how a                             #BostonMarathon #PrayForBoston: Analyzing fake content on twitter. In eCrime
                                                                                            Researchers Summit, eCrime. IEEE Computer Society. https://doi.org/10.1109/
news article spreads and works on a news cascade [29] or graph                              eCRS.2013.6805772
representation of that. Source-based models look focus on the au-                      [10] Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru, and Anupam Joshi.
thor and publisher of new articles, with the assumption that many                           [n.d.]. Faking Sandy: Characterizing and Identifying Fake Images on Twitter during
                                                                                            Hurricane Sandy. http://www.guardian.co.uk/world/us-news-
fake news items tend to come from the same sources. While we use                       [11] Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag
the term article, all of these methods are applicable to, and been                          of Tricks for Efficient Text Classification. In Proceedings of the 15th Conference of
                                                                                            the European Chapter of the Association for Computational Linguistics: Volume 2,
applied to other mediums, like social media posts.                                          Short Papers. Association for Computational Linguistics, 427–431.
                                                                                       [12] Heejung Jwa, Dongsuk Oh, Kinam Park, Jang Kang, and Hueiseok Lim. 2019.
5    CONCLUSION                                                                             exBAKE: Automatic Fake News Detection Model Based on Bidirectional Encoder
                                                                                            Representations from Transformers (BERT). Applied Sciences 9, 19 (9 2019), 4062.
In this work, we propose a similarity matrix-based embedding                                https://doi.org/10.3390/app9194062
method: KI2 TE, which allows us to interpret the decisions of embedding-               [13] Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. [n.d.]. MM-COVID: A Multilin-
                                                                                            gual and Multimodal Data Repository for Combating COVID-19 Disinformation.
based models by linking them to a set of reference documents.                               Technical Report. www.newsguardtech.com
We gather a coarsely-labelled dataset of news articles and human-                      [14] Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. MM-COVID: A Multilin-
labelled claims. We also evaluate our model on the MM-COVID                                 gual and Multimodal Data Repository for Combating COVID-19 Disinformation.
                                                                                            (11 2020). http://arxiv.org/abs/2011.04088
[14] dataset. We show that our model has similar performance to                        [15] Chao Liu, Xinghua Wu, Min Yu, Gang Li, Jianguo Jiang, Weiqing Huang, and
baseline methods, with the added benefit of explainability on some                          Xiang Lu. 2019. A Two-Stage Model Based on BERT for Short Fake News De-
                                                                                            tection. In Lecture Notes in Computer Science (including subseries Lecture Notes
classification decisions.                                                                   in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11776 LNAI.
                                                                                            Springer, 172–183. https://doi.org/10.1007/978-3-030-29563-9{_}17
6    ACKNOWLEDGEMENTS                                                                  [16] Joel Mackenzie, Rodger Benham, Matthias Petri, Johanne R. Trippas, J. Shane
                                                                                            Culpepper, and Alistair Moffat. 2020. CC-News-En: A Large English News
The authors would like to thank Rutuja Gurav, Pravallika Devineni,                          Corpus. In Proceedings of the 29th ACM International Conference on Information
and Sara Abdali for their valuable help and feedback. Research was                          & Knowledge Management (Virtual Event, Ireland) (CIKM ’20). Association for
                                                                                            Computing Machinery, New York, NY, USA, 3077–3084. https://doi.org/10.1145/
partially supported by the National Science Foundation Grant no.                            3340531.3412762
1901379 and a UCR Regents Faculty Fellowship. This research was                        [17] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya
partially sponsored by the U.S. Army Combat Capabilities Develop-                           Sutskever. [n.d.]. Language Models are Unsupervised Multitask Learners. Technical
                                                                                            Report. https://github.com/codelucas/newspaper
ment Command Army Research Laboratory and was accomplished                             [18] Thomas Rebele, Fabian Suchanek, Johannes Hoffart, Joanna Biega, Erdal Kuzey,
under Cooperative Agreement Number W911NF-13-2-0045 (ARL                                    and Gerhard Weikum. 2016. YAGO: A multilingual knowledge base from
                                                                                            wikipedia, wordnet, and geonames. In Lecture Notes in Computer Science (including
Cyber Security CRA). The views and conclusions contained in this                            subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics),
document are those of the authors and should not be interpreted                             Vol. 9982 LNCS. Springer Verlag, 177–185. https://doi.org/10.1007/978-3-319-
as representing the official policies, either expressed or implied, of                      46547-0{_}19
                                                                                       [19] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings
the Combat Capabilities Development Command Army Research                                   using Siamese BERT-Networks. EMNLP-IJCNLP 2019 - 2019 Conference on Empiri-
Laboratory or the U.S. Government. The U.S. Government is autho-                            cal Methods in Natural Language Processing and 9th International Joint Conference
rized to reproduce and distribute reprints for Government purposes                          on Natural Language Processing, Proceedings of the Conference (8 2019), 3982–3992.
                                                                                            http://arxiv.org/abs/1908.10084
notwithstanding any copyright notation here on.                                        [20] Victoria L. Rubin. 2010. On deception and deception detection: Content analysis
                                                                                            of computer-mediated stated beliefs. In Proceedings of the ASIST Annual Meeting,
REFERENCES                                                                                  Vol. 47. https://doi.org/10.1002/meet.14504701124
                                                                                       [21] Ando Saabas. 2014. Interpreting random forests | Diving into data.                http:
 [1] David Arthur and Sergei Vassilvitskii. 2007. K-means++: The advantages of              //blog.datadive.net/interpreting-random-forests/
     careful seeding. In Proceedings of the Annual ACM-SIAM Symposium on Discrete      [22] Robert Tibshirani. 1996. Regression Shrinkage and Selection Via the Lasso.
     Algorithms, Vol. 07-09-January-2007.                                                   Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996).
 [2] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan,                https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
     Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda        [23] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false
     Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan,          news online. Science 359, 6380 (3 2018), 1146–1151. https://doi.org/10.1126/
     Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter,             science.aap9559
     Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin   [24] Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: A free collaborative
     Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya              knowledgebase. Commun. ACM 57, 10 (9 2014), 78–85. https://doi.org/10.1145/
     Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners.
     arXiv (5 2020). http://arxiv.org/abs/2005.14165
[25] Chenguang Wang, Xiao Liu, and Dawn Song. 2020. Language Models are Open Knowledge Graphs. arXiv:2010.11967.
[26] Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Darrin Eide, Kathryn Funk, Rodney Kinney, Ziyang Liu, William Merrill, Paul Mooney, Dewey Murdick, Devvret Rishi, Jerry Sheehan, Zhihong Shen, Brandon Stilson, Alex D. Wade, Kuansan Wang, Chris Wilhelm, Boya Xie, Douglas Raymond, Daniel S. Weld, Oren Etzioni, and Sebastian Kohlmeier. 2020. CORD-19: The COVID-19 Open Research Dataset. arXiv:2004.10706 [cs.DL].
[27] Kai-Chou Yang, Timothy Niven, and Hung-Yu Kao. 2019. Fake News Detection as Natural Language Inference. arXiv:1907.07347.
[28] Tong Zhang, Di Wang, Huanhuan Chen, Zhiwei Zeng, Wei Guo, Chunyan Miao, and Lizhen Cui. 2020. BDANN: BERT-Based Domain Adaptation Neural Network for Multi-Modal Fake News Detection. In Proceedings of the International Joint Conference on Neural Networks (IJCNN). IEEE. https://doi.org/10.1109/IJCNN48605.2020.9206973
[29] Xinyi Zhou and Reza Zafarani. 2018. A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities. Comput. Surveys 53, 5 (2018). https://doi.org/10.1145/3395046