=Paper=
{{Paper
|id=Vol-2877/paper1
|storemode=property
|title=KI2TE: Knowledge-Infused InterpreTable Embeddings for COVID-19 Misinformation Detection
|pdfUrl=https://ceur-ws.org/Vol-2877/paper1.pdf
|volume=Vol-2877
|authors=William Shiao,Evangelos E. Papalexakis
|dblpUrl=https://dblp.org/rec/conf/www/ShiaoP21
}}
==KI2TE: Knowledge-Infused InterpreTable Embeddings for COVID-19 Misinformation Detection==
KI2TE: Knowledge-Infused InterpreTable Embeddings for COVID-19 Misinformation Detection

William Shiao, University of California Riverside, wshia002@ucr.edu
Evangelos E. Papalexakis, University of California Riverside, epapalex@cs.ucr.edu

KnOD'21 Workshop, April 14, 2021, Virtual Event. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT

As COVID-19 continues to spread across the world, concerns regarding the spread of misinformation about it are also growing. In this work, we propose a preliminary novel method to identify fake articles and claims by using information from the CORD-19 academic paper dataset. Our method uses the similarity between articles and reference manuscripts in a shared embedding space to classify the articles. This also provides an explanation for each classification decision that links a particular article or claim to a small number of research manuscripts that influence the decision. We collect 90K real articles and 20K fake articles about the coronavirus, as well as over 700 human-labelled claims from the Google FactCheck API, and evaluate our method's performance on these datasets. We also evaluate its performance on MM-COVID [13], a recent COVID-19 news dataset. We demonstrate the explainability of our model and discuss its limitations.

Figure 1: An illustration of the classification stages. (Articles/claims and reference documents are embedded into a shared embedding space; their pairwise similarities/distances form a distance matrix M, which a classifier uses to label each article/claim as real or fake.)

1 INTRODUCTION

We are currently experiencing an unprecedented outbreak of the novel coronavirus (SARS-CoV-2) in most countries across the world. With millions of people stuck at home and accessing information via social media platforms, there is an increasing concern about the spread of misinformation regarding the pandemic.

In recent years, we have experienced the proliferation of websites and outlets that publish and perpetuate misinformation. However, with the pandemic and the US presidential elections in 2020, it has become a larger problem than ever. The most effective method to counter this is human fact-checking. However, this often requires domain expertise and can be prohibitively expensive. Domain expertise was an especially large issue during the early stages of the pandemic, when information about COVID-19 was limited and when conspiracy theories and snake oil "cures" propagated quickly.

Fake news was a large issue even before the start of the pandemic. For example, misinformation was widespread on Twitter during events like Hurricane Sandy [10] and the Boston Marathon bombings [9]. Studies have also shown that humans are bad at detecting misinformation: the mean accuracy of 1,000 participants, averaged over 100 runs, was only 54% [20]. Furthermore, it has been shown that fake news spreads faster than real news [23], making it even more important that we combat its spread.

On top of this, the recent spread of misinformation about COVID-19 poses some new issues. Information about the virus has been sparse, especially during the start of the pandemic. This makes it harder for the average person to differentiate between true and false information. Information about the virus also evolves fairly quickly.

Many different approaches for fake news classification have been proposed. One class of approaches revolves around checking whether or not statements are likely to be connected in a knowledge graph [5–8]. The downside to this approach is that it requires the user to either create a new knowledge graph for the task or use an existing one. Creating a new knowledge graph is often difficult, and knowledge graphs are usually built with some human supervision [25]. However, Wang et al. [25] show that deep language models like BERT [4] and the GPT models [2, 17] can be used to build knowledge graphs directly. This suggests that these models retain much of the knowledge acquired from training on their datasets.

Several recent state-of-the-art fake news detection models rely on a BERT architecture for processing text [12, 15, 27, 28]. While BERT tends to perform well for this task, a common issue is the lack of explainability in its classification decisions.
In this work, we present a preliminary model that uses S-BERT [19] embeddings to construct a similarity matrix against a set of reference documents. This allows us to explain classification decisions as a function of an article's similarity to specific documents if we train an interpretable classifier like a random forest or logistic regression. While this model is relatively simple and can be further refined, we believe that this approach provides an interesting and useful step towards interpretable high-performance models.

An overview of our contributions is shown here:
• Novel embedding scheme: We propose KI2TE, a novel embedding scheme built on top of other embedding models.
• Dataset collection: We gather over 100K news articles with coarse labels.
• Extensive evaluation: We evaluate the performance and explainability of KI2TE on 3 different datasets.

2 PROBLEM FORMULATION & PROPOSED METHOD

2.1 Problem Definition

Given:
– a set A of labelled articles/claims about COVID-19, and
– a set D of credible reference documents,
classify each article/claim a ∈ A as real or fake, and explain the classification decision as a function of D.

2.2 Proposed Method

We first embed each article/claim in A and each document in D into a shared embedding space. We found that using Sentence-BERT (SBERT) [19] for this step led to the best results, but we also evaluate the performance of our method using FastText [11]. We then calculate the pairwise similarity between each article/claim and each reference document, which gives us a distance matrix M. Each row of M can be thought of as a new embedding for the corresponding article in A. We then train a classifier on M. These steps are described as pseudocode in Algorithm 1 below. We evaluate our method using logistic regression and a random forest, both of which offer a good balance between performance and interpretability.

Algorithm 1: Given a set of articles A and reference documents D, returns KI2TE embeddings.
  procedure KI2TE(A, D)
      E_A ← ComputeEmbeddings(A)
      E_D ← ComputeEmbeddings(D)
      for a_i ∈ E_A do
          for d_j ∈ E_D do
              M_{i,j} ← dist(a_i, d_j)
          end for
      end for
      return M
  end procedure
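For concreteness, the following is a minimal Python sketch of the pipeline in Section 2.2 and Algorithm 1, assuming the sentence-transformers and scikit-learn packages; the SBERT checkpoint name, the cosine distance metric, and the variable names are illustrative assumptions rather than the exact configuration used here.

```python
# Minimal sketch of the KI2TE pipeline (Algorithm 1): embed articles and reference
# documents with SBERT, build the article-by-document distance matrix M, and train
# an interpretable classifier on M.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import pairwise_distances

def ki2te_embeddings(articles, reference_docs, model_name="all-MiniLM-L6-v2"):
    """Return M, with one row per article/claim and one column per reference document."""
    model = SentenceTransformer(model_name)   # assumed SBERT checkpoint, not necessarily the paper's
    E_A = model.encode(articles)              # article/claim embeddings
    E_D = model.encode(reference_docs)        # reference document embeddings
    # M[i, j] = dist(a_i, d_j); cosine distance is an assumption here.
    return pairwise_distances(E_A, E_D, metric="cosine")

# Usage (hypothetical variable names): train a classifier on the KI2TE embeddings.
# M_train = ki2te_embeddings(train_texts, cord_docs)
# clf = LogisticRegression(max_iter=1000).fit(M_train, train_labels)
```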
2.3 Model Explainability

When trained with an interpretable classifier, this approach allows us to explain classification decisions on an article in terms of the supporting documents D. We evaluated our approach using two models: logistic regression and a random forest.

Logistic regression trains a weight vector w and bias b such that the cross-entropy is minimized. The magnitude of a weight w_j corresponds to the importance of feature M_{i,j} in article A_i. We can find the importance of that feature in a classification decision with w_j × M_{i,j}. Since M_{i,j} corresponds to the distance to document D_j, we can see how much each document contributes to the classification decision.

A random forest involves training a set of decision trees on random samples of the training dataset. The classification result is the mode of the classification results of the trees in the forest. The prediction function of a random forest can be written out in terms of the sum of feature contributions [21]. This allows us to see which documents led to a specific classification decision in a random forest.
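The sketch below illustrates the logistic-regression case, ranking reference documents by the contribution w_j × M_{i,j} described above. The objects clf, M, and reference_docs are the hypothetical ones from the previous sketch; for random forests, an analogous per-feature decomposition can be obtained with tools such as the treeinterpreter package [21].

```python
import numpy as np

def top_contributing_documents(clf, M, article_index, reference_docs, k=5):
    """Rank reference documents by |w_j * M_{i,j}| for one article (logistic regression).

    `clf` is a fitted sklearn LogisticRegression, `M` is the KI2TE distance matrix,
    and `reference_docs` holds the document titles; all are carried over from the
    earlier sketch as assumptions.
    """
    w = clf.coef_[0]                        # weight vector w (binary classification)
    contributions = w * M[article_index]    # per-document contribution w_j * M_{i,j}
    order = np.argsort(-np.abs(contributions))
    return [(reference_docs[j], float(contributions[j])) for j in order[:k]]

# Usage (hypothetical):
# for title, score in top_contributing_documents(clf, M_train, 0, cord_docs):
#     print(f"{score:+.3f}  {title}")
```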
2.4 Compared to KNN

At first glance, this approach may appear to be similar to a k-nearest-neighbors (KNN) classifier trained on the reference embedding matrix and used to classify articles. However, they are different, and there are several key advantages to our approach:
(1) The reference data can be one-class data, as in our use case, where all of the CORD articles are considered to be accurate.
(2) In KNN, each of the k nearest neighbors is considered to be of equal importance, but each neighbor is assigned a different weight in our approach.
(3) In KNN, only the k nearest neighbors are considered for classification, but we consider all of the data points in our approach.

However, one advantage of KNN over our approach is that KNN scales better when there are more reference documents, especially if a database that supports approximate nearest-neighbor search is used. We talk more about this limitation in Section 3.5, as well as ways to reduce its impact.

3 EXPERIMENTAL EVALUATION

We evaluate the performance of our method along three aspects:
(1) The classification accuracy and F1 score on the Google FactCheck claims, the MM-COVID [13] dataset, and our gathered set of news articles.
(2) The explainability of our method.
(3) The sensitivity of our model with respect to the number of reference documents.

3.1 Classification Performance

We evaluate the accuracy and F1 score of our model and of similar baseline models on the 3 datasets described in Section 3.4. We also evaluate them on 3 different subsets of the news dataset, as described in Section 3.2.

                     News                            Claims
Model                Acc.            F1              Acc.            F1
BERT + KI2TE + RF    0.729 ± 0.003   0.786 ± 0.002   0.921 ± 0.02    0.524 ± 0.479
BERT + RF            0.757 ± 0.004   0.809 ± 0.004   0.926 ± 0.015   0.03 ± 0.074
BERT + KI2TE + LR    0.742 ± 0.003   0.714 ± 0.054   0.913 ± 0.005   0.5 ± 0.498
BERT + LR            0.791 ± 0.004   0.802 ± 0.036   0.904 ± 0.026   0.211 ± 0.064
FT + KI2TE + RF      0.714 ± 0.004   0.607 ± 0.008   0.912 ± 0.014   0.499 ± 0.5
FT + RF              0.788 ± 0.004   0.804 ± 0.045   0.922 ± 0.012   0.051 ± 0.079
FT + KI2TE + LR      0.773 ± 0.004   0.810 ± 0.003   0.907 ± 0.016   0.474 ± 0.52
FT + LR              0.755 ± 0.004   0.724 ± 0.056   0.901 ± 0.02    0.0 ± 0.0

                     MM-COVID                        Filtered News
Model                Acc.            F1              Acc.            F1
BERT + KI2TE + RF    0.89 ± 0.011    0.778 ± 0.018   0.834 ± 0.01    0.906 ± 0.006
BERT + RF            0.922 ± 0.005   0.948 ± 0.003   0.839 ± 0.01    0.908 ± 0.006
BERT + KI2TE + LR    0.918 ± 0.008   0.846 ± 0.017   0.843 ± 0.008   0.91 ± 0.005
BERT + LR            0.943 ± 0.002   0.962 ± 0.001   0.847 ± 0.01    0.909 ± 0.007
FT + KI2TE + RF      0.853 ± 0.008   0.681 ± 0.021   0.828 ± 0.018   0.903 ± 0.011
FT + RF              0.901 ± 0.004   0.935 ± 0.003   0.831 ± 0.012   0.904 ± 0.008
FT + KI2TE + LR      0.899 ± 0.01    0.806 ± 0.019   0.826 ± 0.008   0.903 ± 0.005
FT + LR              0.864 ± 0.003   0.912 ± 0.004   0.822 ± 0.009   0.902 ± 0.005

                     Vaccine News                    Transmission News
Model                Acc.            F1              Acc.            F1
BERT + KI2TE + RF    0.865 ± 0.002   0.925 ± 0.001   0.79 ± 0.006    0.865 ± 0.005
BERT + RF            0.866 ± 0.003   0.926 ± 0.002   0.805 ± 0.007   0.875 ± 0.006
BERT + KI2TE + LR    0.869 ± 0.003   0.926 ± 0.002   0.797 ± 0.014   0.866 ± 0.009
BERT + LR            0.886 ± 0.002   0.934 ± 0.001   0.833 ± 0.013   0.889 ± 0.009
FT + KI2TE + RF      0.86 ± 0.002    0.923 ± 0.001   0.766 ± 0.01    0.855 ± 0.007
FT + RF              0.876 ± 0.002   0.931 ± 0.001   0.807 ± 0.015   0.881 ± 0.01
FT + KI2TE + LR      0.873 ± 0.004   0.929 ± 0.002   0.751 ± 0.006   0.847 ± 0.004
FT + LR              0.853 ± 0.005   0.919 ± 0.003   0.725 ± 0.017   0.839 ± 0.011

Table 1: Top: Results on the news, claims, and MM-COVID [14] datasets. Bottom: Results on samples of the news dataset. RF stands for random forest, LR stands for logistic regression, and FT stands for FastText [11].

3.2 Explainability

In Fig. 2 we show four classification results, with the top contributors to each decision. Due to space and copyright considerations, we provide only the titles of the articles and manuscripts. The results are taken from a model trained on subsets of the news articles focused on vaccine and transmission news.

Figure 2: Four sample classification results of real articles (only titles shown) with the top contributors (only titles shown) to the decision in a random forest classifier. The model was trained on the vaccine and transmission subsets of the news articles, as described in Section 3.2.

The reason for this is that only a small portion of the news articles contain information that is also present in CORD-19 documents. Below are the titles of 5 articles that have poor explainability in our model:
1) "Kevin Ferris: John Prine, thanks for the many blessings you shared through your life and music"
2) "San Bernardino County reports 4 more coronavirus deaths, 146 new cases"
3) "Trump indicates he no longer has the coronavirus, says he is 'immune'"
4) "Gary Neville slams EPL teams: Clubs are frightened"
5) "Coronavirus: Indian takeaway offering free toilet rolls with orders over £20"

We can see that (1) is about the death of a celebrity from coronavirus, and it is unlikely that any CORD-19 document would reference it. (2) is about a relatively small area in the U.S. and would likely not have any references to it in CORD-19. (3) is political news and also likely does not have many CORD-19 references. (4) is primarily sports news and does not contain information about the virus itself. (5) is about a specific restaurant and will not have any related information in CORD-19.

However, KI2TE still maintains accuracy similar to our baseline models. This is because the document distances also serve as a proxy for the raw embeddings, allowing the model to retain much of the information from the original BERT/FastText embeddings. However, the explainability of our model suffers in this case. To resolve this, we extract three subsets from the news dataset.

We extract a filtered set, in which articles mentioning sports teams and popular cities/countries are removed, and refer to it as the "Filtered News" dataset. We also extract only articles that contain the word "vaccine" and call this the "Vaccine News" dataset. Finally, we extract only articles that contain the word "transmission" and name it the "Transmission News" dataset. The purpose of the last two datasets is to provide a smaller sample of articles focusing more on attributes of the virus, rather than on other topics (like those shown above). The performance of our models on these datasets is shown in Table 1.

3.3 Sensitivity to Number of Reference Manuscripts

We evaluate the accuracy and F1 score of KI2TE as the number of reference documents increases, and the results can be seen in Fig. 3. Generally, we can see that as the number of reference documents increases, the accuracy and F1 score of KI2TE increase. However, increasing the number of reference documents has a diminishing effect. Interestingly, the FastText-based models exhibit a large dip in F1 score after about 1,000 documents, but it recovers and continues to increase.

Figure 3: Accuracy (left) and F1 score (right) as the number of reference documents increases. (Both panels plot the metric against the number of CORD documents, roughly 100 to 10,000, for the BERT/FT + LR/RF model variants.)
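One way such a sensitivity sweep could be run is sketched below, reusing the hypothetical ki2te_embeddings helper from the earlier sketch; the document counts, the random subsampling, and the random-forest settings are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

def sensitivity_sweep(train_texts, train_labels, test_texts, test_labels,
                      cord_docs, sizes=(100, 300, 1000, 3000, 10000), seed=0):
    """Accuracy/F1 of a KI2TE-based classifier as the number of reference documents grows.

    Assumes binary 0/1 labels and the hypothetical ki2te_embeddings helper defined earlier.
    """
    rng = np.random.default_rng(seed)
    results = []
    for n in sizes:
        # Random subsample of reference documents (Section 3.5 discusses better selection).
        docs = list(rng.choice(cord_docs, size=min(n, len(cord_docs)), replace=False))
        M_train = ki2te_embeddings(train_texts, docs)
        M_test = ki2te_embeddings(test_texts, docs)
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(M_train, train_labels)
        pred = clf.predict(M_test)
        results.append((n, accuracy_score(test_labels, pred), f1_score(test_labels, pred)))
    return results
```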
3.4 Datasets

In this section, we describe the steps involved in collecting the data and filtering the news articles for analysis. We used five datasets for this work. We chose to crawl our own news datasets because we were unable to find any up-to-date fake news datasets at the time of writing.

3.4.1 CORD-19. The first dataset we used was the COVID-19 Open Research Dataset (CORD-19) [26], a growing collection of scientific papers prepared by the White House in partnership with leading research groups, characterizing the wide range of literature related to coronaviruses. It consists of over 200,000 documents, of which 100,000 have a PDF parse of their full text. Although not all of these documents have undergone peer review, and the collection includes preprints from sites like bioRxiv, we still consider it a relatively credible source of information about the virus.

3.4.2 Fake News Dataset. We crawled sites from NewsGuard's Misinformation Tracking Center (https://www.newsguardtech.com/coronavirus-misinformation-tracking-center/) for our fake news dataset. NewsGuard is an organization that rates the trustworthiness of websites that share information online based on their credibility and transparency. We crawled only the sites based in the United States to ensure that we crawled only English-language sites. We also only crawled sites with sitemaps to ensure that all of the crawled pages were in fact news articles and not other pages, like store pages. We also made the assumption that all articles on any of those sites are fake news. While this is a very strong assumption, we could not come up with a better method for labeling individual articles. We used the Newspaper3k Python library (https://github.com/codelucas/newspaper) to extract article metadata and content.

We chose to scrape only COVID-19-related news articles by filtering the crawled articles with keywords like "COVID" or "coronavirus". We also removed duplicate lines (where a line is an HTML tag, not a sentence) from the plain text of the articles. This helps prevent pages with fixed headers or taglines from appearing in the document text. Otherwise, articles with mentions of the keywords only in the header or footer would also be included in the crawl.

Certain properties of these sites made them difficult to crawl. Some sites mixed abstracts of academic papers in with their articles to lend credibility. Other sites mixed in articles from Reuters or the Associated Press (AP), both of which we consider reliable sources. The Newspaper3k library also tended to perform worse at extracting the content of these articles, likely because the library was mainly tested on more mainstream news websites. Many of these sites also had other purposes in addition to providing news articles: some of them sold alternative medicinal products like colloidal silver, and others also had videos in addition to their text articles. We did our best to clean this data, but it is possible that some of these issues are still present. After cleaning, we were left with around 20K fake news articles.
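A minimal sketch of this extraction-and-filtering step, assuming the newspaper3k package; the keyword list and the removal of duplicate lines follow the description above, while the function name and returned fields are illustrative.

```python
from newspaper import Article

KEYWORDS = ("covid", "coronavirus")  # keyword filter described above

def extract_covid_article(url):
    """Download one page with Newspaper3k and keep it only if it looks COVID-related."""
    article = Article(url)
    article.download()
    article.parse()
    text = article.text
    if not any(k in text.lower() for k in KEYWORDS):
        return None  # not a COVID-19-related article
    # Drop duplicate lines so fixed headers/taglines do not end up in the document text.
    seen, lines = set(), []
    for line in text.splitlines():
        if line.strip() and line not in seen:
            seen.add(line)
            lines.append(line)
    return {"title": article.title, "text": "\n".join(lines)}
```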
3.4.3 Real News Dataset. We used the list from the B.S. Detector Chrome extension (https://gitlab.com/bs-detector/bs-detector) to pick the reputable sites. We then collected articles from all of the matching sites in the Common Crawl News archive [16]. After that, the HTML for each article was processed in the same manner as the fake news dataset. We gathered over 95K articles that mention the novel coronavirus, but we randomly subsample from this set when training our models to reduce the class imbalance.

3.4.4 Google FactCheck Dataset. We also downloaded COVID-19-related claims from the Google FactCheck API (https://toolbox.google.com/factcheck/apis). These claims are gathered from a variety of fact-checking companies and are checked by humans. Each claim consists of a single sentence (or, rarely, several sentences) and a rating from a fact-checking agency. This rating does not necessarily follow any particular format and can range from "Fake" to other, less clear ratings like "Needs Context" or "Missing Context". We chose to exclude those ambiguous claims from the dataset. This left a total of 739 claims, of which 97 are true and 642 are false/misleading. While there is a heavy class imbalance and it is a small dataset, we chose to include it in our evaluation to test our model's performance on small datasets with accurate labels.
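A rough sketch of pulling such claims, assuming the Fact Check Tools claims:search endpoint and an API key; the response fields and the rating filter shown here are simplified assumptions, and the actual ratings used for labelling may differ.

```python
import requests

FACTCHECK_URL = "https://factchecktools.googleapis.com/v1alpha1/claims:search"

def fetch_claims(api_key, query="COVID-19", page_size=100):
    """Pull fact-checked claims matching a query from the Google Fact Check Tools API."""
    params = {"key": api_key, "query": query, "pageSize": page_size, "languageCode": "en"}
    resp = requests.get(FACTCHECK_URL, params=params)
    resp.raise_for_status()
    claims = resp.json().get("claims", [])
    kept = []
    for claim in claims:
        for review in claim.get("claimReview", []):
            rating = review.get("textualRating", "").lower()
            # Keep only unambiguous ratings; the exact rating vocabulary is an assumption.
            if rating in {"true", "false", "fake"}:
                kept.append((claim.get("text", ""), rating))
    return kept
```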
3.4.5 MM-COVID Dataset. We also used the Multilingual and Multidimensional COVID-19 Fake News Data Repository (MM-COVID) [13], which contains fake news in 6 different languages. However, we only focus on the English portion of the dataset. The news articles are labelled by Snopes (https://www.snopes.com/) and Poynter (https://www.poynter.org/), both of which are fact-checking organizations that use human fact-checkers. The MM-COVID dataset also includes tweets and replies to those tweets, but we only use the text of the articles in the dataset.

3.5 Limitations

One limitation of this method is that a new feature is added for each new reference document. This can significantly reduce the performance of the classifier and greatly increase the time needed to calculate the distance matrix when the number of reference documents is large. The simplest way to mitigate this would be to use a random sample of reference documents, but very similar reference documents may be selected, which would not improve the performance or interpretability of the model. Another way to mitigate it would be to use standard feature selection methods (like Lasso [22]), but this still requires calculating the distance matrix across all reference documents.

One solution is to run k-means++ [1] on E_D and set k to the number of reference documents we want to use. Then, we can select the nearest neighbor to each of the k centroids of the clusters. This leaves us with k reference documents, each of which theoretically represents a different part of the embedding space. This helps reduce the chance of similar documents being selected.
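A sketch of this selection step under the same assumptions as the earlier snippets, using scikit-learn's KMeans (which uses k-means++ initialization by default [1]) and keeping the reference document closest to each centroid.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

def select_reference_documents(E_D, k, seed=0):
    """Pick k document indices, one near each k-means++ centroid of the embeddings E_D."""
    km = KMeans(n_clusters=k, init="k-means++", random_state=seed).fit(E_D)
    # For each centroid, the index of the closest document embedding; these are kept as references.
    return pairwise_distances_argmin(km.cluster_centers_, E_D)

# Usage (hypothetical): keep 1,000 diverse CORD-19 documents before building M.
# keep_idx = select_reference_documents(E_D, k=1000)
# reference_docs = [cord_docs[i] for i in keep_idx]
```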
4 RELATED WORK

There has been a lot of work in the area of fake news detection, and models use a variety of different methods. These methods can generally be grouped into four categories: knowledge-based, style-based, propagation-based, and source-based models [29].

Knowledge-based models often attempt to compare the claims in a news article against facts stored in a knowledge base (KB) or knowledge graph (KG). Knowledge graphs are commonly represented as a set of subject-predicate-object triples, where the subject and object map to entities, which are typically represented as nodes. These models often predict the probability of triples existing in this graph and use that to determine the accuracy of a statement [5–7]. These knowledge graphs can be single-source, where the knowledge graph comes from a single source, or open-source, where the knowledge graph is created by merging data from multiple sources [29]. The downside of a knowledge-graph-based approach is that it requires a knowledge graph, which is non-trivial to construct. Many existing public knowledge graphs, like Wikidata [24], YAGO [18], and NELL [3], required some level of human supervision to construct.

Style-based models look at the style with which an article was written to assess the intentions of the author, with the assumption that fake news articles are written differently than authentic news articles. Propagation-based models look at how a news article spreads and work on a news cascade [29] or graph representation of that spread. Source-based models focus on the author and publisher of news articles, with the assumption that many fake news items tend to come from the same sources. While we use the term "article", all of these methods are applicable to, and have been applied to, other mediums, like social media posts.
5 CONCLUSION

In this work, we propose a similarity-matrix-based embedding method, KI2TE, which allows us to interpret the decisions of embedding-based models by linking them to a set of reference documents. We gather a coarsely-labelled dataset of news articles and human-labelled claims. We also evaluate our model on the MM-COVID [14] dataset. We show that our model has performance similar to baseline methods, with the added benefit of explainability on some classification decisions.

6 ACKNOWLEDGEMENTS

The authors would like to thank Rutuja Gurav, Pravallika Devineni, and Sara Abdali for their valuable help and feedback. Research was partially supported by the National Science Foundation Grant no. 1901379 and a UCR Regents Faculty Fellowship. This research was partially sponsored by the U.S. Army Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
REFERENCES

[1] David Arthur and Sergei Vassilvitskii. 2007. K-means++: The advantages of careful seeding. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms.
[2] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. arXiv (5 2020). http://arxiv.org/abs/2005.14165
[3] Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka, and Tom M. Mitchell. [n.d.]. Toward an Architecture for Never-Ending Language Learning. Technical Report. www.aaai.org
[4] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. (10 2018). http://arxiv.org/abs/1810.04805
[5] Xin Luna Dong, Christos Faloutsos, Xian Li, Subhabrata Mukherjee, and Prashant Shiralkar. 2018. 3-FactCheckingGraph - Google Slides. https://docs.google.com/presentation/d/1JudymfQC14vpGdQY6nOodIcmOpec1vSnMa38FnSZiVY/edit#slide=id.g3fc8173fac_2_79
[6] Valeria Fionda and Giuseppe Pirrò. 2017. Fact Checking via Evidence Patterns. Technical Report.
[7] Mohamed H. Gad-Elrab, Daria Stepanova, Jacopo Urbani, and Gerhard Weikum. 2019. Tracy: Tracing Facts over Knowledge Graphs and Text. (2019). https://doi.org/10.1145/nnnnnnn.nnnnnnn
[8] Matthew Gardner, Tom Mitchell, William Cohen, Christos Faloutsos, and Antoine Bordes. [n.d.]. Reading and Reasoning with Knowledge Graphs. Technical Report. www.lti.cs.cmu.edu
[9] Aditi Gupta, Hemank Lamba, and Ponnurangam Kumaraguru. 2013. $1.00 per RT #BostonMarathon #PrayForBoston: Analyzing fake content on Twitter. In eCrime Researchers Summit, eCrime. IEEE Computer Society. https://doi.org/10.1109/eCRS.2013.6805772
[10] Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru, and Anupam Joshi. [n.d.]. Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy. http://www.guardian.co.uk/world/us-news-
[11] Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of Tricks for Efficient Text Classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Association for Computational Linguistics, 427–431.
[12] Heejung Jwa, Dongsuk Oh, Kinam Park, Jang Kang, and Hueiseok Lim. 2019. exBAKE: Automatic Fake News Detection Model Based on Bidirectional Encoder Representations from Transformers (BERT). Applied Sciences 9, 19 (9 2019), 4062. https://doi.org/10.3390/app9194062
[13] Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. [n.d.]. MM-COVID: A Multilingual and Multimodal Data Repository for Combating COVID-19 Disinformation. Technical Report.
[14] Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. MM-COVID: A Multilingual and Multimodal Data Repository for Combating COVID-19 Disinformation. (11 2020). http://arxiv.org/abs/2011.04088
[15] Chao Liu, Xinghua Wu, Min Yu, Gang Li, Jianguo Jiang, Weiqing Huang, and Xiang Lu. 2019. A Two-Stage Model Based on BERT for Short Fake News Detection. In Lecture Notes in Computer Science, Vol. 11776 LNAI. Springer, 172–183. https://doi.org/10.1007/978-3-030-29563-9_17
[16] Joel Mackenzie, Rodger Benham, Matthias Petri, Johanne R. Trippas, J. Shane Culpepper, and Alistair Moffat. 2020. CC-News-En: A Large English News Corpus. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM '20). Association for Computing Machinery, New York, NY, USA, 3077–3084. https://doi.org/10.1145/3340531.3412762
[17] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. [n.d.]. Language Models are Unsupervised Multitask Learners. Technical Report.
[18] Thomas Rebele, Fabian Suchanek, Johannes Hoffart, Joanna Biega, Erdal Kuzey, and Gerhard Weikum. 2016. YAGO: A multilingual knowledge base from Wikipedia, WordNet, and GeoNames. In Lecture Notes in Computer Science, Vol. 9982 LNCS. Springer Verlag, 177–185. https://doi.org/10.1007/978-3-319-46547-0_19
[19] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of EMNLP-IJCNLP 2019, 3982–3992. http://arxiv.org/abs/1908.10084
[20] Victoria L. Rubin. 2010. On deception and deception detection: Content analysis of computer-mediated stated beliefs. In Proceedings of the ASIST Annual Meeting, Vol. 47. https://doi.org/10.1002/meet.14504701124
[21] Ando Saabas. 2014. Interpreting random forests | Diving into data. http://blog.datadive.net/interpreting-random-forests/
[22] Robert Tibshirani. 1996. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996). https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
[23] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (3 2018), 1146–1151. https://doi.org/10.1126/science.aap9559
[24] Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: A free collaborative knowledgebase. Commun. ACM 57, 10 (9 2014), 78–85. https://doi.org/10.1145/2629489
[25] Chenguang Wang, Xiao Liu, and Dawn Song. 2020. Language Models are Open Knowledge Graphs. (10 2020). http://arxiv.org/abs/2010.11967
[26] Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Darrin Eide, Kathryn Funk, Rodney Kinney, Ziyang Liu, William Merrill, Paul Mooney, Dewey Murdick, Devvret Rishi, Jerry Sheehan, Zhihong Shen, Brandon Stilson, Alex D. Wade, Kuansan Wang, Chris Wilhelm, Boya Xie, Douglas Raymond, Daniel S. Weld, Oren Etzioni, and Sebastian Kohlmeier. 2020. CORD-19: The Covid-19 Open Research Dataset. arXiv:2004.10706 [cs.DL]
[27] Kai-Chou Yang, Timothy Niven, and Hung-Yu Kao. 2019. Fake News Detection as Natural Language Inference. arXiv (7 2019). http://arxiv.org/abs/1907.07347
[28] Tong Zhang, Di Wang, Huanhuan Chen, Zhiwei Zeng, Wei Guo, Chunyan Miao, and Lizhen Cui. 2020. BDANN: BERT-Based Domain Adaptation Neural Network for Multi-Modal Fake News Detection. In Proceedings of the International Joint Conference on Neural Networks. IEEE. https://doi.org/10.1109/IJCNN48605.2020.9206973
[29] Xinyi Zhou and Reza Zafarani. 2018. A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities. Comput. Surveys 53, 5 (12 2018). https://doi.org/10.1145/3395046