=Paper=
{{Paper
|id=Vol-2877/paper2
|storemode=property
|title=REMOD: Relation Extraction for Modeling Online Discourse
|pdfUrl=https://ceur-ws.org/Vol-2877/paper2.pdf
|volume=Vol-2877
|authors=Matthew Sumpter,Giovanni Luca Ciampaglia
|dblpUrl=https://dblp.org/rec/conf/www/SumpterC21
}}
==REMOD: Relation Extraction for Modeling Online Discourse==
REMOD: Relation Extraction for Modeling Online Discourse

Matthew Sumpter (mjsumpter@usf.edu) and Giovanni Luca Ciampaglia (glc3@mail.usf.edu), University of South Florida

ABSTRACT

The enormous amount of discourse taking place online poses challenges to the functioning of a civil and informed public sphere. Efforts to standardize online discourse data, such as ClaimReview, are making available a wealth of new data about potentially inaccurate claims, reviewed by third-party fact-checkers. These data could help shed light on the nature of online discourse, the role of political elites in amplifying it, and its implications for the integrity of the online information ecosystem. Unfortunately, the semi-structured nature of much of this data presents significant challenges when it comes to modeling and reasoning about online discourse. A key challenge is relation extraction, which is the task of determining the semantic relationships between named entities in a claim. Here we develop a novel supervised learning method for relation extraction that combines graph embedding techniques with path traversal on semantic dependency graphs. Our approach is based on the intuitive observation that knowledge of the entities along the path between the subject and object of a triple (e.g. Washington,_D.C., and United_States_of_America) provides useful information that can be leveraged for extracting its semantic relation (i.e. capitalOf). As an example of a potential application of this technique for modeling online discourse, we show that our method can be integrated into a pipeline to reason about potential misinformation claims.

Figure 1: Schematic example of our approach. The RDF graphlet generated by a machine-reading tool (FRED) for the claim "Tej Pratap Yadav receives a doctorate degree from Takshsila University in Bihar" (a known misinformation claim [26]). The shortest undirected path between the source (dbpedia:Tej_Pratap_Yadav) and target (dbpedia:Doctorate) is shown in red. The nodes along the path are highlighted in gray.

CCS CONCEPTS

• Information systems → Web mining; Semantic web description languages; Information extraction.

KEYWORDS

relation extraction, semi-structured data, semantic ontology, claim matching, fact-checking

KnOD'21 Workshop, April 14, 2021, Virtual Event. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1 INTRODUCTION

The prevalence of false and inaccurate information in its myriad of forms — a persistent and dangerous societal problem — is still a poorly understood phenomenon [1, 7, 30], especially in the context of political communication [21]. Even though strong exposure to so-called "fake news" is limited to the segment of most active news consumers [19], individual claims echoing the false or misleading content shared by these audiences can spread rapidly through social media [57, 69], amplified by bots [46] or other malicious actors [60], who often target elites, like celebrities, pundits, or politicians. From there, false claims rebroadcast by these elites enjoy further dissemination, reaching even wider audiences.

Misinformation has become an emerging focus of computational social scientists seeking to understand and combat it [10, 56]. Network analysis and natural language processing (NLP) provide insight into the community organization and the stylistic patterns that are indicative of misinformation, respectively; however, they often fail to engage with the ideological content being shared. Online discourse typically takes the form of unorganized and unstructured data, which is a significant limiting factor for content analysis. Existing work on semantic ontologies and knowledge base development has proved to be a guiding method for structuring online information. A knowledge base most commonly structures knowledge in the shape of semantic triples; a semantic triple is composed of two entities (e.g. a person, place, or thing) and a predicate relation between them. An example of a semantic triple is (Washington,_D.C., capitalOf, United_States_of_America). This structure allows concepts to be reduced to machine-readable data which can be compiled into traversable (and understandable) networks of information. The result is a data structure that can be used to provide quantitative analysis of online discourse.
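As an illustration of this data structure, the following minimal sketch stores the triple above with rdflib, a Python RDF library chosen here purely for illustration (it is not part of this paper's tooling, and the capitalOf predicate URI is likewise illustrative):

```python
from rdflib import Graph, URIRef

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

g = Graph()
# One semantic triple: (subject, predicate, object). The predicate URI is
# illustrative; DBpedia actually models this relation via dbo:capital.
g.add((URIRef(DBR + "Washington,_D.C."),
       URIRef(DBO + "capitalOf"),
       URIRef(DBR + "United_States_of_America")))

# The resulting graph is machine-readable and traversable.
for subj, pred, obj in g:
    print(subj, pred, obj)
```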
An example of a knowledge base application in combating misinformation is computational fact-checking. Fact-checking is recognized as an antidote to misinformation [32], especially with respect to claims spread by political elites. For example, Nyhan and Reifler [36] show that alerting politicians to the risk of being fact-checked leads to less inaccuracy and better ratings. Unfortunately, fact-checking claims at the scale of the web is a hard task. A fact-checker must first identify claims that are worthy of being checked, then research the claim [6, 51], and finally write, publish, and circulate their conclusion on the web. In general, there is a lag of approximately 15 hours between the consumption of misinformation and the appearance of corrections [45]. The time investment required of human fact-checkers creates an opening for the development of automated fact-checking [11, 63] and verification [34] strategies. One approach is based on identifying missing relations in structured knowledge bases [11, 29, 47, 48]. This approach takes a claim in the form of a semantic triple and checks its validity against the sets of triples in the knowledge base that connect the subject and object. When the knowledge base is viewed as a network, this task is equivalent to link prediction [33].

This approach has proven very promising, but its main restriction lies in its input. Modeling a claim using semantic triples is a nontrivial task, which has limited the application of such an approach. It requires choosing a semantic ontology (or developing a new one) that is able to model claims in a consistent and non-redundant manner. Once an ontology has been established, the next step is relation extraction — the task of reducing a text to a semantic triple that both captures its meaning and fits within the ontology. This task is challenging when addressing a compound factual claim with many subjects and relations; the challenge is amplified when considering a claim that may contain sarcasm, opinion, humor, or any other nuance of language that can be present in online discourse.

In this paper, we present a novel relation extraction method built upon semantic dependency trees; see Figure 1 for a schematic example. Our approach is based on the intuition that knowledge of the nodes and relations along the path between the subject and object of a triple (e.g. Washington,_D.C., and United_States_of_America) provides useful information that can be leveraged for extracting its relation (i.e. capitalOf). This well-established phenomenon was first observed by Richards and Mooney [41]; later, Bunescu and Mooney [9] used it in the context of a kernel-based approach. Here, we take advantage of recent advances in graph representation learning to overcome the challenges that online discourse poses to such an approach. Specifically, we parse a large corpus of Wikipedia snippets, each annotated with one of 5 relations from the DBpedia ontology, combine the resulting dependency trees into a larger semantic network, and finally use node embedding techniques to obtain a high-dimensional representation of this corpus-level network. We find that graph traversal in this learned representation provides a strong signal to discriminate between multiple possible relations. This approach allowed us to effectively extract these relations from natural language (extraction accuracy measured as the area under the ROC curve, AUC = 0.976). We then tested this model's ability to generalize to a set of real-world claims (reviewed by professional fact-checkers and annotated using the ClaimReview [22] schema), obtaining again a very good signal (extraction AUC = 0.958).

As an example of a potential application of this technique, we show that, thanks to our method, a wider range of online discourse samples is amenable to analysis than before. In particular, we integrate our approach into a pipeline (see Figure 2) that uses off-the-shelf fact-checking algorithms to analyze a subset of ClaimReview-annotated online discourse samples. Using this pipeline, we obtain very encouraging results on two separate tasks. First, on samples of 'simple' online discourse claims, which can be effectively summarized (and thus fact-checked) by extracting a single RDF triple, we outperform a claim-matching baseline based on state-of-the-art representation learning (verification AUC = 0.833). Second, on more complex claims, from which one can extract multiple relevant relations and which therefore cannot be fact-checked directly, the fact-checker can still identify evidence in support of or against the claim with good accuracy (verification AUC = 0.773).

The rest of this paper is structured as follows: Section 2 details the datasets used, as well as the methods used in the various steps of the pipeline. Section 3 shows the results of both the relation classification and the fact-checking tasks. Section 4 reviews relevant prior work on relation classification, misinformation detection, and computational fact-checking. Finally, Section 5 discusses the impact and importance of our results, as well as methods that may be used to improve upon this work in the future.

Figure 2: Schematic illustration of an integrated extraction and verification pipeline using our relation extraction tool REMOD. The white components correspond to the various steps needed to perform relation extraction. Numbered labels correspond to section headings in the manuscript. To show the potential for integration with external tools, as an additional step in the pipeline the green node shows the use of an off-the-shelf fact-checking algorithm [11].

2 METHODS

Our relation extraction pipeline is described in Figure 2. Roughly speaking, the main task of our pipeline is a supervised relation extraction task (white nodes); but since we later show how this task can be integrated with an additional, unsupervised fact-checking step, the figure also shows this final step (green node). Collectively, these two tasks leverage a number of different data sources, so we start by describing the various datasets used in building the pipeline. We then describe the various components of the pipeline proper.

2.1 Datasets

For the main relation extraction task, we use two main corpora, both compiled by Google: the Google Relation Extraction Corpus (GREC) and the Google Fact Check Explorer corpus, described below.
2.1.1 Google Relation Extraction Corpus (GREC). The dataset of relations used was the Google Relation Extraction Corpus (GREC) [37]. This dataset contains text snippets extracted from Wikipedia articles that represent a subject/object relation, which can be described by the following defining questions:

Institution: "What educational institution did the subject attend?"
Education: "What academic degree did the subject receive?"
Date of Birth (DOB): "On what date was the subject born?"
Place of Birth (POB): "Where was the subject born?"
Place of Death (POD): "Where did the subject die?"

Each entry in the dataset consists of a natural language snippet of text, the URL of the Wikipedia entry from which the text was pulled, the Freebase predicate, a Freebase ID for the subject and the object, and the judgements of five human annotators on whether the snippet does or does not contain the relation (some annotators also voted to "skip", representing no decision either way). Freebase has been replaced by the Google Knowledge Graph since this dataset was generated, which limited the use of the dataset in its original form. We made a set of addenda to the GREC (available at https://github.com/mjsumpter/google-relation-extraction-corpus-augmented) to make it more machine-ready for current relation extraction tasks and knowledge bases. The addenda include, for each entry: text strings for both subject and object, DBpedia URIs for both subject and object, Wikidata QIDs for both subject and object, a unique identifier, and the majority annotator vote.
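For concreteness, a single augmented entry can be pictured roughly as follows. This is a sketch only: the field names and values are hypothetical, not the exact schema of the addenda (consult the repository above for the actual format):

```python
# Hypothetical shape of one augmented GREC entry.
entry = {
    "uid": "grec-dob-000042",                     # unique identifier (added by addenda)
    "snippet": "John Doe (born 1 January 1970 in Springfield) was ...",
    "wikipedia_url": "https://en.wikipedia.org/wiki/John_Doe",
    "predicate": "/people/person/date_of_birth",  # original Freebase predicate
    "subject_text": "John Doe",
    "subject_dbpedia": "http://dbpedia.org/resource/John_Doe",
    "subject_wikidata": "Q0000001",               # placeholder QID
    "object_text": "1970-01-01",
    "object_dbpedia": None,                       # not every object links to DBpedia
    "object_wikidata": None,
    "majority_vote": "yes",                       # majority of the five annotator judgements
}
```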
The snippets varied considerably in length; the distribution of lengths is shown in Figure 3. Because we relied on a third-party API to parse the snippets, to reduce potential bias due to snippet length, and to ensure that only the most characteristic relations were modeled, snippets were removed if their length (measured in words) was not within ±0.5 standard deviations of the mean snippet length, per relation. Table 1 shows the number of snippets retained, per relation.

Table 1: Number of snippets per relation before and after filtering the GREC corpus.

Relation         Total    Retained   % Retained
Institution      42,628   19,900     46.7
Education         1,850      806     43.6
Date of Birth     2,490    1,010     40.6
Place of Birth    9,566    4,005     41.9
Place of Death    3,042    1,307     43.0

Figure 3: Distribution of snippet lengths found in the GREC. The red solid line corresponds to the average snippet length (in words) and the dashed lines to ±0.5σ of the average. Snippets were kept if they were within this interval.

2.1.2 Google Fact Check Explorer Corpus. Researchers at Duke University and Google have developed an annotation standard named ClaimReview [44] to help annotate structured fact-checks on the web. It allows fact-checkers to add structured markup to their fact-checks with information that identifies distinct properties of a claim (i.e. the claim reviewed, the rating decision, the source, etc.). This semi-structured data allows fact-checks to be catalogued and queried by search engines. The Google Fact Check Explorer tool (https://toolbox.google.com/factcheck/explorer) collects all the ClaimReview fragments published by fact-checking organizations that meet a set of established guidelines (https://developers.google.com/search/docs/data-types/factcheck#guidelines), which are the same standards for accountability, transparency, and accuracy used by Google News to select publishers. We collected claims from the Google Fact Check Explorer tool up until 04/2020. From this corpus, we produced a dataset of 49,770 ClaimReview-annotated claims. Of the 20,817 English claims in the dataset, we searched for claims that contained one of the relations represented in the GREC, using WordNet [14] synonyms to select search terms (see Table 2). This procedure yielded a subset of 28 claims that met these criteria.

Table 2: The set of WordNet synonyms used to extract relevant claims from the ClaimReview database.

Relation         WordNet synonyms
Institution      attend, university, college, graduate
Education        graduate, degree
Date of Birth    born, born on
Place of Birth   born, birthplace, place of birth, place of origin
Place of Death   deceased, died, perished, passed away, expired
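A minimal sketch of this filtering step is given below. It assumes the ClaimReview entries are available as dictionaries with schema.org-style claimReviewed and inLanguage fields (an assumption about the data layout), and it hard-codes the Table 2 terms rather than querying WordNet directly:

```python
# Search terms per relation, mirroring Table 2 (selected with WordNet's
# help in the paper; hard-coded here for brevity).
RELATION_TERMS = {
    "Institution":    ["attend", "university", "college", "graduate"],
    "Education":      ["graduate", "degree"],
    "Date of Birth":  ["born", "born on"],
    "Place of Birth": ["born", "birthplace", "place of birth", "place of origin"],
    "Place of Death": ["deceased", "died", "perished", "passed away", "expired"],
}

def matching_relations(text):
    """Return the GREC relations whose search terms occur in the claim text."""
    lowered = text.lower()
    return [rel for rel, terms in RELATION_TERMS.items()
            if any(term in lowered for term in terms)]

claims = [  # toy stand-ins for ClaimReview entries
    {"claimReviewed": "Tej Pratap Yadav received a doctorate degree ...",
     "inLanguage": "en"},
    {"claimReviewed": "Penny Wong was born in Malaysia ...", "inLanguage": "en"},
]
candidates = [c for c in claims
              if c["inLanguage"] == "en" and matching_relations(c["claimReviewed"])]
```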
2.2 REMOD

The main contribution of this work is REMOD (which stands for Relation Extraction for Modeling of Online Discourse), a novel tool for relation extraction that extracts RDF triples from semi-structured samples of online discourse. To do so, our tool leverages an annotated corpus of past claims and relations. In the example pipeline shown in Figure 2, the various steps of REMOD correspond to the white nodes, which we describe in more detail below. (The figure is labeled with numbers corresponding to the following section numbers, which elaborate on each step of the process.) To facilitate the replication of our results, the source code of REMOD is freely available online at https://github.com/mjsumpter/remod.

2.2.1 Semantic Parsing. Our workflow begins with natural language snippets. To parse these snippets we used FRED, a machine reading tool based on Discourse Representation Theory and linguistic frames [17], described by its authors as "semantic middleware". FRED is an NLP tool that combines frame detection, type induction, named-entity recognition, semantic parsing, and ontology alignment, all in a single tool, and its authors provide a RESTful API to access it. When provided with a text string as input, it returns a Resource Description Framework (RDF) graphlet of the semantic parse tree of the input. (In practice, FRED produces DAGs instead of trees due to entity linking to external ontologies, hence our referring to them as 'graphlets'.) An example of these RDF graphlets is shown in Figure 1 for the ClaimReview snippet of a known misinformation claim [26].
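In practice this step amounts to one web-service call per snippet. The sketch below shows one way to do it; the endpoint, parameter names, and authorization scheme are our assumptions about the public FRED service and should be checked against the current FRED documentation:

```python
import requests

# Assumed endpoint and auth scheme for the public FRED service; an API key
# must be requested from the FRED maintainers.
FRED_ENDPOINT = "http://wit.istc.cnr.it/stlab-tools/fred"
API_KEY = "..."  # obtained from the FRED maintainers

def parse_snippet(text):
    """Return FRED's RDF graphlet for `text`, serialized as Turtle."""
    resp = requests.get(
        FRED_ENDPOINT,
        params={"text": text},
        headers={"Accept": "text/turtle",
                 "Authorization": f"Bearer {API_KEY}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.text

turtle = parse_snippet("Tej Pratap Yadav receives a doctorate degree "
                       "from Takshsila University in Bihar")
```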
2.2.2 Corpus Graph Composition. In a realistic environment, many claims expressing different relations will exist in the same corpus. To mimic this environment, we composed a single 'corpus' graph out of every FRED RDF graphlet generated from the corpus snippets. For named entities, FRED defaults to generating nodes in its own namespace (e.g. fred:Doctorate); if it finds that the same entity is present in an existing ontology, it links to that ontology (e.g. dbpedia:Doctorate). Since these equivalent entities are redundant, we contracted the two nodes into a single vertex, using the URI from the linked ontology (i.e. DBpedia in this example) as its new URI. The corpus graph was then created by stitching together all the contracted RDF graphlets: if two graphlets share one or more nodes (i.e. two or more nodes have the same URI), we consider the union of the two graphlets and contract any pair of such nodes into a single node. This new node is incident to the union of all incident edges in the two original graphlets. An example is shown in Figure 4. The resulting corpus graph consists of 212,976 nodes and 832,367 edges.

Figure 4: A visualization of how two separate RDF graphlets were stitched together along identical nodes.
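The sketch below illustrates this stitching with networkx, under two assumptions: each graphlet has been saved as a Turtle file, and FRED expresses its entity links with owl:sameAs. Because nodes are keyed by URI, composing the relabeled graphlets automatically contracts nodes that share a URI, as in Figure 4:

```python
import glob
import networkx as nx
import rdflib
from rdflib.namespace import OWL

def load_graphlet(path):
    """Load one FRED graphlet, contracting fred:-namespace entities into
    their linked-ontology equivalents (e.g. fred:Doctorate -> dbpedia:Doctorate)."""
    rdf = rdflib.Graph()
    rdf.parse(path, format="turtle")
    # Assumed: FRED marks equivalent entities with owl:sameAs.
    alias = {str(s): str(o) for s, _, o in rdf.triples((None, OWL.sameAs, None))}
    g = nx.DiGraph()
    for s, p, o in rdf:
        if p == OWL.sameAs:
            continue  # already handled via the contraction mapping
        u, v = alias.get(str(s), str(s)), alias.get(str(o), str(o))
        g.add_edge(u, v, predicate=str(p))
    return g

# compose_all merges nodes with identical URIs, taking the union of their
# incident edges -- the stitching operation of Figure 4.
corpus_graph = nx.compose_all(
    [load_graphlet(f) for f in glob.glob("graphlets/*.ttl")])
```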
2.2.3 Node Embedding. The corpus graph is effectively a combined semantic parse tree of the selected snippets from the corpus. To better exploit this structure in machine learning tasks, we generated node embeddings using the Node2Vec algorithm [20]. Node2Vec generates sets of random walks from each node, which are then substituted in place of natural language sentences as input to the Word2Vec model. Two parameters influence the nature of the embeddings: the return parameter p, which for p > 1 makes the walk less likely to immediately return to the node it just left, and the in–out parameter q, which for q > 1 biases the walk toward staying close to its starting neighborhood (and for q < 1 toward exploring nodes farther away). We performed a grid search over the p and q parameters (see Section 3.2), and determined the best choice for these parameters to be p = 2 and q = 3; this configuration captures what the authors of Node2Vec call the 'global' topological structure of the graph. The other parameters of Node2Vec were chosen as follows: the dimension of the vector space was set to 256; the number of walks to 200; the walk length to 200; and, finally, the context window to 50.
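A sketch of this step using the community node2vec package follows; the paper specifies the algorithm and its hyperparameters, not a particular implementation, so the package choice is ours:

```python
from node2vec import Node2Vec  # pip install node2vec

# Hyperparameters as reported above; corpus_graph is the stitched graph
# from Section 2.2.2 (undirected and unweighted in the best configuration).
n2v = Node2Vec(corpus_graph,
               dimensions=256,   # embedding size
               num_walks=200,    # walks started from each node
               walk_length=200,  # steps per walk
               p=2, q=3,         # return and in-out parameters
               workers=4)
model = n2v.fit(window=50, min_count=1)  # walks are fed to Word2Vec
vector = model.wv["http://dbpedia.org/resource/Doctorate"]
```

The fitted model exposes one 256-dimensional vector per URI, which is all the path-traversal step below needs.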
2.2.4 Path Traversal for Finding Relations. Our approach is inspired by the well-known idea that finding paths over structured knowledge representations can help in learning new concepts [41]. More recently, Bunescu and Mooney [9] confirmed the intuitive conclusion that the shortest path between entities in a dependency tree captures the significant information contained between them. We therefore sought to develop a classifier that could distinguish between the shortest paths of different semantic relationships. To do so, for each snippet in the corpus, the subject and object were retrieved, along with the original (i.e., non-stitched) RDF graphlet of that specific snippet. The nodes corresponding to the subject and object were identified in the RDF graphlet, and with these terminal nodes identified, the shortest path between them in the original RDF graphlet was calculated (Figure 1). Finally, we generated a final embedding by averaging the node vectors along the path:

\frac{1}{n} \sum_{i=1}^{n} \vec{v}_i

where v_1, ..., v_n is a path and \vec{v} \in \mathbb{R}^d is the vector associated with v \in V. This resulted in a final vector representing the aggregated sequence of nodes along the shortest path between subject and object.
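Putting the pieces together, the per-snippet feature extraction can be sketched as follows, assuming model is the fitted Node2Vec model from the previous step and that the subject and object URIs come from the GREC addenda:

```python
import numpy as np
import networkx as nx

def path_vector(graphlet, subject_uri, object_uri, model):
    """Average the embeddings of the nodes on the shortest undirected path
    between subject and object in the snippet's own (non-stitched) graphlet."""
    path = nx.shortest_path(graphlet.to_undirected(),
                            source=subject_uri, target=object_uri)
    return np.mean([model.wv[node] for node in path], axis=0)  # (1/n) sum v_i
```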
This process resulted in a 256-dimensional vector for each snippet in the corpus. All results shown in the next section were obtained from these vectors. We projected the vectors into a lower-dimensional space using t-SNE; the visualization is shown in Figure 5, where each color corresponds to a different relation. The projection reveals a good separation of the vectors based on the relation they represent.

Figure 5: The shortest path vectors of GREC relations projected into 2D using t-SNE. Each color represents a different semantic relation, with a sixth color to mark snippets for which a majority of annotators voted 'No (relation)'.

2.2.5 Relation Classification. We trained several classification models on the resulting set of shortest-path vectors. The selected classifiers were Logistic Regression, k-NN, SVM, Random Forest, Decision Tree, and a wide neural network. Samples that the annotators rated as not containing the specified relation were removed, and the dataset was then balanced to the lowest-frequency class (Education, N = 598 samples). Readers will note this is a decrease from the 806 reported in Table 1: FRED was not always accurate at identifying entities and occasionally returned corrupted RDF graphs, resulting in a small loss of data. To effectively compare different classifiers, training was done using a 64%/16%/20% training/validation/testing split. This resulted in a final training dataset of 1,913 samples (5 classes, N ≈ 382 samples/class), with a validation set of 479 samples and an additional 598 samples held out for testing. The 28 selected ClaimReview claims were kept as an additional test set, which is elaborated on in Section 2.2.6.
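The evaluation protocol can be sketched with scikit-learn as below. Our reading of the split is 20% held out for testing, then 20% of the remainder for validation, which yields the 64/16/20 proportions; the synthetic X and y merely stand in for the path vectors and relation labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2990, 256))    # stand-in for the 256-d path vectors
y = rng.integers(0, 5, size=2990)   # stand-in for the 5 relation labels

# 20% test, then 20% of the remainder as validation -> 64/16/20 overall.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.20, stratify=y_rest, random_state=0)

for clf in (LogisticRegression(max_iter=1000), KNeighborsClassifier(),
            SVC(probability=True), RandomForestClassifier(),
            DecisionTreeClassifier()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_val, y_val))
```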
2.2.6 Fact-Checking. To demonstrate the usefulness of our method, we show that REMOD can be integrated as the first step of a fact-checking pipeline that uses existing, off-the-shelf tools to verify online discourse claims annotated using the ClaimReview standard. To perform fact-checking, we rely on the work of Shiralkar et al. [48], who provide open-source implementations of several fact-checking algorithms (https://github.com/shiralkarprashant/knowledgestream). These algorithms can be used to assess the truthfulness of a statement, but of course any tool that takes an RDF triple as input could be used as well. To extract relations from ClaimReview snippets, we used the deep neural network classifier, the most successful classifier from the prior step, and fed the extracted triples into the fact-checker.

Of course, when integrating two distinct tools one has to make sure that errors originating in the first tool do not affect the performance of the second. Therefore, to avoid cascading errors we removed claims affected by two types of error. First, we removed any claim whose relation was misclassified, to avoid feeding inaccurate inputs into the fact-checker. Second, FRED is not always able to link both the subject and object entities to DBpedia, which is a requirement for using the fact-checking algorithms of Shiralkar et al. [48]; thus we also removed claims for which the subject or the object was not linked to the DBpedia ontology. Of the original 28 claims, this filtering left 13 ClaimReview claims for use in our evaluation.

Additionally, we manually checked whether the overall claim reduces to the extracted triple (in the sense that verifying the triple also verifies the overall claim). This distinction is important since it allows us to gauge the ability of our system to check entire claims automatically, in a purely end-to-end fashion. Finally, the remaining claims were passed as input to three fact-checking algorithms: Knowledge Stream, Knowledge Linker, and Relational Knowledge Linker [48].

As a baseline, we trained a Doc2Vec model [31] on the entirety of the ClaimReview corpus, and used this model to fact-check statements by matching them with other, similar claims. In particular, given an input claim, to produce a truth score with the baseline model we ranked all claims in the ClaimReview corpus by their similarity to it and averaged the truth scores of the top k most similar matching claims. We removed fact-checking organizations that used scaleless fact-check verdicts (i.e. factcheck.org); for those that had scales, we assigned truth scores to every claim, setting "False" to a baseline of 0, unless a scale explicitly stated a different baseline (i.e. PolitiFact ranks "Pants on Fire" lower than "False").
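A sketch of this baseline with gensim is shown below; the toy corpus of (claim text, truth score) pairs and the hyperparameters are illustrative only:

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [  # (claim text, truth score in [0, 1]); toy stand-ins
    ("tej pratap yadav received a doctorate from takshsila university", 0.0),
    ("penny wong was born in malaysia", 1.0),
]
docs = [TaggedDocument(words=text.split(), tags=[i])
        for i, (text, _) in enumerate(corpus)]
model = Doc2Vec(docs, vector_size=300, min_count=1, epochs=40)

def baseline_score(claim, k=5):
    """Average the truth scores of the k most similar catalogued claims."""
    vec = model.infer_vector(claim.lower().split())
    top = model.dv.most_similar([vec], topn=min(k, len(corpus)))
    return float(np.mean([corpus[tag][1] for tag, _ in top]))
```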
3 RESULTS

3.1 Graph Representation

The corpus graph is composed of dependency trees, and so it is naturally a directed graph; edges are also all weighted equally. This design has a strong influence on path traversal, since directed edges reduce the number of available paths, and the cost of taking an edge (or its absence) influences the choice of one path over another. For completeness, we considered all four combinations of taking either a directed or an undirected graph, and of having edge weights or not. Let v_i, v_j \in V represent two nodes in the dependency graph that are incident on the same edge. The weight w_{ij} between them is the angular distance between the respective node embeddings:

w_{ij} = \frac{1}{\pi} \arccos\left( \frac{\vec{v}_i \cdot \vec{v}_j}{\lVert \vec{v}_i \rVert \, \lVert \vec{v}_j \rVert} \right)

where \vec{v} is the vector associated with v \in V.
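In code, this weight is a short function (the clip guards against floating-point round-off pushing the cosine just outside [-1, 1]):

```python
import numpy as np

def angular_distance(v_i, v_j):
    """Edge weight w_ij: the angular distance between two node embeddings."""
    cos = v_i @ v_j / (np.linalg.norm(v_i) * np.linalg.norm(v_j))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi)
```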
Table 3: AUC of the wide DNN on the relation classification task, using different types of graph to represent the corpus graph.

             Unweighted   Weighted
Undirected   0.976        0.964
Directed     0.966        0.967

Table 3 shows that the undirected, unweighted graph yields the best classification results, which prompts two observations. The first is that directed edges reduce the number of available pathways connecting two nodes. Second, and perhaps a bit surprisingly, we observe that the unweighted network performs better than the weighted one. Because the node embeddings were the same in the two variants, the final feature vector used for relation classification would differ only if a different shortest path was found. This could happen if edges that are more relevant for discriminating the relation were assigned large weights, compared to other, less relevant edges.

3.2 Classification for Relation Extraction

The results of the relation classification task are shown in Table 4. The outcome of these tests reveals that the node embeddings do contain information regarding the semantic nature of the GREC relations; however, the classes are not neatly separable by decision planes. It is notable that the models we tested are often more successful in precision than in recall. This suggests that a more complex model, such as a deep neural network (DNN), is necessary to identify the less characteristic samples of a relation. To improve these results, we performed a grid search over the Node2Vec p and q parameters (with values of 0.25, 0.5, 1, 2, 3, and 4). The best overall results came from a 'global' configuration, using p = 2 and q = 3, which achieved an AUC of 0.976 on the test set. To evaluate our method, as a baseline we generated 300-dimensional vectors for each snippet from a Word2Vec model pre-trained on Wikipedia [66], the same source from which the GREC snippets were drawn, and thus a source of training data for the model. These embeddings were then used as features to train a DNN and a logistic regression model for relation extraction. REMOD shows a marked improvement over both, indicating an effective approach to relation extraction.

Table 4: Results of the relation classification task using different ML models, on an unweighted, undirected corpus graph, as compared to training with Word2Vec embeddings.

Model                   Precision   Recall   F1     AUC
Decision Tree           0.64        0.64     0.64   0.773
Random Forest           0.81        0.67     0.61   0.793
k-NN                    0.78        0.74     0.74   0.841
SVM                     0.81        0.77     0.77   0.855
Log. Regr.              0.80        0.71     0.71   0.827
Wide DNN                0.85        0.85     0.85   0.976
Word2Vec + Log. Regr.   0.66        0.47     0.44   0.658
Word2Vec + Wide DNN     0.61        0.63     0.61   0.883

3.3 Extraction of ClaimReview Relations

Table 7 in the appendix shows the claims selected from the ClaimReview corpus, together with the relation they contain ("Actual"), the relation predicted by REMOD ("Predicted"), the truth rating as determined by a fact-checker ("Rating"), and whether verifying the relation is equivalent to verifying the claim ("Claim ≡ Triple"). The AUC of the predicted relations is 0.958. Inspecting the misclassified samples, we see that REMOD made mistakes between similar relations (e.g. place of birth and date of birth), which often occur in similar sentences.

3.4 Fact-Checking

We next test the integration with fact-checking algorithms. In particular, we use the fact-checker for two similar, but conceptually distinct, tasks: 1) fact-checking an entire claim (fact-checking), and 2) identifying evidence in support of or against a claim (fact verification). For example, for claim #1 (see Table 7), Penny Wong was indeed born in Malaysia, even though the assertion that she is ineligible for election to the Australian parliament is false. Thus, in this case the extracted triple is only additional evidence, and is not able by itself to capture the entire claim. We manually fact-checked all the extracted relations and compared their truth ratings with the ones provided by the human fact-checkers for the whole claims. Table 7 lists this information under the column "Claim ≡ Triple", which is true (indicated by a checkmark) when the extracted relation summarizes the whole claim (e.g. claim #3). This distinction is important: as mentioned before, although our relation extraction pipeline is capable of predicting a relation for all the entries in Table 7, not all correctly predicted triples can be fed to the fact-checking algorithms, due to incomplete entity linking. For the task of identifying supporting evidence, we find a total of 13 ClaimReview claims that are amenable to fact-checking. For the task of checking an entire claim, this number is further reduced to 7 claims.

3.4.1 Fact Verification. Table 5 shows the results of verifying individual pieces of evidence in support of or against any of the 13 ClaimReview claims identified by REMOD, using each of the three algorithms for fact-checking RDF triples. Relational Knowledge Linker and Knowledge Stream were the best performers. Note that since our baseline is intended to emulate a true fact-checking task, we do not run it here: its similarity is based on the whole claim, and thus it would not be a meaningful comparison with our method, which focuses only on a specific relation within a larger claim.

Table 5: The performance of the fact-checking algorithms on predicting the validity of the relations.

Method                        AUC
Knowledge Linker              0.636
Relational Knowledge Linker   0.773
Knowledge Stream              0.773

3.4.2 Fact-Checking. We test here the subset of claims for which checking the triple is equivalent to checking the entire claim. In this case, REMOD yields 7 claims that can be used as inputs to the fact-checking algorithms. Table 6 shows the results on these 7 ClaimReview claims for the three fact-checking algorithms, along with the baseline. Here, the baseline emulates fact-checking by claim matching.

Since we are using claim matching to perform fact-checking, we consider three different scenarios to make the task more realistic. In particular, we match the claim against three different corpora, in increasing degree of realism: 1) the full ClaimReview corpus ('All'); 2) all ClaimReview entries by PolitiFact only ('PolitiFact'); and 3) all ClaimReview entries from the same fact-checker as the claim of interest ('Same'). The first case ('All') is meant to give an upper bound on the performance of claim matching, but it is not realistic, since it makes use of knowledge of the truth scores of potentially future claims, as well as of ratings of the same claim by different organizations. The second case ('PolitiFact') partially addresses the latter unrealistic assumption by using only claims from a single source: it does not have access to truth scores by different organizations for the same claim, but it does still have access to future information. Both 1) and 2) can thus be regarded as gold-standard measures of performance. The last case ('Same') is the most realistic, since it emulates the scenario of a fact-checker who may be checking a claim for the first time, and who thus has access neither to claims fact-checked afterwards nor to ratings of the same claim by different fact-checkers. In all three cases, the claim being matched was removed from the corpus, to prevent trivially perfect predictions. Relational Knowledge Linker and Knowledge Stream are still the best performing of the fact-checking algorithms, and manage to reach, if not exceed, the performance of the gold standard (Claim Matching–All, or –PolitiFact).

Table 6: Results of the fact-checking algorithms. (CM = Claim Matching; KL = Knowledge Linker; Rel. KL = Relational Knowledge Linker; KS = Knowledge Stream. The last three methods do not depend on k.)

Method            k=1     k=3     k=5     k=10
CM (All)          0.417   0.625   0.500   0.625
CM (PolitiFact)   0.666   0.625   0.833   0.750
CM (Same)         0.500   0.583   0.250   0.250
KL                0.500
Rel. KL           0.833
KS                0.833
4 RELATED WORK

4.1 Relation Extraction and Classification

Relation extraction and classification is the task of extracting semantic relationships between two entities in natural language text and matching them to semantically equivalent or similar relations. This task is at the core of information extraction and knowledge base construction, as it effectively reduces statements to their core meaning; this is typically modeled as a semantic triple, (s, p, o), where two entities (s and o) are connected by a predicate, p. There are several distinct nuances and open challenges to effective relation extraction. Identifying attributes that discriminate between two objects provides a descriptive explanation to supplement word embeddings (i.e. lime is separated from lemon by the attribute 'green'), and is currently most successful with SVM classifiers [27]. Multi-way classification attempts to distinguish the direction of one-way relations (the sonOf relation is not bidirectional between two people), and has seen similar levels of success from solutions built with language models [3], convolutional neural networks [58], and recurrent neural networks [64]. Distantly supervised relation extraction is a two-way approach whereby semantic triples are generated from natural language by aligning them with information already present in knowledge graphs [65]. Relation extraction performance is often assessed on the TACRED dataset [68], a large-scale dataset of 106,264 examples used in the annual TAC Knowledge Base Population challenges, covering 41 relation types. The most successful solution to date is from Baldini Soares et al. [3], who achieved a micro-averaged F1 score of 71.5%. Despite the increasing availability of state-of-the-art machine learning architectures, relation extraction continues to be an open problem with much room for improvement.

4.2 Knowledge Base Augmentation

Knowledge base augmentation is a task that aims to add new relations to existing knowledge bases in an automated fashion [61]. This task takes one of two approaches. The first infers new relations from existing triples in a knowledge base [8, 53]; this is essentially a link-prediction task that builds upon patterns found between entities in knowledge bases. The second approach mines data found on the web for knowledge discovery [12, 67]. This approach relies on redundant relations found among the selected source materials, which may be as restrictive as Wikipedia articles [39] or as extensive as the entire web [12]. Due to the potential for error based on the sources, Dong et al. [13] developed a Knowledge-Based Trust (KBT) score for measuring the trustworthiness of selected sources. Yu et al. [67] expand upon this by combining KBT scores with other entity/relation-based features to assign a unique score to each individual triple.

4.3 Detecting Information Disorder

Information disorder is a catch-all term for the many kinds of unreliable information that one may encounter online or in the real world [59], including disinformation, misinformation, fake news, rumor, spam, etc. Information disorder can also take on several modalities, including text, video, and images. The many varieties of information disorder make it challenging to develop any one approach for detection. This leads to multi-modal approaches based on three main signals: the content of the information, the users who shared it, and the patterns of information dissemination on a network. Bad content is often generated by bots; this suggests that features captured from user profiles can be useful for distinguishing bots from humans [50]. Content detection is dependent on the medium: lexical features, sentiment, and readability metrics are used for text, while neural visual features are extracted from other content [40, 42, 43]. Network detection methods model social media networks as propagation networks, measuring the flow of information [49]. There has also been promising work on crowd-sourcing the task by allowing users to flag questionable content [55]. This approach, while likely to remain imperfect, provides the important supplement of human supervision to all of the aforementioned tasks.

4.4 ClaimBuster

Hassan et al. [24] released the first-ever end-to-end fact-checking system in 2017, called ClaimBuster. ClaimBuster is composed of several distinct components that work in sequence to accomplish the task of automated fact-checking. The first, the claim monitor, continuously monitors text published as broadcast-television closed captions, on Twitter accounts, and as content on a selected set of websites. This text is passed to the claim spotter, which scores every sentence by its likelihood of containing a claim that is worthy of fact-checking — subjective and opinionated sentences receive a low score in this task. Once it has identified a set of check-worthy sentences, it uses a claim matcher to search through fact-check repositories and return existing fact-checks that match the selected sentences. The claim checker generates questions from the selected sentences and uses those questions to query Wolfram Alpha and Google to fetch supporting or debunking evidence as a supplement to the findings of the claim matcher. Finally, the fact-check reporter builds a report from all of the gathered evidence, summarizing the findings of the ClaimBuster pipeline, and disseminates these findings through social media.

4.5 Claim Verification

Claim verification is arguably the key task of fact-checking — to check a claim against existing evidence. It is related to the matching and checking subtasks of ClaimBuster, in that it is the task of checking whether a natural language sentence selected as evidence supports or debunks the correlated claim. To build computational solutions to this task, datasets containing claims and their corresponding evidence are needed. There have been some datasets [2, 15, 56] relevant to this task; however, they are either not machine-readable or lacking in size. Thorne et al. [54] recognized this gap, and have since released a large-scale dataset, called FEVER, to address these concerns. This dataset contains 185,445 claims with corresponding evidence, manually classified as SUPPORTED, REFUTED, or NOTENOUGHINFO. It has been followed up with annual workshops that encourage participants to improve upon both the dataset and the claim verification task. The CLEF CheckThat! [4] series of workshops and conferences also seeks to bring researchers together to improve claim verification, along with identifying and extracting check-worthy claims.

4.6 Other Fact-Checking Methods

Besides claim-matching approaches, there are a handful of existing algorithms for fact-checking, mostly based on exploiting the content or characteristics of existing knowledge bases. Embedding approaches, such as TransE [5], seek to generate vector embeddings of knowledge bases, a task which is conceptually related to our approach. By generating these embeddings, they can perform link prediction based on structural patterns of (s, p, o) triples. In terms of a knowledge base, this amounts to adding new facts without any need for source material. For fact-checking, this approach can be used to test whether a triple extracted from a claim is a predicted link in the knowledge base; the pitfall of these methods, as with all embedding techniques, is that they lack both interpretability and scalability. Other algorithms similarly consider paths within knowledge bases, but seek to address the interpretability problem. PRA [28], SFE [18], PredPath [47], and AMIE [16] all take the approach of mining possible pathways between two entities within a knowledge base. From these mined pathways, they generate sets of features to be used in supervised learning models for link prediction. These have shown promise in predicting the validity of a claim, but they too suffer from limited scalability: knowledge bases that contain enough relevant information to be useful are very large, and path mining and feature generation become necessarily time-consuming. There are also a few rule-based methods for fact-checking [38], which rely on the logical constraints of a knowledge graph and are naturally explainable. General, large-scale knowledge graphs do not have the logical constraints from which to build such rules, leaving this approach to fact-checking an open problem [25].

4.7 Threats to Validity

No method is perfect, and our approach suffers from a number of limitations, which we briefly describe here. The main limitation of our pipeline lies in its discrete structure, which is prone to cascading failures. Our main NLP tool, FRED, is a powerhouse of a tool that performs many important NLP tasks at once; however, it was not always completely accurate, and many of our samples were returned as corrupted RDF graphs. Additionally, it was not always able to link the nodes to DBpedia, which limited the number of triples we could feed into our fact-checking algorithms. Cascading failures are common to many machine reading pipelines [35]; one way to overcome this issue would be to rely on a joint inference approach [52]. Another limitation of our methodology has to do with our use of distributed representations. For the task of fact-checking, the corpus is always growing; Node2Vec cannot generalize to unseen data and requires retraining. An inductive learning framework such as GraphSAGE [23], which can generate embeddings for unseen nodes, would therefore be a more practical algorithm for extending this pipeline. For the classification task, our machine learning models were relatively simple, and optimizing both the parameters and the architecture of the neural network would likely increase the accuracy and effectiveness of this method. Finally, a full evaluation of our method against transformer language models on relevant relation classification tasks [62] is left as future work.

5 DISCUSSION

In this paper, we have presented a novel relation extraction algorithm and previewed its application to classifying relations present in online discourse and automatically fact-checking them against the information present in a general knowledge graph. We developed a pipeline to facilitate the linkage of these two tasks. Our relation classification method leverages graph representation learning on the shortest paths between entities in semantic dependency trees; it was shown to be comparable to state-of-the-art methods on a corpus of labeled relations (AUC = 97.6%). This classifier was then used to reduce claims from online discourse to semantic triples, with an AUC of 95.8%; these triples were used as input to fact-checking algorithms to predict the accuracy of the claims. We achieved an AUC of 83% on our selected claims, which is at least comparable to claim matching, but without the need for the corpus of existing claims that claim matching relies on.

Our relation extraction method is a promising approach to distinguishing the relations present in large online discourse corpora; scaling up this algorithm could provide an outlet for modeling online discourse within an established ontology. Additionally, our pipeline may serve as a proof of concept for future research into automated fact-checking. While it is a challenge to model all possible relations in a generalist ontology like DBpedia, this pipeline could form the basis of tools for reducing the time needed to research an online discourse claim.

Acknowledgements

The authors would like to thank Google for making publicly available both the GREC dataset and the Fact Check Explorer tool, and Alexios Mantzarlis for feedback on the manuscript.

REFERENCES

[1] Hunt Allcott and Matthew Gentzkow. 2017. Social media and fake news in the 2016 election. Journal of Economic Perspectives 31, 2 (2017), 211–36.
In Proceedings of the 31st International Conference on https://doi.org/10.18653/v1/P19-1279 Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17). [4] Alberto Barron-Cedeno, Tamer Elsayed, Preslav Nakov, Giovanni Da San Martino, Curran Associates Inc., Red Hook, NY, USA, 1025–1035. Maram Hasanain, Reem Suwaileh, and Fatima Haouari. 2020. CheckThat! at [24] Naeemul Hassan, Gensheng Zhang, Fatma Arslan, Josue Caraballo, Damian CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Jimenez, Siddhant Gawsane, Shohedul Hasan, Minumol Joseph, Aaditya Kulkarni, Social Media. arXiv:2001.08546 [cs.CL] Anil Kumar Nayak, Vikas Sable, Chengkai Li, and Mark Tremayne. 2017. Claim [5] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Ok- buster: The firstever endtoend factchecking system. Proceedings of the VLDB sana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relational Endowment 10, 12 (2017), 1945–1948. https://doi.org/10.14778/3137765.3137815 Data. In Advances in Neural Information Processing Systems 26, C. J. C. Burges, [25] Viet Phi Huynh and Paolo Papotti. 2019. A benchmark for fact checking algo- L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.). Curran rithms built on knowledge bases. In International Conference on Information and Associates, Inc., Red Hook, NY, United States, 2787–2795. Knowledge Management, Proceedings, Vol. 10. Association for Computing Machin- [6] B. Borel. 2016. The Chicago Guide to Fact-Checking. University of Chicago Press, ery, New York, NY, USA, 689–698. https://doi.org/10.1145/3357384.3358036 Chicago, IL, USA. [26] Krutika Kale. 2018. No, Tej Pratap Yadav Did Not Receive A Doctorate From [7] Alexandre Bovet and Hernán A. Makse. 2019. Influence of fake news in Twitter Takshashila University. Online at https://www.boomlive.in/no-lalus-son-tej- during the 2016 US presidential election. Nature Communications 10, 1 (Jan. 2019), pratap-did-not-receive-a-doctorate-from-takshashila-university/. Last accessed 7. https://doi.org/10.1038/s41467-018-07761-2 2021-02-21. [8] Lorenz Bühmann and Jens Lehmann. 2013. Pattern Based Knowledge Base [27] Sunny Lai, Kwong Sak Leung, and Yee Leung. 2018. SUNNYNLP at SemEval-2018 Enrichment. In The Semantic Web – ISWC 2013, Harith Alani, Lalana Kagal, Achille Task 10: A Support-Vector-Machine-Based Method for Detecting Semantic Differ- Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha ence using Taxonomy and Word Embedding Features. In Proceedings of The 12th Noy, Chris Welty, and Krzysztof Janowicz (Eds.). Springer Berlin Heidelberg, International Workshop on Semantic Evaluation. Association for Computational Berlin, Heidelberg, 33–48. Linguistics, New Orleans, Louisiana, 741–746. https://doi.org/10.18653/v1/S18- [9] Razvan C. Bunescu and Raymond J. Mooney. 2005. A Shortest Path Depen- 1118 dency Kernel for Relation Extraction. In Proceedings of the Conference on Human [28] Ni Lao and William W. Cohen. 2010. Relational retrieval using a combination of Language Technology and Empirical Methods in Natural Language Processing path-constrained random walks. Machine Learning 81, 1 (2010), 53–67. https: (Vancouver, British Columbia, Canada) (HLT ’05). Association for Computational //doi.org/10.1007/s10994-010-5205-8 Linguistics, USA, 724–731. https://doi.org/10.3115/1220575.1220666 [29] Ni Lao, Tom Mitchell, and William W. Cohen. 2011. Random Walk Inference [10] Giovanni Luca Ciampaglia. 2018. 
[17] Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea Giovanni Nuzzolese, Francesco Draicchio, and Misael Mongiovì. 2017. Semantic Web Machine Reading with FRED. Semantic Web 8, 6 (2017), 873–893. https://doi.org/10.3233/SW-160240
[18] Matt Gardner and Tom Mitchell. 2015. Efficient and Expressive Knowledge Base Completion Using Subgraph Feature Extraction. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, 1488–1498. https://doi.org/10.18653/v1/D15-1173
[19] Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, and David Lazer. 2019. Fake news on Twitter during the 2016 US presidential election. Science 363, 6425 (2019), 374–378.
[20] Aditya Grover and Jure Leskovec. 2016. Node2vec: Scalable feature learning for networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 855–864. https://doi.org/10.1145/2939672.2939754 arXiv:1607.00653
[21] Andrew M Guess, Brendan Nyhan, and Jason Reifler. 2020. Exposure to untrustworthy websites in the 2016 US election. Nature Human Behaviour 4, 5 (2020), 472–480.
[22] R. V. Guha, Dan Brickley, and Steve Macbeth. 2016. Schema.Org: Evolution of Structured Data on the Web. Commun. ACM 59, 2 (Jan. 2016), 44–51. https://doi.org/10.1145/2844544
[23] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 1025–1035.
[24] Naeemul Hassan, Gensheng Zhang, Fatma Arslan, Josue Caraballo, Damian Jimenez, Siddhant Gawsane, Shohedul Hasan, Minumol Joseph, Aaditya Kulkarni, Anil Kumar Nayak, Vikas Sable, Chengkai Li, and Mark Tremayne. 2017. ClaimBuster: The first-ever end-to-end fact-checking system. Proceedings of the VLDB Endowment 10, 12 (2017), 1945–1948. https://doi.org/10.14778/3137765.3137815
[25] Viet Phi Huynh and Paolo Papotti. 2019. A benchmark for fact checking algorithms built on knowledge bases. In International Conference on Information and Knowledge Management, Proceedings, Vol. 10. Association for Computing Machinery, New York, NY, USA, 689–698. https://doi.org/10.1145/3357384.3358036
[26] Krutika Kale. 2018. No, Tej Pratap Yadav Did Not Receive A Doctorate From Takshashila University. Online at https://www.boomlive.in/no-lalus-son-tej-pratap-did-not-receive-a-doctorate-from-takshashila-university/. Last accessed 2021-02-21.
[27] Sunny Lai, Kwong Sak Leung, and Yee Leung. 2018. SUNNYNLP at SemEval-2018 Task 10: A Support-Vector-Machine-Based Method for Detecting Semantic Difference using Taxonomy and Word Embedding Features. In Proceedings of The 12th International Workshop on Semantic Evaluation. Association for Computational Linguistics, New Orleans, Louisiana, 741–746. https://doi.org/10.18653/v1/S18-1118
[28] Ni Lao and William W. Cohen. 2010. Relational retrieval using a combination of path-constrained random walks. Machine Learning 81, 1 (2010), 53–67. https://doi.org/10.1007/s10994-010-5205-8
[29] Ni Lao, Tom Mitchell, and William W. Cohen. 2011. Random Walk Inference and Learning in A Large Scale Knowledge Base. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Edinburgh, Scotland, UK, 529–539.
[30] David MJ Lazer, Matthew A Baum, Yochai Benkler, Adam J Berinsky, Kelly M Greenhill, Filippo Menczer, Miriam J Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, et al. 2018. The science of fake news. Science 359, 6380 (2018), 1094–1096.
[31] Quoc Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In Proceedings of the 31st International Conference on Machine Learning - Volume 32 (ICML’14). JMLR.org, Beijing, China, II–1188–II–1196.
[32] Stephan Lewandowsky, Ullrich K. H. Ecker, Colleen M. Seifert, Norbert Schwarz, and John Cook. 2012. Misinformation and Its Correction: Continued Influence and Successful Debiasing. Psychological Science in the Public Interest 13, 3 (2012), 106–131. https://doi.org/10.1177/1529100612451018
[33] David Liben-Nowell and Jon Kleinberg. 2007. The Link-Prediction Problem for Social Networks. Journal of the American Society for Information Science and Technology 58, 7 (2007), 1019–1031.
[34] Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Rui Fang, and Sameena Shah. 2015. Real-Time Rumor Debunking on Twitter. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (Melbourne, Australia) (CIKM ’15). Association for Computing Machinery, New York, NY, USA, 1867–1870. https://doi.org/10.1145/2806416.2806651
[35] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, B. Yang, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling. 2018. Never-Ending Learning. Commun. ACM 61, 5 (April 2018), 103–115. https://doi.org/10.1145/3191513
[36] Brendan Nyhan and Jason Reifler. 2015. The Effect of Fact-Checking on Elites: A Field Experiment on U.S. State Legislators. American Journal of Political Science 59, 3 (2015), 628–640. https://doi.org/10.1111/ajps.12162
[37] Dave Orr. 2013. 50,000 Lessons on How to Read: a Relation Extraction Corpus. https://ai.googleblog.com/2013/04/50000-lessons-on-how-to-read-relation.html
[38] Stefano Ortona, Venkata Vamsikrishna Meduri, and Paolo Papotti. 2018. Robust discovery of positive and negative rules in knowledge bases. In 2018 IEEE 34th International Conference on Data Engineering (ICDE) (Paris, France). IEEE, Piscataway, NJ, USA, 1168–1179.
[39] Heiko Paulheim and Simone Paolo Ponzetto. 2013. Extending DBpedia with Wikipedia List Pages. In Proceedings of the 2013 International Conference on NLP & DBpedia - Volume 1064 (Sydney, Australia) (NLP-DBPEDIA’13). CEUR-WS.org, Aachen, DEU, 85–90.
[40] Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. 2017. Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Copenhagen, Denmark, 2931–2937. https://doi.org/10.18653/v1/D17-1317
[41] Bradley L. Richards and Raymond J. Mooney. 1992. Learning Relations by Pathfinding. In Proceedings of the Tenth National Conference on Artificial Intelligence (San Jose, California) (AAAI’92). AAAI Press, Palo Alto, CA, USA, 50–55.
[42] Victoria L. Rubin, Yimin Chen, and Nadia K. Conroy. 2015. Deception detection for news: Three types of fakes. Proceedings of the Association for Information Science and Technology 52, 1 (2015), 1–4. https://doi.org/10.1002/pra2.2015.145052010083
2015. Deception detection for [61] Gerhard Weikum and Martin Theobald. 2010. From Information to Knowledge: news: Three types of fakes. Proceedings of the Association for Information Science Harvesting Entities and Relationships from Web Sources. In Proceedings of the and Technology 52, 1 (2015), 1–4. https://doi.org/10.1002/pra2.2015.145052010083 Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Data- [43] Victoria L. Rubin and Tatiana Vashchilko. 2012. Identification of Truth and base Systems (Indianapolis, Indiana, USA) (PODS ’10). Association for Computing Deception in Text: Application of Vector Space Model to Rhetorical Structure Machinery, New York, NY, USA, 65–76. https://doi.org/10.1145/1807085.1807097 Theory. In Proceedings of the Workshop on Computational Approaches to Deception [62] Shanchan Wu and Yifan He. 2019. Enriching pre-trained language model with Detection. Association for Computational Linguistics, Avignon, France, 97–106. entity information for relation classification. In Proceedings of the 28th ACM [44] schema.org. 2020. ClaimReview schema. https://schema.org/ClaimReview International Conference on Information and Knowledge Management. 2361–2364. [45] Chengcheng Shao, Giovanni Luca Ciampaglia, Alessandro Flammini, and Fil- [63] You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. 2014. Toward ippo Menczer. 2016. Hoaxy: A Platform for Tracking Online Misinformation. Computational Fact-Checking. Proc. VLDB Endow. 7, 7 (March 2014), 589–600. In Proceedings of the 25th International Conference Companion on World Wide https://doi.org/10.14778/2732286.2732295 Web (Montréal, Québec, Canada) (WWW ’16 Companion). International World [64] Minguang Xiao and Cong Liu. 2016. Semantic Relation Classification via Hierar- Wide Web Conferences Steering Committee, Republic and Canton of Geneva, chical Recurrent Neural Network with Attention. In Proceedings of COLING 2016, Switzerland, 745–750. https://doi.org/10.1145/2872518.2890098 the 26th International Conference on Computational Linguistics: Technical Papers. [46] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kai-Cheng Yang, The COLING 2016 Organizing Committee, Osaka, Japan, 1254–1263. Alessandro Flammini, and Filippo Menczer. 2018. The spread of low-credibility [65] Peng Xu and Denilson Barbosa. 2019. Connecting Language and Knowledge with content by social bots. Nature communications 9, 1 (2018), 1–9. Heterogeneous Representations for Neural Relation Extraction. In Proceedings [47] Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining for of the 2019 Conference of the North American Chapter of the Association for Com- fact checking in knowledge graphs. Knowledge-Based Systems 104 (Jul 2016), putational Linguistics: Human Language Technologies, Volume 1 (Long and Short 123–133. https://doi.org/10.1016/j.knosys.2016.04.015 arXiv:1510.05911 Papers). Association for Computational Linguistics, Minneapolis, Minnesota, [48] Prashant Shiralkar, Alessandro Flammini, Filippo Menczer, and Giovanni Luca 3201–3206. https://doi.org/10.18653/v1/N19-1323 Ciampaglia. 2017. Finding Streams in Knowledge Graphs to Support Fact Check- [66] Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, and Yoshiyasu Takefuji. 2016. ing. In 2017 IEEE International Conference on Data Mining (ICDM) (New Orleans, Joint Learning of the Embedding of Words and Entities for Named Entity Dis- Louisiana, USA). IEEE, Piscataway, NJ, 859–864. https://doi.org/10.1109/ICDM. ambiguation. 
In Proceedings of The 20th SIGNLL Conference on Computational 2017.105 arXiv:1708.07239 [cs.AI] Extended Version. Natural Language Learning. Association for Computational Linguistics, 209 N. [49] Kai Shu, Deepak Mahudeswaran, Suhang Wang, and Huan Liu. 2020. Hierarchical Eighth Street, Stroudsburg PA 18360, USA, 250–259. propagation networks for fake news detection: Investigation and exploitation. In [67] Ran Yu, Ujwal Gadiraju, Besnik Fetahu, Oliver Lehmberg, Dominique Ritze, and Proceedings of the International AAAI Conference on Web and Social Media, Vol. 14. Stefan DIetze. 2018. KnowMore - Knowledge base augmentation with structured AAAI, Palo Alto, CA, USA, 626–637. web markup. , 159–180 pages. https://doi.org/10.3233/SW-180304 [50] Kai Shu, Suhang Wang, and Huan Liu. 2018. Understanding User Profiles on [68] Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D. Social Media for Fake News Detection. In IEEE 1st Conference on Multimedia Manning. 2017. Position-aware Attention and Supervised Data Improve Slot Information Processing and Retrieval (MIPR 2018). IEEE, Piscataway, NJ, USA, Filling. In Proceedings of the 2017 Conference on Empirical Methods in Natural 430–435. https://doi.org/10.1109/MIPR.2018.00092 Language Processing. Association for Computational Linguistics, Copenhagen, [51] Craig Silverman (Ed.). 2014. Verification Handbook. European Journalism Center, Denmark, 35–45. https://doi.org/10.18653/v1/D17-1004 Maastricht, the Netherlands. [69] Zhao, Zilong, Zhao, Jichang, Sano, Yukie, Levy, Orr, Takayasu, Hideki, Takayasu, [52] Sameer Singh, Sebastian Riedel, Brian Martin, Jiaping Zheng, and Andrew McCal- Misako, Li, Daqing, Wu, Junjie, and Havlin, Shlomo. 2020. Fake news propagates lum. 2013. Joint Inference of Entities, Relations, and Coreference. In Proceedings differently from real news even at early stages of spreading. EPJ Data Sci. 9, 1 of the 2013 Workshop on Automated Knowledge Base Construction (San Francisco, (2020), 7. https://doi.org/10.1140/epjds/s13688-020-00224-z California, USA) (AKBC ’13). Association for Computing Machinery, New York, NY, USA, 1–6. https://doi.org/10.1145/2509558.2509559 [53] Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning With Neural Tensor Networks for Knowledge Base Com- pletion. In Advances in Neural Information Processing Systems, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.), Vol. 26. Curran Associates, Inc., 57 Morehouse Lane, Red Hook, NY, United States, 926–934. https://proceedings.neurips.cc/paper/2013/file/ b337e84de8752b27eda3a12363109e80-Paper.pdf [54] James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. 2018. FEVER: a Large-scale Dataset for Fact Extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, Stroudsburg, PA, USA, 809– 819. https://doi.org/10.18653/v1/N18-1074 [55] Sebastian Tschiatschek, Adish Singla, Manuel Gomez Rodriguez, Arpit Merchant, and Andreas Krause. 2018. Fake News Detection in Social Networks via Crowd Signals. In Companion Proceedings of the The Web Conference 2018 (Lyon, France) (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 517–524. https://doi.org/10.1145/3184558. 3188722 [56] Andreas Vlachos and Sebastian Riedel. 2014. 
A SELECTED CLAIMREVIEW CLAIMS

Table 7: Selected ClaimReview claims, the relation they contain, and the relation predicted by the model. Bold text indicates the entities participating in the relation. The AUC of the relation classification task is 0.958.

ID | Claim | Actual* | Predicted* | Rating | Claim ≡ Triple
1 | Malaysian-born Senator Penny Wong ineligible for Australian parliament | POB | DOB | False |
2 | Donald Trump says President Obama's grandmother in Kenya said he was born in Kenya and she was there and witnessed the birth. | POB | institution | False | ✓
3 | Donald Trump says his father, Fred Trump, was born in a very wonderful place in Germany. | POB | POB | False | ✓
4 | Barack Obama was born in the United States. | POB | POB | True | ✓
5 | Barron Trump was born in March 2006 and Melania wasn't a legal citizen until July 2006. So under this executive order, his own son wouldn't be an American citizen. | DOB | POB | False |
6 | Isabelle Duterte was born on January 26, 2002, which makes her only 15 years old today. | DOB | DOB | False |
7 | Tej Pratap Yadav receives a doctorate degree from Takshsila University in Bihar | education | education | False | ✓
8 | Smriti Irani has a MA degree. | education | institution | False | ✓
9 | Melania Trump lied under oath in 2013 about graduating from college with a bachelor's degree in architecture. | education | institution | False |
10 | Did Michelle Obama recently earn a doctorate degree in law? | education | education | False | ✓
11 | Pravin Gordhan does not have a degree. | education | education | False | ✓
12 | Alexandria Ocasio-Cortez's economics degree recalled. | education | institution | False | ✓
13 | Ilocos Norte Governor Imee Marcos claimed on January 16 that she earned a degree from Princeton University. | education | education | False |
14 | Ilocos Norte Governor Imee Marcos claimed on January 16 that she earned a degree from Princeton University. | institution | institution | False | ✓
15 | Tej Pratap Yadav receives a doctorate degree from Takshsila University in Bihar. | institution | education | False |
16 | Patrick Murphy embellished, according to reports, his University of Miami academic achievement. | institution | institution | True |
17 | Mahmoud Abbas, Ali Khamenei, and Vladimir Putin met each other in the class of 1968 at Patrice Lumumba University in Moscow | institution | institution | False |
18 | Mahmoud Abbas, Ali Khamenei, and Vladimir Putin met each other in the class of 1968 at Patrice Lumumba University in Moscow | institution | institution | False |
19 | Mahmoud Abbas, Ali Khamenei, and Vladimir Putin met each other in the class of 1968 at Patrice Lumumba University in Moscow | institution | institution | False |
20 | Maria Butina is a human rights activist, a student of the American University, and the most relevant is that she is a person who did not work (collaborate) with the Russian state bodies. | institution | institution | False |
21 | Ilocos Norte Governor Imee Marcos graduated cum laude from the University of the Philippines (UP) College of Law. | institution | institution | False |
22 | David Hogg graduated from Redondo Shores High School in 2015. | institution | institution | False | ✓
23 | Sadhvi Pragya Singh Thakur said Manohar Parrikar died of cancer because he allowed the consumption of beef in Goa. | POD | POD | False |
24 | Fox star Tucker Carlson in critical condition (then died) after head on collision driving home in Washington D.C. | POD | POD | False | ✓
25 | Nasser Al Kharafi died in Kuwait. | POD | POD | False | ✓
26 | DCP Amit Sharma passed away in Delhi riots | POD | institution | False | ✓
27 | It is being claimed that Jason Statham was murdered at his home in New York by assailants who broke into his mansion. | POD | POD | False |
28 | Actor Robert Downey Jr. died in a car crash stunt in Hollywood on July 8. | POD | POD | False |

* DOB = Date of Birth, POB = Place of Birth, POD = Place of Death
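To make the appendix concrete, the sketch below shows how a reviewed claim such as row 7 of Table 7 can be paired with the relation predicted by the classifier to yield a candidate triple for downstream fact-checking. This is a minimal illustration in Python, not the paper's released code: the claimReviewed and reviewRating field names follow the schema.org ClaimReview vocabulary, while the RELATION_TO_PREDICATE mapping and the to_triple helper are hypothetical stand-ins, and the DBpedia predicates shown are assumed for illustration.

```python
# Minimal sketch: pairing a ClaimReview item with a predicted relation
# to produce a candidate (subject, predicate, object) triple.
# RELATION_TO_PREDICATE and to_triple are illustrative only; the
# claimReviewed/reviewRating keys follow the schema.org ClaimReview schema.

# Map the relation labels used in Table 7 to knowledge-graph predicates
# (DBpedia-style predicate names assumed here for illustration).
RELATION_TO_PREDICATE = {
    "education": "dbo:education",
    "institution": "dbo:almaMater",
    "POB": "dbo:birthPlace",
    "DOB": "dbo:birthDate",
    "POD": "dbo:deathPlace",
}

# A simplified ClaimReview-style record for row 7 of Table 7.
claim_review = {
    "claimReviewed": ("Tej Pratap Yadav receives a doctorate degree "
                      "from Takshsila University in Bihar"),
    "reviewRating": {"alternateName": "False"},
}

def to_triple(subject: str, relation: str, obj: str) -> tuple:
    """Build a candidate triple from two linked entities and a predicted
    relation label (entity linking is assumed to happen upstream)."""
    return (subject, RELATION_TO_PREDICATE[relation], obj)

# Entities as linked for the row-7 claim (cf. Figure 1).
triple = to_triple("dbpedia:Tej_Pratap_Yadav", "education", "dbpedia:Doctorate")
print(triple)
# -> ('dbpedia:Tej_Pratap_Yadav', 'dbo:education', 'dbpedia:Doctorate')
print(claim_review["reviewRating"]["alternateName"])
# -> 'False'
```

A triple produced this way can then be handed to a knowledge-graph fact-checker, which is what the "Claim ≡ Triple" column records: whether the extracted triple faithfully captures the reviewed claim.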