REMOD: Relation Extraction for Modeling Online Discourse. CEUR-WS Vol-2877, paper 2. PDF: https://ceur-ws.org/Vol-2877/paper2.pdf · DBLP: https://dblp.org/rec/conf/www/SumpterC21
    REMOD: Relation Extraction for Modeling Online Discourse
                         Matthew Sumpter                                                       Giovanni Luca Ciampaglia
                         mjsumpter@usf.edu                                                             glc3@mail.usf.edu
                      University of South Florida                                                  University of South Florida

ABSTRACT
The enormous amount of discourse taking place online poses challenges to the functioning of a civil and informed public sphere. Efforts to standardize online discourse data, such as ClaimReview, are making available a wealth of new data about potentially inaccurate claims, reviewed by third-party fact-checkers. These data could help shed light on the nature of online discourse, the role of political elites in amplifying it, and its implications for the integrity of the online information ecosystem. Unfortunately, the semi-structured nature of much of this data presents significant challenges when it comes to modeling and reasoning about online discourse. A key challenge is relation extraction, the task of determining the semantic relationships between named entities in a claim. Here we develop a novel supervised learning method for relation extraction that combines graph embedding techniques with path traversal on semantic dependency graphs. Our approach is based on the intuitive observation that knowledge of the entities along the path between the subject and object of a triple (e.g., Washington,_D.C. and United_States_of_America) provides useful information that can be leveraged for extracting its semantic relation (i.e., capitalOf). As an example of a potential application of this technique for modeling online discourse, we show that our method can be integrated into a pipeline to reason about potential misinformation claims.

Figure 1: Schematic example of our approach. The RDF graphlet generated by a machine-reading tool (FRED) for the claim “Tej Pratap Yadav receives a doctorate degree from Takshsila University in Bihar” (a known misinformation claim [26]). The shortest undirected path between the source (dbpedia:Tej_Pratap_Yadav) and target (dbpedia:Doctorate) is shown in red. The nodes along the path are highlighted in gray.

CCS CONCEPTS
• Information systems → Web mining; Semantic web description languages; Information extraction.

KEYWORDS
relation extraction, semi-structured data, semantic ontology, claim matching, fact-checking

1    INTRODUCTION
The prevalence of false and inaccurate information in its myriad forms — a persistent and dangerous societal problem — is still a poorly understood phenomenon [1, 7, 30], especially in the context of political communication [21]. Even though strong exposure to so-called “fake news” is limited to the segment of most active news consumers [19], individual claims echoing the false or misleading content shared by these audiences can spread rapidly through social media [57, 69], amplified by bots [46] or other malicious actors [60], who often target elites, like celebrities, pundits, or politicians. From there, false claims rebroadcast by these elites enjoy further dissemination, reaching even wider audiences.

Misinformation has become an emerging focus of computational social scientists seeking to understand and combat it [10, 56]. Network analysis and natural language processing (NLP) provide insight into the community organization and stylistic patterns that are indicative of misinformation, respectively; however, these methods often fail to engage with the ideological content being shared. Online discourse typically takes the form of unorganized and unstructured data, which is a significant limiting factor to performing content analysis. Existing work on semantic ontologies and knowledge base development has proved to be a guiding method in structuring online information. A knowledge base most commonly structures knowledge in the shape of semantic triples; a semantic triple is composed of two entities (e.g., a person, place, or thing) and a predicate relation between them. An example of a semantic triple is ⟨Washington,_D.C., capitalOf, United_States_of_America⟩. This structure allows concepts to be reduced to machine-readable data which can be compiled into traversable (and understandable) networks of information. The result is a data structure that can be used to provide quantitative analysis of online discourse.

An example of the application of knowledge bases to combating misinformation is computational fact-checking. Fact-checking is recognized as an antidote to misinformation [32], especially with respect to claims spread by political elites. For example, Nyhan and Reifler [36] show that alerting politicians to the risk of being fact-checked leads to less inaccuracy and better ratings. Unfortunately, fact-checking claims at the scale of the web is a hard task. A fact-checker must first identify claims that are worthy of being checked, then research the claim [6, 51], and finally write, publish, and circulate their conclusion on the web. In general, there is a lag of approximately 15 hours between the consumption of misinformation and the appearance of corrections [45]. The time investment

KnOD'21 Workshop - April 14, 2021
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Figure 2: Schematic illustration of an integrated extraction and verification pipeline using our relation extraction tool REMOD. The white components correspond to the various steps needed to perform relation extraction. Numbered labels correspond to section headings in the manuscript. To show the potential for integration with external tools, as an additional step in the pipeline the green node shows the use of an off-the-shelf fact-checking algorithm [11].

required of human fact-checkers leads to an open opportunity for the development of many automated fact-checking [11, 63] or verification [34] strategies. One approach is based on identifying missing relations in structured knowledge bases [11, 29, 47, 48]. This approach takes a claim in the form of a semantic triple and checks its validity against the sets of triples in the knowledge base that connect the subject and object. When the knowledge base is viewed as a network, this task is equivalent to link prediction [33].

This approach has proven very promising, but its main restriction lies in its input. Modeling a claim using semantic triples is a nontrivial task, which has limited the application of such approaches. It requires choosing a semantic ontology (or developing a new one) that is able to model claims in a consistent and non-redundant manner. Once an ontology has been established, the next step is relation extraction — the task of reducing a text to a semantic triple that both captures its meaning and fits within the ontology. This task is challenging when addressing a compound factual claim with many subjects and relations; the challenge is amplified when considering a claim that may contain sarcasm, opinion, humor, or any other nuance of language that can be present in online discourse.

In this paper, we present a novel relation extraction method built upon semantic dependency trees; see Figure 1 for a schematic example. Our approach to the problem is based on the intuition that knowledge of the nodes and relations along the path between the subject and object of a triple (e.g., Washington,_D.C. and United_States_of_America) provides useful information that can be leveraged for extracting its relation (i.e., capitalOf). This well-established phenomenon was first observed by Richards and Mooney [41]. Later, Bunescu and Mooney [9] used it in the context of a kernel-based approach. Here, we take advantage of recent advances in graph representation learning to overcome the above challenges posed by online discourse in applying such an approach. Specifically, we parse a large corpus of Wikipedia snippets, annotated with information about one of 5 relations from the DBpedia ontology, combine the resulting dependency trees into a larger semantic network, and finally use node embedding techniques to obtain a high-dimensional representation of this corpus-level network. We find that graph traversal in this learned representation provides a strong signal to discriminate between multiple possible relations.

This approach allowed us to effectively extract these relations from natural language (extraction accuracy measured as the area under the ROC curve, AUC = 0.976). We then tested this model’s ability to generalize to a set of real-world claims (reviewed by professional fact-checkers and annotated using the ClaimReview [22] schema), obtaining again a very good signal (extraction AUC = 0.958).

As an example of a potential application of this technique, we show that, thanks to our method, a wider range of online discourse samples is amenable to analysis than before. In particular, we integrate our approach into a pipeline (see Figure 2) that uses off-the-shelf fact-checking algorithms to analyze a subset of ClaimReview-annotated online discourse samples. Using this pipeline, we obtain very encouraging results on two separate tasks: First, on samples of ‘simple’ online discourse claims, which can be effectively summarized (and thus fact-checked) by extracting a single RDF triple, we outperform a claim-matching baseline based on state-of-the-art representation learning (verification AUC = 0.833). Second, on more complex claims, from which one can extract multiple relevant relations, and which therefore cannot be fact-checked directly, the fact-checker can still identify evidence in support of or against the claim with good accuracy (verification AUC = 0.773).

The rest of this paper is structured as follows: Section 2 details the datasets used, as well as the methods used in the various steps of the pipeline. Section 3 shows the results of both the relation classification task and the fact-checking tasks. Section 4 goes into detail on relevant prior work from the literature on relation classification, misinformation detection, and computational fact-checking. Finally, Section 5 discusses the impact and importance of our results, as well as addresses methods that may be used to improve upon this work in the future.

2      METHODS
Our relation extraction pipeline is described in Figure 2. Roughly speaking, the main task of our pipeline is a supervised relation extraction task (white nodes); since we later show how this task can be integrated to perform additional unsupervised fact-checking, the figure also shows this final step (green node). Collectively these two tasks leverage a number of different data sources, so we start by describing the various datasets used in building the pipeline. We then describe the components of the pipeline proper.

2.1         Datasets
For the main relation extraction task, we use two corpora, both compiled by Google: the Google Relation Extraction Corpus (GREC) and the Google Fact Check Explorer corpus, described below.

2.1.1 Google Relation Extraction Corpus (GREC). The dataset of relations used was the Google Relation Extraction Corpus (GREC) [37].
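As background for how this corpus is used, the link-prediction view of fact-checking described in the introduction (checking a claim triple by connecting its subject and object in a knowledge base) can be illustrated with a toy sketch. The triples, the claim, and the `shortest_path` helper below are illustrative examples, not REMOD's actual data or code:

```python
from collections import deque

# Toy knowledge graph built from illustrative semantic triples
# (subject, predicate, object). These are examples, not the paper's data.
triples = [
    ("Washington,_D.C.", "capitalOf", "United_States_of_America"),
    ("Barack_Obama", "bornIn", "Honolulu"),
    ("Honolulu", "locatedIn", "Hawaii"),
    ("Hawaii", "partOf", "United_States_of_America"),
]

# Treat the knowledge base as an undirected network over entities.
adj = {}
for subj, _pred, obj in triples:
    adj.setdefault(subj, set()).add(obj)
    adj.setdefault(obj, set()).add(subj)

def shortest_path(src, dst):
    """Breadth-first search: the path connecting a claim's subject and
    object is the evidence used to assess the claim (cf. link prediction)."""
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no connection found

# Claim <Barack_Obama, bornIn?, United_States_of_America> is supported
# by the chain Barack_Obama -> Honolulu -> Hawaii -> United_States_of_America.
print(shortest_path("Barack_Obama", "United_States_of_America"))
```

In the pipeline of Figure 2, such paths would be scored by an off-the-shelf fact-checker such as Knowledge Linker [11], rather than merely retrieved.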
Table 1: Number of snippets per relation before and after filtering the GREC corpus.

                        Total    Retained    % Retained
    Institution        42,628      19,900          46.7
    Education           1,850         806          43.6
    Date of Birth       2,490       1,010          40.6
    Place of Birth      9,566       4,005          41.9
    Place of Death      3,042       1,307          43.0

Table 2: The set of WordNet synonyms used to extract relevant claims from the ClaimReview database.

    Institution        attend, university, college, graduate
    Education          graduate, degree
    Date of Birth      born, born on
    Place of Birth     born, birthplace, place of birth, place of origin
    Place of Death     deceased, died, perished, passed away, expired

This dataset contains text snippets extracted from Wikipedia articles that represent a subject/object relation, which can be described by the following defining questions:
    Institution “What educational institution did the subject attend?”
    Education “What academic degree did the subject receive?”
    Date of Birth (DOB) “On what date was the subject born?”
    Place of Birth (POB) “Where was the subject born?”
    Place of Death (POD) “Where did the subject die?”

Each entry in the dataset consists of a natural language snippet of text, the URL of the Wikipedia entry from which the text was pulled, the Freebase predicate, a Freebase ID for subject and object, and the judgements of five human annotators on whether the snippet does or does not contain the relation (some annotators also voted to “skip”, representing no decision either way). Freebase has been replaced by the Google Knowledge Graph since this dataset was generated, which limited the use of this dataset in its original form. We made a set of addenda¹ to the GREC to update it to be more machine-ready for current relation extraction tasks and knowledge bases. The addenda include the following for each entry: text strings for both subject and object, DBpedia URIs for both subject and object, Wikidata QIDs for both subject and object, a unique identifier, and the majority annotator vote.

The snippets varied considerably in length; the distribution of word lengths can be found in Figure 3. Because we relied on a third-party API to parse the snippets, to reduce potential bias due to snippet length, and to ensure only the most characteristic relations were modeled, snippets were removed if they were not within ±0.5 standard deviations of the mean snippet length (measured in words), per relation. Table 1 shows the number of snippets retained, per relation.

2.1.2 Google Fact Check Explorer Corpus. Researchers at Duke University and Google have developed an annotation standard named ClaimReview [44] to help annotate structured fact-checks on the web. It allows fact-checkers to add structured markup to their fact-checks with information that identifies distinct properties of a claim (i.e., the claim reviewed, the rating decision, the source, etc.). This semi-structured data allows fact-checks to be catalogued and queried by search engines. The Google Fact Check Explorer tool² collects all the ClaimReview fragments published by fact-checking organizations that meet a set of established guidelines³, which are the same standards for accountability, transparency, and accuracy used by Google News to select publishers. We collected claims from the Google Fact Check Explorer tool up until April 2020. From this corpus, we produced a dataset of 49,770 ClaimReview-annotated claims. Of the 20,817 English claims in the dataset, we searched for claims that contained one of the relations represented in the GREC, using WordNet [14] synonyms to select search terms (see Table 2). This procedure yielded a subset of 28 claims that met these criteria.

2.2    REMOD
The main contribution of this work is REMOD (which stands for Relation Extraction for Modeling of Online Discourse), a novel tool for relation extraction that extracts RDF triples from semi-structured samples of online discourse. To do so, our tool leverages an annotated corpus of past claims and relations. In the example pipeline shown in Figure 2, the various steps of REMOD correspond to the white nodes, which we describe in more detail below. (The figure is labeled with numbers corresponding to the following section numbers, which elaborate on each step of the process.) To facilitate the replication of our results, the source code of REMOD is freely available online at https://github.com/mjsumpter/remod.

2.2.1 Semantic Parsing. Our workflow begins with natural language snippets. To parse these snippets we used FRED, a machine reading tool based on Discourse Representation Theory and linguistic frames [17], described by its authors as “semantic middleware”. FRED is an NLP tool that combines frame detection, type induction, named-entity recognition, semantic parsing, and ontology alignment, all in a single tool. The authors provide a RESTful API to access it. When provided with a text string as input, it returns a Resource Description Framework (RDF) graphlet of the semantic parse tree of the input. (In practice, FRED produces DAGs instead of trees due to entity linking to external ontologies, hence our referring to them as ‘graphlets’.) An example of these RDF graphlets is shown in Figure 1 for the ClaimReview snippet of a known misinformation claim [26].

2.2.2 Corpus Graph Composition. In a realistic environment, many claims of different relations will exist in the same corpus. To mimic this environment, we composed a single ‘corpus’ graph from every FRED RDF graphlet generated from the corpus snippets. For named entities, FRED defaults to generating nodes in its own namespace (e.g., fred:Doctorate); if it then finds that the same entity is present in an existing ontology, it links to that ontology (e.g., dbpedia:Doctorate). Since these equivalent entities were redundant, we contracted the two nodes into a single

¹ https://github.com/mjsumpter/google-relation-extraction-corpus-augmented
² https://toolbox.google.com/factcheck/explorer
³ https://developers.google.com/search/docs/data-types/factcheck#guidelines
Figure 3: Distribution of snippet lengths found in the GREC, per relation (Institution, Education, Date of Birth, Place of Birth, Place of Death). The red solid line corresponds to the average snippet length (in words) and the dashed lines to ±0.5𝜎 of the average. Snippets were kept if they were within this interval.
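The ±0.5𝜎 length filter summarized in Figure 3 can be sketched in a few lines. This is a minimal illustration under assumed names (the `filter_snippets` helper is ours, not code from REMOD):

```python
from statistics import mean, stdev

def filter_snippets(snippets):
    """Keep only snippets whose word count lies within +/- 0.5
    standard deviations of the mean snippet length."""
    lengths = [len(s.split()) for s in snippets]
    mu, sigma = mean(lengths), stdev(lengths)
    lo, hi = mu - 0.5 * sigma, mu + 0.5 * sigma
    return [s for s, n in zip(snippets, lengths) if lo <= n <= hi]

# Toy corpus: the extreme short and long outliers fall outside the band.
snips = ["one two three", "a b c d", "word " * 50, "x"]
kept = filter_snippets(snips)
print(kept)  # ['one two three', 'a b c d']
```

In REMOD this filter is applied separately to the snippets of each relation, producing the retention counts in Table 1.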


Figure 4: A visualization of how two separate RDF graphlets were stitched together along identical nodes.

vertex, and use the URI from the linked ontology (i.e., DBpedia in this example) as its new URI. The corpus graph was then created by stitching together all the contracted RDF graphlets: if two graphlets share one or more nodes (i.e., two or more nodes have the same URI), then we consider the union of the two graphlets, and contract any pair of such nodes into a single node. This new node is incident to the union of all incident edges in the two original graphlets. An example of this is shown in Figure 4. The resulting corpus graph consists of 212,976 nodes and 832,367 edges.

2.2.3 Node Embedding. The corpus graph is effectively a combined semantic parse tree of the selected snippets from the corpus. To better exploit this structure in machine learning tasks, we generated node embeddings using the Node2Vec algorithm [20]. Node2Vec generates sets of random walks for each node, which are then substituted in place of natural language sentences as input to the Word2Vec model. Two important parameters influence the nature of the embeddings: the return parameter 𝑝, which controls the likelihood of immediately revisiting a node (larger 𝑝 makes returns less likely), and the in–out parameter 𝑞, which interpolates between breadth-first and depth-first exploration (larger 𝑞 keeps the walk closer to its origin). We performed a grid search of 𝑝 and 𝑞 parameters (see Section 3.2), and determined the best choice for these parameters to be 𝑝 = 2 and 𝑞 = […]. The dimension of the embedding space was set to 256; the number of walks to 200; the walk length to 200; and, finally, the context window to 50.

2.2.4 Path Traversal for Finding Relations. Our approach is inspired by the well-known idea that finding paths over structured knowledge representations can help in learning new concepts [41]. More recently, Bunescu and Mooney [9] confirmed the intuitive conclusion that the shortest path between entities in a dependency tree captures the significant information contained between them. Therefore, we sought to develop a classifier that could distinguish between the shortest paths of different semantic relationships. To do so, for each snippet in the corpus, the subject and object were retrieved, along with the original (i.e., non-stitched) RDF graphlet of that specific snippet. The nodes corresponding to the subject and object were identified in the RDF graphlet. With the terminal nodes identified, the shortest path in the original RDF graphlet was calculated (Figure 1). Finally, we generated a final embedding by averaging along the path:

    (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑣⃗_𝑖

where 𝑣_1, . . . , 𝑣_𝑛 is a path and 𝑣⃗ ∈ ℝ^𝑑 is the vector associated to 𝑣 ∈ 𝑉. This resulted in a final vector representing the aggregated sequence of nodes along the shortest path between subject and object.

This process resulted in a 256-dimensional vector for each snippet in the corpus. All results shown in the next section were obtained from these vectors. We projected the vectors into a lower-dimensional space using t-SNE. The visualization of these vectors is shown in Figure 5, where each color corresponds to a different relation. The projection reveals a good separation of vectors based on the relation they represent.

2.2.5 Relation Classification. We trained several classification models on the resulting set of shortest-path vectors. The selected classifiers were Logistic Regression, 𝑘-NN, SVM, Random Forest, Decision Tree, and a Wide Neural Net. Samples that were rated by the annotators as not containing the specified relation were removed, and then the dataset was balanced to the lowest-frequency class (Education, 𝑁 = 598 samples). Readers will note this is a decrease
𝑞 = 3; this configuration captures what the authors of Node2Vec call                          from the 806 reported in Table 1; FRED was not always accurate
the ‘global’ topological structure of the graph. The other parameters                         at identifying entities and occasionally returned corrupted RDF
of Node2Vec were chosen as follows: the dimension of the vector                               graphs, resulting in a small loss of data. To effectively compare
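The path-averaging step described above can be sketched as follows. This is an illustrative snippet only, not the REMOD implementation: the toy graph, function name, and 2-dimensional embeddings are our own.

```python
import networkx as nx
import numpy as np

def path_embedding(graph, embeddings, source, target):
    """Average the node embeddings along the shortest path from
    `source` to `target`, yielding one fixed-size snippet vector."""
    path = nx.shortest_path(graph, source, target)
    return np.mean([embeddings[node] for node in path], axis=0)

# Toy corpus graph: a subject--predicate--object chain.
G = nx.Graph()
G.add_edges_from([("s", "p"), ("p", "o")])
emb = {"s": np.array([1.0, 0.0]),
       "p": np.array([0.0, 1.0]),
       "o": np.array([1.0, 1.0])}

vec = path_embedding(G, emb, "s", "o")
print(vec)  # mean of the three node vectors along the path s-p-o
```

In the pipeline above, `embeddings` would hold the 256-dimensional Node2Vec vectors, and the path would run between the subject and object entities of a snippet.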
REMOD                                                                                                      KnOD’21 Workshop, April 14, 2021, Virtual Event


Figure 5: The shortest path vectors of GREC relations projected into 2D using t-SNE. Each color represents a different semantic relation, with a sixth color to mark snippets for which a majority of annotators voted 'No (relation)'. (Legend: Institution, Education, Date of Birth, Place of Birth, Place of Death, None of the Above.)

To effectively compare different classifiers, training was done using a 64%/16%/20% training/validation/testing split. This resulted in a final training dataset of 1,913 samples (5 classes, 𝑁 ≈ 382 samples/class), with a validation set of 479 samples and an additional 598 samples held out for testing. The 28 selected ClaimReview claims were kept as an additional test set, which is elaborated on in Section 2.2.6.

2.2.6 Fact-Checking. To demonstrate the usefulness of our method, we show that REMOD can be integrated as the first step of a fact-checking pipeline, using existing, off-the-shelf tools to verify online discourse claims annotated with the ClaimReview standard. To perform fact-checking, we rely on the work of Shiralkar et al. [48], who provide open-source implementations of several fact-checking algorithms4. These algorithms can be used to assess the truthfulness of a statement, but any tool that takes an RDF triple as input could be used as well. To extract relations from ClaimReview snippets, we used the deep neural network classifier, the most successful classifier from the prior step, and fed the extracted triples into the fact-checker.

Of course, when integrating two distinct tools one has to make sure that any error originating in the first tool does not affect the performance of the second. Therefore, to avoid cascading errors, we removed claims affected by two types of error from our dataset. First, we removed any claim whose relation was misclassified, to avoid feeding inaccurate inputs into the fact-checker. Second, FRED is not always able to link both the subject and the object entity to DBpedia, which is a requirement for the fact-checking algorithms of Shiralkar et al. [48]; thus we also removed claims that did not have both subject and object linked to the DBpedia ontology. Of the original 28 claims, this filtering left 13 ClaimReview claims for use in our evaluation.

Additionally, we manually checked whether the overall claim reduces to the extracted triple (in the sense that verifying the triple also verifies the overall claim). This distinction is important since it allows us to gauge the ability of our system to check entire claims automatically, in a purely end-to-end fashion. Finally, these remaining claims were passed as input to three fact-checking algorithms: Knowledge Stream, Knowledge Linker, and Relational Knowledge Linker [48].

4 https://github.com/shiralkarprashant/knowledgestream

As a baseline, we trained a Doc2Vec model [31] on the entirety of the ClaimReview corpus, and used this model to fact-check statements by matching them with other similar claims. In particular, given an input claim, to produce a truth score with the baseline model we ranked all claims in the ClaimReview corpus by their similarity and averaged the truth scores of the top 𝑘 most similar matching claims. We removed fact-checking organizations that use scaleless fact-check verdicts (e.g. factcheck.org); for those that had scales, we assigned truth scores to every claim, setting "False" to a baseline of 0, unless a scale explicitly stated a different baseline (e.g. PolitiFact ranks "Pants on Fire" lower than "False").

3 RESULTS

3.1 Graph Representation

Table 3: AUC of Wide DNN on the relation classification task using different types of graph to represent the corpus graph.

                    Unweighted    Weighted
    Undirected           0.976       0.964
    Directed             0.966       0.967

The corpus graph is composed of dependency trees, and so it is naturally a directed graph; its edges are also all weighted equally. This design has a strong influence on path traversal, since directed edges reduce the number of available paths, and the cost of taking an edge (or its absence) influences the choice of one path over another. For completeness, we considered all four combinations of taking either a directed or an undirected graph, and of having edge weights or not. Let 𝑣_𝑖, 𝑣_𝑗 ∈ 𝑉 represent two nodes in the dependency graph that are incident on the same edge. The weight 𝑤_𝑖𝑗 between them is the angular distance between the respective node embeddings:

\[ w_{ij} = \frac{1}{\pi} \arccos\left( \frac{\vec{v}_i \cdot \vec{v}_j}{\lVert \vec{v}_i \rVert \, \lVert \vec{v}_j \rVert} \right) \]

where 𝑣⃗ is the vector associated with 𝑣 ∈ 𝑉.

Table 3 shows that the undirected, unweighted graph yields the best classification results, which prompts two observations. The first is that directed edges reduce the number of available pathways connecting two nodes. Second, and perhaps a bit surprisingly, the unweighted network performs better than the weighted one. Because the node embeddings were the same in the two variants, the final feature vector used for relation classification would differ only if a different shortest path was found. This could happen if edges that are more relevant for discriminating the relation were assigned large weights compared to other, less relevant edges.

3.2 Classification for Relation Extraction

The results of the relation classification task are shown in Table 4.
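The angular-distance edge weight used for the weighted graph variants in Section 3.1 can be computed directly from a pair of node embeddings. The helper below is our own sketch; the function name is illustrative, not from the paper's code.

```python
import numpy as np

def angular_distance(v_i, v_j):
    """Edge weight w_ij: arccos of the cosine similarity between two
    node embeddings, scaled by 1/pi so that the result lies in [0, 1]."""
    cos = np.dot(v_i, v_j) / (np.linalg.norm(v_i) * np.linalg.norm(v_j))
    # Clip to guard against floating-point drift outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

print(angular_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # orthogonal -> 0.5
print(angular_distance(np.array([2.0, 0.0]), np.array([1.0, 0.0])))  # parallel -> 0.0
```

Unlike raw cosine similarity, this quantity is a proper distance, which makes it usable directly as a shortest-path edge cost.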
KnOD’21 Workshop, April 14, 2021, Virtual Event                                                         Matthew Sumpter and Giovanni Luca Ciampaglia


Table 4: Results of the relation classification task using different ML models, on an unweighted, undirected corpus graph, as compared to training with Word2Vec embeddings.

                             Precision   Recall     F1      AUC
    Decision Tree                 0.64     0.64    0.64    0.773
    Random Forest                 0.81     0.67    0.61    0.793
    𝑘-NN                          0.78     0.74    0.74    0.841
    SVM                           0.81     0.77    0.77    0.855
    Log. Regr.                    0.80     0.71    0.71    0.827
    Wide DNN                      0.85     0.85    0.85    0.976
    Word2Vec+Log. Regr.           0.66     0.47    0.44    0.658
    Word2Vec+Wide DNN             0.61     0.63    0.61    0.883

The outcome of these tests reveals that the node embeddings do contain information regarding the semantic nature of the GREC relations; however, the relations are not neatly separable by decision planes. It is notable that the models we tested are often more successful in precision than in recall. This suggests that a more complex model, such as a deep neural network (DNN), is necessary to identify the less characteristic samples of a relation. To improve these results, we performed a grid search on the Node2Vec 𝑝 and 𝑞 parameters (with values of 0.25, 0.5, 1, 2, 3, and 4). The best overall results were the product of a 'global' configuration, using 𝑝 = 2 and 𝑞 = 3, which achieved an AUC of 0.976 on the test set. To evaluate our method, as a baseline we generated 300-dimensional vectors for each snippet from a Word2Vec model pre-trained on Wikipedia [66], the same source as the GREC corpus, which provided the training data for our model. These embeddings were then used as features to train a DNN and a logistic regression model for relation extraction. REMOD shows a marked improvement over both, indicating an effective approach to relation extraction.

3.3 Extraction of ClaimReview Relations

Table 7 in the appendix shows the claims selected from the ClaimReview corpus, in addition to the relation they contain ("Actual"), the relation predicted by REMOD ("Predicted"), the truth rating as determined by a fact-checker ("Rating"), and whether verifying the relation is equivalent to verifying the claim ("Claim ≡ Triple"). The AUC of the predicted relations is 0.958. Inspecting the misclassified samples, we see that REMOD made mistakes between similar relations (e.g. place of birth and date of birth), which often occur in similar sentences.

3.4 Fact-Checking

We next test the integration with fact-checking algorithms. In particular, we use the fact-checker for two similar, but conceptually distinct, tasks: 1) fact-checking an entire claim (fact-checking), and 2) identifying evidence in support of or against a claim (fact verification). For example, for claim #1 (see Table 7), Penny Wong was indeed born in Malaysia, even though the assertion that she is ineligible for election to the Australian parliament is false. Thus, in this case the extracted triple is only additional evidence, and is not able in itself to capture the entire claim. We manually fact-checked all the extracted relations, and compared their truth ratings with the ones provided by the human fact-checkers for the whole claims.

Table 5: The performance of the fact-checking algorithms on predicting the validity of the relations.

    Method                          AUC
    Knowledge Linker              0.636
    Relational Knowledge Linker   0.773
    Knowledge Stream              0.773

Table 7 lists this information under the column "Claim ≡ Triple", which is true (indicated by a checkmark) when the extracted relation summarizes the whole claim (e.g. claim #3). This distinction is important: as mentioned before, although our relation extraction pipeline is capable of predicting a relation for all the entries in Table 7, not all correctly predicted triples can be fed to the fact-checking algorithms, due to incomplete entity linking. For the task of identifying supporting evidence, we find a total of 13 ClaimReview claims that are amenable to fact-checking. For the task of checking an entire claim, this number is further reduced to 7 claims.

3.4.1 Fact Verification. Table 5 shows the results of verifying individual pieces of evidence in support of or against any of the 13 ClaimReview claims identified by REMOD, using each of the three algorithms for fact-checking RDF triples. Relational Knowledge Linker and Knowledge Stream were the best performers. Note that since our baseline is intended to emulate a true fact-checking task, we do not run it here: its similarity is based on the whole claim, and thus it would not be a meaningful comparison with our method, which focuses only on a specific relation within a larger claim.

3.4.2 Fact-Checking. Here we test the subset of claims for which checking the triple is equivalent to checking the entire claim. In this case, REMOD yields 7 claims that can be used as inputs to the fact-checking algorithms. Table 6 shows the results of the three fact-checking algorithms on our 7 ClaimReview claims, along with the baseline. Here, the baseline emulates fact-checking by claim matching.

Since we are using claim matching to perform fact-checking, we consider three different scenarios to make the task more realistic. In particular, we match the claim against three different corpora, in increasing degree of realism: 1) the full ClaimReview corpus ('All'), 2) all ClaimReview entries by PolitiFact only ('PolitiFact'), and 3) all ClaimReview entries from the same fact-checker as the claim of interest ('Same'). The first case ('All') is meant to give an upper bound on the performance of claim matching, but it is not realistic, since it makes use of knowledge of the truth scores of potentially future claims, as well as of ratings for the same claim by different fact-checkers. The second case ('PolitiFact') partially addresses this second unrealistic assumption by using only claims from a single source; it does not have access to truth scores by different organizations for the same claim, but it does still have access to future information. Both 1) and 2) can thus be regarded as gold-standard measures of performance. The last scenario ('Same') is the most realistic, since it emulates a fact-checker who may be checking a claim for the first time.
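The evaluation protocol used in this section (a 64%/16%/20% train/validation/test split over a class-balanced dataset, with several classifiers compared by multi-class AUC) might be sketched as below. Synthetic data stands in for the shortest-path vectors; the actual features, models, and hyperparameters in the paper differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the shortest-path vectors (5 balanced classes).
X, y = make_classification(n_samples=2000, n_features=64, n_informative=32,
                           n_classes=5, random_state=0)

# 64%/16%/20% split: hold out 20% for testing first, then 20% of the
# remaining 80% (= 16% of the total) for validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.20, random_state=0, stratify=y_tmp)

for name, clf in [("Log. Regr.", LogisticRegression(max_iter=1000)),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
    clf.fit(X_train, y_train)
    # One-vs-rest AUC over the five classes, as in Tables 3-4.
    auc = roc_auc_score(y_val, clf.predict_proba(X_val), multi_class="ovr")
    print(f"{name}: validation AUC = {auc:.3f}")
```

Model selection would be done on the validation split, with the held-out test split (and the ClaimReview claims) reserved for the final numbers.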
Table 6: Results of the fact-checking algorithms. (CM = Claim Matching; KL = Knowledge Linker; Rel. KL = Relational Knowledge Linker; KS = Knowledge Stream.)

                      𝑘 = 1    𝑘 = 3    𝑘 = 5    𝑘 = 10
    CM (All)          0.417    0.625    0.500     0.625
    CM (PolitiFact)   0.666    0.625    0.833     0.750
    CM (Same)         0.500    0.583    0.250     0.250
    KL                         0.500
    Rel. KL                    0.833
    KS                         0.833

Such a fact-checker has access neither to claims fact-checked afterwards nor to ratings of the same claim by different fact-checkers. In all three cases, the claim being matched was removed from the corpus, to prevent trivially perfect predictions. Relational Knowledge Linker and Knowledge Stream are still the best-performing fact-checking algorithms, and manage to reach, if not exceed, the performance of the gold standard (Claim Matching–All, or –PolitiFact).

4 RELATED WORK

4.1 Relation Extraction and Classification

Relation extraction and classification is the task of extracting semantic relationships between two entities in natural language text and matching them to semantically equivalent or similar relations. This task is at the core of information extraction and knowledge base construction, as it effectively reduces statements to their core meaning; this is typically modeled as a semantic triple, (s, p, o), where two entities (s and o) are connected by a predicate, p. There are several distinct nuances and open challenges to effective relation extraction. Identifying attributes that discriminate between two objects provides a descriptive explanation to supplement word embeddings (e.g. lime is separated from lemon by the attribute 'green'), and is currently most successful with SVM classifiers [27]. Multi-way classification attempts to distinguish the direction of one-way relations (the sonOf relation is not bidirectional between two people), and has seen similar levels of success from solutions built with language models [3], convolutional neural networks [58], and recurrent neural networks [64]. Distantly supervised relation extraction is a two-way approach whereby semantic triples are generated from natural language by aligning it with information already present in knowledge graphs [65]. Relation extraction performance is often assessed on the TACRED dataset [68], a large-scale dataset of 106,264 examples used in the annual TAC Knowledge Base Population challenges, covering 41 relation types. The most successful solution to date is from Baldini Soares et al. [3], who achieved a micro-averaged F1 score of 71.5%. Despite the increasing availability of state-of-the-art machine learning architectures, relation extraction continues to be an open problem with much room for improvement.

4.2 Knowledge Base Augmentation

Knowledge base augmentation aims to add new relations to existing knowledge bases in an automated fashion [61]. This task takes one of two approaches. The first infers new relations from existing triples in a knowledge base [8, 53]; this is essentially a link-prediction task that builds upon patterns found between entities in knowledge bases. The second approach mines data found on the web for knowledge discovery [12, 67]; it relies on redundant relations found among the selected source materials, which may be as restrictive as Wikipedia articles [39] or as extensive as the entire web [12]. Due to the potential for error introduced by the sources, Dong et al. [13] developed a Knowledge-Based Trust (KBT) score for measuring the trustworthiness of selected sources. Yu et al. [67] expand upon this by combining KBT scores with other entity/relation-based features to assign a unique score to each individual triple.

4.3 Detecting Information Disorder

Information disorder is a catch-all term for the many kinds of unreliable information that one may encounter online or in the real world [59], including disinformation, misinformation, fake news, rumor, spam, etc. Information disorder can also take on several modalities, including text, video, and images. The many varieties of information disorder make it challenging to develop any one approach for detection; instead, detection draws on three main signals: the content of the information, the users who shared it, and the patterns of information dissemination on a network. Bad content is often generated by bots, which suggests that features captured from user profiles can be useful for distinguishing bots from humans [50]. Content detection depends on the medium: lexical features, sentiment, and readability metrics are used for text, while neural visual features are extracted from other content [40, 42, 43]. Network detection methods model social media platforms as propagation networks, measuring the flow of information [49]. There has also been promising work on crowd-sourcing the task by allowing users to flag questionable content [55]. This approach, while likely to remain imperfect, provides an important supplement of human supervision to all of the aforementioned methods.

4.4 ClaimBuster

Hassan et al. [24] released the first-ever end-to-end fact-checking system, called ClaimBuster, in 2017. ClaimBuster is composed of several distinct components that work in sequence to accomplish automated fact-checking. The first, the claim monitor, continuously monitors text published as broadcast-television closed captions, on Twitter accounts, and as content on a selected set of websites. This text is passed to the claim spotter, which scores every sentence by its likelihood of containing a claim that is worthy of fact-checking; subjective and opinionated sentences receive a low score in this task. Once a set of check-worthy sentences has been identified, the claim matcher searches fact-check repositories for existing fact-checks that match the selected sentences. The claim checker generates questions from the selected sentences and uses those questions to query Wolfram Alpha and Google to fetch supporting or debunking evidence, supplementing the findings of the claim matcher. Finally, the fact-check reporter builds a report from all of the gathered evidence.
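The top-𝑘 claim-matching baseline reported in Table 6 can be sketched as follows: rank the corpus claims by cosine similarity of their document vectors (Doc2Vec in the paper) against the input claim, then average the truth scores of the 𝑘 nearest ones. The vectors and scores below are toy values, and the function name is our own.

```python
import numpy as np

def claim_matching_score(query_vec, corpus_vecs, truth_scores, k=3):
    """Baseline truth score for a claim: average the truth scores of
    the k corpus claims whose vectors are most similar to the query."""
    corpus = np.asarray(corpus_vecs, dtype=float)
    # Cosine similarity between the query and every corpus claim.
    sims = corpus @ query_vec / (
        np.linalg.norm(corpus, axis=1) * np.linalg.norm(query_vec))
    top_k = np.argsort(sims)[::-1][:k]
    return float(np.mean(np.asarray(truth_scores)[top_k]))

# Toy document vectors and truth scores (0 = false, 1 = true).
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
truth = [1.0, 0.8, 0.0, 0.5]
print(claim_matching_score(np.array([1.0, 0.05]), corpus, truth, k=2))  # -> 0.9
```

As described in Section 2.2.6, the claim being matched would itself be removed from the corpus before ranking, to prevent trivially perfect predictions.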
This report summarizes the findings of the ClaimBuster pipeline, which are then disseminated through social media.

4.5 Claim Verification

Claim verification is arguably the key task of fact-checking: checking a claim against existing evidence. It is related to the matching and checking subtasks of ClaimBuster, in that it is the task of checking whether a natural language sentence selected as evidence supports or debunks the correlated claim. To build computational solutions to this task, datasets containing claims and their corresponding evidence are needed. There have been some datasets [2, 15, 56] relevant to this task; however, they are either not machine-readable or lacking in size. Thorne et al. [54] recognized this gap, and have since released a large-scale dataset, called FEVER, to address these concerns. FEVER contains 185,445 claims with corresponding evidence, manually classified as SUPPORTED, REFUTED, or NOTENOUGHINFO. It has been followed up with annual workshops that encourage participants to improve upon both the dataset and the claim verification task. The CLEF CheckThat! [4] series of workshops and conferences also seeks to bring researchers together to improve claim verification, along with identifying and extracting check-worthy claims.

4.6 Other Fact-Checking Methods

Besides claim-matching approaches, there are a handful of existing algorithms for fact-checking, mostly based on exploiting the content or characteristics of existing knowledge bases. Embedding approaches, such as TransE [5], seek to generate vector embeddings of knowledge bases, a task which is conceptually related to our approach. With these embeddings, they can perform link prediction based on structural patterns of (s, p, o) triples; in terms of a knowledge base, this amounts to adding new facts without any additional source material. For fact-checking, this approach can be used to test whether a triple extracted from a claim is a predicted link in the knowledge base. The pitfall of these methods, as with all embedding techniques, is that they lack both interpretability and scalability. Other algorithms similarly consider paths within knowledge bases, but seek to address the interpretability problem. PRA [28], SFE [18], PredPath [47], and AMIE [16] all take the approach of mining possible pathways between two entities within a knowledge base; from these mined pathways, they generate sets of features to be used in supervised learning models for link prediction. These methods have shown promise at predicting the validity of a claim, but they too suffer from scalability issues. Knowledge bases that contain enough relevant information to be useful are very

5 DISCUSSION

In this paper, we have presented a novel relation extraction algorithm and previewed its application to classifying relations present in online discourse and automatically fact-checking them against the information present in a general knowledge graph. We developed a pipeline to facilitate the linkage of these two tasks. Our relation classification method leverages graph representation learning on the shortest paths between entities in semantic dependency trees; it was shown to be comparable to state-of-the-art methods on a corpus of labeled relations (AUC = 97.6%). This classifier was then used to reduce claims from online discourse to semantic triples, with an AUC of 95.8%; these triples were used as input to fact-checking algorithms to predict the accuracy of the claims. We achieved an AUC of 83% on our selected claims, which is at least comparable to claim matching, but without the need for the corpus of existing claims that claim matching relies on.

Our relation extraction method is a promising approach to distinguishing relations present in large online discourse corpora; scaling up this algorithm could provide an outlet for modeling online discourse within an established ontology. Additionally, our pipeline may serve as a proof of concept for future research into automated fact-checking. While it is a challenge to model all possible relations in a generalistic ontology like DBpedia, this pipeline could form the

A limitation of our pipeline lies in its discrete structure, which is prone to cascading failures. Our main NLP tool, FRED, is a powerhouse of a tool that performs many important NLP tasks at once; however, it was not always completely accurate, and many of our samples were returned as corrupted RDF graphs. Additionally, it was not always able to link nodes to DBpedia, which limited the number of triples we could feed into our fact-checking algorithms. Cascading failures are common to many machine reading pipelines [35]; one way to overcome this issue would be to rely on joint inference approaches [52]. Another limitation of our methodology has to do with our use of distributed representations. For the task of fact-checking, the corpus is always growing; Node2Vec cannot generalize to unseen data and requires retraining. An inductive learning framework, such as GraphSAGE [23], can generate embeddings for unseen nodes, and is therefore a more practical algorithm for extending this pipeline. For the classification task, our machine learning models were relatively simple, and optimizing both the parameters and the architecture of the neural network would likely increase the accuracy and effectiveness of this method. Finally, a full evaluation of our method against transformer language models on relevant relation classification tasks [62] is left as future work.
large, and path mining and feature generation becomes necessar-
                                                                          basis of tools for reducing the time needed to research an online
ily time-consuming. There are a few rule-based [38] methods for
                                                                          discourse claim.
fact-checking, which rely on logical constraints of a knowledge
graph and are naturally explainable. General, large-scale knowl-
                                                                          Acknowledgements
edge graphs do not have these logical constraints from which to
build rules from, leaving this approach to fact-checking an open          The authors would like to thank Google for making publicly avail-
problem [25].                                                             able both the GREC dataset and the Fact Check Explorer tool, and
                                                                          Alexios Mantzarlis for feedback on the manuscript.
4.7     Threats to Validity
                                                                          REFERENCES
No method is perfect and our approach suffers from a number of             [1] Hunt Allcott and Matthew Gentzkow. 2017. Social media and fake news in the
limitations, which we briefly describe here. The main limitation               2016 election. Journal of economic perspectives 31, 2 (2017), 211–36.
REMOD                                                                                                                           KnOD’21 Workshop, April 14, 2021, Virtual Event


[2] Gabor Angeli and Christopher D. Manning. 2014. NaturalLI: Natural Logic Inference for Common Sense Reasoning. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 534–545. https://doi.org/10.3115/v1/D14-1059
[3] Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, and Tom Kwiatkowski. 2019. Matching the Blanks: Distributional Similarity for Relation Learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 2895–2905. https://doi.org/10.18653/v1/P19-1279
[4] Alberto Barrón-Cedeño, Tamer Elsayed, Preslav Nakov, Giovanni Da San Martino, Maram Hasanain, Reem Suwaileh, and Fatima Haouari. 2020. CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media. arXiv:2001.08546 [cs.CL]
[5] Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.). Curran Associates, Inc., Red Hook, NY, United States, 2787–2795.
[6] B. Borel. 2016. The Chicago Guide to Fact-Checking. University of Chicago Press, Chicago, IL, USA.
[7] Alexandre Bovet and Hernán A. Makse. 2019. Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications 10, 1 (Jan. 2019), 7. https://doi.org/10.1038/s41467-018-07761-2
[8] Lorenz Bühmann and Jens Lehmann. 2013. Pattern Based Knowledge Base Enrichment. In The Semantic Web – ISWC 2013, Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, and Krzysztof Janowicz (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 33–48.
[9] Razvan C. Bunescu and Raymond J. Mooney. 2005. A Shortest Path Dependency Kernel for Relation Extraction. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (Vancouver, British Columbia, Canada) (HLT ’05). Association for Computational Linguistics, USA, 724–731. https://doi.org/10.3115/1220575.1220666
[10] Giovanni Luca Ciampaglia. 2018. Fighting fake news: a role for computational social science in the fight against digital misinformation. Journal of Computational Social Science 1, 1 (29 Jan. 2018), 147–153. https://doi.org/10.1007/s42001-017-0005-6
[11] Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M. Rocha, Johan Bollen, Filippo Menczer, and Alessandro Flammini. 2015. Computational Fact Checking from Knowledge Networks. PLOS ONE 10, 6 (06 2015), 1–13. https://doi.org/10.1371/journal.pone.0128193
[12] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, New York, USA, 601–610. https://doi.org/10.1145/2623330.2623623
[13] Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, and Wei Zhang. 2015. Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources. Proc. VLDB Endow. 8, 9 (May 2015), 938–949. https://doi.org/10.14778/2777598.2777603
[14] C. Fellbaum and G.A. Miller. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, USA.
[15] Kim Fridkin, Patrick J. Kenney, and Amanda Wintersieck. 2015. Liar, Liar, Pants on Fire: How Fact-Checking Influences Citizens’ Reactions to Negative Advertising. Political Communication 32, 1 (Jan 2015), 127–151. https://doi.org/10.1080/10584609.2014.914613
[16] Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, and Fabian Suchanek. 2013. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In Proceedings of the 22nd International Conference on World Wide Web - WWW ’13. ACM Press, New York, New York, USA, 413–422. https://doi.org/10.1145/2488388.2488425
[17] Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea Giovanni Nuzzolese, Francesco Draicchio, and Misael Mongiovì. 2017. Semantic Web Machine Reading with FRED. Semantic Web 8, 6 (2017), 873–893. https://doi.org/10.3233/SW-160240
[18] Matt Gardner and Tom Mitchell. 2015. Efficient and Expressive Knowledge Base Completion Using Subgraph Feature Extraction. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, 1488–1498. https://doi.org/10.18653/v1/D15-1173
[19] Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, and David Lazer. 2019. Fake news on Twitter during the 2016 US presidential election. Science 363, 6425 (2019), 374–378.
[20] Aditya Grover and Jure Leskovec. 2016. Node2vec: Scalable feature learning for networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016), 855–864. https://doi.org/10.1145/2939672.2939754 arXiv:1607.00653
[21] Andrew M Guess, Brendan Nyhan, and Jason Reifler. 2020. Exposure to untrustworthy websites in the 2016 US election. Nature Human Behaviour 4, 5 (2020), 472–480.
[22] R. V. Guha, Dan Brickley, and Steve Macbeth. 2016. Schema.Org: Evolution of Structured Data on the Web. Commun. ACM 59, 2 (Jan. 2016), 44–51. https://doi.org/10.1145/2844544
[23] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 1025–1035.
[24] Naeemul Hassan, Gensheng Zhang, Fatma Arslan, Josue Caraballo, Damian Jimenez, Siddhant Gawsane, Shohedul Hasan, Minumol Joseph, Aaditya Kulkarni, Anil Kumar Nayak, Vikas Sable, Chengkai Li, and Mark Tremayne. 2017. ClaimBuster: The first-ever end-to-end fact-checking system. Proceedings of the VLDB Endowment 10, 12 (2017), 1945–1948. https://doi.org/10.14778/3137765.3137815
[25] Viet Phi Huynh and Paolo Papotti. 2019. A benchmark for fact checking algorithms built on knowledge bases. In International Conference on Information and Knowledge Management, Proceedings. Association for Computing Machinery, New York, NY, USA, 689–698. https://doi.org/10.1145/3357384.3358036
[26] Krutika Kale. 2018. No, Tej Pratap Yadav Did Not Receive A Doctorate From Takshashila University. Online at https://www.boomlive.in/no-lalus-son-tej-pratap-did-not-receive-a-doctorate-from-takshashila-university/. Last accessed 2021-02-21.
[27] Sunny Lai, Kwong Sak Leung, and Yee Leung. 2018. SUNNYNLP at SemEval-2018 Task 10: A Support-Vector-Machine-Based Method for Detecting Semantic Difference using Taxonomy and Word Embedding Features. In Proceedings of The 12th International Workshop on Semantic Evaluation. Association for Computational Linguistics, New Orleans, Louisiana, 741–746. https://doi.org/10.18653/v1/S18-1118
[28] Ni Lao and William W. Cohen. 2010. Relational retrieval using a combination of path-constrained random walks. Machine Learning 81, 1 (2010), 53–67. https://doi.org/10.1007/s10994-010-5205-8
[29] Ni Lao, Tom Mitchell, and William W. Cohen. 2011. Random Walk Inference and Learning in A Large Scale Knowledge Base. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Edinburgh, Scotland, UK, 529–539.
[30] David MJ Lazer, Matthew A Baum, Yochai Benkler, Adam J Berinsky, Kelly M Greenhill, Filippo Menczer, Miriam J Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, et al. 2018. The science of fake news. Science 359, 6380 (2018), 1094–1096.
[31] Quoc Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In Proceedings of the 31st International Conference on Machine Learning - Volume 32 (ICML’14). JMLR.org, Beijing, China, II–1188–II–1196.
[32] Stephan Lewandowsky, Ullrich K. H. Ecker, Colleen M. Seifert, Norbert Schwarz, and John Cook. 2012. Misinformation and Its Correction: Continued Influence and Successful Debiasing. Psychological Science in the Public Interest 13, 3 (2012), 106–131. https://doi.org/10.1177/1529100612451018
[33] David Liben-Nowell and Jon Kleinberg. 2007. The Link-Prediction Problem for Social Networks. Journal of the American Society for Information Science and Technology 58, 7 (2007), 1019–1031.
[34] Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Rui Fang, and Sameena Shah. 2015. Real-Time Rumor Debunking on Twitter. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (Melbourne, Australia) (CIKM ’15). Association for Computing Machinery, New York, NY, USA, 1867–1870. https://doi.org/10.1145/2806416.2806651
[35] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, B. Yang, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling. 2018. Never-Ending Learning. Commun. ACM 61, 5 (April 2018), 103–115. https://doi.org/10.1145/3191513
[36] Brendan Nyhan and Jason Reifler. 2015. The Effect of Fact-Checking on Elites: A Field Experiment on U.S. State Legislators. American Journal of Political Science 59, 3 (2015), 628–640. https://doi.org/10.1111/ajps.12162
[37] Dave Orr. 2013. 50,000 Lessons on How to Read: a Relation Extraction Corpus. https://ai.googleblog.com/2013/04/50000-lessons-on-how-to-read-relation.html
[38] Stefano Ortona, Venkata Vamsikrishna Meduri, and Paolo Papotti. 2018. Robust discovery of positive and negative rules in knowledge bases. In 2018 IEEE 34th International Conference on Data Engineering (ICDE) (Paris, France). IEEE, Piscataway, NJ, USA, 1168–1179.
[39] Heiko Paulheim and Simone Paolo Ponzetto. 2013. Extending DBpedia with Wikipedia List Pages. In Proceedings of the 2013 International Conference on NLP & DBpedia - Volume 1064 (Sydney, Australia) (NLP-DBPEDIA’13). CEUR-WS.org, Aachen, DEU, 85–90.
[40] Hannah Rashkin, Eunsol Choi, Jin Yea Jang, Svitlana Volkova, and Yejin Choi. 2017. Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Copenhagen, Denmark, 2931–2937. https://doi.org/10.18653/v1/D17-1317
[41] Bradley L. Richards and Raymond J. Mooney. 1992. Learning Relations by Pathfinding. In Proceedings of the Tenth National Conference on Artificial Intelligence (San Jose, California) (AAAI’92). AAAI Press, Palo Alto, CA, USA, 50–55.
[42] Victoria L. Rubin, Yimin Chen, and Nadia K. Conroy. 2015. Deception detection for news: Three types of fakes. Proceedings of the Association for Information Science and Technology 52, 1 (2015), 1–4. https://doi.org/10.1002/pra2.2015.145052010083
[43] Victoria L. Rubin and Tatiana Vashchilko. 2012. Identification of Truth and Deception in Text: Application of Vector Space Model to Rhetorical Structure Theory. In Proceedings of the Workshop on Computational Approaches to Deception Detection. Association for Computational Linguistics, Avignon, France, 97–106.
[44] schema.org. 2020. ClaimReview schema. https://schema.org/ClaimReview
[45] Chengcheng Shao, Giovanni Luca Ciampaglia, Alessandro Flammini, and Filippo Menczer. 2016. Hoaxy: A Platform for Tracking Online Misinformation. In Proceedings of the 25th International Conference Companion on World Wide Web (Montréal, Québec, Canada) (WWW ’16 Companion). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 745–750. https://doi.org/10.1145/2872518.2890098
[46] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kai-Cheng Yang, Alessandro Flammini, and Filippo Menczer. 2018. The spread of low-credibility content by social bots. Nature Communications 9, 1 (2018), 1–9.
[47] Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining for fact checking in knowledge graphs. Knowledge-Based Systems 104 (Jul 2016), 123–133. https://doi.org/10.1016/j.knosys.2016.04.015 arXiv:1510.05911
[48] Prashant Shiralkar, Alessandro Flammini, Filippo Menczer, and Giovanni Luca Ciampaglia. 2017. Finding Streams in Knowledge Graphs to Support Fact Checking. In 2017 IEEE International Conference on Data Mining (ICDM) (New Orleans, Louisiana, USA). IEEE, Piscataway, NJ, 859–864. https://doi.org/10.1109/ICDM.2017.105 arXiv:1708.07239 [cs.AI] Extended Version.
[49] Kai Shu, Deepak Mahudeswaran, Suhang Wang, and Huan Liu. 2020. Hierarchical propagation networks for fake news detection: Investigation and exploitation. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 14. AAAI, Palo Alto, CA, USA, 626–637.
[50] Kai Shu, Suhang Wang, and Huan Liu. 2018. Understanding User Profiles on Social Media for Fake News Detection. In IEEE 1st Conference on Multimedia Information Processing and Retrieval (MIPR 2018). IEEE, Piscataway, NJ, USA, 430–435. https://doi.org/10.1109/MIPR.2018.00092
[51] Craig Silverman (Ed.). 2014. Verification Handbook. European Journalism Centre, Maastricht, the Netherlands.
[52] Sameer Singh, Sebastian Riedel, Brian Martin, Jiaping Zheng, and Andrew McCallum. 2013. Joint Inference of Entities, Relations, and Coreference. In Proceedings of the 2013 Workshop on Automated Knowledge Base Construction (San Francisco, California, USA) (AKBC ’13). Association for Computing Machinery, New York, NY, USA, 1–6. https://doi.org/10.1145/2509558.2509559
[53] Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning With Neural Tensor Networks for Knowledge Base Completion. In Advances in Neural Information Processing Systems, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.), Vol. 26. Curran Associates, Inc., Red Hook, NY, United States, 926–934. https://proceedings.neurips.cc/paper/2013/file/b337e84de8752b27eda3a12363109e80-Paper.pdf
[54] James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. 2018. FEVER: a Large-scale Dataset for Fact Extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, Stroudsburg, PA, USA, 809–819. https://doi.org/10.18653/v1/N18-1074
[55] Sebastian Tschiatschek, Adish Singla, Manuel Gomez Rodriguez, Arpit Merchant, and Andreas Krause. 2018. Fake News Detection in Social Networks via Crowd Signals. In Companion Proceedings of the The Web Conference 2018 (Lyon, France) (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 517–524. https://doi.org/10.1145/3184558.3188722
[56] Andreas Vlachos and Sebastian Riedel. 2014. Fact Checking: Task definition and dataset construction. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science. Association for Computational Linguistics, Baltimore, MD, USA, 18–22. https://doi.org/10.3115/v1/W14-2508
[57] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151. https://doi.org/10.1126/science.aap9559
[58] Linlin Wang, Zhu Cao, Gerard de Melo, and Zhiyuan Liu. 2016. Relation Classification via Multi-Level Attention CNNs. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 1298–1307. https://doi.org/10.18653/v1/P16-1123
[59] Claire Wardle and Hossein Derakhshan. 2017. Information disorder: Toward an interdisciplinary framework for research and policy making. Technical Report. Council of Europe.
[60] Jen Weedon, William Nuland, and Alex Stamos. 2017. Information Operations and Facebook. Technical Report. Facebook, Inc.
[61] Gerhard Weikum and Martin Theobald. 2010. From Information to Knowledge: Harvesting Entities and Relationships from Web Sources. In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (Indianapolis, Indiana, USA) (PODS ’10). Association for Computing Machinery, New York, NY, USA, 65–76. https://doi.org/10.1145/1807085.1807097
[62] Shanchan Wu and Yifan He. 2019. Enriching pre-trained language model with entity information for relation classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, New York, NY, USA, 2361–2364.
[63] You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. 2014. Toward Computational Fact-Checking. Proc. VLDB Endow. 7, 7 (March 2014), 589–600. https://doi.org/10.14778/2732286.2732295
[64] Minguang Xiao and Cong Liu. 2016. Semantic Relation Classification via Hierarchical Recurrent Neural Network with Attention. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. The COLING 2016 Organizing Committee, Osaka, Japan, 1254–1263.
[65] Peng Xu and Denilson Barbosa. 2019. Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 3201–3206. https://doi.org/10.18653/v1/N19-1323
[66] Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, and Yoshiyasu Takefuji. 2016. Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning. Association for Computational Linguistics, Stroudsburg, PA, USA, 250–259.
[67] Ran Yu, Ujwal Gadiraju, Besnik Fetahu, Oliver Lehmberg, Dominique Ritze, and Stefan Dietze. 2018. KnowMore – Knowledge base augmentation with structured web markup. Semantic Web (2018), 159–180. https://doi.org/10.3233/SW-180304
[68] Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D. Manning. 2017. Position-aware Attention and Supervised Data Improve Slot Filling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Copenhagen, Denmark, 35–45. https://doi.org/10.18653/v1/D17-1004
[69] Zilong Zhao, Jichang Zhao, Yukie Sano, Orr Levy, Hideki Takayasu, Misako Takayasu, Daqing Li, Junjie Wu, and Shlomo Havlin. 2020. Fake news propagates differently from real news even at early stages of spreading. EPJ Data Sci. 9, 1 (2020), 7. https://doi.org/10.1140/epjds/s13688-020-00224-z
A     SELECTED CLAIMREVIEW CLAIMS
Table 7: Selected ClaimReview claims, the relation they contain, and the relation predicted by the model. Bold text indicates the entities participating in the relation. The AUC of the relation classification task is 0.958.

 ID   Claim                                                                                               Actual*       Predicted*    Rating   Claim ≡ Triple
 1    Malaysian-born Senator Penny Wong ineligible for Australian parliament                              POB           DOB           False
2    Donald Trump says President Obama’s grandmother in Kenya said he was born in Kenya and she          POB           institution   False           ✓
      was there and witnessed the birth.
 3    Donald Trump says his father, Fred Trump, was born in a very wonderful place in Germany.            POB           POB           False           ✓
 4    Barack Obama was born in the United States.                                                         POB           POB           True            ✓
 5    Barron Trump was born in March 2006 and Melania wasn’t a legal citizen until July 2006. So under    DOB           POB           False
      this executive order, his own son wouldn’t be an American citizen.
 6    Isabelle Duterte was born on January 26, 2002, which makes her only 15 years old today.             DOB           DOB           False
 7    Tej Pratap Yadav receives a doctorate degree from Takshsila University in Bihar                     education     education     False           ✓
8    Smriti Irani has a MA degree.                                                                       education     institution   False           ✓
 9    Melania Trump lied under oath in 2013 about graduating from college with a bachelor’s degree in     education     institution   False
      architecture.
 10   Did Michelle Obama recently earn a doctorate degree in law?                                         education     education     False           ✓
 11   Pravin Gordhan does not have a degree.                                                              education     education     False           ✓
 12   Alexandria Ocasio-Cortez’s economics degree recalled.                                               education     institution   False           ✓
 13   Ilocos Norte Governor Imee Marcos claimed on January 16 that she earned a degree from Princeton     education     education     False
      University.
 14   Ilocos Norte Governor Imee Marcos claimed on January 16 that she earned a degree from Princeton     institution   institution   False           ✓
      University.
 15   Tej Pratap Yadav receives a doctorate degree from Takshsila University in Bihar.                    institution   education     False
 16   Patrick Murphy embellished, according to reports, his University of Miami academic achievement.     institution   institution   True
 17   Mahmoud Abbas, Ali Khamenei, and Vladimir Putin met each other in the class of 1968 at Patrice      institution   institution   False
      Lumumba University in Moscow
 18   Mahmoud Abbas, Ali Khamenei, and Vladimir Putin met each other in the class of 1968 at Patrice      institution   institution   False
      Lumumba University in Moscow
 19   Mahmoud Abbas, Ali Khamenei, and Vladimir Putin met each other in the class of 1968 at Patrice      institution   institution   False
      Lumumba University in Moscow
 20   Maria Butina is a human rights activist, a student of the American University, and the most         institution   institution   False
      relevant is that she is a person who did not work (collaborate) with the Russian state bodies.
 21   Ilocos Norte Governor Imee Marcos graduated cum laude from the University of the Philippines        institution   institution   False
      (UP) College of Law.
 22   David Hogg graduated from Redondo Shores High School in 2015.                                       institution   institution   False           ✓
 23   Sadhvi Pragya Singh Thakur said Manohar Parrikar died of cancer because he allowed the con-         POD           POD           False
      sumption of beef in Goa.
 24   Fox star Tucker Carlson in critical condition (then died) after head on collision driving home in   POD           POD           False           ✓
      Washington D.C.
 25   Nasser Al Kharafi died in Kuwait.                                                                   POD           POD           False           ✓
 26   DCP Amit Sharma passed away in Delhi riots                                                          POD           institution   False           ✓
 27   It is being claimed that Jason Statham was murdered at his home in New York by assailants who       POD           POD           False
      broke into his mansion.
 28   Actor Robert Downey Jr. died in a car crash stunt in Hollywood on July 8.                           POD           POD           False
 * DOB = Date of Birth, POB = Place of Birth, POD = Place of Death
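Each row of Table 7 pairs a claim with an (s, p, o) relation; once a claim has been reduced to such a triple, fact-checking amounts to measuring how well a knowledge graph supports the triple. The sketch below is a toy illustration of that final step in the spirit of the knowledge-network approach [11], not the REMOD pipeline itself: the mini knowledge graph, entity names, and reciprocal-hop scoring function are all illustrative assumptions.

```python
# Toy illustration (NOT the REMOD pipeline): score an (s, p, o) triple by the
# graph proximity of subject and object in a small knowledge graph, in the
# spirit of knowledge-network fact-checking [11]. The mini-graph and entity
# names below are hypothetical examples, not real DBpedia data.
from collections import deque

# Hypothetical mini knowledge graph as an undirected adjacency list.
KG = {
    "Barack_Obama": {"Honolulu"},
    "Honolulu": {"Barack_Obama", "Hawaii"},
    "Hawaii": {"Honolulu", "United_States"},
    "United_States": {"Hawaii"},
}

def support(triple, graph, max_hops=3):
    """Naive support score for (s, p, o): the reciprocal of the shortest
    number of hops between s and o via breadth-first search, or 0.0 if
    either entity is unknown, unreachable, or farther than max_hops."""
    s, _predicate, o = triple
    if s not in graph or o not in graph:
        return 0.0
    seen, queue = {s}, deque([(s, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == o and hops > 0:
            return 1.0 / hops
        if hops < max_hops:
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
    return 0.0

# Claim 4 of Table 7 reduced to a triple: supported via a three-hop path.
print(support(("Barack_Obama", "birthPlace", "United_States"), KG))
```

A production system would of course operate on a DBpedia-scale graph and weight paths by predicate semantics and node degree rather than raw hop count, which is where the scalability concerns discussed above come into play.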