<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Commonsense Knowledge in Wikidata</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Filip Ilievski</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pedro Szekely</string-name>
          <email>pszekelyg@isi.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Schwabe</string-name>
          <email>dschwabe@inf.puc-rio.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Dept. of Informatics, Pontifícia Universidade Católica do Rio de Janeiro</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Information Sciences Institute, University of Southern California</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Wikidata and Wikipedia have been proven useful for reasoning in natural language applications, like question answering or entity linking. Yet, no existing work has studied the potential of Wikidata for commonsense reasoning. This paper investigates whether Wikidata contains commonsense knowledge which is complementary to existing commonsense sources. Starting from a definition of common sense, we devise three guiding principles, and apply them to generate a commonsense subgraph of Wikidata (Wikidata-CS). Within our approach, we map the relations of Wikidata to ConceptNet, which we also leverage to integrate Wikidata-CS into an existing consolidated commonsense graph. Our experiments reveal that: 1) although Wikidata-CS represents only a small portion of Wikidata, it indicates that Wikidata contains relevant commonsense knowledge, which can be mapped to 15 ConceptNet relations; 2) the overlap between Wikidata-CS and other commonsense sources is low, motivating the value of knowledge integration; 3) Wikidata-CS has been evolving over time at a slightly slower rate compared to the overall Wikidata, indicating a possible lack of focus on commonsense knowledge. Based on these findings, we propose three recommended actions to further improve the coverage and quality of Wikidata-CS.</p>
      </abstract>
      <kwd-group>
        <kwd>Commonsense Knowledge</kwd>
        <kwd>Wikidata</kwd>
        <kwd>Knowledge Graphs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Common sense is "the basic ability to perceive, understand, and judge things that are shared by nearly all people and can be reasonably expected of nearly all people without need for debate" [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. For instance, humans typically know that the political opposition is an opposite of the government, that hunger causes one to eat, and that if one walks in the rain one gets wet. Possessing such commonsense knowledge is important for both humans and machines in order to fill gaps in communication and to fulfill tasks such as entity recognition and linking from text, question answering, and planning. Yet, understanding common sense is difficult for machines. Even with the recent progress of language models such as BERT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and GPT-2 [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], which have been able to perform very well on a number of tasks with enough training,1 the correct answer is often given for wrong reasons [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The utterances produced are syntactically sound, but may lack plausibility. For instance, GPT-2 completes the prompt `if you break a bottle that contains liquids, some of the liquid will (other things being equal) probably...' with `...wind up 300 meters away' [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. (Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).)
      </p>
      <p>
        Commonsense graphs like ConceptNet [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] and ATOMIC [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] provide relevant knowledge that can be used to enhance the ability of language models to reason on downstream tasks. Unfortunately, these graphs are largely incomplete: for example, while ConceptNet contains the information that a barbecue can be located in a garage, it is unable to infer that barbecues are also common in other outdoor places, like parks, nor does it have information on the expectations from such an event.
      </p>
      <p>
        According to [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], common knowledge graphs (KGs) derived from Wikipedia,
such as Wikidata [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] or YAGO4 [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], provide knowledge which is `often required
to achieve a deep understanding of both the low- and high-level concepts found in
language'. In addition, Wikipedia has been used by a large number of systems for
downstream reasoning tasks [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. As the largest and highest-quality structured
counterpart of Wikipedia [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Wikidata is likely to contain useful commonsense
knowledge - yet, no existing work has studied its commonsense coverage.
      </p>
      <p>
        In this paper, we investigate whether Wikidata contains commonsense
knowledge and whether it is complementary to existing commonsense knowledge graphs. Its contributions are: 1. We formulate three key principles for distinguishing commonsense knowledge from the rest of Wikidata, starting from three key properties of commonsense knowledge and from a survey of existing commonsense KGs (Section 3.1). These principles dictate that commonsense knowledge concerns well-known concepts and general-domain relations. 2. Based on these principles, we design and implement computational steps to extract a commonsense subgraph from Wikidata, which we refer to as Wikidata-CS in the remainder of this paper (Section 3.2). Here, we also map relations in Wikidata to relations in ConceptNet. 3. We leverage this mapping to integrate Wikidata-CS into the Commonsense Knowledge Graph (CSKG) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which already contains well-known commonsense sources, such as ATOMIC and ConceptNet (Section 3.3). 4. We perform a quantitative and qualitative analysis of the resulting subgraph (Section 4.1). Moreover, we compute overlaps between Wikidata-CS and other resources included in CSKG, like ConceptNet and WordNet (Section 4.2). 5. We perform the same experiments with three different versions of Wikidata from 2017, 2018, and 2020, and compare the results (Section 4.3). This allows us to quantify the evolution of commonsense knowledge in Wikidata over time. 6. In Section 5, we reflect on the findings from our experiments and propose recommended actions for further inclusion of commonsense knowledge in Wikidata.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>We review: 1. well-known commonsense KGs; 2. prior work on reasoning with Wikidata or Wikipedia over text; and 3. studies of the completeness of Wikidata.</p>
      <sec id="sec-2-1">
        <title>1 For example: https://leaderboard.allenai.org/socialiqa/submissions/public</title>
        <p>
          Commonsense KGs such as ConceptNet and ATOMIC are popular and
have been utilized by downstream reasoners [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Lexical resources, like
WordNet [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and FrameNet [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], capture commonsense knowledge about concepts and
frames, respectively. Moreover, sources like Visual Genome [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], which were originally proposed for a different purpose (image captioning and visual recognition), have recently been recognized as sources of commonsense knowledge.
Commonsense knowledge can also be extracted from documents [
          <xref ref-type="bibr" rid="ref20 ref29">29, 20</xref>
          ], query
logs [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], or quantities [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. A recent resource, called the Commonsense Knowledge
Graph [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], consolidates many of these resources into a single KG. The complementarity of these sources motivates their integration, but also reveals that they are still largely incomplete. Wikidata, as one of the richest public KGs, holds promise to enrich the set of recorded commonsense facts even further.
        </p>
        <p>
          A recent idea is to use language models, like BERT [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and GPT-2 [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], as
knowledge bases, due to their inherent ability to produce a fact for any input
prompt. Still, they often exhibit shallow understanding of the world [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
Integration with KGs like Wikidata or ConceptNet may increase their robustness [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>
          Reasoning with Wikipedia and Wikidata Wikipedia and Wikidata
serve as sources of background knowledge in natural language processing tasks [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ],
e.g., as a repository of entities to link to, or as a source of contextual information
to help linking entities in text [
          <xref ref-type="bibr" rid="ref21 ref3 ref5">21, 3, 5</xref>
          ]. The work by Suh et al. [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] attempts to
extract commonsense knowledge from Wikipedia. As far as we are aware, there is
no comprehensive proposal to extract commonsense knowledge from Wikipedia
or Wikidata, or to study their strengths and weaknesses for this purpose.
        </p>
        <p>
          Studies of completeness of Wikidata Several papers study the
completeness of Wikidata [
          <xref ref-type="bibr" rid="ref16 ref2 ref4">4, 2, 16</xref>
          ]. Luggen et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] provide an approach to estimate
class completeness in knowledge graphs, and use Wikidata as a use case. They
note that some classes in Wikidata, like Painting, are more complete than
others, such as Mountain. In addition, they also quantify the evolution of Wikidata
over time. Similarly, we also study the completeness of Wikidata and its richness
over time, albeit focusing on its coverage of commonsense knowledge.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Extraction of Commonsense Knowledge from Wikidata</title>
      <sec id="sec-4-1">
        <title>Principles of commonsense knowledge</title>
        <p>
          Common sense is "the basic ability to perceive, understand, and judge things that are shared by nearly all people and can be reasonably expected of nearly all people without need for debate" [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. From this definition, we can infer that commonsense knowledge: 1) concerns conceptual rather than instance-based information; 2) is primarily about commonly known observations; and 3) targets general-domain information. We expand these three aspects into three guiding principles for our approach, which allow us to define a commonsense subset of the knowledge in a general KG such as Wikidata.
        </p>
        <p>P1 Concepts, not entities The primary principle of commonsense knowledge draws on the distinction between concept- and named-entity-level (instance) knowledge. Generally speaking, most concept-level knowledge is common sense, whereas most named-entity-level knowledge is not. The fact that houses have rooms is commonsense knowledge, as it is common and widely applicable; the fact that the Versailles Palace has 700 rooms is not, as it concerns a particular instance and cannot be expected to be known by most people. Thus, principle P1 is that commonsense knowledge has to be about concepts.
P2 Commonness The second principle (P2) of commonsense knowledge is its `commonness': it is knowledge about well-known concepts that is shared among most human beings. The fact that a container (Q987767) is used for storage (Q9158768) is a common fact, whereas the fact that noma (Q994794) is a subclass of aphthous stomatitis (Q189956) is fairly unknown.
P3 General-domain knowledge The third principle is that commonsense knowledge is about general-domain information rather than expert knowledge about a specific domain like chemistry or biology. Notably, even within a knowledge type, some relations describe general information, whereas others require expert knowledge. For instance, considering meronymy relations, we observe that part of describes well-known facts (e.g., wheel is part of a car), whereas cell component focuses on biological knowledge (e.g., cholesterol has component cell membrane). As a third principle (P3), we aim to distinguish between domain-specific and general-domain knowledge.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Approach</title>
        <p>Next, we apply P1-P3 to select commonsense knowledge from Wikidata.</p>
        <p>1. Excluding named entities (P1) In practice, Wikidata does not make a clear distinction between concepts and named instances through its structured knowledge. The relation instance of (P31) would intuitively be useful for this; yet, it often expresses an is-a relation between concepts, similar to subclass of (P279). For instance, Wikidata states that surgeon is an instance of medical profession, and a subclass of medical specialist. Leveraging the rdf:type relation from another public ontology, such as DBpedia, is a possible direction, yet this strategy would be limited to the set of nodes that are mapped between Wikidata and DBpedia. Hence, we follow a different route. The convention of Wikidata stipulates that the labels of named entities should be capitalized, whereas the ones for concepts should not.2 Following this rule, we employ a simple heuristic of selecting edges where both nodes have alphanumeric labels starting with a lowercase letter. We expand this rule and filter out labels that contain any capital letter, to remove entities with labels like "graf Nikolai Aleksyeevich Sheremetev". This procedure implicitly excludes nodes without English labels.</p>
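<p>The label heuristic above can be sketched as follows; this is a minimal illustration of the rule, not the authors' actual code, and the helper name is ours:</p>
<p>
```python
# Sketch of the P1 concept filter (the helper name is ours, not the authors'):
# keep a node only when its English label starts with an alphanumeric
# character and contains no capital letter, per the Wikidata convention
# that named-entity labels are capitalized and concept labels are not.

def is_concept_label(label: str) -> bool:
    """True when the label plausibly names a concept rather than a named entity."""
    if not label:                 # nodes without an English label are excluded
        return False
    if not label[0].isalnum():    # require an alphanumeric leading character
        return False
    return not any(ch.isupper() for ch in label)  # reject any capital letter

print(is_concept_label("woodwind instrument"))
print(is_concept_label("graf Nikolai Aleksyeevich Sheremetev"))
```
</p>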
      </sec>
      <sec id="sec-4-3">
        <title>2. Characterizing commonness (P2)</title>
        <p>We argued that commonsense facts concern common concepts. Wikidata-based metrics of frequency or popularity, such as PageRank, cannot be used to estimate commonness, as they inherit the bias towards topics that are heavily represented in Wikidata (e.g., entertainment or science). Instead, we approximate commonness by frequencies of word and</p>
        <sec id="sec-4-3-1">
          <title>2 https://www.wikidata.org/wiki/Help:Label</title>
          <p>
            phrase usage that have been pre-computed over an independent corpus [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ].3 Here, we assume that frequently occurring words and phrases refer to well-known concepts. According to this tool, the frequency of a common word, like storage, is much higher than that of a relatively unknown word, such as noma (3.39e-05 compared to 3.24e-07). We select edges where both the subject and the object labels have a usage frequency above an empirically determined threshold of 1e-06.
          </p>
          <p>Table 1. The 20 most frequent edge types after the first two steps, with edge counts and an example edge each:
subclass of (P279): 172,535 (saxophone - woodwind instrument)
instance of (P31): 141,499 (happiness - positive emotion)
part of (P361): 9,118 (shower - bathroom)
different from (P1889): 7,767 (vein - artery)
has part (P527): 6,252 (senses - touch)
cell component (P681): 5,607 (cholesterol - cell membrane)
property constraint (P2302): 5,180 (votes received - integer constraint)
facet of (P1269): 4,792 (wind - weather)
strand orientation (P2548): 4,345 (sac-1 - forward strand)
use (P366): 3,045 (crystal ball - psychic reading)
opposite of (P461): 3,028 (political opposition - government)
properties for this type (P1963): 2,382 (human - date of birth)
molecular function (P680): 2,369 (protein kinase - kinase activity)
see also (P1659): 2,344 (position held - member of)
sport (P641): 2,338 (head stand - gymnastics)
followed by (P156): 2,244 (middle school - secondary school)
follows (P155): 2,234 (queen - jack)
material used (P186): 2,047 (ice cream cone - wafer)
is a list of (P360): 1,914 (list of major opera composers - human)
Wikidata property (P1687): 1,746 (president - head of government)</p>
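<p>The threshold filter of step 2 can be sketched as below. In the paper the frequencies come from the wordfreq package; here a small stand-in table plays that role (the value for container is illustrative, while those for storage and noma are the ones quoted in the text):</p>
<p>
```python
# Sketch of the P2 commonness filter. The paper uses the wordfreq package's
# word_frequency(label, "en"); here a small stand-in table plays that role.
# The value for "container" is illustrative; "storage" and "noma" use the
# frequencies quoted in the text.
FREQ = {
    "container": 1.1e-05,
    "storage": 3.39e-05,
    "noma": 3.24e-07,
}
THRESHOLD = 1e-06  # empirically determined threshold from the text

def is_common(label: str) -> bool:
    """A label counts as common when its usage frequency exceeds the threshold."""
    return FREQ.get(label, 0.0) > THRESHOLD

edges = [
    ("container", "use", "storage"),
    ("noma", "subclass of", "aphthous stomatitis"),
]
# Keep an edge only when both the subject and the object label are common.
kept = [e for e in edges if is_common(e[0]) and is_common(e[2])]
print(kept)
```
</p>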
          <p>3. Excluding domain knowledge (P3) The initial two steps yield 420,822 edges, involving 414 edge types and describing 194,595 nodes. Table 1 presents the number of occurrences for the 20 most frequent edge types, together with a representative example edge for each type. By analyzing the frequency distribution of the remaining relations, we observe that the frequency decays quickly: the 50th most common relation describes fewer than 500 edges, and the frequency plot becomes relatively flat (Figure 1). Hence, we focus on the 50 most frequent relations and consolidate the remaining knowledge by manually mapping them to relations in ConceptNet v5.7.4 These account for 409,775 edges, which is 97.4% of the total set of edges available at this point.5</p>
          <p>The main guideline for this mapping was to exclude properties which are meant to describe domain-specific information, such as strand orientation (P2548).</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>3 https://pypi.org/project/wordfreq/</title>
        </sec>
        <sec id="sec-4-3-3">
          <title>4 https://github.com/commonsense/conceptnet5/wiki/Downloads</title>
        </sec>
        <sec id="sec-4-3-4">
          <title>5 In the future, we intend to consolidate the remaining statements of Wikidata by mapping them to ConceptNet relations as well.</title>
          <p>The mapping was performed independently by two authors of this paper. In all cases, the annotators agreed on whether the relation describes general-domain knowledge. In 9 cases, the annotators disagreed on which ConceptNet relation is the most appropriate to map to. Typically, this meant that ConceptNet lacks a relation with the same specificity, forcing the annotators to opt for a more generic relation, such as /r/HasContext. The disagreements were resolved through a joint discussion and examination of exemplar edges in Wikidata.</p>
          <p>The resulting mappings are shown in Table 2. 44 out of the top 50 relations were mapped to existing relations in ConceptNet, yielding 388,250 edges. The remaining six relations are either biology-specific: cell component (P681), strand orientation (P2548), molecular function (P680), biological process (P682); physics-specific: decays to (P816); or ontological: property constraint (P2302). The mapping shows that some ConceptNet properties (e.g., /r/Antonym) have a single counterpart in Wikidata (opposite of), while others (e.g., /r/HasContext) map to several properties, often with more specific meanings (e.g., genre, sport). This might reveal an opportunity to enrich the specificity of relations in ConceptNet with more detailed ones, as in Wikidata.6 Some relations in ConceptNet (e.g., /r/MotivatedByGoal) have no counterpart in Wikidata, and others map to a relation which is very sparse for common concepts. For instance, /r/AtLocation maps to location, which is well-populated for named entities in Wikidata, but only ranks 72nd with 159 occurrences in our commonsense subset. These observations reveal a knowledge gap in Wikidata. In several cases, the relation in Wikidata is the inverse of that in ConceptNet, e.g., has part to /r/PartOf and has cause to /r/Causes. We analyze the overlap between ConceptNet and Wikidata further in Section 4.2.</p>
        </sec>
        <sec id="sec-4-3-5">
          <title>6</title>
          <p>
            For some ConceptNet relations, like /r/PartOf and /r/HasProperty, a similar proposal to add detail comes from WebChild [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ].
          </p>
          <p>Table 2. Mapping of Wikidata properties to ConceptNet relations, by category (an asterisk marks a Wikidata property whose direction is the inverse of the ConceptNet relation):
distinctness: /r/DistinctFrom - different from (P1889)
antonymy: /r/Antonym - opposite of (P461)
synonymy: /r/Synonym - said to be the same as (P460)
similarity: /r/SimilarTo - partially coincident with (P1382)
derivation: /r/DerivedFrom - named after (P138), fictional analog of (P1074)
inheritance: /r/IsA - instance of (P31), subclass of (P279), subproperty of (P1647)
meronymy: /r/PartOf - part of (P361), *has part (P527), *has parts of the class (P2670)
material: /r/MadeOf - material used (P186), is a list of (P360), *has list (P2354)
attribution: /r/CreatedBy - *product or material produced (P1056)
utility: /r/UsedFor - use (P366), *uses (P2283), used by (P1535)
properties: /r/HasProperty - color (P462), has quality (P1552), properties for this type (P1963), Wikidata property (P1687), sex or gender (P21)
causation: /r/Causes - *has cause (P828), has effect (P1542), symptoms (P780)
ordering: /r/HasPrerequisite - *followed by (P156), follows (P155)
context: /r/HasContext - facet of (P1269), field of this occupation (P425), health specialty (P1995), main subject (P921), competition class (P2094), genre (P136), studied by (P2579), field of work (P101), afflicts (P689), *practiced by (P3095), depicts (P180), sport (P641)
other: /r/RelatedTo - see also (P1659), subject item of this property (P1629)</p>
          <p>Finally, assuming that domain-specific relations involve domain-specific nodes, we construct a set of `blacklist' nodes found in these relations. We ensure that the remaining edges do not contain these domain-specific nodes. This allows us to filter out nodes like protein (Q8054), which has over 172 thousand incoming edges, typically from child proteins.</p>
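<p>The blacklist construction can be sketched as follows, on toy edges; the property IDs are the domain-specific relations named above:</p>
<p>
```python
# Sketch of the blacklist filter: nodes appearing in the excluded
# domain-specific relations are blacklisted, and every remaining edge that
# touches a blacklisted node is dropped. Toy edges; the property IDs are the
# domain-specific relations named in the text.
DOMAIN_RELATIONS = {"P681", "P2548", "P680", "P682", "P816", "P2302"}

edges = [
    ("cholesterol", "P681", "cell membrane"),  # cell component: domain-specific
    ("wheel", "P361", "car"),                  # part of: general-domain
    ("cell membrane", "P361", "cell"),         # touches a blacklisted node
]

# Collect every node that occurs in a domain-specific edge.
blacklist = {n for s, r, o in edges if r in DOMAIN_RELATIONS for n in (s, o)}
# Keep only general-domain edges that avoid blacklisted nodes.
kept = [(s, r, o) for s, r, o in edges
        if r not in DOMAIN_RELATIONS and s not in blacklist and o not in blacklist]
print(kept)
```
</p>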
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>Integration in the Commonsense Knowledge Graph</title>
        <p>
          The Commonsense Knowledge Graph (CSKG) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] is an existing resource that
consolidates information from seven commonsense sources, including
ConceptNet, Roget [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], Visual Genome [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], WordNet [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], and Wikidata. It is
represented using the KGTK [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] format with 10 columns, including the core elements of an edge (id, node1, relation, and node2), their labels (e.g., node1;label), and provenance information about an edge (source and sentence). Regarding Wikidata, CSKG includes all the edges involving the inheritance (P279) relation.
        </p>
        <p>We integrate the commonsense subset of Wikidata presented in this paper into CSKG. For this purpose, we adapt its columns to match those specified by CSKG. The columns for which we lack information, such as sentence, are left empty. We map the 50 most frequent relations to ConceptNet relations following Table 2, and discard the small number of remaining statements.</p>
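<p>A sketch of this adaptation for a single edge, using the CSKG columns named above; the id scheme, the node2;label column, and the "WD" source marker are our assumptions, not the actual CSKG conventions:</p>
<p>
```python
# Sketch of adapting one Wikidata-CS edge to the CSKG tab-separated format.
# The column names follow those mentioned in the text (id, node1, relation,
# node2, node1;label, source, sentence); the id scheme, the node2;label
# column, and the "WD" source marker are our assumptions.
import csv
import io

COLUMNS = ["id", "node1", "relation", "node2",
           "node1;label", "node2;label", "source", "sentence"]

def to_cskg_row(subj, subj_label, cn_relation, obj, obj_label):
    return {
        "id": f"{subj}-{cn_relation}-{obj}",  # hypothetical edge id scheme
        "node1": subj,
        "relation": cn_relation,              # Wikidata property mapped per Table 2
        "node2": obj,
        "node1;label": subj_label,
        "node2;label": obj_label,
        "source": "WD",                       # provenance marker (assumed)
        "sentence": "",                       # unknown for Wikidata, left empty
    }

# container (Q987767) use (P366) storage (Q9158768) -> /r/UsedFor
row = to_cskg_row("Q987767", "container", "/r/UsedFor", "Q9158768", "storage")
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t")
writer.writeheader()
writer.writerow(row)
print(out.getvalue())
```
</p>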
      </sec>
      <sec id="sec-4-5">
        <title>Implementation</title>
        <p>
          We implement the proposed selection of commonsense knowledge from Wikidata
by using the Knowledge Graph ToolKit (KGTK) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. KGTK allows us to carry out the proposed approach in a direct and simple way, despite the challenging size and complexity of Wikidata. The full experiment reported in this paper is coded as three Jupyter Notebooks which run on a laptop in under an hour.7 The starting point is the entire Wikidata split into three files in KGTK tabular format (an edge file, a node file, and a qualifiers file), as pre-computed with the import-wikidata command.
        </p>
        <p>The concrete steps are as follows. We use a customized Python function to create a subset of the node file that contains only concept nodes, by removing nodes whose labels are either empty or contain a capital letter. We use the ifexists join operator to filter out edges from the edge file that do not connect two concepts. The command remove-columns trims all columns which are not necessary for the experiment. After this, we run compact to remove duplicate edges. At this point, we have a subset of edges that are about concepts (P1). To prepare for the usage filtering and help human readability, we expand the set of columns with the lift command to include the labels of the subject, the object, and the relation. We use the aforementioned threshold-based filter to select edges for which both the subject and the object are common concepts. Next, we inspect the remaining edges in terms of their relations. We apply the manual mapping of the top 50 relations (Section 3.2) to consolidate the remaining Wikidata graph and make its edge types compatible with the format of CSKG.</p>
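<p>The steps above can be summarized programmatically. The operation names are the KGTK commands mentioned in the text; the exact command-line flags are deliberately omitted and should be taken from the KGTK documentation:</p>
<p>
```python
# The extraction pipeline as an ordered list of steps. The KGTK operation
# names (ifexists, remove-columns, compact, lift, graph-statistics) are the
# ones mentioned in the text; exact command-line flags are omitted here and
# should be taken from the KGTK documentation.
pipeline = [
    ("filter-nodes", "custom Python: drop nodes with empty or capitalized labels (P1)"),
    ("ifexists", "keep only edges whose endpoints survive the concept filter"),
    ("remove-columns", "trim columns not needed for the experiment"),
    ("compact", "remove duplicate edges"),
    ("lift", "attach subject, object, and relation labels"),
    ("frequency-filter", "keep edges whose labels both exceed the 1e-06 threshold (P2)"),
    ("map-relations", "apply the manual top-50 mapping to ConceptNet relations (P3)"),
    ("graph-statistics", "compute metrics over the resulting Wikidata-CS subset"),
]
for name, description in pipeline:
    print(f"{name:18s} {description}")
```
</p>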
        <p>These steps produce the subset of Wikidata (Wikidata-CS) which satisfies our principles (P1-P3), in the CSKG format. Finally, we use graph-statistics to compute metrics over this subset. Wikidata-CS is available for download.8</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Analysis</title>
      <sec id="sec-5-1">
        <title>General Statistics</title>
        <p>
          Wikidata-CS consists of 71,243 nodes and 106,103 edges. It uses 44 edge types to
describe these edges. The mean node degree is 2.98, which is higher than in the
subclass of subset of Wikidata (2.45) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The nodes with the highest PageRank in the resulting graph are: artificial entity (Q16686448), kinship (Q171318), and class (Q16889133), which are more customary compared to the top nodes in the unfiltered subclass-of data, all of which describe biochemical concepts [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <sec id="sec-5-1-1">
          <title>7 https://github.com/usc-isi-i2/cskg/tree/master/wikidata 8 https://doi.org/10.5281/zenodo.3983029</title>
          <p>The five most frequent relations in Wikidata-CS are: subclass of (P279), instance of (P31), different from (P1889), part of (P361), and has part (P527). The first two account for 68.8% of all edges, indicating that the commonsense knowledge in Wikidata mostly concerns taxonomic information. After mapping the relations to ConceptNet, all commonsense knowledge corresponds to 15 edge types. We perform de-duplication to consolidate edges that were expressed with relations of the same group (e.g., subclass of and instance of), or in two directions with inverse properties (e.g., has cause and has effect). The distribution of knowledge across these types is shown in Table 4 (last column). The final set has 101,771 edges, which is below 0.01% of the full Wikidata. Next, we compare the content and size of Wikidata-CS to those of other commonsense KGs.</p>
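<p>The de-duplication step can be sketched as follows. The property pairs follow the examples above; the choice of canonical direction is ours:</p>
<p>
```python
# Sketch of the de-duplication step: edges expressed with inverse properties
# (has part vs. part of, has effect vs. has cause) or with relations of the
# same group (instance of vs. subclass of) are normalized to one canonical
# form before counting. The canonical directions chosen here are our own.
INVERSES = {"P527": "P361", "P1542": "P828"}  # has part -> part of, has effect -> has cause
GROUPS = {"P31": "P279"}                      # instance of counted with subclass of

def canonical(subj, rel, obj):
    """Return the canonical (subject, relation, object) form of an edge."""
    if rel in INVERSES:
        return (obj, INVERSES[rel], subj)     # flip inverse edges
    return (subj, GROUPS.get(rel, rel), obj)

edges = [
    ("car", "P527", "wheel"),                 # car has part wheel
    ("wheel", "P361", "car"),                 # wheel part of car (same fact)
    ("happiness", "P31", "positive emotion"),
]
deduped = {canonical(*e) for e in edges}
print(sorted(deduped))
```
</p>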
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>Comparison to other graphs in CSKG</title>
        <p>The integration of Wikidata-CS into CSKG allows one to easily compare its
content to other sources, such as ConceptNet. How does the size of the
commonsense subset in Wikidata compare to that of the other sources? How much of the
commonsense knowledge in Wikidata is already present in these other sources?
How much is missing? Conversely: how many edges are defined in ConceptNet or WordNet, but are lacking in Wikidata? We provide insight into these questions.</p>
        <p>Table 3 compares the size of Wikidata-CS with the other subgraphs within CSKG. Despite the fact that Wikidata is by far the largest graph, its commonsense subset ranks 6th in terms of edges and 5th in terms of nodes, being larger only than FrameNet and over 30 times smaller than ConceptNet.9 We also
inspect the overlap between the knowledge in Wikidata-CS and in other CSKG
sources that share the same relations. Since only the relations are mapped
between these sources, whereas the nodes are not, we assume equivalence of two
edges with identical subject labels, object labels, and edge types. The results are
given in Table 5. We observe that Wikidata-CS shares 2,386 edges with
ConceptNet, 1,613 with WordNet, and only 299 with Roget. Above all, this investigation
shows extremely little overlap between Wikidata-CS and the other three graphs.
The observation that commonsense knowledge in Wikidata is almost entirely
missing in the other KGs, and vice versa, validates the main pursuit of this
paper, and motivates the consolidation of these sources into a single graph.</p>
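<p>The overlap estimate described above reduces to a set intersection over (subject label, edge type, object label) triples; the sample edges below are illustrative, drawn from examples in the text:</p>
<p>
```python
# Sketch of the lexical overlap estimate: two edges from different sources
# count as equivalent when subject label, edge type, and object label all
# match. The sample edges are illustrative, drawn from examples in the text.
wikidata_cs = {
    ("shower", "/r/PartOf", "bathroom"),
    ("vein", "/r/DistinctFrom", "artery"),
}
conceptnet = {
    ("shower", "/r/PartOf", "bathroom"),
    ("barbecue", "/r/AtLocation", "garage"),
}

overlap = wikidata_cs & conceptnet  # set intersection over label triples
print(len(overlap), "shared edge(s):", sorted(overlap))
```
</p>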
        <sec id="sec-5-2-1">
          <title>9 To be fair, the edge count of the other graphs may include edges with named entities</title>
          <p>(e.g., through the /r/IsA relation), which were excluded in Wikidata-CS.</p>
          <p>[Table 4: the 15 ConceptNet edge types in Wikidata-CS, with edge and node counts for Wikidata-CS and the full Wikidata across the 2017-12-27, 2018-12-10, and 2020-05-04 versions; the numeric values are not recoverable from this rendering.]</p>
          <p>We note that, with this lexical overlap approach, an edge might be counted multiple times if its nodes have multiple labels. This is why WordNet has over 500k edges in total in Table 5, while having a little over 100k in the original data. Future work should investigate more semantic overlap-estimation methods.</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>Evolution of the Wikidata commonsense knowledge over time</title>
        <p>The size of Wikidata has been growing at a tremendous rate. In only 30 months,
its number of edges nearly tripled and the number of nodes doubled (Table 4). A
natural question arises: has the size of its commonsense subset been growing at a
similar rate? To investigate this question, we consider three versions of Wikidata,
with dates: 2017-12-27, 2018-12-10, and 2020-05-04. For fair comparison, we
apply our approach (Section 3.2) on the three Wikidata dumps.</p>
        <p>For each of the Wikidata dumps, we present the number of edges per relation
in Table 4. Firstly, while the number of edges in Wikidata-CS has multiplied
for nearly all relations (except RelatedTo), its growth is slightly slower than
that of the full Wikidata: 244% vs 273% between December 2017 and May 2020. A
similar trend holds for the December 2018 version. Hence, despite the apparent
interest in enriching the commonsense knowledge subset of Wikidata, this has not
been a priority so far. Secondly, we see larger growth of the relations SimilarTo,
HasPrerequisite, and DistinctFrom relative to the others. This shows that certain
commonsense aspects (like differentiating potentially confusing concepts) may be
more relevant to the Wikidata community and its applications than others.</p>
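        <p>The growth comparison amounts to simple arithmetic; the sketch below uses illustrative placeholder counts (not the actual Table 4 numbers) and reads "273%" as the new size relative to the old, consistent with the edge count nearly tripling:</p>
        <preformat>
```python
def relative_size_pct(old_count, new_count):
    """New dump size as a percentage of the old dump size."""
    return round(100.0 * new_count / old_count)

# Illustrative counts only: a graph growing from 1,000 to 2,730 edges
# sits at 273% of its old size (nearly tripled), while 2,440 gives the
# slightly slower 244% reported for Wikidata-CS.
print(relative_size_pct(1000, 2730))  # 273
print(relative_size_pct(1000, 2440))  # 244
```
        </preformat>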
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Discussion</title>
      <p>The commonsense knowledge in Wikidata could benefit applications like question
answering or entity linking. For instance, let us consider the following true/false
question from the CycIC dataset:10 Suppose something is under the table. It is
either a toaster or a correction tape dispenser. You can tell that it isn't a kitchen
tool. True or False: The thing under the table is a correction tape dispenser. The
key implicit knowledge in this example is the fact that a toaster is often found
in the kitchen, while the dispenser is not. Luckily, over 100k such commonsense
facts are part of our Wikidata-CS collection, and could help a downstream system
reason over such questions.</p>
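      <p>To illustrate, with a hypothetical simplified fact store (the edge and predicate name below are invented for this sketch), a downstream system could rule out the toaster using a single located-in commonsense fact:</p>
      <preformat>
```python
# Hypothetical commonsense store; the edge below is illustrative.
facts = {("toaster", "/r/AtLocation", "kitchen")}

def found_in_kitchen(concept):
    return (concept, "/r/AtLocation", "kitchen") in facts

candidates = ["toaster", "correction tape dispenser"]
# "You can tell that it isn't a kitchen tool" eliminates the toaster.
remaining = [c for c in candidates if not found_in_kitchen(c)]

print(remaining)  # ['correction tape dispenser'] -> the statement is True
```
      </preformat>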
      <p>Still, we noted that only a negligible portion of Wikidata directly describes
commonsense knowledge today. Given the considerable community involved in
Wikidata and Wikipedia, and the commonsense relations identified in this paper,
we propose for the commonsense knowledge in Wikidata to be substantially
enriched in the near future. We discuss three actions towards this goal:</p>
      <sec id="sec-6-1">
        <title>1. Integration of ready commonsense sources into Wikidata</title>
        <p>A number of commonsense sources, like ConceptNet and ATOMIC, contain much
complementary knowledge that could be included in Wikidata (cf. Table 3). Our
prior work on consolidating their formats and modeling principles into CSKG
enables their seamless integration into Wikidata, when so desired. At present,
CSKG contains 5.89 million edges, expressed through 58 relations. The
mappings in Table 2 could be used as a starting point, whereas missing relations
might need to be added to Wikidata. Data licensing may be a roadblock here.</p>
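        <p>A minimal sketch of such an integration step, assuming a property-to-relation mapping in the spirit of Table 2 (the three pairs below are illustrative, not the paper's actual mapping table):</p>
        <preformat>
```python
# Illustrative Wikidata-property to ConceptNet-relation mapping.
WD_TO_CN = {
    "P279": "/r/IsA",      # subclass of
    "P361": "/r/PartOf",   # part of
    "P461": "/r/Antonym",  # opposite of
}

def map_edge(subj, prop, obj):
    """Translate one Wikidata statement into a ConceptNet-style edge."""
    rel = WD_TO_CN.get(prop)
    if rel is None:
        return None  # relation not yet mapped; may need adding to Wikidata
    return (subj, rel, obj)

print(map_edge("Q11004", "P279", "Q7239"))  # ('Q11004', '/r/IsA', 'Q7239')
```
        </preformat>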
        <p>2. Generalizing over instance-level knowledge Much of the
commonsense knowledge in Wikidata is indirectly expressed through its instance-level
knowledge. While Barack Obama being born in Hawaii is not a commonsense
fact, the fact that humans have a birthplace is. Furthermore, all humans have
a single birthplace, i.e., it is a functional property. One could think of other
generalizations as well: e.g., if many locations belong to countries, it is
commonsensical to assume that any location belongs to a country. Such commonsense
information is not directly represented in Wikidata, yet it could be inferred by
statistical generalization over instance-level knowledge.</p>
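        <p>One way to sketch this statistical generalization, over invented toy triples and with a hypothetical 90% coverage threshold (both are assumptions for illustration):</p>
        <preformat>
```python
from collections import Counter

# Toy instance-level triples; names and threshold are illustrative.
instances = ["obama", "merkel", "curie"]
triples = [
    ("obama", "place_of_birth", "honolulu"),
    ("merkel", "place_of_birth", "hamburg"),
    ("curie", "place_of_birth", "warsaw"),
]

counts = Counter(s for s, p, o in triples if p == "place_of_birth")
coverage = sum(1 for i in instances if counts[i] >= 1) / len(instances)
functional = all(counts[i] == 1 for i in instances)

# If nearly all instances have exactly one birthplace, lift the pattern
# to a class-level commonsense statement.
if coverage >= 0.9 and functional:
    print("humans have a birthplace (functional property)")
```
        </preformat>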
        <p>
          3. Missing knowledge types The Wikidata model defines the notion of
qualifiers, which would be ideal to represent much commonsense knowledge.
However, in many cases, Wikidata does not only lack a commonsense fact, but
also the relation or the qualifier that would express it. For instance, while many
qualifiers (e.g., minimum and maximum value) express quantities, no qualifier
describes a typical/expected quantity.11 This could express that spiders typically
have eight legs, while chairs have four. Qualifiers for expressing a purpose or a
goal (e.g., one participates in a competition in order to win) are also missing.
Besides qualifiers, it might be useful to include relations that are currently missing,
like typical properties of concepts (e.g., elephants are heavy), or their symbolism
(e.g., red is a symbol of danger). The actual information could be extracted from
unstructured sources, like Wikipedia, or reused from previous extractions [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
10 https://leaderboard.allenai.org/cycic/submissions/get-started
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>Wikidata has been growing tremendously in terms of both size and popularity.
Consequently, it has been attracting interest from applications that require
background knowledge in order to fill in gaps, such as question answering and entity
linking. In this paper, we studied the commonsense knowledge coverage of
Wikidata and its complementarity to existing commonsense graphs. Starting from
three key principles of common sense, we devised a three-step filtering approach
that distinguishes concepts from named entities, and favors common concepts and
general-domain knowledge types. Here, we also created mappings between the
relations in the commonsense subset of Wikidata (Wikidata-CS) and those in
ConceptNet, which allowed us to integrate Wikidata-CS into CSKG, an
existing consolidated graph of commonsense knowledge. We analyzed the content of
Wikidata-CS and compared it to other existing sources, like ConceptNet and
WordNet, noting that while Wikidata contains useful and novel commonsense
knowledge that complements other sources, its coverage of commonsense
knowledge is currently largely incomplete. We propose three directions to improve
this in the future: by inclusion of the knowledge from the Commonsense
Knowledge Graph, by generalizing over existing instance-level knowledge in Wikidata,
and by inclusion of missing knowledge types that are relevant for representing
commonsense knowledge. In addition, subsequent research should evaluate the
quality of Wikidata-CS and its relevance for commonsense reasoning, based on
user studies and downstream tasks.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>We would like to thank Daniel Garijo for the very helpful comments and
suggestions on drafts of this paper. We would also like to thank Craig Milo Rogers for
his help with KGTK. We are grateful for the reviewer comments. This material
is based upon work sponsored by the DARPA MCS program under Contract
No. N660011924033 with the United States Office of Naval Research and by the Air
Force Research Laboratory under agreement number FA8750-20-2-10002.
11 https://www.wikidata.org/wiki/Wikidata:List_of_properties/Wikidata_qualifier</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>C.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fillmore</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          :
          <article-title>The Berkeley FrameNet project</article-title>
          .
          <source>In: 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics</source>
          , Volume
          <volume>1</volume>
          . pp.
          <volume>86</volume>
          -
          <issue>90</issue>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Balaraman</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Razniewski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nutt</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Recoin: Relative completeness in Wikidata</article-title>
          .
          <source>In: Companion Proceedings of the The Web Conference</source>
          <year>2018</year>
          . pp.
          <volume>1787</volume>
          -
          <issue>1792</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cetoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bragaglia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Harney</surname>
            ,
            <given-names>A.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sloan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akbari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A neural approach to entity linking on Wikidata</article-title>
          .
          <source>In: European Conference on Information Retrieval</source>
          . pp.
          <volume>78</volume>
          -
          <fpage>86</fpage>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Darari</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prasojo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Razniewski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nutt</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>COOL-WD: A completeness tool for Wikidata</article-title>
          .
          <source>CEUR-WS. org</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Delpeuch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>OpenTapioca: Lightweight entity linking for Wikidata</article-title>
          . arXiv preprint arXiv:1904.09131 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Elazar</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahabal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramachandran</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bedrax-Weiss</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>How large are lions? Inducing distributions over quantitative attributes</article-title>
          . arXiv preprint arXiv:1906.01327 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ettinger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>8</volume>
          ,
          <issue>34</issue>
          -
          <fpage>48</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Farber,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Ell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Menne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Rettinger</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>A comparative survey of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO</article-title>
          .
          <source>Semantic Web Journal</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ), 1-
          <issue>5</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gunning</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Machine common sense concept paper</article-title>
          . arXiv preprint arXiv:1810.07528 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponzetto</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          :
          <article-title>Collaboratively built semi-structured content and artificial intelligence: The story so far</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>194</volume>
          ,
          <issue>2</issue>
          -
          <fpage>27</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ilievski</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garijo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chalupsky</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Divvala</surname>
            ,
            <given-names>N.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwabe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
          <article-title>KGTK: A toolkit for large knowledge graph manipulation and analysis</article-title>
          .
          <source>arXiv preprint arXiv:2006.00088</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ilievski</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qasemi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Consolidating commonsense knowledge</article-title>
          . arXiv preprint arXiv:2006.06114 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kipfer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Roget's 21st century thesaurus in dictionary form</article-title>
          <source>(ed. 3)</source>
          . New York: The Philip Lief Group (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Krishna</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hata</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kravitz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalantidis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shamma</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          , et al.:
          <article-title>Visual genome: Connecting language and vision using crowdsourced dense image annotations</article-title>
          .
          <source>International Journal of Computer Vision</source>
          <volume>123</volume>
          (
          <issue>1</issue>
          ),
          <volume>32</volume>
          -
          <fpage>73</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Luggen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Difallah</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarasua</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demartini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cudre-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Nonparametric class completeness estimators for collaborative knowledge graphs: the case of Wikidata</article-title>
          . In: International Semantic Web Conference. pp.
          <volume>453</volume>
          -
          <fpage>469</fpage>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Francis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nyberg</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oltramari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Towards generalizable neuro-symbolic systems for commonsense question answering</article-title>
          . arXiv preprint arXiv:1910.14087 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Marcus</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>The next decade in AI: Four steps towards robust artificial intelligence</article-title>
          . arXiv preprint arXiv:2002.06177 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          :
          <article-title>WordNet: A lexical database for English</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>38</volume>
          (
          <issue>11</issue>
          ),
          <volume>39</volume>
          -
          <fpage>41</fpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tandon</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Domain-targeted, high precision knowledge extraction</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          ,
          <issue>233</issue>
          -
          <fpage>246</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Mulang</surname>
            ,
            <given-names>I.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vyas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shekarpour</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sakor</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Context-aware entity linking with attentive neural networks on Wikidata knowledge graph</article-title>
          . arXiv preprint arXiv:1912.06214 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Child</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amodei</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Language models are unsupervised multitask learners</article-title>
          .
          <source>OpenAI Blog</source>
          <volume>1</volume>
          (
          <issue>8</issue>
          ),
          <volume>9</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Razniewski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pal</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sakhadeo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Commonsense properties from query logs and question answering forums</article-title>
          .
          <source>In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management</source>
          . pp.
          <volume>1411</volume>
          -
          <issue>1420</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Sap</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Bras</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allaway</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhagavatula</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lourie</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rashkin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roof</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>ATOMIC: An atlas of machine commonsense for if-then reasoning</article-title>
          .
          <source>In: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          . vol.
          <volume>33</volume>
          , pp.
          <volume>3027</volume>
          -
          <issue>3035</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Speer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Havasi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>ConceptNet 5.5: An open multilingual graph of general knowledge</article-title>
          .
          <source>In: Thirty-First AAAI Conference on Artificial Intelligence</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Speer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jewett</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nathan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <source>LuminosoInsight/wordfreq: v2.2 (Oct</source>
          <year>2018</year>
          ). https://doi.org/10.5281/zenodo.1443582
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Storks</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chai</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          :
          <article-title>Commonsense reasoning for natural language understanding: A survey of benchmarks, resources, and approaches</article-title>
          . arXiv preprint arXiv:1904.01172, pp.
          <fpage>1</fpage>
          &#8211;
          <lpage>60</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Suh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halpin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Extracting common sense knowledge from Wikipedia</article-title>
          .
          <source>In: Proceedings of the Workshop on Web Content Mining with Human Language Technologies at ISWC</source>
          . vol.
          <volume>6</volume>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Tandon</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Melo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>WebChild 2.0: Fine-grained commonsense knowledge distillation</article-title>
          .
          <source>In: Proceedings of ACL 2017, System Demonstrations</source>
          . pp.
          <fpage>115</fpage>
          &#8211;
          <lpage>120</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Tanon</surname>
            ,
            <given-names>T.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>YAGO 4: A reason-able knowledge base</article-title>
          .
          <source>In: European Semantic Web Conference</source>
          . pp.
          <fpage>583</fpage>
          &#8211;
          <lpage>596</lpage>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Vrande&#269;i&#263;</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kr&#246;tzsch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>57</volume>
          (
          <issue>10</issue>
          ),
          <fpage>78</fpage>
          &#8211;
          <lpage>85</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>