WikidataComplete: Knowledge Graph Completion using Question-Answering

Kunpeng Guo(1,3), Bernhard Kratzwald(2), Dennis Diefenbach(1,3), and Christophe Gravier(1)

1 CNRS, Laboratoire Hubert Curien UMR 5516, Université Jean Monnet
2 Chair of Management Information Systems, ETH Zurich
3 The QA Company, Saint-Etienne, France

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. In this demonstration, we showcase WikidataComplete, a KG completion system for Wikidata based on Question Answering (QA). We show (https://wikidatacomplete.org) that it is possible to leverage Question Answering to extract new facts from Wikipedia and to jointly provide explanations for these extracted facts. Providing explanations has two advantages: first, it helps human annotators verify whether the generated facts are correct; second, it allows ingesting the approved facts into the KG together with provenance information. Ultimately, we measure the accuracy of such a system for the first time in a real-world scenario involving human annotators.

Keywords: Knowledge Graph Completion, Question Answering, Natural Language Processing, Human in the loop.

1 Introduction

This demo paper discusses Question Answering (QA) based Knowledge Graph (KG) completion. Knowledge graphs (also known as knowledge bases) are an important data repository for a number of Natural Language Processing (NLP) tasks such as entity linking [6], query expansion [1], and QA [3]. While KGs are critical for the performance of these downstream NLP tasks, they are often incomplete. For example, among the Wikidata entities of type TV program (Q15416), 96% do not have a creator and 94% do not have a production company (P272).

KG completion approaches can be loosely grouped into two paradigms. The first is KG completion via link prediction, which predicts missing relations based on the existing graph structure [5, 7]. The second completes the KG by extracting missing relations from free text [8, 9]. WikidataComplete falls into the second category.

For the demonstration, we showcase the facts that WikidataComplete can extract for a set of 20 Wikidata properties taken from different categories (TV programs, Sport, Pet, Magazine, Company, Video Games, etc.). We give access to the 6,045 newly extracted facts as well as to the lightweight annotation interface that we offer to editors. On the demo platform, conference attendees will see facts that have not yet been approved by an editor, together with the explanations for these facts, and will be able to either approve or reject them. If the user is logged in to Wikidata, the new assertion will be added to Wikidata on the spot at the conference booth.

WikidataComplete builds upon [4] for the fact extraction system. We recap that extraction process in Section 2 and provide a new real-world evaluation of the system based on human annotators in Section 3.
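The incompleteness figures quoted above can be reproduced against the public Wikidata SPARQL endpoint. The sketch below is illustrative only and is not part of the WikidataComplete pipeline; it counts how many instances of TV program (Q15416) lack a production company (P272) statement. The endpoint URL and identifiers are standard Wikidata ones, while the script name in the User-Agent header is an arbitrary placeholder.

```python
# Illustrative check of KG incompleteness for one class/property pair,
# using the public Wikidata SPARQL endpoint (not part of WikidataComplete).
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

def count(query: str) -> int:
    """Run a SELECT query that returns a single COUNT binding named ?n."""
    resp = requests.get(
        ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "kg-incompleteness-check/0.1 (demo script)"},
    )
    resp.raise_for_status()
    return int(resp.json()["results"]["bindings"][0]["n"]["value"])

# All instances of "television program" (Q15416) ...
total = count(
    "SELECT (COUNT(DISTINCT ?item) AS ?n) WHERE { ?item wdt:P31 wd:Q15416 . }"
)
# ... and those without any "production company" (P272) statement.
missing = count(
    """SELECT (COUNT(DISTINCT ?item) AS ?n) WHERE {
         ?item wdt:P31 wd:Q15416 .
         FILTER NOT EXISTS { ?item wdt:P272 ?company . }
       }"""
)
print(f"{missing}/{total} TV programs ({100 * missing / total:.1f}%) lack P272")
```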
2 QA-based fact extraction

We build upon [4] for extracting and selecting new facts from Wikipedia using QA. We recap this approach using a running example based on the Wikidata entity Q11066637 (the television program "Jesus, Mary and Joseph!"). For the sake of illustration, we assume that we would like to extract the missing information for the property "original broadcaster" (P449); the possible properties for a given entity are assumed to be known from the KG ontology. We later discuss how this generalizes to other entities and properties. WikidataComplete starts from the Wikipedia page corresponding to the entity Q11066637. A four-step process (M1–M4) then extracts information about the missing property (see Figure 1); a minimal sketch of the first two steps is given at the end of this section.

Fig. 1: QA-based fact extraction pipeline in WikidataComplete.

(M1) Sentence Selection takes the Wikipedia article of Q11066637 together with the possible labels of the property P449 ("original broadcaster", "original channel", "channel", "TV channel"). The module splits the article into sentences and ranks them by the cosine similarity between their Tf-Idf vectors and that of the property labels.

(M2) Relation Extraction receives the ranked sentences and extracts a text span suitable to fill the missing property. This is accomplished by casting the problem as a QA task. In the running example, the full question would be "Who is the original broadcaster of 'Jesus, Mary and Joseph!'?", but the generated question is simplified to "original broadcaster?" to avoid human supervision in question generation. BERT [2] is used as the underlying reader, but the final softmax layer over answer spans is removed, with the goal of aggregating results from different sentences. This process is repeated for all candidate sentences ranked by M1.

(M3) Re-Ranking receives all candidate answer spans and generates statistical features (span score, answer length, etc.) that relate each answer to the question. The module re-ranks the candidate answers using these features.

(M4) Entity Linking links the selected text span to the corresponding entity in Wikidata. This is accomplished with a pre-trained KG linker model (see [4]), which significantly improves linking precision and avoids mislinking in ambiguous cases. In addition, it helps to filter out irrelevant answers produced by module M2.

We use the described pipeline (M1–M4) to extract new facts in two different ways. First, we extract new facts for a fixed relation, as illustrated in this section by the running example for P449. Second, we run it for a given category (e.g., facts about television programs, sport, or celebrities). In this case, we first compute the most frequently used properties for instances of the category and then consider all instances with an associated Wikipedia article where these properties are missing. In our demonstration, the user is able to browse facts either by selecting a specific category or by selecting a specific property. The fact distribution for the categories presented in the demo is the following: TV program (Q15416): 1,573 facts, Sport (Q349): 746 facts, Pet (Q39201): 107 facts, Magazine (Q41298): 536 facts, TV series (Q5398426): 1,759 facts, Company (Q783794): 543 facts, and Video Games (Q7889): 781 facts.
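To make steps M1 and M2 concrete, the sketch below ranks sentences by Tf-Idf cosine similarity and extracts candidate answer spans with the simplified question. It uses off-the-shelf components (scikit-learn and a public SQuAD-tuned reader from Hugging Face) as stand-ins for the models of [4], and the example sentences are invented; it illustrates the idea, not the authors' implementation, which additionally removes the reader's final softmax layer and applies the re-ranking and linking steps M3–M4.

```python
# Sketch of M1 (sentence selection) and M2 (span extraction) with off-the-shelf
# components; a stand-in for the WikidataComplete models, not the real pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

# Public SQuAD-tuned reader used here instead of the paper's modified BERT reader.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def rank_sentences(sentences, property_labels, top_k=5):
    """M1: rank article sentences by Tf-Idf cosine similarity to the property labels."""
    query = " ".join(property_labels)
    tfidf = TfidfVectorizer().fit_transform(sentences + [query])
    scores = cosine_similarity(tfidf[-1], tfidf[:-1]).ravel()
    ranked = sorted(zip(sentences, scores), key=lambda pair: -pair[1])
    return [sentence for sentence, _ in ranked[:top_k]]

def extract_candidates(sentences, simplified_question):
    """M2: extract one answer span per sentence; M3 would re-rank these candidates."""
    results = [qa(question=simplified_question, context=s) for s in sentences]
    candidates = [(r["answer"], r["score"], s) for r, s in zip(results, sentences)]
    return sorted(candidates, key=lambda c: -c[1])

# Invented sentences standing in for the Wikipedia article of Q11066637.
article = [
    "Jesus, Mary and Joseph! is a television sitcom.",
    "The series premiered on the fictional channel ExampleTV in 2015.",
    "It was produced by an independent production company.",
]
labels = ["original broadcaster", "original channel", "channel", "TV channel"]
print(extract_candidates(rank_sentences(article, labels), "original broadcaster?"))
```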
3 User-Verification of the generated facts

Each newly extracted fact is presented to the user together with the corresponding evidence, i.e. the sentence containing the fact utterance. Links to the corresponding Wikidata and Wikipedia pages are provided so that the user can explore the corresponding entity and its textual description. Based on this information, a user is able to approve, reject, or skip a newly proposed fact. Note that the editing effort of a user is thus reduced to a single click on yes, no, or skip. If approved, the fact is added to Wikidata as a statement; the new statement is inserted together with a reference to the Wikipedia article (a minimal sketch of such a write is given at the end of this section).

Property                      Pb1      Pb2      Ph
P178 (Developer)              41.5%    62.4%    73.1%
P404 (Game mode)              44%      87.6%    62%
P407 (Lang. of work)          34.8%    73.8%    88.67%
P4743 (Animal breed)          78.8%    97.3%    68%
P495 (Country origin)         59.9%    83.7%    56.1%
P527 (Has part)               8%       15%      9.6%
P641 (Sport)                  53.3%    85.3%    15.38%
P749 (Parent org.)            39.1%    52.9%    28%
P1056 (Product or mat.)       5.3%     9.3%     12%
P123 (Publisher)              36.8%    57.3%    60%
P127 (Owned by)               12.5%    26.7%    30%
P161 (Cast members)           63.6%    75%      51%
P17 (Country)                 42.9%    93.3%    56.9%
P19 (Place of birth)          27.8%    40.1%    14%
P364 (Film/TV orig. lang.)    35.8%    75.8%    82%
P400 (Platform)               15.5%    50%      76%
P437 (Distribution format)    2%       6.9%     34%
P449 (Orig. broadcaster)      55.5%    73.4%    86%
P57 (Director)                34.2%    43.1%    42.3%
P921 (Main subject)           5.3%     18.3%    72%

Table 1: Performance comparison for 20 randomly selected properties between the distant-supervision benchmark (Pb1) and human annotators (Ph). Pb2 is the same as Pb1 with the triples removed for which the object could not be linked.

The overall system performance was originally reported to be 49.4% [4]; the performance for a subset of properties is provided in Table 1. Nonetheless, the benchmark was built using distant supervision: even when the correct answer is found, it is not guaranteed that the corresponding evidence is correct. With this demo, we newly compare the performance measured on the benchmark created using distant supervision [4] against human annotations. We make this comparison for the 20 randomly selected Wikidata properties in Table 1. We started by sampling Wikidata triples that meet two requirements: (i) the Wikipedia article of the subject entity can be found, and (ii) at least one of the object entity's labels appears in that Wikipedia article. We chose 1,000 examples for training the system and evaluated it on a disjoint set of 500 examples. The performance results for each property are reported in Table 1. We report the performance of the pipeline over all statements (Pb1) as well as over the statements for which the object could be linked (Pb2). Moreover, we measured the performance using human annotators (Ph); for each property, human annotators evaluated 50 of the generated facts. Since we can only present to humans statements for which the object could be linked, the metric Ph has to be compared with Pb2. The results of Pb2 and Ph are very similar. We therefore conclude, first, that the system mostly extracts facts from sentences that also serve as evidence for those facts and, second, that our selection strategy allows us to identify entities for which the missing statements can be found in the corresponding Wikipedia articles.

The current version of the demo is hosted at https://wikidatacomplete.org. Only facts that have not yet been approved by a user are offered; validated ones have already been added to Wikidata.
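For illustration only, the sketch below shows how an approved fact could be ingested into Wikidata together with a provenance reference, using the pywikibot library. The paper does not describe the actual ingestion code, so the choice of pywikibot, the "reference URL" (P854) modelling of the Wikipedia provenance, and the object QID and article URL placeholders are all assumptions of this sketch rather than the system's implementation.

```python
# Sketch (assumptions, not the authors' implementation): write an approved fact
# to Wikidata with pywikibot and attach the Wikipedia article as provenance.
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

subject = pywikibot.ItemPage(repo, "Q11066637")   # running-example entity
target = pywikibot.ItemPage(repo, "Q1234567")     # placeholder QID: the object linked by M4

# The approved statement: subject --P449 (original broadcaster)--> target.
claim = pywikibot.Claim(repo, "P449")
claim.setTarget(target)
subject.addClaim(claim, summary="Add fact approved via WikidataComplete")

# Provenance, here modelled as a "reference URL" (P854) pointing to the
# Wikipedia article the evidence sentence was extracted from (placeholder URL).
reference = pywikibot.Claim(repo, "P854", is_reference=True)
reference.setTarget("https://en.wikipedia.org/wiki/Example_article")
claim.addSources([reference], summary="Add provenance for the extracted fact")
```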
4 Conclusion

In this paper, we presented WikidataComplete, a KG completion system that extracts new facts from unstructured data, i.e. Wikipedia articles, using QA. The system scales well and can potentially generate thousands of new pieces of knowledge, making a significant contribution to the completeness of KGs. Moreover, thanks to its convenience, the system also greatly facilitates the work of Wikidata editors and contributors. In the future, we would like to investigate how the extraction process can be carried out on the fly, so that users can directly choose the subject and property they are interested in. We believe that a more powerful and complete version of such a system would significantly boost the KG completion process.

References

1. Dalton, J., Dietz, L., Allan, J.: Entity query feature expansion using knowledge base links. In: SIGIR. pp. 365-374 (2014)
2. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018)
3. Diefenbach, D., Maret, P.: QAnswer: A question answering prototype bridging the gap between a considerable part of the LOD cloud and end-users. In: WWW (2019)
4. Kratzwald, B., Kunpeng, G., Feuerriegel, S., Diefenbach, D.: IntKB: A verifiable interactive framework for knowledge base completion. In: COLING (2020)
5. Lao, N., Mitchell, T., Cohen, W.: Random walk inference and learning in a large scale knowledge base. In: EMNLP (2011)
6. Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia Spotlight: shedding light on the web of documents. In: Semantics (2011)
7. Riedel, S., Yao, L., McCallum, A., Marlin, B.M.: Relation extraction with matrix factorization and universal schemas. In: NAACL (2013)
8. Weikum, G., Theobald, M.: From information to knowledge: harvesting entities and relationships from web sources. In: SIGMOD-SIGACT-SIGART (2010)
9. West, R., Gabrilovich, E., Murphy, K., Sun, S., Gupta, R., Lin, D.: Knowledge base completion via search-based question answering. In: WWW (2014)