Question Answering over Knowledge Graphs

                               Sareh Aghaei [0000-1111-2222-3333]

    Semantic Technology Institute Innsbruck, Department of Computer Science, University of
                                      Innsbruck, Austria
                                 sareh.aghaei@sti2.at


        Abstract. With the increasing maturity of large-scale knowledge graphs, ques-
        tion answering over knowledge graphs has become a crucial topic and attracted
        massive attention. A knowledge graph-based question answering system targets
        to leverage facts in knowledge graphs to answer natural language questions and
        assist users to access the meaningful and pertinent knowledge, without knowing
        data structures. This research intends to propose an approach to answer natural
        language questions over knowledge graphs in three main steps including identi-
        fication of optimal subgraphs, creation of candidates and answer selection. The
        proposed approach leverages the state-of-the-art techniques including graph
        alignment, neural networks and natural language process to generate more accu-
        rate answers for questions, either simple questions or multi-hop questions. Ex-
        periments are to be conducted over different knowledge graphs to demonstrate
        the effectiveness of the approach, which can outperform novel existing ap-
        proaches.


        Keywords: Question Answering, Knowledge Graphs, Graph Alignment, Neu-
        ral Networks.


1       Introduction

With the rapid progress of the data web, a large amount of structured data has become
available on the web in the form of knowledge graphs (KGs). A knowledge graph is
huge semantic net which integrates various, inconsistent and heterogeneous infor-
mation resources to represent knowledge about different domains [1]. Basically, a KG
is a directed graph where its nodes are entities with different types and attributes and
its edges are relations of entities. In KGs, each directed edge, along with its head entity
and tail entity, is considered as a triple which is also named a fact. Numerous real-world
KGs such as DBPedia [2], Freebase [3] contain millions or billions of facts. The in-
creasing volume and complexity of the data structures make it difficult for end users to
access the substantial and profitable knowledge in the KGs. In order to bridge the gap,
question answering over KGs has been proposed and attracted massive attention [4].


Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License
  Attribution 4.0 International (CC BY 4.0).
    Making the facts of KGs accessible and beneficial for end users is one of the primary
goals of question answering (QA) over linked data [4]. QA over knowledge graph pro-
vides a way for artificial intelligence systems to incorporate knowledge graphs as a key
ingredient to answer human questions, which can benefit a variety of applications, such
as search engine design, automatic customer service, smart home devices, chatbots and
search engine optimization (SEO) [5]. Research projects such as WordLiftNG [6] and
KI-NET [7] show that QA over KGs is essential for construction of the most SEO-
friendly websites and chatbots in industrial areas, respectively, and knowledge graphs
and their supporting systems are already being practically deployed, for example, in the
domain of tourism marketing on the Web [8].
   Considering the number of KG triples required to obtain answers, natural questions
can be summarized into two groups: simple questions and complex questions. A simple
question which is named single-hop question, requires only one triple to fetch the an-
swer whereas a complex question which is called multi-hop question needs two or more
triples [9, 10]. The research’s key motivation is proposing an approach to improve an-
swering multi hop questions over knowledge graphs.


2      State of the art

QA over KGs has attracted wide attention from researchers and the research progress
can be generally categorized into three groups: rule-based techniques, information re-
trieval-based techniques and semantic parsing techniques [10].

2.1    Rule-based techniques

   Most of the early research in this field leverage on predefined rules or templates to
parse questions and provide logical forms. Defining templates leads to limited scalabil-
ity and necessity for researchers to be familiar with the linguistic knowledge [10]. Alt-
hough, there are some techniques which try to automatically or semi-automatically gen-
erate templates [10]. In [11], a question answering system has been proposed to auto-
matically learn utterance-query templates with alignments between the constituents of
the question utterance and the KG query through integer linear programming. These
templates are generated in an offline step by distant supervision [12] at training step.
Then, in an online step, the templates are used to answer structurally questions of users.

2.2    Information retrieval-based techniques

   For each given natural language question, these techniques extract the entities of
interest and determine the links between the extracted entities and the KG. Then, topic-
entity-centric sub-graphs are extracted and the nodes of the sub-graphs are assumed as
candidate answers. Based on the features extracted from the questions and candidate
answers, the matching scores between the encoded answers and questions are calculated
and the final answer is selected. Although, these techniques overcome manually defined
templates and rules, but suffer from model interpretability and lack of training data [10].
   H. Sun et al. [13] has proposed an open domain QA, namely PullNet which uses an
iterative process to construct a question-specific subgraph that contains nodes relevant
to the question. The initial subgraph is constructed only based on the question and is
expanded iteratively. In each iteration, a graph convolutional neural network (CNN) is
used to detect nodes that should be expanded on the KG. Then, another graph CNN is
employed to detect the answer from the constructed subgraph.
   Stepwise reasoning network (SRN) [9] is a neural method based on reinforcement
learning which considers multi-hop QA as a sequential decision problem. SRN lever-
ages path search and beam search in order to fetch answer and reduce the number of
candidates. To enhance the unique impact of different parts of a question, the attention
mechanism and neural networks are used to determine the parts that need more focus.
Moreover, a potential-based reward shaping strategy is applied to address the delayed
and sparse reward problem.

2.3    Semantic parsing techniques

   These methods usually convert natural language questions into executable queries
or intermediate query forms such as query graphs based on neural semantic parsing
with high scalability and capability.
   Zhu et. al. [14] proposed a knowledge-based QA by tree-to-sequence learning. The
basic idea behind the system can be summarized in five steps. In the first step, it deter-
mines the information (entities, types and numbers) in the KG which is referred by the
question, so it leads to constraining the construction of queries. In the next step, the
constraints are used to construct candidate queries for the given question. Each candi-
date query is encoded into a set of hidden states which are decoded into the given ques-
tion in the fourth step. Finally, using the decoding probabilities, the best query is cho-
sen. In order to capture contexts of an entity or a relation in a query during the encoding
phase, a tree-based bi-directional LSTM is used. A tree-based LSTM runs from all
leaves to the root while the other one runs reversely. During decoding, a generating
mode and a referring mode are mixed to capture different levels of correlations between
queries and questions.
   The key idea behind [15] is to leverage graphs to represent questions. This paper
conceptualizes semantic parsing as a graph matching problem. Questions are parsed
using combinatory categorial grammar and then ungrounded semantic graphs are cre-
ated. Next, the created ungrounded semantic graphs are mapped to the KG subgraphs
through mapping edge labels to KG relations, type nodes to KG entity types, and entity
nodes to KG entities. The most pertinent semantic graph is selected among mapped
candidates and finally is converted to an executable query to fetch the answer.


3      Problem statement and contribution

This research delves into examining the problem of question answering over knowledge
graphs and the following research challenges need to be dealt with:
Challenge 1: complex semantic information in multi-hop questions.
Challenge 2: sparsity and incompleteness in KG.
Challenge 3: high time complexity to detect answers in large-scaled KGs.
   Basically, complex semantic information leads to poor performance in analyzing of
multi-hop questions. Moreover, KGs are often incomplete and sparse with the sparsity
resulting in low recall for multi-hop questions. Extracting an optimal question subgraph
which contains the answer and is small enough, results in reducing time complexity.
   Different from existing methods, the current research intends to propose a solution
to improve the performance of multi-hop QA over KGs. To propose the solution, the
following questions should be addressed.
Question 1: which methods can be used to build the optimal subgraph?
Question 2: how to consider the semantic information in multi-hop questions?
Question 3: which techniques should be applied to encode questions and entities of
KGs?
Question 4: how to leverage graphs to represent questions and graph alignment to map
question graphs to KGs?
Question 5: how to select the answer(s) among the answer candidates?
   Therefore, the underlying hypothesis of this research is as follows:
Hypothesis: neural networks, graph alignment, graph embedding and natural language
processing can improve the performance of question answering over knowledge graphs
in terms of time complexity, recall and precision especially in multi-hop questions.
   The main contributions can be summarized into (i) reducing search space to find
answers in KGs through building optimal subgraphs, (ii) using graph embedding tech-
niques in order to address incompleteness and sparsity in KGs and (iii) employing graph
alignment and neural networks to answer multi-hop questions.


4      Methodology

The current study intends to present a new guided approach to QA over KGs that im-
proves answering natural language questions posed over the KG, either simple ques-
tions or multi-hop questions. This approach includes three main steps: identification of
optimal-subgraphs, creation of candidates, answer selection.

4.1    Identification of optimal subgraphs

    In this step, the entity of interest in the natural language question, namely topic en-
tity, is recognized and then linked to the KG. The linked part of the KG as the optimal
subgraph is more likely to contain the answer. Based on the category which the pro-
posed approach belongs to, a variety of techniques can be employed in this step includ-
ing: part-of-speech tagging, entity recognition, graph embedding, heuristic algorithms
and neural networks.

4.2    Creation of candidates

   The candidate answers are generated based on the identified optimal subgraph. To
achieve this goal, graph alignment, beam search, neural networks, computing semantic
similarity and other technologies should be examined.
4.3    Answer selection

   According to the generated answers, the more accurate response is selected. Some
techniques such as neural networks, reinforcement learning, comparing embeddings of
answers to the natural question can be leveraged to select the more pertinent answer.


5      Evaluation Plan

In order to evaluate how the proposed approach outperforms the available state-of-the-
art approaches, the benchmark datasets including WebQuestionsSP [21] and MetaQA
[22] would be applied. The purpose of this evaluation is to substantiate the claim to be
able to improve answering multi-hop questions over knowledge graphs. The evaluation
would be carried out using metrics such as recall, precision and F1-score. For the prac-
tical evaluation, the proposed approach can be applied in the WordLiftNG project and
KI-Net project.


6      Conclusion

QA over KGs has emerged as a significant research area over the last few years. KG
QA aims to automatically answer natural language questions via well-structured rela-
tion information between entities stored in a KG. This study leverages various tech-
niques in various fields including KGs, graph alignment, natural language processing
and neural networks to provide a new guided approach which consists of three main
steps: identification of optimal subgraphs, creation of candidates and answer selection.
Experiments are to be performed to show that the proposed approach is competitive
compared to the state-of-the-art.


Acknowledgements

This research has been supported by the project WordLiftNG within the Eureka, Euro-
stars Programme (grant agreement number 877857 with the Austrian Research Promo-
tion Agency (FFG)) and the project KI-NET within the Interreg ̈Osterreich-Bayern
2014-2020 programme (grant agreement number AB 292). I would like to express my
gratitude to Assoc.-Prof. Dr. Anna Fensel for her support and insightful comments.


References
 1. Stroh, E., Mathur, P.: Question Answering Using Deep Learning, https://cs224d.stan-
    ford.edu/reports/StrohMathur.pdf, last accessed 1.6.21 (2016).
 2. Lehmann, J., Isele, R., M. Jakob, Jentzsch, A., Kontokostas, D., Mendes, P. N., Hellmann,
    S., Morsey, M., Kleef, P.V., Auer, S. and Bizer, C.: DBpedia–A Large-Scale, Multilingual
    Knowledge Base Extracted From Wikipedia, Semantic Web, 6, pp. 167–195 (2015).
 3. Bollacker, K., Evans, C., Paritosh, P., Sturgem T., Taylor, J.: Freebase: a collaboratively
    created graph database for structuring human knowledge, In: the 2008 ACMSIGMOD In-
    ternational Conference on Management of Data, pp. 1247–1250 (2008).
 4. Ait-Mlouk, A., Jiang, L.: KBot: A Knowledge Graph Based ChatBot for Natural Language
    Understanding Over Linked Data, in IEEE Access, vol. 8, pp. 149220-149230 (2020).
 5. Huang, X., Zhang, J., Li, D., Li, P.: Knowledge Graph Embedding Based Question Answer-
    ing, In: the Twelfth ACM International Conference on Web Search and Data Mining, pp.
    105–113 (2019).
 6. WordLift New Generation, https://wordlift.io/ng/, last accessed 1.6.21 (2020).
 7. KI-NET: AI-based optimization in production, https://scch.at/en/das-projects-details/ki-
    net, /, last accessed 1.6.21.
 8. Fensel, A., Akbar, Z., Kärle, E., Blank, C., Pixner, P., Gruber, A.: Knowledge Graphs for
    Online Marketing and Sales of Touristic Services, Information, 11(5), 253 (2020).
 9. Qiu, Y., Wang, Y., Jin, X., Zhang, K.: Stepwise Reasoning for Multi-Relation Question
    Answering over Knowledge Graph with Weak Supervision, In: the 13th International Con-
    ference on Web Search and Data Mining, pp. 474-482 (2020).
10. Fu, B., Qiu, Y., Tang, C., Li, Y., Yu, H., Sun, J.: A Survey on Complex Question Answering
    over Knowledge Base: Recent Advances and Challenges, ArXiv, vol. abs/2007.13069
    (2020).
11. Abujabal, A., Yahya, M., Riedewald, M., Weikum, G.: Automated Template Generation for
    Question Answering over Knowledge Graphs, In: the 26th International Conference on
    World Wide Web, pp. 1191–1200 (2017).
12. Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction with-
    out labeled data, In: the Joint Conference of the 47th Annual Meeting of the ACL and the
    4th International Joint Conference on Natural Language Processing, pp. 1003-1011 (2009).
13. Sun, H., Bedrax-Weiss, T., Cohen, W.: PullNet: Open do-main question answering with
    iterative retrieval on knowledgebases and text, In: the 2019 Conference on Empirical Meth-
    ods in Natural Language Processing and the 9th International Joint Conference on Natural
    Language Processing (EMNLP-IJCNLP), pp. 2380–2390 (2019).
14. Zhu, S., Cheng, X., Su, S.: Knowledge-based question answering by tree-to-sequence learn-
    ing, Neurocomputing, vol. 372, pp. 64-72 (2020).
15. Reddy, S., Lapata, M., Steedman, M.: Large-scale Semantic Parsing without Question-An-
    swer Pairs, Transactions of the Association for Computational Linguistics, pp. 377-392
    (2014).
16. Usbeck, R., Ngomo, A-CN., Haarmann, B., Krithara, A., Röderm M., Napolitano, G.: 7th
    open challenge on question answering over linked data (qald-7), Semantic Web Challenges,
    pp. 59–69 (2017).
17. Trivedi, P., Maheshwari, G., Dubey, M., Lehmann, J.: Lc-quad: A corpus for complex ques-
    tion answering over knowledge graphs, The Semantic Web-ISWC, pp. 210–218 (2017).
18. Zhang, Y., Dai, H., Kozareva, Z., Smola, A. J., Song, L.: Variational reasoning for question
    answering with knowledge graph, Thirty-Second AAAI Conference on Artificial Intelli-
    gence (2018).
19. Yih, W., Chang, M.W., He, X., Gao, J.: Semantic Parsing via Staged Query Graph Genera-
    tion: Question Answering with Knowledge Base, ACL (2015).
20. Talmor, A., Berant, J.: The Web as a Knowledge-Base for Answering Complex Questions,
    In NAACL-HLT, pp. 641–651 (2018).
21. Yih, W., Richardson, M., Meek, C., Chang, M., Suh, J.: The value of semantic parse labeling
    for knowledge base question answering, In: the 54th Annual Meeting of the Association for
    Computational Linguistics, vol. 2, pp 201–206 (2016).
22. Zhang, Y., Dai, H., Kozareva, Z., Smola, A., Song, L.: Variational reasoning for question
    answering with knowledge graph, In: Association for the advancement of artificial Intelli-
    gence AAAI (2018).