A Fundamental Evaluation of Candidate Answers Generation for Question Answering Using Wikidata

Ryoga Nakagawa, Osaka Electro-Communication University

18-8 Hatsuchou, Neyagawa-shi 572-0833 Osaka Japan

Kouji Kozaki (kozaki@osakac.ac.jp), Osaka Electro-Communication University

18-8 Hatsuchou, Neyagawa-shi 572-0833 Osaka Japan

Keywords: Knowledge graph, Wikidata, Question answering, Natural language question answering, SPARQL

ISSN 1613-0073

Question answering (QA) is an important application of knowledge graphs (KGs). To evaluate the knowledge structures within a knowledge graph, we investigate in detail the correspondences between questions and the knowledge graph. In this paper, we focus on Wikidata as a general-domain knowledge graph and evaluate a basic question answering method that uses it, aiming to conduct a fundamental evaluation of its content as a knowledge graph.

Introduction

Knowledge graphs are widely used in various fields and have been applied to tasks such as natural language processing, question answering, information retrieval, and problem-solving. In question answering (QA), a knowledge graph enables users to ask questions in natural language, and the system generates answers based on the knowledge graph. Specifically, the system interprets the meaning of a question, extracts relevant entities and relationships from the knowledge graph, and combines them to generate answers.

Several existing works focus on developing question answering systems using knowledge graphs. For instance, T. Ploumis et al. developed a question answering system using Wikidata based on dependency analysis of the question text [1]. X. Hu et al. proposed a method for question answering over knowledge graphs based on SPARQL queries and keyword search [2]. Such approaches play an important role in improving the performance of QA systems. These studies use QA datasets from various domains and conduct experiments to evaluate the proposed systems.

However, to evaluate the knowledge structures within a knowledge graph, a more in-depth investigation into the correspondences between questions and the knowledge graph is needed. In this paper, we use Wikidata [3] as an open, general-domain knowledge graph and evaluate a basic question answering method that uses it, aiming to conduct a fundamental evaluation of its contents as a knowledge graph.

Test Dataset for Question Answering using Wikidata

In this study, we prepared a test dataset for question answering using Wikidata. While several datasets for question answering using Wikidata exist, they do not support the Japanese language. Therefore, we used the Japanese quiz dataset [4], which was developed from Japanese Wikipedia for a QA system competition primarily focused on NLP. This dataset provides question texts and their answers, where each answer is given as a Wikipedia article. Consequently, we can also obtain the Wikidata ID corresponding to each answer, because Wikidata in principle contains resources for all Wikipedia articles. To determine how many questions can be correctly answered using Wikidata, we wrote SPARQL queries manually. Note that some questions cannot be answered by any SPARQL query because the graph structure of Wikidata lacks the knowledge needed to answer them. We chose 50 questions from the quiz dataset and formulated SPARQL queries for them. As a result, we were able to create SPARQL queries that returned the correct answers for 33 of the 50 questions.

Evaluation of Candidate Answers Generation

To generate candidate answers, it is necessary to map the words in the question text to Wikidata IDs and then use the mapping results to generate SPARQL queries to find answers using Wikidata. The former corresponds to Entity Linking, and the latter to SPARQL query generation. In the following sections, we evaluate these two methods using the test dataset provided in Section 2.

As an example, we use the question shown in Figure 1(a) to illustrate these methods.

Evaluation of Entity Linking

Of the 50 questions in the test dataset, we evaluated entity linking on the 33 questions for which we could obtain correct answers using SPARQL queries on Wikidata. As the entity linking method, we extracted words from the question text and searched Wikidata for entities whose labels matched the extracted words exactly. In the example question mentioned above, four entities with Wikidata IDs were obtained from the search results (see Figure 1(b)). The evaluation was conducted manually using the Wikidata search API to avoid any potential issues caused by morphological analysis.
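The exact-match linking step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the evaluation, which was carried out manually. The function names, the `fake_search` stand-in, and the IDs `Q900001` to `Q900003` are all hypothetical; in practice `search` could wrap a call to the Wikidata search API.

```python
from typing import Callable, Dict, Iterable, List


def link_entities(
    words: Iterable[str],
    search: Callable[[str], List[dict]],
) -> Dict[str, List[str]]:
    """Map each word to the IDs of entities whose label equals the word exactly."""
    linked: Dict[str, List[str]] = {}
    for word in words:
        # Keep only search hits whose label is an exact string match.
        ids = [hit["id"] for hit in search(word) if hit.get("label") == word]
        if ids:
            linked[word] = ids
    return linked


def fake_search(word: str) -> List[dict]:
    """Hypothetical stand-in for a Wikidata search call (substring search)."""
    catalog = [
        {"id": "Q900001", "label": "Osaka"},         # placeholder ID
        {"id": "Q900002", "label": "Osaka Castle"},  # placeholder ID
        {"id": "Q900003", "label": "castle"},        # placeholder ID
    ]
    return [e for e in catalog if word.lower() in e["label"].lower()]


result = link_entities(["Osaka", "castle", "built"], fake_search)
print(result)
```

Words with no exact label match, such as "built" here, are simply left unlinked, which mirrors the spelling-mismatch failures reported for entity linking.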

Next, we compared the IDs obtained from each question text with the SPARQL query for that question in the test dataset. If all the IDs in the query are included in the obtained IDs, the query can be generated from them. In the example of Figure 1(a), all the IDs in the query shown in Figure 1(c) are included in the obtained IDs shown in Figure 1(b). Note, however, that the entity linking method does not obtain the Wikidata IDs of properties, so properties are replaced with variables such as "?p1" and "?p2" in the query. The evaluation showed that, for 27 of the 33 questions, all the IDs required to generate the SPARQL queries were obtained. For the remaining 6 questions, some required IDs were not obtained by the entity linking method because of differences in spelling between the question text and Wikidata.
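The coverage check described above amounts to a subset test, sketched below. It assumes that entities appear in the manually written query with the `wd:` prefix and that properties have already been replaced with variables; the query text and the IDs are hypothetical placeholders.

```python
import re
from typing import Iterable, Set


def required_entity_ids(sparql: str) -> Set[str]:
    """Collect the entity IDs written as wd:Q<number> in a SPARQL query."""
    return set(re.findall(r"\bwd:(Q\d+)\b", sparql))


def is_covered(sparql: str, linked_ids: Iterable[str]) -> bool:
    """True if every entity ID in the query was obtained by entity linking."""
    return required_entity_ids(sparql) <= set(linked_ids)


# Hypothetical gold query with properties replaced by variables.
gold_query = """
SELECT ?ans WHERE {
  wd:Q900001 ?p1 ?ans .
  ?ans ?p2 wd:Q900003 .
}
"""

covered = is_covered(gold_query, ["Q900001", "Q900002", "Q900003"])
missing = required_entity_ids(gold_query) - {"Q900001"}
print(covered, missing)
```

Extra linked IDs that the query does not use do not affect the result; only missing IDs make a question fail this check.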

Evaluation of SPARQL Query Generation

For the 27 questions for which all the required IDs were obtained through entity linking, we evaluated SPARQL query generation, which obtains candidate answers using the graph structure of Wikidata. In this study, we exhaustively combine the entities obtained through entity linking. For instance, when two IDs, ID1 and ID2, are obtained, four patterns of SPARQL queries are generated, as illustrated in Figure 2. In this example, variables such as "?p1" and "?p2" represent properties, and "?ans" represents the answer to the question. If a SPARQL query accurately represents the question's conditions, the correct answer is returned as a value of "?ans". However, if multiple entities satisfy the search conditions, a single query pattern can yield multiple candidate answers.
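One way to enumerate such patterns is sketched below. Because Figure 2 is not reproduced in the text, this sketch assumes that the four patterns correspond to the two possible directions of each triple (the linked entity as subject or as object of "?ans"); the actual patterns in the figure may differ. The function names and the IDs are hypothetical.

```python
from itertools import product
from typing import List


def triple(entity_id: str, prop_var: str, entity_is_subject: bool) -> str:
    """Build one triple pattern linking an entity to ?ans in a given direction."""
    if entity_is_subject:
        return f"wd:{entity_id} {prop_var} ?ans ."
    return f"?ans {prop_var} wd:{entity_id} ."


def generate_queries(entity_ids: List[str]) -> List[str]:
    """Enumerate all direction combinations: 2**n queries for n entity IDs."""
    queries = []
    for directions in product([True, False], repeat=len(entity_ids)):
        body = " ".join(
            triple(eid, f"?p{i}", is_subject)
            for i, (eid, is_subject) in enumerate(
                zip(entity_ids, directions), start=1
            )
        )
        queries.append(f"SELECT DISTINCT ?ans WHERE {{ {body} }}")
    return queries


queries = generate_queries(["Q900001", "Q900003"])  # placeholder IDs
for q in queries:
    print(q)
```

Under this assumption the number of queries doubles with each additional linked entity, and every property position is an unbound variable, which is one plausible reason why some generated queries are expensive to execute.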

To evaluate query generation, we executed the generated queries on the SPARQL endpoint of Wikidata and verified whether the query results contained the correct answers to the questions. As a result, the method generated SPARQL queries that returned the correct answer for 23 of the 27 questions. For the remaining four questions, timeouts occurred during SPARQL query processing, which prevented us from obtaining the correct answers.
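The per-question check described above can be sketched as follows. The query runner is injected so that the logic can be shown without network access; `fake_runner`, the outcome labels, and the ID are hypothetical, and the rule of labelling a question "timeout" when no query succeeded and at least one timed out is an assumption of this sketch.

```python
from typing import Callable, Iterable, List


def evaluate_question(
    queries: Iterable[str],
    correct_id: str,
    run_query: Callable[[str], List[str]],
) -> str:
    """Return 'correct', 'timeout', or 'incorrect' for one question."""
    timed_out = False
    for query in queries:
        try:
            answers = run_query(query)
        except TimeoutError:
            # Remember the timeout but keep trying the other patterns.
            timed_out = True
            continue
        if correct_id in answers:
            return "correct"
    return "timeout" if timed_out else "incorrect"


def fake_runner(query: str) -> List[str]:
    """Hypothetical stand-in for a call to a SPARQL endpoint."""
    if "SLOW" in query:
        raise TimeoutError("query exceeded the time limit")
    return ["Q900005"] if "GOOD" in query else []


outcome_a = evaluate_question(["SLOW", "GOOD"], "Q900005", fake_runner)
outcome_b = evaluate_question(["SLOW", "BAD"], "Q900005", fake_runner)
outcome_c = evaluate_question(["BAD"], "Q900005", fake_runner)
print(outcome_a, outcome_b, outcome_c)
```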

Discussion

Figure 3 gives an overview of the evaluation results. For our testing, we used 50 questions from the Japanese quiz dataset. Through manual effort, we obtained SPARQL queries capable of deriving the correct answers from Wikidata for 33 of these questions (Figure 3(1)). These 33 questions and their SPARQL queries constitute the correct data for evaluating question answering using Wikidata. Among the remaining 17 questions, 7 required aliases (alternative labels) to obtain the correct answers using SPARQL queries on Wikidata. For the other 10 questions, we could not obtain the correct answers using SPARQL because the necessary knowledge is not defined in Wikidata. We evaluated fundamental methods of entity linking and SPARQL query generation using the 33 questions and their associated queries in the correct dataset.

Through the evaluation of entity linking, we obtained all the IDs required for the SPARQL queries for 27 of the 33 questions. We then applied these IDs to query generation and obtained the correct answers using one of the generated SPARQL queries for 23 of the 27 questions. In other words, we were able to obtain the correct answers from Wikidata for 85.2% (23/27) of the questions for which entity linking yielded all the required IDs. We believe that the correct answers for the remaining questions could be obtained by improving query performance, because the primary reason for failure was the timeout during SPARQL query processing. Comparing the final results in Figure 3(3) with the 33 questions in the correct dataset, we obtained correct answers from Wikidata for 69.7% (23/33) of the questions. To improve these results, we need to enhance the accuracy of entity linking.
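The ratios discussed above follow directly from the reported counts and can be recomputed as a quick check. The dictionary keys and the `pct` helper are names introduced here for illustration only.

```python
# Counts reported in the evaluation.
counts = {
    "sampled": 50,     # questions taken from the quiz dataset
    "answerable": 33,  # questions with a manually written correct query
    "linked": 27,      # questions with all required IDs found by entity linking
    "answered": 23,    # questions answered by a generated query
}


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


rates = {
    "answerable / sampled": pct(counts["answerable"], counts["sampled"]),
    "linked / answerable": pct(counts["linked"], counts["answerable"]),
    "answered / linked": pct(counts["answered"], counts["linked"]),
    "answered / answerable": pct(counts["answered"], counts["answerable"]),
}

for name, value in rates.items():
    print(f"{name}: {value}%")
```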

In this way, this study conducted a fundamental evaluation of candidate answer generation for question answering using Wikidata. The evaluation results could serve as baseline data for the development of question answering methods using Wikidata. As future work, we plan to increase the number of questions in the test dataset and in the evaluation. We will consider preparing them not only manually but also automatically, using the entity linking and SPARQL query generation methods discussed in Section 3. Additionally, we are developing a QA system by extending these methods.

Figure 1: An example of (a) question text, (b) entity linking, and (c) SPARQL query

Figure 2: Patterns for SPARQL query generation using two IDs

Figure 3: The overview of evaluation results

References

[1] T. Ploumis, I. Perikos, F. Grivokostopoulou, I. Hatzilygeroudis, A factoid based question answering system based on dependency analysis and wikidata, 2021.

[2] X. Hu, J. Duan, D. Dang, Natural language question answering over knowledge graph: the marriage of sparql query and keyword search, Knowledge and Information Systems 63 (2021).

[3] Wikidata, https://www.wikidata.org/, 2022. 2022/10/25.

[4] Jaqket japanese on the subject of quizzes qa data-set, https://jaqket.s3.ap-northeast-1.amazonaws.com/data/aio_01/dev1_questions.json, 2020. 2022/10/25.