ANNA: Answering Why-Not Questions for SPARQL

Siyu Yao, Jun Liu, Meng Wang (wangmengsd@stu.xjtu.edu.cn), Bifan Wei (weibifan@mail.xjtu.edu.cn), Xuelu Chen

Department of Computer Science, MOEKLINNS Lab, Xi'an Jiaotong University, 710049 China

Keywords: Why-Not, SPARQL, RDF, Graph Query, Basic Graph Pattern

Considerable effort has been made to improve the functionality and usability of SPARQL search engines. However, research on explaining missing items in the results of SPARQL queries, the so-called why-not questions, remains in its infancy. Existing explanation models cannot be trivially extended to SPARQL queries because of the SPARQL-specific features of the data model and query operations. In this demonstration, we present a novel explanation system, ANNA (Answering why-Not questioNs for spArql), which explains why-not questions using a divide-and-conquer strategy. ANNA visualizes explanations to help users revise their initial queries so that the expected items appear in the results. Experimental results on DBpedia show that ANNA can generate high-quality explanations within a reasonable amount of time.

Introduction

Writing SPARQL queries is an error-prone and tedious task, so users often make mistakes or cannot obtain the expected results. When this happens, users naturally ask a why-not question. For example, a user wants to find all films directed by Tim Burton and therefore submits a SPARQL query over DBpedia¹, as shown in Fig. 1(a). However, the results confuse the user because an expected film is missing. Various possibilities may be considered to answer the why-not question shown in Fig. 1(b): the film may not be directed by Tim Burton, or the film may not have the director property in DBpedia. The user may find it difficult to determine the real cause and can hardly debug the initial SPARQL query by inspection. This situation illustrates the significance of our system, namely, Answering why-Not questioNs for spArql (ANNA²).

Many explanation models have been created to answer why-not questions for relational databases, social image searches and top-k queries [1][2][3]. In contrast, the data model of SPARQL queries is the Resource Description Framework (RDF), and the query operations are based on graph pattern matching. The differences in these two aspects mean that existing models cannot be trivially extended to SPARQL queries. ANNA generates explanations for a given why-not question. It first identifies which parts of a SPARQL query are responsible for removing the expected items and then generates explanations using a divide-and-conquer strategy. With the help of the explanations returned by ANNA, users can refine their initial SPARQL queries.

Preliminary

A SPARQL query consists of triple patterns and operators (FILTER, DISTINCT, MINUS, LIMIT, ORDER BY, etc.). The evaluation of a query Q over the RDF dataset D can be divided into two levels, namely, the basic graph pattern (BGP) level and the operator level. At the BGP level, the BGP of Q is evaluated to match the RDF graphs in D. If the set of matched RDF graphs is not empty, then the operators use it to provide the query result R.

Given Q, we represent a why-not question as a mapping from a variable ?x to an RDF term t, where ?x is a variable in Q, and t is the solution of ?x that the user expected. Such a mapping asks why the RDF item t does not appear in R. An explanation represents the reason for a why-not question. The explanation for the absence of an item is given in one of the following two forms: (1) A modified BGP, which is similar to the original BGP. The modified BGP should match an RDF graph from D with the variable ?x bound to t. (2) A set of tuples. Each tuple indicates a questionable query operator and the corresponding matched RDF graph that contains the expected item t.
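The definitions above can be pictured as plain data structures. The following is only a minimal sketch: the class names, field names, and the `ex:` identifiers are assumptions introduced for illustration and do not come from ANNA itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# A triple or triple pattern; strings starting with "?" are variables.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class WhyNotQuestion:
    """A mapping from one query variable to the RDF term the user expected."""
    variable: str
    term: str


@dataclass(frozen=True)
class ModifiedBGP:
    """Explanation form (1): a BGP close to the original that matches the data."""
    patterns: Tuple[Triple, ...]


@dataclass(frozen=True)
class OperatorTuple:
    """One element of explanation form (2): an operator and a graph it removed."""
    operator: str
    matched_graph: FrozenSet[Triple]


# Hypothetical values, not taken from Fig. 1.
question = WhyNotQuestion(variable="?film", term="ex:film1")
bgp_explanation = ModifiedBGP(patterns=(("?film", "ex:producer", "ex:alice"),))
operator_explanation = (
    OperatorTuple(
        operator="FILTER",
        matched_graph=frozenset({("ex:film1", "ex:director", "ex:alice")}),
    ),
)
```

Keeping the two explanation forms as separate types mirrors the two evaluation levels: a why-not question is answered either with a changed pattern or with a list of blamed operators.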

ANNA

After analyzing the SPARQL query evaluation, we find that restrictive BGP expressions (BGP level) and questionable query operators (operator level) are the two reasons why the expected items may be absent from the query result. Accordingly, ANNA is designed to address why-not questions using a divide-and-conquer strategy.
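As a rough illustration of this divide-and-conquer dispatch, the sketch below substitutes the expected term for the questioned variable and checks whether the resulting pattern still matches the data. If it does, the operators must have removed the item; if not, the BGP itself is too restrictive. The naive in-memory matcher, the function names, and the `ex:` data are assumptions made for this sketch; ANNA itself runs on DBpedia stored in Jena TDB.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are treated as variables."""
    return term.startswith("?")


def match_bgp(patterns: List[Triple], data: Set[Triple]) -> List[Binding]:
    """Naive BGP matching: extend partial bindings one triple pattern at a time."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in bindings:
            for triple in data:
                candidate = dict(binding)
                # setdefault binds a fresh variable or returns its existing value.
                if all(
                    (candidate.setdefault(p, d) == d) if is_var(p) else (p == d)
                    for p, d in zip(pattern, triple)
                ):
                    extended.append(candidate)
        bindings = extended
    return bindings


def why_not_bgp(patterns: List[Triple], variable: str, term: str) -> List[Triple]:
    """Replace every occurrence of the questioned variable with the expected term."""
    return [
        (
            term if s == variable else s,
            term if p == variable else p,
            term if o == variable else o,
        )
        for s, p, o in patterns
    ]


def locate_reason(
    patterns: List[Triple], data: Set[Triple], variable: str, term: str
) -> str:
    """A non-empty match means the BGP is fine, so the operators are to blame."""
    matched = match_bgp(why_not_bgp(patterns, variable, term), data)
    return "operator level" if matched else "BGP level"


# Toy data with made-up ex: names.
DATA = {
    ("ex:film1", "ex:director", "ex:alice"),
    ("ex:film1", "ex:year", "1990"),
    ("ex:film2", "ex:producer", "ex:alice"),
}
QUERY_BGP = [("?film", "ex:director", "ex:alice"), ("?film", "ex:year", "?year")]

print(locate_reason(QUERY_BGP, DATA, "?film", "ex:film1"))  # operator level
print(locate_reason(QUERY_BGP, DATA, "?film", "ex:film2"))  # BGP level
```

In the toy data, `ex:film1` satisfies the pattern, so its absence from a result could only be caused by an operator, whereas `ex:film2` has no `ex:director` triple and therefore fails at the BGP level.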

Figure 2 shows the ANNA framework, which consists of three modules. A total of 61 why-not questions are obtained from 42 SPARQL queries⁴ to evaluate the effectiveness and efficiency of ANNA. Satisfaction with the explanations is measured on a five-point Likert scale, and 76.5% of the explanations receive the rating "strongly agree". The experimental results show that ANNA can generate high-quality explanations within a reasonable amount of time at both the BGP level (approximately 5 s) and the operator level (approximately 1.8 s).

Conclusion and Future Work

We develop ANNA, a novel explanation system that, for the first time, answers why-not questions for SPARQL queries. Two main lines are prioritized in future work. First, we aim to transform ANNA into a Java library that can be extended to any RDF database. Second, we intend to support union and optional graph patterns when addressing why-not questions for SPARQL queries.

Fig. 1. SPARQL query and query results.

Fig. 2. ANNA framework.

Fig. 3. Demonstration of ANNA. (a) A screenshot of ANNA for submitting a why-not question. (b) Visualization of an explanation. (c) An explanation.

¹ http://wiki.dbpedia.org/Datasets, released in September
³ http://jena.apache.org
⁴ http://kfm.skyclass.net/anna/queryset.html

Acknowledgements

The research was supported in part by the Doctoral Fund of Ministry of Education of China under Grant No. 20130201130002 and No. IRT13035.

Module I Identifying Why-not Reasons: This module identifies the level from which the expected item is removed in a two-step process. a) All occurrences of the questioned variable in the BGP are replaced with the expected RDF term, in accordance with the why-not question, to generate a why-not BGP. For the SPARQL query in the Introduction, the variable is replaced with The Nightmare Before Christmas. b) The why-not BGP is matched against the dataset (the dataset for ANNA is the DBpedia data stored by Jena TDB³). If the match is not empty, then the why-not reason is located at the operator level; otherwise, it is located at the BGP level.

Module II Modifying Why-not BGPs: This module aims to identify and modify the inappropriate triple patterns in the why-not BGP, which are blamed for its empty match. ANNA generates a modified why-not BGP via a graph-based approach, as follows: a) Each triple pattern of the why-not BGP is added to a BGP under construction, which is initialized as empty, by a biased breadth-first traversal over the line graph [4] of the why-not BGP. Each time a triple pattern is added, ANNA matches the BGP under construction over the dataset. Therefore, we implement a heuristic rule, Equation (1), to select the next triple pattern and thereby improve the efficiency of matching.

(1)

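b) If the BGP under construction no longer has a match after a triple pattern is added to it, then that triple pattern is replaced with a modified triple pattern, which is computed by the query relaxation approach proposed in [5]. The remaining triple patterns of the why-not BGP are then added in the same way. If every triple pattern has been added, then the traversal is completed; otherwise, return to step a.

Module III Identifying Questionable Operators: This module aims to address why-not questions at the operator level. The questionable query operators are identified, and the resulting set of tuples is returned as the explanation. The main procedures are as follows: a) A SPARQL operator tree is constructed by parsing the query according to [6]. b) A sequence of operators is generated from the query by a post-order traversal of the operator tree. c) For each operator and each RDF graph matched by the why-not BGP, if no subgraph of the matched graph belongs to the output of the operator, then the operator filters the matched graph out of the query processing. The tuple consisting of the operator and the matched graph is subsequently added to the explanation.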
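The operator-level check of Module III can be sketched as follows. This is a simplified illustration under stated assumptions, not ANNA's implementation: it works on solution bindings instead of matched RDF graphs, it treats only the first child of an operator as the stream being filtered, and the `OpNode` class, the lambda operators, and the `ex:` data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Binding = Dict[str, str]
Solutions = List[Binding]


@dataclass
class OpNode:
    """A node of a toy operator tree; `apply` maps child outputs to this node's output."""
    name: str
    apply: Callable[[List[Solutions]], Solutions]
    children: List["OpNode"] = field(default_factory=list)


def post_order(node: OpNode) -> List[OpNode]:
    """Children first, then the node itself, so inputs exist before they are used."""
    order: List[OpNode] = []
    for child in node.children:
        order.extend(post_order(child))
    order.append(node)
    return order


def questionable_operators(
    root: OpNode, variable: str, term: str
) -> List[Tuple[str, Binding]]:
    """Blame each operator that receives the expected item but does not output it."""
    outputs: Dict[int, Solutions] = {}
    blamed: List[Tuple[str, Binding]] = []
    for node in post_order(root):
        inputs = [outputs[id(child)] for child in node.children]
        out = node.apply(inputs)
        outputs[id(node)] = out
        # Assumption: the first child carries the solutions being filtered.
        incoming = inputs[0] if inputs else []
        expected_in = [b for b in incoming if b.get(variable) == term]
        survives = any(b.get(variable) == term for b in out)
        if expected_in and not survives:
            blamed.extend((node.name, b) for b in expected_in)
    return blamed


# Toy tree: LIMIT 1 over FILTER(?year > 2000) over a BGP with two solutions.
bgp = OpNode("BGP", lambda _inputs: [
    {"?film": "ex:film1", "?year": "1990"},
    {"?film": "ex:film2", "?year": "2005"},
])
flt = OpNode("FILTER", lambda ins: [b for b in ins[0] if int(b["?year"]) > 2000], [bgp])
lim = OpNode("LIMIT", lambda ins: ins[0][:1], [flt])

result = questionable_operators(lim, "?film", "ex:film1")
print(result)  # [('FILTER', {'?film': 'ex:film1', '?year': '1990'})]
```

Because the traversal is post-order, an operator is only blamed when the expected item actually reached it; in the toy tree, LIMIT is not blamed because FILTER had already removed `ex:film1`.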

Demonstration

The entire system is implemented as a web application written in Java. We briefly illustrate how ANNA works through the preceding example.

The user submits a query using the search panel shown in Fig. 3(a). After the results return, the user can pose a why-not question. The procedures are as follows: (i) Select the questioned variable from the drop-down menu. (ii) Fill in the blank with the expected RDF term (e.g., The Nightmare Before Christmas in the preceding example). The explanation generated by ANNA is returned as shown in Fig. 3(b), and an operator-level explanation is highlighted in the operator tree shown in Fig. 3(c). For the preceding example, the explanation is a modified BGP generated from the why-not BGP, in which an inappropriate triple pattern is replaced with a relaxed one.

References

[1] M. Herschel, M. A. Hernández: Explaining missing answers to SPJUA queries. PVLDB (2010)
[2] S. S. Bhowmick, A. Sun, B. Q. Truong: Why not, WINE?: towards answering why-not questions in social image search. ACM MM (2013)
[3] Z. He, E. Lo: Answering why-not questions on top-k queries. IEEE Transactions on Knowledge and Data Engineering 26(6) (2014)
[4] A. B. Khmelnitskaya: Values for rooted-tree and sink-tree digraph games and sharing a river. Theory & Decision 69(4) (2010)
[5] C. A. Hurtado, A. Poulovassilis, P. T. Wood: A relaxed approach to RDF querying. ISWC (2006)
[6] J. Pérez, M. Arenas, C. Gutierrez: Semantics and complexity of SPARQL. ACM Transactions on Database Systems 34 (2009)