<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ANNA: Answering Why-Not Questions for SPARQL</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Siyu</forename><surname>Yao</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="laboratory">MOEKLINNS Lab</orgName>
								<orgName type="institution">Xi&apos;an Jiaotong University</orgName>
								<address>
									<postCode>710049</postCode>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jun</forename><surname>Liu</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="laboratory">MOEKLINNS Lab</orgName>
								<orgName type="institution">Xi&apos;an Jiaotong University</orgName>
								<address>
									<postCode>710049</postCode>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Meng</forename><surname>Wang</surname></persName>
							<email>wangmengsd@stu.xjtu.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="laboratory">MOEKLINNS Lab</orgName>
								<orgName type="institution">Xi&apos;an Jiaotong University</orgName>
								<address>
									<postCode>710049</postCode>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Bifan</forename><surname>Wei</surname></persName>
							<email>weibifan@mail.xjtu.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="laboratory">MOEKLINNS Lab</orgName>
								<orgName type="institution">Xi&apos;an Jiaotong University</orgName>
								<address>
									<postCode>710049</postCode>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Xuelu</forename><surname>Chen</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="laboratory">MOEKLINNS Lab</orgName>
								<orgName type="institution">Xi&apos;an Jiaotong University</orgName>
								<address>
									<postCode>710049</postCode>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ANNA: Answering Why-Not Questions for SPARQL</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">29B6E04C5BF478B637E42ADBA3B6E8C9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T21:57+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Why-Not</term>
					<term>SPARQL</term>
					<term>RDF Graph</term>
					<term>Query</term>
					<term>Basic Graph Pattern</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Considerable effort has been made to improve the functionality and usability of SPARQL search engines. However, explaining missing items in the results of SPARQL queries, the so-called why-not questions, remains in its infancy. Existing explanation models cannot be trivially extended to SPARQL queries because of SPARQL-specific features in the data model and query operations. In this demonstration, we present a novel explanation system, ANNA (Answering why-Not questioNs for spArql), which explains why-not questions using a divide-and-conquer strategy. ANNA visualizes explanations to help users revise their initial queries so that the expected result items appear. Experimental results on DBpedia show that ANNA can generate high-quality explanations within a reasonable amount of time.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Writing SPARQL queries is an error-prone and tedious task, so users often make mistakes or fail to obtain the expected results. When this happens, users naturally ask a why-not question. For example, suppose a user wants to find all films directed by Tim Burton and submits a SPARQL query over DBpedia<ref type="foot" target="#foot_0">1</ref>, as shown in Fig. <ref type="figure" target="#fig_0">1</ref>(a). However, the results confuse the user. Various possibilities may be considered to answer the why-not question shown in Fig. <ref type="figure" target="#fig_0">1</ref>(b). The film may not be directed by Tim Burton, or the film may not have the director property in DBpedia. The user may find it difficult to determine the real answer and to debug the initial SPARQL query by hand. This situation motivates our system, Answering why-Not questioNs for spArql (ANNA<ref type="foot">2</ref>).</p><p>Many explanation models have been created to answer why-not questions for relational databases, social image searches, and top-k queries <ref type="bibr" target="#b0">[1]</ref><ref type="bibr" target="#b1">[2]</ref><ref type="bibr" target="#b2">[3]</ref>. However, the data model of SPARQL queries is the Resource Description Framework (RDF), and query operations are based on graph pattern matching. These two differences prevent existing models from being trivially extended to SPARQL queries. Given a why-not question, ANNA generates a corresponding explanation. ANNA first identifies which parts of a SPARQL query are responsible for removing the expected items and then generates explanations using a divide-and-conquer strategy. With the help of the explanations returned by ANNA, users can refine their initial SPARQL queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Preliminary</head><p>A SPARQL query consists of triple patterns and operators (FILTER, DISTINCT, MINUS, LIMIT, ORDER BY, etc.). The evaluation of a query over an RDF dataset can be divided into two levels, namely, the basic graph pattern (BGP) level and the operator level. At the BGP level, the BGP of the query is evaluated to match RDF graphs in the dataset. If matching graphs are found, the operators are then applied to them to produce the query result.</p><p>Given a query, we represent a why-not question as a mapping from a variable in the query to an RDF term that the user expects as a solution for that variable. Such a mapping asks why an RDF item does not appear in the query result. An explanation represents the reason for a why-not question. The explanation for the absence of an item is given in one of the following two forms: (1) A modified BGP, which is similar to the original BGP. The modified BGP should match an RDF graph from the dataset with the questioned variable bound to the expected item. (2) A set of tuples. Each tuple indicates a questionable query operator and the corresponding matched RDF graph that contains the expected item.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">ANNA</head><p>After analyzing the SPARQL query evaluation, we find that restrictive BGP expressions (BGP level) and questionable query operators (operator level) are the two reasons why the expected items may be absent from the query result. Accordingly, ANNA is designed to address why-not questions using a divide-and-conquer strategy.</p><p>Figure <ref type="figure" target="#fig_1">2</ref> shows the ANNA framework, which consists of three modules.</p><p>A total of 61 why-not questions are obtained from 42 SPARQL queries<ref type="foot" target="#foot_3">4</ref> to evaluate the effectiveness and efficiency of ANNA. Satisfaction with the explanations is measured on a five-point Likert scale, and 76.5% of the explanations are rated "strongly agree". The experimental results show that ANNA can generate high-quality explanations within a reasonable amount of time at both the BGP level (approximately 5 s) and the operator level (approximately 1.8 s).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Future Work</head><p>We have developed ANNA, which is, to our knowledge, the first explanation system for why-not questions on SPARQL queries. Future work follows two main lines. First, we aim to turn ANNA into a Java library that can be extended to any RDF database. Second, we intend to take union and optional graph patterns into account when addressing why-not questions for SPARQL queries.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. SPARQL query and query results.</figDesc><graphic coords="1,166.50,517.20,130.50,72.90" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2.</head><label>2</label><figDesc>Fig. 2. ANNA framework.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3.</head><label>3</label><figDesc>Fig. 3. Demonstration of ANNA: (a) a screenshot of ANNA for submitting a why-not question; (b) visualization of an explanation; (c) an explanation.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://wiki.dbpedia.org/Datasets, released in September 2014</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://jena.apache.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">http://kfm.skyclass.net/anna/queryset.html</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>The research was supported in part by the Doctoral Fund of Ministry of Education of China under Grant No. 20130201130002 and No. IRT13035.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Module I Identifying Why-not Reasons: This module identifies the level at which the expected item is removed, in two steps. a) The variables of the BGP are replaced in accordance with the why-not question to generate a why-not BGP. For the SPARQL query in Section 1, the questioned variable is replaced with The Nightmare Before Christmas. b) The why-not BGP is matched against the dataset (the dataset for ANNA is the DBpedia data stored in Jena TDB<ref type="foot" target="#foot_2">3</ref>). If a match is found, then the why-not reason is located at the operator level; otherwise, it is located at the BGP level.</p><p>Module II Modifying Why-not BGPs: This module aims to identify and modify the inappropriate triple patterns in the why-not BGP, which are blamed for the failed match. ANNA generates a modified why-not BGP via a graph-based approach, as follows: a) Each triple pattern of the why-not BGP is added to the modified BGP, which is initially empty, by a biased breadth-first traversal over the line graph <ref type="bibr" target="#b3">[4]</ref> of the why-not BGP. Each time a triple pattern is added, ANNA matches the modified BGP against the dataset. Therefore, we implement a heuristic rule, Equation (<ref type="formula">1</ref>), to select the next triple pattern and thus improve the efficiency of matching.</p><p>(1)</p><p>b) If no match is found after a triple pattern is added to the modified BGP, then that triple pattern is replaced with a modified one, which is computed by the query relaxation approach proposed in <ref type="bibr" target="#b5">[5]</ref>. The remaining triple patterns of the why-not BGP are then added to the modified BGP. If all triple patterns have been added, then the traversal is complete; otherwise, return to step a.</p><p>Module III Identifying Questionable Operators: This module aims to address why-not questions at the operator level. Questionable query operators are identified, and the resulting set of tuples is returned as the explanation. The main procedures are as follows: a) A SPARQL operator tree is constructed by parsing the query according to <ref type="bibr" target="#b6">[6]</ref>. b) A set of operators is generated from the operator tree by a post-order traversal. c) For each operator and each matched RDF graph, if any subgraphs of the matched graph do not belong to the output of the operator, then the operator filters out the expected item from the query processing. The tuple of the operator and the matched graph is subsequently added to the explanation.</p></div>
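<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two steps of Module I can be illustrated with a minimal sketch. It is not ANNA's implementation: the function names, the tuple-based triple representation, and the four-triple toy dataset are assumptions made for illustration only, and a naive in-memory matcher stands in for matching against DBpedia in Jena TDB.</p>
<p>
```python
# Hypothetical sketch of Module I; none of these names come from ANNA itself.
# Triples and triple patterns are 3-tuples of strings; variables start with "?".

def is_var(term):
    return term.startswith("?")

def substitute(bgp, why_not):
    """Step a: replace each questioned variable with its expected RDF term."""
    return [tuple(why_not.get(term, term) for term in tp) for tp in bgp]

def match(bgp, dataset, binding=None):
    """Yield every variable binding under which all patterns occur in dataset."""
    binding = binding or {}
    if not bgp:
        yield binding
        return
    first, rest = bgp[0], bgp[1:]
    for triple in dataset:
        candidate = dict(binding)
        ok = True
        for pattern_term, data_term in zip(first, triple):
            if is_var(pattern_term):
                # A variable must keep the same value everywhere it occurs.
                if candidate.setdefault(pattern_term, data_term) != data_term:
                    ok = False
                    break
            elif pattern_term != data_term:
                ok = False
                break
        if ok:
            yield from match(rest, dataset, candidate)

def why_not_level(bgp, why_not, dataset):
    """Step b: 'operator' if the why-not BGP still matches, otherwise 'bgp'."""
    why_not_bgp = substitute(bgp, why_not)
    has_match = next(match(why_not_bgp, dataset), None) is not None
    return "operator" if has_match else "bgp"

# Toy data (illustrative only, not DBpedia): one film lacks a director triple.
DATA = {
    ("Beetlejuice", "type", "Film"),
    ("Beetlejuice", "director", "Tim_Burton"),
    ("Nightmare", "type", "Film"),
    ("Nightmare", "producer", "Tim_Burton"),
}
BGP = [("?film", "type", "Film"), ("?film", "director", "Tim_Burton")]

print(why_not_level(BGP, {"?film": "Nightmare"}, DATA))
```
</p>
<p>In this toy setting the why-not BGP for the missing film has no match, so the reason is located at the BGP level and the question would be handed to Module II. If the why-not BGP did match, the missing item must have been removed by an operator, and Module III would take over.</p></div>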
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Demonstration</head><p>The entire system is implemented as a web application written in Java. We briefly illustrate how ANNA works through the preceding example.</p><p>The user submits a query using the search panel shown in Fig. <ref type="figure" target="#fig_2">3</ref>(a). After the results are returned, the user can pose a why-not question as follows: (i) select the variable in question from the drop-down menu; (ii) fill in the blank with the expected RDF term (e.g., The Nightmare Before Christmas). The explanation generated by ANNA is returned as shown in Fig. <ref type="figure" target="#fig_2">3</ref>(b), and the questionable operator is highlighted in the operator tree shown in Fig. <ref type="figure" target="#fig_2">3</ref>(c). For the preceding example, the explanation is a modified BGP generated from the why-not BGP, in which an inappropriate triple pattern is replaced with a relaxed one.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Explaining missing answers to SPJUA queries</title>
		<author>
			<persName><forename type="first">M</forename><surname>Herschel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Hernández</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<biblScope unit="page" from="185" to="196" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Why not, WINE?: towards answering why-not questions in social image search</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Bhowmick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">Q</forename><surname>Truong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM MM</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="917" to="926" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Answering why-not questions on top-k queries</title>
		<author>
			<persName><forename type="first">Z</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Lo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="1300" to="1315" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Values for rooted-tree and sink-tree digraph games and sharing a river</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">B</forename><surname>Khmelnitskaya</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Theory &amp; Decision</title>
		<imprint>
			<biblScope unit="volume">69</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="657" to="669" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A relaxed approach to RDF querying</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Hurtado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Poulovassilis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">T</forename><surname>Wood</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC</title>
				<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Semantics and complexity of SPARQL</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pérez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Arenas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gutierrez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Database Systems</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="1" to="45" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
