<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ANNA: Answering Why-Not Questions for SPARQL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Siyu Yao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jun Liu</string-name>
          <email>liukeen@mail.xjtu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meng Wang</string-name>
          <email>wangmengsd@stu.xjtu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bifan Wei</string-name>
          <email>weibifan@mail.xjtu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xuelu Chen</string-name>
          <email>shirley.chen0217@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Results</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alice in Wonderland Batman</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MOEKLINNS Lab, Department of Computer Science, Xi'an Jiaotong University</institution>
          ,
          <addr-line>710049</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Considerable effort has been made to improve the functionality and usability of SPARQL search engines. However, explaining missing items in the results of SPARQL queries or the so-called why-not questions remains in its infancy. Existing explanation models cannot be trivially extended to SPARQL queries because of the SPARQL-specific features in the data model and query operations. In this demonstration, we present a novel explanation system, ANNA (Answering why-Not questioNs for spArql), to explain why-not questions using a divide-and-conquer strategy. ANNA can visualize explanations to help users revise their initial queries to make the expected result-items presented. Experimental results on DBpedia prove that ANNA can generate high-quality explanations within a reasonable amount of time.</p>
      </abstract>
      <kwd-group>
        <kwd>Why-Not</kwd>
        <kwd>SPARQL</kwd>
        <kwd>RDF Graph</kwd>
        <kwd>Query</kwd>
        <kwd>Basic Graph Pattern</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <sec id="sec-1-1">
        <title>1 http://wiki.dbpedia.org/Datasets, released in September, 2014</title>
        <p>
          This situation illustrates the significance of our system, namely, Answering
whyNot questioNs for spArql (ANNA2). Many explanation models have been created to
answer why-not questions for relational databases, social image searches and
topqueries [
          <xref ref-type="bibr" rid="ref1 ref2 ref3">1–3</xref>
          ]. The data model of SPARQL queries is the Resource Description
Framework (RDF), and query operations are based on graph pattern matching. The
differences in these two aspects make existing models unable to be trivially extended
to SPARQL queries. ANNA can generate corresponding explanations according to
the given why-not questions. ANNA initially identifies which parts of a SPARQL
query should be responsible for removing the expected items and then generates
explanations using a divide-and-conquer strategy. With the help of the explanations
returned by ANNA, users can refine their initial SPARQL queries.
2
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Preliminary</title>
      <p>A SPARQL query consists of triple patterns and operators (FILTER, DISTINCT,
MINUS, LIMIT, ORDER BY, etc.). The evaluation of over the RDF dataset
can be divided into two levels, namely, basic graph pattern (BGP) level and operator
level. At the BGP level, the BGP of is evaluated to match the RDF graphs in
. If , then the operators use to provide the query result .
Given , we represent a why-not question as a mapping , where
is a variable in , and the RDF term is a solution of . A mapping
indicates why an RDF item does not appear in .</p>
      <p>An explanation represents the reason for a why-not question . The
explanation for the absence of an item is given in the following two forms: (1) A
modified BGP, which is similar to the original BGP. The modified BGP should
match an RDF graph from with a variable bound to . (2) A set of tuples,
which is denoted by . Each tuple indicates a questionable query
operator and the corresponding matched RDF graph that contains the
expected item .
3</p>
    </sec>
    <sec id="sec-3">
      <title>ANNA</title>
      <p>2 Demonstration is available online at http://kfm.skyclass.net/anna/index.jsp
Module I Identifying Why-not Reasons: This module identifies the level from
which the expected item is removed in a two-step process.
a) All the variables of BGP are replaced in accordance with to generate a
why-not BGP . In consideration of the SPARQL query in Section 1, the
variable is adjusted to The Nightmare Before Christmas in accordance
with .
b) is matched to (the dataset for ANNA is the DBpedia data stored by
Jena TDB3). If , then the why-not reason is located at the operator
level; otherwise, it is located at the BGP level.</p>
      <p>
        Module II Modifying Why-not BGPs: This module aims to identify and modify the
inappropriate triple patterns in , which are blamed for . ANNA
generates a modified why-not BGP via a graph-based approach, as follows:
a) Each triple pattern of is added to initialized as by a biased
breadth-first traversal over the line graph [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] of . When each is added,
ANNA matches over . Therefore, we implement a heuristic rule,
Equation (1), to select to improve the efficiency of matching.
(1)
b) If after adding to , then is replaced with a
modified , which is computed by the query relaxation approach proposed in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
The left of is then added to . If , then the traversal is
completed, else return to step a.
      </p>
      <p>
        Module III Identifying Questionable Operators: This module aims to address
whynot questions at the operator level. Questionable query operators are filtered out, and
is returned and denoted by . The main procedures are as follows:
a) A SPARQL operator tree is constructed by parsing query according to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
b) A set of operators, , is generated from by a post-order traversal on .
c) For each and each matched RDF graph , if any subgraphs
of do not belong to , which is the output of , then filters out
from the query processing. The tuple is subsequently added to .
4
      </p>
    </sec>
    <sec id="sec-4">
      <title>Demonstration</title>
      <p>The entire system is performed through a web application written in Java. We briefly
illustrate how ANNA works through the preceding example.</p>
      <p>The user submits a query using the search panel shown in Fig. 3(a). After the
results return, the user can pose a why-not question . The procedures are as
follows:
(i) Select from the drop-down menu (e.g., ).
(ii) Fill in the blank with (e.g., ).</p>
      <p>The explanation generated by ANNA is returned as shown in Fig. 3(b), and is
highlighted in the operator tree shown in Fig. 3(c). For the preceding example, the
explanation is a modified BGP generated from as is replaced with
.</p>
      <sec id="sec-4-1">
        <title>3 http://jena.apache.org</title>
        <p>(b) Visualization of an explanation
(a)A screenshot of ANNA for submitting a why-not question (c) An explanation</p>
        <p>Fig. 3. Demonstration of ANNA</p>
        <p>A total of 61 why-not questions are obtained from 42 SPARQL queries4 to
evaluate the effectiveness and efficiency of ANNA. The satisfaction of the
explanations is measured by a five-point Likert scale, and 76.5% of the explanations
are considered strongly agree. The experimental results prove that ANNA can
generate high-quality explanations within a reasonable amount of time at both BGP
(approximately 5 s) and operator levels (approximately 1.8 s).
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>For the first time, we develop a novel explanation system called ANNA. Two main
lines are prioritized in future work. First, we aim to transform ANNA into a Java
library that can be extended to any RDF database. Second, we intend to utilize union
and optional graph patterns to address why-not questions for SPARQL queries.
Acknowledgements
The research was supported in part by the Doctoral Fund of Ministry of Education of
China under Grant No. 20130201130002 and No. IRT13035.</p>
      <sec id="sec-5-1">
        <title>4 http://kfm.skyclass.net/anna/queryset.html</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Herschel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hernández</surname>
          </string-name>
          ,
          <article-title>Explaining missing answers to SPJUA queries</article-title>
          ,
          <source>in PVLDB</source>
          , pages
          <fpage>185</fpage>
          -
          <lpage>196</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. Q.</given-names>
            <surname>Truong</surname>
          </string-name>
          , Why not, WINE?:
          <article-title>towards answering why-not questions in social image search</article-title>
          ,
          <source>in ACM MM</source>
          , pages
          <fpage>917</fpage>
          -
          <lpage>926</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Z.</given-names>
            <surname>He</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <article-title>Answering why-not questions on top-k queries</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>26</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1300</fpage>
          -
          <lpage>1315</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Khmelnitskaya</surname>
            and
            <given-names>A. B.</given-names>
          </string-name>
          ,
          <article-title>Values for rooted-tree and sink-tree digraph games and sharing a river[J]</article-title>
          .
          <source>Theory &amp; Decision</source>
          ,
          <volume>69</volume>
          (
          <issue>4</issue>
          ):
          <fpage>657</fpage>
          -
          <lpage>669</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Hurtado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poulovassilis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. T.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <article-title>A relaxed approach to RDF querying, presented at the ISWC</article-title>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <article-title>"Semantics and complexity of SPARQL,"</article-title>
          <source>ACM Transactions on Database Systems</source>
          , vol.
          <volume>34</volume>
          , pages
          <fpage>1</fpage>
          -
          <lpage>45</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>