<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generating SPARQL Query Containment Benchmarks using the SQCFramework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muhammad Saleem</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qaiser Mehmood</string-name>
          <email>qaiser.mehmood@insight-centre.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claus Stadler</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jens Lehmann</string-name>
          <email>jens.lehmann@cs.uni-bonn.de</email>
          <email>jens.lehmann@iais.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel-Cyrille Ngonga Ngomo</string-name>
          <email>axel.ngonga@upb.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer IAIS</institution>
          ,
          <addr-line>Bonn</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>INSIGHT</institution>
          ,
          <addr-line>NUIG</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universität Leipzig, IFI/AKSW</institution>
          ,
          <addr-line>PO 100920, D-04009 Leipzig</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Bonn</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Paderborn</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this demo paper, we present the interface of the SQCFramework [8], a SPARQL query containment benchmark generation framework. SQCFramework is able to generate customized SPARQL containment benchmarks from real SPARQL query logs. To this end, the framework makes use of different clustering techniques. It is flexible enough to generate benchmarks of varying sizes and complexities according to user-defined criteria on important SPARQL features for query containment benchmarking. We evaluate the usability of the interface by using the standard system usability scale questionnaire. Our overall usability score of 82.33 suggests that the online interface is consistent, easy to use, and the various functions of the system are well integrated.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Query containment is the problem of deciding whether the result set of a query Q1 is
included in the result set of another query Q2. For example, the result set of the query
Q1: “find all restaurants in Berlin” is included in the result set of the query Q2: “find all
restaurants in Germany”. Q2 is called the super-query, while Q1 is a sub-query of Q2.
Query containment is used to devise efficient query planners, caching mechanisms, data
integration, and view maintenance solutions [2]. In the example above, the result of Q1 can
be obtained efficiently from the result of Q2.</p>
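      <p>As a minimal illustration of the definition above, the following Python sketch evaluates two toy queries over a small in-memory table and checks that the result set of Q1 is a subset of the result set of Q2. The data and the names RESTAURANTS, q1_berlin, q2_germany and is_contained_on are assumptions made for this example only. Note that checking inclusion on one dataset does not decide containment, which has to hold for every possible dataset.</p>
      <preformat><![CDATA[
```python
# Toy data: (restaurant, city, country). Purely illustrative, not from the paper.
RESTAURANTS = [
    ("r1", "Berlin", "Germany"),
    ("r2", "Leipzig", "Germany"),
    ("r3", "Paris", "France"),
]


def q1_berlin(rows):
    """Q1: find all restaurants in Berlin."""
    return {name for name, city, _ in rows if city == "Berlin"}


def q2_germany(rows):
    """Q2: find all restaurants in Germany."""
    return {name for name, _, country in rows if country == "Germany"}


def is_contained_on(sub_query, super_query, rows):
    """True if the sub-query's result set is included in the super-query's on this dataset."""
    return sub_query(rows) <= super_query(rows)
```
]]></preformat>
      <p>On this toy table, is_contained_on(q1_berlin, q2_germany, RESTAURANTS) holds, while the reverse direction does not, which matches the roles of sub-query and super-query described above.</p>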
      <p>In recent years, a considerable amount of work on SPARQL query containment has
been carried out [3,4,9]. To the best of our knowledge, the SPARQL Query Containment
Benchmark (SQC-Bench) [1] is the only benchmark designed to test SPARQL query
containment solvers. This benchmark contains a fixed number of 76 query containment
tests handcrafted by the authors. While this benchmark contains a variety of tests with
varying complexities, the number of tests is fixed and all tests are synthetic. In addition,
this benchmark does not allow users to generate benchmarks tailored towards a specific
use-case.</p>
      <p>To fill this gap, we propose the SQCFramework, a framework for the automatic
generation of SPARQL query containment benchmarks from real SPARQL query logs.
The framework is able to generate benchmarks customized by its user in terms of various
SPARQL query features (defined in Section 2.1). The framework generates the desired
benchmark from query logs by using different clustering methods, while considering the
customized selection criteria specified by the user.</p>
    </sec>
    <sec id="sec-2">
      <title>SQCFramework Benchmark Generation</title>
      <p>In this section, we briefly present the benchmark generation process in the
SQCFramework.<sup>6</sup></p>
      <sec id="sec-2-1">
        <title>2.1 Input Queries and Important SPARQL Features</title>
        <p>
          Our framework takes a set of queries as input. In this work, we aim to generate
benchmarks from real user queries. To this end, we use the Linked SPARQL Queries (LSQ)
datasets [6], which provide real queries extracted from the logs of public SPARQL
endpoints. A query containment benchmark should comprise queries/tests of varying
complexities. Hence, we consider the following query features while generating
containment benchmarks: (1) number of entailments/sub-queries, (2) number of projection
variables, (3) number of basic graph patterns (BGPs), (4) number of triple patterns,
(5,6) maximum and minimum number of triples per BGP, (7) number of join vertices,
(8) mean join vertex degree, and (9) number of additional LSQ features. Our framework
allows generating customized benchmarks based on these features.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Benchmark Generation</title>
        <p>
          The user provides the input LSQ dataset and the required number N of super-queries to
be included in the generated benchmark. The benchmark generation is then carried out in
the following four main steps: (1) select all the super-queries along with the required
features from the input LSQ dataset, (2) generate feature vectors for the selected
super-queries and normalize them, (3) generate N clusters from the super-queries, and
(4) select the single most representative super-query from each cluster to be included in
the final benchmark.
        </p>
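        <p>The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework's implementation: plain k-means with a deterministic initialization stands in for the clustering methods supported by the framework, min-max scaling is assumed for the normalization, and "most representative" is taken to mean the query closest to its cluster centroid. The function names normalize, kmeans and select_benchmark are hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def normalize(vectors):
    """Min-max normalize each feature dimension to [0, 1] (assumed normalization)."""
    dims = len(vectors[0])
    lows = [min(v[d] for v in vectors) for d in range(dims)]
    highs = [max(v[d] for v in vectors) for d in range(dims)]
    return [
        [(v[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
         for d in range(dims)]
        for v in vectors
    ]


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Move every centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def select_benchmark(queries, n):
    """queries maps a super-query id to its raw feature vector; returns up to n ids."""
    ids = list(queries)
    if n > len(ids):
        raise ValueError("cannot select more super-queries than are available")
    points = normalize([queries[q] for q in ids])       # step 2
    assignment, centroids = kmeans(points, n)           # step 3
    chosen = []
    for c in range(n):                                  # step 4
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            best = min(members, key=lambda i: math.dist(points[i], centroids[c]))
            chosen.append(ids[best])
    return chosen
```
]]></preformat>
        <p>Step 1, the selection of super-queries and their features from the LSQ dataset, is assumed to have produced the queries dictionary passed to select_benchmark.</p>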
      </sec>
    </sec>
    <sec id="sec-3">
      <title>SQCFramework Online</title>
      <p>The online demo and source code of the SQCFramework are available at the
SQCFramework homepage https://github.com/AKSW/SQCFramework. Figure 1 shows
the online interface of the SQCFramework, which comprises five main steps:</p>
      <sec id="sec-3-1">
        <p>1. Selection of benchmark generation method: The first step is to select the
benchmark generation method(s). Currently, our framework supports six well-known
clustering methods, namely DBSCAN+Kmeans++, Kmeans++, Agglomerative, Random
selection, FEASIBLE [7], and FEASIBLE-Exemplars.</p>
        <p>2. Parameters selection: The second step is the selection of parameters such as the
number of queries in the resulting benchmark or the number of iterations for Kmeans++
clustering.</p>
        <p>3. Benchmark personalization: The third step allows the user to further customize the
resulting benchmark. By default, the SPARQL query selects all the queries along with the
required features to be considered for benchmarking. The user can modify this query
to generate customized benchmarks. For example, adding FILTER(?projVars
&lt;= 2 &amp;&amp; (?bgps &gt; 1 || ?tps &gt; 3)) restricts the resulting benchmark to
queries with fewer than 3 projection variables and with either more than 1 BGP
or more than 3 triple patterns.</p>
        <p>4. Results: The diversity score and the similarity errors for the selected methods are
shown as bar graphs.</p>
        <p>5. Benchmarks download: Finally, the resulting benchmarks can be downloaded and
used in the evaluation of containment solvers.</p>
        <p><sup>6</sup> Readers are encouraged to read [8] for complete details and the intuition behind
each particular choice.</p>
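        <p>The FILTER expression from step 3 can be read as a plain Boolean predicate over three query features. The Python sketch below mirrors it on a handful of made-up candidates; the function name passes_filter and the feature values are assumptions for illustration and do not come from an LSQ log.</p>
        <preformat><![CDATA[
```python
def passes_filter(proj_vars, bgps, tps):
    """Python mirror of FILTER(?projVars <= 2 && (?bgps > 1 || ?tps > 3))."""
    return proj_vars <= 2 and (bgps > 1 or tps > 3)


# Hypothetical candidate queries with their feature values.
candidates = [
    {"id": "q1", "projVars": 1, "bgps": 2, "tps": 2},  # kept: more than 1 BGP
    {"id": "q2", "projVars": 2, "bgps": 1, "tps": 5},  # kept: more than 3 triple patterns
    {"id": "q3", "projVars": 3, "bgps": 2, "tps": 5},  # dropped: 3 projection variables
    {"id": "q4", "projVars": 1, "bgps": 1, "tps": 3},  # dropped: neither disjunct holds
]

kept = [
    q["id"] for q in candidates
    if passes_filter(q["projVars"], q["bgps"], q["tps"])
]
```
]]></preformat>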
        <p>We will demonstrate all of the steps above to the ISWC audience and generate
benchmarks from the LSQ log of the Semantic Web Dog Food dataset.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>An evaluation of the SQCFramework can be found in [8]. To assess the usability of our
system, we used the standardized, ten-item Likert scale-based System Usability Scale
(SUS) [5] questionnaire,<sup>7</sup> which can be used for a global assessment of a system's usability.</p>
      <p><sup>7</sup> Our survey can be found at: https://goo.gl/forms/n4IG2FQ1PBNCdxBl1</p>
      <p>[Figure 2: results of the SUS usability survey.] The ten SUS items, numbered as in the figure, are:
(1) I think that I would like to use this system frequently;
(2) I found the system unnecessarily complex;
(3) I thought the system was easy to use;
(4) I think that I would need the support of a technical person to be able to use this system;
(5) I found the various functions in this system were well integrated;
(6) I thought there was too much inconsistency in this system;
(7) I would imagine that most people would learn to use this system very quickly;
(8) I found the system very cumbersome to use;
(9) I felt very confident using the system;
(10) I needed to learn a lot of things before I could get going with this system.</p>
      <p>The survey was posted on Twitter with the ISWC2018 hashtag and was filled in by
15 users.<sup>8</sup> The results of the SUS usability survey are shown in Figure 2. We achieved a
mean usability score of 82.33, indicating a high level of usability according to the SUS
score. The responses to question 1 suggest that our system is adequate for frequent use
(average score 3.8 ± 0.94) by users of all types. The responses to question
3 (average score 4.53 ± 0.74) suggest that the interface is easy to use, and the responses
to question 5 indicate that the various functions are well integrated (average score 4.4
± 0.73). However, the responses to question 10 (average score 2.06 ± 1.27) indicate that
users need to learn some basic concepts before they can use the system effectively.</p>
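      <p>For readers unfamiliar with SUS, the sketch below shows the standard way a SUS score is computed from one respondent's ten answers: odd-numbered (positively worded) items contribute the answer minus 1, even-numbered (negatively worded) items contribute 5 minus the answer, and the sum is multiplied by 2.5 to give a value between 0 and 100. We assume here that answers are coded from 1 (strongly disagree) to 5 (strongly agree); the function names sus_score and mean_sus are hypothetical, and no survey responses are reproduced.</p>
      <preformat><![CDATA[
```python
def sus_score(responses):
    """Standard SUS score for one respondent.

    responses: ten Likert answers coded 1-5, in questionnaire order (item 1 first).
    """
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses between 1 and 5")
    total = 0
    for item, answer in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (answer - 1) if item % 2 == 1 else (5 - answer)
    return total * 2.5


def mean_sus(all_responses):
    """Mean SUS score over several respondents."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)
```
]]></preformat>
      <p>Under this scoring, a low average on a negatively worded item such as question 10 raises the overall score, whereas a high average on it lowers the score.</p>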
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Chekol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Genevès</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Layaïda</surname>
          </string-name>
          .
          <article-title>Evaluating and benchmarking SPARQL query containment solvers</article-title>
          .
          <source>In ISWC</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>C.</given-names>
            <surname>Chekuri</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rajaraman</surname>
          </string-name>
          .
          <article-title>Conjunctive query containment revisited</article-title>
          .
          <source>Theoretical Computer Science</source>
          ,
          <volume>239</volume>
          (
          <issue>2</issue>
          ):
          <fpage>211</fpage>
          -
          <lpage>229</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>P.</given-names>
            <surname>Genevès</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Layaïda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          .
          <article-title>Efficient static analysis of XML paths and types</article-title>
          .
          <source>ACM SIGPLAN Notices</source>
          ,
          <volume>42</volume>
          (
          <issue>6</issue>
          ):
          <fpage>342</fpage>
          -
          <lpage>351</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Letelier</surname>
          </string-name>
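          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          , and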
          <string-name>
            <given-names>S.</given-names>
            <surname>Skritek</surname>
          </string-name>
          .
          <article-title>Static analysis and optimization of semantic web queries</article-title>
          .
          <source>TODS</source>
          ,
          <volume>38</volume>
          (
          <issue>4</issue>
          ):
          <fpage>25</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Lewis</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sauro</surname>
          </string-name>
          .
          <article-title>The factor structure of the system usability scale</article-title>
          .
          <source>In HCD</source>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Mehmood</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.-C.</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          .
          <article-title>LSQ: The linked SPARQL queries dataset</article-title>
          .
          <source>In ISWC</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Mehmood</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          .
          <article-title>FEASIBLE: a feature-based SPARQL benchmark generation framework</article-title>
          .
          <source>In ISWC</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Mehmood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          .
          <article-title>SQCFramework: SPARQL query containment benchmark generation framework</article-title>
          .
          <source>In K-CAP</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>C.</given-names>
            <surname>Stadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          .
          <article-title>Efficiently pinpointing SPARQL query containments</article-title>
          .
          <source>In ICWE</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          Footnote 8: As of May 31st, 2018. A summary of the responses can be found at: https://goo.gl/W5eZkv
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>