<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Benchmarking RDF Storage Solutions with Iguana</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Felix Conrads</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jens Lehmann</string-name>
          <email>jens.lehmann@cs.uni-bonn.de</email>
          <email>jens.lehmann@iais.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Saleem</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel-Cyrille Ngonga Ngomo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Paderborn</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>System and Network Engineering Group, University of Amsterdam</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Bonn and Fraunhofer IAIS</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Leipzig</institution>, <addr-line>AKSW</addr-line>, <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Choosing the right RDF storage solution is of central importance when developing any data-driven Semantic Web solution. In this demonstration paper, we present the configuration and use of the Iguana benchmarking framework. This framework addresses a crucial drawback of state-of-the-art benchmarks: while several benchmarks have been proposed that assess the performance of triple stores, an integrated, benchmark-independent execution framework for these benchmarks was yet to be made available. Iguana addresses this research gap by providing an integrated and highly configurable environment for the execution of SPARQL benchmarks. Our framework complements benchmarks by providing an execution environment which can measure the performance of triple stores during data loading and data updates, as well as under different loads and parallel requests. Moreover, it allows a uniform comparison of results across different benchmarks. During the demonstration, we will execute the DBPSB benchmark using the Iguana framework and show how our framework measures the performance of popular triple stores under updates and parallel user requests. Iguana is open source and can be found at http://iguana-benchmark.eu/.</p>
      </abstract>
      <kwd-group>
        <kwd>Benchmarking</kwd>
        <kwd>Triple Stores</kwd>
        <kwd>SPARQL</kwd>
        <kwd>RDF</kwd>
        <kwd>Log Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The number and size of datasets available as RDF grows constantly (see http://lodstats.aksw.org). Consequently, many
applications that consume and manipulate RDF data rely on storage solutions to manage
the data they need to manipulate. Iguana (Integrated Suite for Benchmarking SPARQL)
[1] (main paper to be presented at ISWC 2017) is a robust benchmarking framework designed for the execution of SPARQL
benchmarks. Our framework is complementary to the large number of benchmarks already
available (e.g., [3,5]) in that it provides an environment within which SPARQL
benchmarks can be executed so as to provide comparable results. Therewith, Iguana is (1) able to
pinpoint the strengths and weaknesses of the triple store under test. This in turn allows
(2) the evaluation of the suitability of a specific triple store to the application under
which it is to operate, and the proposal of the best candidate triple stores for that
application. In addition, benchmarking triple stores can also help (3) identify the best
running conditions for each triple store (e.g., the best memory configuration) as well as
(4) provide developers with insights pertaining to how to improve their frameworks.
      </p>
      <p>
        The methodology implemented by Iguana follows the four key requirements for
domain-specific benchmarks postulated in the Benchmark Handbook [2], i.e., it
is (1) relevant, as it allows the testing of typical operations within the specific domain,
(2) portable, as it can be executed on different platforms and using different benchmarks
and datasets, (3) scalable, as it is possible to run benchmarks on both small and large
datasets with variable rates of updates and concurrent users, and (4) understandable,
as it returns results using standard measures that have been used across the literature for
more than a decade. Links to all information pertaining to Iguana (including its source
code, released under GPLv3) can be found at http://iguana-benchmark.eu. A guide on how to
get started is at http://iguana-benchmark.eu/gettingstarted.html.
      </p>
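      <p>As an illustration of the standard measures mentioned above, two measures common in the triple store benchmarking literature, Queries per Second (QpS) and Query Mixes per Hour (QMpH), can be computed from measured runtimes as in the following sketch (illustrative Python, not part of Iguana's code):</p>
      <preformat>
```python
# Illustrative sketch (not part of Iguana): two measures that are standard
# in the triple store benchmarking literature.

def qps(runtimes_s):
    """Queries per Second for one query: number of executions / total runtime."""
    return len(runtimes_s) / sum(runtimes_s)

def qmph(mix_runtimes_s):
    """Query Mixes per Hour: how many full query mixes fit into one hour."""
    return 3600.0 * len(mix_runtimes_s) / sum(mix_runtimes_s)

print(qps([0.5, 0.5, 0.5, 0.5]))  # 4 runs at 0.5 s each -> 2.0 QpS
print(qmph([60.0, 60.0, 60.0]))   # 3 mixes at 60 s each -> 60.0 QMpH
```
      </preformat>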
    </sec>
    <sec id="sec-2">
      <title>2 The Iguana Framework</title>
      <p>
        Figure 1 shows the core components of the Iguana framework. The input for the
framework is a configuration file (see the example in Listing 1.1), which contains (1) a
description of the storage solution to benchmark, (2) instructions that orchestrate how the
benchmark queries are to be generated, processed, and issued, as well as (3) a
specification of the benchmarking process and (4) the external data sources to be used during this
process. In our example, the configuration file describes two storage solutions which are
to be benchmarked: OWLIM (see Lines 4–7) and Fuseki (Lines 8–11). A benchmarking
suite describes points (2) to (4) above. The DBpedia knowledge base is to
be used for the evaluation (see Line 16). No data is to be dropped before the evaluation
(ergo, we assume and make sure that the triple stores are empty before the beginning
of the benchmarking process). The benchmark itself is a stress test of the two triple
stores OWLIM and Fuseki (see Line 22). Here, 16 query users and 4 update users
emulate working in parallel on the underlying triple store.
      </p>
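      <p>The parallel load described above can be pictured with a small sketch (hypothetical Python, not Iguana's implementation): 16 query workers and 4 update workers issue requests concurrently until a deadline passes.</p>
      <preformat>
```python
# Hypothetical sketch (not Iguana code): emulating a stress test in which
# 16 query workers and 4 update workers hit a store in parallel.
import threading
import time

RESULTS = {"query": 0, "update": 0}
LOCK = threading.Lock()
DEADLINE = time.time() + 0.2  # a real suite runs much longer

def worker(kind):
    # Each worker repeatedly issues its operation until the deadline passes;
    # time.sleep stands in for sending a SPARQL query or update.
    while True:
        if time.time() > DEADLINE:
            break
        time.sleep(0.001)
        with LOCK:
            RESULTS[kind] += 1

threads = [threading.Thread(target=worker, args=("query",)) for _ in range(16)]
threads += [threading.Thread(target=worker, args=("update",)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(RESULTS)  # both counters have grown, showing concurrent load
```
      </preformat>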
      <p>A representation of the parsed config file is stored internally in a configuration
object (short: config object). If the config object points to a query log, the analyzer
processor analyzes this query log file and generates benchmark queries (e.g., using the
FEASIBLE [5] approach). Iguana also supports benchmark queries being directly
provided to the framework. The dataset generator process creates a fraction of the dataset (e.g., 10%
of DBpedia), thus enabling the framework to test the scalability of the triple stores with
varying sizes of the same dataset. Note that this generator is an interface which can
be used to integrate data generators (e.g., the DBPSB [4] generator) that can create
datasets of varying size. The warmup processor allows the execution of a set of test
queries before the start of the actual stress testing. The testcase processor then performs
the benchmarking by means of stress tests according to the parameters specified in the
config file. Finally, the result processor generates the results, which can be emailed by
the email processor. Details on the core components of Iguana can be found in [1].</p>
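      <p>The flow between these components can be sketched as follows (hypothetical placeholder code; all names are illustrative and do not reflect Iguana's actual API):</p>
      <preformat>
```python
# Hypothetical sketch of the component flow described above; every name here
# is an illustrative placeholder, not Iguana's actual API.

def generate_dataset(fraction):
    # Dataset generator: create a slice of the full dataset (e.g., 10% of DBpedia).
    full = list(range(100))  # stand-in for the full dataset
    return full[: int(len(full) * fraction)]

def warmup(dataset, queries):
    # Warmup processor: execute test queries before the actual stress test.
    for query in queries:
        pass  # a real run would send each query to the triple store

def stress_test(dataset, queries):
    # Testcase processor: here we simply pretend each query took 0.1 s.
    return {query: 0.1 for query in queries}

def summarize(raw):
    # Result processor: aggregate the raw runtimes into a simple report.
    return {"queries": len(raw), "total_runtime_s": sum(raw.values())}

def run_suite(config):
    # Iguana can also mine queries from a log (e.g., FEASIBLE-style) instead
    # of taking them directly from the configuration.
    queries = config["queries"]
    dataset = generate_dataset(config.get("fraction", 1.0))
    warmup(dataset, queries)
    return summarize(stress_test(dataset, queries))

print(run_suite({"queries": ["q1", "q2"], "fraction": 0.1}))
# {'queries': 2, 'total_runtime_s': 0.2}
```
      </preformat>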
      <p>Listing 1.1: Example configuration for Iguana
1 &lt;?xml version="1.0" encoding="UTF-8"?&gt;
2 &lt;iguana xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"&gt;
3 &lt;databases main="dbpedia"&gt;
4 &lt;database id="owlim" type="impl"&gt;
5 &lt;endpoint uri="http://localhost:8080/openrdf/lite/query"/&gt;
6 &lt;update-endpoint uri="http://localhost:8080/openrdf/lite/up"/&gt;
7 &lt;/database&gt;
8 &lt;database id="fuseki" type="impl"&gt;
9 &lt;endpoint uri="http://localhost:3030/tdb/sparql" /&gt;
10 &lt;update-endpoint uri="http://localhost:3030/tdb/update" /&gt;
11 &lt;/database&gt;
12 &lt;database id="ref" type="impl"&gt;
13 &lt;endpoint uri="http://dbpedia.org/sparql" /&gt;
14 &lt;/database&gt;&lt;/databases&gt;
15 &lt;suite&gt;
16 &lt;graph-uri name="http://dbpedia.org" /&gt;
17 &lt;random-function type="RandomTriple" generate="false"&gt;
18 &lt;percent value="1.0" file-name="dbpedia2/" /&gt;
19 &lt;/random-function&gt;
20 &lt;warmup time="20" file-name="warmup.txt" /&gt;
21 &lt;test-db type="choose" reference="ref"&gt;
22 &lt;db id="owlim" /&gt; &lt;db id="fuseki" /&gt;
23 &lt;/test-db&gt;
24 &lt;testcases testcase-pre="./testcasePre.sh %DBID% %PERCENT% %TESTCASEID%"
25 testcase-post="./testcasePost.sh %DBID% %PERCENT% %TESTCASEID%"&gt;
26 &lt;testcase class="org.aksw.iguana.testcases.StressTestcase"&gt;
27 &lt;property name="sparql-user" value="16" /&gt;
28 &lt;property name="update-user" value="4" /&gt;</p>
      <p>During the demo, we will show how to build and use a configuration akin to that
in Listing 1.1. Typical results generated by Iguana based on such configurations are
shown in Figure 2. The framework allows the evaluation of triple stores across different
datasets, user types, and configurations. For example, the results show clearly that the
performance of modern storage solutions increases with the number of workers, as the systems
make good use of parallel processing.</p>
      <p>[Figure 2: Performance of Blazegraph, Fuseki, and Virtuoso for 1, 4, and 16 parallel users on (a) DBpedia and (b) SWDF.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><given-names>F.</given-names> <surname>Conrads</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Lehmann</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Saleem</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Morsey</surname></string-name>, and
          <string-name><given-names>A.-C.</given-names> <surname>Ngonga Ngomo</surname></string-name>.
          <article-title>IGUANA: A generic framework for benchmarking the read-write performance of triple stores</article-title>.
          <source>In Proceedings of the International Semantic Web Conference (ISWC 2017)</source>,
          <year>2017</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. J. Gray, editor.
          <source>The Benchmark Handbook for Database and Transaction Systems (1st Edition)</source>
          . Morgan Kaufmann,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><given-names>M.</given-names> <surname>Morsey</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Lehmann</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Auer</surname></string-name>, and
          <string-name><given-names>A.-C.</given-names> <surname>Ngonga Ngomo</surname></string-name>.
          <article-title>DBpedia SPARQL Benchmark - Performance Assessment with Real Queries on Real Data</article-title>.
          <source>In International Semantic Web Conference (ISWC)</source>,
          <year>2011</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><given-names>M.</given-names> <surname>Morsey</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Lehmann</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Auer</surname></string-name>, and
          <string-name><given-names>A.-C.</given-names> <surname>Ngonga Ngomo</surname></string-name>.
          <article-title>Usage-Centric Benchmarking of RDF Triple Stores</article-title>.
          <source>In Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI 2012)</source>,
          <year>2012</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><given-names>M.</given-names> <surname>Saleem</surname></string-name>,
          <string-name><given-names>Q.</given-names> <surname>Mehmood</surname></string-name>, and
          <string-name><given-names>A.-C.</given-names> <surname>Ngonga Ngomo</surname></string-name>.
          <article-title>FEASIBLE: A feature-based SPARQL benchmark generation framework</article-title>.
          <source>In International Semantic Web Conference (ISWC)</source>,
          <year>2015</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>