<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SPARQL Aggregate Queries made easy with Diagrammatic Query Language ViziQuer</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kārlis Čerāns</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jūlija Ovčiņņikova</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mārtiņš Zviedris</string-name>
          <email>martins.zviedris@lumii.lv</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>We present a novel way to draw SPARQL aggregate queries in the diagrammatic query language ViziQuer. Since the introduction of SPARQL, different graphical languages have been proposed to make SPARQL more user-friendly. SPARQL 1.1 introduced aggregate queries, which are key to meaningful query formulation. However, diagrammatic query languages have so far lacked this important end-user feature, which is needed to make diagrammatic SPARQL extensions powerful enough.</p>
      </abstract>
      <kwd-group>
        <kwd>Visual query creation</kwd>
        <kwd>SPARQL</kwd>
        <kwd>RDF</kwd>
        <kwd>Aggregate queries</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        SPARQL [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is the de facto query language for RDF [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] databases. Semantic
RDF/SPARQL technologies offer a higher-level view of data than classical
relational databases (RDB) with the SQL query language. Thus, semantic technologies
enable more direct involvement of domain experts in data set definition,
exploration and analysis. Still, the textual form of SPARQL queries hinders their direct
use by IT professionals and non-professionals alike.
      </p>
      <p>
        The diagrammatic query languages introduced to help formulate SPARQL
queries, for instance, an earlier version of ViziQuer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or Optique VQS [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], do not
support the aggregate query formulation available in SPARQL 1.1. In a real-case
scenario [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] it was found that users could formulate basic SPARQL queries via
graphical notation and were satisfied with the diagrammatic solution for very
basic queries. Still, they lacked the expressive power to compute aggregated data.
      </p>
      <p>
        The demonstration will show the creation of aggregate SPARQL queries in the ViziQuer
notation, which is the main novelty of this paper. An extended outline of the design of the
visual aggregate queries appears in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This demo and paper present a novel and more
refined SPARQL query generation algorithm that relies on an explicit notion of a distinctness list
for aggregate queries, thus correctly capturing a wider range of intuitive
queries within the diagrammatic notation.
      </p>
      <p>
        Supported, in part, by Latvian State Research program NexIT project No.1 “Technologies of
ontologies, semantic web and security”.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Basic Query Notation</title>
      <p>
        The visual/diagrammatic query definition is based on a data schema defined as
an OWL ontology or RDF Schema. We use the example mini-University
ontology presented in Figure 1 in the graphical notation of the OWLGrEd ontology editor [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>[Figure 1. The example mini-University ontology. Classes and attributes: Student (studentName, personID), AcademicProgram (programName), Nationality (nCode), Registration (mark: integer), Course (courseName, courseCredits: integer). Links, with multiplicities where shown: student (1), enrolled (1), nationality (0..1), course (1), takes, includes.]</p>
      <p>
        A query in ViziQuer is a graph of class instance nodes connected with links
that correspond to triples connecting these instances. Each node shows the instance class
name (e.g. Registration, Student in Fig. 2), possibly an explicit instance reference (e.g.
R and S), as well as conditions (e.g. mark&gt;=4) and selection instances and attributes
(e.g. R and mark for the Registration class). One of the classes in the query is marked as
the main query class (shown as an orange rounded rectangle), while all other classes (shown
as violet rectangles) are called condition classes [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The semantics of a basic query is
to find all instance graphs matching the pattern defined by the query and to list the
selection instances and/or attributes for each instance graph. The order by, limit and
offset clauses for the query can also be specified within the main query class.
      </p>
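      <p>The matching semantics of a basic query can be illustrated with a minimal Python sketch over a few in-memory records. The data values and the function name match_basic_query are assumptions made for illustration and are not part of ViziQuer. The sketch enumerates the instance graphs for a query like the one in Fig. 2 (a Registration with mark&gt;=4 linked to a Student and to a Course with courseCredits&gt;=6) and lists the selected attributes for each of them.</p>
      <preformat>
```python
# Hypothetical toy data; identifiers and values are illustrative only.
students = {"s1": {"studentName": "Anna"}, "s2": {"studentName": "Janis"}}
courses = {
    "c1": {"courseName": "Databases", "courseCredits": 6},
    "c2": {"courseName": "Logic", "courseCredits": 4},
}
registrations = [
    {"id": "r1", "mark": 8, "student": "s1", "course": "c1"},
    {"id": "r2", "mark": 3, "student": "s2", "course": "c1"},
    {"id": "r3", "mark": 9, "student": "s2", "course": "c2"},
]


def match_basic_query(registrations, students, courses):
    """Return one result row per instance graph matching the pattern."""
    rows = []
    for reg in registrations:
        student = students[reg["student"]]
        course = courses[reg["course"]]
        # Conditions attached to the Registration and Course nodes.
        if reg["mark"] >= 4 and course["courseCredits"] >= 6:
            rows.append({
                "R": reg["id"],
                "mark": reg["mark"],
                "sn": student["studentName"],
                "cn": course["courseName"],
            })
    return rows


result = match_basic_query(registrations, students, courses)
print(result)
```
      </preformat>
      <p>Only registration r1 satisfies both conditions in this toy data set, so a single row is returned.</p>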
      <p>Links within the query can be affirmative (black solid line), optional (blue/light dashed line) or
negation (red line with the stereotype {not}). By default, an
optional or negation link marks the entire subgraph placed behind
the link (from the viewpoint of the main query class) as optional or negated, respectively.
A negation link with the {condition} stereotype is interpreted as the non-existence of the
respective link between its end instances (the query graph is required to have a spanning
tree consisting of all its non-condition links).</p>
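      <p>The default reading of optional and negation links can be sketched in Python as follows. This is an illustration under assumptions, not the tool's translation: the toy data and both function names are hypothetical, and the subgraph behind the link is reduced to a single nationality value. In SPARQL such links would presumably be rendered with constructs such as OPTIONAL and FILTER NOT EXISTS, although the exact translation is not given here.</p>
      <preformat>
```python
# Hypothetical toy data: student to nationality code (missing key = no link).
students = ["s1", "s2", "s3"]
nationality_of = {"s1": "LV", "s3": "EE"}


def with_optional_link(students, nationality_of):
    """Optional link: keep every student; the linked part may be unbound."""
    return [(s, nationality_of.get(s)) for s in students]


def with_negation_link(students, nationality_of):
    """Negation link: keep only students with no match behind the link."""
    return [s for s in students if s not in nationality_of]


optional_rows = with_optional_link(students, nationality_of)
negated_rows = with_negation_link(students, nationality_of)
print(optional_rows)
print(negated_rows)
```
      </preformat>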
      <p>[Figure 2. An example basic query. Registration node: instance R, condition mark&gt;=4, selection R and mark, with links student and course. Student node: sn=studentName. Course node: condition courseCredits&gt;=6, cn=courseName.]</p>
    </sec>
    <sec id="sec-3">
      <title>Introducing Aggregate Queries</title>
      <p>Aggregation can be included in a query simply by adding aggregate expressions
to class instance attribute lists; in such an expression a SPARQL aggregate function
(e.g. count, sum, avg) is applied to a non-aggregated (i.e. plain) attribute expression,
as in sum(courseCredits) in Fig. 3.</p>
      <p>
        The basic semantic idea is to compute aggregate values taking as the grouping set all
non-aggregated attributes specified in the query. A direct implementation of this idea would,
however, lead to counterintuitive results, since an aggregated instance attribute value
would be included in the aggregation as many times as the instance appears in
instance graphs matching the query. We offer a more refined semantics, which we explain
for the case where all aggregate attributes are placed within a single class of the query,
called the aggregation class. If aggregate attributes appear in different classes, separate
subqueries are made for each aggregation class and their results are merged (cf. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]).
      </p>
      <p>The SPARQL query generation follows three steps: (i) the raw query is generated, with
aggregation function arguments (plain attributes) in place of the aggregate attributes;
(ii) the distinctness list for aggregation computation over the raw query is formed,
consisting of all attributes (non-aggregated and aggregated alike) and the
instances of the so-called multiplicative classes. By default, the multiplicative class set
includes the main query class and the grouping class; a class can be added to the set by
ascribing it the &lt;&lt;all&gt;&gt; stereotype and excluded from
the set by the &lt;&lt;exists&gt;&gt; stereotype; (iii) the aggregation over the distinctness-list
selection from the raw query is formed by aggregating the aggregation attributes and
grouping on all non-aggregated attributes.</p>
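      <p>Step (ii) can be sketched in Python as follows. This is a minimal illustration under assumptions, not the ViziQuer implementation: the dictionary layout, the function name distinctness_list and its parameter names are hypothetical. The example classes follow the Nationality, Student and Course query of Fig. 3, assuming that Nationality and Course play the roles of main query class and grouping class (which of the two is which does not affect the result).</p>
      <preformat>
```python
def distinctness_list(classes, main_class, grouping_class):
    """Build the distinctness list for step (ii).

    classes: list of dicts with keys "name", "instance", "attributes"
    (plain attribute names, aggregated or not) and optional "stereotype".
    """
    # Default multiplicative class set.
    multiplicative = {main_class, grouping_class}
    for cls in classes:
        stereotype = cls.get("stereotype")
        if stereotype == "all":
            multiplicative.add(cls["name"])
        elif stereotype == "exists":
            multiplicative.discard(cls["name"])
    # Instances of multiplicative classes, then all attributes.
    instances = [c["instance"] for c in classes if c["name"] in multiplicative]
    attributes = [a for c in classes for a in c["attributes"]]
    return instances + attributes


fig3_classes = [
    {"name": "Nationality", "instance": "N", "attributes": ["nCode"]},
    {"name": "Student", "instance": "S", "attributes": []},
    {"name": "Course", "instance": "C", "attributes": ["courseCredits"]},
]
# Variant in which the Student class carries the "all" stereotype.
with_all = [
    dict(c, stereotype="all") if c["name"] == "Student" else c
    for c in fig3_classes
]

first = distinctness_list(fig3_classes, "Nationality", "Course")
second = distinctness_list(with_all, "Nationality", "Course")
print(first)
print(second)
```
      </preformat>
      <p>Marking Student with the &lt;&lt;all&gt;&gt; stereotype adds its instance S to the list, while all other entries stay the same.</p>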
      <p>Figure 3 depicts two variants of the natural language query “find all nationalities and
the sum of credit points of courses taken by students of this nationality”. The first query
counts every course once per nationality, while the second counts it once per nationality
and student: there the Student class is in the multiplicative class set, and
therefore an extra ?S appears in the query distinctness list (possibly leading to the
credit points of a single course being counted several times per nationality).</p>
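      <p>The difference between the two variants can be checked on a tiny worked example. The Python sketch below uses hypothetical data (two students of the same nationality taking the same 6-credit course) and a hypothetical helper sum_credits; it mimics the distinct selection followed by grouping, and is not the code of the tool.</p>
      <preformat>
```python
# Hypothetical raw-query result: two students of one nationality take the
# same 6-credit course, so the course occurs in two matching instance graphs.
raw_rows = [
    {"N": "n1", "nCode": "LV", "S": "s1", "C": "c1", "courseCredits": 6},
    {"N": "n1", "nCode": "LV", "S": "s2", "C": "c1", "courseCredits": 6},
]


def sum_credits(raw_rows, distinct_cols):
    """Project to the distinctness list, deduplicate, then sum per nCode."""
    distinct = {tuple(row[col] for col in distinct_cols) for row in raw_rows}
    code_at = distinct_cols.index("nCode")
    credits_at = distinct_cols.index("courseCredits")
    totals = {}
    for item in distinct:
        totals[item[code_at]] = totals.get(item[code_at], 0) + item[credits_at]
    return totals


first = sum_credits(raw_rows, ["N", "C", "nCode", "courseCredits"])
second = sum_credits(raw_rows, ["N", "C", "S", "nCode", "courseCredits"])
print(first)   # the course is counted once per nationality
print(second)  # the course is counted once per nationality and student
```
      </preformat>
      <p>With this data the first distinctness list yields 6 credit points for LV, and the second, which also contains S, yields 12.</p>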
      <p>[Figure 3. Two variants of the aggregate query, followed by the SPARQL query generated for each. In both variants, Nationality (instance N, attribute nCode) is linked via nationality to Student (instance S), which is linked via takes to Course (instance C, attribute ss=sum(courseCredits)). In the second variant the Student class carries the &lt;&lt;all&gt;&gt; stereotype.]</p>
      <preformat>SELECT ?nCode ?ss WHERE {
  { SELECT (SUM(?courseCredits) AS ?ss) ?nCode WHERE {
      { SELECT DISTINCT ?N ?C ?nCode ?courseCredits WHERE {
          ?N a ont:Nationality. ?N ont:nCode ?nCode.
          ?S a ont:Student. ?S ont:nationality ?N.
          ?C a ont:Course. ?S ont:takes ?C.
          ?C ont:courseCredits ?courseCredits.
      } }
    } GROUP BY ?nCode }
}</preformat>
      <preformat>SELECT ?nCode ?ss WHERE {
  { SELECT (SUM(?courseCredits) AS ?ss) ?nCode WHERE {
      { SELECT DISTINCT ?N ?C ?S ?nCode ?courseCredits WHERE {
          ?N a ont:Nationality. ?N ont:nCode ?nCode.
          ?S a ont:Student. ?S ont:nationality ?N.
          ?C a ont:Course. ?S ont:takes ?C.
          ?C ont:courseCredits ?courseCredits.
      } }
    } GROUP BY ?nCode }
}</preformat>
      <p>
        The ViziQuer tool also supports explicit subquery introduction via the {group}
stereotype on affirmative and optional links [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which is useful both for more involved query
formulation (e.g. “find all courses passed by at least 10 students with mean mark (over
all passed courses) at least 7”) and for merging the results of aggregate queries with
different multiplicative class sets.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Conclusions</title>
      <p>
        The demonstrated ViziQuer tool is freely available online at viziquer.lumii.lv;
users are welcome to download it, import their ontologies/RDF schemas and start
creating their own SPARQL queries visually. The introduced notation raises hopes of
opening direct use of RDF/SPARQL-organized data to a wider range of specialists,
as the ViziQuer tool will assist in creating complex statistical queries (a need for
initial user training is foreseen). The potential usage scenarios for the ViziQuer tool
involve both exploring SPARQL endpoints [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and re-engineering relational databases [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>There is an important practical query pattern, “find all x with their related most
common y” (e.g. find all courses with the most frequently received marks in them), that can
be expressed in a slightly extended diagrammatic query notation; however, queries of this
kind cannot be naturally expressed in SPARQL. A practical workaround
would be either to reformulate such queries to return larger result sets from which the
needed results can be obtained, e.g. in a spreadsheet, or to translate visual queries
directly into SQL if there is a relational database behind the SPARQL endpoint.</p>
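      <p>The first workaround can be sketched with a short Python post-processing step in place of a spreadsheet. The sketch assumes that the reformulated query returns one row per course and mark together with the number of registrations; the row layout, the data and the function name most_common_mark are hypothetical.</p>
      <preformat>
```python
# Hypothetical result of a grouped query returning (course, mark, count) rows.
rows = [
    ("Databases", 8, 5),
    ("Databases", 9, 2),
    ("Logic", 6, 3),
    ("Logic", 7, 4),
]


def most_common_mark(rows):
    """Keep, for each course, the mark with the highest count."""
    best = {}
    for course, mark, count in rows:
        # Strict comparison keeps the first row seen when counts are tied.
        if course not in best or count > best[course][1]:
            best[course] = (mark, count)
    return {course: mark for course, (mark, count) in best.items()}


result = most_common_mark(rows)
print(result)
```
      </preformat>
      <p>How ties are resolved is a design choice; here the first row encountered wins.</p>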
      <p>
        The use cases have also shown that attribute expressions can be created and
translated. Still, the standard SPARQL functions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] appear insufficient for practical
queries, e.g. those involving duration calculation. Notably, the Virtuoso RDF data store [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
supports extensions that allow the necessary date and interval value handling.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <source>SPARQL 1.1 Overview</source>
          . W3C Recommendation, 21 March
          <year>2013</year>
          . http://www.w3.org/TR/sparql11-overview/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <source>Resource Description Framework (RDF)</source>
          . http://www.w3.org/RDF/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Zviedris</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Barzdins</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>ViziQuer: A Tool to Explore and Query SPARQL Endpoints</article-title>
          .
          <source>In: The Semantic Web: Research and Applications</source>
          , LNCS, vol.
          <volume>6644</volume>
          ,
          <year>2011</year>
          , pp.
          <fpage>441</fpage>
          -
          <lpage>445</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Soylu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Giese</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kharlamov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zheleznyakov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>OptiqueVQS: Towards an Ontology based Visual Query System for Big Data</article-title>
          .
          <source>In: MEDES</source>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Barzdins</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liepins</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Veilande</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zviedris</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Semantic Latvia Approach in the Medical Domain</article-title>
          .
          <source>In Proc. of 8th International Baltic Conference on Databases and Information Systems</source>
          . H.M. Haav, A. Kalja (eds.), TUT Press, pp.
          <fpage>89</fpage>
          -
          <lpage>102</lpage>
          . (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cerans</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ovcinnikova</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zviedris</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Towards Graphical Query Notation for Semantic Databases</article-title>
          .
          <source>In Proc. of BIR'</source>
          <year>2015</year>
          , LNBIP, Springer
          <year>2015</year>
          , vol.
          <volume>229</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Barzdins</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cerans</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liepins</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sprogis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>UML Style Graphical Notation and Editor for OWL 2</article-title>
          .
          <source>In Proc. of BIR'</source>
          <year>2010</year>
          , LNBIP, Springer
          <year>2010</year>
          , vol.
          <volume>64</volume>
          , pp.
          <fpage>102</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zviedris</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liepins</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Readability of a diagrammatic query language</article-title>
          .
          <source>In: 2014 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)</source>
          , pp.
          <fpage>227</fpage>
          -
          <lpage>228</lpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Cerans</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Barzdins</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bumans</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ovcinnikova</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rikacovs</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Romane</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zviedris</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A Relational Database Semantic Re-Engineering Technology and Tools</article-title>
          . In
          <source>Baltic Journal of Modern Computing (BJMC)</source>
          , Vol.
          <volume>3</volume>
          (
          <year>2014</year>
          ), No.
          <issue>3</issue>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Blakeley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>RDF Views of SQL Data (Declarative SQL Schema to RDF Mapping)</article-title>
          ,
          <source>OpenLink Software</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>