<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Morph-Skyline: Skyline Queries for Virtual Knowledge Graph Access</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marlene Goncalves</string-name>
          <email>mgoncalves@usb.ve</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Chaves-Fraga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar Corcho</string-name>
          <email>ocorchog@fi.upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ontology Engineering Group, Universidad Politecnica de Madrid</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Query translation R2RML</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad Simon Bolivar</institution>
          ,
          <country country="VE">Venezuela</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A skyline set corresponds to the points that are non-dominated by any other points in terms of a multi-criteria function, i.e, there is no other point with values better than them in the criteria de ned in a multi-criteria function. Particularly, skyline queries can be used to lter the points that best meet a multi-criteria function in decision making applications over large datasets. Most of these datasets are represented in relational databases under di erent schemes and data models. Integrating these datasets in a uni ed view may be useful for stakeholders to make more accurate decisions. As a proof of concept, we have developed Morph-Skyline, a virtual knowledge graph open source engine, which is able to automatically translate SPARQL skyline queries to equivalent ones in SQL. In this paper, we will demonstrate Morph-Skyline functionalities, and users will observe transport sector scenarios where MorphSkyline allows specifying preferences by using skyline clause. We also evaluate the performance of Morph-Skyline against the state-of-the-art skyline methods for skyline processing over knowledge graphs. Evaluations were performed on Geonames and GTFS datasets of the subway network of Madrid with scale from 1 to 1000.</p>
      </abstract>
      <kwd-group>
        <kwd>Skyline Queries</kwd>
        <kwd>OBDA</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Skyline queries [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] have gained popularity in decision making applications to
nd a set of non-dominated data points (known as the skyline set) in a
multidimensional dataset, i.e., a set of points such that none of them is better than the
rest. Formally, a relational database is comprised of tables whose instances can
be seen formally as a set of multi-dimensional points that describe each table in
terms of their attributes. A skyline query consists of a list of numeric attributes
or dimensions, together with the MIN or MAX directives that indicate whether
the values of the corresponding dimension must be minimized or maximized. The
Copyright c 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
(a)
Query
answer of a skyline query q corresponds to the points in the multi-dimensional
dataset D that are non-dominated, i.e., all the points p, such that: i) there is no
other point p0 in D with values better or equal than p in all the dimensions of p,
and ii) other points in the skyline are better than p in at least one dimension.
      </p>
      <p>
        The problem of e ciently computing the skyline over a database was
introduced by Borzsonyi and colleagues [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and it has been extensively studied in the
literature [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This problem has also been studied for Semantic Web. There are
some works related to extensions of SPARQL with qualitative preferences [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
which are based on a more general operator than the skyline. Finally, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
presented a set of client-based algorithms to evaluate skyline queries over knowledge
graphs using standard query interfaces for RDF although they assumed no
control over how the data is stored (e.g., indexes, internal structures, etc). To the
best of our knowledge, existing OBDA engines do not support skyline queries.
OBDA systems are usually focused on the transformation of the original sources
into a global schema and its corresponding materialization but also on query
translation techniques for highly dynamic data sources [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>To illustrate the skyline approach, suppose a database containing data from
the Madrid metro system3 following the GTFS (General Transit Feed Speci
cation) model4, a table named demands storing statistics on annual accumulated
demand or the number of users by line, and a table called uses containing the
number of uses or movements within each station where uses are calculated as
the number of entries and exits through the turnstile plus the number of transfers
between lines if it is a station with access to more than one Metro line.</p>
      <p>
        Consider a skyline query to identify the least used stations within the most
demanded lines. In this scenario, a station can be chosen, if and only if, there is no
other station with a lower use belonging to a line with a higher demand. To select
a station, we must identify the set of all the stations that are non-dominated by
any other station in terms of minimizing station uses and maximizing demands
per line. Following these criteria, a SPARQL skyline is expressed in Fig. 1a and
translated into SQL according to Fig. 1c. The computed skyline is presented
3 https://www.metromadrid.es
4 https://developers.google.com/transit/gtfs
in Fig. 1b. The stations 253, 278, 296, 295, and 259 are the non-dominated
ones, i.e., there is no other station s with values better than them in these two
attributes. Additionally, a station s1 dominates a station s2, if s1 has better
or equal values and at least one better in demand and use than s2, e.g., the
station 259 dominates the station 107. Based on related work, we devised a
solution where we have developed techniques able to identify the skyline set on
relational databases using an Ontology-Based Data Access (OBDA) approach
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and extending the Chebotko et al.'s translation method [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Particularly, we
have implemented an extension Chebotko et al.'s method which translates the
preference clause where each variable in the clause is mapped to an attribute
in the database. Our tool, Morph-Skyline5, receives a skyline SPARQL query,
translates it into SQL according to the de ned R2RML mappings, and then
executes the translated SQL query directly into the relational database engine.
Under this approach, our example skyline SPARQL query (Fig. 1a) is translated
to SQL (Fig. 1c) and then executed by a relational database engine.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>The Morph-Skyline Architecture</title>
      <p>At data level, the Morph-Skyline architecture, shown in Fig. 2a, is constituted
by a virtual knowledge graph, a relational schema and the mapping between
them. In this architecture, the user formulates skyline SPARQL queries over the
virtual knowledge graph, which are transformed into the underlying query
languages of the relational database. Fig. 2b depicts the Morph-Skyline architecture
at component-level. The Morph-Skyline receives a skyline SPARQL query and a
set of R2RML mappings. Given a skyline SPARQL query, the parser component
performs its lexical, syntactic and semantic analysis. If the skyline SPARQL
query statement is correct, the Query Rewriter component prepares the query
to facilitate any optimization by the Query Translator. Then, the Query
Translator component uses the mappings to translate the skyline query posed over the
ontology into a skyline SQL query to be executed by the underlying relational
database engine. Subsequently, the query evaluator component receives the
skyline query rewritten by the Query Translator and executes it over the relational
database engine. Lastly, results obtained by the query evaluator are translated
by the query result writer component to RDF using the rules provided in the
mappings which de ne how ontological terms are related to terms occurring in
the relational schema.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Demonstration of Use Cases</title>
      <p>Within the transportation domain, the Madrid Metro web page provides its data
in GTFS format together with statistics such as uses by stations and demands
5 https://doi.org/10.5281/zenodo.3974178
(a) Data-level architecture</p>
      <p>
        (b) Component-level architecture
by lines6 in XLS format. We have joined these datasets in conjunction with
precalculated distances from Madrid Geonames dataset7. For each station, we have
pre-computed the closest distance to a school, a market, a shopping center and
a hospital. This dataset was scaled-up over the scale values (5, 10, 50, 100, 500
and 1000) by using VIG8. For these datasets, we have created an R2RML
mapping document for accessing each relational table with 81 PredicateObjectMaps
and 14 JoinConditions. In the context of this demo we will show how
MorphSkyline can be used in the transportation domain where stakeholders perform
SPARQL skyline queries over relational databases, so that the usefulness of
skyline SPARQL queries can be better understood and demonstrated specifying
preferences as criteria in a SPARQL query. We will also show the impact of the
dataset and skyline sizes on the performance of Morph-Skyline and the methods
based on TPF, brTPF and SkyTPF [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to evaluate skyline queries over
knowledge graphs. To evaluate the methods implemented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we have transformed
the datasets into RDF by means of RDFizer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and then, we have converted
them to HDT by using the HDT library 9. We report the average execution time
in seconds for 5 SPARQL skyline queries varying the number of skyline criteria
from 2 to 6. Each query was executed 5 times in cold mode on a machine with
the following characteristics: 2GHz CPU with 8 cores, 64 GB RAM with Ubuntu
18.04 as its operating system. It is important to highlight that we have had to
eliminate the DISTINCT feature in the SELECT clause to execute the methods
based on TPF, brTPF and SkyTPF since they do not parse SELECT DISTINCT
6 https://www.metromadrid.es/es/transparencia/proyectos-y-datos\#panel1
7 https://download.geonames.org/export/dump/
8 https://github.com/ontop/vig
9 http://www.rdfhdt.org/manual-of-the-java-hdt-library/
statement. Our results show that Morph-Skyline time is constant as the size of
the dataset increases because the size of the skyline remains small for the di
erent sizes of datasets and queries. It is noteworthy that the best case for a skyline
method is when the skyline is small [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Unfortunately, the methods based on
TPF and brTPF return empty answers and SkyTPF it outputs an execution
error when they were executed on the original dataset. For the scaled datasets,
the methods based on TPF, brTPF and SkyTPF were not able to execute the
queries because of an execution error. Due to the lack of space, the resources
and results obtained in this real use case are publicly available in the GitHub
repository of the engine10.
4
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>Identifying the points in a database that best meet a set of user criteria can
be represented as a skyline-based ranking problem. In our case, we have gone
further by moving into an OBDA context. We have developed a tool that
provides an e cient solution to this problem, allowing data to reside in a relational
database and skyline SPARQL queries to be used. As a proof of concept, we have
developed Morph-Skyline and implemented an algorithm based the Chebotko et
al.'s translation method on top of a relational database. We will demonstrate the
capabilities of our tool as well as the performance of our solution with respect
to state-of-the-art solutions.
10 https://github.com/oeg-upm/morph-skyline</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Borzsonyi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kossmann</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stocker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The Skyline Operator</article-title>
          .
          <source>In: Proceedings of the 17th International Conference on Data Engineering</source>
          , pp.
          <fpage>421</fpage>
          -
          <lpage>430</lpage>
          . IEEE Computer Society. Heidelberg, Germany (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chebotko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fotouhi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Semantics preserving SPARQL-to-SQL translation</article-title>
          .
          <source>Data Knowl. Eng.</source>
          , vol.
          <volume>68</volume>
          (
          <issue>10</issue>
          ), pp.
          <fpage>973</fpage>
          -
          <lpage>1000</lpage>
          . Elsevier Science Publishers
          <string-name>
            <surname>B. V.</surname>
          </string-name>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Iglesias</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jozashoori</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaves-Fraga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collarana</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <article-title>SDMRDFizer: An RML Interpreter for the E cient Creation of RDF Knowledge Graphs</article-title>
          ,
          <source>ACM Intern. Confer. on Information and Knowledge Management</source>
          , (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kalyvas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tzouramanis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>A Survey of Skyline Query Processing</article-title>
          .
          <source>Computing Research Repository Journal</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Keles</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hose</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Skyline Queries over Knowledge Graphs</article-title>
          .
          <source>In: The Semantic Web</source>
          , pp.
          <fpage>293</fpage>
          -
          <lpage>310</lpage>
          . Auckland, New Zealand (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozachuk</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Endres</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kie ling</surname>
          </string-name>
          , W.:
          <article-title>Preference Analytics in EXASolution</article-title>
          . In: Datenbanksysteme fur Business,
          <source>Technologie und Web (BTW)</source>
          , pp.
          <fpage>613</fpage>
          -
          <lpage>632</lpage>
          . GI. Hamburg,
          <string-name>
            <surname>Germany</surname>
          </string-name>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Comparative Preferences in SPARQL</article-title>
          .
          <source>Knowl. Eng. and Knowl. Management</source>
          , pp.
          <fpage>289</fpage>
          -
          <lpage>305</lpage>
          . Nancy, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Sequeda</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miranker</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : OBDA:
          <article-title>Query Rewriting or Materialization? In Practice, Both!</article-title>
          .
          <source>In: The Semantic Web - ISWC</source>
          , pp.
          <fpage>535</fpage>
          -
          <lpage>551</lpage>
          . Trentino,
          <string-name>
            <surname>Italy</surname>
          </string-name>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>