<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>POD-QUERY: Schema Mapping and Query Rewriting for Solid Pods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maarten Vandenbrande</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maxime Jakubowski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pieter Bonte</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bart Buelens</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Femke Ongenae</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Van den Bussche</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Science Institute, Hasselt University</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, KU Leuven Kulak</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Flemish Institute for Technological Research (VITO)</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>IDLab, Ghent University - imec</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>6</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>We envisage a decentralized Web architecture in which access to data in Solid pods is mediated by Web agents. Leveraging methods from information integration and data exchange, Web agents can give clients access to views defined by schema mappings. Queries formulated by clients over the view are rewritten by the agent to queries that work correctly over the base data. We demonstrate POD-QUERY, our prototype implementation of such an architecture, where schema alignment is specified by RDF-to-RDF mappings written in RML, and SPARQL queries are rewritten based on these mappings. We demonstrate our system in a personal health scenario.</p>
      </abstract>
      <kwd-group>
        <kwd>Decentralized Web</kwd>
        <kwd>Personal data management</kwd>
        <kwd>Web agent</kwd>
        <kwd>Schema alignment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Background</title>
      <p>
        For many years there has been interest in the management of personal data in a decentralized
manner, e.g., ongoing societal movements such as MyData [https://mydata.org], and work on
personal information management systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A more recent example is the massive uptake
of Mastodon and the Fediverse [https://spectrum.ieee.org/mastodon-social-media]. Of course,
more broadly, support for federated linked data has always been a goal of the Semantic Web [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        An instrumental advance in this direction is Solid, a Web protocol for accessing “pods”, i.e.,
personal data vaults [https://solidproject.org]. Solid essentially describes authenticated HTTP
access to linked data containers. Like the Fediverse, Solid was initially inspired by social network
applications [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], but also has broader applicability in the decentralized Web of data. There is an
ambition for Solid to become a W3C standard.
      </p>
      <p>
        Solid pods, following the LDP protocol, serve resources structured in nested containers,
comparable to files in a UNIX directory structure. Resources can be RDF or non-RDF (i.e.,
blobs); we will focus here on resources that are RDF graphs. Clients, identified by a Web ID,
can be given access to specific resources in the pod. Access can be read or write; we will focus
here on read access. When a client is given read access to some resource, they can simply
download (HTTP GET) the corresponding RDF graph. For some types of applications, however,
and especially with larger graphs, we may want to allow clients to submit SPARQL queries
instead. This natural extension to Solid is being developed or supported by several servers
(Inrupt, Comunica [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), and is adopted in our demo system (using Comunica).
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem statement and approach</title>
      <p>The problem motivating this work is that of dealing with multiple parties or clients. Assuming a
world where Solid pods are used routinely, there will be multiple parties that we are willing to
share data with, e.g., governments, health providers, recommendation services, social networks.
With each party we want to share different data, and, moreover, each party would likely prefer
to see that data according to a specific vocabulary or schema. However, the pod stores our base
data in several RDF graphs that are organized according to our own, fixed, internal schema.
Therefore, we cannot simply grant a party access to some set of graphs coming straight out of
the pod, as these will likely include data that we are not willing to share with said party, and
moreover, the data will not conform to the party’s desired schema.</p>
      <p>Towards a solution, we propose here the POD-QUERY system. The user can specify different
views over their personal data for every requesting party. This way, the user can control exactly
what the party can query. These views can hide data as well as transform it into the right shape.</p>
    </sec>
    <sec id="sec-3">
      <title>3. The POD-QUERY System</title>
      <p>POD-QUERY (Fig. 1) realizes views on Solid pods through RDF-to-RDF schema alignment
specified in RML, and rewriting SPARQL queries based on these RML mappings.</p>
      <p>Formally, consider a pod <italic>P</italic> owned by <italic>O</italic>, and a party <italic>C</italic> with which <italic>O</italic> is willing to share
some of its data. <italic>C</italic> specifies a schema <italic>S</italic>, stating requirements to which they expect the data
to conform. We do not fix the schema language in this paper; the requirements can be very
loose, merely suggesting the use of certain vocabularies, or can be quite strict, taking the form
of SHACL constraints, for example. POD-QUERY consists of the following components:</p>
      <p>View: The owner <italic>O</italic> defines, over the data in the pod <italic>P</italic>, a view <italic>V</italic> which contains the RDF graph that the party <italic>C</italic> will be
allowed to read. This view conforms to the schema <italic>S</italic>.</p>
      <p>
        Schema mapping: The view <italic>V</italic> is specified as a schema mapping, from data conforming to whatever
local schema is used in the pod <italic>P</italic>, to RDF graphs conforming to the party’s schema <italic>S</italic>. In our approach, we use
RDF-to-RDF mappings written in RML [https://rml.io] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], but alternatives could be SPARQL
CONSTRUCT queries, or N3Logic rules.
      </p>
      <p>Query rewriting: The party <italic>C</italic> can simply download the view <italic>V</italic>, which may be computed on demand
or may be materialized by the owner <italic>O</italic> (in which case it must also be kept up-to-date under updates
to the pod <italic>P</italic>). However, for larger data and frequent access, we want <italic>C</italic> to
be able to pose arbitrary SPARQL queries over <italic>V</italic>. If <italic>V</italic> is materialized, these
queries can be answered directly by a SPARQL engine. Otherwise, the client
query <italic>Q</italic> must be rewritten into an equivalent query <italic>Q</italic>′ that works over the local schema.</p>
      <p>Web agent, mediator: Keeps track of which views are used for which parties, executes or
materializes the schema mappings, and applies query rewriting. This agent has full access
to the pod <italic>P</italic>, but may appear to the party <italic>C</italic> as a virtual Solid pod; the view <italic>V</italic> is then a resource to which
<italic>C</italic> is granted access.</p>
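      <p>The mediator role described above can be sketched as follows. This is a minimal illustration only, assuming an in-memory pod represented as a set of triples and views represented as plain Python functions; the class name <monospace>WebAgent</monospace>, its methods, the party identifiers, and the example vocabulary are hypothetical and are not taken from the POD-QUERY code base.</p>
      <preformat>
```python
from typing import Callable, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Mapping = Callable[[Graph], Graph]


class WebAgent:
    """Hypothetical mediator: full access to the pod, one view per party."""

    def __init__(self, pod: Graph) -> None:
        self._pod = pod
        self._views: Dict[str, Mapping] = {}

    def register_view(self, party: str, mapping: Mapping) -> None:
        """Record which schema mapping defines the view for this party."""
        self._views[party] = mapping

    def read_view(self, party: str) -> Graph:
        """Compute the party's view on demand; unknown parties see nothing."""
        mapping = self._views.get(party)
        if mapping is None:
            return frozenset()
        return mapping(self._pod)


def survey_view(pod: Graph) -> Graph:
    """Illustrative mapping: expose only step counts, in the client's vocabulary."""
    return frozenset(
        (s, "client:steps", o) for s, p, o in pod if p == "local:stepsPerDay"
    )


# Made-up pod content and party identifiers, for illustration only.
pod = frozenset({
    ("ex:alice", "local:stepsPerDay", "8400"),
    ("ex:alice", "local:homeAddress", "Example Street 1"),
})
agent = WebAgent(pod)
agent.register_view("https://rrs.example/profile#id", survey_view)
print(agent.read_view("https://rrs.example/profile#id"))
print(agent.read_view("https://other.example/profile#id"))
```
      </preformat>
      <p>In this sketch the view is computed on demand at every read; a materializing agent would instead cache the result and refresh it whenever the pod is updated.</p>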
      <p>
        Each schema mapping is based on a SPARQL query that can be executed over a pod to result
in a set of variable bindings. Each binding gives rise to generated triples, which can involve
newly created IRIs through IRI templates [
        <xref ref-type="bibr" rid="ref6">6</xref>
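        ]. Our system POD-QUERY generates, from any
input RML mapping document <italic>M</italic> and any input client SPARQL query <italic>Q</italic>, a SPARQL query <italic>Q</italic>′
such that <italic>Q</italic>′(<italic>D</italic>) = <italic>Q</italic>(<italic>M</italic>(<italic>D</italic>)), for every pod dataset <italic>D</italic>.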
      </p>
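      <p>The correctness condition stated above (evaluating the rewritten query on the pod data gives the same answers as evaluating the client query on the mapped view) can be illustrated with a toy example. The sketch below assumes triples as Python tuples and hand-written functions standing in for the mapping, the client query, and the rewritten query; the vocabulary terms and the IRI template are made up for illustration, and the rewritten query is written by hand rather than produced by the rewriter.</p>
      <preformat>
```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def mapping(base: Set[Triple]) -> Set[Triple]:
    """M: translate one local predicate to the client's vocabulary, hide the rest."""
    view = set()
    for s, p, o in base:
        if p == "local:stepsPerDay":
            # Newly created subject IRI, in the spirit of an IRI template.
            view.add((f"http://example.org/obs/{s}", "client:steps", o))
    return view


def client_query(view: Set[Triple]) -> Set[str]:
    """Q: all step counts exposed in the view."""
    return {o for _, p, o in view if p == "client:steps"}


def rewritten_query(base: Set[Triple]) -> Set[str]:
    """Q': the same answers, computed directly over the base data."""
    return {o for _, p, o in base if p == "local:stepsPerDay"}


pod = {
    ("alice-2023-07-01", "local:stepsPerDay", "8400"),
    ("alice-2023-07-02", "local:stepsPerDay", "10250"),
    ("alice", "local:homeAddress", "hidden"),
}

# Q'(D) = Q(M(D)) on this dataset; the view itself is never materialized by Q'.
assert rewritten_query(pod) == client_query(mapping(pod))
print(sorted(rewritten_query(pod)))
```
      </preformat>
      <p>A single dataset only illustrates the condition; the requirement in the text is that the equality holds for every pod dataset.</p>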
      <p>An example of rewriting is given in Figure 2. Our SPARQL–RML Rewriter software is
publicly available as an independent module [https://github.com/MaximeJakubowski/SRR]. The
consolidated POD-QUERY system software is available as well
[https://github.com/maartyman/solid-aggregator-server/tree/query-rewriting, https://github.com/maartyman/solid-agent].</p>
    </sec>
    <sec id="sec-4">
      <title>4. The demo</title>
      <p>Video: https://vimeo.com/846598633, Online: https://podquery-demo.vito.be</p>
      <p>
        We focus on a personal health data sharing scenario, inspired by the We Are platform in
Flanders [https://we-are-health.be]. Citizens are asked to fill in a health questionnaire known as
GGDM. As this pertains to personal information, answers to the questions are stored in their pod
using a purpose-designed GGDM vocabulary. Now assume a regional research survey (RRS) which asks
people for access to their GGDM data in order to study diabetes. Alice is willing to participate,
but only wants to share selected information. Moreover, for her diabetes status, she refers to her
health record, which was entered directly into her pod at the hospital. This record uses the FHIR
vocabulary [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], however. Thus, Alice instructs her Web agent to invoke two schema mappings
defining her view for RRS: (1) directly retrieve only selected GGDM answers; and (2) transform
her diabetes status from FHIR to GGDM.
      </p>
      <p>Now RRS, contacting Alice’s Web agent, may come with a query to retrieve all available
GGDM answers, on condition that her diabetes status is positive. POD-QUERY will automatically
rewrite this query correctly, checking diabetes status in FHIR and returning only the answers
(e.g., eating habits and exercising) that Alice instructed it to share. As another example, RRS may
ask how many GGDM answers Alice makes available. In general, arbitrary client queries can be
posed, but will be rewritten to answer only what Alice makes available, possibly from diverse
sources.</p>
      <fig id="fig-2">
        <label>Figure 2</label>
        <caption>
          <p>An example of rewriting: a schema mapping in RML, a client query in SPARQL, and the rewritten query.</p>
        </caption>
        <preformat># Schema mapping in RML
:answerSource a rml:LogicalSource ;
  rml:source :sparqlService ;
  rml:referenceFormulation ql:JSONPath ;
  rml:iterator "$.results.bindings[*]" ;
  rml:query """
    SELECT ?question ?completedQuestion
           ?answer ?date ?person ?session
    WHERE {
      ?session
        prov:atTime ?date ;
        prov:wasAssociatedWith ?person .
      ?completedQuestion
        sur:answeredIn ?session ;
        sur:hasAnswer ?answer ;
        sur:completesQuestion ?question .
      FILTER (?question IN (ggdm:question1,
        ggdm:question2, ggdm:question3,
        ggdm:question4, ggdm:question5,
        ggdm:question6, ggdm:question6-1,
        ggdm:question6-2, ggdm:question7,
        ggdm:question7-1, ggdm:question7-2,
        ggdm:question7-3, ggdm:question7-4,
        ggdm:question7-5, ggdm:question7-6,
        ggdm:question7-7, ggdm:question7-8,
        ggdm:question7-9, ggdm:question7-10,
        ggdm:question8-1, ggdm:question9-1,
        ggdm:question10, ggdm:question11,
        ggdm:question12, ggdm:question13,
        ggdm:question14))
    }
  """ .

:completedQuestionTriplesMap a rr:TriplesMap ;
  rml:logicalSource :answerSource ;
  rr:subjectMap [
    rml:reference "completedQuestion.value" ;
    rr:termType rr:IRI ] ;
  rr:predicateObjectMap [
    rr:predicate sur:answeredIn ;
    rr:objectMap [
      rml:reference "session.value" ;
      rr:termType rr:IRI ] ] ;
  rr:predicateObjectMap [
    rr:predicate sur:hasAnswer ;
    rr:objectMap [
      rml:reference "answer.value" ;
      rr:termType rr:IRI ] ] ;
  rr:predicateObjectMap [
    rr:predicate sur:completesQuestion ;
    rr:objectMap [
      rml:reference "question.value" ;
      rr:termType rr:IRI ] ] .

:sessionTriplesMap a rr:TriplesMap ;
  rml:logicalSource :answerSource ;
  rr:subjectMap [
    rml:reference "session.value" ;
    rr:termType rr:IRI ] ;
  rr:predicateObjectMap [
    rr:predicate prov:atTime ;
    rr:objectMap [
      rml:reference "date.value" ;
      rr:datatype xsd:date ] ] ;
  rr:predicateObjectMap [
    rr:predicate prov:wasAssociatedWith ;
    rr:objectMap [
      rml:reference "person.value" ;
      rr:termType rr:IRI ] ] .

# Client query in SPARQL
# How many GGDM questions are available?
SELECT (COUNT(DISTINCT ?completedQuestion) AS ?count)
WHERE {
  ?completedQuestion sur:answeredIn ?session
}

# Rewritten query
SELECT (COUNT(DISTINCT ?completedQuestion) AS ?count)
WHERE {
  { SELECT ?rvar28 ?rvar24 ?rvar29
           ?rvar27 ?rvar25 ?rvar26
    WHERE {
      ?rvar26 prov:atTime ?rvar27 ;
              prov:wasAssociatedWith ?rvar25 .
      ?rvar24 sur:answeredIn ?rvar26 ;
              sur:hasAnswer ?rvar29 ;
              sur:completesQuestion ?rvar28
      FILTER (?rvar28 IN (ggdm:question1,
        ggdm:question2, ggdm:question3,
        ggdm:question4, ggdm:question5,
        ggdm:question6, ggdm:question6-1,
        ggdm:question6-2, ggdm:question7,
        ggdm:question7-1, ggdm:question7-2,
        ggdm:question7-3, ggdm:question7-4,
        ggdm:question7-5, ggdm:question7-6,
        ggdm:question7-7, ggdm:question7-8,
        ggdm:question7-9, ggdm:question7-10,
        ggdm:question8-1, ggdm:question9-1,
        ggdm:question10, ggdm:question11,
        ggdm:question12, ggdm:question13,
        ggdm:question14))
    }
  }
  BIND(?rvar26 AS ?session)
  BIND(?rvar24 AS ?completedQuestion)
}</preformat>
      </fig>
    </sec>
    <sec id="sec-5">
      <title>5. Related work, concluding remarks</title>
      <p>
        The application of views is, of course, very familiar from database systems. Furthermore,
techniques for schema mapping and query rewriting have been developed in information
integration and data exchange [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. For example, the successful Ontop system applies
SPARQL-to-SQL rewriting [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Several researchers have explored the use of views and query rewriting for privacy [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ].
The utility of expressive views as a bridge from the local vocabulary of pod data to client
vocabularies was also pointed out by Arndt and Van Woensel [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], but query rewriting was not
considered.
      </p>
      <p>The novelty of our work lies in showing the applicability of these techniques and adapting
them to the new Solid pod context. Furthermore, our use of RML to specify RDF-to-RDF
mappings is new, as is our SPARQL-to-SPARQL rewriting algorithm. Our algorithm (details to
be published separately) involves an optimization technique that checks the satisfiability of sets of
equalities obtained by matching the triple patterns of the basic graph pattern (BGP) against the RML mapping heads.</p>
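      <p>Since the details of the algorithm are to be published separately, the following is not the authors’ algorithm but only a sketch of the general idea of such a satisfiability check, under simplifying assumptions: terms are plain strings, variables start with a question mark, mapping heads are single triples whose variables have already been renamed apart from the query variables, and IRI templates are not modeled. All function names are hypothetical. Matching a triple pattern against a head yields one equality per position, and a head can be discarded as soon as the equalities would force two distinct constants to be equal.</p>
      <preformat>
```python
from typing import Dict, Iterable, List, Tuple

Term = str  # assumption: variables start with "?", all other terms are constants
Triple = Tuple[Term, Term, Term]


def is_var(term: Term) -> bool:
    return term.startswith("?")


def satisfiable(equalities: Iterable[Tuple[Term, Term]]) -> bool:
    """Equalities are satisfiable unless they force two distinct constants equal."""
    parent: Dict[Term, Term] = {}

    def find(term: Term) -> Term:
        # Follow parent links to the representative of the term's class.
        parent.setdefault(term, term)
        while parent[term] != term:
            term = parent[term]
        return term

    for left, right in equalities:
        root_l, root_r = find(left), find(right)
        if root_l == root_r:
            continue
        if not is_var(root_l) and not is_var(root_r):
            return False  # two different constants would have to coincide
        # Keep a constant as the representative of a class whenever one exists.
        if is_var(root_l):
            parent[root_l] = root_r
        else:
            parent[root_r] = root_l
    return True


def matching_heads(pattern: Triple, heads: List[Triple]) -> List[Triple]:
    """Keep only the mapping heads that the triple pattern can possibly match."""
    return [head for head in heads if satisfiable(zip(pattern, head))]


# Heads loosely modeled on the two triples maps of the demo mapping.
heads = [
    ("?h_completedQuestion", "sur:answeredIn", "?h_session"),
    ("?h_session", "prov:atTime", "?h_date"),
]
pattern = ("?completedQuestion", "sur:answeredIn", "?session")
print(matching_heads(pattern, heads))
```
      </preformat>
      <p>In this toy setting, the pattern over <monospace>sur:answeredIn</monospace> is kept for the first head and pruned for the second, because the predicates are distinct constants.</p>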
      <p>The design of interfaces for configuring the schema mappings in Web agents is an important
direction for future research.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the Flanders AI Research Program, and also partly funded by the
SolidLab Vlaanderen project (Flemish Government, EWI and RRF project VV023/10) and the
FWO Project FRACTION (Nr. G086822N). We are indebted to Thomas Delva for his contributions
to an earlier stage of this work, and to Bart Bogaerts, Anastasia Dimou, and Ruben Verborgh
for inspiring discussions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>André</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <article-title>Managing your digital life</article-title>
          ,
          <source>CACM</source>
          <volume>59</volume>
          (
          <year>2015</year>
          )
          <fpage>32</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>A review of the Semantic Web field</article-title>
          ,
          <source>CACM</source>
          <volume>64</volume>
          (
          <year>2021</year>
          )
          <fpage>76</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mansour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sambra</surname>
          </string-name>
          , et al.,
          <article-title>A demonstration of the Solid platform for social web applications</article-title>
          ,
          <source>in: 25th WWW Companion</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Taelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Van Herwegen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vander Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <article-title>Comunica: A modular SPARQL query engine for the Web</article-title>
          ,
          <source>ISWC</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>239</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vander Sande</surname>
          </string-name>
          , et al.,
          <article-title>RML: A generic language for integrated RDF mappings of heterogeneous data</article-title>
          ,
          <source>LDOW</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cogrel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Komla-Ebri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rezk</surname>
          </string-name>
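          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez-Muro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,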
          <article-title>Ontop: Answering SPARQL queries over relational databases</article-title>
          ,
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <year>2017</year>
          )
          <fpage>471</fpage>
          -
          <lpage>487</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sartipi</surname>
          </string-name>
          ,
          <article-title>HL7 FHIR: An agile and RESTful approach to healthcare information exchange</article-title>
          ,
          <source>CBMS</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>326</fpage>
          -
          <lpage>331</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          ,
          <source>Principles of Data Integration</source>
          , Morgan Kaufmann,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barceló</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Murlak</surname>
          </string-name>
          ,
          <source>Foundations of Data Exchange</source>
          , Cambridge University Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonifati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Comignani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tsamoura</surname>
          </string-name>
          ,
          <article-title>Exchanging data under policy views</article-title>
          ,
          <source>EDBT</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Oulmakhzoune</surname>
          </string-name>
          et al.,
          <article-title>Privacy query rewriting algorithm instrumented by a privacy-aware access control model</article-title>
          ,
          <source>Ann. des Télécommunications</source>
          <volume>69</volume>
          (
          <year>2014</year>
          )
          <fpage>3</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Endris</surname>
          </string-name>
          ,
          <article-title>PURE: A privacy aware rule-based framework over knowledge graphs</article-title>
          ,
          <source>DEXA</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Arndt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Van Woensel</surname>
          </string-name>
          ,
          <article-title>Decentralization rules: Linking Solid pods in different vocabularies using N3</article-title>
          ,
          <source>First DKG Workshop</source>
          ,
          <year>2022</year>
          . https://paul.ti.rw.fau.de/~ec69etyl/2022/dkg-22/DKG-22_paper_7.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>