<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SPARQL-parser: Query Rewriting for Authorization in the semantic.works Microservice Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tom De Nies</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aad Versteden</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>In this paper, we demonstrate SPARQL-parser, the latest iteration of our authorization service in the semantic.works framework, which provides a developer-friendly way to create applications and microservices using Linked Data. SPARQL-parser allows developers to query the database transparently, without having to consider access rights for the current user. The service rewrites any SPARQL query based on its session headers, and makes use of a graph-based data division strategy for separate user groups. This accommodates a combination of public and private data, without adding complexity to the application logic. In addition, the service emits delta notifications after executing INSERT/DELETE queries, to which other microservices can subscribe, allowing developers to build a reactive, data-driven application.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The semantic.works platform [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] enables building applications with Linked Data at the core: all data is Linked Data from the
start. This has many advantages, the foremost being easy, conversion-free dissemination and integration
with other applications. It does, however, come with its own challenges. While Linked Data is often
equated with open data, in realistic scenarios public data is interlaced with private data. Many of our
projects deal with governmental data, where public documents are mixed with private or sensitive
information such as contracts and digitally signed legislation, so the need for an authorization
microservice in the architecture is clear.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we presented a previous iteration of such a microservice, mu-authorization, which faced a
number of challenges: (timed) data distribution, data duplication, query execution overhead and search
indexing speed. Additionally, its configuration in the Elixir programming language was verbose and a
bottleneck in terms of runtime, affecting query performance.
      </p>
      <p>One of the key features of mu-authorization is the generation of so-called deltas: any INSERT or
DELETE on the database triggers a notification message, informing subscribed microservices of the
changes made by the user (or service). This is a powerful concept, enabling reactive services to be
built into an application. In the previous iteration, however, services were only given the URIs of
resources that had changed, without the effective changes that were made. Reactive microservices
therefore had to issue additional queries, which affected performance.</p>
      <p>Lastly, there was often a need to bypass the authorization component when executing queries without
an active user session, e.g., in the context of a cronjob. This added code complexity, since these queries
had to include the correct graph statements, and introduced a potential security risk, since these services
then had read/write access to the whole database, requiring additional development effort to properly
shield their access points.</p>
      <p>We have now rewritten the component as SPARQL-parser, with the following improvements:</p>
      <list list-type="order">
        <list-item>
          <p>Simplified configuration in Lisp, allowing “scoped” requests, where microservices can get their own access rights to specific graphs and resources, improving code clarity and security.</p>
        </list-item>
        <list-item>
          <p>Delta notifications now include effective changes, instead of only the related resources as was the case in the previous version. This allows for more efficient handling of these deltas, especially in terms of caching and data distribution.</p>
        </list-item>
        <list-item>
          <p>Improved performance, especially in read-only scenarios, which has significantly sped up search indexing.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Authorization of SPARQL endpoints is by no means an unexplored problem. While a full overview
of the literature is impractical for this demonstration, surveys of related work exist in [3], [4], and [5].
Following the classification scheme of [3], our approach can be placed somewhere between Role Based
Access Control and Context Based Access Control. While we do allow contextual access rules, in practice
the complexity is often limited to the user role, in order to preserve query efficiency. When used as
a standalone entry point, our approach is closely related in concept to an “authorization proxy”, as
described in [6]. The main difference is that SPARQL-parser is meant to exist in an ecosystem
of closely related microservices, where it provides added value thanks to its emission of delta events
and its transparency to other microservices when reading and writing data.</p>
    </sec>
    <sec id="sec-3">
      <title>3. SPARQL-parser</title>
      <p>In Figure 1, we provide a high-level overview of a typical semantic.works application stack.
SPARQL-parser is part of the “Store &amp; Sync” module, which handles all requests coming from the microservices
(and, indirectly, the frontend).</p>
      <p>
        The general principle of our authorization approach as described in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] has not changed: data in the
triplestore is organized into graphs, and read access to these graphs is granted per user based on a
set of criteria, usually association with a certain group and/or role. Write access is restricted
even further, to a specified set of resource types and predicates. SPARQL-parser
then rewrites incoming queries by adding the correct graphs based on the session information of the
user and the access rights on the data, and forwards the rewritten query to the triplestore. Paired with
identification &amp; authentication services, this allows the application frontend and other microservices
to transparently query the triplestore, without having to implement any custom authorization logic
themselves.
      </p>
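      <p>As a rough illustration of the rewriting step, the following Python sketch adds one FROM clause per readable graph to a simple SELECT query. It is not the SPARQL-parser implementation, which works on a parsed query: the names GRAPHS_BY_GROUP, readable_graphs and add_from_clauses, the group-to-graph mapping, and the “reader” graph suffix are all assumptions made for this example.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical mapping from allowed group to readable graphs (an assumption,
# not the actual SPARQL-parser configuration format).
GRAPHS_BY_GROUP = {
    "public": ["http://mu.semte.ch/graphs/public"],
    "privatebooks": ["http://mu.semte.ch/graphs/privatebooks/reader"],
}


def readable_graphs(allowed_groups):
    """Collect the graphs readable by a session, in order, without duplicates."""
    graphs = []
    for group in allowed_groups:
        for graph in GRAPHS_BY_GROUP.get(group, []):
            if graph not in graphs:
                graphs.append(graph)
    return graphs


def add_from_clauses(query, allowed_groups):
    """Insert one FROM clause per readable graph before the first WHERE keyword.

    Naive string rewriting for illustration only; a real rewriter operates on
    the parsed query and also has to handle GRAPH blocks, updates and subqueries.
    """
    graphs = readable_graphs(allowed_groups)
    if not graphs:
        raise PermissionError("session has no readable graphs")
    from_clauses = "".join(f"FROM <{g}>\n" for g in graphs)
    # Only the first WHERE (case-insensitive) is touched; its casing is kept.
    return re.sub(r"(?i)\bWHERE\b", lambda m: from_clauses + m.group(0), query, count=1)


query = "SELECT ?book\nWHERE { ?book a <http://schema.org/Book> }"
anonymous = add_from_clauses(query, ["public"])
reader = add_from_clauses(query, ["public", "privatebooks"])
print(reader)
```
]]></preformat>
      <p>The same incoming query thus targets a different set of graphs depending on the allowed groups of the session, which is why the calling service does not need any authorization logic of its own.</p>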
      <p>A major feature of SPARQL-parser is the emission of delta messages. When a user (or service) writes
data to the triplestore, SPARQL-parser generates a delta message, which includes the inserted quads,
deleted quads, effective changes, and the allowed groups of the original request. This effectively allows
a microservice to react on behalf of the user, based on the changes they made to the data.</p>
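      <p>The distinction between requested and effective changes can be sketched as follows in Python. The in-memory set of quads, the Delta class and its field names are assumptions for illustration; they do not reproduce the actual delta message format of SPARQL-parser.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field

# A quad is modelled as a (graph, subject, predicate, object) tuple of strings.
# This model and all field names below are assumptions of the sketch.


@dataclass
class Delta:
    inserts: list
    deletes: list
    effective_inserts: list
    effective_deletes: list
    allowed_groups: list = field(default_factory=list)


def apply_update(store, inserts, deletes, allowed_groups):
    """Apply deletes, then inserts, to a set of quads and describe the change.

    A delete is effective only if the quad was present; an insert is effective
    only if the quad was absent at the time it is added.
    """
    effective_deletes = [q for q in deletes if q in store]
    store.difference_update(effective_deletes)
    effective_inserts = []
    for quad in inserts:
        if quad not in store:
            store.add(quad)
            effective_inserts.append(quad)
    return Delta(list(inserts), list(deletes),
                 effective_inserts, effective_deletes, list(allowed_groups))


G = "http://mu.semte.ch/graphs/public"
book = "http://example.org/books/1"
old = (G, book, "http://schema.org/genre", "fantasy")
new = (G, book, "http://schema.org/genre", "science fiction")
typed = (G, book, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://schema.org/Book")

store = {old, typed}
delta = apply_update(store, inserts=[new, typed], deletes=[old], allowed_groups=["public"])
print(delta.effective_inserts)
```
]]></preformat>
      <p>In this example the type triple is requested again but already present, so only the new genre triple counts as an effective insert. A subscriber that looks at the effective changes does not need to query the store to find out what really changed.</p>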
      <p>The code for SPARQL-parser is open source, available at https://github.com/mu-semtech/sparql-parser. A basic configuration is structured as follows (omitting boilerplate code):</p>
      <preformat>
(define-graph public ("http://mu.semte.ch/graphs/public")
  (_ -&gt; _)) ; public allows ANY TYPE -&gt; ANY PREDICATE
            ; in the direction of the arrow

(define-graph privatebooks ("http://mu.semte.ch/graphs/privatebooks/")
  ("schema:Book"
   -&gt; "schema:genre"
   -&gt; "dct:creator"
   -&gt; "dct:issued"))

(supply-allowed-group "public")

(grant (read)
  :to-graph public
  :for-allowed-group "public")

(supply-allowed-group "privatebooks"
  :query "PREFIX org: &lt;http://www.w3.org/ns/org#&gt;
          PREFIX ext: &lt;http://mu.semte.ch/vocabularies/ext/&gt;
          PREFIX skos: &lt;http://www.w3.org/2004/02/skos/core#&gt;
          SELECT DISTINCT ?role_label WHERE {
            &lt;SESSION_ID&gt; ext:sessionMembership / org:role ?role .
            ?role skos:notation ?role_label .
            VALUES ?role { &lt;https://example.org/privateBookReader&gt; } .
          }"
  :parameters ("role_label"))

(grant (read)
  :to-graph privatebooks
  :for-allowed-group "privatebooks")

(with-scope "service:privatebook-service"
  (grant (write)
    :to-graph privatebooks
    :for-allowed-group "public"))
      </preformat>
      <p>In this example, a public and a private graph are defined, which will house publicly available books
and private ones, respectively. The (_ -&gt; _) statement means that the public graph may contain
resources of any type with any predicate. The privatebooks graph, on the other hand, is restricted to
resources of type schema:Book with a specific set of predicates.</p>
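      <p>One way to read these type and predicate constraints is as a lookup that decides whether a triple fits a graph. The Python sketch below mirrors the two graph definitions from the listing; the dictionary layout and the function name may_contain are hypothetical and only serve to make the semantics of the arrow notation concrete.</p>
      <preformat><![CDATA[
```python
# Hypothetical in-memory form of the two graph definitions:
# graph name -> {resource type -> allowed predicates}; "_" is a wildcard.
GRAPH_SPECS = {
    "public": {"_": {"_"}},
    "privatebooks": {
        "schema:Book": {"schema:genre", "dct:creator", "dct:issued"},
    },
}


def may_contain(graph, resource_types, predicate):
    """Return True if a triple with this predicate, on a subject that has the
    given rdf:type values, fits the type -> predicate constraints of the graph."""
    spec = GRAPH_SPECS.get(graph, {})
    for type_pattern, predicates in spec.items():
        type_matches = type_pattern == "_" or type_pattern in resource_types
        predicate_matches = "_" in predicates or predicate in predicates
        if type_matches and predicate_matches:
            return True
    return False


print(may_contain("public", {"foaf:Person"}, "foaf:name"))
print(may_contain("privatebooks", {"schema:Book"}, "schema:genre"))
print(may_contain("privatebooks", {"schema:Book"}, "schema:isbn"))
```
]]></preformat>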
      <p>The statement (supply-allowed-group "public") supplies the group “public” to any incoming
request, and the grant (read) statement provides read access to the public graph for this group. This
means that any user, including unauthenticated ones, has read access to the public graph of the triplestore, which is
realistic in a scenario where public data is disseminated.</p>
      <p>Below that, we see that the supply-allowed-group "privatebooks" statement is more complex,
and includes a template query that checks whether the session is associated with a user that has a role
that allows the user to read private books. Additionally, the role_label parameter will be appended to
the http://mu.semte.ch/graphs/privatebooks/ URL for every query, ensuring data separation
for each user role.</p>
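      <p>The effect of the parameters can be illustrated with a few lines of Python. The sketch assumes that each result row of the membership query yields one graph, and that the bound value is appended directly to the base URI; the function names, the “reader” label, and the handling of more than one parameter (plain concatenation) are assumptions, since the example configuration only uses a single parameter.</p>
      <preformat><![CDATA[
```python
def graph_uri_for(base_uri, parameters, row):
    """Append the values bound to the configured parameters to the base graph URI.

    `row` stands for one result row of the group's membership query. Plain
    concatenation for several parameters is an assumption of this sketch.
    """
    return base_uri + "".join(str(row[name]) for name in parameters)


def graphs_for_session(base_uri, parameters, rows):
    """One graph per result row; no rows means the session is not in the group."""
    return [graph_uri_for(base_uri, parameters, row) for row in rows]


base = "http://mu.semte.ch/graphs/privatebooks/"
member = graphs_for_session(base, ["role_label"], [{"role_label": "reader"}])
outsider = graphs_for_session(base, ["role_label"], [])
print(member)
```
]]></preformat>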
      <p>Based on these allowed groups, many optimizations become possible, since we know that the same
allowed groups will have access to the same data. For example, in our real-world implementations, we
use this for caching, search indexing, and push-updates.</p>
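      <p>To make the caching argument concrete, the following Python sketch keys cached results on the query together with the set of allowed groups, so that sessions with identical access rights share an entry. The QueryCache class, the cache_key function and the representation of allowed groups as (name, variables) pairs are hypothetical; cache invalidation, which in practice would be driven by deltas, is left out.</p>
      <preformat><![CDATA[
```python
import hashlib
import json


def cache_key(query, allowed_groups):
    """Build a key from the query and the *set* of allowed groups.

    `allowed_groups` is a list of (name, variables) pairs, for example
    ("privatebooks", ("reader",)). Order and duplicates do not matter.
    """
    normalized = sorted({(name, tuple(variables)) for name, variables in allowed_groups})
    payload = json.dumps([query, normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class QueryCache:
    """Caches results per (query, allowed groups) instead of per user."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable(query, allowed_groups) -> result
        self._results = {}

    def get(self, query, allowed_groups):
        key = cache_key(query, allowed_groups)
        if key not in self._results:
            self._results[key] = self._run_query(query, allowed_groups)
        return self._results[key]


executed = []


def run_query(query, allowed_groups):
    executed.append(query)  # stands in for a round trip to the triplestore
    return f"rows for {query}"


cache = QueryCache(run_query)
q = "SELECT * WHERE { ?s ?p ?o }"
cache.get(q, [("public", ()), ("privatebooks", ("reader",))])
cache.get(q, [("privatebooks", ("reader",)), ("public", ())])  # same groups, other order
cache.get(q, [("public", ())])  # different access rights, separate entry
print(len(executed))
```
]]></preformat>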
      <p>Finally, we see a service that receives scoped write access to the privatebooks graph, even when the
only allowed group it has is “public”. This means that this service will be able to write all resource types
and predicates specified for the privatebooks graph into this graph, and this graph only. This allows the
service to work without the need for an active session, enabling it to read &amp; write data while reacting
to deltas, or when executing cronjobs, for example.</p>
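      <p>The interplay between grants, allowed groups and scopes can be sketched as a small lookup in Python. The Grant class and has_right function are hypothetical, and the sketch assumes that unscoped grants apply to every request while scoped grants only apply when the request carries the matching scope; how a request declares its scope is not modelled here.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    right: str                   # "read" or "write"
    graph: str                   # graph name as used in the configuration
    allowed_group: str
    scope: Optional[str] = None  # None: grant applies regardless of scope (assumption)


# Mirrors the three grant statements of the example configuration.
GRANTS = [
    Grant("read", "public", "public"),
    Grant("read", "privatebooks", "privatebooks"),
    Grant("write", "privatebooks", "public", scope="service:privatebook-service"),
]


def has_right(right, graph, allowed_groups, scope=None):
    """True if some grant gives this right on the graph to one of the allowed
    groups, and the grant is either unscoped or matches the request's scope."""
    return any(
        g.right == right
        and g.graph == graph
        and g.allowed_group in allowed_groups
        and (g.scope is None or g.scope == scope)
        for g in GRANTS
    )


print(has_right("write", "privatebooks", ["public"], scope="service:privatebook-service"))
print(has_right("write", "privatebooks", ["public"]))
```
]]></preformat>
      <p>A request with only the “public” group can write to the privatebooks graph when it carries the service scope, and cannot do so otherwise.</p>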
    </sec>
    <sec id="sec-4">
      <title>4. Demonstration</title>
      <p>Our demonstrator is hosted at https://authorization-demo.redpencil.io. This small web application
showcases the two main features of SPARQL-parser, access rights and reactive services, as well as
more generic features of the semantic.works framework. All code is open source, available at
https://github.com/tdn/app-auth-demo/ (backend) and https://github.com/tdn/frontend-auth-demo (frontend).
The backend code is relevant to this paper in particular, as it includes a more complete authorization
configuration than the example listed in Section 3.</p>
      <p>When accessing the demonstrator webpage, the user is provided with a YASGUI [7] query editor,
through which a portion of publicly available data can be queried right away. The public graph was
seeded with a number of RDF resources of type schema:Book, with properties such
as author, genre, issue date, Wikidata reference, etc.</p>
      <p>When using the mock-login route, however, we can simulate an authenticated user, for whom
private data becomes available. As the demonstrator explains, the same SPARQL query will yield
different results, depending on whether the user is authenticated or not.</p>
      <p>To demonstrate write access, we allow authenticated users to store “favorites”, either through the
user interface or by writing a SPARQL INSERT query. When the user attempts to write other
data, SPARQL-parser stops the write and the SPARQL endpoint returns an error. In
other words, users only see what they are allowed to see, and can only write what they are allowed to
write.</p>
      <p>Finally, to showcase scoped write access, we created a QR-code generating service, which reacts to
any “favorite” being written to the system by generating a QR code for the URI of the
favorited book, and storing it in the database. This demonstrates that even though the user has no
write access for these types of resources, a scoped service can still get the necessary rights to do so.</p>
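      <p>The reactive side of such a service can be sketched in Python as a filter over an incoming delta. The dictionary layout of the delta, the key effective_inserts, the favorite predicate and the example URIs are all hypothetical; the actual QR code generation and the scoped write back to the triplestore are omitted.</p>
      <preformat><![CDATA[
```python
# Hypothetical predicate used to mark a favorite; the demonstrator may model
# favorites differently.
FAVORITE_PREDICATE = "http://example.org/vocab/favorite"


def favorited_books(delta):
    """Return the object URIs of effectively inserted favorite triples,
    in order of appearance and without duplicates."""
    books = []
    for quad in delta.get("effective_inserts", []):
        if quad["predicate"] == FAVORITE_PREDICATE and quad["object"] not in books:
            books.append(quad["object"])
    return books


delta = {
    "allowed_groups": ["public", "privatebooks"],
    "effective_inserts": [
        {"subject": "http://example.org/users/1",
         "predicate": FAVORITE_PREDICATE,
         "object": "http://example.org/books/42"},
        {"subject": "http://example.org/users/1",
         "predicate": "http://xmlns.com/foaf/0.1/name",
         "object": "Alice"},
    ],
    "effective_deletes": [],
}

for book_uri in favorited_books(delta):
    # A real service would generate a QR code for book_uri here and store it
    # through a scoped write; this sketch only reports the URI.
    print("would generate QR code for", book_uri)
```
]]></preformat>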
    </sec>
    <sec id="sec-5">
      <title>5. Discussion &amp; Future Work</title>
      <p>Since its first stable release in 2024, we have deployed SPARQL-parser in multiple production
environments such as Rollvolet’s CRM<sup>1</sup>, the Local Mandatee management app of LBLOD<sup>2</sup>, the Veeakker
webshop<sup>3</sup>, and the Flemish Government’s decision-making support platform Kaleidos<sup>4</sup>.</p>
      <p>While we have not noticed significant write speed improvements (or deteriorations) in these apps
compared to mu-authorization, we did measure a read speed increase in Kaleidos. Its Elasticsearch
indexer, which performs thousands of read queries in rapid succession, has sped up by a factor of 3
(from approx. 36 hours to approx. 12 hours for a full reindex).</p>
      <p>In future work, we aim to improve SPARQL-parser further by working on read-only resource
constraints for graphs, since read access is currently generalized per graph, regardless of resource type.
This will create less load on the triplestore and improve caching. Additionally, we want to extend
graph specifications and scopes with richer constraints, allowing enhanced model validation and
basic forward reasoning. Lastly, while the current implementation supports reading from and writing to multiple
triplestores, we are looking into the possibility of having SPARQL-parser act as a Linked Data proxy to
other types of data stores that are not necessarily triplestores.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <sec id="sec-6-1">
        <p>The authors have not employed any Generative AI tools in this paper.</p>
      </sec>
      <sec id="sec-6-2">
        <p><sup>1</sup> https://github.com/rollvolet/app-crm/</p>
        <p><sup>2</sup> https://github.com/lblod/app-lokaal-mandatenbeheer/</p>
        <p><sup>3</sup> https://github.com/veeakker/app-veeakker-webshop/</p>
        <p><sup>4</sup> https://github.com/kanselarij-vlaanderen/app-kaleidos/</p>
        <p>[3] S. Kirrane, A. Mileo, S. Decker, Access control and the resource description framework: A survey, Semantic Web 8 (2016) 1–42. doi:10.3233/SW-160236.</p>
        <p>[4] V. Zdraveski, D. Trajanov, R. Stojanov, S. Stojanova, M. Jovanovik, Ranking semantic web authorization systems, Semantic Web (2017).</p>
        <p>[5] T. G. da Silva, Access control in linked data archives (2023).</p>
        <p>[6] R. Stojanov, M. Jovanovik, Authorization proxy for SPARQL endpoints, in: ICT Innovations 2017, Springer International Publishing, Cham, 2017, pp. 205–218.</p>
        <p>[7] L. Rietveld, R. Hoekstra, YASGUI: not just another SPARQL client, in: Extended Semantic Web Conference, Springer, 2013, pp. 78–86.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Versteden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pauwels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Papantoniou</surname>
          </string-name>
          ,
          <article-title>An ecosystem of user-facing microservices supported by semantic models</article-title>
          ,
          <source>USEWOD-PROFILES@ESWC</source>
          <volume>1362</volume>
          (
          <year>2015</year>
          )
          <fpage>12</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>De Nies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Versteden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pauwels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Delaure</surname>
          </string-name>
          ,
          <article-title>Combining public and private linked data through graph-based authorization profiles in the semantic.works framework</article-title>
          , in: QuWeDa/MEPDaW@ISWC,
          <year>2023</year>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>