<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>methodological approach to incremental view maintenance for optimizing SPARQL queries in Solid</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dore Staquet</string-name>
          <email>dore.staquet@uhasselt.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bart Buelens</string-name>
          <email>bart.buelens@vito.be</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Van den Bussche</string-name>
          <email>jan.vandenbussche@uhasselt.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Science Institute, Hasselt University</institution>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>This contribution explores innovative strategies for eficient information retrieval in large-scale Solid deployments through the development of incremental view maintenance techniques. Assuming a pivotal role for web agents and aggregators in managing SPARQL queries within decentralized data architectures, we propose the adaptation of established relational database methodologies, specifically the counting algorithm and propagation rules, for the maintenance of materialized views in RDF (Resource Description Framework) data settings. Our research critically assesses the applicability of these algorithms to RDF, addressing the unique challenges posed by SPARQL and linked data's graph nature. We demonstrate the incremental maintainability of aggregation functions like COUNT, SUM, and AVG, while highlighting the limitations for functions such as MIN, MAX, and SAMPLE. By formulating these methodologies using SPARQL algebra, we set the stage for practical implementations that significantly enhance query response times without necessitating full data re-computation. This approach not only underscores the feasibility of applying relational database concepts to linked data but also sets a foundational framework for future research aimed at optimizing data retrieval processes in Solid-based applications and ecosystems at web-scale.</p>
      </abstract>
      <kwd-group>
        <kwd>RDF</kwd>
        <kwd>linked data querying</kwd>
        <kwd>materialized view</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        In large-scale Solid [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] deployments, eficient information retrieval will be crucial. We adopt
the model of web agents, which can be seen as decentralized data processing entities [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. One
important capability of such agents is responding to queries submitted to Solid pods. Queries
can be submitted by partners – parties engaging directly with solid pods, or by aggregators
– intermediary service providers that can be accessed by other aggregators or partners [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
SPARQL queries drive the information retrieval process in this decentralized data architecture.
In our research, we envisage both web agents and aggregators to maintain materialized views
to swiftly respond to SPARQL queries. Figure 1 provides a schematic overview of our setting, in
this case with three pods and agents providing their data to several agents and one aggregator.
In database design and operations generally, maintaining materialized views on the underlying
(J. V. d. Bussche)
      </p>
      <p>eb Agent
W
 1</p>
      <p>eb Agent
W</p>
      <p>2
Aggregator</p>
      <p>eb Agent
W
 3</p>
      <p>
        3
 4

4
data is a key facilitator of eficient information retrieval, certainly so in decentralized models.
Materialized views allow to eficiently respond to repeated querying. Often, identical queries are
repeated over time to take modifications of the underlying data into account. View maintenance
is the approach to update views following changes in the underlying data. A baseline strategy
to update a view following a change in the data is to re-execute the query and recompute the
view from scratch. In this work, we develop strategies to do this incrementally, avoiding full
re-computation where possible, by using the changes in the data only as opposed to the full
data set needed to compute the view. Our approach to incremental view maintenance in a
Solid setting is based on theory and algorithms established for relational database management
systems, which we will apply to linked data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in RDF (Resource Description Framework), the
data standard used in Solid [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Related work</title>
      <p>
        Incremental view maintenance has a history in the relational setting, where it has been studied
extensively [
        <xref ref-type="bibr" rid="ref7">7, 8, 9</xref>
        ]. To extend on this in regards to SPARQL, where the LeftJoin (OPTIONAL),
is very important, there has also been research on outer joins in incremental view maintenance.
Larson and Zhou propose an alternative [10] to Qian and Wiederhold’s [8] propagation rules.
Also Grifin and Kumar expand propagation rules for outer join and semijoin [ 11]. Regarding
multiset semantics, there was a proposal which include duplicate elimination by Grifin and
Libkin [12]. DBToaster, a state-of-the-art system in the relational setting for incremental view
maintenance, also works with multiset semantics [13]. There has been published work on
incremental maintenance on linkset views in SPARQL [14], but to the best of our knowledge
not yet on incremental view maintenance for general SPARQL patterns.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Methods</title>
      <p>We revisit methods known from the relational database literature and study how they can be
modified and adapted to our needs, to operate on RDF data.</p>
      <p>We envisage these methods to be implemented as capabilities of web agents, as illustrated in
Figure 1. The web agents in our setup run server-side close to the RDF data, which is stored
in Solid pods. A key capability of such a web agent is exposing a SPARQL endpoint, which
is not part of the Solid specification. The web agent acts as a layer on top of the Solid pod,
maintaining views on the data inside the pod, providing functionalities to set up new views for
clients. Aggregators too are server side components that maintain aggregated views integrating
data from multiple pods.</p>
      <p>First, we consider the counting algorithm [15] and its applicability in a SPARQL and linked
data setting. The counting algorithm is, in the relational setting, one of the most general
solutions to incremental view maintenance, i.e., updating the view using only the data changes
and not recomputing the entire view. The algorithm works by keeping track of the count of
tuples – or triples, for RDF data – that contribute to each value in the view. The algorithm
supports all of relational algebra, plus aggregation for incrementally computable aggregate
functions. Extending this algorithm from relational databases, what it was designed for, to
RDF requires handling forms of complexity that are characteristic of SPARQL and of the graph
nature of the data. This is a topic of our ongoing research.</p>
      <p>We formulate the incremental view maintenance issue as follows. Let  be an RDF dataset,
and  a SPARQL query. An update to the dataset is written as Δ . The original view based on
the query  ,   () , is to be recomputed taking the change into account, giving rise to an updated
view   (, Δ) . Incremental view maintenance aims to obtain an eficiently computable Δ 
such that   (, Δ) =   () + Δ  (Δ) , avoiding the need to access the full data  again to
update the view.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Results</title>
      <p>Before detailing algorithm specifics, we briefly reflect on the incremental maintainability of
certain views in the first place. In particular, we study the aggregator functions commonly used
in SPARQL, and note that the aggregation functions COUNT, SUM and AVG are incrementally
maintainable, while MIN, MAX and SAMPLE are not, at least not always. For example MAX:
when the query  selects the maximum of some values in  ,   () is the maximum value; if
Δ entails the deletion of any value other than the maximum, the new maximum is equal to
the old one; however when the first maximum is deleted, a new one needs to be computed –
for which the complete data set is needed again. We provide proofs or counterexamples of the
common operators.</p>
      <p>For example, based on Gupta and Mumick’s work[15], an aggregation with the rule
 (,  , ) ←    ((,  , ), [,  ],  =    ()) where rule  is defined as (,  ,  1 +
 2) ← (, ,  1), (,  ,  2). As illustrated in Figure 2, Graph  showcases an issue that
encounters when deleting a node with a minimum aggregation in the view maintenance. In
Figure 2a Graph  has  = ((, , 5)) as a result. When adding a node, as seen in Figure 2b, the
minimum is maintained incrementally as it should only compute a new minimum when the
edges relating to the new addition equate to the new minimum. In Figure 2b  = ((, , 2)) .
However when deleting a node, a problem occurs where the view can no longer be incrementally
maintained. Figure 2c shows that when node  gets deleted, there is no way of knowing what
the new minimum should be without having to re-evaluate the entire graph from scratch. The
result in Figure 2c is  = ((, , 6)) , but this can only be computed by recomputation.</p>
      <p>To investigate the algorithms mentioned in the previous section, we express SPARQL queries
2
d
b
c
3
2
a
d
b
c
3
(a) Graph G.</p>
      <p>(b) Addition of node  .</p>
      <p>(c) Deletion of node  .
in formal SPARQL algebra, allowing formal derivations and proofs, and serving as instructions
when implementing SPARQL query plans in practice.</p>
      <p>A foundational algorithm for incremental view maintenance in relational database systems
is the counting algorithm [15]. We adapt this algorithm to work for RDF data. Now, counts
of solution mappings must be kept and the required updates to views specified. We address a
number of issues. An important one, for example, is the handling of NULL values – a common
concept in relational databases. In RDF, or linked data more generally, NULL is not stated,
rather, that specific triple is simply not present. In SPARQL, selecting such data can be achieved
using the OPTIONAL function.</p>
      <p>Propagation rules [8, 9] ofer a fairly straightforward optimization recipe for incremental
view maintenance, once the rules are established and proven correct. We address this latter
aspect for RDF data by revisiting the rules for relational data and rewriting them for RDF. This
includes rules for diferences, unions, outerjoins, semijoins, etc.</p>
      <p>For example, illustrating an incremental view maintenance rule on a SPARQL context, assume
a Basic Graph Pattern rule where G is the graph on which the query is computated. This rule
could look like (,  ) ← (, foaf:name,  ), (, foaf:mbox, ) . Furthermore Δ represents
a changes made to the base graph. Following the definitions of the delta rules or counting
algorithm, the delta ruling for Δ becomes
Δ(,  ) ← Δ(,
Δ(,  ) ← 
foaf:name,  ), (,</p>
      <p>foaf:mbox, )
′(, foaf:name,  ), Δ(,
foaf:mbox, )
Here,  ′ =  + Δ denotes the updated graph. Now  ′ =  + Δ achieves
the desired new relation by calculation of the changes to the view. To
visualise this, take a graph shown in Figure 3a where there exists a graph
(( :b, foaf:name, Bob), (:b, foaf:mbox, bob@mail), (:b, foaf:mbox, bob@vito.be)). For this
graph we see that (( :b, Bob) ∗ 2) within the counting algorithm, derived twice here as
Bob possesses two mailboxes. Removing one of the mailboxes generates the change to the
graph as Δ(( :b, foaf:mbox, bob@mail) ∗ −1) indicating that this triple gets deleted once from
the base graph. By executing the counting algorithm Δ becomes ((:b, Bob) ∗ −1), making
 ′ = ((:b, Bob) ∗ 1). By Figure 3b it is evident that this is correct.</p>
      <p>:b
foaf:name</p>
      <p>Bob
:b
foaf:name</p>
      <p>Bob
foaf:mbox
foaf:mbox</p>
      <p>foaf:mbox
(a) Graph  .</p>
    </sec>
    <sec id="sec-6">
      <title>5. Discussion</title>
      <p>In the current research we are well underway of using elements of the counting algorithm
and propagation rules to be transferred to a linked data setting to render them useful for our
envisaged approach in a Solid model as sketched in Figure 1. Formulating these algorithms in
SPARQL algebra terms will allow us to move to a general implementation quickly.</p>
      <p>In the near future we will benchmark our incremental view maintenance scheme against
naive full re-computations, and to quantify the performance gains. We have been working on
an adaptation of the incremental maintenance counting algorithm in the context of SPARQL.
Utilizing delta rules, adapted to the algebra of SPARQL, using a combination of join, union,
diference, selection and projection rules. Using the diference and join we are able to also
implement the OPTIONAL operation in SPARQL. Next to diference, there is also an adaption
to the minus operation in SPARQL in our proposed algorithm. We plan to incorporate our
techniques in a Web agent operating in the Solid ecosystem. Finally, more research is needed
on aggregation, as well as applying incremental view maintenance to aggregators over multiple
pods.
Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data,
SIGMOD ’86, Association for Computing Machinery, New York, NY, USA, 1986, p. 61–71.</p>
      <p>URL: https://doi.org/10.1145/16894.16861. doi:10.1145/16894.16861.
[8] X. Qian, G. Wiederhold, Incremental recomputation of active relational expressions, IEEE
transactions on knowledge and data engineering 3 (1991) 337–341.
[9] T. Grifin, L. Libkin, H. Trickey, An improved algorithm for the incremental recomputation
of active relational expressions, IEEE Transactions on Knowledge and Data Engineering 9
(1997) 508–511.
[10] P.-Å. Larson, J. Zhou, Eficient maintenance of materialized outer-join views, 2007, pp. 56
– 65. doi:10.1109/ICDE.2007.367851.
[11] T. Grifin, B. Kumar, Algebraic change propagation for semijoin and outerjoin queries,</p>
      <p>SIGMOD Record 27 (1998) 22–27. doi:10.1145/290593.290597.
[12] T. Grifin, L. Libkin, H. Trickey, An improved algorithm for the incremental recomputation
of active relational expressions, IEEE Transaction on Knowledge and Data Engineering 9
(1997) 508–511.
[13] C. Koch, et al., DBToaster: Higher-order delta processing for dynamic, frequently fresh
views, The VLDB Journal 23 (2014) 253–278.
[14] E. Menendez, M. Casanova, V. Vidal, et al., Incremental maintenance of materialized
SPARQL-based linkset views, in: S. Hartmann, H. Ma (Eds.), Proceedings 27th International
Conference on Database and Expert Systems Applications, Part II, volume 9828 of Lecture
Notes in Computer Science, Springer, 2016, pp. 68–83.
[15] A. Gupta, I. S. Mumick, et al., Materialized views: techniques, implementations, and
applications, MIT press Cambridge, MA, 1998.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Sambra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mansour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hawke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zereba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Greco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghanem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zagidulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aboulnaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <article-title>Solid: a platform for decentralized social applications based on linked data</article-title>
          ,
          <source>MIT CSAIL &amp; Qatar Computing Research Institute, Tech. Rep</source>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>The</given-names>
            <surname>Solid</surname>
          </string-name>
          <string-name>
            <surname>Team</surname>
          </string-name>
          , Solid project, https://solidproject.org/,
          <year>2024</year>
          . Accessed:
          <fpage>2024</fpage>
          -04-05.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vandenbrande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakubowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Buelens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ongenae</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Van den Bussche</surname>
          </string-name>
          , POD-QUERY:
          <article-title>Schema mapping and query rewriting for Solid pods</article-title>
          , in: I.
          <string-name>
            <surname>Fundulaki</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Kozaki</surname>
          </string-name>
          , et al. (Eds.),
          <source>Proceedings of the ISWC 2023 Posters</source>
          ,
          <article-title>Demos and Industry Tracks: From Novel Ideas to Industrial Practice</article-title>
          , volume
          <volume>3632</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vandenbrande</surname>
          </string-name>
          ,
          <article-title>Aggregators to realize scalable querying across decentralized data sources</article-title>
          ,
          <source>ISWC Doctoral Consortium</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <article-title>Linked data</article-title>
          , http://www. w3. org/DesignIssues/LinkedData. html (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W3C</given-names>
            <surname>Solid Community</surname>
          </string-name>
          <string-name>
            <surname>Group</surname>
          </string-name>
          , Solid protocol, https://solidproject.org/TR/protocol,
          <year>2022</year>
          . Accessed:
          <fpage>2024</fpage>
          -04-05.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Blakeley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. W.</given-names>
            <surname>Tompa</surname>
          </string-name>
          ,
          <article-title>Eficiently updating materialized views</article-title>
          , in:
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>