<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>AMW</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Opportunities for Shape-Based Optimization of Link Traversal Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bryan-Elliott Tam</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruben Taelman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pieter Colpaert</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruben Verborgh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IDLab, Department of Electronics and Information Systems, Ghent University - imec</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>16</volume>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Data on the web is naturally unindexed and decentralized. Centralizing web data, especially personal data, raises ethical and legal concerns. Yet, compared to centralized query approaches, decentralization-friendly alternatives such as Link Traversal Query Processing (LTQP) are significantly less performant and understood. The two main dificulties of LTQP are the lack of apriori information about data sources and the high number of HTTP requests. Exploring decentralized-friendly ways to document unindexed networks of data sources could lead to solutions to alleviate those dificulties. RDF data shapes are widely used to validate linked data documents, therefore, it is worthwhile to investigate their potential for LTQP optimization. In our work, we built an early version of a source selection algorithm for LTQP using RDF data shape mappings with linked data documents and measured its performance in a realistic setup. In this article, we present our algorithm and early results, thus, opening opportunities for further research for shape-based optimization of link traversal queries. Our initial experiments show that with little maintenance and work from the server, our method can reduce up to 80% the execution time and 97% the number of links traversed during realistic queries. Given our early results and the descriptive power of RDF data shapes it would be worthwhile to investigate non-heuristic-based query planning using RDF shapes.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Linked data</kwd>
        <kwd>Link Traversal Query Processing</kwd>
        <kwd>Query containment</kwd>
        <kwd>RDF data shapes</kwd>
        <kwd>Descentralized environments</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The World Wide Web is a naturally decentralized database. Centralizing large web segments in single
endpoints provides easier query interfaces and faster query execution times. However, data centralization
can lead to practices that raise ethical and legal concerns, making the exploration of
decentralizationfriendly query paradigms a relevant research topic. The query languages webSQL [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and SPARQL
propose mechanisms to capture decentralized web data with conjunctive queries. However, webSQL
relies on web indexing [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Indexing processes can be expensive, particularly on the scale of the web,
and necessitate frequent updates, furthermore, they can be restrictive by excluding some sources thus
hindering the natural serendipity of the web. SPARQL solutions rely on the publication of linked data.
Linked data in their structure particularly with the presence of IRI gives the opportunity to find more
related information without indexes. However, most query processing over linked data is performed in
centralized and federated setups, leaving indexing-independent approaches largely experimental.
      </p>
      <p>
        Link Traversal Query Processing (LTQP) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a method to query unindexed networks of linked data
documents. The method consists of answering a query using an evolving triple store. This evolving
triple store is continuously updated with data acquired by the query engine through the recursive
dereferencing of IRIs from the store. The process is started with a set of IRIs provided by the user to the
engine. While LTQP enables live exploration of environments without prior indexing, it leads to some
dificulties. One of them is the pseudo-infinite search domain derived from the size of the World Wide
      </p>
      <p>
        Web [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Additionally, HTTP requests can be very slow and unpredictable making their execution the
bottleneck of the method [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Reachability criteria [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are a partial answer to this problem by defining
completeness based on the traversal of URIs contained in the internal data source of the engine instead
of on the acquisition of all the results or the traversal of the whole web. Another dificulty is the lack of
a priori information about the sources rendering query planning challenging. To alleviate this problem,
the current state-of-the-art consists of using carefully crafted heuristics for joins ordering [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The
limitations of the heuristics approach are usually of little importance because the main bottleneck is
the high number of HTTP requests.
      </p>
      <p>
        Earlier LTQP research has focused on the open web. More recently, LTQP research has shifted its
focus to environments where the structure of data publication provides useful information for query
optimization. This line of research uses structural assumptions [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to guide query engines [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] towards
relevant data sources. Structural assumptions act as contracts between the data provider and the query
engines stipulating that within a certain subdomain of the web, information meeting a specific constraint
can be found. The use of structural assumptions has been studied in Solid [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The method involves
the utilization of the solid storage hypermedia description [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to locate all the resources of a pod. This
hypermedia description is not expressive enough to capture the content of the resources of a pod,
thus, for query-aware optimizations, the type index specification 1 is additionally used. The type index
formulation proposes a more declarative approach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] by mapping RDF classes with sets of resources.
By using those structural assumptions it is possible to reduce the query execution time of realistic
queries to the extent where the bottleneck is not the execution of HTTP requests but the suboptimal
heuristic-based query plan [
        <xref ref-type="bibr" rid="ref5 ref9">9, 5</xref>
        ]. Yet, for multiple queries the high number of HTTP requests remains
the main bottleneck [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. It is reasonable to hypothesize that a significant portion of those HTTP requests
lead to the dereferencing of documents containing data that do not contribute to the result of the query.
Hence, investigating more descriptive structural assumptions is a relevant research endeavor.
      </p>
      <p>
        In this article, we propose to use RDF data shapes as the main mechanism for a structural assumption
in the form of a shape index. RDF data shapes are mostly used in data validation [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] thus, they
provided a good formalization to describe the structure of data. Additionally, to a lesser extent, they
have been used for query optimizations [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The shape index is an early efort for data summarisation
of decentralized datasets [
        <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 13, 14</xref>
        ] within networks of unindexed linked data documents. The current
focus of the index is source selection. However, we foresee opportunities to use a similar approach for
link queue ordering and query planning. This paper presents our preliminary work on data discovery
and link pruning thus tackling the problem of the large search space of LTQP queries in linked data
environments with structure.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Shape Index and Query-Shape Containment</title>
      <p>
        The RDF specification does not enforce schemas on data. However, the data published often adhere
to an implicit schema due to the nature of its modeled object and the formulation of RDF [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. From
those observations, implicit schemas have been used with success for query optimization [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. We
propose to adapt those methods for LTQP by using explicit data schemas provided by the data provider
in our source selection process.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Shape Index</title>
        <p>
          Our method introduces the concept of a shape index to reduce query execution time by minimizing
unnecessary dereferencing of RDF documents within web subdomains (sets of URLs or URL patterns). 2
We define a shape index as a set of mappings between RDF data shapes and sets of resources. This
mapping concept shares similarities with shape mapping [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and target declarations [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. However,
        </p>
        <sec id="sec-2-1-1">
          <title>1 https://solid.github.io/type-indexes/</title>
          <p>2 From a perspective where the domain is composed of URLs leading to linked data documents and the codomain is composed
of the documents with their RDF content.
instead of mapping shapes to RDF subgraphs, the shape index maps shapes to sets of documents. The
shape index also shares commonalities with shape trees, 3 however, it is designed to be a simpler
formulation focused on query optimization. The shape index has a range of applications defined in a
domain and a flag indicating if the index is complete. A shape index is complete when every resource in
the domain is associated with a shape within the shape index. In a shape index when a shape is mapped
to a set of RDF resources then the shape must validate those resources. Furthermore, every set of triples
respecting the shape in the domain must be located inside one resource of the set.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Query-Shape Containment</title>
        <p>
          In order to determine before the traversal of a whole subdomain which resources are useful and which
can be pruned, the query engine solves a query-shape containment problem over the shape of the index
analogous to the classic query containment problem [
          <xref ref-type="bibr" rid="ref19 ref20">19, 20, 21</xref>
          ]. The query containment problem
consists of determining independent of the data source if the results of a query will be a subset of
the results of another query. We propose to express RDF data shapes into SELECT SPARQL queries
() [22, 23, 24, 25] and apply similar resolution methods to query containment problems. Due to the
explicit domain definition of the index, this approach is adaptative, thus, the query engine can start its
processing with permissive reachability criteria such as  [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] or the Solid state-of-the-art reachability
criteria [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and not sufer from the associated longer execution time during the traversal of environments
containing a shape index. The source selection process is schematized for a single (sub)domain in
Figure 1. The process starts with the discovery of the shape index in the current (sub)domain. In the
case of Solid, the index can be at the root of the pod to be easily discoverable. 4 After the dereferencing
of the index, the analysis is started inside the query engine. The analysis consists of interpreting the
binding results (homomorphism and "partial" homomorphism) of the query-shape containment problem.
The algorithm divides the query from the user into multiple star patterns with their dependent star
pattern (). After the division of the query, the queries are pushdown [
          <xref ref-type="bibr" rid="ref12">12, 26</xref>
          ] to the level of source
selection to evaluate if the  are contained inside the  of the shape index. If all the  are
contained in a  or have no binding with any  the reachability criterion is adapted to ignore all the
resources not linked to a  even if the shape index is incomplete. If the shape index is complete and
not all the  are contained in a  the reachability criterion can be adapted to visit every resource
in relation to a  with a partial binding with a . In a similar case with an incomplete shape index,
the query engine can only use the shape index for data discovery. This case is similar to the usage of
the type index but with a more reaching ability to match a query with the index because shapes in their
definition describe the properties (RDF predicates) of the entities whereas the type index only provides
        </p>
        <sec id="sec-2-2-1">
          <title>3 https://shapetrees.org/TR/specification/</title>
          <p>
            4 In this work, we do not take into consideration confidentiality restrictions.
the classes IRIs. It is possible to dereference the class IRIs to get information about the properties (if
available), however, it is not the current practice [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. A comparison of the RDF data shapes and RDF
class approach due to their potential similarities is delegated to future works. 5
          </p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. A Concrete Example</title>
        <p>We conclude this section with a concrete example. Let’s assume that a user wants to retrieve the IDs
and contents of the comments in a network along with the forums ID where they have been posted
and the name of the moderator of the forums. This query is schematized in Figure 1. The query can
be represented by three star pattern queries, the comment , the forum , and the
moderator . The full query is formed by the join of those star patterns, where the joins
respect the dependencies defined by shared variables  =  ◁▷  ◁▷ .
When traversing the network the query engine cannot know the content of the documents encountered,
therefore, the engine must deference every reachable document as defined by a reachability criterion.
The presence of a shape index can change the state of afairs. If the engine encounters a domain
containing exclusively book data as indicated by a complete shape index, the engine can skip the
documents of the domain. If a domain has comment and movie review data declared by a complete
shape index, the query engine can safely limit its dereferencing operations to the set of documents
related to comments without afecting results completeness. The engine can restrict its dereferencing
operations because at least one star pattern is contained in the comment shape and none in the movie
review and book shapes. If the engine encounters a domain regardless of the completeness of the index,
declaring comment, forum, and individual (moderators are individuals/people) data, among others,
then the documents associated with the non-query-relevant part of the domain can be ignored with the
same containment logic presented earlier. Thus, we can consider that the traversal proceeds domain by
domain ignoring documents known a priori to not content query-relevant data.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminary Results</title>
      <p>S1V0S1V1S1V2S1V3S1V4S5V0S5V1S5V2S5V3S5V4D1V0D1V1D1V2D1V3D1V4D3V0D3QV1uD3eVr2yD3V3D3V4D4V0D4V1D4V2D4V3D4V4D5V0D5V1D5V2D5V3D5V4D7V0D7V1D7V2D7V3D7V4
Figure 2: The execution time with shape indexes is consistently lower (up to 80% with D1V3 and S1V3) or equal
to that with the type indexes (except for D3V3 and D3V4), and always uses fewer HTTP requests. The queries
are denoted with first the initial of the query template (e.g., S1 for interactive-short-1), and the version of the
concrete query (e.g., V0). Values not present in the plot (D7V0 and D7V3) indicate that the query timeout before
the end of the execution.</p>
      <p>
        An open-source implementation of the algorithm and an integration in the query engine Comunica
5 There exist comparisons between the shape and class definition approaches in the context of data validation [ 27] but it is
left to be determined if their frame of comparison is compatible with our current problem and foreseen opportunities.
)2000
s
m
(e1098000000
itm760000
n 500
io 400
t
u 300
c
xe 200
E
ts450
se400
u
q350
e
rP300
TT250
fH200
o
r150
e
b100
um50
N 0
[28] is available online. 6 We use the benchmark Solidbench [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to compare our approach with the
current state-of-the-art (the type index and the LDP specification 7 as structural assumptions) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
We used the supported subset of SolidBench queries, skipping the currently unimplemented SPARQL
property paths 8 and unions. We executed each query 50 times with a timeout of 1 minute (6,000
ms). Figure 2 shows that the reduction can be as high as 80% (D1V3 and S1V3) for execution time and
97% (S1V3) for the number of HTTP requests. Our approach reliably executes fewer HTTP requests
compared to the state-of-the-art. This is an expected result because no queries target (implicitly) each
ifle of a user. The shape index approach requests a subset of the request of the type index approach
(without sacrificing query results) with the addition of the request to get the shape definitions which
leads in general to the dereferencing of a small number of short documents. There is not a direct
correlation between the reduction of execution time and HTTP requests (e.g., the ratio between our
approach and the state-of-the-art of the number of HTTP requests by the execution time for D1V3 is
0.5 compared to 0.15 for S1V3). This hints at the results from the state-of-the-art [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposing that
the query plan is the bottleneck for some queries in this environment, however, the overhead of the
containment calculation could also be a contributing factor to the current results. In the worst cases, our
approach has similar query execution to the state-of-the-art except for D3V3 and D3V4 with an increase
of 9% of the mean of the execution time. The variance of the execution with a shape index tends to be
lower compared to the type index. A possible explanation for this observation is that the execution
time of HTTP requests is unpredictable [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] leading to an increase in variance. This observation not
only has potential implications for the reliability of multiple executions in terms of execution time but
also in terms of the performance of single executions in unstable networks where the server might take
longer times to respond.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>The shape index approach shows that more precise source selection in LTQP can significantly reduce
query execution time. Although it is still an early efort, we believe that a solution inspired by our
approach could be beneficial for the query and publication of fragmented document-based linked data.
Our solution does not require extensive computational power from the data publisher during queries
and updates 9 of data sources. Additionally, using a shape index holds promise to improve the data
quality of fragmented document-based linked data. There are still multiple questions left to be answered
such as how to handle private data, what is the overhead and the complexity of the method 10, does the
reduction of HTTP request or the reduction on the size of the internal triple store has more impact on
the performance, can the shape index alone or with other data summarisation structures be used to
improve query planning without sacrificing query execution times.</p>
      <sec id="sec-4-1">
        <title>Acknowledgements</title>
        <p>Supported by SolidLab Vlaanderen (Flemish Government VV023/10) and the Research Foundation
Flanders (FWO) under grant number S006323N.
6 The algorithm implementation is available at the following link
https://github.com/constraintAutomaton/query-shape-detection and the integration in the Comunica query engine at the
following link https://github.com/constraintAutomaton/comunica-feature-link-traversal/tree/feature/shapeIndex. The
implementation of the benchmark and complementary results such as the analysis of the statistical significance are available at the
following link https://github.com/constraintAutomaton/amw_shape_index_results.
7 https://www.w3.org/TR/ldp/
8 https://www.w3.org/TR/sparql11-query/#propertypaths
9 Considering no change in the data model.
10 Given the expressiveness of RDF data shapes language [22, 29, 30] and practice in shape definitions [31, 29, 32].</p>
        <p>Web Semantics 76 (2023) 100770. doi:10.1016/j.websem.2022.100770.
[21] M. W. Chekol, J. Euzenat, P. Genevès, N. Layaïda, Sparql query containment under schema,
Journal on Data Semantics 7 (2018) 133–154. URL: http://dx.doi.org/10.1007/s13740-018-0087-1.
doi:10.1007/s13740-018-0087-1.
[22] Delva, Thomas and Dimou, Anastasia and Jakubowksi, Maxime and Van den Bussche, Jan, Data
provenance for SHACL, in: Proceedings 26th International Conference on Extending Database
Technology (EDBT 2023), volume 26, 2023, pp. 285–297. URL: http://doi.org/10.48786/edbt.2023.23.
[23] W3C, Sparql queries to validate shape expressions (informative), 2013. URL: https://www.w3.org/
2013/ShEx/toSPARQL.html.
[24] J.-E. L. Gayo, E. Prud’hommeaux, H. Solbrig, I. Boneva, Validating and describing linked data
portals using shapes, 2017. arXiv:1701.08924.
[25] J. Corman, F. Florenzano, J. L. Reutter, O. Savković, Validating shacl constraints over a sparql
endpoint, in: The Semantic Web – ISWC 2019, Springer International Publishing, Cham, 2019, pp.
145–163.
[26] Y. Yang, M. Youill, M. Woicik, Y. Liu, X. Yu, M. Serafini, A. Aboulnaga, M. Stonebraker,
Flexpushdowndb: Hybrid pushdown and caching in a cloud dbms, Proc. VLDB Endow. 14 (2021)
2101–2113.
[27] B. De Meester, P. Heyvaert, D. Arndt, A. Dimou, R. Verborgh, RDF graph validation using rule-based
reasoning, Semantic Web Journal 12 (2021) 117–142. doi:10.3233/SW-200384.
[28] R. Taelman, J. Van Herwegen, M. Vander Sande, R. Verborgh, Comunica: a modular sparql query
engine for the web, in: Proceedings of the 17th International Semantic Web Conference, 2018.
[29] S. Staworko, I. Boneva, J.-E. L. Gayo, S. Hym, E. G. Prud’hommeaux, H. Solbrig, Complexity and
Expressiveness of ShEx for RDF, in: 18th International Conference on Database Theory (ICDT
2015), volume 31 of Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl–
Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 2015, pp. 195–211. doi:10.4230/LIPIcs.</p>
        <p>ICDT.2015.195.
[30] I. Boneva, J.-E. L. Gayo, E. G. Prud’hommeaux, Semantics and validation of shapes schemas for
rdf, in: The Semantic Web – ISWC 2017: 16th International Semantic Web Conference, Vienna,
Austria, October 21–25, 2017, Proceedings, Part I, Springer-Verlag, Berlin, Heidelberg, 2017, p.
104–120. doi:10.1007/978-3-319-68288-4_7.
[31] S. Lieber, A. Dimou, R. Verborgh, Statistics about data shape use in RDF data, in: Proceedings of the
19th International Semantic Web Conference: Posters, Demos, and Industry Tracks, volume 2721
of CEUR Workshop Proceedings, 2020, pp. 330–335. URL: http://ceur-ws.org/Vol-2721/paper584.pdf.
[32] S. Staworko, P. Wieczorek, Containment of shape expression schemas for rdf, Proceedings of the
38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (2018).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mendelzon</surname>
          </string-name>
          , G. Mihaila, T. Milo,
          <article-title>Querying the world wide web</article-title>
          ,
          <source>in: Fourth International Conference on Parallel and Distributed Information Systems</source>
          ,
          <year>1996</year>
          , pp.
          <fpage>80</fpage>
          -
          <lpage>91</lpage>
          . doi:
          <volume>10</volume>
          .1109/PDIS.
          <year>1996</year>
          .
          <volume>568671</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          , J.-C.
          <article-title>Freytag, Foundations of traversal based query execution over linked data</article-title>
          ,
          <source>in: Conference on Hypertext and Social Media</source>
          ,
          <source>HT '12</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA,
          <year>2012</year>
          , p.
          <fpage>43</fpage>
          -
          <lpage>52</lpage>
          . doi:
          <volume>10</volume>
          .1145/2309996.2310005.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          , M. T. Özsu,
          <article-title>Walking without a map: Optimizing response times of traversal-based linked data queries (extended version</article-title>
          ),
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <article-title>Zero-knowledge query planning for an iterator implementation of link traversal based query execution</article-title>
          ,
          <source>in: The Semantic Web: Research and Applications</source>
          , Springer, Berlin, Heidelberg,
          <year>2011</year>
          , pp.
          <fpage>154</fpage>
          -
          <lpage>169</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Taelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <article-title>Link traversal query processing over decentralized environments with structural assumptions</article-title>
          ,
          <source>in: Proceedings of the 22nd International Semantic Web Conference</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Taelman</surname>
          </string-name>
          ,
          <source>Guided link-traversal-based query processing</source>
          ,
          <year>2020</year>
          . arXiv:
          <year>2005</year>
          .02239.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Fielding</surname>
          </string-name>
          ,
          <article-title>Architectural styles and the design of network-based doftware architectures</article-title>
          ,
          <source>Ph.D. thesis</source>
          , University of California,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Taelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <article-title>Declaratively describing responses of hypermedia-driven web apis</article-title>
          , in: Knowledge Capture Conference, K-CAP '
          <fpage>17</fpage>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2017</year>
          . doi:
          <volume>10</volume>
          .1145/3148011.3154467.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Eschauzier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Taelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <article-title>How does the link queue evolve during traversal-based query processing?</article-title>
          ,
          <source>in: Proceedings of the 7th QuWeDa, CEUR Workshop Proceedings</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>J.-E. L. Gayo</surname>
            , E. Prud'hommeaux, I. Boneva,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <source>Validating RDF Data: Applications</source>
          , Springer International Publishing, Cham,
          <year>2018</year>
          , pp.
          <fpage>195</fpage>
          -
          <lpage>231</lpage>
          . doi:
          <volume>10</volume>
          .1007/ 978-3-
          <fpage>031</fpage>
          -79478-
          <issue>0</issue>
          _
          <fpage>6</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Rabbani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lissandrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          , Optimizing sparql queries using shape statistics,
          <year>2021</year>
          . doi:
          <volume>10</volume>
          . 5441/002/EDBT.
          <year>2021</year>
          .
          <volume>59</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vdovjak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-J.</given-names>
            <surname>Houben</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Broekstra</surname>
          </string-name>
          ,
          <article-title>Index structures and algorithms for querying distributed rdf repositories</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on World Wide Web, WWW '04</source>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2004</year>
          , p.
          <fpage>631</fpage>
          -
          <lpage>639</lpage>
          . URL: https://doi.org/10.1145/988672.988758. doi:
          <volume>10</volume>
          .1145/988672.988758.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Goldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          , Dataguides:
          <article-title>Enabling query formulation and optimization in semistructured databases</article-title>
          ,
          <source>in: Proceedings of the 23rd International Conference on Very Large Data Bases</source>
          , VLDB '
          <fpage>97</fpage>
          , Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          <year>1997</year>
          , p.
          <fpage>436</fpage>
          -
          <lpage>445</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karnstedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , K.-U. Sattler,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          ,
          <article-title>Data summaries for on-demand queries over linked data</article-title>
          ,
          <source>in: Proceedings of the 19th International Conference on World Wide Web, WWW '10</source>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2010</year>
          , p.
          <fpage>411</fpage>
          -
          <lpage>420</lpage>
          . URL: https://doi.org/10.1145/1772690.1772733. doi:
          <volume>10</volume>
          .1145/1772690.1772733.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          , G. Moerkotte,
          <article-title>Characteristic sets: Accurate cardinality estimation for rdf queries with multiple joins</article-title>
          ,
          <source>2011 IEEE 27th International Conference on Data Engineering</source>
          (
          <year>2011</year>
          )
          <fpage>984</fpage>
          -
          <lpage>994</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Meimaris</surname>
          </string-name>
          , G. Papastefanatos,
          <article-title>Hierarchical characteristic set merging for optimizing sparql queries in heterogeneous rdf</article-title>
          , ArXiv abs/
          <year>1809</year>
          .02345 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>J.-E. L. Gayo</surname>
            , E. Prud'hommeaux, I. Boneva,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Kontokostas</surname>
          </string-name>
          , Shape Expressions, Springer International Publishing, Cham,
          <year>2018</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>117</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          -79478-
          <issue>0</issue>
          _
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>J.-E. L. Gayo</surname>
            , E. Prud'hommeaux, I. Boneva,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Kontokostas</surname>
            ,
            <given-names>SHACL</given-names>
          </string-name>
          , Springer International Publishing, Cham,
          <year>2018</year>
          , pp.
          <fpage>119</fpage>
          -
          <lpage>194</lpage>
          . URL: https://doi.org/10.1007/978-3-
          <fpage>031</fpage>
          -79478-
          <issue>0</issue>
          _5. doi:
          <volume>10</volume>
          . 1007/978-3-
          <fpage>031</fpage>
          -79478-
          <issue>0</issue>
          _
          <fpage>5</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Foto Afrati</surname>
          </string-name>
          ,
          <source>Query Containment and Equivalence</source>
          , Springer Cham,
          <year>2019</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>59</lpage>
          . doi:https: //.doi.org/10.1007/978-3-
          <fpage>031</fpage>
          -01871-8.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Spasić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Janičić</surname>
          </string-name>
          ,
          <article-title>Solving the SPARQL query containment problem with SpeCS</article-title>
          ,
          <source>Journal of</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>