<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Nara, Japan
* Corresponding author.
$ thi-hoang-thi.pham@univ-nantes.fr (T. H. T. Pham); gabriela.montoya@univ-nantes.fr (G. Montoya);
brice.nedelec@univ-nantes.fr (B. Nédelec); hala.skaf@univ-nantes.fr (H. Skaf-Molli); pascal.molli@univ-nantes.fr (P. Molli)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Continuation Queries: Embracing Timeouts on Public SPARQL Endpoints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thi Hoang Thi Pham</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriela Montoya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brice Nédelec</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hala Skaf-Molli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Molli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Nantes Université</institution>
          ,
          <addr-line>LS2N, UMR 6004, F-44000 Nantes</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Public SPARQL endpoints, such as Wikidata, provide essential access points to large-scale knowledge graphs. However, they often suffer from strict timeouts that prevent the retrieval of complete query results. This demonstration presents the first public deployment of passage, a SPARQL query engine that guarantees query completeness through continuation queries. Instead of failing upon timeout, passage returns partial results along with a SPARQL continuation query capable of retrieving the missing results. These continuation queries can be chained iteratively until complete results are obtained. For this demo, attendees can interact with a passage server loaded with 13B triples from Wikidata 2025, and observe in detail its operation during their query execution.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>Public Knowledge Graph</kwd>
        <kwd>SPARQL Endpoint</kwd>
        <kwd>Continuation Queries</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Public SPARQL endpoints, such as Wikidata1, offer valuable access to large-scale knowledge graphs.
However, to remain responsive under heavy load, they enforce fair-use policies, including timeouts
and result size limits [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], to prevent a single query from monopolizing server resources. Consequently,
many queries fail to complete, returning partial results or no results at all. For instance, consider the query
cite in Figure 1a, which retrieves pairs of articles that cite each other. When executed on the official
Wikidata endpoint2, this query times out after 60 seconds and fails to return complete results. The
inability to ensure query completeness undermines both the reliability and usability of public SPARQL
endpoints.
      </p>
      <p>
        In a recent paper [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we introduced SPARQL continuation queries, a novel approach to overcoming
timeout limitations while preserving compatibility with existing SPARQL infrastructure. The core idea is
simple: when a query exceeds server-imposed limits, the server returns the partial results and a SPARQL
continuation query able to retrieve the missing results. This process can be repeated, allowing users to
recover complete answers by chaining SPARQL continuation queries. To the best of our knowledge, our
approach is the first to ensure completeness, responsiveness, and, more importantly, full compliance
with the SPARQL protocol.
      </p>
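The chaining described above can be sketched in a few lines. This is a minimal illustration, not passage's actual client API: `execute` stands in for one round-trip to a continuation-enabled endpoint, and the toy endpoint encodes its remaining work as an OFFSET, as the real engine does.

```python
# Minimal sketch of chaining continuation queries until completion.
# `execute` is a stand-in for one round-trip to a continuation-enabled
# endpoint: it returns (partial_results, continuation_query_or_None).

def run_to_completion(execute, query, max_rounds=1000):
    results = []
    rounds = 0
    while query is not None and rounds != max_rounds:
        partial, query = execute(query)  # a timeout yields a continuation query
        results.extend(partial)          # partial results arrive every round
        rounds += 1
    return results, rounds

# Toy endpoint: delivers 2 results per call over a 5-result "dataset",
# encoding the remaining work as an OFFSET in the continuation query.
def toy_execute(query):
    offset = int(query.rsplit("OFFSET", 1)[1])
    data = ["r0", "r1", "r2", "r3", "r4"]
    chunk = data[offset:offset + 2]
    nxt = query.rsplit("OFFSET", 1)[0] + "OFFSET " + str(offset + 2)
    return chunk, (nxt if len(data) > offset + 2 else None)

results, rounds = run_to_completion(
    toy_execute, "SELECT * WHERE { ?s ?p ?o } OFFSET 0")
# results == ["r0", "r1", "r2", "r3", "r4"], obtained in 3 rounds
```

Because every continuation query is itself a standard SPARQL query, the loop needs nothing beyond the ordinary SPARQL protocol.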
      <p>In this demo, we present the first public deployment of SPARQL continuation queries over real-world
data. The passage server embeds two SPARQL query engines within a single Java Virtual Machine: the
standard blazegraph query engine and the continuation-enabled passage query engine. Both query
engines operate on a shared blazegraph journal file (.jnl), ensuring that both query engines access the
same physical storage and indexes, while exposing separate SPARQL endpoints.</p>
      <p>
        The blazegraph query engine, which powers the Wikidata Query Service, fully supports SPARQL 1.1
but enforces a 60-second timeout that can yield incomplete results for complex queries. In contrast, the
passage query engine currently supports only core SPARQL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] but enables query continuation. This
configuration allows users to easily choose which engine to use for executing their SPARQL queries:
passage or blazegraph.
      </p>
      <p>[Figure 1: (a) The query cite times out after 60 seconds on the official Wikidata endpoint and returns no results to the user. (b) Using passage, the query cite completes with 37 SPARQL continuation queries, taking 37 minutes to retrieve its 1210 results.]</p>
      <p>
        As part of the demonstration, attendees will be able to query a public passage server containing
the 13B statements of Wikidata (as of February 13th, 2025), showcasing:
• Execution of SPARQL queries that time out on blazegraph but complete successfully on passage
using continuations;
• Live inspection of SPARQL queries through the execution of successive SPARQL continuation
queries. Each timeout offers an opportunity to monitor progress and estimate completion time;
• Integration with the comunica smart client [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], enabling support for SPARQL 1.1 queries beyond
passage’s core SPARQL capabilities. The client decomposes the queries: core SPARQL subqueries
are delegated to the passage engine, while unsupported operators are executed client-side.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Continuation Queries at Work</title>
      <p>
        Timeouts play a crucial role in protecting shared infrastructures. An engine that supports SPARQL
continuation queries does not eliminate timeouts; instead, when a query execution is interrupted, the
engine returns partial results and computes a new SPARQL query, called a SPARQL continuation query,
capable of retrieving the missing answers. A continuation query can itself be interrupted, leading to yet
another continuation query. Assuming that each continuation query makes progress, we have proven
that there exists a finite sequence of continuation queries that returns complete and correct results [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>This demo highlights the sequences of SPARQL continuation queries that occur during query
execution, as illustrated in Figure 2. On the left is the original query cite, which was executed for 1
minute and returned 64 results. In the center is the first continuation query, also executed for 1 minute,
returning 26 results. On the right is the second continuation query, which returned 23 results after
another 1 minute of execution. At this point, the results are still incomplete, and more continuation
queries must be executed to retrieve the remaining answers.</p>
      <p>A key point to note is that, although the engine automatically generates each continuation query to
compute the remaining work, continuation queries are standard SPARQL queries that any user can
read and understand. A human user can inspect them to determine which parts of the query have been
processed and to estimate the remaining time required to complete the execution. Continuation queries
help open the black box of SPARQL query processing, allowing users to better understand and reason
about how their queries are being executed.</p>
      <p>Estimating remaining time. Users are often interested in estimating how long they will need to wait
for their query to complete fully. In this context, the continuation query, which represents the remaining
work, can be used to estimate the remaining execution time. With the first continuation query of cite
shown in the center of Figure 2, the execution time of the first part of the UNION is marginal; an article
rarely cites millions of others, and 46 citations is already a relatively high number. The primary factor
is OFFSET 4378, applied to the first triple pattern. Given that there are 252,130 articles in total, and 4,378
have been processed in one minute, we can roughly estimate the remaining time as 252,130/4,378 − 1 ≈ 56
minutes. This is a coarse estimate, since it does not consider citation distribution biases, but it can be
refined after each continuation query. In this case, passage terminates the execution of this query in
~37 minutes, after issuing 37 continuation queries.3
Footnote 3: cite was executed after clearing the server’s cache.</p>
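The back-of-the-envelope estimate can be written out explicitly; the figures are those of the example (252,130 articles in total, 4,378 processed during the first one-minute round):

```python
# Coarse remaining-time estimate after one continuation round:
# (total_items / items_done) - 1 rounds of the same duration remain.

def estimate_remaining_minutes(total, done, minutes_per_round=1.0):
    rounds_left = total / done - 1       # rounds still to execute
    return rounds_left * minutes_per_round

est = estimate_remaining_minutes(252_130, 4_378)
print(round(est, 1))  # 56.6, matching the rough 56-minute estimate
```

After each continuation query, `done` grows and the estimate can be refreshed, which is exactly the refinement described above.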
      <p>[Figure 3: (a) passage x comunica query plan. (b) Example of passage subqueries.]</p>
      <p>
        Technical view on continuation queries. From a technical perspective, one might be surprised
to see OFFSET without ORDER BY, as OFFSET is generally unreliable when the order of results is not
explicitly defined [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, passage extensively uses subqueries comprising a single triple/quad
pattern with an offset: by assuming a deterministic evaluation order for such patterns, the resulting
triples/quads are always produced in the same order, and the OFFSET actually allows skipping results
already produced. Although passage uses the blazegraph engine to satisfy this assumption by
scanning through its augmented B+Tree indexes (the deterministic order is that of the chosen index),
other SPARQL engines based on other indexing data structures (e.g., HDT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) can provide deterministic
ordering on single triple/quad patterns.
      </p>
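Why OFFSET is safe under this assumption can be illustrated with a toy scan; `scan` plays the role of a single triple-pattern evaluation whose order is fixed by the index (illustrative code, not passage's implementation):

```python
# With a deterministic scan order, OFFSET is a reliable resumption point:
# re-running the same scan with OFFSET k yields exactly the results that
# were not produced before the interruption.

INDEX = sorted([("s1", "cites", "s2"), ("s1", "cites", "s3"),
                ("s2", "cites", "s1"), ("s3", "cites", "s2")])

def scan(offset=0, limit=None):
    # The index always enumerates matches in the same (sorted) order.
    end = None if limit is None else offset + limit
    return INDEX[offset:end]

first = scan(limit=2)          # interrupted after 2 results
rest = scan(offset=2)          # the continuation query resumes here
assert first + rest == scan()  # no result lost, none duplicated
```

If the enumeration order could change between executions, the same OFFSET could skip unseen results or repeat seen ones, which is precisely the pitfall noted in [5].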
      <p>
        A second remark concerns the evaluation time of a triple/quad pattern with an OFFSET: are all preceding
results read and discarded, or are they efficiently skipped? Again, passage uses blazegraph’s augmented
B+Tree indexes to initialize the starting point of each triple/quad pattern scan. A counter
maintained within each node of the indexes allows passage to navigate directly to the desired offset in
logarithmic time. Consequently, even on large datasets, the impact of OFFSET on triple/quad pattern
evaluation remains marginal. It is worth noting that passage could use other SPARQL engines providing
such a facility, such as HDT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
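The counter-based skip can be sketched with a simplified counted tree (not blazegraph's actual data structure): each internal node stores, per child, the number of entries below that child, so resolving an OFFSET is a descent rather than a linear scan.

```python
# Sketch: selecting the i-th entry of a counted search tree in O(log n).
# Subtree counts let the descent skip whole subtrees instead of reading
# and discarding every entry before the offset.

class Node:
    def __init__(self, children=None, entries=None):
        self.children = children  # internal node: list of Node
        self.entries = entries    # leaf: ordered list of triples
        self.count = (len(entries) if entries is not None
                      else sum(c.count for c in children))

def select(node, i):
    """Return the i-th entry (0-based) under `node`."""
    while node.children is not None:
        for child in node.children:
            if child.count > i:   # the i-th entry lies in this child
                node = child
                break
            i -= child.count      # skip the whole subtree at once
    return node.entries[i]

leaves = [Node(entries=[f"t{j}" for j in range(k, k + 3)])
          for k in range(0, 9, 3)]
root = Node(children=leaves)
print(select(root, 4))  # t4, reached without reading t0..t3
```

Real B+Tree leaves hold many entries and the tree is shallow, so the skip cost stays logarithmic even over billions of triples.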
      <p>In summary. Before passage, a timeout was an event to dread; it meant the query had failed, and the
results would not be delivered as expected. With passage, a timeout becomes an event to embrace: it
marks the moment when the first results arrive, progress becomes measurable, and the remaining work
can be estimated.</p>
    </sec>
    <sec id="sec-3">
      <title>3. passage x comunica: Beyond Core SPARQL</title>
      <p>
        Currently, passage provides continuation queries only for core SPARQL queries, i.e., queries with
projections, triple/quad patterns, joins, unions, optionals, binds, and filters [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. passage cannot directly
execute more complex queries like the query top depicted in Figure 3a that retrieves the top 10 brightest
stars, since this query includes a property path, an ORDER BY, and a DISTINCT modifier.
      </p>
      <p>With passage x comunica, we extended the comunica smart client with passage to support full
SPARQL 1.1 queries while ensuring termination. We declared the capabilities of the passage endpoint
within comunica, enabling the client to decompose SPARQL queries so that only supported subqueries
are delegated to passage.</p>
      <p>Figure 3 shows the physical plan of top using passage x comunica. Each green square represents
a call to passage. When multiple green squares appear on the same line, this indicates that several
continuation queries were required to complete a subquery. The plan involves several inner joins, and
the query is processed as follows:
1. comunica transitively retrieves all subclasses (wdt:P279) of stars (wd:Q523) from the passage
server;
2. For each found subclass ?x, the original query top is rewritten by replacing the property path
pattern with the triple pattern {?star wdt:P31 ?x};
3. The resulting basic graph pattern now conforms to core SPARQL and can be executed in full by
passage, as shown in Figure 3b;
4. The ORDER BY and LIMIT clauses are applied as solution modifiers on the aggregated results
and executed client-side by comunica.</p>
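The four steps above can be sketched as follows. This is a hedged illustration of the decomposition, not the actual comunica API: the function names, the magnitude property, and the toy endpoint's response shapes are all invented for the example.

```python
# Hedged sketch of the passage x comunica decomposition of `top`.
# `run_on_passage` stands in for a call to the passage endpoint.

def top_ten_brightest(run_on_passage):
    # 1. Transitively retrieve all subclasses (wdt:P279) of star (wd:Q523);
    #    comunica iterates the property path itself on the client side.
    subclasses = run_on_passage("SELECT ?x WHERE { ?x wdt:P279* wd:Q523 }")
    rows = []
    for x in subclasses:
        # 2.-3. Rewrite: the property-path pattern becomes a plain triple
        #    pattern, so the rewritten query is core SPARQL and passage can
        #    run it to completion, chaining continuations on timeout.
        rows += run_on_passage(
            "SELECT ?star ?mag WHERE { ?star wdt:P31 %s . "
            "?star :magnitude ?mag }" % x)
    # 4. DISTINCT, ORDER BY and LIMIT are applied client-side by comunica
    #    (a lower apparent magnitude means a brighter object).
    return sorted(set(rows), key=lambda r: r[1])[:10]

# Toy endpoint standing in for passage (values are illustrative):
def toy_passage(query):
    if "P279*" in query:
        return ["wd:Q523", "wd:Q3863"]
    if "wd:Q523 ." in query:
        return [("wd:Sun", -26.7), ("wd:Sirius", -1.46)]
    return [("wd:Canopus", -0.74)]
```

Each call inside the loop may itself expand into several continuation queries, which is why a single logical query can translate into hundreds of service calls.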
      <p>Executing top directly on Wikidata results in a timeout. However, with passage x comunica, the
query completes successfully in 19 minutes. It required 689 service queries, transferred 683KB from the
client to the server, and 987KB from the server to the client, ultimately retrieving the top 10 brightest
stars: the Sun, SN 1054, Tycho’s Supernova, Sirius A, Sirius, Canopus, Alpha Centauri, Arcturus, Alpha
Centauri A, and Vega.4 The decomposition may generate many service calls, but termination is guaranteed.</p>
    </sec>
    <sec id="sec-demo">
      <title>4. Demonstration (https://youtu.be/_yFwC0UAeqA)</title>
      <p>We deployed a public endpoint containing the 13B statements of Wikidata (as of February 13th, 2025)
in a 1.8TB blazegraph journal. The server embeds two SPARQL query engines within a single Java
Virtual Machine: the standard blazegraph query engine5 and our passage query engine6.</p>
      <p>
        We deployed a web user interface available at https://passage-org.github.io/passage-comunica/. It
includes an interactive widget that displays the passage x comunica physical plan, updated in real-time.
Users can type their SPARQL queries or choose among query examples from Wikidata or WDBench [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. 7
      </p>
      <p>Thanks to this setup, we can fairly compare blazegraph and passage side by side on the same
hardware and data, as illustrated in Figure 4. Both engines can execute the same query; users only need to
change the endpoint URL: use the /sparql suffix for blazegraph and /passage for passage.</p>
      <p>If your SPARQL query can be executed with blazegraph, we recommend using it. Otherwise, switch
to passage to retrieve partial results, observe progression, diagnose issues such as suboptimal join
orders, and estimate whether the final results are worth waiting for. There is no need to choose one
over the other; this setup offers the best of both worlds.</p>
      <p>
        The demo is fully live, and users are encouraged to submit their queries and request explanations
about the results and execution behavior.
Footnote 4: top was executed after clearing the server’s cache.
Footnote 5: https://10-54-2-226.gcp.glicid.fr/wikidata/sparql with a 60-second timeout.
Footnote 6: https://10-54-2-226.gcp.glicid.fr/wikidata/passage with a 60-second timeout and a 10,000-result limit.
Footnote 7: The web interface, as well as endpoints with smaller datasets, such as WatDiv [8] or WDBench [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], can also be deployed locally. All code is publicly available on GitHub at https://github.com/passage-org.
      </p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>passage redefines the meaning of timeouts during SPARQL query processing. Rather than signaling
failure, a timeout becomes an opportunity: the moment when partial results become visible, offering
users both progress and perspective. Importantly, in our deployment, passage and blazegraph are not
alternatives but complements; they run on the same server, on the same data, and combine their strengths
in a unified system.</p>
      <p>Our roadmap includes extending support in passage for more advanced SPARQL features, such as
aggregates (COUNT/COUNT DISTINCT), GROUP BY, and property paths. In addition, we plan to enhance
performance through parallel execution by partitioning single triple/quad patterns using OFFSET with
LIMIT.</p>
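The planned OFFSET/LIMIT partitioning amounts to slicing a single pattern's deterministic scan into independent, contiguous chunks; a sketch under the assumption that the pattern's total cardinality is known (the function name is illustrative):

```python
# Partition a scan of `total` results into contiguous (offset, limit)
# slices that independent workers could evaluate in parallel, relying
# on the same deterministic scan order that makes OFFSET safe.

def partition(total, workers):
    size = -(-total // workers)  # ceiling division
    return [(off, min(size, total - off)) for off in range(0, total, size)]

print(partition(10, 3))  # [(0, 4), (4, 4), (8, 2)]
```

Because the slices are disjoint and cover the whole scan, their union equals the sequential result, so parallelism would not compromise completeness.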
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We would like to thank Erwan Boisteau-Desdevises and Izzedine Issa Ahmat for their valuable support
and contributions to this work. This work is supported by the French ANR project MeKaNo – Search
the Web with Things (ANR-22-CE23-0021).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used https://chatgpt.com for grammar and
spelling checking. After using this tool, the authors reviewed and edited the content as needed and take
full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] C. B. Aranda, A. Hogan, J. Umbrich, P. Vandenbussche, SPARQL Web-Querying Infrastructure: Ready for Action?, in: The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part II, volume 8219 of Lecture Notes in Computer Science, Springer, 2013, pp. 277–293. doi:10.1007/978-3-642-41338-4_18.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] T. H. T. Pham, G. Montoya, B. Nédelec, H. Skaf-Molli, P. Molli, Passage: Ensuring Completeness and Responsiveness of Public SPARQL Endpoints with SPARQL Continuation Queries, in: Proceedings of the ACM on Web Conference 2025, WWW '25, Association for Computing Machinery, New York, NY, USA, 2025, pp. 47–58. doi:10.1145/3696410.3714757.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] J. Pérez, M. Arenas, C. Gutiérrez, Semantics and complexity of SPARQL, ACM Trans. Database Syst. 34 (2009) 16:1–16:45. doi:10.1145/1567274.1567278.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] R. Taelman, J. V. Herwegen, M. V. Sande, R. Verborgh, Comunica: A Modular SPARQL Query Engine for the Web, in: The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part II, volume 11137 of Lecture Notes in Computer Science, Springer, 2018, pp. 239–255. doi:10.1007/978-3-030-00668-6_15.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] C. B. Aranda, A. Polleres, J. Umbrich, Strategies for Executing Federated Queries in SPARQL1.1, in: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014, Proceedings, Part II, volume 8797 of Lecture Notes in Computer Science, Springer, 2014, pp. 390–405. doi:10.1007/978-3-319-11915-1_25.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] J. D. Fernández, M. A. Martínez-Prieto, C. Gutiérrez, A. Polleres, M. Arias, Binary RDF representation for publication and exchange (HDT), J. Web Sem. 19 (2013) 22–41. doi:10.1016/j.websem.2013.01.002.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] R. Angles, C. Buil-Aranda, A. Hogan, C. Rojas, D. Vrgoč, WDBench: A Wikidata Graph Query Benchmark, in: The Semantic Web - ISWC 2022 - 21st International Semantic Web Conference, Virtual Event, October 23-27, 2022, Springer-Verlag, Berlin, Heidelberg, 2022, pp. 714–731.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] G. Aluç, O. Hartig, M. T. Özsu, K. Daudjee, Diversified Stress Testing of RDF Data Management Systems, in: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014, Proceedings, Part I, volume 8796 of Lecture Notes in Computer Science, Springer, 2014, pp. 197–212. doi:10.1007/978-3-319-11964-9_13.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>