Source Selection for SPARQL Endpoints: Fit for Heterogeneous Federations of RDF Data Sources? Sijin Cheng1 , Olaf Hartig1 1 Dept. of Computer and Information Science (IDA), Linköping University, Linköping, Sweden Abstract To answer queries over a federation of multiple data sources, a preliminary task is source selection; i.e., identify the federation members that can answer each part of the query and decompose the query into subqueries assigned to each federation member. Existing source selection approaches for federations of RDF data sources have been developed based on the assumption that the federation member are SPARQL endpoints. This paper presents an analytical study that investigates whether these approaches are still effective in the context of federations that are heterogeneous in terms of the types of data access inter- face. In particular, we identify what information about the data of the federation members is required by the approaches and analyze the possibilities and the effort of obtaining this information via the dif- ferent types of data access interfaces. We find that almost all existing source selection approaches can be adopted for heterogeneous federations but obtaining the required information may not be practical. 1. Introduction Processing queries that cannot be answered by a single data source requires combining data from multiple data sources in a federation. For such federated queries, it is significant to identify relevant federation members for executing particular parts of a given query, since requesting data from irrelevant federation members is an unnecessary overhead or may result in high costs on subsequent query processing. The task of source selection is to identify relevant federation members that may answer parts of the query and to determine the subqueries to be executed at each federation member. Existing source selection approaches have, by and large, focused only on the source selection problem in federations that are homogeneous in terms of the interfaces provided by the federation members; that is, the federation members have typically be assumed to be SPARQL endpoints. For federations that are heterogeneous in terms of interface types (i.e., each federation member may support a different type of interface to access its RDF dataset), the source selection problem has yet to be studied. As a first step in this direction, this paper focuses on the existing source selection approaches for homogeneous federations and investigates whether these approaches are still applicable in the context of heterogeneous federations. The main question that we aim to answer is whether the information that the approaches assume to have about the federation members can still be obtained even if the federation members provide interfaces different from the SPARQL endpoint sijin.cheng@liu.se (S. Cheng); olaf.hartig@liu.se (O. Hartig) { https://www.ida.liu.se/~sijch63/ (S. Cheng); http://olafhartig.de/ (O. Hartig)  0000-0003-4363-0654 (S. Cheng); 0000-0002-1741-2090 (O. Hartig) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) interface. Additionally, we aim to achieve a general understanding of the effort of obtaining this information when relying on the different types of interfaces. As a preliminary for the analysis, we first provide an overview of the types of interfaces that we consider in this study; cf. Section 2. Thereafter, we present our analysis in Section 3 and summarize our findings in Section 4. 2. Types of Interfaces This section provides an overview of the different types of interfaces for accessing RDF datasets that have been proposed in the literature. For each interface, we focus on the types of requests that clients can send to the server and the possible metadata that the server needs to include in its responses (if any). SPARQL endpoints [1]. The clients can request the result of any SPARQL query over the RDF dataset hosted at the server. The responses to such requests contain only the corresponding query result and no additional metadata. However, SPARQL queries can be constructed to also retrieve some forms of metadata and statistics about the dataset from a SPARQL endpoint. Triple Pattern Fragments (TPF) [2]. The TPF interface restricts the type of queries that can be answered by the server to single triple patterns. More specifically, clients can request all triples from the server-side dataset that match a given triple pattern. Since some triple patterns may have many matching triples, TPF servers use paging. That is, they return only a limited number of matching triples per the response, together with metadata that provides a URI via which the next subset of matching triples can be requested. Each such response is called a page and the typical limit that TPF servers apply is a maximum of 100 matching triples per page [2]. In addition to paging-related metadata, the specification of the TPF interface requires the server to include further metadata in the responses; in particular, an estimate of the total number of triples that match the given triple pattern must be included as metadata in each page. Bindings-Restricted TPF (brTPF) [3]. The brTPF interface is an extension of the TPF interface and has been proposed to mitigate the shortcomings of TPF (such as a large number of requests and unnecessary data transfer). The extension of brTPF over TPF is that it enables clients to include a set of solution mappings in their requests (in addition to the triple pattern). The response to such a brTPF request contains RDF triples from the underlying dataset that not only match the given triple pattern but are also guaranteed to contribute to a join with the given set of solution mappings (which would typically be an intermediate result produced for other triple patterns of the query evaluated by a brTPF client). As for TPF, brTPF servers may use paging and the responses to brTPF requests must also include metadata with cardinality estimates. Additionally, brTPF servers are required to be able to respond to TPF requests. Star Pattern Fragments (SPF) [4]. The SPF interface presents another extension over TPF and brTPF. Instead of a single triple pattern, SPF requests may contain a set of triple patterns that form a so-called star pattern; that is, the triple patterns all have the same subject. As with brTPF, SPF requests may additionally include a set of solution mappings. SPF servers may use paging and the responses must include metadata with cardinality estimates. smart-KG [5]. The smart-KG interface combines TPF with shipping of whole partitions of the server-side RDF dataset to the clients in order to reduce redundant data transfer between servers and clients. Specifically, upon set up, the server partitions the triples of the dataset based on characteristic sets [6] and compresses the partitions using HDT [7]. Then, clients can request the partitions in their compressed form, cache them, and query them locally for triples that match given triple patterns. To facilitate the interaction between smart-KG clients and servers, each client is equipped with a server-provided catalog that contains metadata about the structure of the partitions. The metadata in these catalogs that is most relevant for our analysis in this paper is a list of all URIs that appear as predicates in the triples of the server-side dataset. WiseKG [8]. WiseKG dynamically balances the query processing load between client and server by combining SPF and smart-KG. That is, for each star-shaped subquery, the WiseKG server chooses either the SPF or the smart-KG approach by comparing estimated costs for both approaches. The cost model used to this end considers the current server load, client capabilities, an estimation of necessary data transfer between client and server (for intermediate query results), and network bandwidth. Since WiseKG is a combination of SPF and smart-KG, we do not consider it explicitly in the rest of this paper, but the observations that we make for both SPF and smart-KG also apply to WiseKG. SaGe [9, 10, 11, 12]. SaGe applies a preemption method in order to provide stable and responsive SPARQL services. That is, clients can request the execution of a SPARQL query over the server- side dataset and the server performs the execution for a fixed time quantum; when that time is exceeded, the server suspends the execution and encodes the current state of the execution in a response to the client, together with any solutions of the query result that may have been produced already. By returning the encoded state to the server, the client can then request the server to resume the execution for another time quantum. This process of the client requesting the server to make step-wise progress on the query execution continues until either the query execution completes or the client decides to discontinue. It is important to note that only a fragment of SPARQL can be implemented using preemptable query operators and, thus, only queries from this fragment can be requested to be executed by a SaGe server. The original work presented preemptable operators for joins, unions, projections (i.e., SELECT), and filters with pure logical expressions [9]. Follow-up work introduced further preemptable operators for aggregations (COUNT, AVG, SUM, MIN, MAX) and aggregations with DISTINCT [10], as well as property path patterns [11]. While some of these preemptable operators can be executed completely by the server, others have a client-side component that produces the result in collaboration with the server-side component of the operator. Moreover, for queries that use features of SPARQL for which no preemptable operators exist (e.g., ORDER BY, GROUP BY, MINUS, FILTER EXISTS), these features need to be implemented completely in the client. 3. Analysis of Existing Source Selection Approaches In a federation of RDF data sources, the maintainer of such a data source may choose any of the aforementioned types of interfaces to provide access to the dataset of their federation member. In this section, we analyze existing source selection approaches in terms of their applicability to federations that are heterogeneous in terms of interface types. Specifically, ASK { SELECT * WHERE { ?book rdf:type dbo:Book . ?book rdf:type dbo:Book . } } LIMIT 1 (a) (b) Figure 1: Example check queries for triple pattern-wise source selection for (a) SPARQL endpoints and (b) SaGe servers. we examine what information about the datasets of federation members is required for each approach, and then analyze whether and how it is possible to collect that information based on the different types of data access interfaces. We first focus on source selection approaches that do not require any prior information about the datasets of the federation members (Sections 3.1– 3.3). Thereafter, we examine source selection approaches that use a data summary or index for federation members together with runtime requests (Sections 3.4–3.6). In addition, we also investigate source selection approaches that rely solely on a pre-computed data catalog or index for federation members (Sections 3.7 and 3.8). Finally, we consider approaches that require federation members to compute and share statistics (Sections 3.9 and 3.10) and discuss alternative options if the required information is not provided by federation members. 3.1. Source Selection in FedX FedX [13] uses a source selection approach that does not require any prior information about the data of the federation members before performing any query. Instead, FedX obtains the information it needs for its source selection approach from each federation member at query runtime. The particular information that FedX requires for each federation member is: Info1: For each triple pattern of the given query, determine whether the federation member contains at least one triple that matches the triple pattern. To obtain this information for SPARQL endpoints (for which FedX was designed), FedX iterates over the triple patterns of the given query and, for each triple pattern, issues a SPARQL ASK query with that triple pattern. Every federation member that returns TRUE in response to such a query is identified as a federation member containing triples that match the triple pattern. For federation members using the SaGe interface, checking whether such federation mem- bers contain matching triples for a triple pattern can be done by requesting partial results using SPARQL SELECT queries with a LIMIT clause [8]. Figure 1 lists such check queries for an example triple pattern against a SPARQL endpoint and a SaGe interface, respectively. For federation members with a TPF interface, a brTPF interface, or an SPF interface, using ASK queries or LIMIT clauses is not possible. The required information (Info1) can instead be obtained by requesting the first page of matching triples for each triple pattern of the query. The count metadata in the retrieved pages indicates whether the federation member has matching triples for the corresponding triple pattern. Finally, considering federation members that use a smart-KG interface, recall that smart-KG clients are provided with a catalog that contains metadata about the structure of the compressed dataset partitions available at the smart-KG server. This catalog can be leveraged to obtain the source selection information (Info1) required by the FedX approach. That is, for every triple pattern in which the predicate is bound, and subject and object are not, the corresponding predicate set of this catalog can be checked to determine whether the smart-KG server contains triples that match the triple pattern. This check does not even require contacting the smart-KG server. For any other triple pattern, a TPF first-page request can be used as suggested above for the TPF interface (because the smart-KG interface is an extension of the TPF interface). 3.2. Source Selection in Lusail Lusail [14] applies another source selection approach that does not rely on prior information of the federation members and their datasets. The particular information that Lusail requires for each federation member is the same as required by FedX: Info1 (same as in Section 3.1): For each triple pattern of the given query, determine whether the federation member contains at least one triple that matches the triple pattern. Additionally, Lusail aims to determine whether two or more triple patterns can be grouped as a subquery. To this end, Lusail requires the following information for each federation member. Info2: For each pair of triple patterns in the query that share a variable, information whether the set difference is empty between (i) the set of all RDF terms at the position of the shared variable within triples that match the first triple pattern and (ii) the set of all RDF terms at the position of the shared variable within triples that match the second triple pattern. Since the options to obtain Info1 via the different types of interfaces have been discussed already in Section 3.1, the following discussion focuses on Info2. For federation members that provide a SPARQL endpoint, the set difference check can be performed by issuing a SPARQL SELECT query with FILTER NOT EXISTS and LIMIT 1 [14] (e.g., Figure 2), since it is only essential to know if the result is an empty set. Another option would be to use ASK instead of SELECT, in which case LIMIT 1 can be omitted. For federation members with a SaGe interface, the set differences cannot be checked directly based on preemptable SPARQL queries because SaGe servers can only process filter conditions with pure logical expressions but no NOT EXISTS filters. As an alternative, the set difference may be computed within the federation engine, for which the relevant sets of RDF terms (i.e., Info2(i) and Info2(ii)) have to be retrieved via preemptable SPARQL queries that contain the corresponding triple patterns. The drawback of this alternative option is that large amounts of data may end up being shipped from the server to the federation engine. If federation members offer a TPF interface or a smart-KG interface, the same alternative as for SaGe servers can be applied, just with using TPF requests instead of SPARQL queries, which also has the same drawback as in the case of SaGe (potentially large amounts of data might need to be transferred from the federation members to the federation engine). For smart-KG SELECT ?book WHERE { SELECT ?book WHERE { ?book rdf:type dbo:Book . ?book dbo:author ?author . FILTER NOT EXISTS { FILTER NOT EXISTS { SELECT ?book WHERE { SELECT ?book WHERE { ?book dbo:author ?author . ?book rdf:type dbo:Book . } } } . } . } LIMIT 1 } LIMIT 1 (a) (b) Figure 2: Example queries for checking set difference at SPARQL endpoints [14]. servers it is additionally possible to fetch the corresponding partitions from the server and, then, extract the required sets of RDF terms locally, instead of via TPF requests. If federation members use a brTPF interface, the option outlined for TPF and smart-KG can be simplified as follows. For each pair of triple patterns with a shared variable, first retrieve the set of triples matching one of the triple patterns (i.e., either Info2(i) or Info2(ii)) by using a TPF requests (as are supported by brTPF servers). Thereafter, attach a set of solution mappings (variable bindings for the shared variable) to a brTPF request for the other triple pattern, and obtain a second set of triples. If the number of distinct RDF terms bound to the shared variable in the second set of triples is less than the number of distinct RDF terms bound to the shared variable in the first set of triples, it indicates the set difference is not empty. For federation members providing the SPF interface, the option outlined for brTPF can be used (and so can the option outlined for TPF and smart-KG). Additionally, for pairs of triple patterns that both have the shared variable in the subject position, another option exists: a star pattern request can be constructed and issued for retrieving matching triples for the pair of triple patterns. If the number of distinct RDF terms bound to the shared variable in matching triples for the star pattern is less than the number of distinct RDF terms bound to the shared variable in matching triples of any one of the single triple pattern requests, it indicates that the set difference is not empty. 3.3. Fed-DSATUR Fed-DSATUR [15] is another source selection approach that does not rely on statistics, indices, or any kind of estimates of the federation members and their datasets. Fed-DSATUR uses a greedy iterative strategy to get optimal query decomposition by considering properties of the SPARQL queries (e.g., the number of subqueries, and the triple patterns assigned to the subqueries as well as the federation members selected for each subquery). The particular information that Fed-DSATUR requires for each federation member is the same as FedX: Info1 (same as in Section 3.1): For each triple pattern of the given query, determine whether the federation member contains at least one triple that matches the triple pattern. SELECT DISTINCT ?p SELECT DISTINCT ?s SELECT DISTINCT ?o WHERE { WHERE { WHERE { ?s ?p ?o . ?s ?o . ?s ?o . } FILTER isIRI(?s) FILTER isIRI(?o) } } (a) (b) (c) SELECT DISTINCT ?p ?s WHERE { SELECT DISTINCT ?p ?o WHERE { VALUES ?p { } VALUES ?p { } ?s ?p ?o . ?s ?p ?o . FILTER isIRI(?s) FILTER isIRI(?o) } } (d) (e) Figure 3: Queries to obtain the data needed for constructing data summaries for HiBISCuS. 3.4. HiBISCuS The HiBISCuS approach [16] relies on a combination of runtime requests and a summary of the dataset in each federation member. In order to build such a data summary for a federation member the following information needs to be available about the dataset. Info3: All the URIs that are used as predicates in triples of the dataset. Info4: For each of these predicate URIs, all the URIs that are used as subjects in triples with the predicate URI. Info5: For each of these predicate URIs, all the URIs that are used as objects in triples with the predicate URI. For federation members that provide a SPARQL endpoint, the SPARQL query in Figure 3a may be used for retrieving the predicate URIs (Info3). Thereafter, for each of the predicate URIs, two queries with DISTINCT together with an isIRI() filter can be used to get the set of URIs that are used as subjects (Info4) and as objects (Info5) in triples with the predicate URI. Figures 3b and 3c illustrate these two queries. As a more efficient alternative, instead of issuing these pairs of queries separately for each predicate URI, the requests can be batched by including multiple predicate URIs via a VALUES clause (or UNION or FILTER [17]), as illustrated in Figures 3d–3e. For federation members with a SaGe interface, the same approach as for SPARQL endpoints can be applied. Notice, however, that the preemptable operator for DISTINCT has a client-side component, which means that the federation engine would need to do some processing itself for the queries that retrieve the relevant information (Figures 3a–3c) and that potentially some irrelevant intermediate solutions would be retrieved from the SaGe server. If federation members expose data through a TPF interface, a brTPF interface or an SPF interface, the set of predicate URIs (Info3) can be obtained by a request with a triple pattern of the form (?s, ?p, ?o), which retrieves the complete dataset of the federation member. The removal of duplicate (predicate) URIs has to be taken care of by the federation engine. After retrieving the entire dataset in order to obtain all the predicate URIs, the required information about subject URIs (Info4) and object URIs (Info5) can then also be extracted from the retrieved data. For federation members using a smart-KG interface, the set of distinct predicate URIs (Info3) can be obtained from the smart-KG catalog provided to the federation engine by the smart-KG server. Thereafter, for each of the predicate URIs, the set of subject URIs (Info4) and object URIs (Info5) can be obtained by fetching the relevant partitions from the smart-KG server or by issuing TPF requests with triple patterns of the form (?s, , ?o). Since this would be done for every predicate URI, the federation engine would also end up retrieving the complete dataset from the smart-KG server. Similar to TPF-like interfaces, removing duplicate URIs and filtering subjects/objects that are not URIs can only be done within the federation engine. 3.5. Source Selection in CostFed CostFed [18] also relies on a combination of runtime requests and a summary of the dataset in each federation member. The information that CostFed needs to build its type of data summary for a federation member is exactly the same information as is needed by HiBISCuS. Info3 (same as in Section 3.4): All the URIs that are used as predicates in triples of the dataset. Info4 (same as in Section 3.4): For each of these predicate URIs, all the URIs that are used as subjects in triples with the predicate URI. Info5 (same as in Section 3.4): For each of these predicate URIs, all the URIs that are used as objects in triples with the predicate URI. As already described in Section 3.4, Info3, Info4 and Info5 can be obtained in different ways depending on the interface of the federation member, where for several types of interfaces (TPF, brTPF, SPF, and smart-KG), this would mean retrieving the complete dataset from the corre- sponding federation member. 3.6. Source Selection in ANAPSID ANAPSID [21] makes use of service descriptions and runtime request to determine whether SPARQL endpoints can answer a triple pattern or not. The particular information that the source selection of ANAPSID requires from these service descriptions is a list of predicate URIs, which is the same as Info3 (cf. Section 3.4). 3.7. Source Selection in ADERIS ADERIS [19] requires specific information about the data contained in each federation member before performing any query. The particular information that the source selection of ADERIS requires for each federation member is: Info3 (same as in Section 3.4): All the URIs that are used as predicates in triples of the dataset. Info6: For each of these predicate URIs, the number of triples with that predicate URI. As already described in Section 3.4, the set of distinct predicates URIs (Info3) can be obtained in different ways depending on the interface of the federation member. SELECT DISTINCT ?o1 WHERE { ?s dbo:birthDate ?o1 . SELECT DISTINCT ?s SELECT DISTINCT ?p ?s dbo:activeYearsStartYear ?o2 . WHERE { WHERE { ?s foaf:name ?o3 . ?s ?p ?o . ?p ?o . FILTER isIRI(?o1) } } } (a) (b) (c) Figure 4: Example queries for retrieving (a–b) Info7, and (c) Info9 for Odyssey. Once the predicate URIs are known to the federation engine, for federation members that pro- vide a SPARQL endpoint or a SaGe interface, the number of triples with the same predicate URI (Info6) can be obtained via queries using the COUNT function. For federation members with a smart-KG interface, a request with a triple pattern of the form (?s, , ?o) can be issued for each predicate URI to obtain the first page of matching triples with that predicate, and estimates of the count metadata (Info6) can be obtained from the retrieved pages. For federation members supporting TPF, brTPF, or SPF, since obtaining the set of predicates URIs (Info3) required retrieving the complete dataset from the federation member (cf. Section 3.4), the counting of triples per predicate URI (Info6) can then be done in the federation engine. 3.8. Source Selection in DARQ DARQ [20] relies on a so-called service description (represented in RDF), which describes the data available from a federation member in form of capabilities. The particular information that DARQ requires each federation member to provide includes Info3 (cf. Section 3.4) and hand-crafted constraints on subjects and objects. These constraints are represented using SPARQL filter expressions. 3.9. Source Selection in Odyssey Odyssey [22] relies on statistics computed by federation members and shared with the federation engine. The particular information that Odyssey requires from each federation member is the following: Info7: For each URI that appears in the subject of some triple, the characteristic set [6] of that subject URI; i.e., all the URIs that are used as predicates in triples with that subject URI. Info8: For each such characteristic set, all subject URIs that have this characteristic set. Info9: For each such characteristic set, all URIs that are used as objects in triples with each predicate within the characteristic set. If the federation members do not provide such information, the federation engine may obtain the information via requests to the data access interfaces of the federation members. For SPARQL endpoints and SaGe servers, to get all predicates of each subject URI (Info7), a set of all subject URIs needs to be retrieved first, which can be achieved via a query with DISTINCT (shown in Figure 4a). Then, for each subject URI, the predicate URIs that belong to the characteristic set of that subject URI (Info7) can be retrieved by a SPARQL query of the form in Figure 4b. These requests may also be batched for multiple subject URIs by using a VALUES clause (similar to the batching outlined in Figure 3d in Section 3.4, but with having subject URIs in the VALUES clause, instead of predicate URIs). Once the characteristic sets for all subject URIs have been obtained, all subject URIs that share the same characteristic set (Info8) can be determined within the federation engine. Thereafter, the set of object URIs for each predicate within a characteristic set (Info9) can be obtained by queries such as the one in Figure 4c. Notice that all these queries use the DISTINCT keyword which, in the case of SaGe, as already discussed in Section 3.4, requires some processing within the federation engine and potential retrieval of some irrelevant intermediate solutions. For federation members with a TPF interface, a brTPF interface, a SPF interface, or a smart-KG interface, obtaining the set of predicates per subject URI requires retrieving the whole RDF dataset to the federation engine. After retrieving the entire dataset, all required information can be determined within the federation engine. 3.10. VoID-Based Approaches To identify and exclude federation members that have no matching triples for some of the triple patterns of a given query, some engines such as SPLENDID [23], Semagrow [24] and LHD [25] rely on external metadata that is shared by federation members and described using the Vocabulary of Interlinked Datasets (VoID) [26]. For source selection, these engines mainly make use of predicate information in the VoID descriptions, which allows them to identify whether federation members are relevant or irrelevant for any triple pattern with a bound predicate. For triple patterns with an unbound predicate, the engines use ASK queries as done by FedX. Hence, in general, the information that the source selection approaches of these engines use is the same as Info1 (Section 3.1) and Info3 (Section 3.4). 4. Summary of Findings When taking into account the capabilities of different interfaces for providing access to RDF data, we observe that all existing source selection approaches can be adopted for heterogeneous federations by making some interface-specific modifications to the processes they use to collect the information they rely on. However, for many cases, these modifications would result in processes that are of limited suitability from a practical perspective. In particular, for all approaches that rely on some form of data summary, index, or statis- tics about the data of each federation member—which includes the DARQ approach [20], ADERIS [19], HiBISCuS [16], CostFed [18], Odyssey [22], as well as VoID-based approaches as used in SPLENDID [23], LHD [25], Semagrow [24]—if federation members provide certain types of interfaces, obtaining the information required for building the corresponding data summary or index would result in retrieving the complete dataset from these federation members. As a consequence, the retrieved datasets may then also be stored locally, making the use of a federation engine obsolete. Even for the approach of Lusail [14], potentially large amounts of data would need to be transferred from the federation members to the query engine for the purpose of deciding about combining triple patterns into subqueries. The only approaches for which obtaining the relevant information is practical are the approach of FedX [13] and the Fed-DSATUR approach [15]. For these approaches, all required information can be obtained by issuing only a few requests to the federation members, where these requests are guaranteed not to result in retrieving large portions of the datasets. Acknowledgments This work was funded by CUGS (the National Graduate School in Computer Science, Sweden), by Vetenskapsrådet (the Swedish Research Council, project reg. no. 2019-05655), and by the CENIIT program at Linköping University (project no. 17.05). References [1] L. Feigenbaum, G. T. Williams, K. G. Clark, E. Torres, SPARQL 1.1 Protocol, W3C Recom- mendation, Online at http://www.w3.org/TR/sparql11-protocol/, 2013. [2] R. Verborgh, M. Vander Sande, O. Hartig, J. Van Herwegen, L. De Vocht, B. De Meester, G. Haesendonck, P. Colpaert, Triple Pattern Fragments: A Low-Cost Knowledge Graph Interface for the Web, Journal of Web Semantics 37–38 (2016) 184–206. [3] O. Hartig, C. Buil-Aranda, Bindings-Restricted Triple Pattern Fragments, in: Proceedings of the 15th Int. Conf. on Ontologies, Databases, and Applications of Semantics (ODBASE), 2016. [4] C. Aebeloe, I. Keles, G. Montoya, K. Hose, Star Pattern Fragments: Accessing Knowledge Graphs through Star Patterns, arXiv preprint arXiv:2002.09172 (2020). [5] A. Azzam, J. D. Fernández, M. Acosta, M. Beno, A. Polleres, SMART-KG: Hybrid Shipping for SPARQL Querying on the Web, in: Proceedings of The Web Conference (WWW), 2020. [6] T. Neumann, G. Moerkotte, Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins, in: Proceedings of the 27th International Conference on Data Engineering (ICDE), 2011. [7] J. D. Fernández, M. A. Martínez-Prieto, C. Gutiérrez, A. Polleres, M. Arias, Binary RDF Representation for Publication and Exchange (HDT), Journal of Web Semantics 19 (2013) 22–41. [8] A. Azzam, C. Aebeloe, G. Montoya, I. Keles, A. Polleres, K. Hose, WiseKG: Balanced Access to Web Knowledge Graphs, in: Proceedings of the Web Conference (WWW), 2021. [9] T. Minier, H. Skaf-Molli, P. Molli, SaGe: Web Preemption for Public SPARQL Query Services, in: Proceedings of the Web Conference (WWW), 2019. [10] A. Grall, T. Minier, H. Skaf-Molli, P. Molli, Processing SPARQL Aggregate Queries with Web Preemption, in: Proc. of the 17th Extended Semantic Web Conference (ESWC), 2020. [11] J. Aimonier-Davat, H. Skaf-Molli, P. Molli, Processing SPARQL Property Path Queries Online with Web Preemption, in: Proceedings of the 18th Extended Semantic Web Conference (ESWC), 2021. [12] J. Aimonier-Davat, H. Skaf-Molli, P. Molli, A. Grall, T. Minier, Online Approximative SPARQL Query Processing for COUNT-DISTINCT Queries with Web Preemption, Semantic Web (2022). [13] A. Schwarte, P. Haase, K. Hose, R. Schenkel, M. Schmidt, FedX: Optimization Techniques for Federated Query Processing on Linked Data, in: Proceedings of the 10th International Semantic Web Conference (ISWC), 2011. [14] I. Abdelaziz, E. Mansour, M. Ouzzani, A. Aboulnaga, P. Kalnis, Lusail: A System for Querying Linked Data at Scale, Proc. of the VLDB Endowment (2017). [15] M.-E. Vidal, S. Castillo, M. Acosta, G. Montoya, G. Palma, On the Selection of SPARQL Endpoints to Efficiently Execute Federated SPARQL Queries, in: Transactions on Large- Scale Data- and Knowledge-Centered Systems XXV, Springer, 2016, pp. 109–149. [16] M. Saleem, A.-C. Ngonga Ngomo, HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation, in: Proceeding of the European Semantic Web Conference (ESWC), 2014. [17] C. B. Aranda, A. Polleres, J. Umbrich, Strategies for Executing Federated Queries in SPARQL1.1, in: Proceedings of the 13th International Semantic Web Conference (ISWC), 2014. [18] M. Saleem, A. Potocki, T. Soru, O. Hartig, A.-C. N. Ngomo, CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation, in: Proceedings of the SEMANTiCS Conference, 2018. [19] S. Lynden, I. Kojima, A. Matono, Y. Tanimura, ADERIS: An Adaptive Query Processor for Joining Federated SPARQL Endpoints, in: Proceedings of the Int. Conf. on Ontologies, Databases, and Applications of Semantics (ODBASE), 2011. [20] B. Quilitz, U. Leser, Querying Distributed RDF Data Sources with SPARQL, in: Proceedings of the European Semantic Web Conference (ESWC), 2008. [21] M. Acosta, M.-E. Vidal, T. Lampo, J. Castillo, E. Ruckhaus, ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints, in: Proceedings of the International Semantic Web Conference (ISWC), 2011. [22] G. Montoya, H. Skaf-Molli, K. Hose, The Odyssey Approach for Optimizing Federated SPARQL Queries, in: Proceedings of the International Semantic Web Conference (ISWC), 2017. [23] O. Görlitz, S. Staab, SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descrip- tions, in: Proc. of the 2nd Int. Workshop on Consuming Linked Data (COLD), 2011. [24] A. Charalambidis, A. Troumpoukis, S. Konstantopoulos, SemaGrow: Optimizing Federated SPARQL Queries, in: Proc. of the 11th Int. Conf. on Semantic Systems (SEMANTiCS), 2015. [25] X. Wang, T. Tiropanis, H. C. Davis, LHD: Optimising Linked Data Query Processing Using Parallelisation, in: Proceedings of the Linked Data on the Web Workshop (LDOW), 2013. [26] K. Alexander, R. Cyganiak, M. Hausenblas, J. Zhao, Describing Linked Datasets with the VoID Vocabulary, W3C Interest Group Note, Online at http://www.w3.org/TR/void/, 2011.