Data Structures for Indexing Triple Table??? Data Structures for Indexing Triple Table? Roman Meca, Michal Krátký, Peter Chovanec, and Filip Křižka Roman Meca, Michal Krátký, Peter Chovanec, and Filip Křižka Department of Computer Science, VŠB – Technical University of Ostrava CzechVRepublic Department of Computer Science, ŠB – Technical University of Ostrava {roman.meca.st, michal.kratky, Czech Republic peter.chovanec, filip.krizka}@vsb.cz {roman.meca.st, michal.kratky, peter.chovanec, filip.krizka}@vsb.cz Abstract. Semantic-based approaches are relatively new technologies. Some of these technologies are supported by specifications of W3 Con- sortium, i.e. RDF, SPARQL and so on. There are many areas where semantic data can be utilized, e.g. social networks, annotation of protein sequences etc. From the physical database design point of view, several index data structures are utilized to handle this data. In many cases, the well-known B-tree is used as a basic index supporting some operations. Since the semantic data are multidimensional, a common way is to use a number of B-trees to index the data. In this article, we review other in- dex data structures; we show that we can create only one index when we utilize a multidimensional data structure like the R-tree. We compare a performance of the B-tree indices with the R-tree and some its variants. Our experiments are performed over a huge semantic database, we show advantages and disadvantages of these data structures. 1 Introduction Semantic-based approaches are new technologies trying to allow computers to handle semantic information. These approaches have many specific applications. The main advantage of a semantic system is that the computer can reveal unex- pected facts, e.g. a new effective drug combination in medicine [6], unexpected relationships in social networks [19] or artificial intelligence [28] can be discov- ered. This is the reason why the semantic systems are a current research topic. The W3 Consortium have released some specifications related to semantic technologies1 , e.g. RDF [32] as a model of the semantic data or SPARQL [25] as a query language for the RDF data. In addition, more general specifications are also usable for a semantic DBMS supporting SPARQL or another query language, e.g. XML [9] or WSDL [12] for a communication with the DBMS. In this article, we also list a lot of semantic DBMS with the query languages they support and the indices they utilize. Since the semantic DBMS often utilize a relational DBMS as a storage for the RDF triple table (representing the RDF data), the B-tree [13] is often used as the main index. The main issue of this ? This work is partially supported by SGS, VŠB – Technical University of Ostrava, No. SP2015/192 and No. SP2015/170, Czech Republic. ?? This work is an extented version of an article accepted at the DATESO workshop. 1 http://www.w3.org/standards/techs/rdf#w3c all M. Nečaský, J. Pokorný, P. Moravec (Eds.): Dateso 2015, pp. 13–27, CEUR-WS.org/Vol-1343. 14 Roman Meca, Michal Krátký, Peter Chovanec, Filip Křižka physical implementation is that a number of B-trees have to be built to support queries over the triple table. However, there are other index data structures capable to handle semantic data. In this article, we show that it is possible to create only one index if we utilize a multidimensional data structure like the R- tree [22] or some of its variants (namely the Signature R-tree [30] or the Ordered R-tree [31]). In Section 2, we present some basic terms related to semantic technologies. In Section 3, we describe the basic physical design for the RDF data. Section 4 describes some negative issues of the B-tree as an index data structure for the triple table. In addition, the R-tree, R∗ -tree [8], and two their variants are de- scribed. In Section 5, we summarize the advantages and disadvantages of these data structures for various queries over the LUBM data collection [21]. Finally, we conclude the article and outline the possibilities of our future work. 2 Semantic Technologies In this section, we briefly introduce a theoretical basis of the RDF model [32] and the SPARQL query language [25] standardized by W3C. We recommend the book [20] for a more detailed review. 2.1 RDF Model RDF (Resource Description Framework ) is a general model representing infor- mation on the Web; data are modeled as a directed labeled graph [32]. Each edge represents a relationship between an object and a subject: two nodes of the graph. The label of the edge is called property. An example of the graph is given in Figure 1. This tuple (subject, property, object) is called an RDF triple (s,p,o). The values of each triple usually include IRI (Internationalized Resource Identifiers) [15] identifying an abstract or a physical resource. In [20], the author introduces the following definition: Definition 1 (RDF triple). Let us assume there are pairwise disjoint infinite sets I, B, and L, where I represents the set of IRIs, B the set of blank nodes, and L the set of literals. We call a triple (s, p, o) ∈ (I ∪ B)I(I ∪ B ∪ L) an RDF triple, where s represents the subject, p the predicate, and o the object of the RDF triple. A triple table is a set of RDF triples; it is a representation of the RDF graph. In Table 1, we see a fragment of the triple table to the RDF graph in Figure 1. A triple store or an RDF database is an engine enabling to store an RDF graph and efficient processing of queries. However, we usually require other operations like update, insert or delete. Some RDF stores add a fourth element to the triple; this fourth element contains the context of the triple [14]. There are RDF engines enabling to manage these quads [16]. Data Structures for Indexing Triple Table 15 Fig. 1. An example of an RDF graph [18] Subject Property Object LongJump type Jump Blanka Vlasic jumps HighJump GoldenLeague type Meeting Table 1. A fragment of an RDF triple table [18] The RDF specification [32] does not define any way how to store and index the triple table; therefore, there are many variants of the physical design of the triple table and we describe them in Section 3. 2.2 SPARQL Query Language Although there are many query languages for RDF data2 , e.g. SPARQL/Update (or SPALUR) [38], SPARQL 1.1 [25] is a de-facto standard query language for RDF data. It is similar to SQL in many features. SPARQL 1.1 also includes insert, update, and delete operations. The basic query construct of the SELECT statement includes SELECT WHERE . A variable in SPARQL defined by the symbol ? and a name represents the main difference com- pared to SQL; they define unknown values of o, s or p in a pattern as well as a relationship among triple patterns. We distinguish four types of the SPARQL query (for more details see [25]): – SELECT – returns the result relation defined by the projection and patterns. 2 http://www.w3.org/2001/11/13-RDF-Query-Rules/ 16 Roman Meca, Michal Krátký, Peter Chovanec, Filip Křižka – ASK – similar to the SELECT query; however, it returns the boolean value; true if the result is not empty, otherwise false. – CONSTRUCT – allows to format own result graph over the triples returned by the patterns. – DESCRIBE – returns the node (and its neighbours) defined by the patterns. A form of determines the selectivity of a query over the triple table. We can distinguish a point query (s, p, o) returning 0 or 1 triple, or a range query where the query (s, ∗, ∗) can returns more triples than the query (s, p, ∗). Example 1 (SPARQL Queries). 1. SELECT ?s ?p ?o WHERE { ?s ?p ?o } This query selects the whole triple table, it represents the range query (∗, ∗, ∗). 2. SELECT * WHERE { } ASK { } These two queries are similar; the SELECT query returns 0 or 1 triple, on the other hand, the ASK query returns true in the case the triple exists in the graph. These queries represent the point (s, p, o) query over the triple table. 3. SELECT ?s WHERE { ?s } ASK { ?s } CONSTRUCT ?s WHERE { ?s } These three queries include the same selection: the range query (∗, , ). The SELECT query returns all subjects matched by the range query, the ASK query returns true if any triple exists in the graph, and the CONSTRUCT query returns triples (∗, , ) for all triples retrieved by the selection. 4. SELECT ?p ?o WHERE { ?p ?o } This query selects all triples matched by the range query (, ∗, ∗). The selectivity of this query is probably lower than the selectivity of the queries 2 and 3; however, it is higher compared to the query 1. Moreover, the selection includes zero or more join operations. In Figure 2, we show two queries including more join operations. A query with one join is shown in Figure 2(a). In this SELECT, we can see two output variables o1 and o2. In Lines 2 and 3, the range queries (*, , *) and (*, , *) are defined. Results of these range queries are then joined using the subject represented by the j variable and objects for variables o1 and o2 are returned. A more complex SPARQL query with join is shown in Figure 2(b). This SELECT also contains the output variables o1 and o2. However, this query is evaluated by a sequence of three joins: the first join involves sets defined by queries in Lines 2 and 3, the second join involves the result of the previous join and the result of the query in Line 4, and the last join involves the result of the previous join and the result of the query in Line 5. The result of the complete query includes subjects and objects for the variables s and o. Data Structures for Indexing Triple Table 17 1. SELECT ?s ?o WHERE { 1. SELECT ?o1 ?o2 WHERE { 2. ?s ?j1 . 2. ?j ?o1 . 3. ?j1 ?j2 . 3. ?j ?o2 4. ?j2 ?j3 . 4. } 5. ?j3 ?o 6. } j2 type sc j j1 j3 type jumps jumps hasWorlRecord o1 o2 s o (a) (b) Fig. 2. Two SPARQL query with join and their graph representations 3 Existing Triple Stores In Table 2, we show triple stores introduced from 2002 to 2014. These triple stores include academic prototypes, commercial solutions as well as open source projects. Although some details of their implementation are not known, we can distinguish three basic types of the physical design for the triple table [18]: 1. Triple Table (TT ) – in this case, triples are stored in a sequence array. 2. Property Table (PT ) – in this case, we define a tuple (s, o1 , o2 , . . . , on ) for properties p1 , p2 , . . . pn . Tuples of this schema are stored in a sequence array. We can define more property tables in that cases the number of properties is higher than n. 3. Vertical Partitioning (VP ) – the property table where n = 1. Except these main approaches there are also some other variants and improve- ments, for example Hierarchical Property Partitioning utilized in roStore [17]. In some works, we distinguish the Multiple indices approach, which means that some combinations of various indices together with a modification of the above described types are depicted. In Table 2, we can see the B-tree and its variants are the most commonly used data structure indexing the triple table. 3 http://www.guha.com/rdfdb/ 4 http://rdfstore.sourceforge.net/ 5 http://www.bigdata.com/ 18 Roman Meca, Michal Krátký, Peter Chovanec, Filip Křižka Physical Index Supported Last Store Published design data query update type structure language JENA [34] 2002 2014 PT Hash-table, B-tree SPARQL RDFSuite [3] 2001 2003 PT B-tree SPARUL Sesame [10] 2002 2014 PT B-tree SPALUR 3store [23] 2003 2013 TT Hash-table RDQL/SPARQL rdfDB3 2004 2010 TT B-tree SPARQL RDFStore4 2004 2006 TT BerkeleyDB SPARQL Redland [7] 2002 2014 TT Hash-table SPARQL AllegroGraph[1] 2006 2014 B-tree SPARQL B-tree, sw-Store[2] 2009 2014 VP Bitmap SPARQL 4store [24] 2009 2013 PT Hash-table SPALUR YARS [26] 2005 2006 MI B-tree N3 extension Sparse index, YARS2 [27] 2007 MI B-tree SPARQL AVL tree, Kowari [42] 2005 2005 MI B-tree iTQL/RDQL Hexastore [41] 2008 MI B-tree SPARQL RDFJoin [35] 2008 VP B-tree SPARQL RDFKB [36] 2009 MI B-tree - BitMat [4] 2009 2013 MI 3D Bitmap SPARQL-like RDF-3X [37] 2008 2013 MI B-tree SPARQL Parliament [29] 2009 2014 MI B-tree, Heap table - B-tree, Virtuoso[16] 2009 2014 MI Bitmap SPALUR RDFCube[33] 2007 MI 3D Hash-table - GRIN [40] 2007 MI B-tree SPARQL BigData5 2008 2014 MI B-tree SPARQL 1.1 Oracle [11] 2005 2014 MI B-tree, R-tree SPARQL Marmotta [5] 2013 2014 MI B-tree SPARQL Table 2. Triple Stores. TT - triple table PT - property table VP - vertical partitioning MI - multiple indices Data Structures for Indexing Triple Table 19 4 Index Data Structures 4.1 B-tree The B-tree is an one-dimensional paged data structure supporting point and one-dimensional range queries as well as update operations [13]. As result, in the case we want to support a general range query without a sequential scan of all leaf nodes, we have to create more indices. For example, in the case of a B-tree with the compound key (s, p, o), we can effectively utilize range queries (s, p, ∗) and (s, ∗, ∗). On the other hand, fast processing of the range query (∗, p, o) demands a sequential scan over all leaf nodes of the B-tree. To cover all combination of searched dimensions with efficient range query execution, three B-trees have to be created (see Table 3). Consequently, this solution means that the size of indices is probably higher than the table size. This issue is even more evident in the case of the Quad table; in Table 4, we see that we need 6 indices to cover all range queries over quads. There are two problematic issues related to this technique: the higher space overhead and the additional overhead of the update operations since more indices have to be updated. Compound key of the B-tree (s, p, o) (o, s, p) (p, o, s) Supporting (s, p, o) (o, s, p) (p, o, s) range (s, p, ∗) (o, s, ∗) (p, o, ∗) queries (s, ∗, ∗) (o, ∗, ∗) (p, ∗, ∗) (∗, ∗, ∗) (∗, ∗, ∗) (∗, ∗, ∗) Table 3. B-tree indices for the triple table Compound key of the B-tree (s, p, o, c) (p, o, c) (o, c, s) (c, s, p) (c, p) (o, s) (s, p, o, c) (p, ∗, ∗) (o, ∗, ∗) (c, ∗, ∗) (c, p) (o, s) Supporting (s, p, o, ∗) (p, o, ∗) (o, c, ∗) (c, s, ∗) range (s, p, ∗, ∗) (p, o, c) (o, c, s) (c, s, p) queries (s, ∗, ∗, ∗) (∗, ∗, ∗, ∗) Table 4. B-tree indices for the quad table 4.2 R-tree Since the multidimensional R-tree [22] supports a general multidimensional range query, we can use it as a solution of the above mentioned problems instead of 20 Roman Meca, Michal Krátký, Peter Chovanec, Filip Křižka a sequence scan in the B-tree. The R-tree can be thought of as an extension of the B-tree in a multidimensional space. It corresponds to a hierarchy of nested n-dimensional minimum bounding rectangles (MBR). If N is an interior node, it contains couples of the form (Ri , Pi ), where Pi is a pointer to a child of the node N . If R is its MBR, then the rectangles Ri corresponding to the children Ni of N are contained in R. Rectangles at the same tree level can overlap. If N is a leaf node, it contains couples of the form (Ri , Oi ), so called index records, where Ri contains a spatial object Oi . The split algorithm has the significant affect on the index performance. Three split techniques (Linear, Quadratic, and Exponential ) proposed in [22] are based on a heuristic optimization. The Quadratic algorithm has turned out to be the most effective and other improved versions of R-trees are based on this method. An MBR can overlap another MBR in the same level of the tree; the probability increases linearly with increasing data dimension. This effect is known as curse of dimensionality [43]. There are many variants of the R-tree, e.g. R∗ -trees [8], R+ -tree [39]. The ∗ R -tree [8] differs from the R-trees mainly in the insertion algorithm. Although original R-tree algorithms tried only to minimize the area covered by MBRs, the R*-tree algorithms try to minimize overlapping between MBRs at the same levels and maximize the storage utilization. The R+ -tree [39] is a variant of the R-tree which allows no overlap between regions corresponding to nodes at the same tree level; however, an item can be stored in more than one leaf node. Since some intervals of a range query include only one value in the case of the triple table, we call the query as the narrow range query [30]. Therefore, we utilize the Signature R-tree [30] allowing to handle the range query more efficiently than the R-tree and its variants. Moreover, we use the Ordered R- tree [31] since we can define an ordering of attributes. These data structures are described in the following sections. 4.3 Signature R-tree The Signature R-tree [30] contains MBRs in inner nodes (we suppose point data in leaf nodes) and one signature related to each MBR. The signature is created for tuples inserted in the subtree related to each MBR. As result, we can use two types of filtering when a range query scans the tree: the first filtering method tests whether an MBR is intersected by a query rectangle and the second filtering method tests whether a signature can include tuples of the query. As result, the Signature R-tree reads a lower number of nodes during the range query processing. This R-tree variant is however proposed only for point data and narrow range queries. 4.4 Ordered R-tree The Ordered R-tree [31] is a simple combination of the R-tree and the B-tree. It means, we can use a general multidimensional range query, however we can Data Structures for Indexing Triple Table 21 define an ordering for tuples inserted in the tree. Evidently, we can define only one ordering in one tree. There are two consequences: 1. For some range queries (corresponding to ordering defined for the tree), all leaf nodes intersected by the query rectangle include only result tuples. It is not generally true for the R-tree and its variants, but the range query of the B-tree provides the same behaviour. 2. We get tuples of the result sorted and it is not necessary to sort them after the range query is processed. In this article, we utilize mainly the first property. 5 Experiments In our experiments6 , we compare the B-tree, as the main index data structure utilized in semantic DBMS, with the R-tree7 , Signature R-tree, and Ordered R- tree. All index data structures are implemented in C++8 . We utilize a generated synthetic data collection called LUBM including 133,573,856 triples [21], the size of the text file is 22.2 GB. Query Group Type Result set size #Queries #Iterations 1 Range query < 1; 1 > 6 10,000 2 Range query < 2; 1, 000 > 6 50 3 Range query < 1,001; 1,000,000 > 6 1 4 Range query < 1, 000, 001; ∞) 6 1 5 Point query < 1; 1 > 33,234,949 1 Table 5. Specification of query groups We test the performance of point and range queries processed over the index data structures when a SPARQL query is evaluated. We use 5 groups of queries determined by the selectivity (see Table 5)9 . QG5 represents a sequence of point queries processed during a join operation. In the case of QG1 and QG2, it is necessary to repeat a sequence of queries since the processing time of one query is unmeasurable. The number of iterations is written in the column #Iteration of the table. The column #Queries contains a number of various queries in one query group. 6 We run our experiments on 2 x Intel Xeon E5 2690 2.9GHz and 300GB RAM memory, OS Windows Server 2008. 7 More precisely, the R∗ -tree has been tested. 8 A part of the RadegastDB framework developed by DBRG – http://db.cs.vsb.cz/ 9 A complete list of queries can be found in http://db.cs.vsb.cz/ TechnicalReports/indices for rdf data-query.pdf 22 Roman Meca, Michal Krátký, Peter Chovanec, Filip Křižka We built the B-trees, the R-tree, the Signature R-tree, and the Ordered R-trees for the test data collection10 . In Table 6 and Figure 3, we see basic characteristics of these indices. Since these data structures include string ids instead of strings, a term index is built. In the case of the Ordered R-tree, we do not need more trees like in the case of the B-tree, however, in this article, we want to test whether it is possible to find an optimal ordering for the Ordered R-tree, therefore we build the tree for more orderings of the attributes. We can see that the B-tree size is up-to 3× higher than the size of the R-tree-based indices. The R-tree is build in 58% of the B-tree build time. On the other hand, the build time for other R-tree-based indices is up-to 2× less efficient compared to the B-tree. Index Data Structure #Nodes Size [GB] Build Time [s] Term index 4,543,671 8.67 3,794.7 (s, p, o) B-tree (p, o, s) 4,465,853 8.51 3,857.9 (o, s, p) R-tree 1,495,289 2.85 2,228.1 Signature R-tree 1,641,905 3.13 6,143.5 (s, p, o) 1,541,677 2.94 6,404.1 (p, o, s) 1,499,602 2.86 7,193.5 (o, s, p) 1,433,703 2.73 6,791.1 Ordered R-tree (s, o, p) 1,541,677 2.94 7,232.2 (p, s, o) 1,579,935 3.01 7,535.6 (o, p, s) 1,429,151 2.73 6,933.0 Table 6. Statistics of index data structures 8E+3 9 B-tree 7E+3 8 R-tree 6E+3 7 6 Signature R-tree 5E+3 5 Ordered R-tree SPO 4E+3 4 Ordered R-tree POS 3E+3 3 Ordered R-tree OSP 2E+3 2 Ordered R-tree SOP 1E+3 1 Ordered R-tree PSO 0E+0 0 Ordered R-tree OPS Build time [s] Index size [GB] Fig. 3. Index build time and index size 10 The page size is 2,048 B for all data structures. Data Structures for Indexing Triple Table 23 In Figure 4, we can see the query processing time for all query groups; the processing time is the average time of all queries in one group. Similarly, Fig- ure 5 includes DAC for all query groups. Evidently, the B-tree provides the most efficient performance especially in the case of the higher selectivity. The reason of this result is the minimal DAC of the B-tree since only leaf nodes includ- ing result tuples are scanned. In the case of the lower selectivity (see GP4 in Figure 4), results of all index data structures are similar. 1000,000000 100,000000 10,000000 B-tree 1,000000 R-tree 0,100000 Signature R-tree 0,010000 Ordered R-tree SPO 0,001000 Ordered R-tree POS 0,000100 Ordered R-tree OSP 0,000010 Ordered R-tree SOP 0,000001 Ordered R-tree PSO 0,000000 Ordered R-tree OPS GP1 GP2 GP3 GP4 GP5 Fig. 4. Processing time for all query groups [s] 10000000 1000000 B-tree 100000 R-tree 10000 Signature R-tree Ordered R-tree SPO 1000 Ordered R-tree POS 100 Ordered R-tree OSP Ordered R-tree SOP 10 Ordered R-tree PSO 1 Ordered R-tree OPS GP1 GP2 GP3 GP4 GP5 Fig. 5. DAC for all query groups We see that the Signature R-tree and the Ordered R-tree outperform the R-tree in most cases. Although the average processing time of the Signature R- tree is lower compared to the Ordered R-tree, we can find a query in each query group where it exists an ordering of the Ordered R-tree such that the Ordered R-tree outperforms the Signature R-tree. Let us consider query processing times in Figure 6. In the case of Q1 (S=’AssociateProfessor’, P=’type’, O=*), the Ordered R-trees SPO and SOP outperform the Signature R-tree and other Ordered R-trees, however in the case of Q7 (S=*, P=’PublicationAuthor’, O=’AssistentProfessor’) the performance of these Ordered R-trees is the low- est. Similarly, in the case of Q11 (S=*, P=*, O=’Course2’), the Ordered R-tree OPS outperforms other R-tree variants and its performance is the same as the 24 Roman Meca, Michal Krátký, Peter Chovanec, Filip Křižka performance of the B-tree. Similarly, in the case of Q14 (S=*, P=’worksFor’, O=*), the Ordered R-tree SOP outperforms other R-tree variants. However, we must keep in mind that this effect depends on a query and a concrete ordering of the Ordered R-tree. Although, it is clear that the B-tree provides the most efficient processing time, there are some improvements of multidimensional data structures. The first one, the index size of a multidimensional data structure is up to 3× lower the B-tree index size. The second one, in the case of the B-tree it is necessary to change ordering of values in a triple when a query processor want to use an index with different ordering than another index returns, it means an additional time overhead in this case. 1000,000000 100,000000 10,000000 B-tree 1,000000 R-tree 0,100000 Signature R-tree Ordered R-tree SPO 0,010000 Ordered R-tree POS 0,001000 Ordered R-tree OSP 0,000100 Ordered R-tree SOP 0,000010 Ordered R-tree PSO 0,000001 Ordered R-tree OPS Q1 Q7 Q11 Q14 Fig. 6. Processing time for some queries [s] 10000000 1000000 B-tree 100000 R-tree 10000 Signature R-tree Ordered R-tree SPO 1000 Ordered R-tree POS 100 Ordered R-tree OSP Ordered R-tree SOP 10 Ordered R-tree PSO 1 Ordered R-tree OPS Q1 Q7 Q11 Q14 Fig. 7. DAC for some queries As result, let us consider a workload including queries accessing the most tree nodes. If the cache size is lower than the number of B-tree nodes, a multidi- mensional data structure would provide the higher performance than the B-tree in the case the cache includes all nodes of the multidimensional data structure. Data Structures for Indexing Triple Table 25 6 Conclusion In this article, we compared the performance of the B-tree with the R-tree, the Signature R-tree, and the Ordered R-tree for the triple table and point and range queries processed during the evaluation of a SPARQL query. The Signature R- tree and the Ordered R-tree outperform the R-tree for most queries. Although the average processing time of the Signature R-tree is lower compared to the Ordered R-tree, in each query group, we can find a query where there is such an ordering of the Ordered R-tree outperforming the Signature R-tree. The B-tree provides the most efficient processing time; the average processing time of the B-tree is 74% of the Signature R-tree’s processing time. However, there are some specific improvements of multidimensional data structures. The first one, index size of a multidimensional data structures is up to 3× lower than the B-tree index size. The second one, in the case of the B-tree it is necessary to change ordering of values in each triple when a query processor want to use an index with different ordering than another index returns. Consequently, it means an additional time overhead of the query processing. References [1] J. Aasman. Allegro Graph: RDF Triple Database. Tech. rep. Technical Re- port 1, Franz Incorporated, 2006. url: http://www.franz.com/agraph/ allegrograph/. [2] D.J. Abadi et al. “SW-Store: a vertically partitioned DBMS for semantic web data management”. In: The VLDB Journal 18.2 (2009), pp. 385–406. [3] S. Alexaki et al. “The ICS-FORTH RDFSuite: Managing voluminous RDF description bases”. In: Proceedings of 2nd Internacional Workshop on the Semantic Web (SemWeb’01). 2001. [4] M. Atre, J. Srinivasan, and J.A. Hendler. BitMat: A Main Memory RDF Triple Store. Tech. rep. 2009. url: http://www.cs.rpi.edu/ ~atrem/ bitmat_techrep.pdf. [5] Reto Bachmann-Gmur. Instant Apache Stanbol. Packt Publishing Ltd, 2013. isbn: 978-1-78328-123-7. [6] Amos Bairoch et al. “The universal protein resource (UniProt)”. In: Nu- cleic acids research 33 (2005), pp. D154–D159. [7] D. Beckett. “The design and implementation of the Redland RDF appli- cation framework”. In: Computer Networks 39.5 (2002), pp. 577–588. [8] Norbert Beckmann et al. “The R∗ -Tree: An Efficient and Robust Access Method for Points and Rectangles”. In: Proceedings of the ACM Inter- national Conference on Management of Data (SIGMOD 1990). Vol. 19. AMC, 1990, pp. 322–331. [9] Tim Bray et al. “Extensible markup language (XML)”. In: World Wide Web Journal 2.4 (1997), pp. 27–66. [10] J. Broekstra, A. Kampman, and F. Van Harmelen. “Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema”. In: Pro- ceedings of the Semantic Web-ISWC. Vol. 2342. Springer, 2002. 26 Roman Meca, Michal Krátký, Peter Chovanec, Filip Křižka [11] E.I. Chong et al. “An efficient SQL-based RDF querying scheme”. In: Proceedings of 31th International Conference on Very Large Data Bases (VLDB 2005). VLDB Endowment. 2005, pp. 1216–1227. [12] Erik Christensen et al. Web services description language (WSDL) 1.1. Recommendation. W3C, 2001. url: http://www.w3.org/TR/wsdl. [13] Douglas Comer. “Ubiquitous B-tree”. In: ACM Computing Surveys (CSUR) 11.2 (1979), pp. 121–137. [14] R. Cyganiak, A. Harth, and A. Hogan. N-quads: Extending n-triples with context. Tech. rep. 2008. url: http://sw.deri.org/2008/07/n-quads/. [15] Martin Dürst and Michel Suignard. Internationalized resource identifiers (IRIs). Tech. rep. RFC 3987, January, 2005. url: http://www.ietf.org/ rfc/rfc3987.txt. [16] Orri Erling and Ivan Mikhailov. “Virtuoso: RDF support in a native RDBMS”. In: (2010), pp. 501–519. [17] David Faye et al. “RDF triples management in roStore”. In: Actes de IC2011 (2012), pp. 755–770. [18] David Célestin Faye, Olivier Curé, and Guillaume Blin. “A survey of RDF storage approaches”. In: ARIMA Journal 15 (2012). url: http://arima. inria.fr/015/015002.html. [19] Tim Finin et al. “Social networking on the semantic web”. In: Learning Organization journal 12.5 (2005), pp. 418–435. [20] S. Groppe. Data management and query processing in semantic web databases. Springer, 2011. isbn: 978-3-642-19356-9. [21] Yuanbo Guo, Zhengxiang Pan, and Jeff Heflin. “LUBM: A benchmark for OWL knowledge base systems”. In: Web Semantics: Science, Services and Agents on the World Wide Web 3.2 (2005), pp. 158–182. [22] Antonin Guttman. “R-trees: a dynamic index structure for spatial search- ing”. In: Proceedings of the ACM International Conference on Management of Data, (SIGMOD ’84). Vol. 14. 2. 1984, pp. 47–57. [23] S. Harris and D.N. Gibbins. “3store: Efficient bulk RDF storage”. In: vol- ume 89 of CEUR Workshop Proceedings (2003). [24] S. Harris, N. Lamb, and N. Shadbolt. “4store: The design and implementa- tion of a clustered RDF store”. In: Proceedings of 5th International Work- shop on Scalable Semantic Web Knowledge Base Systems (SSWS2009). 2009, pp. 94–109. [25] S. Harris and A. Seaborne. “SPARQL 1.1 query language”. In: W3C Rec- ommendation (2013). url: http://www.w3.org/TR/sparql11-query/. [26] A. Harth and S. Decker. “Optimized index structures for querying rdf from the web”. In: Proceedings of 3th Latin American Web Congress, (LA-WEB 2005). IEEE. 2005. [27] A. Harth et al. “Yars2: A federated repository for querying graph struc- tured data from the web”. In: The Semantic Web 4825 (2007), pp. 211– 224. [28] M Tim Jones. Artificial Intelligence A System Approach. Laxmi Publica- tions, Ltd., 2008. isbn: 978-0763773373. Data Structures for Indexing Triple Table 27 [29] D. Kolas, I. Emmons, and M. Dean. “Efficient linked-list rdf indexing in parliament”. In: Proceedings of the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems. Vol. 9. 2009, pp. 17–32. [30] Michal Krátký et al. “Efficient processing of narrow range queries in multi- dimensional data structures”. In: Proceedings of 10th International Database Engineering and Applications Symposium, (IDEAS’06). IEEE. 2006. [31] Filip Křižka, Michal Krátký, and Radim Bača. “On support of order- ing in multidimensional data structures”. In: Proceedings of Advances in Databases and Information Systems (ADBIS 2010). Vol. 6295. LNCS. Springer. 2010, pp. 575–578. [32] Frank Manola, Eric Miller, Brian McBride, et al. “RDF primer”. In: W3C recommendation 10 (2004). url: http://www.w3.org/TR/rdf-primer/. [33] Akiyoshi Matono, SaidMirza Pahlevi, and Isao Kojima. “RDFCube: A P2P-Based Three-Dimensional Index for Structural Joins on Distributed Triple Stores”. In: Databases, Information Systems, and Peer-to-Peer Com- puting. Vol. 4125. LNCS. Springer, 2007. isbn: 978-3-540-71660-0. [34] B. McBride. “Jena: A semantic web toolkit”. In: Internet Computing, IEEE 6.6 (2002), pp. 55–59. [35] J.P. McGlothlin and L.R. Khan. RDFJoin: A scalable data model for per- sistence and efficient querying of RDF datasets. Tech. rep. 2009. [36] J.P. McGlothlin and L.R. Khan. “RDFKB: efficient support for RDF infer- ence queries and knowledge management”. In: Proceedings of the 2009 In- ternational Database Engineering & Applications Symposium. ACM. 2009, pp. 259–266. [37] T. Neumann and G. Weikum. “RDF-3X: a RISC-style engine for RDF”. In: Proceedings of the VLDB Endowment. Vol. 1. 1. VLDB Endowment, 2008, pp. 647–659. [38] A. Seaborne et al. “SPARQL/Update: A language for updating RDF graphs”. In: W3C Member Submission 15 (2008). [39] Timos K. Sellis, Nick Roussopoulos, and Christos Faloutsos. “The R+ - Tree: A Dynamic Index for Multi-Dimensional Objects”. In: Proceedings of 13th International Conference on Very Large Data Bases (VLDB 1997). Morgan Kaufmann, 1987. [40] Octavian Udrea, Andrea Pugliese, and VS Subrahmanian. “GRIN: A graph based RDF index”. In: Proceedings of the 22nd national conference on Artificial intelligence, (AAAI’07). Vol. 1. 2007, pp. 1465–1470. [41] C. Weiss, P. Karras, and A. Bernstein. “Hexastore: sextuple indexing for semantic web data management”. In: Proceedings of the VLDB Endowment 1.1 (2008), pp. 1008–1019. [42] D. Wood, P. Gearon, and T. Adams. “Kowari: A platform for semantic web storage and analysis”. In: Proceedings of XTech 2005 Conference. 2005. [43] Cui Yu. High-Dimensional Indexing. Lecture Notes in Computer Science. Springer-Verlag, Heidelberg, 2002. isbn: 3-540-44199-9.