<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">B+Hash Tree: Optimizing query execution times for on-Disk Semantic Web data structures</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Minh</forename><forename type="middle">Khoa</forename><surname>Nguyen</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Cosmin</forename><surname>Basca</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Abraham</forename><surname>Bernstein</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">DDIS</orgName>
								<orgName type="department" key="dep2">Department of Informatics</orgName>
								<orgName type="institution">University of Zurich</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<address>
									<settlement>Zurich</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">B+Hash Tree: Optimizing query execution times for on-Disk Semantic Web data structures</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">69340E4D1DC29624DF2E23173AA8E3BD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:18+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The increasing growth of the Semantic Web has substantially enlarged the amount of data available in RDF format. One proposed solution is to map RDF data to relational databases (RDBs). The lack of a common schema, however, makes this mapping inefficient. Some RDF-native solutions use B+Trees, which are potentially becoming a bottleneck, as the single key-space approach of the Semantic Web may even make their O(log(n)) worst case performance too costly. Alternatives, such as hash-based approaches, suffer from insufficient update and scan performance. In this paper we propose a novel type of index structure called a B+Hash Tree, which combines the strengths of traditional B-Trees with the speedy constant-time lookup of a hash-based structure. Our main research idea is to enhance the B+Tree with a Hash Map to enable constant retrieval time instead of the common logarithmic one of the B+Tree. The result is a scalable, updatable, and lookup-optimized, on-disk index-structure that is especially suitable for the large key-spaces of RDF datasets. We evaluate the approach against existing RDF indexing schemes using two commonly used datasets and show that a B+Hash Tree is at least twice as fast as its competitors -an advantage that we show should grow as dataset sizes increase.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The increasing growth of the Semantic Web has substantially increased the amount of data available in RDF <ref type="foot" target="#foot_0">1</ref> format. This growth necessitates the availability of scalable and fast data structures to index and store RDF. Traditional approaches store RDF in relational databases. Mapping RDF to a relational database typically follows one of the following approaches: <ref type="bibr" target="#b0">(1)</ref> all triples are mapped to a single three column table -an approach which will result in numerous inefficient self-joins of that table, (2) every property gets mapped to its own three column table <ref type="bibr" target="#b0">[1]</ref> -resulting in a high number of Unions for propertyunbound queries and a table creation for every newly encountered property type, or (3) draws upon domain-knowledge to map properties to a relational database schema -forgoing some flexibility when adding new properties. Moreover, according to Abadi and Weiss, storing dynamically semi-structured data such as RDF in relational databases may cause a high number of NULL values in the tables, which imposes a significant computational overhead <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b0">1]</ref>. As a consequence, many native RDF databases have been proposed <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b9">10]</ref>.</p><p>Most native RDF databases propose mapping the RDF-graph to some existing indexing scheme. The most straightforward approach, RDF-3X <ref type="bibr" target="#b9">[10]</ref> essentially proposes to store all possible subsets of the triple keys (i.e., s, p, and o from every &lt; subject, predicate, object &gt; triple) as composite keys in a traditional B+Tree structure. This approach results in 15 B+Trees, each of which having large-keyspace (e.g., sizes of |s| • |p| • |o|, |s| • |o|, etc.) and many entries. Given the O(log(n)) access time for single key lookup, this can result in a considerable time overhead for some queries. Consequently, given the ever increasing amount of data to be stored in RDF stores, traditional approaches relying on B+Trees in the sprit of RDF-3X have the potential of becoming a main bottle-neck to scalability <ref type="bibr" target="#b13">[14]</ref>. Taking the adaptation to RDF structures to the extreme, Weiss and colleagues <ref type="bibr" target="#b14">[15]</ref> propose a specialized index consisting of 3-level cascades of ordered lists and hashes. This approach provides a constant (i.e., O(c)) lookup time but has the drawback that updating hashes can become quite costly. Whilst the authors argue that updates in the Semantic Web are oftentimes rare, they are, however, common and should not be dismissed.</p><p>In this paper we propose a novel type of index structure called a B+Hash Tree, which combines the strengths of traditional B+Trees (i.e., ease of use and updatability) with the speedy lookup of a hash-based structure. Our main research idea is to enhance the B+Tree with a Hash Map to enable constant retrieval time instead of the common logarithmic one of the B+Tree. The result is a scalable, updatable and lookup-optimized, on-disk index-structure that is especially suitable for the large key-spaces of RDF datasets. Consequently, the main contribution of this paper is the presentation, formalization, and evaluation of the B+Hash Tree. This paper is structured as follows. In Section 2 we set the stage with a discussion of related work, its benefits and drawbacks. Section 3 then introduces the B+Hash Tree, provides a formalization as well as a cost model thereof, and discusses some its limitations. In Section 4 we empirically compare the B+Hash Tree to the RDF-3X like approach storing the key in a B+Tree. In the final section we summarize our conclusions and discuss future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related work</head><p>Several architectures for storing Semantic Web data have been proposed. Many of them use relational databases to index RDF data. Row store systems such as Jena <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17]</ref> map RDF triples into a relational structure, which results in creating a giant three column &lt; subject, predicate, object &gt; table. Having a single large table, however, oftentimes results in expensive self-joins; in particular if the basic graph patterns of a query are not very selective. To counter this problem, Jena creates property tables, which combine a collection of properties of a resource in one table. Whilst this approach reduces the number of self-joins it (1) assumes that the RDF actually has some common exploitable structure that does not change often over time and <ref type="bibr" target="#b1">(2)</ref> has the potential to result in a large number of NULL values where properties are missing from some resources in a table <ref type="bibr" target="#b0">[1]</ref>. Hence, this inflexibility and the NULL values may lead to a significant computational overhead <ref type="bibr" target="#b14">[15]</ref>.</p><p>An alternative approach to the property table solution are column stores such as SW-Store <ref type="bibr" target="#b0">[1]</ref>. For each unique property of the RDF dataset, SW-Store creates a two column table containing the subject and the object. Assuming a run-length encoding of the column, this provides a compact storage mechanism for RDF data that allows efficient joins, as only the join columns are retrieved from disk (as opposed to the table in row-stores). Nevertheless, if a graph pattern has an unbound property (e.g., &lt; s, ?p, ?o &gt;) then an increased number of joins and unions are inevitable <ref type="bibr" target="#b14">[15]</ref>.</p><p>A recent approach is Hexastore <ref type="bibr" target="#b14">[15]</ref>, which stores RDF data in a native disk vector-based index. Hexastore manages all six possible orderings of the RDF triple keys in their own index. In each of the six indices, a triple is split into its three levels: All levels are stored in native on-disk sorted vectors. A lookup of the triple &lt; s, p, o &gt; would, hence, result in a hash-lookup in the first level of s, the result of which would point to a second-level hash that would be used to lookup p, which would point to an ordered list containing o. Given that a hash-lookup can be achieved in constant time, Hexastore provides a constant-time lookup for any given triple. Its six-fold indexing allows a fast lookup of any triple pattern at the cost of a worst-case five fold space cost. The biggest drawback, however, is that Hexastore in its "pure" implementation does not support incremental updates, as inserts would require resorting the vectors of each of the six indices -a time consuming process. <ref type="bibr" target="#b9">[10]</ref> avoids this drawback by storing the data in B+Trees instead of vector lists. Specifically, it stores every possible subset combination of the three triple keys in a separate B+Tree. This approach allows updates and has an O(log(n)) complexity for retrieval, updates, and deletion. Note, however, that this approach leads to a huge key-space for some B+Trees (at worst |s| • |p| • |o|), as the composite key-space grows in the square of the number of nodes of the RDF graph. Hence, even O(log(n)) can become a bottleneck. A similar approach to RDF-3X or Hexastore is YARS2 <ref type="bibr" target="#b4">[5]</ref>, which indexes a certain subset of triples in 6 separate B-Trees or Hash Tables. By not treating all possible &lt; s, p, o &gt; subset combination equally, the missed indexes must be created by joining other indexes, which can be a time-consuming process.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RDF-3X</head><p>A more recent approach is BitMat <ref type="bibr" target="#b1">[2]</ref>, which is a compact in-memory storage for RDF graphs. BitMat stores RDF data as a 3D bit-cube, where each dimension represents the subjects, the objects, and the predicates. When retrieving data the 3D bit-cube can be sliced for example along the "predicates-dimension" to get a 2D matrix. In each cell of the matrix, the value 1 or 0 denotes the presence respectively the absence of a subject and object bounded by the predicate of that matrix. Since bitwise operations are cheap, the major advantage of the BitMat index is its performance when executing low-selectivity queries. Nevertheless, BitMat is constrained by the available memory and, as the authors have shown in their evaluation, traditional approaches such as RDF-3X or MonetDB <ref type="bibr" target="#b7">[8]</ref> outperform BitMat on high-selectivity queries.</p><p>In real-time systems, Hybrid Tree-Hashes <ref type="bibr" target="#b10">[11]</ref> have been proposed to provide a fast in-memory access structure. To index data, the Hybrid Tree-Hash combines the use of a T-Tree and a Chained Bucket Hash. Lehman describes the T-Tree <ref type="bibr" target="#b6">[7]</ref> as a combination of the AVL-Tree and the B-Tree: similarly to B+Trees, nodes contain multiple elements whilst the binary search strategy of the AVL-Tree is employed for retrieval. To enhance the retrieval time the keys in the nodes of the T-Tree are hashed and the offset to that node is stored as the value. Data retrieval is accomplished by a lookup in the Chained Bucket Hash, which retrieves the offset value for the given search key. Then, the node that holds the data is accessed directly without traversing the T-Tree. Whilst T-Trees perform well as an in-memory data structure, their usage as an on-disk structure is problematic: First, the use of a binary tree results in deep trees, which in turn results in many disk pages being accessed. Second, the binary tree nature of the T-Tree makes it cache oblivious: range queries are highly expensive, as one has to continuously "jump" up and down the tree for traversal, leading to costly disk-seek operations. This is in contrast to the B+Tree, where the data is stored in the leaves as a linked list, resulting in fewer disk page seeks and cache awareness. To the best of our knowledge the Hybrid Tree-Hash is the most similar structure to our proposed B+Hash Tree index. The main difference is that we optimized our index structure for disk-based operations, whereas Hybrid Tree-Hashes were optimized for in-memory retrieval.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">B+Hash Tree</head><p>In this section we introduce our B+Hash Tree, a scalable, updatable and lookup-optimized, on-disk index-structure combining the strengths of B+Trees and Hashes. First, we describe the structure of the B+Hash Tree and elucidate its operations. Second, we provide a time and space complexity analysis of the relevant B+Hash Tree operations. Finally, we discuss some limitations of the B+Hash Tree and suggest appropriate solutions.</p><p>Note that throughout this section we propose to index a RDF graph akin to the RDF-3X approach. In other words, we propose a separate index for each possible subset combination of the &lt; s, p, o &gt; triples. Hence, a s 1 p 3 o 2 triple is stored in level 1 with the key s 1 , in level 2 with the composite key s 1 p 3 , and finally in level 3 using the composite key s 1 p 3 o 2 . This structure allows retrieval of all triple patterns with a single lookup <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b9">10]</ref>. In contrast to RDF-3X, we propose to use B+Hash Trees as opposed to B+Trees. Like all other approaches we also propose to dictionary encode all literals.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">B+Hash Tree Description</head><p>The architecture of the B+Hash Tree comprises two core elements: A B+Tree and the Hash Map. Here, we first explain how these elements are combined to form a joint index and then elaborate on the main operations.</p><p>We use the standard B+Tree as the basis for our B+Hash Tree. Recall that B+Trees are optimized for disk access. In particular, nodes of the trees are adapted to the size of a disk page to facilitate caching and limiting disk access. Additionally, all values are stored in the leaf nodes of the tree, which are interlinked, allowing fast index-range queries. More information about B+Trees can be found in <ref type="bibr" target="#b3">[4]</ref>.</p><p>As Figure <ref type="figure" target="#fig_0">1</ref> illustrates, the B+Hash Tree combines the B+Tree with a Hash Map. Specifically, to improve retrieval performance the leaf nodes of the B+Tree are being hashed. Each bucket entry in the Hash Map contains a key (or ID), an offset containing the address of the element designated by the key on disk and the count of distinct elements sharing the same prefix. The prefix being the anterior part of the key. As an example consider the s 2 p 1 o 4 triple: in a level 2 index the prefix should be s 2 p 1 ; in a level 1 index s 2 .</p><p>A retrieval operation in a B+Hash Tree starts with a lookup of the offset in the Hash Map using the key and then accesses the node holding the search key directly without traversing the tree. The count aids both the query optimizer (e.g., to gauge selectivity) and the execution of range scans (indicating the number of elements to be read). As exemplified in Figure <ref type="figure" target="#fig_0">1</ref>, "Bucket 1" indicates that there are two predicates for subject S 1 ("count = 2").</p><p>Note that not every leaf node needs to be hashed. Usually, only the leaf nodes containing the smallest suffix for a given prefix have to be hashed -an approach which we refer to as overall hashing. For example, on level 1 of the spo index only the leaf nodes where the S key changes need to be hashed, as illustrated in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>Alternatively, the hash can be tuned to contain the most popular keys -an approach which we refer to as cached hashing, as it employs a Hash Map akin to a cache of the B+Hash Tree's contents. Cached hashing can be tuned to reduce space consumption compared to overall hashing. This space saving comes at a cost of slowing down non-frequent accesses. Therefore, the empirical evaluation in this paper focuses on overall hashing. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Basic Operations</head><p>The B+Hash Tree has three basic operations: get, insert, and delete. To enable the same model interface as Hexastore <ref type="bibr" target="#b13">[14]</ref>, we split the get operation into two distinct ones: getIdx(a) Given a triple pattern a, look up the offset and the count of elements in the Hash Map: getIdx(a) : offset a , count a getSet(offset, count) Given an offset and a count, retrieve count a elements from the leaf node: getSet(offset a , count a ) : set a Hence, a traditional lookup would be composed of getSet(getIdx(a)), a very common index range-query, not to be mistaken with a SPARQL range-query. Furthermore, in contrast to Hexastore, we have the following data changing operations: insert(a) Insert triple a into the B+Hash Tree: insert(a) : void delete(a) Delete triple a from the B+Hash Tree: delete(a) : void Note, that an insert, respectively a delete operation may cause a rebalancing of the B+Tree. If this occurs then the keys in the newly created leaf node have to be verified if an update of their page offset's in the Hash Map is required -a process whose cost depends on the on-disk implementation of the B+Tree.</p><p>Index range-scans are quite common operations in Semantic Web applications. Just consider retrieving a list of all predicates that connect subject s with an object o. Such an operation results in the triple pattern &lt; s, ?p, o &gt;. In a B+Tree, neighboring leaf nodes are connected to each other, enabling sequential iteration through the pertinent leaf nodes. Hence, the logical way to retrieve the answers for this triple pattern in a RDF-3X like index is to iterate through the level 2 sop index starting from the "smallest" (or first indexed) p s . Hence, the B+Tree is first being traversed from the root node down to the leaf node to find the node for sop s and then sequentially iterating through the leaf nodes until a different object is encountered. When using a B+Hash Tree, in contrast, we first lookup the key so in the Hash Map of the appropriate index followed by the B+Tree traversal like in the traditional tree. Hence, we reduce the tree traversal down from the root node to the first leaf node -a O(log(n)) operation -to a single hash lookup -a constant time (O(c)) operation. From a certain data size the B+Hash Tree will, hence, outperform the B+Tree. We elaborate this fact by doing a simple complexity analysis of the most important operations in Section 3.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Time and Space Complexity Considerations</head><p>In this section we provide a formalization for the time and space complexity of the most important B+Hash Tree operations. Note that since we are talking about a disk-based index structure, the hard drive access times, as the slowest component, are likely to dominate in-memory operations. Hence, the time complexities of B+Tree operations are measured by the number of page reads. Given that the B+Tree only stores the actual data in the leaves (the inner nodes of the trees are "only" used to organize the index) and that the data for a single key typically fits into one page, the number of page reads for any simple lookup is solely dependent on the tree height. Assuming that we denote the order (i.e., the # of index elements per inner node) of the B+Tree as d and the number of entries as n, Comer <ref type="bibr" target="#b3">[4]</ref> elucidates the height of a B+Tree as:</p><formula xml:id="formula_0">height = log d (n)<label>(1)</label></formula><p>Time complexity: Given that most queries do not solely rely on simple key lookups, but actually retrieve multiple elements (e.g., find all objects for a subject) we also need to account for the number of leaf pages that need to be read.</p><p>Assuming that the values fit on s pages then the number of page reads for a query can be defined as:</p><formula xml:id="formula_1">Reads B+T ree = log d (n) + s<label>(2)</label></formula><p>Note that this formula assumes that the values are (i) stored on consecutive pages and (ii) that the leaf-pages are interlinked, which the B+Tree guarantees.</p><p>A logarithmic complexity is, obviously, excellent and has served the IT community well in many applications such as relational databases. If the key-space, however, grows enormously and the number of separate accesses for any given query is large -both of which are especially true for SPARQL queries -then even a logarithmic complexity may slow the execution down. The main rationale behind the B+Hash Tree is to cut down the time complexity of these reads using a Hash Map with its constant-time accesses. As a result, the complexity of a simple lookup is 2 -one access to retrieve the bucket hash entry and another one to access the leaf node. In addition, as before, when performing index range-scans we need to read all the pages which contain the data, resulting in:</p><formula xml:id="formula_2">Reads HashMap = 2 + s<label>(3)</label></formula><p>Consequently, using the Hash Map results in fewer disk page reads than the B+Tree, if the height of the B+Tree exceeds two levels. With Equations 2 and 3 we can calculate the number of disk page accesses for a set retrieval (range query). To estimate the total retrieval time we multiply the number of disk page reads with the average disk page access time of a common hard disk. However, in reality there are three different types of disk page reads: random read, sequential read, and cache read. Random disk page reads are the slowest kind, as the seek operation requires the HDA (Head Disk Assembly) to jump to another track. Sequential reads (i.e., reading some data from the same track) consists of waiting until the required disk page arrives under the HDA, which depends mainly on the rotational speed of the hard disk. The fastest form of access is the cache read, i.e. when a previously read page is found in the on-board disk cache and no mechanical action is required to retrieve the data.</p><p>In contrast to B+Trees -on-disk optimized data-structures enabling efficient (sequential) scans -Hash Map data lookup and retrieval is usually random, due to the lack of locality. Again, this is dependent on the actual hash implementation.</p><p>Inserting and deleting in the B+Hash Tree adds an additional level of complexity. Assuming that the Hash Map has a sufficient number of free buckets, then insert/delete operations in a B+Hash Tree add -in theory -just one more write operation over the B+Tree: the update of the Hash Map. However, if a leaf node in the B+Tree has to be split or merged during an insert, respectively delete operation, then the page offsets of the keys in the affected leaf nodes may have to be updated in the Hash Map. Depending on the B+Tree implementation, usually, only the keys where the prefix changes in the newly created or merged node, may need an update. Worst case, this is an O(n) operation where n denotes the number of keys with different prefixes in the affected node.</p><p>If the Hash Map is full, however, or there are too many collisions, then the HashMap needs to be reorganized (rehashed) resulting in a higher cost operation <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b8">9]</ref>. Space complexity: The space consumption of a B+Tree depends on the number of nodes. In practice the size of the node is chosen as such to match the size of a disk page (usually 4, 8 or 16 KB), therefore the space occupancy of a B+Tree is the number of inner nodes plus the number of data containing leaf nodes times the size of a disk page: </p><formula xml:id="formula_3">Size B+T</formula><p>where P age level denotes the number of nodes respectively disk pages on level l of the tree and P age leaf s denotes the number of data containing leaf-nodes.</p><p>The space consumption of a B+Hash Tree additionally adds the size of the Hash Map, which can be expressed by the number of chunks holding hash buckets. Chunks in turn, are typically sized to match a disk page. Consequently, the size of the Hash Map is:</p><formula xml:id="formula_5">Size HashMap = Size P age • #Chunks<label>(5)</label></formula><p>where #Chunks denotes the number of chunks needed for the Hash Map. Summarizing, we find that the B+Hash Tree provides a better complexity for reads compared to B+Trees. This advantage comes at the cost of additional space shown in Equation 5 and some time to maintain the Hash Map. We would argue that the cost in most cases is relatively small for Semantic Web applications. Addressing the former, we believe that given the price of disk space the additional space complexity for the Hash Map is negligible. Addressing the latter, it can be argued that, assuming a sufficiently large hash and a higher ratio of reads than writes/updates, the frequency of hash map reorganization operations can be limited to a few instances. Database space complexity: Consequently, given the multi-ordering multilevel index structure chosen, the total space consumption of a full database index (all possible index orderings for triples) is:</p><formula xml:id="formula_6">Size index = ORDS ord 2 lvl=0</formula><p>(Size ord,lvl (B + T ree) + Size ord,lvl (HashM ap)) <ref type="bibr" target="#b5">(6)</ref> where ord represents the current index type (i.e. SPO, OPS, etc. ∈ ORDS), while lvl denotes the current index level.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Limitations and Solutions</head><p>The overall hashing technique enhances the retrieval time of the B+Tree at the cost of space consumption as described previously. Considering empirical evidence such as Kryder's Law <ref type="bibr" target="#b12">[13]</ref>, space consumption entails a rapidly decreasing economical cost versus the high cost of query answering in today's DBMS (Database Management Systems), where real time or near real time response is often required. For high update rate scenarios, the extra overhead induced by the B+Hash Tree structure can be nullified by considering a parallel architecture where the traditional B+Tree part of the index would reside on one disk unit while the Hash Map would be served/updated on another disk. Furthermore, if the update cost of the Hash Tree is higher than that of the B+Tree at one time, one can still serve queries by reverting to the O(log(N )) worst case performance of the B+Tree with no time penalty versus traditional indexing approaches. Hence, in such a setup the worst-cost complexity of a B+Hash Tree is equal to a B+Tree (whenever the Hash Map is reorganized). In most cases (when the Hash Map is available), however, the retrieval complexity would be linear (as shown in equation 3).</p><p>If space is a hard constraint (e.g. in an embedded system) one can change the policy of updating all keys (overall hashing) to cached hashing, where similar to cache policies, only "popular" keys are stored. In general, if incremental updates are rare, then using the overall hashing approach is recommended.</p><p>System cache impact: As argued above, the Hash Map significantly improves access to leaf nodes in comparison to a "pure" B+Tree. Nonetheless, in reality any modern computing system employs a hierarchy of cache systems starting with the disk cache at the lowest level.</p><p>When considering the disk cache, there are four possible situations: (i) none of the structures are in cache -in this case the the B+Hash Tree will provide the highest performance as described previously, (ii) the Hash Map of the B+Hash Tree is in cache while the B+Tree is not -the B+Hash Tree will outperform the former, (iii) the B+Tree is cached and will gain the highest performance, and (iv) both structures are in cache, which is the most likely scenario.</p><p>Nowadays, the typical size of the disk cache varies between 8 and 64 MB. Performing a simple space consumption estimation for a B+Tree holding 30 million triples, the total number of inner nodes can be approximated to 0.5 million assuming a standard page size of 4KB. This results in an approximately 2 GB inner node index of the B+Tree. Given usual cache eviction approaches (such as Least Recently Used ) it is likely that only the higher inner node levels of such a B+Tree will be cached while the lower levels will mostly reside on disk. In this case, the B+Tree structure will have to read h − k pages from disk to reach the queried leaf page, where h is the height of the tree and k represents the number of levels in the cache. When the dataset is large enough then h − k &gt; 2 (where 2 is the lookup cost in the B+Hash Tree). In these cases, neglecting the OS filing system cache, the B+Hash Tree will outperform the traditional approach. Due to the growth of the Semantic Web we expect that this gap will grow.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Evaluation</head><p>To evaluate the performance of the B+Hash Tree compared to a B+Tree we created a prototype for both indices, which we used in conjunction with an in-memory simulation of the on-disk structures. The advantage of the disk simulation is that it can monitor actual disk page accesses regardless of wallclock-time confounding factors such as disk-cache and other operating system processes. The main disadvantage of this method is that all evaluations were constrained by the available memory (72GB). Hence, the largest dataset we could run contained 31 million triples.</p><p>Given that the B+Hash Tree is solely an index and not a full-fledged triple store, we used the query optimizer of a triple store called TokyoTyGR<ref type="foot" target="#foot_1">2</ref> to obtain the index access traces for a query and then ran the traces on the B+Hash Tree. Since we solely evaluate the index and not the overall triple store in this paper, we limited our measurements to the retrieval time of the B+Hash Tree (or B+Tree) index structure and not complete query answering such as parsing of the SPARQL query or selectivity estimations -these measures would have been the same for both index structures. Moreover, we do not monitor the additional sequential page reads for range scans, because as discussed in Equations 2 and 3, the number of additional page-accesses due to the scan is identical (i.e., s) for both considered approaches. Therefore we only track the differentiating parts of the equations, which excludes the scans. To maximize disk access performance we set the B+Tree page size to the size of the disk page size (4 KB).</p><p>Consequently, each result presented in Figures <ref type="figure" target="#fig_2">2a-f</ref> shows the number of disk page reads of a whole index access trace during a single query execution and not just a single lookup of an element in the index. All test simulations use the overall hashing technique.</p><p>For the traditional B+Tree approach, we discriminate between sequential and random disk page reads in the diagrams. In the case of the B+Hash Tree, we present only "all reads", as most reads from a Hash Map are random.</p><p>The space consumption was calculated by applying the formulas provided in our complexity analysis of the B+Hash Tree.</p><p>In the remainder of the section we first present the two datasets-the Berlin SPARQL Benchmark dataset and Yago-with their associated queries as well as performance measures, and then discuss the results and their limitations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">The Berlin SPARQL Benchmark Dataset</head><p>The first dataset, the Berlin SPARQL Benchmark<ref type="foot" target="#foot_2">3</ref> (BSBM), is a synthetic dataset. It simulates an enterprise setting, where a set of products are offered by different vendors and clients can review them <ref type="bibr" target="#b2">[3]</ref>. The dataset can be generated using the available data generator and the BSBM provides twelve distinct SPARQL queries. Some of these SPARQL queries contained "REGEX", "OFFSET", "UNION", "DESCRIBE", and other expressions, which TokyoTyGR does not support. Therefore, we selected a subset of 5 queries without these elements from which we further removed "FILTER"-expressions and "OPTIONALS", as they would be handled in the exact same way by both the B+Hash Tree and the B+Tree. The result are the 5 queries (denoted as Query 1 to 5), which we list in Appendix A in ascending query complexity (in terms of variables and triple-patterns).</p><p>To compare the performance of the index structures with increasingly larger dataset sizes, we created five different datasets ranging from 1 million to 31 million triples. The details of these datasets are shown in Table <ref type="table" target="#tab_1">1</ref>. Note that the number of unique predicates in all datasets is the same while the number of unique subjects and objects increases. The results of running the five queries on  <ref type="figure" target="#fig_2">2a</ref>-e. For every query and dataset, the B+Hash Tree is in each case faster (i.e., uses fewer disk reads) than the traditional B+Tree approach. Furthermore, the difference between the number of disk page reads for both structures increases with the size of the dataset. Therefore, for the 31 million triple dataset, the B+Hash Tree performs twice as fast as the B+Tree.</p><p>Figure <ref type="figure" target="#fig_2">2g</ref>) shows the disk space consumption of both data structures. As expected, the B+Hash Tree consumes more space. We believe, however, that for most applications, trading-in 20% of the space against a halved access-time is a worthy trade-off.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Yago Dataset</head><p>To complement the synthetic BSBM dataset with a real-world representative, we employed Yago<ref type="foot" target="#foot_3">4</ref> as a second dataset, which consists of facts extracted from Wikipedia. Again, Table <ref type="table" target="#tab_1">1</ref> shows the characteristics of the Yago dataset, which contains about 16 million distinct triples and almost 13 million resources.</p><p>The Yago dataset does not come with a defined set of queries. Therefore, we constructed three queries (numbered Queries 6-8; shown in Appendix A), which simulate a realistic information request such as "What actors play in American movies?" or "Which scientist is born in Switzerland?". In addition, we ensured that two queries (Q6 and Q7) have a low selectivity and, therefore, "touch" a lot of the data, and one query (Q8) is highly selective and is, therefore, expected to "touch" fewer disk pages.</p><p>To simplify the comparison, and as Yago has only one dataset size, we graphed all results in a single plot (Figure <ref type="figure" target="#fig_2">2f</ref>). Also, given the large number distribution, the plot employs a logarithmic scale on the x-axis.  Again, the B+Hash Tree outperforms the traditional B+Tree by accessing about half the pages. As expected, Query 8 reads fewer disk pages. It is note-worthy to observe that the performance improvement seems independent of the query's selectivity.</p><p>Figure <ref type="figure" target="#fig_2">2g</ref> again graphs the space consumption. Note that the higher space consumption of Yago compared to BSBM can be attributed to the number of distinct values for URIs and literals: while the 31 million BSBM dataset has 8 million distinct values, Yago's 16 million triples have 13 million distinct values.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Discussion and Limitations</head><p>Our results confirm the theoretical analysis that the performance improvement of the B+Hash Tree compared with the normal B+Tree increases with the size of the dataset. To further illustrate the result, Figure <ref type="figure" target="#fig_3">3</ref> graphs the speed-up factor against the dataset size. Observably, the speed-up factor increases with the size of the number of triples inserted and therefore confirms Equations 2 and 3 in practice. Given that Equation 2 grows logarithmically and Equation <ref type="formula" target="#formula_2">3</ref>is constant (ignoring the scan element) we would expect the difference to grow logarithmically with dataset size.  We also investigated, if the complexity in terms of number of query variables (which varies from 2 to and triple patterns (which varies from 2 to 12) affects the speed performance. Visually, Figure <ref type="figure" target="#fig_2">2</ref> indicates no such difference. Indeed, a pairwise, two-tailed t-test confirmed that the speed-up between queries remains constant with a signifiance of far below 0.1%. This B+Hash Tree prototype consumes a considerable amount of space which can be traced to four main reasons. First, by storing all possible subset combinations of a triple we gain speed in query answering, as highlighted in the Hexastore project. Second, we set the size of a disk page to 4KB which entails B+Trees nodes, thus consuming more space. Third, we use 20 byte instead of 8 byte keys typically used in DBMS, as we wanted a global rather than a "table-local" key space. And last, we have not yet considered index compression, further reducing the consumed space while still maintaining access speed, as shown by Neumann <ref type="bibr" target="#b9">[10]</ref>.</p><p>Building the RDF index in the B+Hash Tree from scratch can increase the build time significantly compared to the B+Tree depending on the Hash Map implementation. A more thorough investigation of this issue is still open.</p><p>Limitations: We see three major limitations in our evaluation; not of our proposed approach. First, all our empirical calculations are based on an in-memory simulation of an on-disk B+Hash Tree structure. To mitigate this problem we ensured that our hard-disk model was as accurate as possible and we parameterized it with present-day hard disk parameters. In addition, we measured disk page accesses rather than wall-clock time, essentially focusing on the most time-intensive element of the queries. Consequently, we are confident that our findings generalize to the on-disk setting.</p><p>Second, our simulation disregards disk caches. Disk-caches in modern day operating systems are intricate structures that any on-disk index would share with other disk-accessing processes. This makes an adequate simulation a highlycomplex issue and may mislead evaluations in a real, on-disk setting. As discussed in Section 3.4, however, we would expect a B+Hash Tree to outperform a traditional B+Tree even in the presence of disk-caches.</p><p>Last, we used the TokyoTyGR RDF store to obtain index-access traces for each of the experimental queries. It could be argued that the results of our experiments are biased towards its query optimizer. Given that TokyoTyGR uses a selectivity-based optimizer <ref type="bibr" target="#b11">[12]</ref> like most other modern triple stores (such as Hexastore or RDF-3X) the danger of a systematic bias seems limited-especially since it seems unlikely that other query optimizations would lead to a vastly different access pattern between a B+Hash Tree and a B+Tree. Nonetheless, a completely different query optimization approach might require the re-evaluation of our results.</p><p>Note that even in light of these limitations, the use of a disk-simulation had several advantages: First, it allowed us to isolate the evaluation from confounding effects (e.g., by the operating system). Second, it allowed us to meticulously distinguish between different types of disk accesses-an undertaking that is nontrivial in a real on-disk structure. Nonetheless, a future on-disk evaluation will have to complement our current findings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Future Work</head><p>In this paper we proposed the B+Hash Tree-a scalable, updatable, and lookup-optimized on-disk index-structure especially suitable to the Semantic Web with its large key-space. We showed that using a Hash Map to store the offsets of the leaf nodes successfully trades a slight increase in database size against significantly reduced retrieval time. When used in the context of existing index approaches such as Hexastore and RDF-3X, this will allow for effective retrieval of all possible triple patterns.</p><p>To evaluate the B+Hash Tree empirically, we benchmarked the number of page reads (and hence indirectly retrieval time) using two well-established Semantic Web test datasets. As the results show, the B+Hash Tree consistently requires approximately half the page reads of a B+Tree. Note that this difference is expected to grow with the logarithm of the dataset size.</p><p>The current implementation of the B+Hash Tree was only used in the simulated measurements. We, therefore, intend to implement a fully operational</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Architecture of the B+Hash Tree: spo index level 1</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>Space Consumption key for diagram a) -f) f) Yago (logarithmic scale) Number of Disk Page Reads Dataset Size (Mio) Dataset Size (Mio) Number of Disk Page Reads Number of Disk Page Reads Number of Disk Page Reads Number of Disk Page Reads Number of Disk Page Reads (log scale)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Number of disk reads for the queries and disk space consumption</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Speed Up Factor vs. Dataset Size (error bars show results for different queries)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>ree = Size P age (</figDesc><table><row><cell>h</cell></row><row><cell>P age level + P age leaf s )</cell></row><row><cell>level=0</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 .</head><label>1</label><figDesc>Number of triples, unique subjects, predicates, and objects in the datasets the different dataset sizes are shown in Figure</figDesc><table><row><cell cols="2">Dataset # Triples</cell><cell>S P</cell><cell>O</cell></row><row><cell></cell><cell>1,075,597</cell><cell cols="2">99,527 40 224,032</cell></row><row><cell></cell><cell cols="3">2,160,331 200,096 40 443,753</cell></row><row><cell>BSBM</cell><cell cols="3">4,969,590 452,507 40 966,120</cell></row><row><cell></cell><cell cols="3">10,632,426 966,013 40 1,979,668</cell></row><row><cell></cell><cell cols="3">31,741,096 2,863,525 40 5,297,862</cell></row><row><cell cols="4">Yago 16,348,563 4,339,591 91 8,628,329</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.w3.org/RDF/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">The TokyoTyGR is an extension of the Hexastore<ref type="bibr" target="#b14">[15]</ref> triple store. It can easily accommodate inserts/deletes and has a state of the art query optimizer based on selectivity estimation techniques.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">http://www.mpi-inf.mpg.de/yago-naga/yago/n3.zip</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgements This work was partially supported by the Swiss National Science Foundation under contract number 200021-118000.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>disk-based version of the index and evaluate it with several "truly" large datasets. In this context we also want to investigate the interaction between the B+Hash Tree and the disk-cache. Last but not least, we intend to consider the use of index-compression to develop even more efficient index structures.</p><p>Research in index structures has come a long way, from the early days of simple re-use of traditional, relational, row-based data base indices to the construction of specialized structures such as Hexastore and RDF-3X. We believe that the B+Hash Tree provides a new quality to this exploration. It does not smartly reuse existing structures like its predecessors but investigates a Semantic Web data specific algorithmic extension. As such it calls for the exploration of index structures that exploit the structural and statistical idiosyncrasies of Semantic Web data. The result of this exploration should be truly web-scalable triple stores, which lie at the very foundation of the Semantic Web vision, and the B+Hash Tree can provide a major building block towards that foundation.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">SW-Store: a vertically partitioned DBMS for semantic web data management</title>
		<author>
			<persName><forename type="first">D</forename><surname>Abadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Marcus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Madden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hollenbach</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The VLDB Journal</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="385" to="406" />
			<date type="published" when="2009-04">Apr. 2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Matrix &quot;Bit&quot; loaded: a scalable lightweight join query processor for RDF data</title>
		<author>
			<persName><forename type="first">M</forename><surname>Atre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Chaoji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Zaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Hendler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 19th international Conference on World Wide Web</title>
				<meeting>the 19th international Conference on World Wide Web<address><addrLine>Raleigh, North Carolina, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="41" to="50" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The berlin sparql benchmark</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Schultz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal on Sematnic Web and Information Systems</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="1" to="24" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">The ubiquitous b-tree</title>
		<author>
			<persName><forename type="first">D</forename><surname>Comer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="121" to="137" />
			<date type="published" when="1979">1979</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Yars2: a federated repository for querying graph structured data from the web</title>
		<author>
			<persName><forename type="first">A</forename><surname>Harth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Umbrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Decker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC&apos;07/ASWC&apos;07: Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference</title>
				<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="211" to="224" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Hashing and rehashing in emulated shared memory</title>
		<author>
			<persName><forename type="first">J</forename><surname>Keller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd Workshop on Parallel Algorithms</title>
				<meeting>the 3rd Workshop on Parallel Algorithms</meeting>
		<imprint>
			<publisher>WOPA</publisher>
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A study of index structures for main memory database management systems</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">J</forename><surname>Lehman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Carey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th International Conference on Very Large Data Bases</title>
				<meeting>the 12th International Conference on Very Large Data Bases</meeting>
		<imprint>
			<publisher>Morgan Kaufmann Publishers Inc</publisher>
			<date type="published" when="1986">1986</date>
			<biblScope unit="page" from="294" to="303" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Database architecture evolution: mammals flourished long before dinosaurs became extinct</title>
		<author>
			<persName><forename type="first">S</forename><surname>Manegold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Kersten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Boncz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proc. VLDB Endow</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1648" to="1653" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Hash table methods</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">D</forename><surname>Maurer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">G</forename><surname>Lewis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Comput. Surv</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="5" to="19" />
			<date type="published" when="1975">1975</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">RDF-3X: a RISC-style engine for RDF</title>
		<author>
			<persName><forename type="first">T</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proc. VLDB Endow</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="647" to="659" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Hybrid-TH: a hybrid access mechanism for Real-Time Main-Memory resident database systems</title>
		<author>
			<persName><forename type="first">C</forename><surname>Ryu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Jun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International Conference on Real-Time Computing Systems and Applications</title>
				<meeting>the 5th International Conference on Real-Time Computing Systems and Applications</meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page">303</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">SPARQL basic graph pattern optimization using selectivity estimation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Stocker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Seaborne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bernstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kiefer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Reynolds</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceeding of the 17th International World Wide Web Conference</title>
				<meeting>eeding of the 17th International World Wide Web Conference<address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="595" to="604" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Kryder&apos;s law</title>
		<author>
			<persName><forename type="first">C</forename><surname>Walter</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2005-08">Aug. 2005</date>
			<publisher>Scientific American</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">On-disk storage techniques for semantic web dataare B-Trees always the optimal solution?</title>
		<author>
			<persName><forename type="first">C</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bernstein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems</title>
				<meeting>the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems</meeting>
		<imprint>
			<date type="published" when="2009-10">Oct. 2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Hexastore: sextuple indexing for semantic web data management</title>
		<author>
			<persName><forename type="first">C</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Karras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bernstein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. VLDB Endow</title>
				<meeting>VLDB Endow</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1008" to="1019" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Jena property table implementation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Wilkinson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceeding of the Second International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS)</title>
				<meeting>eeding of the Second International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS)</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Query1: SELECT ?product ?label WHERE { ?product &lt;label&gt; ?label; &lt;type&gt; &lt;Product&gt;; &lt;productFeature&gt; &lt;ProductFeature50&gt;; &lt;productFeature&gt; &lt;ProductFeature580&gt;; &lt;productPropertyNumeric1&gt;</title>
		<author>
			<persName><forename type="first">K</forename><surname>Wilkinson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sayers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kuno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Reynolds</surname></persName>
		</author>
		<idno>LIMIT 10</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First International Workshop on Semantic Web and Databases (SWDB)</title>
				<meeting>the First International Workshop on Semantic Web and Databases (SWDB)</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="7" to="8" />
		</imprint>
	</monogr>
	<note>Efficient RDF storage and retrieval in jena2. ?value1</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<ptr target="?vendor;&lt;publisher&gt;?vendor.?vendor&lt;country&gt;&lt;US&gt;.?offer&lt;deliveryDays&gt;?deliveryDays;&lt;price&gt;?price" />
		<title level="m">Query2: SELECT ?offer ?price WHERE { ?offer</title>
				<imprint>
			<biblScope unit="page">10</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<ptr target="?reviewer&lt;name&gt;?reviewerName" />
		<title level="m">Query3: SELECT ?title ?text ?reviewDate ?</title>
				<imprint>
			<biblScope unit="page">20</biblScope>
		</imprint>
	</monogr>
	<note>reviewer ?reviewerName WHERE { ?review &lt;reviewFor&gt; &lt;Product197&gt;; &lt;title&gt; ?title; &lt;text&gt; ?text; &lt;reviewDate&gt; ?reviewDate; &lt;reviewer&gt; ?reviewer</note>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<ptr target="&lt;label&gt;?productLabel.&lt;Product613&gt;&lt;productFeature&gt;?prodFeature.?product&lt;productFeature&gt;?prodFeature.&lt;Product613&gt;&lt;productPropertyNumeric1&gt;?origProperty1.?product&lt;productPropertyNumeric1&gt;?simProperty1.&lt;Product613&gt;" />
		<title level="m">&lt;productPropertyNumeric2&gt; ?origProperty2 . ?product &lt;productPropertyNumeric2&gt; ?simProperty2</title>
				<imprint/>
	</monogr>
	<note>Query4: SELECT ?product ?productLabel WHERE { ?product</note>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<ptr target="?person&lt;hasWonPrize&gt;?price" />
		<title level="m">productFeature . &lt;Product2227&gt; &lt;productPropertyTextual1&gt; ?propTextual1; &lt;productPropertyTextual2&gt; ?propTextual2; &lt;productPropertyTextual3&gt; ?propTextual3; &lt;productPropertyNumeric1&gt; ?propNumeric1; &lt;productPropertyNumeric2&gt; ?propNumeric2</title>
				<imprint/>
	</monogr>
	<note>Query7: SELECT. ; &lt;bornIn&gt; ?city . ?city &lt;locatedIn&gt; &lt;Switzerland&gt; . } Query8: SELECT ?person WHERE { ?person &lt;graduatedFrom&gt; &lt;University_of_Zurich&gt;</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
