<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Beazley: a New Storage Systems Evaluation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Mikalai</forename><surname>Yatskevich</surname></persName>
							<email>mikalai.yatskevich@comlab.ox.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="institution">Oxford University Computing Laboratory</orgName>
								<address>
									<addrLine>Wolfson Building, Parks Road</addrLine>
									<postCode>OX1 3QD</postCode>
									<settlement>Oxford</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ian</forename><surname>Horrocks</surname></persName>
							<email>ian.horrocks@comlab.ox.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="institution">Oxford University Computing Laboratory</orgName>
								<address>
									<addrLine>Wolfson Building, Parks Road</addrLine>
									<postCode>OX1 3QD</postCode>
									<settlement>Oxford</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Graham</forename><surname>Klyne</surname></persName>
							<email>graham.klyne@zoo.ox.ac.uk</email>
							<affiliation key="aff1">
								<orgName type="department">Zoology Department</orgName>
								<orgName type="institution">Oxford University</orgName>
								<address>
									<addrLine>South Parks Road</addrLine>
									<postCode>OX1 3PS</postCode>
									<settlement>Oxford</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Beazley: a New Storage Systems Evaluation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">30C108D67FB02E6FC9C7B290A5A0200E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Evaluation is a major issue in the development of systems, sometimes as important as the implementation of a system itself. In the Semantic Web area, and especially in the area of the storage systems that provide a persistence layer for ontologies and instance data, evaluation efforts have been intermittent and area-specific. In this paper we propose a new dataset for storage systems evaluation called the Beazley dataset. The complete dataset version includes more than 16 million triples and 35 queries. We evaluate the dataset using several storage models of state-of-the-art storage systems.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Evaluation is a systematic assessment of system properties against a set of predefined criteria. Evaluation is a major issue in the development of systems, sometimes even as important as the implementation of a system itself. It has been shown in the past that performance evaluation can help implementers to better understand the sources of intractability and/or inefficiency in their systems, and to propose novel optimization techniques in an effort to make their systems more scalable in specific application scenarios.</p><p>Storage systems, often called RDF stores, provide a persistence layer for ontologies and instance data. They provide basic reasoning services such as computing the transitive closure of subsumption hierarchies. Storage systems differ from description logic (DL) reasoners, which provide more complex reasoning services but no storage facilities. The main inference services of DL reasoners can be performed as concept satisfiability checks. For RDF stores, the main inference service is query answering.</p><p>In the Semantic Web area, and especially in the area of storage systems, evaluation efforts have been intermittent and area-specific. There is no agreed standard or methodology for systems evaluation. In the evaluation of DL systems, artificially generated datasets played an important role <ref type="bibr" target="#b12">[13]</ref> until a large number of real-world ontologies had been developed. These real-world ontologies have been used in large-scale evaluation efforts <ref type="bibr" target="#b8">[9]</ref>. The evaluation of storage systems has focused on artificially generated datasets <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b9">10]</ref>. 
Thus, the evaluation of storage systems would benefit from real-world datasets that overcome the limitations of state-of-the-art generation methods. The most common instance generator and evaluation suite used by the Semantic Web community for storage systems evaluation is the Lehigh University Benchmark (LUBM) <ref type="bibr" target="#b11">[12]</ref>. Although LUBM is used mainly for testing instance retrieval and query answering algorithms, it has many shortcomings. First of all, the ALEHI R+ DL it uses is significantly less expressive than the DL underpinning OWL. Moreover, the data created for each university are completely independent. Consequently, if one applies a clustering method during loading, query answering can be performed over each university independently.</p><p>In this paper we propose a new dataset for storage systems evaluation called the Beazley dataset. The dataset comprises real-world archeological data gathered in the CLAROS initiative <ref type="bibr" target="#b14">[15]</ref> and a set of queries used in the CLAROS web site application. The dataset presents information about archeological artifacts. It instantiates the CIDOC-CRM OWL DL ontology <ref type="bibr" target="#b7">[8]</ref>. The complete dataset version includes more than 16 million triples and 35 queries. We evaluate the dataset using both memory-based and disk-based storage models of two state-of-the-art storage systems.</p><p>The paper is structured as follows. Section 2 provides a brief introduction to storage systems along with the datasets used for their evaluation. Section 3 provides a detailed description of the Beazley dataset. Section 4 describes the evaluation set up. Section 5 describes the evaluation results. Section 6 concludes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>The majority of the evaluation efforts in the storage systems area have focused on artificially generated datasets, which provide a mechanism to cover a class of inputs in a scalable manner. The most prominent example is the Lehigh University Benchmark (LUBM) <ref type="bibr" target="#b11">[12]</ref>. LUBM consists of a small ontology, with 43 classes, 25 roles, 85 TBox axioms and 8 RBox axioms, and several Java classes that can be used to create instance assertions (ABox) for this specific TBox and RBox. The ontology describes universities, i.e. courses, students, departments and publications, as well as their interrelations. For example, a student is enrolled in courses that are taught by academic staff, while academic staff are associated with publications, are affiliated with other universities, lead research teams or are heads of departments. The DL of LUBM is ALEHI R+; nevertheless, it does not make heavy use of the constructors, since there is just one transitive role, 5 sub-role axioms and 2 inverse role axioms. The ABox is created following the method described in <ref type="bibr" target="#b3">[4]</ref>. Finally, the benchmarking suite also comes with 14 queries that are proposed for testing a system against the generated ABoxes.</p><p>Although LUBM is used mainly for testing instance retrieval and query answering algorithms, it has many shortcomings. First of all, the DL used is significantly less expressive than the DL underpinning OWL. Moreover, the data created for each university are completely independent. Consequently, if one applies a clustering method during loading, query answering can be performed over each university independently. The University Ontology Benchmark (UOBM) <ref type="bibr" target="#b16">[17]</ref> is an extension of LUBM intended to remedy these problems. 
UOBM extends LUBM by adding more concepts and roles that are intended to connect individuals from different universities. Although UOBM is still not large when compared to ontologies such as NCI <ref type="bibr" target="#b10">[11]</ref> or GALEN <ref type="bibr" target="#b20">[21]</ref>, it uses the relatively expressive ontology language SHIN(D). Finally, a set of test queries is also offered. Unfortunately, although UOBM does make use of more complex constructors and is structurally more complex, it has not been widely accepted by the Semantic Web community.</p><p>The Berlin SPARQL benchmark <ref type="bibr" target="#b4">[5]</ref> focuses on the integration and visualization of data from various data sources. It is built around a scenario that does not require heavyweight reasoning. The class hierarchy is generated randomly. The query mix includes 25 queries that represent navigation patterns in an e-commerce use case. SP2Bench <ref type="bibr" target="#b21">[22]</ref> uses the DBLP <ref type="bibr" target="#b15">[16]</ref> bibliographic scenario. The ontology used has 9 classes and 77 properties. The query mix includes 11 queries utilizing various SPARQL language constructs. The Billion Triple Challenge <ref type="bibr" target="#b2">[3]</ref> aims at evaluating the ability of Semantic Web applications to process large quantities of RDF data represented by various schemata.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The Beazley dataset</head><p>The Beazley dataset <ref type="bibr" target="#b14">[15]</ref> presents information about archeological artifacts. The RDF data instantiates the CIDOC-CRM ontology <ref type="bibr" target="#b7">[8]</ref>. The complete dataset version includes more than 16 million triples. The frequency of the triples in the dataset depending on predicate values, f_p = freq(D, p), is depicted in Figure <ref type="figure" target="#fig_0">1a</ref>. The frequency of the triples with a given subject value frequency, f_{s_n} = freq(D, s_n), depicted in Figure <ref type="figure" target="#fig_0">1b</ref>, varies depending on the predicate value p. The frequencies f_{s_n} = freq(D, s_n, is represented) and f_{s_n} = freq(D, s_n, is referred) are depicted in Figures <ref type="figure" target="#fig_0">1c, 2a</ref>. This makes the Beazley archive dataset different from RDF datasets produced using automatic generation procedures <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b16">17]</ref>. These works assume uniform distributions and, hence, uniform frequencies. The frequency of the triples with a given object value frequency, f_{o_n} = freq(D, o_n), depicted in Figure <ref type="figure">2b</ref>, varies depending on the predicate value p and the subject value frequency s_n. The frequency f_{o_n} = freq(D, o_n, s_n, has time span), with f_{o_n} ∈ [1, 30], is depicted in Figure <ref type="figure">2c</ref>. The frequencies f_{o_n} = freq(D, o_n, took place) and f_{o_n} = freq(D, o_n, not after) are depicted in Figures <ref type="figure">3a, 3b</ref>. The query set used in the CLAROS web site application <ref type="bibr" target="#b14">[15]</ref> is composed of 35 queries of various size and complexity. Different queries are executed a different number of times during the web application life cycle. They can be classified into two large groups. The first query group, QG1, comprises Q1-Q18, presented in Table <ref type="table" target="#tab_0">1</ref>. 
The queries from QG1 are executed at </p></div>
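<div xmlns="http://www.tei-c.org/ns/1.0"><p>The frequency measures used in this section can be illustrated with a minimal Python sketch over a toy triple list. The triples, the predicate names and the function names are invented for illustration and are not taken from the Beazley dataset or from the authors' tooling. The sketch also assumes that freq(D, s_n, p) means the number of triples with subject s_n and predicate p, which is one possible reading of the definitions above.</p>

```python
from collections import Counter

# Toy triples (subject, predicate, object); all names are hypothetical.
TRIPLES = [
    ("vase1", "is_represented", "img1"),
    ("vase1", "is_represented", "img2"),
    ("vase2", "is_represented", "img3"),
    ("vase1", "has_time_span", "span1"),
]


def predicate_frequencies(triples):
    """f_p = freq(D, p): number of triples per predicate."""
    return Counter(p for _, p, _ in triples)


def subject_frequencies(triples, predicate=None):
    """freq(D, s_n), or freq(D, s_n, p) when a predicate is given."""
    return Counter(
        s for s, p, _ in triples if predicate is None or p == predicate
    )


def frequency_histogram(frequencies):
    """Map each frequency value to the number of keys that have it."""
    return Counter(frequencies.values())


if __name__ == "__main__":
    print(predicate_frequencies(TRIPLES))
    print(frequency_histogram(subject_frequencies(TRIPLES, "is_represented")))
```

<p>A histogram of this kind is what would be plotted to compare a real-world distribution with the uniform one assumed by synthetic generators.</p></div>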
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">The evaluation set up</head><p>We evaluated the dataset using both disk-based and memory-based storage models of two state-of-the-art storage systems, Jena and Sesame: Jena TDB, Jena ARQ, Sesame-memory and Sesame-native. Jena <ref type="bibr" target="#b17">[18]</ref> is a Java framework for building semantic web applications. It includes OWL <ref type="bibr" target="#b18">[19]</ref> and RDF <ref type="bibr" target="#b13">[14]</ref> APIs, in-memory and persistent storage models, and a SPARQL <ref type="bibr" target="#b19">[20]</ref> query engine. ARQ <ref type="bibr" target="#b0">[1]</ref> is a general purpose query engine, supporting SPARQL and other query languages, that can utilize several Jena storage models. In our experiments we used ARQ with the in-memory storage model. TDB <ref type="bibr" target="#b1">[2]</ref> is a high-performance native storage engine that exploits a custom indexing strategy. Sesame <ref type="bibr" target="#b6">[7]</ref> is an open source Java framework for storing and querying RDF data. Sesame supports the SPARQL and SeRQL <ref type="bibr" target="#b5">[6]</ref> query languages and both memory-based and disk-based storage. We evaluated the systems through their user interfaces.</p><p>The evaluation was performed on an AMD Phenom II 2600 MHz processor with 8 GB of main memory installed.</p><p>The data used in the evaluation included the Beazley dataset with 16 million triples and its reduced version with 10 million triples. The dataset loading time, query execution time and total query set time were measured. The queries per hour (QpH) and seconds per query (SpQ) measures were calculated given that each query in QG10 is executed 10 times while each query in QG1 is executed once. This setting approximates the CLAROS application query mix.</p></div>
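<div xmlns="http://www.tei-c.org/ns/1.0"><p>The aggregate measures can be sketched in Python as follows. The exact formulas are not spelled out in the text, so this is an assumption: SpQ is taken to be the execution-weighted mean query time and QpH to be 3600 divided by SpQ. With a weight of 1 for Q1-Q18 and 10 for the remaining queries (also an assumption, since the definition of QG10 is cut off in the text), this reading reproduces the Sesame-native figures for the 16Mt dataset in Table 3 to within rounding. The function name and the two-query example are hypothetical.</p>

```python
def aggregate_measures(times_s, executions):
    """Return (SpQ, QpH) for per-query times given in seconds.

    executions maps each query to how often it runs in the query mix.
    """
    total_runs = sum(executions[q] for q in times_s)
    weighted_time = sum(times_s[q] * executions[q] for q in times_s)
    spq = weighted_time / total_runs  # seconds per query
    qph = 3600.0 / spq                # queries per hour
    return spq, qph


if __name__ == "__main__":
    # Hypothetical two-query mix: Qa runs once, Qb runs ten times.
    times = {"Qa": 2.0, "Qb": 4.0}
    runs = {"Qa": 1, "Qb": 10}
    spq, qph = aggregate_measures(times, runs)
    print(round(spq, 3), round(qph, 1))
```

<p>Because frequently executed queries carry ten times the weight, a slow query in QG10 affects SpQ far more than an equally slow query in QG1.</p></div>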
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">The system performance</head><p>The data loading times are presented in Table <ref type="table" target="#tab_1">2</ref>. The memory-based models ARQ and Sesame-memory were not able to load the complete Beazley dataset. There was insufficient memory for ARQ. The loading into Sesame-memory was terminated after 5 days. The reduced Beazley dataset version was loaded in less than 1 hour by all the systems. The Jena memory-based and disk-based models loaded the data more efficiently than the Sesame models.</p><p>The query answering times and the other query performance measures are presented in Table <ref type="table" target="#tab_2">3</ref>.</p><p>Query times ranged from 1 millisecond to 74.8 hours per query. The native Sesame storage model was more than 5 times more efficient than its memory storage model. Overall, ARQ was 2 orders of magnitude less efficient than TDB. With query Q33 excluded from the query set, however, ARQ was more than an order of magnitude more efficient than TDB. This query took ARQ 74.8 hours to execute and thus dominated the total result. The other 34 queries were completed in 270 seconds. None of the systems was able to execute the complete query mix on the complete dataset in less than 10 minutes, which makes the CLAROS application and, therefore, the Beazley dataset challenging for state-of-the-art storage systems.</p><p>The quality of the query answering results is affected by the quality of the original data input. Making improvements to the incoming data (which are obtained by extraction from existing databases) is an ongoing activity, which the Beazley Archive team are addressing by (a) improving the data extraction processes, (b) applying heuristics to clean up some of the data values (e.g. 
dates), (c) highlighting inconsistencies that are detected by the extraction processes and passing these back to the data originators for correction, and (d) using thesauri and authority lists to map terminology variations to common terms.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>The query set tested in the paper was used in an initial development of the CLAROS application. Naively constructed, it was designed mainly to provide functionality rather than performance. The new version of the CLAROS application will include an updated query set designed with partners from the Jena team to identify bottlenecks and improve the queries. The goal of these efforts is to redesign the queries to achieve sub-second response times. The strategies for the dataset improvement are (1) pre-calculation of certain path queries to reduce run-time joins (roughly equivalent to "materialized views" in relational data), and (2) use of additional indexes associated with "virtual properties" that can reduce the need for in-memory sorting of results when processing SPARQL queries (analogous to schema-defined indexes in relational databases). Essentially, 4 techniques have been used:</p><p>1. reordering of queries so that more selective elements are evaluated earlier (this can also be performed automatically by the ARQ query processor in Jena); 2. "materialization" of property paths and UNIONs in queries: adding "short cut" properties to the triple store, and using these properties in queries; 3. customized indexes for finding the earliest and latest occurrences of a given object type, and also for providing consistent ordering in other keyword-based object access queries. These new indexes are not Lucene-based, as originally intended, as Lucene's handling of result sorting is less scalable than had been anticipated. Instead, a simple arrangement of flat files named by keywords, with contents sorted by the ordering key, is used; 4. 
pre-calculation of object counts by various categories, so that counting queries can run without having to access every matching object.</p><p>Our hope is that this kind of ad-hoc optimization work can suggest ways forward for more principled ontology-based optimization of triple store access. We intend that this revised system will be the basis of a public version of the CLAROS application developed by academic groups who are focused on application of the technologies rather than technology research.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. a) The frequencies of predicates; b) The frequencies of the triples with a given subject value frequency; c) The frequencies of the triples with a given subject value frequency for predicate = is represented</figDesc><graphic coords="4,134.77,137.77,345.83,458.63" type="bitmap" /></figure>
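<div xmlns="http://www.tei-c.org/ns/1.0"><p>The second technique listed in the Conclusion, materialization of property paths as "short cut" properties, can be illustrated with a small Python sketch over an in-memory set of triples. This is not the CLAROS implementation: the function name, the property names and the example triples are hypothetical, and a real triple store would perform the equivalent step with its own update mechanism.</p>

```python
from collections import defaultdict


def materialize_path(triples, first, second, shortcut):
    """Add (s, shortcut, o) for every two-step path s -first-> m -second-> o."""
    # Index the second step by its subject so each join is a dictionary lookup.
    second_step = defaultdict(set)
    for s, p, o in triples:
        if p == second:
            second_step[s].add(o)

    added = set()
    for s, p, middle in triples:
        if p == first:
            for o in second_step.get(middle, ()):
                added.add((s, shortcut, o))
    return set(triples) | added


if __name__ == "__main__":
    # Hypothetical data: an object linked to a time span through an event.
    data = {
        ("vase1", "was_produced_by", "event1"),
        ("event1", "has_time_span", "span1"),
    }
    for triple in sorted(materialize_path(
            data, "was_produced_by", "has_time_span", "produced_in_span")):
        print(triple)
```

<p>After materialization, a query can match the single short cut property and avoid the run-time join through the intermediate node, at the cost of storing extra triples that must be kept in step with the source data.</p></div>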
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="6,134.77,216.28,345.83,301.62" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>The Beazley query set.</figDesc><table><row><cell></cell><cell>Variables</cell><cell>Joins</cell><cell>Text search</cell><cell>Ordering</cell><cell>Comparisons</cell><cell>OPTIONAL</cell><cell>BOUND</cell></row><row><cell>Q1</cell><cell>2</cell><cell>0</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q2</cell><cell>11</cell><cell>15</cell><cell>X</cell><cell>X</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q3</cell><cell>11</cell><cell>15</cell><cell>X</cell><cell>X</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q4</cell><cell>9</cell><cell>14</cell><cell>X</cell><cell>X</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q5</cell><cell>6</cell><cell>7</cell><cell>X</cell><cell>X</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q6</cell><cell>17</cell><cell>10</cell><cell>X</cell><cell>X</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q7</cell><cell>1</cell><cell>2</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q8</cell><cell>1</cell><cell>2</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q9</cell><cell>5</cell><cell>2</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q10</cell><cell>9</cell><cell>14</cell><cell>2X</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q11</cell><cell>20</cell><cell>31</cell><cell>2X</cell><cell>X</cell><cell></cell><cell>X</cell><cell></cell></row><row><cell>Q12</cell><cell>11</cell><cell>19</cell><cell>2X</cell><cell>X</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q13</cell><cell>7</cell><cell>12</cell><cell>X</cell><cell>X</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q14</cell><cell>6</cell><cell>11</cell><cell>X</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q15</cell><cell
>18</cell><cell>32</cell><cell>X</cell><cell>X</cell><cell></cell><cell>X</cell><cell></cell></row><row><cell>Q16</cell><cell>11</cell><cell>20</cell><cell>X</cell><cell>X</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q17</cell><cell>11</cell><cell>20</cell><cell>2X</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q18</cell><cell>19</cell><cell>32</cell><cell>2X</cell><cell>X</cell><cell></cell><cell>X</cell><cell></cell></row><row><cell>Q19</cell><cell>10</cell><cell>14</cell><cell>X</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Q20</cell><cell>14</cell><cell>22</cell><cell>X</cell><cell></cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q21</cell><cell>20</cell><cell>32</cell><cell>X</cell><cell>X</cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q22</cell><cell>12</cell><cell>19</cell><cell>X</cell><cell></cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q23</cell><cell>13</cell><cell>22</cell><cell>X</cell><cell></cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q24</cell><cell>21</cell><cell>32</cell><cell>2X</cell><cell>X</cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q25</cell><cell>17</cell><cell>28</cell><cell>2X</cell><cell></cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q26</cell><cell>12</cell><cell>20</cell><cell>X</cell><cell></cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q27</cell><cell>8</cell><cell>12</cell><cell></cell><cell></cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q28</cell><cell>14</cell><cell>24</cell><cell></cell><cell>X</cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q29</cell><cell>11</cell><cell>20</cell><cell></cell><cell></cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q30</cell><cell>11</cell><cell>19</cell><cell>X</cell><cell></cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q31</cell><cell>1
8</cell><cell>32</cell><cell>X</cell><cell>X</cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q32</cell><cell>16</cell><cell>28</cell><cell>X</cell><cell>X</cell><cell>2X</cell><cell></cell><cell></cell></row><row><cell>Q33</cell><cell>11</cell><cell>20</cell><cell>X</cell><cell></cell><cell>2X</cell><cell></cell><cell>2X</cell></row><row><cell>Q34</cell><cell>15</cell><cell>25</cell><cell>X</cell><cell>X</cell><cell>2X</cell><cell>X</cell><cell>2X</cell></row><row><cell>Q35</cell><cell>13</cell><cell>23</cell><cell>X</cell><cell></cell><cell>2X</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Loading times, seconds.</figDesc><table><row><cell></cell><cell>ARQ</cell><cell>TDB</cell><cell>Sesame-native</cell><cell>Sesame-memory</cell></row><row><cell>Beazley-16Mt</cell><cell>N/A</cell><cell>1445.68</cell><cell>1878.43</cell><cell>N/A</cell></row><row><cell>Beazley-10Mt</cell><cell>360.19</cell><cell>434.07</cell><cell>1087.14</cell><cell>2991.44</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Query answering times and aggregate performance measures on Beazley dataset</figDesc><table><row><cell></cell><cell cols="2">Query answering times, 16Mt (s)</cell><cell cols="4">Query answering times, 10Mt (s)</cell></row><row><cell></cell><cell>Sesame-native</cell><cell>TDB</cell><cell>Sesame-memory</cell><cell>ARQ</cell><cell>Sesame-native</cell><cell>TDB</cell></row><row><cell>Q1</cell><cell>0.03</cell><cell>0.05</cell><cell>0.02</cell><cell>0.006</cell><cell>0.03</cell><cell>0.12</cell></row><row><cell>Q2</cell><cell>39.82</cell><cell>119.14</cell><cell>210.34</cell><cell>5.96</cell><cell>38.29</cell><cell>171.7</cell></row><row><cell>Q3</cell><cell>38.39</cell><cell>125.16</cell><cell>217.98</cell><cell>4.93</cell><cell>37.33</cell><cell>115.91</cell></row><row><cell>Q4</cell><cell>26.01</cell><cell>94.86</cell><cell>137.1</cell><cell>26.89</cell><cell>22.2</cell><cell>89.25</cell></row><row><cell>Q5</cell><cell>39.9</cell><cell>97.32</cell><cell>128.05</cell><cell>5.11</cell><cell>38.62</cell><cell>90.35</cell></row><row><cell>Q6</cell><cell>57.04</cell><cell>139.98</cell><cell>343.98</cell><cell>25.95</cell><cell>56.24</cell><cell>135.24</cell></row><row><cell>Q7</cell><cell>0.001</cell><cell>0.09</cell><cell>0.001</cell><cell>0.01</cell><cell>0.001</cell><cell>0.08</cell></row><row><cell>Q8</cell><cell>0.002</cell><cell>0.09</cell><cell>0.001</cell><cell>0.01</cell><cell>0.001</cell><cell>0.08</cell></row><row><cell>Q9</cell><cell>19.66</cell><cell>70.91</cell><cell>39.02</cell><cell>37.22</cell><cell>11.2</cell><cell>38.52</cell></row><row><cell>Q10</cell><cell>16.34</cell><cell>91.5</cell><cell>58.87</cell><cell>25.2</cell><cell>13.08</cell><cell>88.78</cell></row><row><cell>Q11</cell><cell>21.57</cell><cell>186.85</cell><cell>93.36</cell><cell>4.22</cell><cell>18.3</cell><cell>197.23</cell></row><row><cell>Q12</cell><cell>19.52</cell><cell>105.94</cell><cell
>80.26</cell><cell>1.24</cell><cell>16.01</cell><cell>102.27</cell></row><row><cell>Q13</cell><cell>18.15</cell><cell>6.99</cell><cell>35.63</cell><cell>1.23</cell><cell>8.39</cell><cell>3.36</cell></row><row><cell>Q14</cell><cell>49.66</cell><cell>111.32</cell><cell>209.43</cell><cell>7.17</cell><cell>48.31</cell><cell>110.1</cell></row><row><cell>Q15</cell><cell>33.7</cell><cell>105.46</cell><cell>222.81</cell><cell>6.32</cell><cell>29.69</cell><cell>99.14</cell></row><row><cell>Q16</cell><cell>31.48</cell><cell>122.49</cell><cell>194.79</cell><cell>5.73</cell><cell>27.13</cell><cell>119.2</cell></row><row><cell>Q17</cell><cell>19.69</cell><cell>106.55</cell><cell>81.16</cell><cell>5.84</cell><cell>15.47</cell><cell>99.42</cell></row><row><cell>Q18</cell><cell>22.63</cell><cell>173.64</cell><cell>106.07</cell><cell>5.53</cell><cell>18.52</cell><cell>168.48</cell></row><row><cell>Q19</cell><cell>32</cell><cell>114</cell><cell>140.64</cell><cell>5.36</cell><cell>31.06</cell><cell>112.16</cell></row><row><cell>Q20</cell><cell>25.99</cell><cell>122.97</cell><cell>125.51</cell><cell>5.72</cell><cell>21.55</cell><cell>116.97</cell></row><row><cell>Q21</cell><cell>31.61</cell><cell>147.94</cell><cell>182.62</cell><cell>25.64</cell><cell>26.75</cell><cell>149.24</cell></row><row><cell>Q22</cell><cell>35.99</cell><cell>133.69</cell><cell>206.71</cell><cell>5.53</cell><cell>38.02</cell><cell>126.08</cell></row><row><cell>Q23</cell><cell>0.43</cell><cell>109.18</cell><cell>0.12</cell><cell>5.37</cell><cell>0.4</cell><cell>103.41</cell></row><row><cell>Q24</cell><cell>25.71</cell><cell>154.82</cell><cell>116.25</cell><cell>5.09</cell><cell>20.7</cell><cell>142.33</cell></row><row><cell>Q25</cell><cell>24.47</cell><cell>139.02</cell><cell>114.79</cell><cell>5.13</cell><cell>18.31</cell><cell>132.38</cell></row><row><cell>Q26</cell><cell>18.21</cell><cell>5.75</cell><cell>57.63</cell><cell>9.56</cell><cell>12.31</cell><cell>5.91</cell></row><row><cell>Q27</cell><cell>13.77</cel
l><cell>5.56</cell><cell>46.6</cell><cell>7.48</cell><cell>13.41</cell><cell>5.04</cell></row><row><cell>Q28</cell><cell>20.51</cell><cell>12.46</cell><cell>90.98</cell><cell>5.39</cell><cell>14.83</cell><cell>11.92</cell></row><row><cell>Q29</cell><cell>19.1</cell><cell>6.38</cell><cell>69.27</cell><cell>5.01</cell><cell>12.66</cell><cell>5.73</cell></row><row><cell>Q30</cell><cell>36.75</cell><cell>131.25</cell><cell>207.13</cell><cell>4.78</cell><cell>38.63</cell><cell>123.26</cell></row><row><cell>Q31</cell><cell>32.77</cell><cell>148.84</cell><cell>193.96</cell><cell>5.65</cell><cell>27.18</cell><cell>142.03</cell></row><row><cell>Q32</cell><cell>30.22</cell><cell>135.05</cell><cell>162.23</cell><cell>4.76</cell><cell>24.15</cell><cell>137.57</cell></row><row><cell>Q33</cell><cell>2.39</cell><cell>8.52</cell><cell>0.005</cell><cell>74.8h</cell><cell>0.007</cell><cell>2.41</cell></row><row><cell>Q34</cell><cell>3.33</cell><cell>12.91</cell><cell>0.005</cell><cell>4.95</cell><cell>0.008</cell><cell>4.86</cell></row><row><cell>Q35</cell><cell>0.48</cell><cell>111.27</cell><cell>0.12</cell><cell>5.01</cell><cell>0.4</cell><cell>100.23</cell></row><row><cell>Total</cell><cell>13.45m</cell><cell>52.63m</cell><cell>64.55m</cell><cell>74.87h</cell><cell>11.65m</cell><cell>50.84m</cell></row><row><cell>QpH</cell><cell>169.54</cell><cell>40.63</cell><cell>35.05</cell><cell>0.25</cell><cell>198.86</cell><cell>42.71</cell></row><row><cell>SpQ</cell><cell>21.23</cell><cell>88.59</cell><cell>102.68</cell><cell>14330.4</cell><cell>18.1</cell><cell>84.28</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">Proceedings of the International Workshop on Evaluation of Semantic Technologies (IWEST 2010). Shanghai, China. November 8, 2010.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We described a new dataset for storage systems evaluation called the Beazley dataset. The dataset proved to be challenging for state-of-the-art storage systems. In fact, none of the systems evaluated was able to demonstrate the level of performance needed for the real-world application that uses the dataset. The work suggests that Semantic Web technologies applied indiscriminately (or naively) may not always yield acceptable performance, but significant performance improvements are possible through judicious optimizations to the stored data and queries used, without distorting the semantic coherence of the original data. Performance improvement work to date has been ad hoc, but suggests some strategies that might be considered for automated query optimization.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="http://jena.sourceforge.net/ARQ/" />
		<title level="m">ARQ - A SPARQL processor for Jena</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<ptr target="http://jena.sourceforge.net/TDB/" />
		<title level="m">TDB - A SPARQL database for Jena</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<ptr target="http://challenge.semanticweb.org/" />
		<title level="m">The Billion Triple Challenge</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Benchmarking database systems: a systematic approach</title>
		<author>
			<persName><forename type="first">D</forename><surname>Bitton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>DeWitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Turbyfill</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">VLDB &apos;83: Proceedings of the 9th International Conference on Very Large Data Bases</title>
				<meeting><address><addrLine>San Francisco, CA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Morgan Kaufmann Publishers Inc</publisher>
			<date type="published" when="1983">1983</date>
			<biblScope unit="page" from="8" to="19" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The Berlin SPARQL benchmark</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Schultz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. J. Semantic Web Inf. Syst</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1" to="24" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">SeRQL: A Second Generation RDF Query Language</title>
		<author>
			<persName><forename type="first">J</forename><surname>Broekstra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kampman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SWAD-Europe Workshop on Semantic Web Storage and Retrieval</title>
				<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Sesame: A generic architecture for storing and querying RDF and RDF Schema</title>
		<author>
			<persName><forename type="first">J</forename><surname>Broekstra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kampman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Van Harmelen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the first Int&apos;l Semantic Web Conference (ISWC 2002)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">Ian</forename><surname>Horrocks</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">James</forename><surname>Hendler</surname></persName>
		</editor>
		<meeting>the first Int&apos;l Semantic Web Conference (ISWC 2002)<address><addrLine>Sardinia, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Verlag</publisher>
			<date type="published" when="2002-05">May 2002</date>
			<biblScope unit="volume">2342</biblScope>
			<biblScope unit="page" from="54" to="68" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The dream of a global knowledge network - a new approach</title>
		<author>
			<persName><forename type="first">M</forename><surname>Doerr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Iorizzo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Comput. Cult. Herit</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="23" />
			<date type="published" when="2008-06">June 2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Automated benchmarking of description logic reasoners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Gardiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tsarkov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2006 Description Logic Workshop</title>
				<meeting>the 2006 Description Logic Workshop</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="volume">189</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">How incomplete is your semantic web reasoner?</title>
		<author>
			<persName><forename type="first">Ian</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giorgos</forename><surname>Stoilos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bernardo</forename><forename type="middle">Cuenca</forename><surname>Grau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2010)</title>
				<meeting>the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2010)</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
	<note>To Appear</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">The National Cancer Institute&apos;s thesaurus and ontology</title>
		<author>
			<persName><forename type="first">J</forename><surname>Golbeck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Fragoso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hartel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hendler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Oberthaler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Parsia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Web Sem</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="75" to="80" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">LUBM: A benchmark for OWL knowledge base systems</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Heflin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Web Semantics: Science, Services and Agents on the World Wide Web</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">2-3</biblScope>
			<biblScope unit="page" from="158" to="182" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">An analysis of empirical testing for modal decision procedures</title>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Patel-Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sebastiani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Logic Journal of the IGPL</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="293" to="323" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Resource Description Framework (RDF): Concepts and Abstract Syntax</title>
		<author>
			<persName><forename type="first">G</forename><surname>Klyne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carroll</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">W3C Recommendation</title>
		<imprint>
			<date type="published" when="2004-02-10">10 February 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">CLAROS - Bringing Classical Art to a Global Public</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kurtz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Parker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Shotton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Klyne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Schroff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wilks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on e-Science and Grid Computing</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="20" to="27" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">DBLP database</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ley</surname></persName>
		</author>
		<ptr target="http://www.informatik.uni-trier.de/ley/db/" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Towards a complete OWL ontology benchmark</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">T</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ESWC</title>
				<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="125" to="139" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Jena: Implementing the RDF Model and Syntax Specification</title>
		<author>
			<persName><forename type="first">B</forename><surname>McBride</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SemWeb</title>
				<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">OWL Web Ontology Language semantics and abstract syntax</title>
		<author>
			<persName><forename type="first">P</forename><surname>Patel-Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hayes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">W3C Recommendation</title>
		<imprint>
			<date type="published" when="2004-02-10">10 February 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">SPARQL Query Language for RDF</title>
		<author>
			<persName><forename type="first">E</forename><surname>Prud&apos;hommeaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Seaborne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">W3C Recommendation</title>
		<imprint>
			<date type="published" when="2008-01-15">15 January 2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Foundations for an electronic medical record</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rector</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Nowlan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Methods Inf Med</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="179" to="186" />
			<date type="published" when="1991">1991</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">An experimental comparison of RDF data management approaches in a SPARQL benchmark scenario</title>
		<author>
			<persName><forename type="first">M</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hornung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Küchlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lausen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pinkel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC &apos;08: Proceedings of the 7th International Conference on The Semantic Web</title>
				<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="82" to="97" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
