<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The Empirical Robustness of Description Logic Classification</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Rafael</forename><forename type="middle">S</forename><surname>Gonçalves</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">University of Manchester</orgName>
								<address>
									<settlement>Manchester</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nicolas</forename><surname>Matentzoglu</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">University of Manchester</orgName>
								<address>
									<settlement>Manchester</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Bijan</forename><surname>Parsia</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">University of Manchester</orgName>
								<address>
									<settlement>Manchester</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Uli</forename><surname>Sattler</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">University of Manchester</orgName>
								<address>
									<settlement>Manchester</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The Empirical Robustness of Description Logic Classification</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">C6E523D20947CD4DC6559226A854D417</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:31+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In spite of the recent renaissance in lightweight description logics (DLs), many prominent DLs, such as that underlying the Web Ontology Language (OWL), have high worst-case complexity for their key inference services. Modern reasoners have a large array of optimizations, tuned calculi, and implementation tricks that allow them to perform very well in a variety of application scenarios, even though the complexity results ensure that they will perform poorly for some inputs. For users, the key question is how often they will encounter those pathological inputs in practice, that is, how robust reasoners are. We attempt to answer this question for the classification of existing ontologies as they are found on the Web. Examining ontologies published on the Web is a fairly common user task in the ontology development process. Thus, the robustness of reasoners in this scenario is directly interesting and also provides some hints toward answering the broader question. Our experiments show that the current crop of OWL reasoners, in collaboration, is very robust against the Web.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Motivation</head><p>A serious concern about both versions 1 <ref type="bibr" target="#b11">[12]</ref> and 2 <ref type="bibr" target="#b4">[5]</ref> of the Web Ontology Language (OWL) is that the underlying description logics (SHOIQ and SROIQ) exhibit extremely bad worst-case complexity (NEXPTIME and 2NEXPTIME) for their key inference services. Although highly optimized description logic reasoners have exhibited rather good performance on real cases since the mid-1990s, even among those more constrained cases there are ontologies (such as Galen) that proved impossible to process for over a decade. Indeed, concern with such pathology stimulated a renaissance of research into tractable description logics, with the EL family <ref type="bibr" target="#b0">[1]</ref> and the DL Lite <ref type="bibr" target="#b3">[4]</ref> family being incorporated as special "profiles" of OWL 2. However, even though the number of ontologies available on the Web has grown enormously since the standardization of OWL, it is still unclear how robust modern, highly optimized reasoners are to such input. Anecdotal evidence suggests that pathological cases are common enough to cause problems; however, systematic evidence has been scarce.</p><p>In this paper we investigate whether modern, highly optimized description logic reasoners are robust over Web input. The general intuition of a robust system is that it resists failure across a range of inputs. For any particular robustness determination, one must decide: 1) the range of input, 2) the functional or non-functional properties of interest, and 3) what counts as failure. The instantiation of these parameters strongly influences robustness judgements, with the very same reasoner being highly robust under one scenario and very non-robust under another. 
For our current purposes, the key scenario is that an ontology engineer, using a tool like Protégé <ref type="bibr" target="#b13">[14]</ref>, is inspecting ontologies published on the Web with an eye to possible reuse and, as is common, wishes to classify the ontology using a standard OWL 2 DL reasoner as part of their evaluation. This scenario yields the following constraints: 1) for input, we examine Web-based corpora; 2) the functional property is acceptance (will the reasoner load and process the ontology?) and the non-functional property is performance (will the reasoner complete classification before the ontology engineer gives up?); 3) w.r.t. acceptance, failure means either rejecting the input or crashing while processing it, while w.r.t. performance, failure means exceeding a timeout, and we might reasonably expect an engineer to wait up to 2 hours if the ontology seems "worth it". If a reasoner (or a set of reasoners) succeeds on at least 90% of a corpus, we count that reasoner as robust over that corpus, with 95% and 99% indicating "strong" and "extreme" robustness, respectively. While these levels are clearly arbitrary (as is the timeout), they provide a framework for setting expectations. Robustness under these assumptions does not ensure robustness under other assumptions (e.g., over subsets of these ontologies as experienced during development, or under a more stringent time constraint), yet the assumptions are challenging enough that it was unclear to us ex ante whether any reasoner would be robust for any corpus. In fact, we find that the reasoners are robust or nearly robust in most of the cases we examine, including under lower timeouts. More significantly, if we take the best result for each ontology (which represents a kind of "meta-reasoner" in which our test reasoners are run in parallel), then the set of reasoners is extremely robust over all corpora. Thus, in a fairly precise, if limited, sense, we demonstrate that SHOIQ and SROIQ are practical description logics.</p></div>
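<div xmlns="http://www.tei-c.org/ns/1.0"><p>The robustness levels defined above can be made concrete with a small classification function. The Python sketch below is illustrative only: it assumes each ontology-reasoner run has already been reduced to a success flag (classified within the timeout, without rejection or crash), and the names `robustness_level` and `best_of_set` are hypothetical rather than part of the authors' tooling. The meta-reasoner view counts an ontology as a success if any reasoner in the set succeeds on it.</p><p>
```python
from typing import Dict, List

# Thresholds from the text, checked from the strictest level downwards:
# at least 99% -> "extreme", 95% -> "strong", 90% -> "robust".
LEVELS = [(0.99, "extreme"), (0.95, "strong"), (0.90, "robust")]


def robustness_level(successes: List[bool]) -> str:
    """Map per-ontology success flags (True = classified in time, no error)."""
    if not successes:
        raise ValueError("empty corpus")
    rate = sum(successes) / len(successes)
    for threshold, label in LEVELS:
        if rate >= threshold:
            return label
    return "not robust"


def best_of_set(results: Dict[str, List[bool]]) -> List[bool]:
    """Hypothetical meta-reasoner: an ontology succeeds if any reasoner does.

    Every list is assumed to follow the same ontology order.
    """
    return [any(flags) for flags in zip(*results.values())]


# Illustrative flags for 20 ontologies; these are not data from the paper.
results = {
    "A": [True] * 17 + [False] * 3,  # 85% success
    "B": [False] * 3 + [True] * 17,  # 85% success
}
print(robustness_level(results["A"]))          # not robust
print(robustness_level(best_of_set(results)))  # extreme
```
</p><p>Two reasoners that are individually not robust can be extremely robust together when their failures fall on different ontologies, which is the effect the meta-reasoner view captures.</p></div>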
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Materials &amp; Methods</head><p>For our input data, we gathered three sets of ontologies from the Web: all versions of the NCI Thesaurus (NCIt), the ontologies in the NCBO BioPortal repository, and the results of a Web crawl, each with fundamentally different characteristics.</p><p>The NCIt has been continuously developed and published in monthly versions since 2003. The NCIt archive<ref type="foot" target="#foot_0">1</ref> contains 106 versions parseable by the OWL API <ref type="bibr" target="#b9">[10]</ref>,<ref type="foot" target="#foot_1">2</ref> from release 02.00 (October 2003) through to release 12.11d (November 2012), ranging in size from 49,475 to 133,900 logical axioms and in expressivity from ALE to SH(D). The NCIt team is a fairly stable, closed team of about 20 ontology developers who use a highly regimented process and, at least since 2006, have incorporated OWL reasoners (namely FaCT++ and Pellet) into their tool chain. The NCIt is large and easily accessible, and thus has been an informal benchmark for reasoner developers. Additionally, the NCI has funded various infrastructure projects, including improvements to reasoners. Thus, we might reasonably expect reasoners to be robust w.r.t. this corpus, both because the NCIt team may be tuning their ontology to the available reasoners (though the fact that they fund improvements suggests not) and because reasoner developers are tuning for the NCIt.</p><p>The NCBO BioPortal is a Web-based repository for health care and life science ontologies. We use a snapshot of (publicly downloadable ontologies from) the BioPortal repository from November 2012, consisting of 292 OWL and OBO parseable ontologies. The average number of logical axioms in the corpus is 28,439 (total: 8,190,504; median: 979 axioms), and 89 of these ontologies contain named individuals. Four ontologies contained no logical axioms at all and were thus discarded. 
In expressivity, the ontologies range from the inexpressive AL to the very expressive SROIQ. The ontologies are developed and used in a wide range of largely unrelated projects, for a variety of purposes, using a variety of tools. While BioPortal has received some attention from the research community, it is not yet a standard target for reasoner developers.</p><p>The third corpus, obtained by a short Web crawl seeded with a large number of URLs from Swoogle, Google and ontology repositories on the Web, was collected in November 2012. We picked a random sample of 822 ontologies, of which 145 contained no logical axioms at all and were discarded, leaving 677 ontologies for our experiment. The average number of logical axioms is 2,405 (total: 1,628,207; median: 57), and the expressivity ranges from AL to SRIQ. These ontologies span a wide range of subjects and are completely uncontrolled with respect to their origin. Perhaps not surprisingly, there are fewer axioms both overall and on average, with half of the ontologies containing under 60 axioms. This may reflect less commitment to these ontologies than we see in the more curated sets. However, there is no reason to think that the reasoners have been specially tuned to these ontologies and, given the worst-case complexity of the logics, even small ontologies are potential pathological cases. Thus, it is unclear what the rational robustness expectation is for this set.</p><p>We selected four reasoners for testing based on the following criteria: a) coverage of all of OWL 2, b) free availability for download, c) native support for the OWL API, and d) sound, complete and terminating algorithms. Accordingly, the chosen reasoners are Pellet <ref type="bibr" target="#b18">[19]</ref>, HermiT <ref type="bibr" target="#b17">[18]</ref>, FaCT++ <ref type="bibr" target="#b19">[20]</ref>, and JFact. 
We excluded, e.g., the RacerPro <ref type="bibr" target="#b8">[9]</ref>, CB <ref type="bibr" target="#b12">[13]</ref>, and KAON2<ref type="foot" target="#foot_2">3</ref> reasoners because they lack coverage of all OWL 2 features and native support for the OWL API. Finally, we did not consider approximate (either unsound or incomplete) reasoners, such as TrOWL <ref type="bibr" target="#b15">[16]</ref>, so that we can compare classification results between reasoners, and because we feel that approximation is generally only considered in cases where sound and complete reasoners fail.</p><p>For all our experiments we use the current 2013 reasoner versions, namely Pellet v2.3.0, HermiT v1.3.6, FaCT++ v1.6.1 and JFact v1.0. However, since NCIt performance has been studied previously <ref type="bibr" target="#b6">[7]</ref>, we also compare the current reasoner versions with the versions used in that 2011 study, namely Pellet v2.2.2, HermiT v1.3.3, FaCT++ v1.5.3 and JFact v0.2, in order to test how much tuning to the NCIt occurs.</p><p>As mentioned earlier, we set the classification timeout to 2 hours per ontology-reasoner pair. From a scenario perspective, 2 hours is rather generous: many ontologists will give up much sooner than that. However, 2 hours gives us an idea of which "hard" ontologies are clearly within striking distance, without making the experiments infeasible to complete. In the presentation below, we also examine a tighter timeout (and thus a harder robustness criterion) of about 100 seconds. The main experiment machine has an Intel Quad-Core Xeon 3.2GHz processor with 32GB DDR3 RAM. A second experiment involving solely the NCIt (and both reasoner sets) was performed on a machine with an Intel Dual-Core i7 2.7GHz processor and 16GB DDR3 RAM. All tests were run on Mac OS X 10.7.5, using Java v1.7 and the OWL API v3.4.1.</p><p>The test corpora, experiment results, and reasoners used are available from http://sites.google.com/site/reasonerbenchmark.<ref type="foot">4</ref></p></div>
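<div xmlns="http://www.tei-c.org/ns/1.0"><p>One way to enforce the per-pair timeout is to run each reasoner in its own process and kill it when the limit expires. The sketch below assumes each reasoner is wrapped in a command-line program that exits with a non-zero status on error; that wrapper, the `classify.jar` name in the comments, and the function name `run_pair` are hypothetical, since no harness is described here. It maps each run onto the outcomes used in this paper: completed, error (rejection or crash), or timeout.</p><p>
```python
import subprocess
import sys
import time
from typing import List, Optional, Tuple

TIMEOUT_SECONDS = 2 * 60 * 60  # 2 hours per ontology-reasoner pair


def run_pair(cmd: List[str],
             timeout: float = TIMEOUT_SECONDS) -> Tuple[str, Optional[float]]:
    """Run one classification job in a child process.

    Returns (outcome, elapsed_seconds); outcome is "completed", "error" or
    "timeout". A non-zero exit status (a crash or a rejected input) is
    counted as an error. Elapsed time is wall-clock time, including process
    start-up.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return "timeout", None
    elapsed = time.monotonic() - start
    if proc.returncode != 0:
        return "error", elapsed
    return "completed", elapsed


# Stand-in commands using the current Python interpreter, in place of a
# hypothetical reasoner wrapper such as ["java", "-jar", "classify.jar", ...].
print(run_pair([sys.executable, "-c", "pass"])[0])                 # completed
print(run_pair([sys.executable, "-c", "raise SystemExit(1)"])[0])  # error
print(run_pair([sys.executable, "-c", "import time; time.sleep(5)"],
               timeout=0.5)[0])                                    # timeout
```
</p><p>Running each job in a separate process keeps a crash or an exhausted heap in one run from affecting the next. Measuring wall-clock time including process start-up is a design choice of this sketch, not necessarily how the reported times were measured.</p></div>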
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Results</head><p>In all experiments we categorise ontology classification times into the following bins: Very Easy (≤ 1 second), Easy (1-10 seconds), Medium (10-100 seconds), Hard (100-1000 seconds), and Very Hard (&gt;1000 seconds). We define "Impatient Robustness" as a measure of how many ontologies are classified in a time acceptable to most users, i.e., those in the Medium bin or below (at most about 100 seconds). Throughout this section we use "Best Combo" to denote the best of the 4 reasoners' results for each ontology (i.e., the fastest time) and, analogously, "Worst Combo" to denote the worst.</p></div>
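<div xmlns="http://www.tei-c.org/ns/1.0"><p>The binning and the Best/Worst Combo definitions above can be sketched as follows. This is a minimal illustration under two assumptions not stated in the text: each bin's upper bound is inclusive, and a timeout or error is treated as infinitely slow when choosing the Best or Worst Combo. Impatient Robustness is expressed here as the share of ontologies in the Medium bin or below; all function names are hypothetical.</p><p>
```python
import math
from typing import Dict, List, Optional

# Upper bounds in seconds. Treating each bound as inclusive is an assumption
# that extends the text's "Very Easy (at most 1 second)" convention.
BINS = [(1, "Very Easy"), (10, "Easy"), (100, "Medium"), (1000, "Hard")]
IMPATIENT = {"Very Easy", "Easy", "Medium"}


def bin_of(seconds: Optional[float]) -> str:
    """Performance bin for one time; None stands for a timeout or an error."""
    if seconds is None:
        return "Failed"
    for bound, label in BINS:
        if bound >= seconds:
            return label
    return "Very Hard"


def combo(times: Dict[str, Optional[float]], best: bool = True) -> Optional[float]:
    """Best Combo (fastest) or Worst Combo (slowest) time for one ontology.

    A failed run is treated as infinitely slow (our assumption): the Best
    Combo fails only if every reasoner failed, while any single failure
    makes the Worst Combo a failure.
    """
    values = [math.inf if t is None else t for t in times.values()]
    chosen = min(values) if best else max(values)
    return None if math.isinf(chosen) else chosen


def impatient_robustness(bins: List[str]) -> float:
    """Share of ontologies in the Medium bin or below."""
    return sum(b in IMPATIENT for b in bins) / len(bins)


# Illustrative times in seconds for one ontology; not data from the paper.
times = {"Pellet": 0.4, "HermiT": 35.0, "FaCT++": None, "JFact": 1500.0}
print(bin_of(combo(times, best=True)))   # Very Easy
print(bin_of(combo(times, best=False)))  # Failed
```
</p></div>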
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">NCI Thesaurus</head><p>In this experiment we test both the 2011 and the 2013 reasoner version sets and compare the performance behaviour of each reasoner. The classification times for both reasoner sets are shown in Figure <ref type="figure">1</ref>. Using the reasoner versions from 2011, and taking into account only those ontologies that all reasoners managed to process and classify, FaCT++ is on average the fastest of all 4 reasoners, taking 14.7 seconds per version. JFact comes second, with an average of 22.9 seconds per version, Pellet is third, taking on average 36.5 seconds per version, and HermiT is the slowest, with 150 seconds per version (see Table <ref type="table" target="#tab_0">1</ref>).</p><p>With the 2013 reasoner set, the performance winner remains FaCT++, with an average of 19.2 seconds per ontology. However, Pellet now comes second, taking 61.6 seconds on average, followed by HermiT with an average of 174 seconds, and finally JFact with 180 seconds on average per ontology. Notice that from 2011 to 2013 there was a significant improvement in JFact's robustness, with far fewer errors. Similarly, Pellet has fewer errors in its most recent version, and its performance is superior to that of the 2011 version. FaCT++'s and HermiT's performance slightly decreased from 2011 to 2013 on this corpus, though not nearly as much as JFact's, possibly because in its most recent version JFact is able to process the more recent versions of the NCIt.</p><p>Overall, FaCT++ is the most robust reasoner for the NCIt corpus, having no errors in either its 2011 or its 2013 version (see Tables <ref type="table" target="#tab_1">1 and 2</ref>). Furthermore, it is the fastest performing reasoner across nearly all versions. The least robust is, interestingly, JFact, the Java port of FaCT++, due to the high number of errors it reports. 
Though it improved from 2011 to 2013, this "young" reasoner is still not as fast as FaCT++. The errors encountered on the NCIt were "OutOfMemory" errors for Pellet, "StackOverflow" for HermiT, and "IllegalArgument" for JFact. <note place="foot" n="4">The full set of crawled ontologies is available upon request.</note> </p></div>
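<div xmlns="http://www.tei-c.org/ns/1.0"><p>The NCIt averages above are computed only over the versions that all reasoners classified. A minimal sketch of that computation follows, with illustrative data; `common_subset_means` is a hypothetical helper. It also shows why such averages can shift between reasoner sets: when one reasoner starts solving an extra (possibly larger) version, that version enters the common subset and can raise every reasoner's average, consistent with the explanation offered for the changes from 2011 to 2013.</p><p>
```python
from statistics import mean
from typing import Dict, Optional

# times[reasoner][ontology] = seconds, or None for a timeout or an error.
Times = Dict[str, Dict[str, Optional[float]]]


def common_subset_means(times: Times) -> Dict[str, float]:
    """Mean time per reasoner over the ontologies that every reasoner classified."""
    solved = [{o for o, t in per.items() if t is not None} for per in times.values()]
    common = set.intersection(*solved)
    if not common:
        raise ValueError("no ontology was classified by all reasoners")
    return {r: mean(per[o] for o in sorted(common)) for r, per in times.items()}


# Illustrative data; not results from the paper.
times = {
    "R1": {"v1": 10.0, "v2": 20.0, "v3": None},
    "R2": {"v1": 30.0, "v2": 50.0, "v3": 90.0},
}
print(common_subset_means(times))  # {'R1': 15.0, 'R2': 40.0}

# If R1 later solves v3 too, R2's average rises although R2 did not change.
times["R1"]["v3"] = 60.0
print(round(common_subset_means(times)["R2"], 2))  # 56.67
```
</p></div>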
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">NCBO BioPortal</head><p>In the snapshot of BioPortal, out of all 288 non-empty ontologies, 9 are inconsistent and 234 are classified by all reasoners within the timeout (see Table <ref type="table" target="#tab_2">3</ref>). Of the ontologies on which all reasoners completed classification, FaCT++ was on average the fastest (2.9 seconds per ontology), followed by JFact (5.9 seconds), HermiT (9.8 seconds), and finally Pellet (16.7 seconds). However, in terms of robustness, our results show that Pellet is the most robust of all reasoners, failing to handle only 9 ontologies, while HermiT, the least robust, fails to classify 27 ontologies. FaCT++ and JFact fail to handle 24 and 25 ontologies, respectively (see Table <ref type="table" target="#tab_3">4</ref> for more details regarding errors).</p><p>Generally, Pellet is not only the most robust reasoner for BioPortal, with the fewest errors, but it also performs fast on a high number of ontologies. However, it also has the most timeouts, although some of these occurred on ontologies on which other reasoners threw an error. The remaining 3 reasoners are very close to each other in performance and robustness, with HermiT having fewer timeouts but more errors than JFact and FaCT++, as well as slower performance. Thus, HermiT is the least robust reasoner for BioPortal. </p></div>
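<div xmlns="http://www.tei-c.org/ns/1.0"><p>Counting failures per reasoner, and checking how many of a reasoner's timeouts fall on ontologies where another reasoner threw an error, can be sketched as below. This assumes each run is recorded as "ok", "error" or "timeout" and that a failure is an error or a timeout, following the robustness criterion in the Motivation; `failure_summary` and the data are hypothetical.</p><p>
```python
from collections import Counter
from typing import Dict

# outcomes[reasoner][ontology] is "ok", "error" or "timeout".
Outcomes = Dict[str, Dict[str, str]]


def failure_summary(outcomes: Outcomes) -> Dict[str, Counter]:
    """Count outcomes per reasoner, plus timeouts on others' error cases."""
    summary = {}
    for reasoner, per in outcomes.items():
        counts = Counter(per.values())
        # Timeouts on ontologies on which at least one *other* reasoner errored.
        counts["timeout_where_other_errored"] = sum(
            1
            for onto, result in per.items()
            if result == "timeout"
            and any(outcomes[other].get(onto) == "error"
                    for other in outcomes if other != reasoner)
        )
        # A failure is an error or a timeout.
        counts["failures"] = counts["error"] + counts["timeout"]
        summary[reasoner] = counts
    return summary


# Illustrative outcomes; not data from the paper.
outcomes = {
    "P": {"o1": "ok", "o2": "timeout", "o3": "timeout"},
    "H": {"o1": "error", "o2": "error", "o3": "ok"},
}
s = failure_summary(outcomes)
print(s["P"]["failures"], s["P"]["timeout_where_other_errored"])  # 2 1
print(s["H"]["failures"], s["H"]["error"])                        # 2 2
```
</p></div>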
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Web Crawl Corpus</head><p>Out of the 677 non-empty ontologies from the Web crawl corpus, all reasoners completed classification of 560. On these 560, Pellet was the fastest reasoner on average (0.5 seconds per ontology), followed by FaCT++ (1.5 seconds), HermiT (3.1 seconds), and finally JFact (6.2 seconds). In terms of robustness, Pellet is, again, the most robust, having thrown errors on only 17 ontologies (see Table <ref type="table" target="#tab_4">5</ref>). It is also the reasoner with the most timeouts but, again, several of these occurred on ontologies where other reasoners threw errors. FaCT++ and HermiT both have a high number of errors while, curiously, JFact did much better on that front in this corpus. In Table <ref type="table" target="#tab_5">6</ref> Overall, Pellet is the most robust reasoner for this corpus and also the fastest (among ontologies that all reasoners could classify), followed closely by JFact in terms of both robustness and performance. The least robust reasoners for the Web crawl corpus are FaCT++ and HermiT, with 88 and 89 errors, respectively. However, HermiT performed slightly better on the lower bins, while FaCT++ was clearly the slowest in this corpus.</p><p>Overall, we have processed a total of 1,071 ontologies, the largest such reasoner benchmark, and found that amongst the 4 tested reasoners Pellet is the most robust (see Table <ref type="table" target="#tab_6">7</ref>). Surprisingly, Pellet is followed by JFact on our robustness test, owing to JFact having far fewer errors than FaCT++. HermiT and FaCT++ have the same overall robustness, but FaCT++ has fewer errors and higher impatient robustness. While Pellet is the most robust reasoner, we urge some caution in that reading. In particular, this does not mean that Pellet will always do best or even perform reasonably. In fact, it may time out where other reasoners finish reasonably fast. 
The set of reasoners (taken together and considering the best results) is extremely robust across the board (for each reasoner's contribution to the best-case reasoner, see Figure <ref type="figure">2</ref>). Thus, we have strong empirical evidence that the ontologies on the Web do not supply many cases that are intractable in principle, but only cases that are difficult for particular reasoners. [Fig. <ref type="figure">2</ref>. Number of times that each reasoner equals the best case, for each corpus (y-axis: number of ontologies).]</p><p>Note that FaCT++ and JFact fail to process several ontologies due to poor support for OWL datatypes, particularly datatypes not specified in the OWL 2 datatype map; both of these reasoners, as well as HermiT, have little support for OWL 1 datatypes. If we disregarded the errors caused by non-OWL 2 datatypes, FaCT++ would be the most robust reasoner w.r.t. OWL 2, followed by HermiT and Pellet. From Figure <ref type="figure">2</ref> we see that FaCT++ outperforms the other reasoners on many occasions but, due to the high number of errors it throws, its robustness w.r.t. our input data is not nearly at the same level as its performance.</p><p>The 9 ontologies that no reasoner classified within the timeout range in expressivity between ALEHIF+ and SRIQ. Their average number of logical axioms is 56,179; the smallest is a SRIQ ontology with 341 axioms, the largest is an SR ontology with 379,734 axioms, and the median is a SHIF ontology.</p><p>Clearly, deriving a sensible ranking is not straightforward, even when simply using average or total time. Our results have rather strong implications for reasoner experiments, especially those purporting to show the advantages of an optimisation, a technique or an implementation: the space is very complex, and it is very easy to generate a sample that is simultaneously biased in favour of one system and against another. Even simple, seemingly innocuous things like timeouts and classification failures require tremendous care in handling. 
If results are going to be meaningful across papers, we need to converge on experimental inputs, methods, and reporting forms.</p><p>Finally, to get an overall picture of how these robustness measurements relate to the OWL profiles that the ontologies fall into, we divide our ontologies by OWL profile and match them with the observed performance bins of the Best and Worst Combo reasoners. This is displayed in Figure <ref type="figure" target="#fig_1">3</ref>.</p><p>Since the EL, RL and QL profiles of OWL 2 overlap, some ontologies are counted in more than one such bin, meaning that the total number of ontologies in Figure <ref type="figure" target="#fig_1">3</ref> does not add up to the number of ontologies in our corpus. However, the DL profile bin is exclusive, i.e., an ontology in the EL profile is not counted again within the DL profile. Note that, even though ontologies in the EL, RL and QL profiles of OWL 2 are typically in the easier bins, some are deemed hard, time out, or even result in errors.</p></div>
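<div xmlns="http://www.tei-c.org/ns/1.0"><p>Figure 2 counts how often each reasoner equals the best case. A sketch of that count follows; how ties were handled is not stated, so crediting every tied reasoner (which lets the counts sum to more than the number of ontologies) and the optional `tolerance` parameter are assumptions of this illustration, and `best_case_counts` is a hypothetical name.</p><p>
```python
from collections import Counter
from typing import Dict, Optional


def best_case_counts(times: Dict[str, Dict[str, Optional[float]]],
                     tolerance: float = 0.0) -> Counter:
    """Count, per reasoner, the ontologies on which it equals the best time.

    times[reasoner][ontology] is seconds, or None for a timeout or an error.
    Ties are credited to every tied reasoner; `tolerance` (seconds) is a
    hypothetical knob for treating near-identical times as ties.
    """
    counts = Counter()
    ontologies = set().union(*(per.keys() for per in times.values()))
    for onto in ontologies:
        finished = {r: per[onto] for r, per in times.items()
                    if per.get(onto) is not None}
        if not finished:
            continue  # no reasoner classified this ontology
        best = min(finished.values())
        for r, t in finished.items():
            if best + tolerance >= t:
                counts[r] += 1
    return counts


# Illustrative times in seconds; not data from the paper.
times = {
    "Pellet": {"a": 0.2, "b": 5.0, "c": None},
    "FaCT++": {"a": 0.2, "b": 1.0, "c": 8.0},
    "HermiT": {"a": 0.9, "b": 3.0, "c": 9.0},
}
print(sorted(best_case_counts(times).items()))  # [('FaCT++', 3), ('Pellet', 1)]
```
</p></div>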
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Related Work</head><p>There is extensive work on benchmarking reasoners, some of which focuses purely on either classification or (conjunctive) query answering (e.g., <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b16">17]</ref>). Generally, previous reasoner benchmarks used much smaller and rather ad hoc data sets, in some cases using artificial data. For the purposes of this paper, we focus solely on work involving the classification task, particularly using realistic rather than artificially generated test data.</p><p>In <ref type="bibr" target="#b18">[19]</ref>, the Pellet reasoner was evaluated on a corpus of 9 ontologies, reporting the average of 10 independent runs per reasoning task; the tasks under test were consistency checking, classification and realization. Additionally, the authors compare Pellet against FaCT++ and RacerPro in terms of classification time only, using the DL benchmark test suite described in <ref type="bibr" target="#b10">[11]</ref>. The experiment showed that Pellet was not as efficient as FaCT++ or RacerPro in many, but not all, cases. In <ref type="bibr" target="#b5">[6]</ref> the authors present a system for comparing reasoners in terms of both performance and correctness of classification results. Four reasoners are put to the test (FaCT++, Pellet, KAON2 and RacerPro) over a corpus of 172 naturally occurring ontologies, of which only 31 were in, or more expressive than, ALC. The benchmark results show that Pellet was the most robust reasoner, with FaCT++ a close second, processing 143 and 137 ontologies, respectively. 
In terms of classification time, the authors state that "there is no clear winner", due to considerable fluctuation of reasoner performance across ontologies.</p><p>The evaluation of the HermiT reasoner <ref type="bibr" target="#b17">[18]</ref> was carried out against the FaCT++ and Pellet reasoners, using a corpus of ontologies derived from the Gardiner data set <ref type="bibr" target="#b5">[6]</ref>, the Open Biological Ontologies (OBO) Foundry, <ref type="foot" target="#foot_3">5</ref> and, finally, several versions of the GALEN ontology. HermiT outperformed the other reasoners on the majority of the tested ontologies.</p><p>In <ref type="bibr" target="#b2">[3]</ref> the authors carry out a benchmark of ontologies derived from the Watson repository. <ref type="foot" target="#foot_4">6</ref> Of the 6,224 ontologies in Watson, only 3,303 were parseable by both the Swoop and KAON2 tools. These were then classified into 4 bins according to their expressivity: RDFS(DL), OWL DLP, OWL Lite, and OWL DL. From each bin, the authors picked 1 representative according to its popularity in previous benchmarks. The test itself involved the reasoners HermiT, Pellet, RacerPro, KAON2, OWLIM and Sesame, and the classification performance results show that HermiT was fastest in 3 out of 4 cases, with OWLIM being fastest on the RDFS(DL) representative.</p><p>The author of <ref type="bibr" target="#b14">[15]</ref> performs a benchmark of the Pellet, FaCT++ and Racer reasoners, though using different interfaces (FaCT++ used DIG at the time), so the results are not directly comparable. This benchmark was carried out using a corpus of 135 OWL ontologies from Schemaweb. 
<ref type="foot" target="#foot_5">7</ref> The experiment showed that FaCT++ was the fastest (excluding timeouts) and the most robust, since it processed the most ontologies without timing out or aborting (due to errors not specified by the author).</p><p>The benchmark carried out in <ref type="bibr" target="#b1">[2]</ref> compared the KAON2, Pellet, Racer, HermiT and FaCT++ reasoners against 50 naturally occurring ontologies. However, the authors focus on only a few examples: Racer was fastest on the Wine ontology, HermiT on DeepTree, FaCT++ on the NCI Thesaurus, and HermiT on GALEN.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Future Work</head><p>In this paper, we did not have space to discuss whether there is a performance/size or performance/expressivity correlation. By and large, our analysis shows that there is a roughly linear correlation between performance and size, and no correlation with expressivity.</p><p>Due to the large size of the Web crawl corpus, we resorted to sampling in order to obtain results in time. Though we have tested large enough samples to attain statistical significance, we hope to process all ontologies in that corpus in the near future. For the purposes of this paper we limited our attention to classification, but our benchmarking could easily be extended to other inference problems, even to non-standard ones such as justification finding. We also intend to tackle the vast task of identifying promising correlations between features of ontologies and their reasoning difficulty.</p><p>To address the difficulties in stable, cross-experiment comparison and interpretation, we propose to establish a comprehensive benchmark that is updated yearly. To facilitate rapid experimentation, we will provide canonical, stable random samples so that experimenters can provide a comparable baseline, even if for scientific reasons they must also investigate other inputs. We will also make our test framework and computing platform available, re-running all the experiments from the prior year that we can gather, to provide systematic review and replication of results.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Number of ontologies in each OWL 2 profile displayed according to the performance profile of the Best and Worst Combo reasoners. On the left-hand side (Figures 3(a) and 3(c)) we show the OWL profile distribution of the ontologies in the 'Very Easy' performance bin, as it is the most densely populated bin. 
The right-hand side (Figures 3(b) and 3(d)) shows the remaining performance bins.</figDesc><graphic coords="10,315.04,239.67,138.68,105.54" type="bitmap" /></figure>
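<div xmlns="http://www.tei-c.org/ns/1.0"><p>The canonical stable random samples proposed in the Future Work section could be produced in several ways. One hedged option, sketched below and not the authors' stated method, ranks ontology identifiers by a salted SHA-256 digest and keeps the first k; the seed string, the identifiers, and the name `stable_sample` are hypothetical. Assuming SHA-256 behaves like a random function, this approximates a uniform random sample while remaining reproducible across platforms and Python versions.</p><p>
```python
import hashlib
from typing import List


def stable_sample(ontology_ids: List[str], k: int, seed: str = "2013") -> List[str]:
    """Reproducible sample: rank ids by a salted SHA-256 digest, keep the first k.

    The result depends only on the ids, k and the seed, not on input order or
    on the Python version, so publishing (ids, k, seed) is enough to
    regenerate the same sample.
    """
    if k > len(ontology_ids):
        raise ValueError("sample larger than corpus")

    def digest(oid: str) -> str:
        return hashlib.sha256(f"{seed}:{oid}".encode("utf-8")).hexdigest()

    return sorted(ontology_ids, key=digest)[:k]


# Hypothetical ontology identifiers.
ids = [f"onto{i:03d}.owl" for i in range(1000)]
sample = stable_sample(ids, 5)
print(sample == stable_sample(list(reversed(ids)), 5))  # True
```
</p><p>Unlike `random.sample`, whose output Python does not guarantee to be stable across versions, the digest ranking depends only on the published inputs.</p></div>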
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Binning of the NCIt corpus according to performance (2011 reasoner versions sets).</figDesc><table><row><cell>[Figure 1 residue: classification time per NCIt version (y-axis ticks 0 to 600); series Pellet'13, Pellet'11, HermiT'13, HermiT'11, FaCT++'13, FaCT++'11, JFact'13, JFact'11]</cell></row><row><cell>0</cell><cell>v1</cell><cell>v4</cell><cell>v7</cell><cell>v10</cell><cell>v13</cell><cell>v16</cell><cell>v19</cell><cell>v22</cell><cell>v25</cell><cell>v28</cell><cell>v31</cell><cell>v34</cell><cell>v37</cell><cell>v40</cell><cell>v43</cell><cell>v46</cell><cell>v49<
/cell><cell>v52</cell><cell>v55</cell><cell>v58</cell><cell>v61</cell><cell>v64</cell><cell>v67</cell><cell>v70</cell><cell>v73</cell><cell>v76</cell><cell>v79</cell><cell>v82</cell><cell>v85</cell><cell>v88</cell><cell>v91</cell><cell>v94</cell><cell>v97</cell><cell>v100</cell><cell>v103</cell><cell>v106</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="26">Pellet HermiT JFact FaCT++ Best Combo Worst Combo</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="5">Very Easy</cell><cell></cell><cell></cell><cell></cell><cell cols="17">0 (0%) 0 (0%) 0 (0%) 0 (0%)</cell><cell></cell><cell cols="4">0 (0%)</cell><cell></cell><cell></cell><cell cols="2">0 (0%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="3">Easy</cell><cell></cell><cell></cell><cell></cell><cell cols="23">16 (15%) 15 (14%) 16 (15%) 18 (17%) 18 (17%)</cell><cell></cell><cell></cell><cell cols="2">15 (14%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="4">Medium</cell><cell></cell><cell></cell><cell></cell><cell cols="23">70 (66%) 42 (40%) 52 (49%) 88 (83%) 88 (83%)</cell><cell></cell><cell></cell><cell cols="2">15 (14%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="3">Hard</cell><cell></cell><cell></cell><cell></cell><cell cols="18">18 (17%) 48 (45%) 0 (0%) 0 (0%)</cell><cell></cell><cell cols="4">0 (0%)</cell><cell></cell><cell></cell><cell cols="2">37 (35%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="5">Very Hard</cell><cell></cell><cell></cell><cell></cell><cell cols="17">0 (0%) 0 (0%) 0 (0%) 0 (0%)</cell><cell></cell><cell cols="4">0 (0%)</cell><cell></cell><cell></cell><cell cols="2">0 (0%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="4">Timeout</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="17">0 (0%) 0 (0%) 
0 (0%) 0 (0%)</cell><cell></cell><cell cols="4">0 (0%)</cell><cell></cell><cell></cell><cell cols="2">0 (0%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="3">Errors</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="17">2 (2%) 1 (1%) 38 (36%) 0 (0%)</cell><cell></cell><cell cols="4">0 (0%)</cell><cell></cell><cell></cell><cell cols="2">39 (37%)</cell></row><row><cell cols="14">Impatient Robustness 81%</cell><cell></cell><cell></cell><cell cols="2">54%</cell><cell></cell><cell></cell><cell cols="3">64%</cell><cell></cell><cell cols="3">100%</cell><cell></cell><cell></cell><cell></cell><cell cols="3">100%</cell><cell></cell><cell></cell><cell></cell><cell>28%</cell></row><row><cell></cell><cell cols="9">Overall Robustness</cell><cell></cell><cell cols="3">98%</cell><cell></cell><cell></cell><cell cols="2">99%</cell><cell></cell><cell></cell><cell cols="3">64%</cell><cell></cell><cell cols="3">100%</cell><cell></cell><cell></cell><cell></cell><cell cols="3">100%</cell><cell></cell><cell></cell><cell></cell><cell>63%</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="26">Pellet HermiT JFact FaCT++ Best Combo Worst Combo</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="5">Very Easy</cell><cell></cell><cell></cell><cell></cell><cell cols="17">0 (0%) 0 (0%) 0 (0%) 0 (0%)</cell><cell></cell><cell cols="4">0 (0%)</cell><cell></cell><cell></cell><cell cols="2">0 (0%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="3">Easy</cell><cell></cell><cell></cell><cell></cell><cell cols="23">16 (15%) 15 (14%) 0 (0%) 19 (18%) 19 (18%)</cell><cell></cell><cell></cell><cell cols="2">0 (0%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="4">Medium</cell><cell></cell><cell></cell><cell></cell><cell cols="23">71 (67%) 42 (40%) 24 (23%) 87 (82%) 87 
(82%)</cell><cell></cell><cell></cell><cell cols="2">23 (22%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="3">Hard</cell><cell></cell><cell></cell><cell></cell><cell cols="18">19 (18%) 48 (45%) 70 (66%) 0 (0%)</cell><cell></cell><cell cols="4">0 (0%)</cell><cell></cell><cell></cell><cell cols="2">70 (66%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="5">Very Hard</cell><cell></cell><cell></cell><cell></cell><cell cols="17">0 (0%) 0 (0%) 0 (0%) 0 (0%)</cell><cell></cell><cell cols="4">0 (0%)</cell><cell></cell><cell></cell><cell cols="2">0 (0%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="4">Timeout</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="17">0 (0%) 0 (0%) 0 (0%) 0 (0%)</cell><cell></cell><cell cols="4">0 (0%)</cell><cell></cell><cell></cell><cell cols="2">0 (0%)</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="3">Errors</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="17">0 (0%) 1 (1%) 12 (11%) 0 (0%)</cell><cell></cell><cell cols="4">0 (0%)</cell><cell></cell><cell></cell><cell cols="2">13 (12%)</cell></row><row><cell cols="14">Impatient Robustness 82%</cell><cell></cell><cell></cell><cell cols="2">54%</cell><cell></cell><cell></cell><cell cols="3">23%</cell><cell></cell><cell cols="3">100%</cell><cell></cell><cell></cell><cell></cell><cell cols="3">100%</cell><cell></cell><cell></cell><cell></cell><cell>22%</cell></row><row><cell></cell><cell cols="13">Overall Robustness 100%</cell><cell></cell><cell></cell><cell cols="2">99%</cell><cell></cell><cell></cell><cell cols="3">89%</cell><cell></cell><cell cols="3">100%</cell><cell></cell><cell></cell><cell></cell><cell cols="3">100%</cell><cell></cell><cell></cell><cell></cell><cell>88%</cell></row></table><note>Fig. 1. 
Comparison of classification times between the 2011 reasoner version set (suffixed '11) and the 2013 set (suffixed '13) over the NCIt corpus (y-axis: time in seconds, x-axis: version number).</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Binning of the NCIt corpus according to performance (2013 reasoner versions sets).</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Binning of the BioPortal corpus according to performance.</figDesc><table><row><cell></cell><cell>Pellet</cell><cell>HermiT</cell><cell>JFact</cell><cell>FaCT++</cell><cell>Best Combo</cell><cell>Worst Combo</cell></row><row><cell>Very Easy</cell><cell>190 (66%)</cell><cell>170 (59%)</cell><cell>184 (64%)</cell><cell>218 (76%)</cell><cell>236 (82%)</cell><cell>152 (53%)</cell></row><row><cell>Easy</cell><cell>56 (19%)</cell><cell>61 (21%)</cell><cell>58 (20%)</cell><cell>24 (8%)</cell><cell>28 (10%)</cell><cell>58 (20%)</cell></row><row><cell>Medium</cell><cell>10 (3%)</cell><cell>15 (5%)</cell><cell>8 (3%)</cell><cell>7 (2%)</cell><cell>11 (4%)</cell><cell>10 (3%)</cell></row><row><cell>Hard</cell><cell>4 (1%)</cell><cell>4 (1%)</cell><cell>2 (1%)</cell><cell>2 (1%)</cell><cell>4 (1%)</cell><cell>2 (1%)</cell></row><row><cell>Very Hard</cell><cell>6 (2%)</cell><cell>3 (1%)</cell><cell>0 (0%)</cell><cell>3 (1%)</cell><cell>4 (1%)</cell><cell>2 (1%)</cell></row><row><cell>Timeout</cell><cell>13 (5%)</cell><cell>8 (3%)</cell><cell>11 (4%)</cell><cell>10 (3%)</cell><cell>5 (2%)</cell><cell>15 (5%)</cell></row><row><cell>Errors</cell><cell>9 (3%)</cell><cell>27 (9%)</cell><cell>25 (9%)</cell><cell>24 (8%)</cell><cell>0 (0%)</cell><cell>49 (17%)</cell></row><row><cell>Impatient Robustness</cell><cell>89%</cell><cell>85%</cell><cell>87%</cell><cell>86%</cell><cell>95%</cell><cell>76%</cell></row><row><cell>Overall Robustness</cell><cell>92%</cell><cell>88%</cell><cell>88%</cell><cell>88%</cell><cell>98%</cell><cell>78%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>Errors and exceptions that occurred during classification of BioPortal ontologies.</figDesc><table><row><cell>Error</cell><cell>Pellet</cell><cell>HermiT</cell><cell>JFact</cell><cell>FaCT++</cell></row><row><cell>StackOverflow</cell><cell>2</cell><cell>0</cell><cell>1</cell><cell>0</cell></row><row><cell>OutOfMemory</cell><cell>1</cell><cell>1</cell><cell>2</cell><cell>0</cell></row><row><cell>UnsupportedDatatype</cell><cell>0</cell><cell>13</cell><cell>4</cell><cell>14</cell></row><row><cell>InternalReasoner</cell><cell>2</cell><cell>0</cell><cell>1</cell><cell>0</cell></row><row><cell>IllegalArgument</cell><cell>0</cell><cell>12</cell><cell>16</cell><cell>6</cell></row><row><cell>MalformedLiteral</cell><cell>0</cell><cell>1</cell><cell>0</cell><cell>0</cell></row><row><cell>ConcurrentModification</cell><cell>3</cell><cell>0</cell><cell>0</cell><cell>0</cell></row><row><cell>Reasoner crashed</cell><cell>0</cell><cell>0</cell><cell>0</cell><cell>4</cell></row><row><cell>IndexOutOfBounds</cell><cell>1</cell><cell>0</cell><cell>1</cell><cell>0</cell></row><row><cell>Total Errors</cell><cell>9</cell><cell>27</cell><cell>25</cell><cell>24</cell></row></table></figure>
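The Best Combo and Worst Combo columns are not defined in the tables themselves. Their figures are consistent with taking, for each ontology, the best and the worst outcome over the four reasoners: the Best Combo column never reports an error, and its Very Easy count is at least that of any single reasoner. The sketch below encodes that reading. The bin ranking (with Timeout and Errors as the two worst outcomes) and the helper names `combos` and `tally` are assumptions for illustration, not the authors' procedure.

```python
from collections import Counter

# Assumed ranking of outcomes, best first. Treating Timeout and Errors as
# the two worst outcomes is an assumption, not stated in the tables.
BIN_ORDER = ["Very Easy", "Easy", "Medium", "Hard", "Very Hard",
             "Timeout", "Errors"]
RANK = {name: i for i, name in enumerate(BIN_ORDER)}


def combos(outcomes):
    """Return the (best, worst) bin for one ontology.

    `outcomes` maps a reasoner name to the bin it landed in.
    """
    best = min(outcomes.values(), key=RANK.get)
    worst = max(outcomes.values(), key=RANK.get)
    return best, worst


def tally(corpus):
    """Count Best Combo and Worst Combo bins over a list of ontologies."""
    best_counts, worst_counts = Counter(), Counter()
    for outcomes in corpus:
        best, worst = combos(outcomes)
        best_counts[best] += 1
        worst_counts[worst] += 1
    return best_counts, worst_counts


# Two hypothetical ontologies, not taken from any of the corpora.
corpus = [
    {"Pellet": "Easy", "HermiT": "Errors", "JFact": "Medium", "FaCT++": "Very Easy"},
    {"Pellet": "Hard", "HermiT": "Medium", "JFact": "Timeout", "FaCT++": "Medium"},
]
best_counts, worst_counts = tally(corpus)
print(dict(best_counts))   # {'Very Easy': 1, 'Medium': 1}
print(dict(worst_counts))  # {'Errors': 1, 'Timeout': 1}
```

Under this reading a single failing reasoner is enough to move an ontology into the Worst Combo error row, while Best Combo only fails if every reasoner fails.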
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 .</head><label>5</label><figDesc>Binning of the Web crawl corpus according to performance.</figDesc><table><row><cell></cell><cell>Pellet</cell><cell>HermiT</cell><cell>JFact</cell><cell>FaCT++</cell><cell>Best Combo</cell><cell>Worst Combo</cell></row><row><cell>Very Easy</cell><cell>597 (88%)</cell><cell>536 (79%)</cell><cell>557 (82%)</cell><cell>566 (84%)</cell><cell>642 (95%)</cell><cell>493 (73%)</cell></row><row><cell>Easy</cell><cell>44 (6%)</cell><cell>36 (5%)</cell><cell>45 (7%)</cell><cell>12 (2%)</cell><cell>26 (4%)</cell><cell>44 (6%)</cell></row><row><cell>Medium</cell><cell>2 (0%)</cell><cell>8 (1%)</cell><cell>11 (2%)</cell><cell>0 (0%)</cell><cell>3 (0%)</cell><cell>12 (2%)</cell></row><row><cell>Hard</cell><cell>1 (0%)</cell><cell>1 (0%)</cell><cell>4 (1%)</cell><cell>5 (1%)</cell><cell>2 (0%)</cell><cell>3 (0%)</cell></row><row><cell>Very Hard</cell><cell>0 (0%)</cell><cell>1 (0%)</cell><cell>1 (0%)</cell><cell>1 (0%)</cell><cell>0 (0%)</cell><cell>1 (0%)</cell></row><row><cell>Timeout</cell><cell>16 (2%)</cell><cell>6 (1%)</cell><cell>5 (1%)</cell><cell>5 (1%)</cell><cell>4 (1%)</cell><cell>10 (1%)</cell></row><row><cell>Errors</cell><cell>17 (3%)</cell><cell>89 (13%)</cell><cell>54 (8%)</cell><cell>88 (13%)</cell><cell>0 (0%)</cell><cell>114 (17%)</cell></row><row><cell>Impatient Robustness</cell><cell>95%</cell><cell>86%</cell><cell>91%</cell><cell>85%</cell><cell>99%</cell><cell>81%</cell></row><row><cell>Overall Robustness</cell><cell>95%</cell><cell>86%</cell><cell>91%</cell><cell>86%</cell><cell>99%</cell><cell>82%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6 .</head><label>6</label><figDesc>Errors and exceptions that occurred during classification of the Web crawl ontologies.</figDesc><table><row><cell>Error</cell><cell>Pellet</cell><cell>HermiT</cell><cell>JFact</cell><cell>FaCT++</cell></row><row><cell>StackOverflow</cell><cell>13</cell><cell>0</cell><cell>0</cell><cell>0</cell></row><row><cell>OutOfMemory</cell><cell>2</cell><cell>0</cell><cell>2</cell><cell>0</cell></row><row><cell>NullPointer</cell><cell>0</cell><cell>0</cell><cell>36</cell><cell>0</cell></row><row><cell>UnloadableImport</cell><cell>0</cell><cell>1</cell><cell>1</cell><cell>1</cell></row><row><cell>ClassCast</cell><cell>0</cell><cell>0</cell><cell>1</cell><cell>0</cell></row><row><cell>UnsupportedDatatype</cell><cell>0</cell><cell>81</cell><cell>1</cell><cell>86</cell></row><row><cell>Datatype constraint</cell><cell>2</cell><cell>0</cell><cell>0</cell><cell>0</cell></row><row><cell>IllegalArgument</cell><cell>0</cell><cell>3</cell><cell>5</cell><cell>0</cell></row><row><cell>MalformedLiteral</cell><cell>0</cell><cell>2</cell><cell>0</cell><cell>0</cell></row><row><cell>ReasonerInternal</cell><cell>0</cell><cell>0</cell><cell>8</cell><cell>1</cell></row><row><cell>UnsupportedFacet</cell><cell>0</cell><cell>2</cell><cell>0</cell><cell>0</cell></row><row><cell>Total Errors</cell><cell>17</cell><cell>89</cell><cell>54</cell><cell>88</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 7 .</head><label>7</label><figDesc>Binning of all three corpora: BioPortal, NCIt (2013), and Web crawl. In the robustness rows, values in square brackets give robustness w.r.t. OWL 2 alone.</figDesc><table><row><cell></cell><cell>Pellet</cell><cell>HermiT</cell><cell>JFact</cell><cell>FaCT++</cell><cell>Best Combo</cell><cell>Worst Combo</cell></row><row><cell>Very Easy</cell><cell>787 (73%)</cell><cell>706 (66%)</cell><cell>741 (69%)</cell><cell>784 (73%)</cell><cell>878 (82%)</cell><cell>645 (60.2%)</cell></row><row><cell>Easy</cell><cell>116 (11%)</cell><cell>112 (10%)</cell><cell>103 (10%)</cell><cell>55 (5%)</cell><cell>73 (7%)</cell><cell>102 (9.5%)</cell></row><row><cell>Medium</cell><cell>83 (8%)</cell><cell>65 (6%)</cell><cell>43 (4%)</cell><cell>94 (9%)</cell><cell>101 (9%)</cell><cell>45 (4.2%)</cell></row><row><cell>Hard</cell><cell>24 (2%)</cell><cell>53 (5%)</cell><cell>76 (7%)</cell><cell>7 (1%)</cell><cell>6 (1%)</cell><cell>75 (7.0%)</cell></row><row><cell>Very Hard</cell><cell>6 (1%)</cell><cell>4 (0%)</cell><cell>1 (0%)</cell><cell>4 (0%)</cell><cell>4 (0%)</cell><cell>3 (0.3%)</cell></row><row><cell>Timeout</cell><cell>29 (3%)</cell><cell>14 (1%)</cell><cell>16 (1%)</cell><cell>15 (1%)</cell><cell>9 (1%)</cell><cell>25 (2.3%)</cell></row><row><cell>Errors</cell><cell>26 (2%)</cell><cell>117 (11%)</cell><cell>91 (8%)</cell><cell>112 (10%)</cell><cell>0 (0%)</cell><cell>176 (16.4%)</cell></row><row><cell>Total (excl. Errors and Timeouts)</cell><cell>1016</cell><cell>940</cell><cell>964</cell><cell>944</cell><cell>1062</cell><cell>870</cell></row><row><cell>Total (incl. Errors and Timeouts)</cell><cell>1071</cell><cell>1071</cell><cell>1071</cell><cell>1071</cell><cell>1071</cell><cell>1071</cell></row><row><cell>Impatient Robustness</cell><cell>92%</cell><cell>82% [90%]</cell><cell>83%</cell><cell>87% [96%]</cell><cell>98%</cell><cell>74% [87%]</cell></row><row><cell>Overall Robustness</cell><cell>95%</cell><cell>88% [96%]</cell><cell>90%</cell><cell>88% [97%]</cell><cell>99%</cell><cell>81% [96%]</cell></row></table></figure>
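The robustness rows can be recomputed from the bin counts. The tables do not state the formulas, so the sketch below assumes the reading that reproduces the unbracketed robustness figures in Tables 3, 5 and 7: impatient robustness is the share of ontologies binned Very Easy, Easy or Medium, and overall robustness is the share that finished without a timeout or an error. It also assumes that the bracketed OWL 2 values come from discarding UnsupportedDatatype errors entirely, which is consistent with the HermiT and FaCT++ brackets. The function name `robustness` is hypothetical.

```python
# Sketch: recompute robustness from bin counts (Table 7 columns).
# Assumed definitions, inferred from the table figures, not stated in them:
#   impatient robustness = (Very Easy + Easy + Medium) / total
#   overall robustness   = (total - Timeout - Errors) / total
IMPATIENT_BINS = ("Very Easy", "Easy", "Medium")
FAILURE_BINS = ("Timeout", "Errors")


def robustness(bins):
    """Return (impatient, overall) robustness as whole percentages."""
    total = sum(bins.values())
    impatient = sum(bins[name] for name in IMPATIENT_BINS) / total
    overall = (total - sum(bins[name] for name in FAILURE_BINS)) / total
    return round(100 * impatient), round(100 * overall)


# Pellet and HermiT columns of Table 7 (1071 ontologies in total).
pellet = {"Very Easy": 787, "Easy": 116, "Medium": 83, "Hard": 24,
          "Very Hard": 6, "Timeout": 29, "Errors": 26}
hermit = {"Very Easy": 706, "Easy": 112, "Medium": 65, "Hard": 53,
          "Very Hard": 4, "Timeout": 14, "Errors": 117}
print(robustness(pellet))  # (92, 95)
print(robustness(hermit))  # (82, 88)

# Assumed reading of the bracketed OWL 2 values: drop HermiT's
# 13 + 81 = 94 UnsupportedDatatype errors (Tables 4 and 6) from the corpus.
hermit_owl2 = dict(hermit, Errors=117 - 94)
print(robustness(hermit_owl2))  # (90, 96)
```

Python's `round` sends exact halves to the even neighbour, so a share that lands exactly on .5 could differ by one point from a table that rounds halves up.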
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://evs.nci.nih.gov/ftp1/NCI_Thesaurus</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://owlapi.sourceforge.net</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://kaon2.semanticweb.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">http://obofoundry.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">http://watson.kmi.open.ac.uk/WatsonWUI/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">http://schemaweb.info/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Pushing the EL envelope</title>
		<author>
			<persName><forename type="first">F</forename><surname>Baader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brandt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lutz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 19th Int. Joint Conf. on Artificial Intelligence (IJCAI-05)</title>
				<meeting>of the 19th Int. Joint Conf. on Artificial Intelligence (IJCAI-05)</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A testing framework for OWL-DL reasoning</title>
		<author>
			<persName><forename type="first">M</forename><surname>Babik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hluchy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the Int. Conf. on Semantics, Knowledge and Grids (SKG-08)</title>
				<meeting>of the Int. Conf. on Semantics, Knowledge and Grids (SKG-08)</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Benchmarking OWL reasoners</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Haase</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Volz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the Int. Workshop on Advancing Reasoning on the Web: Scalability and Commonsense (ARea-08)</title>
				<meeting>of the Int. Workshop on Advancing Reasoning on the Web: Scalability and Commonsense (ARea-08)</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Tractable reasoning and efficient query answering in description logics: The DL-Lite family</title>
		<author>
			<persName><forename type="first">D</forename><surname>Calvanese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>De Giacomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lembo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lenzerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rosati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. of Automated Reasoning</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="385" to="429" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">OWL 2: The next step for OWL</title>
		<author>
			<persName><forename type="first">B</forename><surname>Cuenca Grau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Motik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Parsia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Patel-Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Sattler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. of Web Semantics</title>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Framework for an automated comparison of description logic reasoners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Gardiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tsarkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 5th Int. Semantic Web Conf. (ISWC-06)</title>
				<meeting>of the 5th Int. Semantic Web Conf. (ISWC-06)</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Analysing the evolution of the NCI thesaurus</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Gonçalves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Parsia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Sattler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 24th IEEE Int. Symposium on Computer-Based Medical Systems (CBMS-11)</title>
				<meeting>of the 24th IEEE Int. Symposium on Computer-Based Medical Systems (CBMS-11)</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">LUBM: A benchmark for OWL knowledge base systems</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Heflin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. of Web Semantics</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">2-3</biblScope>
			<biblScope unit="page" from="158" to="182" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">RACER system description</title>
		<author>
			<persName><forename type="first">V</forename><surname>Haarslev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Möller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 1st Int. Joint Conf. on Automated Reasoning (IJCAR-01)</title>
		<title level="s">Lecture Notes in Artificial Intelligence</title>
		<meeting>of the 1st Int. Joint Conf. on Automated Reasoning (IJCAR-01)</meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="volume">2083</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The OWL API: A Java API for working with OWL 2 ontologies</title>
		<author>
			<persName><forename type="first">M</forename><surname>Horridge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bechhofer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 6th Int. Workshop on OWL: Experiences and Directions (OWLED-09)</title>
				<meeting>of the 6th Int. Workshop on OWL: Experiences and Directions (OWLED-09)</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">DL systems comparison</title>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Patel-Schneider</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 11th Int. Workshop on Description Logics (DL-98)</title>
				<meeting>of the 11th Int. Workshop on Description Logics (DL-98)</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">From SHIQ and RDF to OWL: The making of a web ontology language</title>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Patel-Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Van Harmelen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. of Web Semantics</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="7" to="26" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Consequence-driven reasoning for Horn SHIQ ontologies</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Kazakov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 21st Int. Joint Conf. on Artificial Intelligence (IJCAI-09)</title>
				<meeting>of the 21st Int. Joint Conf. on Artificial Intelligence (IJCAI-09)</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The Protégé OWL plugin: An open development environment for semantic web applications</title>
		<author>
			<persName><forename type="first">H</forename><surname>Knublauch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">W</forename><surname>Fergerson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">F</forename><surname>Noy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Musen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 3rd Int. Semantic Web Conf. (ISWC-04)</title>
				<meeting>of the 3rd Int. Semantic Web Conf. (ISWC-04)</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Benchmarking DL reasoners using realistic ontologies</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Pan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 1st Int. Workshop on OWL: Experiences and Directions (OWLED-05)</title>
				<meeting>of the 1st Int. Workshop on OWL: Experiences and Directions (OWLED-05)</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Soundness preserving approximation for TBox reasoning</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Z</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 24th AAAI Conf. on Artificial Intelligence (AAAI-10)</title>
				<meeting>of the 24th AAAI Conf. on Artificial Intelligence (AAAI-10)</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A comparison of reasoning techniques for querying large description logic ABoxes</title>
		<author>
			<persName><forename type="first">U</forename><surname>Sattler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Motik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 13th Int. Conf. on Logic for Programming and Automated Reasoning (LPAR-06)</title>
				<meeting>of the 13th Int. Conf. on Logic for Programming and Automated Reasoning (LPAR-06)</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">HermiT: A highly-efficient OWL reasoner</title>
		<author>
			<persName><forename type="first">R</forename><surname>Shearer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Motik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 5th Int. Workshop on OWL: Experiences and Directions (OWLED-08EU)</title>
				<meeting>of the 5th Int. Workshop on OWL: Experiences and Directions (OWLED-08EU)</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Pellet: A practical OWL-DL reasoner</title>
		<author>
			<persName><forename type="first">E</forename><surname>Sirin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Parsia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Cuenca Grau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kalyanpur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Katz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. of Web Semantics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="51" to="53" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">FaCT++ description logic reasoner: System description</title>
		<author>
			<persName><forename type="first">D</forename><surname>Tsarkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 3rd Int. Joint Conf. on Automated Reasoning (IJCAR-06)</title>
				<meeting>of the 3rd Int. Joint Conf. on Automated Reasoning (IJCAR-06)</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
