<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Ontology Consistency and Instance Checking For Real World Linked Data</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Gavin</forename><forename type="middle">E</forename><surname>Mendel-Gleason</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Trinity College Dublin</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rob</forename><surname>Brennan</surname></persName>
							<email>rob.brennan@scss.tcd.ie</email>
							<affiliation key="aff0">
								<orgName type="institution">Trinity College Dublin</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kevin</forename><surname>Feeney</surname></persName>
							<email>kevin.feeney@scss.tcd.ie</email>
							<affiliation key="aff0">
								<orgName type="institution">Trinity College Dublin</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Ontology Consistency and Instance Checking For Real World Linked Data</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">760EF129F7722296873FF5CFC7FD11B6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T23:43+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Many large ontologies have been created which make use of OWL's expressiveness for specification. However, tools to ensure that instance data is in compliance with the schema are often not well integrated with triple-stores and cannot detect certain classes of schema-instance inconsistency due to the assumptions of the OWL axioms. This can lead to lower quality, inconsistent data. We have developed a simple ontology consistency and instance checking service, SimpleConsist <ref type="bibr" target="#b6">[8]</ref>. We also define a number of ontology design best practice constraints on OWL or RDFS schemas. Our implementation allows the user to specify which constraints should be applied to schema and instance data.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Many Linked Data stores have large amounts of quite variable <ref type="bibr" target="#b5">[7]</ref> data (e.g. DBpedia <ref type="bibr" target="#b1">[3]</ref>). A triple-store can hold triples that have no associated schema and conform to no constraints on the shape or type of data. Typically, such data is considered low quality and is hard to consume.</p><p>Earlier work showed that the semantics of OWL make it ill suited as a constraint language <ref type="bibr" target="#b7">[9]</ref>. However, maintaining ontology consistency and conformance is central to high quality data storage. Programmatic consumption of data is simplified if the data is well formed and well typed. Data management is simplified if inserts, deletes and updates that might violate well-formedness constraints are signalled.</p><p>To solve these problems, we use a persistent triple-store in ClioPatria <ref type="bibr" target="#b0">[1]</ref> and a plugin constraint checker called SimpleConsist <ref type="bibr" target="#b6">[8]</ref>, both implemented in SWI-Prolog <ref type="bibr" target="#b9">[11]</ref>. SimpleConsist is used to maintain ontological consistency and constraints on instance data such that it conforms to an ontology described in an OWL fragment using a narrower reading of the OWL semantics. In particular, we make use of a closed world assumption and a unique names assumption. It is implemented as a REST service within the Dacura <ref type="bibr" target="#b4">[6]</ref> data curation system.</p><p>The philosophy for our ontology consistency and instance checking is to view the ontology as static assertions, which must be self-consistent, and to which a given instance state must conform. Given some triple-store state S, we check that our set of constraints C is satisfied. 
When updating or inserting into the triple-store, somewhat arbitrary program logic can take place, after which the triple-store is in a state S'. If C does not hold for S', then we roll back to the previous state S. We provide counter-example witnesses L to the failure of the constraint C. These witnesses are useful for debugging schema and instance updates, as they give information about precisely what went wrong in the constraint checking. The constraint rules are a combination of consistency constraints, instance type checking and best practices.</p></div>
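<div xmlns="http://www.tei-c.org/ns/1.0"><p>The check-then-roll-back cycle described above can be illustrated with a minimal Python sketch. It is not the SimpleConsist implementation (which is written in SWI-Prolog): the names checked_update and orphan_instances, the in-memory set of triples standing in for the triple-store, and the prefixed names such as ex:alice are all assumptions made for illustration.</p>
<p><code lang="python">
```python
from typing import Callable, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]
# A constraint maps a store state to a list of witnesses of failure.
# An empty list means the constraint holds. (Hypothetical interface.)
Constraint = Callable[[FrozenSet[Triple]], List[str]]

RDF_TYPE = "rdf:type"


def orphan_instances(state: FrozenSet[Triple]) -> List[str]:
    """Witnesses: subjects that occur in the state without any rdf:type triple."""
    typed = {s for (s, p, _) in state if p == RDF_TYPE}
    return sorted({s for (s, _, _) in state if s not in typed})


def checked_update(
    state: FrozenSet[Triple],
    update: Callable[[Set[Triple]], None],
    constraints: List[Constraint],
) -> Tuple[FrozenSet[Triple], List[str]]:
    """Apply `update` to a copy of S and keep S' only if no witnesses are found."""
    candidate = set(state)       # work on a copy so that S is never mutated
    update(candidate)            # arbitrary program logic produces S'
    new_state = frozenset(candidate)
    witnesses = [w for check in constraints for w in check(new_state)]
    if witnesses:
        return state, witnesses  # roll back to S and report what failed
    return new_state, []         # commit S'
```
</code></p>
<p>In this sketch, inserting the single triple (ex:bob, ex:knows, ex:alice) into a state that only types ex:alice is rolled back with the witness ex:bob, whereas an update that also asserts a type for ex:bob is committed.</p></div>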
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Constraints</head><p>SimpleConsist implements our constraints on ontology consistency and implements instance checking. Because we want information about why a constraint failed, we write the constraints as predicates which yield the witnesses of a failure. These witnesses are realised as resources not conforming to the constraints. Failure to provide a witness of the negation of the constraint is viewed as success. The failure witnessing predicates are briefly described in Table <ref type="table">1</ref>.</p><p>For classCycles(L), all witnesses of class cycles are given in the list L. Each element of the list names both the offending class and the path through the classes. The other constraints likewise return information about the reasons for failure.</p><p>invalidInstanceRange(L) requires some explanation, as it is an implementation of a type checker for literals and class instances and so requires knowing what a literal can be. The constraint implements type checking to ensure that all literals are of the appropriate type according to the ranges specified in properties. These literals can be of any XML Schema datatype that is valid for OWL <ref type="bibr" target="#b3">[5]</ref>. Where a range specifies a class, the constraint checks that the target is an instance of an appropriate class (either the class itself or a subclass). Using artificially populated triple stores, we timed the reasoner for various numbers of triples generated from the instance generator. These timings can be seen in Figure <ref type="figure">1</ref>.</p></div>
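<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of a witness-producing constraint, the following Python sketch returns, for each class that lies on a cycle in the subclass hierarchy, the class together with a path leading back to it. The function name class_cycles, the tuple representation of triples and the depth-first search are assumptions of this sketch; the SWI-Prolog predicate in SimpleConsist may be structured quite differently.</p>
<p><code lang="python">
```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"


def class_cycles(schema: Set[Triple]) -> List[Tuple[str, List[str]]]:
    """Return (offending class, path) for every class that can reach itself."""
    parents: Dict[str, List[str]] = {}
    for s, p, o in sorted(schema):
        if p == SUBCLASS:
            parents.setdefault(s, []).append(o)

    def path_back(start: str) -> List[str]:
        # Depth-first search for a subclass path from `start` back to `start`.
        stack = [(parent, [start, parent]) for parent in parents.get(start, [])]
        seen: Set[str] = set()
        while stack:
            node, path = stack.pop()
            if node == start:
                return path
            if node in seen:
                continue
            seen.add(node)
            stack.extend((n, path + [n]) for n in parents.get(node, []))
        return []

    witnesses = []
    for cls in sorted(parents):
        path = path_back(cls)
        if path:
            witnesses.append((cls, path))
    return witnesses
```
</code></p>
<p>For a schema in which A, B and C form a subclass loop and D is a subclass of A, this sketch reports A, B and C but not D, since D leads into the cycle without lying on it. An empty result corresponds to the constraint being satisfied.</p></div>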
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Prior Work</head><p>There are many reasoners for fragments of OWL; for example, 17 are mentioned in <ref type="bibr" target="#b8">[10]</ref>. Many are sophisticated; however, the lack of a unique name assumption can lead to problems for users developing schemata, making it virtually impossible to use OWL to impose constraints.</p><p>CWM (Closed World Machine) <ref type="bibr" target="#b2">[4]</ref> is a reasoner which takes the same pragmatic approach to closed worlds and unique names as we do. It can express the types of constraints we are interested in parsimoniously. However, it functions at the level of transformations of RDF files rather than being a fully functioning database system. Running the reasoner would require exporting the triple-store, which is not practical for large datasets that change in real time.</p><p>There are several tools provided with the Apache Jena [2] system which facilitate consistency and instance checking. In particular, the Eyeball system is modular and allows the user to introduce new constraints by adding Java code to perform inspection. However, it does not implement full type checking of instance data as our constraints do.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Future Work and Conclusion</head><p>Triple stores may be applied to complex ontologies which have incremental schema changes. However, it is a challenge to provide tools which make it easy to publish high quality (consistent), large scale, OWL-based data. This requires constraints on schemata and admin tools to ensure that updates to datastores maintain integrity. Our SimpleConsist service has provided practical solutions to these problems. We found it useful to reduce the expressive complexity of OWL when constructing our interpretation of ontologies, limiting it to unique names and closed worlds and preferring to let higher level data curation processes deal with the greater ambiguity often inherent in large scale data.</p><p>In future work, constraint checks on more OWL features will be explored. Our priority is OWL features that do not conflict with manageability of the schema and tractability of constraint checking. We would also like to have a method of checking instance updates which limits checking to entities which could cause constraint failures. Instance updates are generally more frequent than schema changes, so checker execution time matters more for them.</p></div>		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Acknowledgement</head><p>This research is partially supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 644055 (ALIGNED, aligned-project.eu).</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ duplicateClasses(L)</head><p>No two classes may have the same name.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ orphanSubClasses(L)</head><p>No subclass can be a child of an unspecified class.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ classCycles(L)</head><p>No cycles exist in the class hierarchy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ duplicateProperties(L)</head><p>No two properties have the same name.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ orphanSubProperties(L)</head><p>No subproperty is the child of an unspecified property.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ propertyCycles(L)</head><p>No cycles exist in the property hierarchy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ invalidRange(L)</head><p>Ranges must refer to classes or types, and must be unique.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ invalidDomain(L)</head><p>Domains must refer to classes or types, and must be unique.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ orphanInstances(L)</head><p>Instances must be members of a class.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ orphanProperties(L)</head><p>Instances must not use properties which are not defined.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ invalidInstanceRange(L)</head><p>An element of the range of a property must be well typed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>¬ invalidInstanceDomain(L)</head><p>An element of the domain of a property must be well typed.</p></div>			</div>
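			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The invalidInstanceRange(L) entry above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the SimpleConsist code: the Literal class, the three lexical patterns (a deliberately incomplete stand-in for the XML Schema datatypes), the prefixed names and the function names are all hypothetical. The sketch also assumes that each property has a single range, in line with the invalidRange(L) constraint.</p>
<p><code lang="python">
```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Literal:
    lexical: str    # the lexical form, e.g. "42"
    datatype: str   # the declared datatype, e.g. "xsd:integer"


Node = Union[str, Literal]
Triple = Tuple[str, str, Node]

# Lexical patterns for a few XML Schema datatypes (deliberately incomplete).
XSD_PATTERNS = {
    "xsd:integer": re.compile(r"[+-]?[0-9]+"),
    "xsd:boolean": re.compile(r"true|false|1|0"),
    "xsd:string": re.compile(r".*", re.DOTALL),
}


def subsumed_by(cls: str, target: str, parents: Dict[str, Set[str]]) -> bool:
    """True if `cls` is `target` or a (transitive) subclass of it."""
    todo, seen = [cls], set()
    while todo:
        current = todo.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            todo.extend(parents.get(current, ()))
    return False


def invalid_instance_range(schema: Set[Triple], data: Set[Triple]) -> List[Triple]:
    """Return every data triple whose object does not fit the property's range."""
    ranges = {s: o for (s, p, o) in schema if p == "rdfs:range"}
    parents: Dict[str, Set[str]] = {}
    for s, p, o in schema:
        if p == "rdfs:subClassOf":
            parents.setdefault(s, set()).add(o)
    types: Dict[str, Set[str]] = {}
    for s, p, o in data:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    witnesses = []
    for triple in data:
        _, prop, obj = triple
        rng = ranges.get(prop)
        if rng is None:
            continue  # undefined properties are covered by a different constraint
        if rng in XSD_PATTERNS:
            # Literal range: the datatype must match and the lexical form must parse.
            ok = (isinstance(obj, Literal) and obj.datatype == rng
                  and XSD_PATTERNS[rng].fullmatch(obj.lexical) is not None)
        else:
            # Class range: the target needs an asserted type at or below the range.
            ok = isinstance(obj, str) and any(
                subsumed_by(t, rng, parents) for t in types.get(obj, ()))
        if not ok:
            witnesses.append(triple)
    return witnesses
```
</code></p>
<p>Because the sketch reads the data under a closed world assumption, a target with no asserted type is reported as a witness rather than being inferred to belong to the range class, which is the behaviour an open world reading of OWL would give.</p></div>
			</div>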
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="http://cliopatria.swi-prolog.org/help/whitepaper.html" />
		<title level="m">Whitepaper: The ClioPatria semantic web server</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">DBpedia: A nucleus for a web of open data</title>
		<author>
			<persName><forename type="first">Sören</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christian</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Georgi</forename><surname>Kobilarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jens</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zachary</forename><surname>Ives</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">6th International Semantic Web Conference</title>
				<meeting><address><addrLine>Busan, Korea</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="11" to="15" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Berners-Lee</surname></persName>
		</author>
		<ptr target="http://www.w3.org/2000/10/swap/doc/cwm.html" />
		<title level="m">CWM - closed world machine</title>
				<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">XML Schema datatypes in RDF and OWL. W3C Working Group Note</title>
		<author>
			<persName><forename type="first">Jeremy</forename><forename type="middle">J</forename><surname>Carroll</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeff</forename><forename type="middle">Z</forename><surname>Pan</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006-03">March 2006</date>
			<pubPlace>W3C</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Improving curated Web-Data quality with structured harvesting and assessment</title>
		<author>
			<persName><forename type="first">Kevin</forename><forename type="middle">C</forename><surname>Feeney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Declan</forename><surname>O'Sullivan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wei</forename><surname>Tai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rob</forename><surname>Brennan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. J. Semant. Web Inf. Syst</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="35" to="62" />
			<date type="published" when="2014-04">April 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">RDF ontology (Re-)Engineering through large-scale data mining</title>
		<author>
			<persName><forename type="first">Johannes</forename><surname>Lorey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ziawasch</forename><surname>Abedjan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felix</forename><surname>Naumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christoph</forename><surname>Böhm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Semantic Web Conference (ISWC)</title>
				<imprint>
			<date type="published" when="2011-11">November 2011</date>
		</imprint>
	</monogr>
	<note>Finalist of the Billion Triple Challenge</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">Gavin</forename><surname>Mendel-Gleason</surname></persName>
		</author>
		<ptr target="https://github.com/GavinMendelGleason/dacura" />
		<title level="m">SimpleConsist plugin for ClioPatria</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Adding integrity constraints to OWL</title>
		<author>
			<persName><forename type="first">Boris</forename><surname>Motik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ian</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ulrike</forename><surname>Sattler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">OWLED</title>
				<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">258</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Resource-constrained reasoning using a reasoner composition approach</title>
		<author>
			<persName><forename type="first">Wei</forename><surname>Tai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><surname>Keeney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Declan</forename><surname>O'Sullivan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="35" to="59" />
			<date type="published" when="2015-01">January 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">SWI-prolog</title>
		<author>
			<persName><forename type="first">Jan</forename><surname>Wielemaker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tom</forename><surname>Schrijvers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Markus</forename><surname>Triska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Torbjörn</forename><surname>Lager</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010-11">November 2010</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
