<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Bounds: Expressing Reservations about Incoming Data</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Martin</forename><forename type="middle">G</forename><surname>Skjaeveland</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Informatics</orgName>
								<orgName type="institution">University of Oslo</orgName>
								<address>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Audun</forename><surname>Stolpe</surname></persName>
							<email>audun.stolpe@ffi.no</email>
							<affiliation key="aff1">
								<orgName type="institution">Norwegian Defence Research Establishment (FFI)</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Bounds: Expressing Reservations about Incoming Data</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">EE7577B9D7950F95B0E753C94779A396</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T21:26+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper introduces the Boundz vocabulary, an RDF vocabulary for expressing reservations about incoming data. We argue that the need for such a vocabulary is real and pressing, and that it is a useful validation tool for any recipient of RDF data that wishes to formulate restrictions on amendments in terms of the data it is already holding. The Boundz vocabulary has a simple mathematical theory that can be expressed in terms of bounded homomorphisms between RDF graphs. We present the basics of this theory, and show that bounded homomorphisms implement conservative extensions over a restricted class of ontology languages, but can also prevent cases of ontology hijacking. We additionally present a prototype implementation with promising evaluation results.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Information attainable through the Web is unique, not only in its scale and diversity, but also in its manner of production: it is characterised by collaborative accumulation of data and a lack of central authority and editorial control. This open, distributed and flat nature of the Web is often the essential ingredient that ensures the liveness of web data, exemplified by community-curated databases such as Wikipedia, Wikidata and Freebase. Nevertheless, it has implications for trust, data quality and interface design that may require data publishers to protect themselves from unwanted, independent third-party contributions <ref type="bibr" target="#b4">[5]</ref>. There is, of course, no general answer to which amendments ought to be considered harmful. Rather, harmfulness is in the eye of the beholder and will depend upon the intended uses of a dataset and/or its associated schema; it may concern the terminology that is used to encode the data or it may concern only the data itself. The following three examples illustrate both cases.</p><p>Ontology hijacking. Ontology hijacking is the contribution of statements about classes and/or properties from a non-authoritative publisher that affects the logical properties, and thus also the reasoning, of those classes and properties. A third-party contributor could, for instance, subsume the dcterms:subject property from the Dublin Core vocabulary under its own concept of an ex:topic, but would then, in the terminology of <ref type="bibr" target="#b4">[5]</ref>, be 'hijacking' dcterms:subject. If reasoning were subsequently applied to the recipient of the data, this hijacking would result in (at least) one (extra) statement using ex:topic being inferred for each explicitly asserted or inferred statement using dcterms:subject. 
<ref type="foot" target="#foot_0">3</ref> Thus, ontology hijacking is harmful insofar as it can considerably increase the amount of data that is inferred from the ontology of the recipient. Of course, hijacking can also affect inference over data provided by other parties, parties that may be relying on the terminology of the recipient to stay fixed.</p><p>Ontology-driven faceted browsing. The idea behind faceted search is to analyse and index search items along multiple orthogonal taxonomies that are called subject facets <ref type="bibr" target="#b15">[16]</ref>. From the end user's viewpoint, searching is reduced to the selection of categories along these facets. In semantic faceted search, the facets are based on ontologies and may be generated by reasoning <ref type="bibr" target="#b15">[16]</ref>. This makes the design of an interface and the user experience of interacting with the system vulnerable to terminological changes, whence prudence and predictability dictate that one does not allow just any third party to make assertions about classes and properties in the ontology that generates the facets, even though they may be allowed to contribute instance data.</p><p>Closed topics. In recent years the concept of open government data has evolved into a febrile research area which has catalysed major public investments into data dissemination and reuse. The concept has also made its way into international law, e.g., the European Public Sector Information Directive. The access to open government data has been spearheaded by official government websites such as the UK's data.gov.uk and its US analogue data.gov. There are also notable examples such as openelectiondata.org, which, although it is not a government initiative, has gained official endorsement. Government data often contains what may be called closed topics, that is, data that should not be altered or amended once it is published. Election results are a case in point. 
Thus, although a data hub serving government data may wish to remain distributed and collaborative, it may wish to 'seal off' certain subsets of the data while keeping others open.</p><p>In this paper we introduce an RDF vocabulary for expressing reservations about incoming data such as those exemplified above. The vocabulary has an appealingly simple theory and admits an efficient implementation. We present one such implementation, together with some preliminary test results that show the feasibility of our approach. The paper is organised as follows. Section 2 recapitulates the theoretical background as set out in <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b14">15]</ref>, where the central concept is that of a bounded homomorphism. We relate bounded homomorphisms to the concept of a logical conservative extension by showing that the co-domain of a homomorphism under the weakest bound is a conservative extension of the domain, given that the homomorphism relates saturated ontologies in which each axiom is expressed as a single triple. However, we also argue, as the flip side of the same coin, that bounded homomorphisms in general cannot be expressed by ontologies, nor need they concern terminological axioms. Even when they do concern axioms, e.g., when protecting a vocabulary against hijacking, ontologies cannot in general express them. In Section 3 we introduce and explain the Boundz vocabulary, and present an example of using it. We describe a prototype implementation together with some tentative evaluation results in Section 4. Section 5 contains related work, and we conclude in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Theoretical Background</head><p>Let U, B and L respectively denote pairwise disjoint, fixed and infinite sets of URIs, blank nodes and literals. Fix 𝒰 = U ∪ B ∪ L as the set of elements. Define the set of (RDF) triples as the set T = (U ∪ B) × U × 𝒰. A triple is commonly written as a sequence of its elements, t = ⟨s, p, o⟩, where s, p and o are called respectively the subject, predicate and object of the triple. An (RDF) graph G is a finite set of triples. If G is a graph, then 𝒰(G) is the set of elements occurring in G.</p><p>The design of the Boundz vocabulary is based on the notion of a bounded RDF homomorphism, which was first introduced in <ref type="bibr" target="#b14">[15]</ref>.</p></div>
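<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal sketch of these definitions, and not the encoding used by the authors, a graph can be modelled in Python as a set of 3-tuples of strings. The function name elements and the terms in the example graph are assumptions made purely for illustration; elements(G) plays the role of the set of elements occurring in G.</p>
```python
# Assumed encoding (not from the paper): terms are plain strings, a triple is
# a (subject, predicate, object) tuple, and a graph is a set of such tuples.

def elements(graph):
    """Return the set of all terms occurring in any position of the graph."""
    return {term for triple in graph for term in triple}


# A two-triple example graph built from hypothetical terms.
G = {
    ("ex:lennon", "foaf:made", "ex:imagine"),
    ("ex:imagine", "rdf:type", "mo:Record"),
}
```
<p>With this encoding, union and difference of graphs are ordinary set operations.</p></div>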
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 1 (Homomorphism). Let G and H be graphs. A homomorphism</head><formula xml:id="formula_0">h : G → H is a function h : 𝒰(G) → 𝒰(H) satisfying the following condition: for all s, p, o ∈ 𝒰, ⟨s, p, o⟩ ∈ G ⇒ ⟨h(s), h(p), h(o)⟩ ∈ H.</formula><p>RDF homomorphisms, as homomorphisms, reflect the structure of the domain in the co-domain, but they do not in general reflect the structure of the co-domain back into the domain. This is evident since the co-domain of a homomorphism may be a strict superset of the image of the homomorphism, whence the co-domain is not in general addressed by the homomorphism. Nevertheless, properties of the co-domain can be expressed in terms of a homomorphism by placing restrictions on the class of homomorphisms one is willing to consider. In <ref type="bibr" target="#b14">[15]</ref> these restrictions are called bounds:</p><p>Definition 2 (Bounded Homomorphism). Let h : G → H be a homomorphism. A simple bound is one of the following conditions; for all s, p, o ∈ 𝒰:</p><formula xml:id="formula_1">⟨h(s), p, o⟩ ∈ H ⇒ ⟨s, p, o⟩ ∈ G (S)
⟨s, h(p), o⟩ ∈ H ⇒ ⟨s, p, o⟩ ∈ G (P)
⟨s, p, h(o)⟩ ∈ H ⇒ ⟨s, p, o⟩ ∈ G (O)
⟨s, p, o⟩ ∈ H ⇒ ⟨s, p, o⟩ ∈ G (⊤)</formula><p>New bounds may be built from the simple bounds by combining them conjunctively and/or disjunctively. A bounded homomorphism is a homomorphism that satisfies a bound. If h satisfies the bound β, we call h a β-map.</p><p>The essential idea in <ref type="bibr" target="#b14">[15]</ref> is to control the amendment of a dataset by interlocking two graphs in a reciprocal simulation of varying degrees of strength by combining the homomorphism condition with a bound. The two graphs in question represent the recipient of the data before and after a contribution is made, and the relationship between the recipient and the contributor is regulated by requiring the existence of an RDF homomorphism that reflects the structure of each in the other. 
That is, suppose G is some community-curated data set encoded in RDF, and that H is an amendment contributed by some peer. Then the reservations that G may have about H may be expressed in terms of conditions on an RDF homomorphism h of G into G ∪ H that ensure that G and G ∪ H simulate each other to some extent deemed sufficient from the point of view of G.</p><p>In <ref type="bibr" target="#b14">[15]</ref> it is required of a homomorphism h that it be the identity function on subjects and objects of triples. In other words, the only elements that are allowed to vary from an RDF graph to its homomorphic image are the predicates. It is an important property of this class of RDF homomorphisms that the problem of checking whether a bounded instance exists among them is in P <ref type="bibr" target="#b14">[15, Theorem 9]</ref>. In the present paper, we shall be even more restrictive and require h to be the identity function on all elements of its domain. This yields a purely morphological notion of simulation where the only variations a homomorphism talks about are variations of form. We record this under the name of bounded extension.</p></div>
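<div xmlns="http://www.tei-c.org/ns/1.0"><p>When h is the identity on the elements of G, each simple bound of Definition 2 can be read as a membership test on the triples that H adds to G. The following sketch illustrates that reading only; the function names, the string labels "S", "P", "O" and "T" (the last standing for the strongest simple bound) and the example data are assumptions, not part of the Boundz implementation.</p>
```python
def elements(graph):
    """Return every term that occurs in some position of the graph."""
    return {term for triple in graph for term in triple}


# Position of the element that each simple bound inspects.
POSITION = {"S": 0, "P": 1, "O": 2}


def simple_bound_violations(g, h, bound):
    """Return the new triples of h that break a simple bound on g.

    Assumes the homomorphism is the identity on the elements of g, so a
    new triple breaks S, P or O when the inspected element is already
    known to g, and every new triple breaks the strongest bound "T".
    """
    known = elements(g)
    new_triples = h - g
    if bound == "T":
        return set(new_triples)
    index = POSITION[bound]
    return {triple for triple in new_triples if triple[index] in known}


# Hypothetical recipient graph G and contribution H.
G = {("ex:lennon", "foaf:made", "ex:imagine")}
H = {
    ("ex:lennon", "foaf:made", "ex:new_record"),
    ("ex:new_artist", "foaf:made", "ex:new_record"),
}
```
<p>In this example the bound S rejects only the triple whose subject ex:lennon is already known to G, whereas P rejects both new triples because foaf:made already occurs in G.</p></div>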
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 3 (Bounded extension). Let G and H be graphs. If there is a bounded homomorphism</head><formula xml:id="formula_2">h : G → G ∪ H, and h(u) = u for all u ∈ 𝒰(G), then G ∪ H is a bounded extension of G.</formula><p>There are 19 different non-equivalent bounds for homomorphisms. If we let '⊥' designate 'no bound', making a ⊥-map an unbounded homomorphism, we can arrange all 19 bounds and ⊥ in a lattice according to logical implication; this is done in Figure <ref type="figure">1</ref>. Here, β₁ ≤ β₂ for two bounds β₁, β₂ means that β₂ is at least as strong as β₁, that is, any β₂-map is also a β₁-map. The weakest bounded homomorphism is the (S ∧ P ∧ O)-map, while the ⊤-map is the strongest. Figure <ref type="figure" target="#fig_0">2</ref> offers a compact explanation of the patterns of new triples that the target is willing to accept under the different bounds, i.e., the permissible triples in H \ G for a homomorphism h : G → G ∪ H. In the figure, the patterns use n ('new') to indicate that an element in this position must be new to G (n ∉ 𝒰(G)), while a ('any') specifies that any element is allowed (a ∈ 𝒰). Multiple patterns in a position of the lattice mean that a triple matching any of the patterns satisfies the corresponding bound.</p><p>Bounded homomorphisms can themselves be combined to yield new bounded homomorphisms: </p><formula xml:id="formula_3">Theorem 1 [15, Lemma 7]. If h₁, h₂ are homomorphisms bounded by β₁ and β₂ respectively, and</formula><formula xml:id="formula_4">h₁(u) = h₂(u) for all u ∈ dom(h₁) ∩ dom(h₂), then h₁ ∪ h₂ is a bounded homomorphism satisfying the infimum of {β₁, β₂}.</formula><p>What this means in practice is that the 19 different bounds in the lattice may be used to exercise detailed control over incoming data, if desirable down to the level of the individual vocabulary element, and may be combined into one homomorphism. We shall work through an example in Section 3. 
For now, we only pause at the bound labelled S ∧ P ∧ O. This is the weakest non-trivial bound in the lattice, and it says that resources known to the recipient cannot be put in a known relationship to one another if they do not already stand in that relationship. Since it is the weakest, every other bound enforces the same restriction. It is therefore interesting that the (S ∧ P ∧ O)-bound simulates conservative extensions for a restricted class of ontologies, a not insignificant fact that we record next.</p><formula xml:id="formula_5">[Fig. 1. Lattice of bounds ordered by logical implication; node labels: S ∨ P ∨ O, S ∨ O, S ∨ P, O ∨ P, S, P, O, S ∨ (P ∧ O), P ∨ (S ∧ O), O ∨ (S ∧ P), (S ∧ P) ∨ (S ∧ O) ∨ (P ∧ O), S ∧ (P ∨ O), P ∧ (S ∨ O), O ∧ (S ∨ P), S ∧ O, S ∧ P, O ∧ P, S ∧ P ∧ O, ⊥.]</formula></div>
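<div xmlns="http://www.tei-c.org/ns/1.0"><p>The prose reading of the S ∧ P ∧ O bound can be sketched as a check for new triples that merely re-arrange elements the recipient already knows. This is an illustration under the assumption that the homomorphism is the identity on the recipient's elements; the function names and the ex: terms are hypothetical.</p>
```python
def elements(graph):
    """Return every term that occurs in some position of the graph."""
    return {term for triple in graph for term in triple}


def rearranging_triples(g, h):
    """Return the new triples of h built only from elements known to g.

    Under the reading above, these are the triples that the weakest
    bound rejects; a new triple with at least one new element passes.
    """
    known = elements(g)
    return {t for t in h - g if all(term in known for term in t)}


# Hypothetical recipient graph G and contribution H.
G = {("ex:a", "ex:p", "ex:b"), ("ex:c", "ex:q", "ex:d")}
H = {
    ("ex:a", "ex:q", "ex:d"),  # only known elements, newly related
    ("ex:a", "ex:p", "ex:e"),  # ex:e is new to G
}
```
<p>Here only the first triple of H is rejected: it relates ex:a and ex:d by ex:q, all of which G already contains, although G does not contain that triple.</p></div>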
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Relation to Conservative Extensions of Ontologies</head><p>The notion of a conservative extension has received a fair bit of attention in the description logic literature in recent years, cf., <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b6">7]</ref>, as it provides a mathematical handle on what it means to amend an ontology without compromising the set of conclusions that that ontology already licenses. On the face of it, this ambition is somewhat similar to ours, so it is natural to consider the relationship between the two notions. To be sure, the dissimilarities are at least as obvious: the concept of a conservative extension is a logical notion, whereas the concept of a bounded homomorphism is purely graph theoretic. For essentially the same reason, the former applies primarily to ontologies, whereas the latter can apply to any formalism represented as graphs. Nevertheless, there are circumstances under which the two concepts coincide.</p><p>Henceforth, an ontology is a set of ontological axioms formulated in a language representable in OWL. The signature of an ontology or axiom χ, sig(χ), is the set of concept, role and individual names occurring in χ, and ⊨ will represent the standard entailment relation for OWL semantics. Furthermore, let RDF(χ) be the RDF representation of an ontology or axiom χ as defined in <ref type="bibr" target="#b8">[9]</ref>, and set the following two definitions.</p><p>Proof. To simplify the proof we will use the following simple lemma:</p><formula xml:id="formula_6">G is a bounded extension of H iff H ⊆ G and if t ∈ G \ H, then u ∉ 𝒰(H) for some u ∈ t. If O₁ ⊈ O₂,</formula><p>The conditions of Definition 5 put strong requirements on such ontologies. The first condition requires that all axioms are represented in the RDF mapping of the OWL ontology as singleton triples. This restricts the permissible ontology language, but leaves a well-identified and still useful subset. The set of OWL axioms expressible using a single triple is listed in <ref type="bibr" target="#b8">[9]</ref> and is also used to define the OWL LD profile <ref type="bibr" target="#b3">[4]</ref> (LD for linked data). This profile is the subprofile of the standardised OWL RL profile <ref type="bibr" target="#b7">[8]</ref> restricted to single triple axioms, and is especially designed for the Linked Data community after evaluating the use of ontological constructs in the web of data. It turns out that this profile covers the better portion of the language that is actually in use. Roughly, the profile contains all of the RDFS vocabulary and the different "equality/inequality" axioms for classes, properties and individuals from OWL 2, e.g., owl:disjointWith, owl:equivalentProperty, owl:sameAs and owl:differentFrom, and additionally property types like owl:FunctionalProperty and owl:TransitiveProperty. Important omissions from the profile are owl:someValuesFrom, owl:allValuesFrom, cardinality axioms, and owl:unionOf and owl:intersectionOf. The second requirement of the definition states that the ontology must be completely saturated, i.e., all consequences must be explicitly stated in the ontology. In general, this would be an impossible problem for most ontology languages as the set of all consequences would be infinite. However, for the profile we are restricted to by the first requirement of Definition 5, the size of a completely saturated ontology, when excluding datatype support, is bounded by |C|³, where |C| is the number of resources occurring in the ontology and entailment ruleset <ref type="bibr" target="#b3">[4]</ref>. We believe that this shows that for our purposes the notion of a saturated, single triple ontology is still a useful one.</p><p>It should be emphasised that the case where conservative extensions and bounded homomorphisms coincide has been carefully circumscribed, and that the similarities do not stretch all that far. Conservative extensions cannot in general be simulated by bounded homomorphisms as we have defined them. Conversely, adding dcterms:subject rdfs:subPropertyOf ex:topic to an ontology that does not already contain ex:topic is conservative, and so is adding a new election result given that the election result is codified in the recipient's terms. Hence, conservative extensions do not offer the detailed level of control required to prevent the kind of cases that were described in the introduction. Yet, these cases are easy to express with bounded homomorphisms. A related fact is that it is not in general possible to express bounds with ontologies. One reason is the close connection between description logics and the guarded fragment of first-order logic, which does not make it possible to express dependencies between two variables of the kind necessary for formulating bounds. Moreover, OWL is not 'directional' in the sense that a homomorphism is, and therefore does not distinguish between elements from the source and the target. We conclude that it is natural to construct a special purpose vocabulary and software to represent and manage such relations.</p></div>
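<div xmlns="http://www.tei-c.org/ns/1.0"><p>The hijacking case mentioned above can be used to illustrate the difference in control. In the following sketch, the hijacking triple passes the weakest bound, because ex:topic is new to the recipient, but it is caught by an S bound whose scope is restricted to the predicate rdfs:subPropertyOf. The function names, the recipient graph and the way the restriction is modelled are assumptions for illustration, not the published Boundz library bounds.</p>
```python
def elements(graph):
    """Return every term that occurs in some position of the graph."""
    return {term for triple in graph for term in triple}


def breaks_weakest_bound(g, h):
    """New triples of h that use only elements already known to g."""
    known = elements(g)
    return {t for t in h - g if all(term in known for term in t)}


def breaks_restricted_s_bound(g, h, predicate):
    """New triples of h with the given predicate and a subject known to g.

    Models, as an assumption, an S bound restricted to one predicate value.
    """
    known = elements(g)
    return {t for t in h - g if t[1] == predicate and t[0] in known}


# Hypothetical recipient data that uses dcterms:subject as a predicate.
G = {("ex:doc1", "dcterms:subject", "ex:logic")}
# The hijacking contribution discussed in the text.
H = {("dcterms:subject", "rdfs:subPropertyOf", "ex:topic")}

passes_weakest = breaks_weakest_bound(G, H) == set()
hijack_caught = breaks_restricted_s_bound(G, H, "rdfs:subPropertyOf") == H
```
<p>Both passes_weakest and hijack_caught hold for this data, which mirrors the observation that the weakest bound alone does not prevent hijacking.</p></div>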
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Vocabulary</head><p>The Boundz vocabulary<ref type="foot" target="#foot_1">4</ref> comprises 32 classes, 34 properties and 3 individuals; an informal and simplified overview of its most important top-level classes and properties is depicted in Figure <ref type="figure" target="#fig_2">3</ref>. The central class of the vocabulary is Bound. All the bounds in Figure <ref type="figure">1</ref> are represented as subclasses of this class and in the same hierarchical structure as in the figure. The atomic bounds are S, P and O (and ⊤ and ⊥); all other bounds are defined from these. Bounds may be assigned to graphs, making the graph a BoundedGraph, which should be taken to mean that any incoming data to this graph must satisfy the bounds in order to be accepted by the graph. If multiple bounds overlap in scope, the strongest bound overrides weaker bounds, i.e., the accepted exchange should always satisfy all bounds. An ExchangeSchema is a different way of placing bounds on graphs. It is a specification for a data exchange from a set of source graphs to one target graph that must satisfy the bounds in the schema. The result is an Exchange which contains the payload, i.e., the set of triples that successfully passed the bounds, and a set of Violations, which contains the triples not meeting the requirements of the bounds. An exchange schema can also specify whether or not to require that the sources and target are saturated by a reasoner, and whether the payload and/or violations should be listed in the resulting exchange instance. The latter is convenient to control if one just wants to check whether a set of bounds is satisfied, and not to replicate the sources into the exchange. The vocabulary also includes the possibility of placing Restrictions and Exceptions on bounds. 
A restriction gives a way of constraining the scope of the bound to only concern triples with elements of a specified value, of a given type, or belonging to a certain namespace. An exception applies when the conditions posed by a bound are not met and prescribes how the data that did not break the bound should be handled. The current possibilities are to abort the exchange altogether, ignore all data from the data source causing the violation, or ignore only the problematic triples and accept the remaining triples regardless of the data source.</p><p>In the same spirit as the vocabulary R2R <ref type="bibr" target="#b1">[2]</ref>, with which RDF dataset mappings can be specified and published for sharing and re-use, we believe that the Boundz vocabulary can be used to formulate and share bounds for vocabularies. With this in mind we have published a set of bounds which restricts the use of vocabulary elements in the RDFS vocabulary and the OWL LD profile in those cases where one wants protection from ontology hijacking. <ref type="foot" target="#foot_2">5</ref> We believe that this library can grow by adding useful specialised bounds which have natural interpretations for popular vocabularies.</p><p>Example 1. The BBC Music dataset contains, amongst other things, data about artists and their record releases, represented in part using the FOAF vocabulary and the Music Ontology. <ref type="foot" target="#foot_3">6</ref> A mo:MusicArtist is related to his or her mo:Records by the foaf:made relation and may have many mo:fanpages. A record may be of a certain mo:Genre. Suppose the BBC wishes to protect its dataset by requiring that amendments meet the following requirements:</p><p>1. The vocabulary that the BBC uses must not be hijacked by adding new superclasses or superproperties. 2. 
Adding new foaf:made relationships is not tolerated, unless both artist and record are new to the BBC dataset; their current library is regarded as complete with respect to the albums of enlisted artists, but is open for extensions with new artists. 3. More fanpages may be added, but an existing fanpage cannot be related to more artists. 4. No new information about existing genres may be added. 5. Also, assume the BBC keeps a special dataset about the Beatles which is not under their management, so they want to disallow any new information using only elements from this dataset. However, new information may relate to the Beatles dataset.</p><p>These requirements are enforced by the following bounds:</p><p>The Boundz vocabulary identifies bounds with URIs using the bounds' label from Figure <ref type="figure">1</ref> written in prefix notation <ref type="foot" target="#foot_4">7</ref> as the localname of the URI, e.g., bz:KKspo identifies the bound S ∧ P ∧ O, bz:Aso is S ∨ O, bz:s is S, and bz:T is ⊤. The example bounds specification above talks about two bounded graphs, ex:bbcmusic and ex:beatles. The latter graph is bounded by bz:KKspo (line 9), which assures that requirement 5 is met; no new triple may re-arrange elements in the Beatles dataset. However, it allows triples where at least one element is not in the receiving dataset; hence, adding triples that use only in part elements from the dataset is permitted. The remaining bounds concern the BBC Music's graph. The bound on line 2 is defined in the boundzLibrary vocabulary and protects the ex:bbcmusic dataset from being hijacked by RDFS axioms, i.e., axioms that superimpose new superclasses, superproperties, and domain and range definitions onto existing concepts and properties. Requirement 2 is specified with the bound on line 3, which disallows adding new foaf:made relationships unless both the subject and object of the triple are new to the receiving target. 
Line 4 contains a bound which allows adding fanpages if the object of the triple, i.e., the fanpage resource, is new to the target. This bound is equipped with an exception which ignores the violating triples of this bound, and allows other triples to pass. All other bounds in the listing will reject the complete incoming dataset if their conditions are not satisfied. The bounds on lines 6 and 7 require that new triples where the subject or object is of type Genre in the BBC dataset cannot be added, thus making sure that nothing new can be said about genres, i.e., they are write-protected.</p><p>To test the practical usefulness of the Boundz vocabulary we have implemented a test prototype which takes as input an RDF file containing one or more exchange schemata and computes and outputs an exchange instance for every schema. The prototype is written in Java using the Jena framework<ref type="foot" target="#foot_5">8</ref> and the Pellet reasoner. <ref type="foot" target="#foot_6">9</ref> After the input file is read into memory, we apply reasoning to the exchange schemata to reveal possible inconsistencies and allow for a simpler parsing of the vocabulary model using, e.g., superproperties to discover the different schemata settings. For each exchange schema, the specified source and target graphs are read into memory, and saturated if this is specified. Exchanges are then computed and written to output according to settings in the schema. Bounds are checked by a simple algorithm which iterates through the specified bounds, searching for violating triples. The prototype implementation is evaluated using the Lehigh University Benchmark data generator. <ref type="foot" target="#foot_7">10</ref> The data generator allows one to create datasets of different sizes and with different content by supplying a random seed. 
We have generated different combinations of source data ranging in total sizes from 15K triples to 13M triples and tested a single exchange schema specification with various bounds against one target of 6M triples. Each test was repeated 10 times on a regular desktop computer using a maximum of 8 GB of heap space. The running times for checking the bounds and producing output (disabling output of the payload and violation triples, and excluding the time to load the source and target graphs into memory) are presented in the graph above; the x-axis indicates the sum of the triples in the sources and the y-axis the time in seconds to complete the check. This simple evaluation shows promising results: the time spent increases linearly with the size of the input, and the running time for checking bounds with sources of 13M triples against a 6M triple sized target is ≈5 minutes. The prototype implementation, complete test and test results are published on http://sws.ifi.uio.no/project/boundz.</p></div>
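<div xmlns="http://www.tei-c.org/ns/1.0"><p>The checking procedure described above (iterate through the bounds, collect violating triples, then apply the exceptions) can be sketched as follows. This is not the Java prototype: the dictionary keys positions, predicate and on_violation, the values "abort" and "ignore", and the simplification that a restricted bound is violated when any inspected position holds an element already known to the target are all assumptions made for illustration.</p>
```python
def elements(graph):
    """Return every term that occurs in some position of the graph."""
    return {term for triple in graph for term in triple}


def compute_exchange(target, source, bounds):
    """Return (payload, violations) for one source checked against a target.

    Each bound is a dict with hypothetical keys:
      positions    - triple indexes (0, 1, 2) that must hold new elements
      predicate    - restrict the bound to this predicate, or None
      on_violation - "abort" rejects everything, "ignore" drops the triple
    """
    known = elements(target)
    new_triples = source - target
    violations = set()
    aborted = False
    for bound in bounds:
        for triple in new_triples:
            restricted_to = bound["predicate"]
            if restricted_to is not None and triple[1] != restricted_to:
                continue  # the triple is outside the scope of this bound
            if any(triple[i] in known for i in bound["positions"]):
                violations.add(triple)
                if bound["on_violation"] == "abort":
                    aborted = True
    payload = set() if aborted else new_triples - violations
    return payload, violations


# Hypothetical bounds modelled on lines 3 to 5 of the example listing.
made_bound = {"positions": (0, 2), "predicate": "foaf:made",
              "on_violation": "abort"}
fanpage_bound = {"positions": (2,), "predicate": "mo:fanpage",
                 "on_violation": "ignore"}

target = {("ex:lennon", "foaf:made", "ex:imagine"),
          ("ex:lennon", "mo:fanpage", "ex:page1")}
source = {("ex:ono", "mo:fanpage", "ex:page1"),
          ("ex:ono", "mo:fanpage", "ex:page2")}

payload, violations = compute_exchange(target, source, [made_bound, fanpage_bound])
```
<p>In this run the triple that attaches the existing fanpage ex:page1 to a further artist is reported as a violation and left out of the payload, while the triple with the new fanpage ex:page2 passes. Each bound is checked in a single pass over the new triples, so the work in this sketch grows with the number of bounds times the number of source triples.</p></div>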
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Related work</head><p>An important approach to RDF validation is based on expressing integrity constraints in an OWL ontology. Since OWL is designed to supplement rather than to validate data, this approach involves interpreting parts, or the whole, of an ontology under a closed world semantics <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18]</ref>. Tools such as TrOWL and Pellet ICV implement this approach, which has the virtue that constraints can be automatically inferred from the domain description in the ontology.</p><p>Another approach is represented by the SPIN SPARQL syntax which offers a vocabulary for encoding SPARQL queries in RDF <ref type="bibr" target="#b5">[6]</ref>. The idea is to link class definitions with SPARQL queries to capture constraints and rules that formalise the expected behaviour of those classes.</p><p>The IBM Resource Shapes vocabulary <ref type="bibr" target="#b10">[11]</ref> describes the properties that a resource of a given type is required to have. Validation over resource shapes can then be implemented as a set of ASK queries over the graph.</p><p>The current paper is a full version of <ref type="bibr" target="#b13">[14]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>We have presented a vocabulary that can be used for implementing the reservations a data hub might have against incoming data that is not under the control of that data hub itself, and we have presented elements of the theory behind it. The vocabulary expresses constraints on an incoming contribution in terms of what data the hub already contains. These constraints can be formalised as bounded homomorphisms from the consuming data hub into the union of the consumer and contribution.</p><p>The possible uses for bounds we have currently identified are automatic validation (or rejection) of incoming data, identifying conservative extensions of simple ontological schemata, write-protecting (parts of) datasets with different degrees of strength, and simple implementations of trust, e.g., ignoring sources that do not meet specific bounds. Bound sets corresponding to these use cases can be published as independent RDF resources, and they can be combined and re-used for data hubs with similar needs. Checking conformance with a bound set is computationally tractable, and testing shows that it is practically feasible even over fairly large datasets. Our experimental evaluation indicates that the execution time grows linearly in the size of input data.</p><p>Ideas for future work include integrating the Boundz vocabulary with existing vocabularies for describing the content of RDF sources, for instance by using the VoID vocabulary <ref type="bibr" target="#b0">[1]</ref> to capture the relationship between the receiving dataset and the exchange payload. We also plan to extend the theory and the vocabulary to cover a symmetric notion of bounds. Currently, our approach is based on regarding one graph as the receiver and the other as the contributor, and the bounds are designed to protect the content of the receiver from being distorted or skewed by the contributor. 
A natural generalisation is to consider both as peers and to redefine the payload as the uncontroversial subset of the union of the datasets. A potentially interesting further development is to add degrees of trust to the mix by ordering bound sets according to priority or the trustworthiness of their issuers.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 2.</head><label>2</label><figDesc>Fig. 2. Permissible triple patterns.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>then, trivially, O₂ is not a conservative extension of O₁ and RDF(O₂) is not a bounded extension of RDF(O₁), so assume otherwise. Let h : RDF(O₁) → RDF(O₂) be a bounded homomorphism, and let α be an axiom such that sig(α) ⊆ sig(O₁). If O₂ ⊨ α, then by Definition 5, t = RDF(α) ∈ RDF(O₂), where t is a single triple. Since sig(α) ⊆ sig(O₁), we can apply the lemma and get that RDF(α) ∈ RDF(O₁); thus, by Definition 5 again, O₁ ⊨ α. If O₁ ⊨ α, then O₂ ⊨ α, by Definition 5 and the fact that h maps identically from RDF(O₁) to RDF(O₂). The other direction of the proof is similar.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3.</head><label>3</label><figDesc>Fig. 3. Boundz vocabulary, an informal and simplified excerpt.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>1 ex:bbcmusic a bz:BoundedGraph ;
2   bz:hasBound bzs:RDFS ,
3     [ a bz:Aso ; bz:predicateValue foaf:made ] ,
4     [ a bz:o ; bz:predicateValue mo:fanpage ;
5       bz:hasException bz:ignoreViolations ] ,
6     [ a bz:T ; bz:subjectClass mo:Genre ] ,
7     [ a bz:T ; bz:objectClass mo:Genre ] .</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>8 ex:beatles a bz:BoundedGraph ;
9   bz:hasBound [ a bz:KKspo ] .</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>h₂ are homomorphisms bounded by β₁ and β₂ respectively, and h₁</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>Definition 4 (Conservative extension). Let O₁ and O₂ be ontologies such that O₁ ⊆ O₂. We say that O₂ is a conservative extension of O₁ if for every axiom α with sig(α) ⊆ sig(O₁) we have O₂ ⊨ α iff O₁ ⊨ α. Definition 5 (Saturated, single triple ontology). An ontology O is a saturated, single triple ontology if 1) for all α ∈ O, RDF(α) is a single triple, and 2) if O ⊨ α then α ∈ O. Theorem 2. Let O₁ and O₂ be saturated, single triple ontologies. Then O₂ is a conservative extension of O₁ iff RDF(O₂) is a bounded extension of RDF(O₁).</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">Consult <ref type="bibr" target="#b4">[5]</ref> for a formal definition of ontology hijacking and an evaluation illustrating how it may have significant unintentional, hence possibly harmful, effects.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">Vocabulary URI: http://sws.ifi.uio.no/vocab/boundz</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">Vocabulary URI: http://sws.ifi.uio.no/vocab/boundzLibrary</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_3">See http://datahub.io/dataset/bbc-music and sample http://www.bbc.co.uk/music/artists/79239441-bfd5-4981-a70c-55c3f15c1287.rdf.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_4">Also called Polish notation; K (koniunkcja) means conjunction and A (alternatywa) means disjunction. We use this notation to avoid parentheses (and other URL-unfriendly characters) in the bound labels.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_5">http://jena.apache.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_6">http://clarkparsia.com/pellet/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_7">http://swat.cse.lehigh.edu/projects/lubm/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Describing Linked Datasets with the VoID Vocabulary</title>
		<author>
			<persName><forename type="first">K</forename><surname>Alexander</surname></persName>
		</author>
		<ptr target="http://www.w3.org/TR/void/" />
	</analytic>
	<monogr>
		<title level="m">W3C Interest Group Note</title>
				<meeting><address><addrLine>W3C</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The R2R Framework: Publishing and Discovering Mappings on the Web</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Schultz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the First Int. Workshop on Consuming Linked Data (COLD2010)</title>
				<meeting>First Int. Workshop on Consuming Linked Data (COLD2010)</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Did I Damage my Ontology? A Case for Conservative Extensions in Description Logics</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ghilardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lutz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wolter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 10th Int. Conference on Principles of Knowledge Representation and Reasoning (KR&apos;06)</title>
				<meeting>10th Int. Conference on Principles of Knowledge Representation and Reasoning (KR&apos;06)</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">OWL: Yet to arrive on the Web of Data?</title>
		<author>
			<persName><forename type="first">B</forename><surname>Glimm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the WWW2012 Workshop on Linked Data on the Web</title>
				<meeting>WWW2012 Workshop on Linked Data on the Web<address><addrLine>LDOW</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Scalable Authoritative OWL Reasoning for the Web</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Harth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Polleres</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. Journal on Semantic Web and Information Systems</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="49" to="90" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">SPIN - Modeling Vocabulary</title>
		<author>
			<persName><forename type="first">H</forename><surname>Knublauch</surname></persName>
		</author>
		<ptr target="http://www.w3.org/Submission/spin-modeling/" />
	</analytic>
	<monogr>
		<title level="m">W3C Member Submission</title>
				<meeting><address><addrLine>W3C</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Conservative Extensions in Expressive Description Logics</title>
		<author>
			<persName><forename type="first">C</forename><surname>Lutz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Walther</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wolter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 20th Int. Joint Conference on Artificial Intelligence (IJCAI-07)</title>
				<meeting>20th Int. Joint Conference on Artificial Intelligence (IJCAI-07)</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<ptr target="http://www.w3.org/TR/owl2-profiles/" />
		<title level="m">OWL 2 Web Ontology Language: Profiles</title>
				<editor>
			<persName><forename type="first">B</forename><surname>Motik</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
	<note>Second Edition</note>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<ptr target="http://www.w3.org/TR/owl2-mapping-to-rdf/" />
		<title level="m">OWL 2 Web Ontology Language: Mapping to RDF Graphs</title>
				<editor>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Patel-Schneider</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Motik</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
	<note>Second Edition</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Closed World Reasoning for OWL2 with NBox</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Z</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Tsinghua Science and Technology</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="692" to="701" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">OSLC Resource Shape: A language for defining constraints on Linked Data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ryman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Le Hors</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Speicher</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the WWW2013 Workshop on Linked Data on the Web</title>
				<meeting>WWW2013 Workshop on Linked Data on the Web<address><addrLine>LDOW</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Towards Integrity Constraints in OWL</title>
		<author>
			<persName><forename type="first">E</forename><surname>Sirin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 6th Int. Workshop on OWL: Experiences and Directions (OWLED 2009)</title>
				<meeting>6th Int. Workshop on OWL: Experiences and Directions (OWLED 2009)</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Bounded RDF Data Transformations</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Skjaeveland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Stolpe</surname></persName>
		</author>
		<ptr target="http://hdl.handle.net/10852/9104" />
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
		<respStmt>
			<orgName>University of Oslo</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Tech. rep</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Bounds: Expressing Reservations about Incoming Data</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Skjaeveland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Stolpe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Position paper for W3C&apos;s RDF Validation Workshop - Practical Assurances for Quality RDF Data</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Preserving Information Content in RDF Using Bounded Homomorphisms</title>
		<author>
			<persName><forename type="first">A</forename><surname>Stolpe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Skjaeveland</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web: Research and Applications</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">7295</biblScope>
			<biblScope unit="page" from="72" to="86" />
		</imprint>
	</monogr>
	<note>LNCS. Proc. of the 9th ESWC 2012</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">User-Centric Faceted Search for Semantic Portals</title>
		<author>
			<persName><forename type="first">O</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Viljanen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hyvönen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web: Research and Applications</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">4519</biblScope>
			<biblScope unit="page" from="356" to="370" />
		</imprint>
	</monogr>
	<note>LNCS. Proc. of the 4th ESWC 2007</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Adding Integrity Constraints to the Semantic Web for Instance Data Evaluation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Tao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2010</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">6497</biblScope>
			<biblScope unit="page" from="330" to="337" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">TrOWL: Tractable OWL 2 Reasoning Infrastructure</title>
		<author>
			<persName><forename type="first">E</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Z</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ren</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web: Research and Applications</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">6089</biblScope>
			<biblScope unit="page" from="431" to="435" />
		</imprint>
	</monogr>
	<note>LNCS. Proc. of the 7th ESWC 2010</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
