Towards Vagueness-Aware Semantic Data

Panos Alexopoulos1, Boris Villazon-Terrazas1, and Jeff Z. Pan2

1 iSOCO, Intelligent Software Components S.A., Av. del Partenon, 16-18, 1-7, 28042, Madrid, Spain, {palexopoulos,bvillazon}@isoco.com
2 Department of Computing Science, University of Aberdeen, Meston Building, Aberdeen, AB24 3UE, UK.

Abstract. The emergence in recent years of initiatives like Linked Open Data (LOD) has led to a significant increase in the amount of structured semantic data on the Web. In this paper we argue that the shareability and wider reuse of such data can very often be hampered by the existence of vagueness within it, as this makes the data's meaning less explicit. Moreover, as a way to reduce this problem, we propose a vagueness metaontology that may represent in an explicit way the nature and characteristics of vague elements within semantic data.

1 Introduction

Ontologies are formal shareable conceptualizations of domains, describing the meaning of domain aspects in a common, machine-processable form by means of concepts and their interrelations [4], and enabling the production and sharing of data that are commonly understood among human and software agents. Achieving the latter requires ensuring that the meaning of ontology elements is explicit and shareable, namely that all users have an unambiguous and consensual understanding of what each ontological element actually represents.

In this paper we examine how vagueness affects the shareability and reusability of semantic data. Vagueness is a common natural language phenomenon, demonstrated by concepts with blurred boundaries, like tall, expert etc., for which it is difficult to determine precisely their extensions (e.g. some people are borderline tall: neither clearly "tall" nor "not tall") [5].

Our position is threefold: i) vagueness exists not only within isolated, application-specific semantic data but also in public datasets that should be shareable and reusable; ii) vagueness hampers the comprehensibility and shareability of these datasets and causes problems; iii) the negative effects of vagueness can be partially tackled by making the data vagueness-aware, namely by annotating their elements with metainformation about the nature and characteristics of their vagueness.

In the next section we explain and support the first two parts of our position with real world examples. In Section 3 we describe how semantic data can become vagueness-aware via a vagueness metaontology. Sections 4 and 5 present related work and summarize our own.

2 Motivation and Approach Rationale

The possibility of vagueness in ontologies and semantic data has long been recognized in the research literature, especially in the area of Fuzzy Ontologies [3] [2]. An inspection of well-known ontologies and public semantic data reveals that the possibility is indeed a reality.

A characteristic group of vague elements consists of categorization relations, where entities are assigned to categories with no clear applicability criteria. An example of such a relation is "hasFilmGenre", found in Linked Data datasets like LinkedMDB (http://linkedmdb.org) and DBpedia (http://dbpedia.org), that relates films with the genres they belong to. As most genres have no clear applicability criteria, there will be films for which it is difficult to decide whether or not they belong to a given genre. A similar argument can be made for the DBpedia relations "is dbpedia-owl:ideology of" and "dbpedia-owl:movement".

Another group of vague elements comprises specializations of concepts according to some vague property of them. Examples include "Famous Person" and "Big Building", in the Cyc Ontology (http://www.cyc.com/platform/opencyc), and "Managerial Role" and "Competitor", found in the Business Role Ontology (http://www.ip-super.org).

The presence of vague terms in semantic data often causes disagreements among the people who develop, maintain or use it.
Such a situation arose in a real life scenario where we faced significant difficulties in defining concepts like "Critical System Process" or "Strategic Market Participant" while trying to develop an electricity market ontology. When, for example, we asked our domain experts to provide exemplary instances of critical processes, there was dispute among them about whether certain processes qualified. Not only did different domain experts have different criteria of process criticality, but nobody could really decide which of those criteria were sufficient for the classification. In other words, the problem was the vagueness of the predicate "critical".

While disagreements may be overcome by consensus, they are inevitable as more users alter, extend, or use semantic data. A worse situation is when a user misinterprets the intended meaning of a vague term and uses it wrongly. Imagine an enterprise ontology where the concept "Strategic Client" was initially created and populated by the company's Financial Manager, whose implicit criterion was the amount of revenue the clients generated for the company. Imagine also the new R&D Director querying the instances of this concept when crafting an R&D strategy. If their own applicability criteria for the term "Strategic" do not coincide with the Financial Manager's, using the returned list of clients might lead to poor decisions.

The above examples show how the inherent context-dependence and subjectivity that characterize vagueness may affect shareability in a negative way, due to potential disagreements or misunderstandings. More generally, typical use-case scenarios where this may happen include:

1. Structuring Data with a Vague Ontology: When domain experts are asked to define instances of vague concepts and relations, disagreements may occur on whether particular entities constitute instances of them.
2. Utilizing Vague Facts in Ontology-Based Systems: When knowledge-based systems reason with vague facts, their output might not be optimal for those users who disagree with these facts.
3. Integrating Vague Semantic Information: When semantic data from several sources need to be merged, the merging of particular vague elements can lead to data that will not be valid for all its users.
4. Evaluating Vague Semantic Datasets for Reuse: When data practitioners need to decide whether a particular dataset is suitable for their needs, the existence of vague elements can make this decision harder. It can be quite difficult for them to assess a priori whether the data related to these elements are valid for their application context.

To reduce the negative effects of vagueness, we put forward the notion of vagueness-aware semantic data, informally defined as "semantic data whose vague ontological elements are accompanied by comprehensive metainformation that describes the nature and characteristics of their vagueness". For example, a useful piece of metainformation is the set of applicability criteria that the element creator had in mind when defining the element (e.g. the amount of generated revenue as a criterion for a client to be strategic in the example above). Another is the identity of the element's creator (e.g. the author of a vague fact).

In any case, our position is that having such metainformation, explicitly represented and published along with the vague semantic data, can improve the latter's comprehensibility and shareability, especially in regard to the four scenarios listed above. For example, the knowledge of the same vague concept's intended applicability criteria in two different datasets can i) prevent their merging in case these criteria are different and ii) help a data practitioner decide which of these two concepts' associated instances are more suitable for their application.
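As a rough illustration of the last point, the sketch below attaches the intended applicability criteria and the creator to a vague concept, and declines to merge two datasets' versions of that concept when the criteria differ. The class `VagueConceptInfo`, the functions `can_merge` and `merge`, and all sample values (including the "R&D budget" criterion attributed to the R&D Director) are hypothetical names chosen for illustration; they do not come from any published vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VagueConceptInfo:
    """Hypothetical metainformation for one vague concept in one dataset."""
    concept: str
    creator: str
    applicability_criteria: frozenset
    instances: frozenset


def can_merge(a, b):
    """Same concept name and same intended applicability criteria."""
    return (a.concept == b.concept
            and a.applicability_criteria == b.applicability_criteria)


def merge(a, b):
    """Union the instances of two versions, but only if their criteria agree."""
    if not can_merge(a, b):
        raise ValueError(
            f"'{a.concept}' was defined with different applicability criteria "
            f"by {a.creator} and {b.creator}; refusing to merge"
        )
    return VagueConceptInfo(
        concept=a.concept,
        creator=f"{a.creator}; {b.creator}",
        applicability_criteria=a.applicability_criteria,
        instances=a.instances | b.instances,
    )


# Illustrative data only: two versions of the same vague concept.
finance = VagueConceptInfo(
    "StrategicClient", "Financial Manager",
    frozenset({"generated revenue"}), frozenset({"ClientA", "ClientB"}),
)
rnd = VagueConceptInfo(
    "StrategicClient", "R&D Director",
    frozenset({"R&D budget"}), frozenset({"ClientC"}),
)
```

Here `can_merge(finance, rnd)` is false because the two creators had different criteria in mind, so a practitioner would be warned before combining the two instance sets. Requiring exactly equal criteria is a deliberately strict choice for the sketch; a real integration process might instead flag partial overlaps for human review.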
3 Making Ontologies Vagueness-Aware

3.1 Key Vagueness Aspects

In the literature two kinds of vagueness are identified: quantitative (or degree) vagueness and qualitative (or combinatory) vagueness [5]. A predicate has degree vagueness if the existence of borderline cases stems from the lack of precise boundaries for the predicate along one or more dimensions (e.g. "bald" lacks sharp boundaries along the dimension of hair quantity, while "red" can be vague for both brightness and saturation). A predicate has combinatory vagueness if there are a variety of conditions pertaining to the predicate, but it is not possible to make any crisp identification of those combinations which are sufficient for application. A classical example of this type is "religion", as there are certain features that all religions share (e.g. beliefs in supernatural beings, ritual acts), yet it is not clear which of them suffice to classify something as a religion.

Based on this typology, we suggest that for a given vague term it is important to represent and share the following explicitly:

– The type of the term's vagueness: Knowing whether a term has quantitative or qualitative vagueness is important, as elements with an intended (but not explicitly stated) quantitative vagueness can be considered by others as having qualitative vagueness and vice versa.
– The dimensions of the term's quantitative vagueness: When the term has quantitative vagueness it is important to state explicitly its intended dimensions. E.g., if a CEO does not make explicit that, for a client to be classified as strategic, its R&D budget should be the only pertinent factor, other company members will rarely share the same view, as the vagueness of the term "strategic" is multi-dimensional.
– The necessary applicability conditions of the term's qualitative vagueness: Even though a term with qualitative vagueness lacks a clear definition of sufficient conditions for objects to satisfy it, it can still be useful to define the conditions that are necessary for its applicability. This will not only narrow down the possible interpretations of the term (by including conditions that other people may forget or ignore) but will also provide better grounding for any discussion or debate that might arise about its meaning.

Furthermore, vagueness is subjective and context dependent. Subjectivity has to do with the same vague term being interpreted differently by different users. Two company executives might have different criteria for the term "strategic client". Even if they share an understanding of the type and dimensions of this term's vagueness, a certain amount of R&D budget (e.g. 1 million euros) may make a client strategic for one but not for the other. Similarly, context dependence has to do with the same vague term being interpreted or applied differently in different contexts, even by the same user; celebrating an anniversary is different from celebrating a birthday when it comes to judging how expensive a restaurant is. Therefore we additionally suggest that one should explicitly represent the term's creator as well as the applicability context for which it is defined or in which it is used.

3.2 A Metamodel of Vague Ontology Elements

Ontology elements that can be vague are typically concepts, relations, attributes and datatypes [2]. A concept is vague if (in the given domain, context or application scenario) it admits borderline cases, namely if there could be individuals for which it is indeterminate whether they instantiate the concept. Similarly, a relation is vague if there could be pairs of individuals for which it is indeterminate whether they stand in the relation. The same applies to attributes, with respect to pairs of individuals and literal values. Finally, a vague datatype consists of a set of vague terms which may be used within the ontology as attribute values (e.g. performance may take as values terms like poor, mediocre and good).

To formally represent these vague elements by means of a metaontology, we consider the OWL metamodel defined in [6] and extend it by defining each vague element as a subclass of its corresponding element and by defining appropriate metaproperties that reflect the key aspects discussed in the previous section. Figures 1 and 2 provide an overview of the metamodel, while a concrete example of how this may be used to annotate a vague ontology is available at http://boris.villazon.terrazas.name/data/VagueOntologyExample.ttl

Fig. 1. Classes of Vagueness Metamodel

Fig. 2. Properties of Vague Elements

The metamodel is to be used by producers and consumers of semantic data, the former utilizing it to annotate the vague part of their ontologies with relevant metainformation and the latter querying this metainformation to better use them. Vagueness annotation is a manual task, meaning that knowledge engineers and domain experts should detect the vague elements, determine the relevant characteristics (type, dimensions, etc.) and populate the metamodel. How this task may be best facilitated is a subject for further research, but a good starting point would be the integration of the process within traditional semantic data production processes.

Regarding the consumption of a vagueness-aware ontology, the first benefit it has for its potential users is that it makes them aware of the existence of vagueness in the domain. This is important because vagueness is not always obvious, meaning it can easily be overlooked and cause problems. The second benefit is that the ontology's users may query each of the vague elements' metainformation and use it in order to reduce these problems.
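One way to picture the metamodel and the kind of query a consumer might run over it is the minimal sketch below. It mirrors the key aspects of Section 3.1 (vagueness type, dimensions, necessary applicability conditions, creator and applicability context) and the four kinds of vague element of Section 3.2, but every class, field and instance name in it is an assumption made for illustration, not an identifier taken from the OWL metaontology itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class VaguenessType(Enum):
    QUANTITATIVE = "quantitative"  # degree vagueness
    QUALITATIVE = "qualitative"    # combinatory vagueness


@dataclass(frozen=True)
class VagueElement:
    """Hypothetical record holding the metainformation of one vague element."""
    name: str
    vagueness_type: VaguenessType
    dimensions: Tuple[str, ...] = ()            # used for quantitative vagueness
    necessary_conditions: Tuple[str, ...] = ()  # used for qualitative vagueness
    creator: Optional[str] = None
    context: Optional[str] = None


# Simplification: in the metaontology each vague element specializes its
# counterpart in the OWL metamodel; here they only share a common base class.
class VagueConcept(VagueElement):
    pass


class VagueRelation(VagueElement):
    pass


class VagueAttribute(VagueElement):
    pass


class VagueDatatype(VagueElement):
    pass


def elements_of_type(elements, vagueness_type):
    """Return the annotated elements that have the given vagueness type."""
    return [e for e in elements if e.vagueness_type is vagueness_type]


def describe(element):
    """Render an element's metainformation as one human-readable line."""
    parts = [f"{element.name}: {element.vagueness_type.value} vagueness"]
    if element.dimensions:
        parts.append("dimensions: " + ", ".join(element.dimensions))
    if element.necessary_conditions:
        parts.append("necessary conditions: " + ", ".join(element.necessary_conditions))
    if element.creator:
        parts.append("creator: " + element.creator)
    if element.context:
        parts.append("context: " + element.context)
    return "; ".join(parts)


# Illustrative annotations, loosely based on the examples of Section 3.1.
annotations = [
    VagueConcept("StrategicClient", VaguenessType.QUANTITATIVE,
                 dimensions=("R&D budget",), creator="CEO",
                 context="R&D strategy"),
    VagueConcept("Religion", VaguenessType.QUALITATIVE,
                 necessary_conditions=("belief in supernatural beings",
                                       "ritual acts")),
]
```

A consumer could call `elements_of_type(annotations, VaguenessType.QUANTITATIVE)` to list the degree-vague elements and `describe(...)` on each to see which dimensions the creator intended. In practice the same information would be stored as OWL annotations and retrieved with a query language over the ontology; plain Python objects are used here only to keep the sketch self-contained.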
When structuring data with a vague ontology, for instance, disagreements may occur on whether particular objects are instances of vague concepts. If, however, information like the applicability conditions and contexts of these elements is known to the people who perform this task, then their possible interpretation spaces will be reduced.

Also, when vague elements are used within some end-user application, the availability of vagueness metainformation can help the system's developers in two ways: i) it will make them aware of the fact that the ontology contains vague information and thus that some of the system's output might not be considered accurate by the end-users; ii) they may use the vagueness metainformation to try to deal with that. For example, the applicability context of a vague axiom can be used in a recommendation system to explain why a particular item was recommended.

Finally, in dataset integration and evaluation scenarios, the vagueness metamodel can be used to compare ontologies' vagueness compatibility. For example, if the same vague class has different vagueness dimensions in two ontologies, then the instance membership axioms of the one might not be appropriate for the other, as they may have been defined with a different vagueness interpretation in mind. A simple query to the two ontologies' vagueness metamodel could reveal this issue.

4 Related Work

Representing metainformation about semantic data is common in the community, an example being the VoID vocabulary for describing Linked datasets [1]. However, no vagueness-related vocabularies are yet available. In a more relevant approach, an OWL 2 model for representing fuzzy ontologies is defined [3]. It focuses, however, on enabling the representation of fuzzy degrees and fuzzy membership functions within an ontology, without any information regarding the intended meaning of the fuzzy elements' vagueness or the interpretation of their degrees (e.g. the dimensions a concept membership degree covers). Thus, our approach is complementary to fuzzy ontology related works, in the sense that it may be used to enhance the comprehensibility of fuzzy degrees.

5 Conclusions and Future Work

In this paper we considered vagueness in semantic data and we demonstrated the need for, and potential benefits of, making the latter vagueness-aware by annotating their elements with a metaontology that explicitly describes the nature and characteristics of their vagueness. The idea is that even though the availability of the metainformation will not eliminate vagueness, it can reduce the high level of disagreement and low level of comprehensibility that vagueness may cause. We intend to establish this increase in semantic data comprehensibility and shareability in our future work through user-based experiments.

Acknowledgement

The research has been funded from the People Programme (Marie Curie Actions) of the European Union's 7th Framework Programme FP7/2007-2013 under REA grant agreement no 286348.

References

1. Alexander, K., Cyganiak, R., Hausenblas, M., Zhao, J.: Describing Linked Datasets: On the Design and Usage of VoID, the "Vocabulary Of Interlinked Datasets". VoID working group, 2009.
2. Alexopoulos, P., Wallace, M., Kafentzis, K., Askounis, D.: IKARUS-Onto: A Methodology to Develop Fuzzy Ontologies from Crisp Ones. Knowledge and Information Systems, 32(3):667-695, September 2012.
3. Bobillo, F., Straccia, U.: Fuzzy ontology representation using OWL 2. International Journal of Approximate Reasoning, 52(7):1073-1094, October 2011.
4. Chandrasekaran, B., Josephson, J.R., Benjamins, V.R.: What are ontologies, and why do we need them. IEEE Intelligent Systems, pp 20-26, 1999.
5. Hyde, D.: Vagueness, Logic and Ontology. Ashgate New Critical Thinking in Philosophy, 2008.
6. Vrandecic, D., Volker, J., Haase, P., Tran, D.T., Cimiano, P.: A Metamodel for Annotations of Ontology Elements. In Proceedings of the 2nd Workshop on Ontologies and Meta-Modeling, 2006.