=Paper=
{{Paper
|id=Vol-3249/paper2-FMKD
|storemode=property
|title=Representation Heterogeneity (short paper)
|pdfUrl=https://ceur-ws.org/Vol-3249/paper2-FMKD.pdf
|volume=Vol-3249
|authors=Fausto Giunchiglia,Mayukh Bagchi
|dblpUrl=https://dblp.org/rec/conf/jowo/GiunchigliaB22
}}
==Representation Heterogeneity (short paper)==
Fausto Giunchiglia, Mayukh Bagchi

Department of Information Engineering and Computer Science, University of Trento, I-38123 Povo, Trento, Italy

===Abstract===

Semantic Heterogeneity is conventionally understood as the existence of variance in the representation of a target reality when modelled, by independent parties, in different databases, schemas and/or data. We argue that the mere encoding of variance, while necessary, is not sufficient to deal with the problem of representational heterogeneity, given that it is also necessary to encode the unifying basis on which such variance is manifested. To that end, this paper introduces a notion of Representation Heterogeneity in terms of the co-occurrent notions of Representation Unity and Representation Diversity. We have representation unity when two heterogeneous representations model the same target reality, and representation diversity otherwise. In turn, this paper also highlights how these two notions get instantiated across the two layers of any representation, i.e., Language and Knowledge.

Keywords: Semantic Heterogeneity, Representation, Unity, Diversity.

===1. Introduction===

The phenomenon of Semantic Heterogeneity is conventionally understood as the existence of variance in the representation of the same target reality when computationally modelled by independent parties in database schemas or data sets [1]. The principal ramifications of semantic heterogeneity in data management include representational incompleteness and inconsistency, with the consequent loss of semantic interoperability [2]. This is a widely studied problem in increasingly emergent application scenarios like multilingual data integration (see, for instance, [3, 4]), and many partial solutions at the schema and data level have been proposed (see, for instance, [5, 6]). However, so far, there is no unifying model of why and how semantic heterogeneity manifests itself and, even less, a general solution.
We prefer to talk of Representation Heterogeneity, rather than of Semantic Heterogeneity, to emphasize the fact that heterogeneity is an intrinsic property of any representation [7], wherein different observers encode different representations of the same target reality depending on the local context [8, 9]. Representation heterogeneity is, in turn, rooted in the more general omnipresent phenomenon of World Heterogeneity. Thus, for instance, there is a need to determine whether two different (occurrences of) musical instruments are actually the same instrument. In this perspective, we define the problem of representation heterogeneity as follows. Given that (i) there are no two identical occurrences of reality, not even of the same reality, and that (ii) there are no two identical representations, not even of the same occurrence of reality, then (iii) how can we establish whether two heterogeneous representations actually represent the same reality? We argue that the current understanding of semantic heterogeneity as the ‘existence of variance’, while crucially necessary, is not sufficient. There can be no variance without a prior notion of a unifying reference taken as the basis for computing the variance itself.

∗ This research has received funding from the “DELPhi - DiscovEring Life Patterns” project funded by the MIUR (PRIN) 2017.

The Eighth Joint Ontology Workshops (JOWO’22), August 15-19, 2022, Jönköping University, Sweden.

Email: fausto.giunchiglia@unitn.it (F. Giunchiglia); mayukh.bagchi@unitn.it (M. Bagchi). ORCID: 0000-0002-5903-6150 (F. Giunchiglia); 0000-0002-2946-5018 (M. Bagchi).

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

To that end, we propose to ground the notion of representation heterogeneity in that of world heterogeneity which, in turn, we model as the co-occurrence of (World) Unity and (World) Diversity. Unity¹ models the ability to recognize two different real-world phenomena as different occurrences of the same target reality and, given Unity, Diversity models the ability to recognize the differences among them. In turn, this allows us to resolve the (also omnipresent) phenomenon of representation heterogeneity into two distinct phenomena: Representation Unity, which models the fact that two representations represent the same target reality, and, given Representation Unity, Representation Diversity, which models the differences between the representations. Finally, we model these two notions in terms of the co-occurrence of Unity and Diversity in two distinct ordered layers, i.e., Language and Knowledge. The Language layer comprises the heterogeneity arising at the conceptual and lexical-semantic levels. The Knowledge layer, instead, comprises the heterogeneity arising at the schema and data levels.

This paper is organized as follows. Section 2 introduces the notions of world and representation heterogeneity. Section 3 unwinds the notion of representation heterogeneity into the notions of Language and Knowledge unity and diversity. Section 4 concludes the paper with a short comparison with Brachman and Guarino’s proposed layering of representations.

===2. World heterogeneity===

Consider the motivating example (see Table 1) of two datasets encoding information about the same target reality: a musical instrument identified as ‘2290SDC50’. The first dataset is a record in a musical instruments catalog from Europe capturing some geophysical details of the aforementioned instrument such as production, collection, width etc.
The second dataset, instead, is a record from the instrument’s host museum in India encoding details such as its company and width. There are at least four levels which complicate the representation of the entity (with label) ‘2290SDC50’. First, the same musical instrument is conceptualized differently in the two datasets, viz. chordophone and stringed instrument respectively. Second, and non-trivially, the first dataset is in English whereas the second dataset is in Hindi. Third, for modelling the same entity, each dataset employs a different set of properties, thus essentially leading to two different descriptions. Finally, even for a common property such as the width, the values recorded are different due to the adherence to different units of measurement.

Table 1: Two heterogeneous datasets encoding information about musical instruments.

{| class="wikitable"
|+ Chordophone
|-
! List No. !! Production !! Collection !! Width !! Date
|-
| 2290SDC50 || India || SDC Museum || 162 || 26-06-1950
|}

{| class="wikitable"
|+ तंतु वाद्य (Stringed Instrument)
|-
! सूची संख्या (Inventory No.) !! कंपनी (Company) !! चौड़ाई (Width)
|-
| २२९०एसडीसी५० (2290SDC50) || कला भवन (Kala Bhawan) || ६.३८ (6.38)
|}

¹ The notion of Unity is unrelated to its namesake in OntoClean [10].

This example is a direct instantiation of the phenomenon of Representation Heterogeneity, which is conventionally understood as the existence of variance in the representation, interpretation and resulting meaning. Representation heterogeneity is pervasive, wherein, even for the same target reality, different observers encode different representations depending on the local context, purpose, focus or other factors. In turn, representation heterogeneity is rooted in the unavoidable phenomenon of World Heterogeneity. Genetic diversity allows species to adapt to changes in the environment, production diversity allows economies to adapt to changes in market dynamics, and social and cultural diversity fuel progress in society.
Heterogeneity is the key distinguishing feature of life: there will never be, e.g., two identical places or two identical individuals. Still, despite this, we are able to determine whether two heterogeneous occurrences of reality are actually occurrences of the same reality. Thus, for instance, we can determine whether (or not) two different (occurrences of) objects are two instances of a musical instrument, or whether (or not) two different (occurrences of) musical instruments are two occurrences of the same instrument. We formalize the intuitions above as follows.

World Heterogeneity: there are no two identical occurrences of the same or different realities;

Representation Heterogeneity: there are no two identical representations of the same or different occurrences of reality.

Notice how the latter is an instance of the former and, as such, is also unavoidable, as is the consequent (non-)generality of any representation [11]. It is impossible to construct a representation capable of capturing the infinite richness of the real world and also the infinite ways, provided by language, to describe the world itself. Thus, on one hand, for any chosen representation, there will always be an aspect of the world which is not captured and, on the other hand, there will always be an alternative way to represent the same aspect of the world. Based on these premises, it should be evident that the understanding of semantic heterogeneity as the ‘existence of variance’, while crucially necessary, is not sufficient to characterize the heterogeneity of representations, and even less to suggest a way to handle it. The crucial observation is that, since everything (and every representation) is different from everything else (and from any other representation), a proper notion of variance can only be given on the basis of a prior notion of a unifying reference from which the variance itself is computed.
And, once the unifying reference is defined, we also need, as a second step dependent on the first, to make precise the basis on which we compute what is different.

===3. Representation heterogeneity===

We model representation heterogeneity in terms of the co-occurrence of the two notions of Representation Unity and Representation Diversity. Representation Unity models the fact that two different representations actually represent the same target reality, e.g., two encounters with the same chordophone ‘2290SDC50’, or two encounters with a musical instrument. Given Representation Unity, namely for any two representations for which the reason for Unity has been identified, Representation Diversity models the ability to recognize their mutual differences. Thus, for instance, we can recognize that two musical instruments (unity) are a chordophone and an aerophone (diversity), or that two musical instruments have different widths. Thus, while establishing Representation Unity allows us to determine the space of what exists across multiple perceptions of the same target reality, establishing Representation Diversity allows us to determine the space of variations, e.g., the properties, of any target reality which was decided to exist. Based on this, we propose the following solution to the problem of representation heterogeneity:

For any two representations, given that (representation heterogeneity) there are no two identical representations of reality, decide whether the two representations represent the same reality.

If this is the case then we say that we have the unity of representations, otherwise we say that we have representation diversity. But how to establish this fact? We do this by recursively reducing the problem of representation heterogeneity to the problems of language and knowledge heterogeneity and, in turn, to the co-occurrence of Unity and Diversity in the Language and the Knowledge layers.
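
As a rough illustration of the decision stated above, the two records of Table 1 can be compared in two steps: unity first, diversity second. This is only a sketch under stated assumptions, not the authors' procedure. The property alignment (`ALIGNMENT`), the choice of the inventory number as the unifying reference, and the reading of the two widths as millimetres and inches are all assumptions introduced here; the paper only states that the units differ.

```python
import unicodedata

# Records transcribed from Table 1. For the Hindi record the English glosses
# given in the table are used for names and text values; the width keeps its
# original Devanagari digits.
record_en = {
    "type": "Chordophone",
    "List No.": "2290SDC50",
    "Production": "India",
    "Collection": "SDC Museum",
    "Width": "162",
    "Date": "26-06-1950",
}
record_hi = {
    "type": "Stringed Instrument",
    "Inventory No.": "2290SDC50",
    "Company": "Kala Bhawan",
    "Width": "६.३८",
}

# ASSUMPTION: hypothetical alignment of local property names to shared ones.
ALIGNMENT = {"List No.": "id", "Inventory No.": "id", "Width": "width"}


def to_ascii_digits(text: str) -> str:
    """Replace every Unicode decimal digit (e.g. Devanagari) with its ASCII form."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch for ch in text
    )


def align(record: dict) -> dict:
    """Rename aligned properties and normalise digits in all values."""
    return {ALIGNMENT.get(k, k): to_ascii_digits(v) for k, v in record.items()}


def representation_unity(a: dict, b: dict) -> bool:
    """ASSUMPTION: unity holds when the aligned identifiers coincide."""
    return align(a)["id"] == align(b)["id"]


def representation_diversity(a: dict, b: dict) -> dict:
    """Given unity, list how the two descriptions differ."""
    a, b = align(a), align(b)
    return {
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "different_values": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


INCH_TO_MM = 25.4  # exact by definition of the international inch

if representation_unity(record_en, record_hi):
    print(representation_diversity(record_en, record_hi))
    # ASSUMPTION: 162 is in millimetres and 6.38 in inches.
    width_mm = float(to_ascii_digits(record_hi["Width"])) * INCH_TO_MM
    print(round(width_mm))  # 162
```

Under the unit assumption the two widths agree once converted, which illustrates why a difference in recorded values only becomes meaningful after a common reference has been fixed.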
Let us briefly highlight how this works in practice. In the Language layer, the need for the co-occurrence of Language Unity and Language Diversity is primarily due to the different conceptual hierarchies into which (the same) target realities are modelled, in terms of Genus and Differentia [12, 13]. Thus, for instance, Koto, Dulcimer and Guitar are different while all being string instruments. Language Unity models the ability to recognize two different concepts as different occurrences of the same common concept. Thus, for instance, in the above example, we establish language unity by establishing that we have musical instruments. Language Diversity models the ability to recognize the differences between them: in the example above, that we have two different musical instruments. So, we always have both language unity and language diversity. The one which is selected depends on the level of abstraction at which we are thinking. Are we looking for two musical instruments, and we do not care which ones, or are we looking for a specific one, e.g., a Guitar?

A similar situation happens in the Knowledge layer. If in the language layer we need to define the objects we are looking for, in the knowledge layer we need to clarify the specific properties (of such objects) we are interested in. Knowledge Unity models the ability to recognize two different entities as different occurrences of the same common entity, while, for any occurrence of Knowledge Unity, Knowledge Diversity models the ability to recognize the differences between the entities. Thus, for instance, once we have decided at the language level that we are interested in Guitars, we may establish that we are interested in guitars of a certain form, or of a certain color. The selected set of properties determines which unity and diversity we are looking for.
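
The dependence of unity and diversity on the chosen level of abstraction and on the chosen properties can be made concrete with a small sketch. Everything in it is hypothetical and introduced only for illustration: the toy hierarchy `GENUS` (built from the instruments named above), the property values of the two guitars, and the function names. It should be read as one possible encoding of the idea, not as the authors' formalization.

```python
# Hypothetical toy Genus-Differentia hierarchy: each concept maps to its genus.
GENUS = {
    "Koto": "String instrument",
    "Dulcimer": "String instrument",
    "Guitar": "String instrument",
    "String instrument": "Musical instrument",
}


def ancestors(concept: str) -> list:
    """Return the concept followed by its genera, most specific first."""
    chain = [concept]
    while chain[-1] in GENUS:
        chain.append(GENUS[chain[-1]])
    return chain


def language_unity(c1: str, c2: str, level: str) -> bool:
    """Unity holds if both concepts fall under the selected level of abstraction."""
    return level in ancestors(c1) and level in ancestors(c2)


def knowledge_unity(e1: dict, e2: dict, properties: list) -> bool:
    """Unity holds if both entities carry and agree on every selected property."""
    return all(p in e1 and p in e2 and e1[p] == e2[p] for p in properties)


def same_reality(e1: dict, e2: dict, level: str, properties: list) -> bool:
    """Language unity is checked first; knowledge unity is checked given it."""
    return language_unity(e1["concept"], e2["concept"], level) and knowledge_unity(
        e1, e2, properties
    )


# Hypothetical entities; the property values are made up for the example.
guitar_a = {"concept": "Guitar", "form": "acoustic", "color": "red"}
guitar_b = {"concept": "Guitar", "form": "acoustic", "color": "black"}

print(language_unity("Koto", "Guitar", "String instrument"))        # True
print(language_unity("Koto", "Guitar", "Guitar"))                   # False
print(same_reality(guitar_a, guitar_b, "Guitar", ["form"]))          # True
print(same_reality(guitar_a, guitar_b, "Guitar", ["form", "color"]))  # False
```

The same pair of concepts or entities yields unity or diversity depending only on the two parameters passed in, which is the point of the Koto and Guitar examples above.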
The process by which we decide on representation heterogeneity will lead to different results depending on the specifics of what we are looking for, as is the case in our everyday life. We formalize this in the following notions.

Language Heterogeneity: whether there is language unity or language diversity depends on the selected level of abstraction in the Genus-Differentia hierarchy;

Knowledge Heterogeneity: given a certain language unity, whether there is knowledge unity or knowledge diversity depends on the properties selected as reference.

This leads to the following refined solution to the problem of representation heterogeneity:

For any two representations, given that (representation heterogeneity) there are no two identical representations of reality, decide whether the two representations represent the same reality, based on the selected language level of abstraction and the follow-up selected knowledge level properties.

In other words, a general solution to the problem of semantic/representation heterogeneity must be parametric on the local purpose, with the purpose being characterized in terms of two precise choices at the language and knowledge level.

===4. Conclusion - on layering representations===

Figure 1: Knowledge Representation Levels - Guarino (taken from [14]).

Our solution to the problem of semantic heterogeneity is based on a stratification of representations where, starting from the heterogeneity of the world, we articulate the unity and diversity of two representations in terms of a set of choices made at the language and knowledge level. This is not the first time a stratification of representation has been provided. Most notable is the model proposed first by Brachman [15] and later modified by Guarino [14, 16] (see Figure 1).
With respect to Brachman’s proposal, Guarino emphasized that the focus of the epistemological level was on structuring and formal reasoning and not on the formal representation of concepts, which remained arbitrary and neutral as concerns ontological commitment. He argued against this very ontological neutrality and advocated that a rigorous ontological analysis can greatly “improve the quality of the knowledge engineering process” [14]. To that end, he proposed a new knowledge representation level, termed the ontological level, positioned between the epistemological and the conceptual level (see Figure 1). The ontological level was intended as the “level of meaning” and offered primitives which “satisfy formal meaning postulates ... restrict[ing] the interpretation of a logical theory on the basis of formal ontology, intended as a theory of a priori distinctions” [14]. As part of the ontological level, several distinctions (in the form of metaproperties, e.g., sortal, rigidity) were proposed (see [16] for full details).

The stratification of representation proposed here is orthogonal to the one proposed by Brachman and Guarino. Their work was focused on the knowledge engineering process by which one would generate a certain model of reality. Our work is focused on how to solve the problem of semantic heterogeneity. The solution we propose is a stratified model of how we represent the world and of how we deal with semantic heterogeneity. Quoting from [17], representation heterogeneity should be seen as a “feature which must be maintained and exploited” and not a “defect that must be absorbed”, as it is the means by which we cope with the complexity of world heterogeneity. The work so far, as partially cited in this paper, makes us hopeful.

===References===

[1] A. Halevy, Why your data won’t mix: New tools and techniques can help ease the pain of reconciling schemas, Queue 3 (2005) 50–58.

[2] F. Giunchiglia, M.
Fumagalli, Entity type recognition – dealing with the diversity of knowledge, in: Proc. KRR, volume 17, 2020, pp. 414–423.

[3] G. Bella, L. Elliott, S. Das, S. Pavis, E. Turra, D. Robertson, F. Giunchiglia, Cross-border medical research using multi-layered and distributed knowledge, in: 10th Int. Conf. on Prestigious Applications of Intelligent Systems @ ECAI 2020, IOS Press, 2020.

[4] F. Giunchiglia, A. Zamboni, M. Bagchi, S. Bocca, Stratified data integration, in: 2nd Int. Wshop On Knowledge Graph Construction (KGCW), co-located with ESWC, 2021.

[5] C. A. Knoblock, P. Szekely, J. L. Ambite, A. Goel, S. Gupta, K. Lerman, M. Muslea, M. Taheriyan, P. Mallick, Semi-automatically mapping structured sources into the semantic web, in: Extended Semantic Web Conference, Springer, 2012, pp. 375–390.

[6] M. Leida, A. Gusmini, J. Davies, Semantics-aware data integration for heterogeneous data sources, Journal of Ambient Intelligence and Humanized Computing 4 (2013) 471–491.

[7] F. Giunchiglia, M. Fumagalli, On knowledge diversity, in: JOWO, 2019.

[8] F. Giunchiglia, M. Fumagalli, Concepts as (recognition) abilities, in: Formal Ontology in Information Systems, IOS Press, 2016, pp. 153–166.

[9] F. Giunchiglia, M. Fumagalli, Teleologies: Objects, actions and functions, in: International conference on conceptual modeling (ER 2017), Springer, 2017, pp. 520–534.

[10] N. Guarino, C. Welty, Evaluating ontological decisions with ontoclean, Communications of the ACM 45 (2002) 61–65.

[11] P. Bouquet, F. Giunchiglia, Reasoning about theory adequacy. a new solution to the qualification problem, Fundamenta Informaticae 23 (1995) 247–262.

[12] F. Giunchiglia, K. Batsuren, G. Bella, Understanding and exploiting language diversity, in: IJCAI, 2017, pp. 4009–4017.

[13] F. Giunchiglia, L. Erculiani, A. Passerini, Towards visual semantics, Springer Nature Computer Science 2 (2021) 1–17.

[14] N. Guarino, The ontological level, Philosophy and the cognitive sciences (1994).

[15] R. J. Brachman, On the epistemological status of semantic networks, in: Associative networks, Elsevier, 1979, pp. 3–50.

[16] N. Guarino, The ontological level: Revisiting 30 years of knowledge representation, in: Conceptual modeling: Foundations and applications, Springer, 2009, pp. 52–67.

[17] F. Giunchiglia, Managing diversity in knowledge, in: ECAI 2006: 17th European Conference on Artificial Intelligence, volume 141, IOS Press, 2006, p. 4.