=Paper=
{{Paper
|id=Vol-3603/Paper12
|storemode=property
|title=KOSonto: an Ontology for Knowledge Organization Systems, Their Constituents, and Their
Referents
|pdfUrl=https://ceur-ws.org/Vol-3603/Paper12.pdf
|volume=Vol-3603
|authors=Jean Noël Nikiema,Fleur Mougin,Vianney Jouhet,Stefan Schulz
|dblpUrl=https://dblp.org/rec/conf/icbo/NikiemaMJ023
}}
==KOSonto: an Ontology for Knowledge Organization Systems, Their Constituents, and Their Referents==
Jean Noel Nikiema<sup>1,2,∗</sup>, Fleur Mougin<sup>3</sup>, Vianney Jouhet<sup>4,3</sup> and Stefan Schulz<sup>5,6</sup>

# Department of Management, Evaluation and Health Policy, School of Public Health, Université de Montréal, Canada
# Centre de recherche en santé publique, Université de Montréal et CIUSSS du Centre-Sud-de-l’Île-de-Montréal, Canada
# Univ. Bordeaux, Inserm UMR 1219, Bordeaux Population Health Research Center, team AHeaD, Bordeaux, France
# CHU de Bordeaux, Pôle de santé publique, Service d’information médicale, Bordeaux, France
# Medical University of Graz, Institute for Medical Informatics, Statistics and Documentation, Graz, Austria
# Averbis GmbH, Freiburg, Germany

∗ Corresponding author. Email: jean.nikiema@umontreal.ca (J. N. Nikiema); fleur.mougin@u-bordeaux.fr (F. Mougin); vianney.jouhet@chu-bordeaux.fr (V. Jouhet); stefan.schulz@medunigraz.at (S. Schulz). ORCID: 0000-0002-9396-6423 (J. N. Nikiema); 00000-0002-7436-3010 (F. Mougin); 0000-0001-5272-2265 (V. Jouhet); 0000-0001-7222-3287 (S. Schulz)

Proceedings of the International Conference on Biomedical Ontologies 2023, August 28th-September 1st, 2023, Brasilia, Brazil. © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

===Abstract===
The structure of knowledge organization systems (KOSs) – domain vocabularies, thesauri, terminologies, classification systems, and ontologies – follows different architectural principles and semantic theories. However, many use cases require their integrated use in a given domain. Building a common framework for KOSs is therefore a prerequisite for any principled account of their use when data annotated with different KOSs are to be integrated. We propose an approach rooted in formal ontology, which aims to harmonize the description of the domain itself with the description of the representational artifacts that claim to organize and represent knowledge of this domain. We propose a transparent framework for describing KOSs with a focus on the biomedical domain. Using comprehensive and consistent terminology, we formalize what KOSs represent by introducing KOSonto, an ontology that characterizes representational artifacts on the one hand and describes the relationships to their referents in the domain of application on the other hand. KOSonto uses OWL-DL axioms and is built under BFO and IAO. It accounts for a range of elements that are characteristic of different kinds of KOSs. We illustrate how KOSonto can be used to describe typical biomedical KOSs such as ICD-10, SNOMED CT, and MeSH. Further work will improve the alignment of KOSonto to foundational ontologies and apply this framework to optimize the creation, use, and reuse of mappings between heterogeneous KOSs.

Keywords: Knowledge organization systems, formal ontologies, KOSonto

===1. Introduction===
For decades, biomedical informatics has focused on artifacts for organizing domain knowledge [1]. Terminologies, controlled vocabularies, dictionaries, thesauri, classifications, nomenclatures, and ontologies – for which we will use the overarching term “knowledge organization systems” (KOSs) [2] – vary in scope, granularity, and design principles. Most of them were created to address specific use cases, often without making their model of meaning explicit. Efforts invested in KOSs have contributed to interoperability and to a better understanding of (classes of) domain entities and the terminological units that refer to them. However, many of the current denominations for KOSs are imprecise and misleading.
In particular, the terms “Controlled vocabulary” and “Terminology systems” are misleading because they suggest that these systems mostly describe human language, although they are often applied to KOSs that clearly focus on the description of domain entities. For example, systems like ICD-10, ChEBI, the Gene Ontology, and the NCBI taxonomy are often referred to as controlled vocabularies although they do not convey any lexical information. Words such as “term”, “concept”, “ontology”, “entity”, “descriptor”, “class”, “property”, and “relation” are used inconsistently, which leads to misunderstandings, particularly in cross-disciplinary cooperation. It also raises the question of why attempts to standardize the entities of a given domain, viz. biomedicine and health, are not underpinned by a meta-level standardization of the description of the realm of representation itself, viz. its symbols and their relation to the language and the reality of the domain.

Only a few principled analyses of the domain of representation itself have been carried out. In the 1990s, MIVoc set out to standardize basic semantic notions in medical informatics [3]. Since then, library science, linguistics, philosophy, and the semantic web have fueled knowledge organization activities, without much uptake of MIVoc, which was withdrawn in 2006. Addressing the need for international cooperation, a multilingual scale standardization initiative has been proposed [4], again with only limited uptake. The result is a confusing mix of approaches, theories, technical terms, and conceptualizations that has long awaited a thorough cleanup. This cleanup is imperative given the growing importance of diverse KOSs, such as ICD-10, SNOMED CT, MeSH, and MedDRA, for biomedical knowledge management.
Each of these systems possesses a particular structure, user community, scenarios of use, and representational philosophy. The cleanup is particularly pressing where the integration and alignment of KOSs require semantic harmonization between KOSs (and data annotated with them). Indeed, KOSs have always been used in combination. Therefore, despite their structural and semantic heterogeneity, semantic links and correspondences between their elements are sought.

There is a long tradition of building bridges between biomedical KOSs. KOS alignment has been an important topic in the semantic web and knowledge graph communities. Numerous heuristics for KOS alignment/mapping/harmonization have been described [5]. In the biomedical domain, the UMLS Metathesaurus [6] has been of great benefit as the longest-running and most enduring KOS alignment effort. Additional efforts such as HeTOP [7] and BioPortal [8] are also noteworthy. These resources, driven by practical needs, support some integration of KOSs, without emphasizing their semantic particularities [9]. Only the UMLS Metathesaurus mappings are continually revised manually by domain experts, although there are some automation initiatives [10].

This paper proposes an ontology-based framework for describing KOSs themselves, together with what they denote. We propose “KOSonto”, an OWL model under Basic Formal Ontology (BFO) 2020 (footnote 1) [11] and using elements of the Information Artifact Ontology (IAO) (footnote 2) [12], which is available at https://github.com/JeanNikiema/kosonto. We believe that only a principled ontological analysis of the constituents of typical KOSs and their representational commitments makes it possible to determine the value and restrictions of different kinds of KOSs, and to predict the consequences of an alignment between KOSs of different kinds.
A clear and consistent terminology for KOSs themselves should avoid the pitfalls of divergent interpretations of ill-defined words like “ontology”, “concept”, “entity”, “property”, or “knowledge”. We deliberately avoid most of these words (or only use them in a clear context, such as “SNOMED CT concept”).

The paper is structured as follows: Section 2 presents KOSonto based on a KOS content framework; in Section 3, we describe some biomedical KOSs (ICD-10, MeSH, SNOMED CT, and a small HL7 value set) according to KOSonto; and we discuss our main findings in Section 4.

===2. KOSonto – The ontology of knowledge organization systems===
====2.1. Kinds and content of knowledge organization systems====
Three characteristics support treating KOSs as a meaningful ontological category. First, we consider all KOSs to be information content entities (ICEs) according to IAO. ICEs are immaterial but inhere in one or more material bearers [13]. For instance, the content of a KOS (e.g., SNOMED CT), just like that of a work of fiction (e.g., Victor Hugo’s “Les Contemplations”), both being ICEs, can be stored in many electronic storage systems at the same time. In addition, the constituents of KOSs are also ICEs, e.g., the concept 195967001 – “Asthma (disorder)” or the poem “Vere novo”, respectively. Secondly, KOSs and their constituents are linked to referents (detailed in the following subsection) in a real or fictional world they intend to represent. Finally, they are artifacts and as such created by humans. With these three characteristics, KOSs are representational artifacts [14] constituted by a number of representational units (RUs), which denote particular referents. Accordingly, the backbone of a common KOS framework requires a clear characterization of both referents and RUs. KOSonto uses OWL-DL axioms and is classifiable using the HermiT reasoner [15].
OWL-DL was chosen for its popularity, tool support, and human understandability, but also for its appropriateness for representing a mostly static domain. KOSonto includes a typology of possible referents of RUs and their ontological foundation, the ontological nature of the RUs themselves, the symbols used in KOSs, and a typology of KOSs.

====2.2. The referent as the kind of entity of what is represented in KOSs====
A referent is what is represented in a KOS, more precisely the thing in the world that is denoted by an RU of a KOS. Anything can be a referent, as the only requirement for being a referent is to be denoted. “Referent” is therefore not a meaningful ontological category and is not represented in KOSonto. KOSonto is based on three fundamental categories: particular entities, type entities, and class entities.

Footnote 1: https://basic-formal-ontology.org/BFO-2020/

Footnote 2: https://github.com/information-artifact-ontology/IAO/

Particular entities (footnote 3) are concrete in space and time, have objective existence and ontological significance, and exist independently of human perception or language [16]. We introduce the class Particular as the disjunction of bfo:Continuant and bfo:Occurrent.

Type entities (or types) correspond to repeatable, or instantiable (often qualified as abstract) entities. When a type is instantiated by a specific particular, this particular can be referred to as an instance of this type. We divide them further into:
* Universals as defined by Aristotle: encompass anything that can be instantiated by particulars. Aristotelian universals are immanent, i.e. they exist in their instances, which precludes universals without instances such as unicorns or intergalactic travel.
* Types by intension: represent entities of meaning given by means of a formal definition, comparable to the characteristic function in set theory. They do not necessarily extend to things in reality. They allow for defining, e.g., a unicorn as being a pink horse with a single horn, without however claiming its existence. Intensional meanings have classes of particulars as their extensions, including empty classes.
* Types by extension: depend on their particular members, without further descriptions. For example, the set {America, Europe, Africa, Asia, Antarctica, Oceania} necessarily and sufficiently corresponds to what is understood by “Continent”. However, such representations are not very common in the biomedical domain.
* Cognitive types: correspond to mental representations rooted in language and sensory perceptions, regardless of any concrete correlation. A cognitive representation of a particular or conceptual unicorn or the use of the word “centaur” does not mean that unicorns or centaurs exist.
We introduce FictionalType as a subclass of CognitiveType for those things that exist in a fictional world only.

Class entities (or classes) are central elements of description logics and are sometimes considered equivalent to types [17]. Their set-theoretic semantics emphasizes the importance of classes of particulars. This is why we grant them a prominent status as siblings of Particular entities and Type entities and fully define them as the extension of a type that can have only particulars as members:
: Class equivalentTo (extends some type) and (has_member only Particular) (1)
In KOSonto, classes are always implemented as classes of OWL particulars, ensuring a coherent and well-defined framework. This sidesteps the logical conundrum presented by Russell’s paradox [18] while keeping the model operational. Thus, the definition of classes is provided by the set of characteristics of their actual or potential members, the undefined cognitive meaning, or the universal properties inherent in all their members. Whether a class is currently empty or is supposed to be empty is not a unifying criterion.
For example, the classes that extend the types Unicorn, Centaur, and Elf are not identical. Indeed, classes may exist without any particular having all the characteristics required for membership. Classes are defined if their necessary and sufficient characteristics allow a particular entity to be recognized as a member of this class; otherwise, the classes are considered primitive.

Footnote 3: To avoid confusion, we highlight that particular entities are modelled as OWL individuals (A-box elements), but types are also modelled as A-box entities.

Summing up, types have actual or at least hypothetical instances; the instances of a type are the members of the class it extends to. In OWL, the operator to express class membership is, rather confusingly, rdf:type (footnote 4). OWL requires a bi-partition between classes (T-box entities) and individuals (A-box entities). To understand our model, it is therefore important to distinguish the ontological notion of particular as introduced above from the technical notion of “OWL individual”. Note that all types in the ontological sense are modelled in KOSonto as OWL individuals, along with particulars proper. KOSonto introduces the object property instance_of as the relation between a particular and a type in the above sense, and is_a as the transitive relation between two types (modelled as A-box entities, i.e. OWL individuals). Our example ontology illustrates the parallelism between types and classes as follows. The axiom asserting that a particular that instantiates a type T<sub>1</sub> also instantiates T<sub>2</sub> if T<sub>1</sub> is_a T<sub>2</sub> is expressed by the property inclusion “instance_of ∘ is_a subPropertyOf instance_of”.
We have the A-box entities (OWL individuals) Horse<sub>type</sub>, Vertebrate<sub>type</sub> and Animal<sub>type</sub> as members of AristotelianUniversal, on which an OWL reasoner (with A-box reasoning) computes the following expected inferences (footnote 5):

{| class="wikitable"
! Statements on OWL individuals !! Reasoner inference
|-
| Bucephalus; Facts: instance_of Horse<sub>type</sub> (2) || Bucephalus; Facts: instance_of Vertebrate<sub>type</sub> (5)
|-
| Horse<sub>type</sub>; Facts: is_a Vertebrate<sub>type</sub> (3) || Bucephalus; Facts: instance_of Animal<sub>type</sub> (6)
|-
| Vertebrate<sub>type</sub>; Facts: is_a Animal<sub>type</sub> (4) || Horse<sub>type</sub>; Facts: is_a Animal<sub>type</sub> (7)
|}

We then define OWL classes (T-box entities) based on the hierarchy modeled in the A-box:

{| class="wikitable"
! Statement !! Reasoner inference
|-
| Horse equivalentTo instance_of value Horse<sub>type</sub> (8) || Horse subClassOf Animal (11)
|-
| Vertebrate equivalentTo instance_of value Vertebrate<sub>type</sub> (9) || Vertebrate subClassOf Animal (12)
|-
| Animal equivalentTo instance_of value Animal<sub>type</sub> (10) ||
|}

Thus, we represent the hierarchical structure of the ontology at the A-box level underneath the type hierarchy.

In KOSonto, we introduce FictionalType subClassOf CognitiveType. While FictionalType does not require specifying the kind of instances of fictional types, the FictionalType class only allows instances of the InformationContentEntity (ICE) type. We can therefore distinguish between types of particulars: (i) those that are not ICEs, (ii) those that are ICEs, and (iii) those that are uncommitted. Examples are, respectively, horses; centaurs (mythical human-horse hybrids); and sumxus (animals about which science disagrees as to whether they are purely mythical or really exist) or green horses (a potential future breeding result).
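Returning to the Bucephalus example, inferences (5) to (7) follow from just two rules: the transitivity of is_a and the property inclusion instance_of ∘ is_a subPropertyOf instance_of. The following is a minimal sketch in plain Python, not an OWL reasoner and not part of KOSonto. The triple encoding, the function names <code>saturate</code> and <code>extension</code>, and the spelling of type individuals with a "_type" suffix are assumptions made for illustration only.

```python
# Asserted A-box facts (2)-(4), encoded as (subject, relation, object) triples.
# The "_type" suffix stands in for the subscript used in the text (assumption).
asserted = {
    ("Bucephalus", "instance_of", "Horse_type"),
    ("Horse_type", "is_a", "Vertebrate_type"),
    ("Vertebrate_type", "is_a", "Animal_type"),
}


def saturate(facts):
    """Apply two rules until no new fact appears.

    Rule 1: is_a is transitive.
    Rule 2: instance_of followed by is_a yields instance_of (property chain).
    """
    closed = set(facts)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (s2, p2, o2) in closed:
                # Both rules share one shape: X -p-> Y and Y -is_a-> Z give X -p-> Z.
                if o == s2 and p2 == "is_a" and p in ("is_a", "instance_of"):
                    new.add((s, p, o2))
        if new <= closed:
            return closed
        closed |= new


def extension(type_name, facts):
    """Members of the class that extends a type: everything that is instance_of it."""
    return {s for (s, p, o) in facts if p == "instance_of" and o == type_name}


closed = saturate(asserted)
inferred = closed - asserted

for fact in sorted(inferred):
    print(fact)
```

In this toy encoding, <code>extension("Animal_type", closed)</code> plays the role of the defined class in statement (10): membership is read off the saturated instance_of facts rather than asserted separately.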
At the A-box level, these entities can nevertheless be linked together, using formal relations such as is_a or instance_of, but also informal relations such as is_narrower_than:
: Individual: Sumxus<sub>type</sub>; Facts: is_a Vertebrate<sub>type</sub> (13)
: Individual: GreenHorse<sub>type</sub>; Facts: is_a Horse<sub>type</sub> (14)
: Individual: Centaur<sub>type</sub>; Facts: is_narrower_than Vertebrate<sub>type</sub> (15)
: Individual: Chiron; Facts: instance_of Centaur<sub>type</sub> (16)
with Centaur<sub>type</sub> being a member of FictionalType, while Sumxus<sub>type</sub> or GreenHorse<sub>type</sub> could just be members of CognitiveType or even TypeByIntension. The latter case applies to the scenario where sufficient defining criteria exist, as in the case of the green horse:
: GreenHorse equivalentTo instance_of value GreenHorse<sub>type</sub> (17)
: GreenHorse equivalentTo Horse and has_proper_part some (Hair and bearer_of some GreenColor) (18)
GreenHorse<sub>type</sub> is a member of type, without commitment to the existence of non-informational instances in reality, unlike what would hold if it were introduced as a member of AristotelianUniversal. It is therefore left open whether the defined class (18) has members. This degressive description is crucial, considering that in certain cases, fictional entities, metaphors, or analogies may be introduced in KOSs to model specific conditions or processes and facilitate the understanding of intricate phenomena. The important concept “Chi” in traditional medicine systems illustrates the need for such a specification.

Footnote 4: http://www.w3.org/1999/02/22-rdf-syntax-ns#type

Footnote 5: In the equations, classes are depicted in italics, KOSonto relations and A-box entities are represented in bold, and other parts of the OWL syntax are shown in normal font. Names of OWL individuals that symbolize types have “type” as subscript.
Other examples are mental disorders whose understanding evolves over time in line with new research, clinical insights, and social acceptance (e.g., malleus maleficarum or neurasthenia) or whose existence is denied.

One might argue that types, in the broadest sense, also include relational objects such as predicates or relations, just like mathematical objects in general. A discussion of this is beyond the scope of this paper, in which we refer only to ICEs (cf. subsection 2.3).

====2.3. The representational unit as an atomic representation in KOSs====
Having presented the range of possible referents for KOS elements – with a digression beyond realism in order to demonstrate the model’s flexibility – we now turn to the RUs themselves and their grounding in KOSonto. All RUs are ICEs in the sense of IAO, i.e. generically dependent continuants. What can be considered an atomic form of representation in KOSs varies. We propose different kinds of RUs, using the referent as the basis for categorization. By answering the question from the perspective of a KOS builder (Once we have identified the referent, how can we represent it in a KOS?) and from that of a KOS user (For a specific referent, what is its atomic form of representation in a KOS?), we can identify three overlapping and inclusive levels of RUs. Each of these atomic representations can point to a referent.

The minimal requirements for RUs are expressed by (19). This corresponds to level 1, with a term being a human-readable word or phrase, belonging to a domain-specific vocabulary. If it acts as a preferred label, it must be unique in the KOS. Labels are often artificially constructed (e.g., “Biopsy of head and neck structure”) but self-explanatory and unambiguous, regardless of their use in human communication. Although a label can act as a unique identifier, alphanumeric identifiers are more common, in addition to the preferred label.
: RepresentationalUnit equivalentTo InformationContentEntity
:: and proper_part_of some KnowledgeOrganizationSystem
:: and not (has_proper_part some RepresentationalUnit)
:: and ((has_proper_part some (Literal and bearer_of some IdentifierRole) and has_proper_part some (NaturalLanguageTerm and bearer_of some PreferredLabelRole)) or (has_proper_part some OWL_ClassExpression)) (19)

In most cases, KOSs offer more than one term per RU (level 2), and they play different roles. KOSonto distinguishes PreferredLabelRole from EntryTermRole with the subclasses ExactSynonymRole, CloseSynonymRole, AmbiguousSynonymRole, HyponymRole, EllipticSynonymRole. Exact synonyms have the same meaning as the preferred label, and close synonyms have a very similar meaning in the context of the use of this RU. Ambiguous synonyms belong to more than one RU in a KOS, e.g., “lead” for an electric contact or for the chemical element “Pb”.

According to the definition provided in [19] for composite representations, KOSonto introduces three composite representations: definitions, descriptions, and exemplifications. Definitions provide sufficient and necessary criteria whereas descriptions (also known as elucidations, e.g., in BFO 2020 [11]) provide only necessary criteria. Exemplifications are descriptions by means of concrete examples. RUs with composite representations are introduced in KOSonto as ExplainedRepresentationalUnit:
: ExplainedRepresentationalUnit equivalentTo RepresentationalUnit
:: and bearer_of some CompositeTextualRepresentationRole
:: and has_proper_part some Literal (20)

Another form of representation is an axiomatic representation. RUs may have composite representations as axioms described in a formal language:
: FormalRepresentationalUnit equivalentTo RepresentationalUnit
:: and bearer_of some DefiniendumRole
:: and proper_part_of some LogicalAxiom (21)
Logical axioms are constituted by logical constructors (symbols and literals) respecting a specific syntax and grammar, e.g., OWL syntax.
Logical axioms can be the only atomic representation available for a referent (level 1), or provide, with or without textual composite representations, additional information regarding an RU’s referent(s) (level 3).

We have excluded from our analysis those KOS components that denote relational entities, i.e. in their broadest sense n-ary predicates. Whether they are “first-class” RUs or mere connectors between RUs is controversial. KOSonto includes them as subclasses of BinaryPredicate and TernaryPredicate and further elaborates on subclasses of these property classes in terms of hierarchy-building predicates, ontological predicates [20], and predicates according to domain / range restrictions in terms of types, particulars, or literals. For example, OWL object properties (OWL_ObjectProperty) are ontological relations that hold between particulars, and OWL datatype properties (OWL_DataTypeProperty) hold between particulars and literals. Ternary relations are not supported by OWL [21] and rarely occur in ontologies, with BFO 2020 being a notable exception [11].

===3. Application to known biomedical KOSs===
In this section, we briefly apply our framework to some reference biomedical KOSs.

ICD-10 is based on a strictly tree-shaped is_narrower_than hierarchy (footnote 6). The disjointness of sibling RUs is a fundamental paradigm, expressed by is_disjoint_with. Exceptions are RUs named “others”, which can be logically described as the complement of the union of their siblings. With existing clinical conditions as their referents, ICD-10 RUs can be described as denoting Aristotelian universals, apart from some examples of epistemic intrusions, such as H40.0 – “Glaucoma suspect”. Such ICD-10 RUs could be seen as instantiating CognitiveType or alternatively ICE (denoting a practitioner’s knowledge about a patient).
ICD-10 also exhibits its own type-to-type relation named “exclusion”, which restricts the meaning of a given RU (e.g., Diabetes mellitus excludes known cases of Diabetes in pregnancy). The knowledge about a patient’s condition, as a relevant coding criterion, sheds light on the unclear nature of the referents (diseases, signs, symptoms, or diagnoses) [22]. Finally, ICD-10 exhibits terms described by EllipticSynonymRole, i.e. terms that implicitly require their hypernym to be human-understandable. An example is “Lip” as a label for D10.0, which is contextualized by the parent RU D10 – “Benign neoplasm of mouth and pharynx” so that human users intuitively interpret D10.0 as “Benign neoplasm of the lip”.

MeSH has an informal tree structure, which can best be represented by the is_narrower_than relation – narrower in meaning as defined by SKOS – because the hierarchy encompasses both taxonomic and mereological aspects. Its RUs represent topics in biomedical publications, which are best interpreted as instances of CognitiveType. In the context of MeSH trees, RUs have a tree number as an identifier (ID). However, the same term may belong to several trees with different identifiers. Tree-related RUs and tree-independent RUs have to be distinguished. The latter (characterized by a separate unique identifier, or UID) can be interpreted as the hypernyms of the former. The hierarchical structure of MeSH is therefore more than just the overlay of trees because there is no transitivity between branches of superposed trees via a shared descriptor. As descriptors are ambiguous, there are no unique labels. For example, the descriptor “Nose” has the tree IDs A01.456.505.733, A04.531, and A09.531, as well as the UID D009666. The coverage of entry terms and free-text definitions is large. Due to the limited granularity of MeSH, many entry terms are not synonyms but hyponyms.

SNOMED CT is a KOS based on OWL-EL and can thus be seen as a hierarchy of classes.
All its RUs are FormalRepresentationalUnit as they all have axiomatic representations. SNOMED CT allows post-coordination and thus also exhibits composite representations as level 1 RUs. SNOMED CT concepts can be seen as extensions of AristotelianUniversal or TypeByIntension, although in some cases such as 249820005 – “Absence of toe (finding)”, a full logical definition, covering the intended meaning of this RU, is not given due to the lack of negation support in OWL-EL. For each RU, most terms are synonyms or near-synonyms, with the fully specified name in each language being the preferred label. Textual composite representations are rare. Another particularity is that, in SNOMED CT, everything is named “concept”, even those RUs that correspond to binary predicates and which are represented as OWL object properties.

The HL7 value set hl7VS-appointmentReasonCodes, like many other so-called value sets of the HL7 standard, is presented here as an example of a minimalist form of a KOS, consisting only of a flat list of RUs, here ROUTINE, WALKIN, CHECKUP, FOLLOWUP, and EMERGENCY. The labels correspond to the ID of each RU. All RUs in this KOS have free-text definitions. The RUs denote Aristotelian universals as they can all be instantiated by a particular appointment.

Footnote 6: Most of it may be interpreted as is_a.

===4. Discussion===
Related work. KOSonto is the first strictly ontology-based attempt to lay a foundation for a principled ontological account of KOSs, in order to support interoperability and data integration in a domain characterized by the use of numerous KOSs with different structures, semantics, partly overlapping content, and diverging use cases. KOSonto is built under BFO and IAO. In IAO, referents are restricted to particulars, following BFO as an ontology of particulars, although BFO has never clearly committed to the representation of portions of reality beyond particulars.
However, not all biomedical KOSs of interest are committed to ontological realism [23] – as claimed for OBO Foundry ontologies – and not all discourses in science and health have only particulars or classes of particulars as referents [24]. KOSonto addresses this by extending BFO beyond the Continuant / Occurrent bipartition, by introducing the common parent Particular, which is then juxtaposed to Type and Class. The consideration of types as “first-class citizens”, besides particulars, can also be found in other upper-level approaches, e.g., Lowe’s four-category ontology [25], as well as in the foundational ontologies GFO and UFO [26]. We have also granted this role to Class because it is a central element of KOSs and an implementation-specific construct in KOSs. Classes facilitate categorization and organization and can be seen as a specific manifestation or implementation of a type, not the type itself. Despite the focus on BFO and IAO in this work – because of the need to represent ICEs – we aim at the compliance of our approach with other ontological frameworks, including the Ontology of General Topology (OGT), and the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE). KOSonto deliberately does not use domain-based frameworks as support, such as the semiotic triad [27] or “Ontoterminology” [28], as it prioritizes practical relevance for KOS harmonization over attempts to achieve a painful reconciliation of diverging philosophical views.

Main findings. KOSonto and its application to known KOSs highlight how KOSs of the most diverse types can be described: from small, non-hierarchical to mono- and multi-hierarchical systems, from informal to formal ones, and from reality-based to language-centered ones.
Besides providing descriptions for the architectural constituents of KOSs, KOSonto particularly accounts for a typing of the different kinds of referents and their relevance for KOSs: types whose instances are ICEs, types whose instances are particulars, types that do not commit to either being hypothetical or real, non-empty classes of particulars or information entities, and non-ontological relations such as is_broader_than. Formal KOSs can be described as KOSs that contain some FormalRepresentationalUnit and are expressed in a language based on logic, such as OWL, OBO syntax, or FOL.

However, it is important to note that KOSonto has so far centered on what kind of entity is referenced, and not on the “targeted referents”, i.e. what particular entity proper is meant by the use of an RU in a health record [29]. The changing nature and ambiguity of the “targeted referent” are facets of the practical use of RUs, which our framework has not specifically addressed for now. However, by clearly distinguishing different types of referents, it should be compatible with the ideas underlying the cited “Referent Tracking” paradigm. For example, in John Doe’s record, the ICD-11 code “2D10.Z” – within an instance of the FHIR resource Condition – has a particular cancer in John Doe’s thyroid gland as its referent, which is an instance of Thyroid cancer<sub>type</sub> (equally a member of the class Thyroid cancer, which is the extension of Thyroid cancer<sub>type</sub>). In contrast, in Jane Doe’s record, the same ICD-11 code is present, but in a FHIR Condition instance where the slot verificationStatus is filled by Refuted. Here, the referent is not a particular entity (there is no cancer in Jane Doe’s thyroid) but the type Thyroid cancer<sub>type</sub> itself, referred to in a negation statement (an ICE instance). This points to the potential of KOSonto in supporting the work on referent tracking.
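The contrast between the two records can be written down as a small decision rule. The sketch below is an illustration under stated assumptions, not part of KOSonto and not a FHIR implementation: the <code>Condition</code> class is a minimal stand-in that keeps only two fields, the lookup table <code>CODE_TO_TYPE</code> and the name "ThyroidCancer_type" are hypothetical, and only the verification status values "confirmed" and "refuted" are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """Minimal stand-in for a FHIR Condition; only the two fields used here."""
    code: str
    verification_status: str


# Hypothetical lookup from a KOS code to the name of the type its RU denotes.
CODE_TO_TYPE = {"2D10.Z": "ThyroidCancer_type"}


def referent_kind(condition):
    """Return what the coded statement refers to, following the two cases in the text.

    "confirmed": the referent is a particular that is an instance of the type.
    "refuted":   the referent is the type itself, mentioned in a negation statement.
    Other status values are left undecided on purpose and raise an error.
    """
    type_name = CODE_TO_TYPE[condition.code]
    status = condition.verification_status.lower()
    if status == "confirmed":
        return ("particular", type_name)
    if status == "refuted":
        return ("type", type_name)
    raise ValueError(f"status not covered by this sketch: {status}")


john = Condition(code="2D10.Z", verification_status="confirmed")
jane = Condition(code="2D10.Z", verification_status="Refuted")

print(referent_kind(john))
print(referent_kind(jane))
```

The same code thus yields two different kinds of referent depending on context, which is the distinction between the kind of entity referenced and the targeted referent.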
KOSonto also addresses the complex topic of identifiers, lexical features, and textual composite representations of RUs in KOSs. While many classical ontologies are not concerned with lexical features of RUs beyond unique IDs and human-readable labels (e.g., most Gene Ontology classes do not have synonyms), other KOSs (e.g., MeSH) focus on a high coverage of lexical units such as variants, synonyms, and entry terms. When describing KOSs, we consider lexical properties such as labels, synonyms, and textual composite representations as orthogonal to the ontological aspects. Large parts of SNOMED CT are rich in ontology axioms and also in synonyms and entry terms. Textual composite representations are only present in a small part of SNOMED CT. On the other hand, there are small KOSs, such as the HL7 value set described above, which lack both ontological and lexical richness and are limited to unique labels and textual composite representations. This shows that an easy, mono-axial categorization of KOSs, e.g., in the sense that axiom-rich ontologies are at one end of the spectrum and lexicon-rich informal thesauri at the other, is not satisfactory. Further work will be required to align KOSonto with descriptions of lexical features of KOSs, such as OntoLex-Lemon [30].

Limitations. We have intentionally left out all aspects of metadata (provenance, version, editorial notes, authors, etc.), as there are already initiatives such as the PROV Ontology (https://www.w3.org/TR/prov-o/) or the Metadata vocabulary for Ontology Description and publication (MOD) [31]. We deliberately did not elaborate on the quality issues of KOSs either. In particular, the mismatch between the labeling and the implicit meaning of KOSs in a particular scenario of use is a known issue [32], notably due to numerous exclusion rules in classification systems such as ICD-10, so that labels can no longer be interpreted literally. There is also the problem of fuzzy and even misleading labeling of key concepts such as “Clinical finding” or “Qualifier value” in SNOMED CT [33]. Another quality issue is the misuse of formal languages such as OWL to express thesaurus-style content, driven for example by the popularity of the Protégé ontology editor, which tempts users to create frame-like knowledge models without being aware of the far-reaching consequences of logical inference, cf. [34] for the NCI thesaurus. Similar issues would arise in implementing classification systems like ICD-10 in OWL [35]. A detailed analysis of KOS quality issues is currently not in the scope of KOSonto. On the other hand, our decision to use OWL for the description of KOSonto limited the expressivity of the ontology. This is a pragmatic compromise by the authors, who recognize that, despite the reasons mentioned above for using OWL, even OWL is not always well understood and implemented in its full expressivity.

5. Conclusion and future work

By developing KOSonto, we responded to the need for a principled analysis and description of KOSs in the biomedical field. The modeling principles of KOSonto are significant, as they lay the foundation for a deeper understanding of how biomedical language and discourse, biomedical entities in reality, and representational artifacts are interconnected. Given that KOSs are an extremely heterogeneous class of knowledge artifacts, built by different communities, for different purposes, on different knowledge organization traditions, and using different architectures, it is difficult to foresee a convergent evolution in the near future. Mapping between different KOSs therefore becomes inevitable and demands a common framework.
The proposed ontology not only characterizes the representational artifacts themselves but also delves into their relationships with a wide range of referents, spanning from real entities to hypothetical and even fictional entities, all of which hold relevance within healthcare and life science discourse. Moving forward, the next crucial phase entails evaluating the suitability of the KOSonto framework for formally describing and facilitating mapping and harmonization activities across diverse KOSs. This evaluation will contribute to the ongoing search for common ground and improve the effectiveness of creating, using, and reusing mappings between heterogeneous KOSs.

6. Acknowledgements

We would like to express our deepest gratitude to the CRESP research center, for their support and provision of accommodations during the development of this article.

References

[1] S. Schulz, J.-M. Rodrigues, et al., Interface terminologies, reference terminologies and aggregation terminologies, Stud Health Technol Inform 245 (2017) 940–944.
[2] A. Isaac, E. Summers, SKOS simple knowledge organization system primer: W3C working group note 18, 2009. URL: https://www.w3.org/TR/skos-primer/.
[3] Medical Informatics Vocabulary (MIVoc). iTeh standards store, 1997. URL: https://standards.iteh.ai/catalog/standards/cen/beb23db9-36ca-4283-aaad-79ff90535f0f/env-12017-1997.
[4] F. Dhombres, J. Charlet, et al., Knowledge representation and management, it’s time to integrate, Yearb Med Inform 26 (2017) 148–151.
[5] J. N. Nikiema, V. Jouhet, et al., Integrating cancer diagnosis terminologies based on logical definitions of SNOMED CT concepts, J Biomed Inform 74 (2017) 46–58.
[6] A. T. McCray, S. J. Nelson, The representation of meaning in the UMLS, Methods Inf Med 34 (1995) 193–201.
[7] HeTOP, CISMeF, 1997. URL: https://www.hetop.eu/hetop/.
[8] N. F. Noy, N. H. Shah, et al., BioPortal: ontologies and integrated data resources at the click of a mouse, Nucleic Acids Res 37 (2009) W170–W173.
[9] L. Zheng, Z. He, et al., A review of auditing techniques for the UMLS, J Am Med Inform Assoc 27 (2020) 1625–1638.
[10] G. Bajaj, V. Nguyen, et al., Evaluating biomedical word embeddings for vocabulary alignment at scale in the UMLS Metathesaurus using siamese networks, in: Proc 3rd Workshop on Insights from Negative Results in NLP, 2022, pp. 82–87.
[11] J. N. Otte, J. Beverley, A. Ruttenberg, Basic Formal Ontology, Appl Ontol (2022) 1–27.
[12] B. Smith, W. Ceusters, Aboutness: towards foundations for the information artifact ontology, in: Proc of the 6th Intl Conf on Biomed Ontologies, 2015, pp. 1–5.
[13] E. M. Sanfilippo, Ontologies for information entities, Appl Ontol 16 (2021) 111–135.
[14] B. Smith, W. Kusnierczyk, et al., Towards a reference terminology for ontology research and development in the biomedical domain, in: CEUR Proc, volume 222, 2006, pp. 57–65.
[15] B. Glimm, I. Horrocks, et al., HermiT: an OWL 2 reasoner, Journal of automated reasoning 53 (2014) 245–269.
[16] T. Sider, Ontological realism, Metametaphysics (2009) 384–423.
[17] C. M. Fonseca, J. P. A. Almeida, et al., Multi-level conceptual modeling: Theory, language and application, Data & Knowledge Engineering 134 (2021) 101894.
[18] A. D. Irvine, H. Deutsch, Russell’s paradox (1995).
[19] R. Arp, B. Smith, et al., Building ontologies with BFO, MIT press, 2015.
[20] B. Smith, W. Ceusters, et al., Relations in biomedical ontologies, Gen Biol 6 (2005) 1–15.
[21] R. Hoehndorf, A. Oellrich, et al., Relations as patterns: bridging the gap between OBO and OWL, BMC Bioinformatics 11 (2010) 441.
[22] S. Schulz, J.-M. Rodrigues, et al., What’s in a class? Lessons learnt from the ICD–SNOMED CT harmonisation, Stud Health Technol Inform 245 (2014) 1038–1042.
[23] D. Chalmers, Ontological anti-realism, Metametaphysics: New essays on the foundations of ontology (2009) 77–129.
[24] S. Schulz, M. Brochhausen, et al., Higgs bosons, mars missions, and unicorn delusions, in: Proc 2nd Intl Conf on Biomedical Ontologies. CEUR Proc, volume 833, 2011, pp. 183–189.
[25] E. J. Lowe, The four-category ontology, Clarendon Press, 2005.
[26] S. Borgo, A. Galton, et al., Foundational ontologies in action, Appl Ontol 17 (2022) 1–16.
[27] P. T. Raggatt, The dialogical self and thirdness: A semiotic approach to positioning using dialogical triads, Theory & Psychology 20 (2010) 400–419.
[28] C. Roche, Ontoterminology, in: Proc 8th LREC, 2012, pp. 2626–2630.
[29] W. Ceusters, The place of referent tracking in biomedical informatics, in: Terminology, Ontology and their Implementations, Springer, 2022, pp. 39–46.
[30] J. Bosque-Gil, J. Gracia, et al., The OntoLex Lemon lexicography module. Final community group report, 2019.
[31] B. Dutta, A. Toulet, et al., New generation metadata vocabulary for ontology description and publication, in: Metadata and Semantic Research, Springer, 2017, pp. 173–185.
[32] M. Kreuzthaler, M. Brochhausen, et al., Linguistic and ontological challenges of multiple domains contributing to transformed health ecosystems, Front Med 10 (2023) 1073313.
[33] S. Schulz, R. Cornet, et al., Consolidating SNOMED CT’s ontological commitment, Appl Ontol 6 (2011) 1–11.
[34] S. Schulz, D. Schober, et al., The pitfalls of thesaurus ontologization–the case of the NCI thesaurus, in: AMIA Annu Symp Proc, 2010, pp. 727–731.
[35] A. Rector, S. Schulz, et al., On beyond Gruber: “Ontologies” in today’s biomedical information systems and the limits of OWL, J Biomed Informatics 100 (2019) 100002.