KR-MED 2006 "Biomedical Ontology in Action"
November 8, 2006, Baltimore, Maryland, USA

Towards a Reference Terminology for Ontology Research and Development in the Biomedical Domain

1 Barry Smith, Ph.D., 2 Waclaw Kusnierczyk, M.D., 3 Daniel Schober, Ph.D., 1 Werner Ceusters, M.D.

1 Center of Excellence in Bioinformatics and Life Sciences, Buffalo NY/USA
2 Department of Computer and Information Science, NTNU, Trondheim, Norway
3 European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK

phismith@buffalo.edu, waku@idi.ntnu.no, schober@ebi.ac.uk, ceusters@buffalo.edu

Ontology is a burgeoning field, involving researchers from the computer science, philosophy, data and software engineering, logic, linguistics, and terminology domains. Many ontology-related terms with precise meanings in one of these domains have different meanings in others. Our purpose here is to initiate a path towards disambiguation of such terms. We draw primarily on the literature of biomedical informatics, not least because the problems caused by unclear or ambiguous use of terms have been there most thoroughly addressed. We advance a proposal resting on a distinction of three levels too often run together in biomedical ontology research: 1. the level of reality; 2. the level of cognitive representations of this reality; 3. the level of textual and graphical artifacts. We propose a reference terminology for ontology research and development that is designed to serve as a common hub into which the several competing disciplinary terminologies can be mapped. We then justify our terminological choices through a critical treatment of the ‘concept orientation’ in biomedical terminology research.

PREAMBLE

Ever since the invention of the computer, scientists and engineers have been exploring ways of ‘modeling’ or ‘representing’ the entities about which machines are expected to reason. But what do ‘modeling’ and ‘representing’ mean? What is a ‘conceptual model’ or an ‘information model’ and how can they and their components be unambiguously described?

Two questions here arise: To what do expressions such as ‘concept’, ‘information’, ‘knowledge’, etc. precisely refer? And what is it to ‘model’ or ‘represent’ such things? If information and knowledge themselves consist in representations, then what could an information representation or a knowledge representation be? There is, to say the least, some suspicion of redundancy here.

As we have argued elsewhere, the term ‘concept’ is marked in a peculiarly conspicuous manner by problems in this regard.[1] But the problem of multiple conflicting meanings arises also in regard to other terms, such as ‘class’, ‘object’, ‘instance’, ‘individual’, ‘property’, ‘relation’, etc., all of which have established, but unfortunately non-uniform, meanings in a range of different disciplines.

Among philosophical ontologists, the term ‘instance’ means an individual (for example this particular dog Fido), which is an instance of a corresponding universal or kind (dog, mammal, etc.). In OWL, ‘instance’ means ‘element’ or ‘member’ of a class (where ‘class’ means ‘general concept, category or classification … that belongs to the class extension of owl:Class’[2]).

Standardization agencies such as ISO, CEN and W3C have been of little help in engendering cross-disciplinary uniformity in the use of such terms, since their standards are themselves directed towards specific communities. Standardization efforts under the auspices of W3C or UML or Dublin Core, too, have not addressed these problems. For while OWL-DL, for example, has a rigorously defined semantics,[3] this does not by any means guarantee that an ontology formulated using OWL-DL is an error-free representation of its intended domain, and nor – until the day when the use of OWL or of some successor becomes uniform common practice – will it do anything to resolve the problems of semantic ambiguity adverted to in the above.

In the domain of biomedical informatics a number of attempts have been made to resolve these problems[4,5,6] in light of an increasing recognition that many ambitious terminological systems developed in this field are marked by unclarity over what, precisely, they have been designed to achieve. Are biomedical controlled vocabularies ‘concept representations’ or ‘knowledge models’? And if they are either of these things, how, if at all, do they relate to the reality – the tumors, diseases, treatments, chemical interactions – on the side of the patient?

OBJECTIVES AND METHODS

The purpose of this communication is to initiate a process for resolving such problems by drawing on the best practices in ontology which are now beginning to take root through the efforts of organizations such as the National Center for Biomedical Ontology,[7] the Open Biomedical Ontologies (OBO) Consortium,[8] the OBO Foundry,[9] and others.[10]

What is needed is a set of terms referring in unambiguous fashion to the different kinds of entities surveyed above, which can serve as a common target for mappings from other discipline- and computational idiom-centric terminologies, thereby mediating efficient pairwise translations between these terminologies themselves.

Our strategy is to advance precision via clear informal definitions rooted in what we assume are commonly accepted intuitions, providing references to associated formal treatments where possible. In selecting terms we have sometimes chosen expressions precisely because they have not been used by others and hence do not have established (and potentially conflicting) meanings. In other cases we have adapted existing terms to our purposes by providing them with more precise definitions or (in the case of primitive terms) elucidations.

These proposals are focused primarily on the ontology-related needs of natural science, including the clinical basic sciences, though we believe them to be of quite general applicability.

We start out from a distinction of three levels of entities which have a role to play wherever ontologies are used:

• Level 1: the objects, processes, qualities, states, etc. in reality (for example on the side of the patient);
• Level 2: cognitive representations of this reality on the part of researchers and others;
• Level 3: concretizations of these cognitive representations in (for example textual or graphical) representational artifacts.

This tripartite distinction will awaken echoes of the Semantic Triangle of Ogden and Richards, to which we return in the sequel. For present purposes we note that the indispensability of Level 1 reflects the fact that even those who see themselves as building for example ‘data models’ in the domain of the life sciences are attempting to create thereby artifacts which stand in some representational relation to entities in the real world. Level 2 reflects the fact that a crucial role is played in ontology and terminology development by the cognitive representations of human subjects. Level 3 reflects the fact that cognitive representations can be shared, and serve scientific ends, only when they are made communicable in a form whereby they can also be subjected to criticism and correction, and also to implementation in software.

Note that the three levels overlap; thus the textual and graphical artifacts distinguished in Level 3 are themselves objects on Level 1. Our talk of ‘levels’ should thus be interpreted by analogy with talk of ‘levels of granularity’: if we have apprehended all the liquid in a vessel, then in a sense we have thereby apprehended also all the molecules. Yet for scientific purposes molecules and liquids must be distinguished nonetheless, and the same applies, for the purposes of clarity in our thinking about ontologies, to the three levels delineated in the above.

FOUNDATIONS

Here we give precise definitions to a number of central terms, which will then be used in conformity thereto in the remainder of the paper. Really existing ontologies and related artifacts are typically constructed to realize a mixture of different sorts of ends (terminologies, for example, to support clinical record keeping and large-scale epidemiological studies, and to serve as controlled vocabularies for the expression of research results). Hence they typically combine the features of artifacts of different basic types. Our reference terminology is designed to reflect these basic types. Hence the definitions we propose for terms such as ‘ontology’ or ‘class’ do not imply any claim to the effect that everything called an ‘ontology’ or ‘class’ in the literature exhibits just the characteristics referred to in the definition.

An ENTITY is anything which exists, including objects, processes, qualities and states on all three levels (thus also including representations, models, beliefs, utterances, documents, observations, etc.).

A REPRESENTATION is for example an idea, image, record, or description which refers to (is of or about), or is intended to refer to, some entity or entities external to the representation. Note that a representation (e.g. a description such as ‘the cat over there on the mat’) can be of or about a given entity even though it leaves out many aspects of its target.

A COMPOSITE REPRESENTATION is a representation built out of constituent sub-representations as its parts, in the way in which paragraphs are built out of sentences and sentences out of words. The smallest constituent sub-representations are called REPRESENTATIONAL UNITS; examples are: icons, names, simple word forms, or the sorts of alphanumeric identifiers we might find in patient records. Note that many images are not composite representations since they are not built out of smallest representational units in the way in which molecules are built out of atoms. (Pixels are not representational units in the sense defined.)

If we take the graph-theoretic concretization of the Gene Ontology[11] as our example, then the representational units here are the nodes of the graph (taken to comprehend terms and unique IDs), which are intended to refer to corresponding entities in reality. But the composite representation refers, through its graph structure, also to the relations between these entities, so that there is reference to entities in reality both at the level of single units and at the structural level.[12]

A COGNITIVE REPRESENTATION (Level 2) is a representation whose representational units are ideas, thoughts, or beliefs in the mind of some cognitive subject – for example a clinician engaged in applying theoretical (and practical) knowledge to the task of establishing a diagnosis.

A REPRESENTATIONAL ARTIFACT (Level 3) is a representation that is fixed in some medium in such a way that it can serve to make the cognitive representations existing in the minds of separate subjects publicly accessible in some enduring fashion. Examples are: a text, a diagram, a map legend, a list, a clinical record, or a controlled vocabulary. Clearly such artifacts can serve to convey more or less adequately the underlying cognitive representations – and can be correspondingly more or less intuitive or understandable.

Because representational artifacts such as SNOMED CT give textual form to cognitive representations which pre-exist them, some have taken this to mean that these artifacts are in fact made up of representations which refer to (are of or about) these cognitive representations (the ‘concepts’) from out of which the latter are held to be composed.

We shall argue below that this reflects a deep confusion, and that the constituent units of representational artifacts developed for scientific purposes should more properly (and more straightforwardly) be seen as referring to the very same entities in reality – the diseases, patients, body parts, and so forth – to which the underlying cognitive representations of clinicians and others refer. Such artifacts are in this respect no different from scientific textbooks. They are windows on reality, designed to serve as a means by which representations of reality on the part of cognitive agents can be made available to other agents, both human and machine. A simple phrase, such as ‘the cat over there on the mat’, can be used to refer more or less successfully to what is, in reality, a portion of reality of a highly complex sort – and the same applies to all of the types of artifacts referred to above. The window on reality which each provides is, to be sure, in every case from a certain perspective and in such a way as to embody a certain granularity of focus. Yet the entities to which it refers are full-fledged entities in reality nonetheless – the very same, full-fledged entities in reality with which we are familiar also in other ways, for example because they provide us with food or companionship.

REALITY

The clinician is concerned first and foremost with PARTICULARS in reality (Level 1) (in the vernacular also called ‘tokens’ or ‘individuals’), that is to say with individual patients, their lesions, diseases, and bodily reactions, divided into CONTINUANTS and OCCURRENTS.[13] Some particulars, such as human beings, planets, ships, hurricanes, receive PROPER NAMES (they may also receive unique identifiers, such as social security numbers) which are used in representational artifacts of various sorts. But we can refer to particulars also by means of complex expressions – that man on the bench, this oophorectomy, this blood sample – involving GENERAL TERMS of different sorts, including:

i. General terms such as ‘apoptosis’, ‘fracture’, ‘cat’, which represent structures or characteristics in reality which are exemplified – the very same structures or characteristics, over and over again – in an open-ended collection of particulars in arbitrarily disconnected regions of space and time. Consider for example the way in which a certain DNA structure is instantiated as a transcript (RNA-structure) over and over again in cells of our body.
ii. General terms such as ‘danger’, ‘gift’, ‘surprise’, which draw together entities in reality which share common characteristics which are not intrinsic to the entities in question.
iii. General terms such as ‘Berliner’, ‘Paleolithic’, which relate to specific collections of particulars tied to specific regions of space and time.

General terms of the first sort refer to UNIVERSALS (in the vernacular also called ‘types’ or ‘kinds’). A universal is something that is shared in common by all those particulars which are its INSTANCES. The universal itself then exists in Level 1 reality as a result of existing in its particular instances. When a clinician says ‘A and B have the same disease’, she is referring to the universal; when she says ‘A’s diabetes is more advanced than B’s,’ then she is referring to the respective instances.

It is overwhelmingly universals which are the entities represented in scientific texts, and a good prima facie indication that a general term ‘A’ refers to a universal is that ‘A’ is used by scientists for purposes of classification and to make different sorts of law-like assertions about the individual instances of A with which they work in the lab or clinic.

nose          part_of       body
Mary’s nose   part_of       Mary
Mary’s nose   instance_of   nose

Table 1 – Three Basic Sorts of Binary Relation

Both particulars and universals stand to each other in various RELATIONS. Thus particulars stand to the corresponding universals in the relation of INSTANTIATION. This and other binary relations (of parthood, adjacency, derivation) used in biomedical ontologies[13] can be divided into groups as in Table 1, which uses Roman for particulars, bold type for relations involving particulars, and italics for universals and for relations between universals.

A COLLECTION OF PARTICULARS (of molecules in John’s body, of pieces of equipment in a certain operating theater, of operations performed in this theater over a given period of months) is a Level 1 particular comprehending other particulars as its MEMBERS.[14] We note that confusion is spawned by the fact that we can use the very same general terms to refer both to universals and to collections of particulars. Consider:

• HIV is an infectious retrovirus
• HIV is spreading very rapidly through Asia

A CLASS is a collection of all and only the particulars to which a given general term applies. Where the general term in question refers to a universal, then the corresponding class, called the EXTENSION of the universal (at a given time), comprehends all and only those particulars which as a matter of fact instantiate the corresponding universal (at that time).

The totality of classes is wider than the totality of extensions of universals since it includes also DEFINED CLASSES, designated by terms like ‘employee of Swedish bank’, ‘daughter of Finnish spy’. Languages like OWL are ideally suited to the formal treatment of such classes, and the popularity of OWL has encouraged the view that it is classes which are designated by the general terms in terminologies. (OWL classes are not, however, identical with classes in the usual set-theoretic sense on which we draw also here.)

Some OWL classes (above all Thing and Nothing) are ‘primitive’ (which means: not defined), and these classes are sometimes asserted to constitute an OWL counterpart of universals (‘natural kinds’) in the sense here defined.[15] Because OWL identifies the relation of instantiation with that of membership, however, it in effect identifies universals with their extensions.

Through relations of greater and lesser generality both classes and universals are organized into trees, the former on the basis of the subclass relation, the latter on the basis of the is_a relation (whereby, again, in the OWL framework the two relations are identified). Because the instances of more specific universals are ipso facto also instances of the corresponding more general universals, the latter hierarchy is, when viewed extensionally, a proper part of the former. As we shall discuss further in our treatment of the argument from borderline cases below, it is difficult to draw a sharp line between terms designating universals and those designating defined classes. This does not mean, however, that the distinction is of no import. Indeed we believe that taking account of this distinction is indispensable to creating a path to improvement of ontologies.[16]

We use the term PORTION OF REALITY to comprehend both single universals and particulars and their more or less complex combinations. Some portions of reality – for example single organisms, planets – reflect autonomous joints of reality (that is, they would exist as separate entities even in a world denuded of cognitive subjects). Other portions of reality are products of fiat demarcations of one or other sort,[17] as when we delineate a portion of reality by focusing on some specific granular level (of molecules, or molecular processes), or on some specific family of universals (for example when we view the human beings living in a given county in light of their patterns of alcohol consumption).

A DOMAIN is a portion of reality that forms the subject-matter of a single science or technology or mode of study; for example the domain of proteomics, of radiology, of viral infections in mouse. Representational artifacts will standardly represent entities in domains delineated by level of granularity. Thus entities smaller than a given threshold value may be excluded from a domain because they are not salient to the associated scientific or clinical purposes.[18]

REPRESENTATIONAL ARTIFACTS

In developing theories, biomedical researchers seek representations of the universals existing in their respective domains of reality. They first develop cognitive representations, which they then transform incrementally into representational artifacts of various sorts.

In developing diagnoses, and in compiling such diagnoses into clinical records, clinicians seek a representation of salient particulars (diseases, disease processes, drug effects) on the side of their patients. Drawing on their theoretical understanding of the universals which these particulars instantiate (which in turn draws on prior representations formed in relation to earlier particulars[19]), they first develop a cognitive representation of what is taking place within a given collection of particulars in reality, which they then transform into representational artifacts such as clinical documents, entries in databases, and so forth, which may then foster more refined cognitive representations in the future.

The mentioned representations are typically built up out of sub-representations each of which, in the best case, mirrors a corresponding salient portion of reality. The most simple representations (‘blood!’) mirror universals or particulars taken singly; more complex representations – such as therapeutic schemas, diagnostic protocols, scientific texts, pathway diagrams – mirror more complex portions of reality, their constituent sub-representations being joined together in ways designed to mirror salient relations on the side of reality.

In the ideal case a representation would be such that all portions of reality salient to the purposes for which it was constructed would have exactly one corresponding unit in the representation, and every unit in the representation would correspond to exactly one salient portion of reality.[19] Unfortunately, in a domain like biomedicine, the ideal case will likely remain forever beyond our grasp. Researchers working on the level of universals may fall short by creating representations which either (i) fail to include general terms for universals which are salient to their domain, or (ii) include general terms which do not in fact denote any universals at all. Similarly, clinicians working on the level of particulars may fall short of the best case by creating misdiagnoses, either (i) by failing to acknowledge particulars which do exist and which are salient to the health of a given patient, or (ii) by using representational units assumed to refer to particulars where no such particulars exist.

A TAXONOMY is a tree-form graph-theoretic representational artifact with nodes representing universals or classes and edges representing is_a or subset relations.

An ONTOLOGY is a representational artifact, comprising a taxonomy as a proper part, whose representational units are intended to designate some combination of universals, defined classes, and certain relations between them.[13]

A REALISM-BASED ONTOLOGY is built out of terms which are intended to refer exclusively to universals, and corresponds to that part of the content of a scientific theory that is captured by its constituent general terms and their interrelations.

A TERMINOLOGY is a representational artifact consisting of representational units which are the general terms of some natural language used to refer to entities in some specific domain.

An INVENTORY is a representational artifact built out of singular referring terms such as proper names or alphanumeric identifiers. Electronic Health Records (EHRs) incorporate inventories in this sense, including both terms denoting particulars (‘patient #347’, ‘lung #420’) and more complex expressions involving terms designating universals and defined classes (‘the history of cancer in patient #347’s family’).[20]

In the best case, again, each of the representational artifacts listed above (ontologies, taxonomies, inventories) will be such that its representational units stand in a one-to-one correspondence with the salient entities in its domain. In practice, however, such artifacts can be classified on the basis of the various ways in which they fall short of this best case, in terms of properties such as correctness, degree of structural fit, degree of completeness and degree of redundancy.[16,18] By exploiting such classifications we can measure the quality improvements made in successive versions, and also use such measures as a basis for further improvement.[20]

To make a representation interpretable by a computer, it must be published in a language with a formal semantics and so converted into a FORMALIZED REPRESENTATION. The choice of language will depend on the complexity of what one needs to express and on the sorts of reasoning one needs to perform. While OWL, for example, can cope well with defined classes, it may not have sufficient expressive power to meet the needs of ontologies in the life sciences domain. Thus it seems to be incapable, for example, of capturing the relations involved even in simple interactions among pluralities of continuants, or of capturing the changes which take place in such continuants (for example growth of a tumor) over time.[21,22]

Most inventories in the biomedical field (including most EHRs) have as yet hardly exploited the powers of formal reasoning. The paradigm of Referent Tracking represents an exception to this rule,[20] since it involves precisely the embedding of a highly structured representation of particulars in a formalized representation of the corresponding universals.

THE CONCEPT ORIENTATION

We believe that ontologies, inventories and similar artifacts should consist exclusively of representational units which are intended to designate entities in Level 1 reality. Defenders of the concept orientation in medical terminology development have offered a series of arguments against this view, to the effect that such terminologies should include also (or exclusively) representational units referring to what are called ‘concepts’.[23]

First is what we can call the argument from intellectual modesty, which asserts that it is up to domain experts, and not to terminology developers, to answer for the truth of whatever theories the terminology is intended to mirror. Since domain experts themselves disagree, a terminology should embrace no claims as to what the world is like, but reflect, rather, the coagulate formed out of the concepts used by different experts.

Against this, it can be pointed out that communities working on common domains in the medical as in other scientific fields in fact accept a massive and ever-growing body of consensus truths about the entities in these domains. Many of these truths are, admittedly, of a trivial sort (that mammals have hearts, that organisms are made of cells), but it is precisely such truths which form the core of science-based ontologies. Where conflicts do arise in the course of scientific development, these are highly localized, and pertain to specific mechanisms, for example of drug action or disease development, which can serve as the targets of conflicting beliefs only because researchers share a huge body of presuppositions.

We can think of no scenario under which it would make sense to postulate special entities called ‘concepts’ as the entities to which terms subject to scientific dispute would refer. For either, for any such term, the dispute is resolved in its favor, and then it is the corresponding Level 1 entity that has served as its referent all along; or it is established that the term in question is non-designating, and then this term is no longer a candidate for inclusion in a terminology. We cannot solve the problem that we do not know, at some given stage of scientific inquiry, to which of these groups a given term belongs, by providing such terms instead with guaranteed referents called ‘concepts’. It may, finally, be the case that it is not the disputed term itself which is at issue, but rather some more complex expression, as when we talk about ‘G. E. Stahl’s concept of phlogiston’, but that the latter refers to some entity – a concept – in (psychological) reality is precisely not subject to scientific dispute.

Sometimes the argument from intellectual modesty takes an extreme form, for example on the part of those for whom reality itself is seen as being somehow unknowable (‘we can only ever know our own concepts’). Arguments along these lines are of course familiar from the history of philosophy. Stove provides the definitive refutation.[24] Here we need note only that they run counter not just to the successes, but to the very existence, of science and technology as collaborative endeavors.

Second is the argument from creativity. Designer drugs are conceived, modeled, and described long before they are successfully synthesized, and the plans of pharmaceutical companies may contain putative references to the corresponding chemical universals long before there are instances in reality. But again: such descriptions and plans can be perfectly well apprehended even within terminologies and ontologies conceived as relating exclusively to what is real. Descriptions and plans do, after all, exist. On the other hand it would be an error to include in a scientific ontology of drugs terms referring to pharmaceutical products which do not yet (and may never) exist, solely on the basis of plans and descriptions. Rather, such terms should be included precisely at the point where the corresponding instances do indeed exist in reality, exactly in accordance with our proposals above.

Third is what we might call the argument from unicorns. Some of the terms needed in medical terminologies refer, it is held, to what does not exist. Some patients do, after all, believe that they are James Bond, or that they see unicorns. The realist approach is, however, perfectly well able to comprehend also phenomena such as these, even though it is restricted to the representation of what is real. For the beliefs and hallucinatory episodes in question are of course as real as are the persons who suffer (or enjoy) them. And certainly such beliefs and episodes may involve concepts (in the properly psychological sense of this term). But they are not about concepts, they do not have concepts as their targets – for they are intended by their subjects to be about entities in flesh-and-blood external reality.

Fourth is the argument from medical history. The history of medicine is a scientific pursuit; yet it involves use of terms such as ‘diabolic possession’ which, according to the best current science, do not refer to universals in reality. But again: the history of medicine has as its subject-domain precisely the beliefs, both true and false, of former generations (together with the practices, institutions, etc. associated therewith). Thus a term like ‘diabolic possession’ should be included in the ontology of this discipline in the first place as a component part of terms designating corresponding classes of beliefs. In addition it may appear also as part of a term designating some fiat collection of those diseases from which the patients diagnosed as being possessed were in fact suffering. The evolution of our thinking about disease can then be understood in the same way that we deal with theory change in other parts of science, as a reordering of our beliefs about the ontological validity and salience of specific families of terms – and once again: concepts themselves play no role as referents.[20,26]

Fifth is the argument from syndromes. The subject-matters of biology and medicine are, it is held, replete with entities which do not exist in reality but are rather convenient abstractions. A syndrome such as congestive heart failure, for example, is nothing more than a convenient abstraction, used for the convenience of physicians to collect together many disparate and unrelated diseases which have common final manifestations. Such abstractions are, it is held, mere concepts.

According to the considerations on fiat demarcations advanced above, however, syndromes, pathways, genetic networks and similar phenomena are indeed fully real – though their reality is that of defined (fiat) classes rather than of universals. A similar response can be given also in regard to the many human-dependent delineations used in expressions like ‘obesity’ or ‘hypertension’ or ‘abnormal curvature of spine’. These terms, too, refer to entities in reality, namely to defined classes which rest on fiat thresholds established by consensus among physicians.

Sixth is the argument from error. When erroneous entries are entered into a clinical record and interpreted as being about Level 1 entities, then logical conflicts can arise. For Rector et al., this implies that the use of a meta-language should be made compulsory for all statements in the EHR, which should be, not about entities in reality, but rather about what are called ‘findings’.[25] Instead of p and not p, the record would contain entries like: McX observed p and O’W observed not p, so that logical contradiction is avoided. The terms in terminologies devised to serve such EHRs would then one and all refer not to diseases themselves, but rather to mere ‘concepts’ of diseases. This, however, blurs the distinction between entities in reality and associated findings, and opens the door to the inclusion in a terminology of problematic findings-related expressions such as SNOMED’s ‘absent nipple’, ‘absent leg’, etc. Certainly clinicians need to record such findings. But then their findings are precisely that a leg is absent; not that a special kind of (‘absent’) leg is present.

In the domain of scientific research we do not embargo entirely the making of object-language assertions simply because there might be, among the totality of such assertions, some which are erroneous. Rather, we rely on the normal workings of science as a collective, empirical endeavor to weed out error over time, providing facilities to quarantine erroneous entries and resolve logical conflicts as they are identified. We have argued elsewhere that these same devices can be applied also in the medical context.[26]

The argument for the move to the meta-level is sometimes buttressed by appeal to medico-legal considerations seen as requiring that the EHR be a record not of what exists but of clinicians’ beliefs and actions. Yet the forensic purposes of an audit trail can equally well be served by an object-language record if we ensure that meta-data are associated with each entry identifying by whom the pertinent data were entered, at what time, and so forth.

On the other side, moreover, even the move to meta-level assertions would not in fact solve the problems of error, logical contradiction and legal liability. For the very same problems arise not only when human beings are describing, on the object-level, fractures, or pulse rates, or symptoms of coughing or swelling, but also on the meta-level when they are describing what clinicians have heard, seen, thought and done. The latter, too, are subject to error, fraud, and disagreement in interpretation.

Seventh is the argument from borderline cases. [...] ‘electron’ or ‘cell’, on the one hand, and ‘fall on stairs or ladders in water transport NOS, occupant of small unpowered boat injured’ (Read Codes) on the other. But there are also borderline cases such as ‘alcoholic non-smoker with diabetes’, or ‘age-dependent yeast cell size increase’, which call into question the very basis of the distinction.

In response, we note first the general point, that arguments from the existence of borderline cases in general have very little force. For otherwise they would allow us to prove from the existence of people with borderline complements of hair that there is no such thing as baldness or hairiness.

As to the specific problem of how to classify borderline expressions, this is a problem not for terminology, but rather for empirical science. For borderline terms of the sorts mentioned will, as an inevitable concomitant of scientific advance, be in any case subjected to a filtering process based on whether they are needed for purposes of (for example therapeutically) fruitful classifications, and thus for the expression of scientific laws.

Science itself is thereby subject to constant update. A term taken to refer to a universal by one generation of scientists may be demoted to the level of non-designating term (‘phlogiston’) by the next. This means also that representational artifacts of the sorts considered in the above, because they form an integral part of the practice of science, should themselves be subject to continual update in light of such advance. But again: we can think of no circumstance in which updating of the sort in question would signify that phlogiston is itself a concept, or that some expression was at one or other stage being used by scientists with the intention of referring to ‘concepts’ rather than to entities in reality.

THE SEMIOTIC TRIANGLE

Finally is what we might call the argument from multiple perspectives. Different patients, clinicians and biologists have their own perspectives on one and the same reality. To do justice to these differences, it is argued, we must hold that their respective representations point, not to this common reality, but rather to their different ‘concepts’ thereof.

This argument has its roots in the work of Ogden and Richards, and specifically in their discussion of the so-called ‘semiotic triangle’, which is of importance not least because it embodies a view of
As meaning and reference that still plays a fateful role in we have already noted above, there is at any given the terminology standardization work of ISO.26 stage no bright line between those general terms As Figure 1 makes clear, the triangle in fact refers properly to be conceived as designating universals not to ‘concepts’, but rather to what its authors call and those designating merely ‘concepts’ (or defined ‘thought or reference’,27 reflecting the fact that Ogden classes). Certainly there are, at any given stage in the and Richards’ account is rooted in a theory of development of science, clear cases on either side: psychological causality. When we experience a 63 certain object in association with a certain sign, then terminology literature henceforth? There are of memory traces are laid down in our brains in virtue of course sensible uses of this term, for example in the which the mere appearance of the same sign in the literature of psychology. In the terminology literature, future will, they hold, ‘evoke’ a ‘thought or reference’ however, ‘concept’ has been used in such a directed towards this object through the reactivation bewildering variety of confused and confusing ways of impressions stored in memory. that we recommend that it be avoided altogether. It is tempting to suppose that, when considered extensionally, all of the mentioned alternative readings come down to one and the same thing, namely to an identification of ‘concept’ with what we have earlier called ‘defined class’. If ‘concept’ could be used systematically in this way in terminological circles, then this would, indeed, constitute progress of sorts, though the question would then arise why ‘defined class’ itself should not be used instead. 
Unfortunately, however, the proposal in question Figure 1 – Ogden and Richards’ Semiotic Triangle stands in conflict with the fact that ‘concept’ is used by its adherents to comprehend also putative referents The two solid edges of the triangle are intended to even for terms – such as ‘surgical procedure not represent what are held to be causal relations of carried out because of patient’s decision’ – which do ‘symbolization’ (roughly: evocation), and ‘reference’ not designate defined classes because they designate (roughly: perception or memory) on the part of a nothing at all. Here again, we believe, a proper symbol-using subject. The dashed edge, in contrast, treatment would involve appeal to appropriate fiat signifies that the relation between term and referent – classes, defined in terms of utterances, interrupted the relation that is most important for the discussion plans, expectations, etc. on the part of the subjects of terminology – is merely ‘imputed’. involved. The background assumption here is that multiple What, now is to be said of terms such as ‘concept perspectives are both ubiquitous and (at best) only model’, ‘knowledge representation’, ‘information locally and transiently resolvable. The meanings model’, and so forth referred to in our premble words have for you or me depend on our past above? To the extent that concept-based experiences of uses of these words in different kinds terminological artifacts consist in representations not of contexts. Ambiguity must be resolved anew (and a of the reality on the side of the patient but rather of new ‘imputed’ relation of reference spawned) on each the entities in some putative ‘realm of concepts’, the successive occasion of use. From this, Ogden and term ‘concept model’ may be justified. 
This term is Richards infer that a symbolic representation can indeed used by SNOMED CT in its own self- never refer directly to an object, but rather only descriptions, though given SNOMED’s scientific indirectly, via a ‘thought or reference’ within the goals, we believe that, on the basis of the arguments mind. given above, it should be abandoned. Still more It is a depsychologized version of this latter thesis problematic is the term ‘knowledge model’ or which forms the basis of the concept orientation in ‘knowledge representation’ (GALEN). For in the contemporary terminology research. The terms in absence of a reference to reality to serve as terminologies refer not to entities in reality, it is held, benchmark, what could motivate a distinction but rather to ‘concepts’ in a special ‘realm’. The lat- between knowledge and mere belief.19 And what, in ter are not transparent mediators of reference; rather the absence of a reference to reality, could motivate they are its targets, and the job of the terminologist is adding or deleting terms in successive versions of a to callibrate his list of terms in relation not to reality terminology, if every term is in any case guaranteed a but to this special ‘realm of concepts’.26 reference to its own specially tailored ‘concept’. The relation between terms in a terminology and As to ‘information model’, here one standard the reality beyond becomes hereby obscured. Reality uncertainty concerns the relation between an entity in exists, if at all, only behind a conceptual veil – and reality and the body of information used to ‘repre- hence familiar confusions according to which for sent’ this entity in some information system. Is it in- example the concept of bacteria would cause an formation which is being ‘modeled’ in an information experimental model of disease, or the concept of model, or the reality which this information is about? 
vitamin would be ‘essential in the diet of man’.28 The documentation of the HL7 Reference Information Model (RIM)29 adds extra layers of ‘CONCEPTS’ AND ‘MODELS’ uncertainty by conceiving its principal formulas as How, then, should ‘concept’ be properly treated in the referring to the acts in which entities are observed for 64 example in a clinical context. Simultaneously, of Anatomy. J Biomed Inform 2003;36:478-500. however, it conceives these formulas as referring also 11. http://geneontology.org/. to the documentation of such acts for example in an 12. Wittgenstein L. 1921 Tractatus Logico- information system. The apparent contradiction is to Philosophicus, London: Routledge, 1961. some degree resolved by the RIM on the basis of its 13 Smith B, Ceusters W, Klagges B et al.. Relations assertion that there is in any case ‘no distinction in biomedical ontologies. Genome Biol, between an activity and its documentation’.30 2005;6(5):R46. 14. Bittner T, Donnelly M, Smith B. Individuals, CONCLUSION universals, collections. Formal Ontology in Information Systems (FOIS 2004), p. 37-48. Drawing on our distinction of the three levels of 15. Drummond N. Introduction to ontologies. http:// reality, cognition and representational artifact we www.cs.man.ac.uk/~drummond/presentations/Int have sought to formulate an unambiguous roductionToOWL50mins.ppt. terminology for describing ontologies and related 16. Ceusters W, Smith B. A realism-based approach artifacts. The proposed terminology allows us to to the versioning and evolution of biomedical characterize more precisely the sorts of things which ontologies. Proc AMIA Symp 2006, in press. go wrong when the distinction between these levels is 17. Smith B. Fiat objects. Topoi, 2001;20(2):131-48. ignored, or when one or other level is denied, so that 18. Bittner T, Smith B. A theory of granular parti- the approach may also help in improving such tions. Foundations of Geographic Information artifacts in the future. 
Science, London, 2003, p. 117-51 19. Smith B. From concepts to clinical reality, J Acknowledgements Biomed Inform. 2006 Jun;39(3):288-98. This work was supported by the Wolfgang Paul 20. Ceusters W, Smith B. Strategies for referent Program of the Humboldt Foundation, the Volks- tracking in Electronic Health Records. J Biomed wagen Foundation, the European Union Semantic Inform. 2006 Jun;39(3):362-78. Mining Network, by BBSRC Grant BB/D524283/1, 21. Bera P, Wand Y. Analyzing OWL using a and by the NIH Roadmap Grant U54 HG004028. philosophy-based ontology. Formal Ontology in Thanks are due also to Jim Cimino, Chris Chute, Information Systems (FOIS 2004), p. 353-62. Gunnar Klein, Alan Rector, Stefan Schulz, and Kent 22. Kazic T. Putting semantics into the semantic Spackman for fruitful discussions. web: How well can it capture biology? Pac Symp Biocomputing 2006;11:140-51. References 23 Cimino JJ. In defense of the desiderata. J Biomed Inform. 2006;39:299-306. (URLs last accessed July 1, 2006) 24. Franklin J. Stove’s discovery of the worst 1. Smith B. Beyond concepts, or: Ontology as argument in the world. Philosophy 2002;77:615- reality representation, Formal Ontology in 24. www.maths.unsw.edu.au/~jim/worst.pdf. Information Systems (FOIS 2004), p. 73-84. 25. Rector A, Nolan W, Kay S. Foundations for an 2. http://www.w3.org/2003/glossary. electronic medical record. Methods Inf Med, 3. Patel-Schneider PF, Hayes P, Horrocks I. OWL 1991;30:179-86. Web Ontology Language. 2004. http://www.- 26. Smith B, Ceusters W, Temmerman R. Wüsteria, w3.org/TR/owl-semantics. Stud Health Technol Inform. 2005;116:647-652. 4. Spackman KA, Reynoso G. Examining 27. Ogden CK, Richards IA. The Meaning of SNOMED from the perspective of formal Meaning. 3rd ed. New York, 1930. ontological principles. Workshop on Formal 28. The UMLS Semantic Network. http://semantic Biomedical Knowledge Representation (KR- network.nlm.nih.gov/. MED 2004), p. 72-80. 29. 
HL7 V3 Reference Information Model: Version 5. Johansson I. Bioinformatics and biological V 01-20. Normative Ballot 11/22/2005. reality. J Biomed Inform. 2006;39(3):274-87. 30. Smith B, Ceusters W. HL7 RIM: An incoherent 6 Klein GO, Smith B. Concept systems and standard, Proc MIE, 2006, p. 133-138 ontologies. http://ontology.buffalo.edu/concepts /ConceptsandOntologies.pdf. 7. http://ncbo.us/. 8. http://obo.sourceforge.net/. 9. http://obofoundry.org/. 10. Rosse C, Mejino JL, Jr. A reference ontology for biomedical informatics: the Foundational Model 65