Formalization of indicators of diagnostic performance in a realist ontology

Adrien Barton¹,²,*, Régis Duvauferrier²,³ and Anita Burgun⁴

1 The Institute of Scientific and Industrial Research, Osaka University, Japan
2 INSERM UMR 1099, LSTI, Rennes, France
3 CHU de Martinique, Université Antilles-Guyane, France
4 INSERM UMR 1138 team 22, Centre de Recherche des Cordeliers, Paris, France

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

ABSTRACT
We present a formalization of indicators of diagnostic performance (sensitivity, specificity, positive predictive value and negative predictive value) in the context of a realist ontology. We dissociate the indicators of diagnostic performance from their estimations, and we argue that biomedical ontologies should represent the former first and foremost. Our formalization does not require introducing any possible, non-actual entities (such as the result a person would get if a medical test were performed on her). It is therefore acceptable in an ontology built in a realist spirit. We formalize an indicator of diagnostic performance as a data item that is about a disposition borne by a group; the diagnostic value of this indicator is given by the objective probability value assigned to this disposition.

1 INTRODUCTION

1.1 Definition of indicators of diagnostic performance

Biomedical ontologies aim at providing the most exhaustive and rigorous representation of reality as described by the biomedical sciences. A large part of medical reasoning concerns diagnosis and is essentially probabilistic. It would be an asset for biomedical ontologies to be able to support such probabilistic reasoning.

Ledley and Lusted's (1959) seminal article on Bayesian reasoning in medicine defines different kinds of probabilistic entities. Consider, for example, the simple case of an instance of a test of type A that aims at detecting whether a patient in a group g has an instance of a disease of type M¹.

The performance of test A in diagnosing M can be quantified by two values:
- The positive predictive value of the test, hereafter abbreviated PPV. It is generally defined as the proportion of people who have the disease among those who would be tested positive by A in g (that is, the proportion of true positives among positives).
- The negative predictive value, hereafter abbreviated NPV. It is generally defined as the proportion of people who do not have the disease among those who would be tested negative by A in g (that is, the proportion of true negatives among negatives).

Once the result of test A is observed, these two values give the probability that the patient has the disease M (after a positive result) or does not have it (after a negative result).

¹ These will be abbreviated in the following as "a test A" and "the patient has M".

However, such positive and negative predictive values are typically not available in the scientific literature. Instead, they are generally computed from three other probabilistic values:
- The prevalence value of M in g, hereafter abbreviated Prev(g,M). It is generally defined as the proportion of people who have the disease M in g.
- The sensitivity value of the test A for M in g, hereafter abbreviated Se(g,A,M). It is generally defined as the proportion of people who would get a positive result by A among those who have the disease M in g (that is, the proportion of true positives among the diseased).
- The specificity value of A for M in g, hereafter abbreviated Sp(g,A,M). It is generally defined as the proportion of people who would get a negative result by A among those who do not have the disease M in g (that is, the proportion of true negatives among the non-diseased).

Indeed, these values are related through the following Bayesian equations:

$$\mathrm{PPV}(g,A,M) = \frac{\mathrm{Prev}(g,M)\,\mathrm{Se}(g,A,M)}{\mathrm{Prev}(g,M)\,\mathrm{Se}(g,A,M) + \bigl(1-\mathrm{Prev}(g,M)\bigr)\bigl(1-\mathrm{Sp}(g,A,M)\bigr)}$$

$$\mathrm{NPV}(g,A,M) = \frac{\bigl(1-\mathrm{Prev}(g,M)\bigr)\,\mathrm{Sp}(g,A,M)}{\mathrm{Prev}(g,M)\bigl(1-\mathrm{Se}(g,A,M)\bigr) + \bigl(1-\mathrm{Prev}(g,M)\bigr)\,\mathrm{Sp}(g,A,M)}$$

In the wake of Ledley and Lusted (1959), sensitivity and specificity values have often been considered to depend only on the pathophysiological characteristics of the disease. On this view, they would be independent of the group of people under consideration. However, sensitivity and specificity values do in fact depend upon the group under consideration: this is the "spectrum effect" (Brenner & Gefeller, 1997; for a detailed explanation, see Barton, Duvauferrier & Burgun, 2015). The spectrum effect can manifest itself, for example, as a dependence of sensitivity and specificity on the degree of severity of the disease in the group under consideration (Park, Yokota, Gill, El Rassi, & McFarland, 2005).

In the remainder of the article, sensitivity, specificity, PPV and NPV will be called "indicators of diagnostic performance" and abbreviated "IDPs".

1.2 The challenge of representing indicators of diagnostic performance in an ontology

To the extent that they aim at representing biomedical knowledge and enabling medical reasoning, biomedical ontologies should provide a formalization of IDPs as well as of the prevalence.
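One reason to represent the prevalence alongside the IDPs is the pair of Bayesian equations in Section 1.1. Given Prev(g,M), Se(g,A,M) and Sp(g,A,M), the predictive values follow mechanically. The Python sketch below is an illustration, not part of the authors' formalization. It encodes the two equations and checks them against the direct proportion definitions on a hypothetical group g of 1000 people. The counts and the function names `ppv` and `npv` are assumptions made for the example. Exact `Fraction` arithmetic lets the two routes be compared with `==` rather than with a floating-point tolerance.

```python
from fractions import Fraction


def ppv(prev, se, sp):
    """PPV(g, A, M) from Prev(g, M), Se(g, A, M) and Sp(g, A, M) via Bayes' theorem."""
    return prev * se / (prev * se + (1 - prev) * (1 - sp))


def npv(prev, se, sp):
    """NPV(g, A, M) from Prev(g, M), Se(g, A, M) and Sp(g, A, M) via Bayes' theorem."""
    return (1 - prev) * sp / (prev * (1 - se) + (1 - prev) * sp)


# Hypothetical group g of 1000 people, counted as if every member were tested by A.
tp, fn = 90, 10    # people with M: positive / negative result
fp, tn = 45, 855   # people without M: positive / negative result
n = tp + fn + fp + tn

prev = Fraction(tp + fn, n)   # proportion of people with M: 1/10
se = Fraction(tp, tp + fn)    # true positives among the diseased: 9/10
sp = Fraction(tn, fp + tn)    # true negatives among the non-diseased: 19/20

# Direct proportions, following the definitions of PPV and NPV
ppv_direct = Fraction(tp, tp + fp)   # true positives among positives: 2/3
npv_direct = Fraction(tn, tn + fn)   # true negatives among negatives: 171/173

# The Bayesian equations recover exactly the same values.
assert ppv(prev, se, sp) == ppv_direct
assert npv(prev, se, sp) == npv_direct
print(ppv_direct, npv_direct)  # 2/3 171/173
```

Keep the same Se and Sp but set the prevalence to 1/100: `ppv(Fraction(1, 100), se, sp)` returns 2/13 instead of 2/3. This shows that predictive values computed this way are relative to the group considered.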
This article proposes such a formalization in the context of the OBO Foundry (Smith et al., 2007). The OBO Foundry is one of the largest sets of interoperable ontologies in the biomedical domain, and it is built on the upper ontology BFO.

The question of how probabilistic notions can be represented in ontologies has been tackled from different perspectives in the past. For example, da Costa et al. (2008) have proposed PR-OWL, a new format that extends the classical OWL format; we take a different approach here, which does not aim at changing the OWL format. Soldatova, Rzhetsky, De Grave, & King (2013) have described a model in which probabilities can be assigned to research statements. We have proposed an alternative approach (Barton, Burgun, & Duvauferrier, 2012) showing how probabilities can be assigned to dispositions, and we build upon it here.

Sensitivity and specificity have recently been introduced in the Ontology of Biological and Clinical Statistics (OBCS; Zheng et al., 2014) as subclasses of Data item. We endorse this classification here and extend it to PPV and NPV. A data item, as defined by the Information Artifact Ontology (IAO), is intended to be a truthful statement about something. In order to formalize IDPs, one should thus clarify what entities in the real world they are about.

As we said, the sensitivity value² is generally defined as the proportion of people who would get a positive result by A among those who have the disease M. Note the conditional structure here: what is referred to is the proportion of true positives among the diseased if A were performed on them. In practical situations, however, the sensitivity value is estimated by performing the test on a sample of the population only, not on the entire population g. This leads to two difficulties:
- First, it is necessary to clearly differentiate the IDPs' values from their estimations, and to determine which of the two should primarily be represented in an ontology. Section 2 is devoted to this issue.
- Second, possible-but-non-actual situations cannot be straightforwardly defined in a realist ontology like BFO. Section 3 explains and solves this problem. It does so by considering an IDP as a data item about a disposition borne by an instance of a group of individuals, and by identifying the probability value of this disposition with the diagnostic value of the IDP. This provides a formal characterization of IDPs.

² Note the distinction between a sensitivity and its value: a sensitivity is a data item, but its value is a number.

2 THE INDICATORS AND THEIR ESTIMATIONS

2.1 Two limits for the estimations of indicators of diagnostic performance

Numerical estimations of IDPs face two limits (Barton et al., 2015):
- First, frequencies are measured on a sample judged to be representative of the population as a whole, and these values are then extrapolated to the frequencies in the entire population.
- Second, whether a given person has M or not generally cannot be known with certainty by reasonable means; sometimes, the only way to be certain is to perform an autopsy on the deceased patient. Consequently, a "gold standard" must be chosen, namely the best available diagnostic test that it is reasonable to perform³. If a patient gets a positive result to this gold standard test, one concludes that he has the disease; if he gets a negative result, one concludes that he does not have it.

³ Even if the gold standard consists in the naked-eye observation of a macroscopic disorder associated exclusively with this disease, it can still, in theory, lead to a diagnostic error: any empirical evidence is defeasible.

For example, Park et al. (2005) estimate the sensitivity of the Neer test for diagnosing the impingement syndrome. Their estimation is made on a sample of 552 patients considered representative of the general population, using surgical observation as the gold standard. Consider the patients in the sample who are tested positive by surgical observation. The proportion of these patients who are also tested positive by the Neer test is considered representative of the sensitivity value. That value can be interpreted as the proportion of people who would be tested positive by the Neer test among those who have an impingement syndrome in the whole population. Similar estimation strategies hold for prevalence, specificity, PPV and NPV.

Note that the estimations of the values of the prevalence, sensitivity, specificity, PPV and NPV depend on both the sample and the gold standard. However, the real values of the prevalence, sensitivity, specificity, PPV and NPV, as defined above, depend neither on the sample nor on the gold standard.

2.2 What should be represented in an ontology?

This being clarified, one can ask which entities should preferably be represented in an ontology: the IDPs' values, or their estimations?

Admittedly, we have no direct access to IDPs' values; but this does not imply that they should not be represented in an ontology. To see why, consider an analogy: the measurement of the ambient temperature by reading the height of a mercury column in a thermometer. Suppose that at a given time, this height is aligned with the mark "20 °C" written on the thermometer. In such a case, an ontology curator would primarily be interested in formalizing the fact that the ambient temperature is 20 °C. The fact that the mercury column is at the same height as the mark "20 °C" would be of secondary interest.

In a similar fashion, imagine that 65% of people test positive to a gold standard for M in a sample s of a population g. The ontology should then primarily formalize the fact that 65% of the people in g have M, rather than the fact that 65% of the people in s have a positive result to this gold standard. This estimation of the prevalence value may be false (strictly speaking, it is very likely to be false), but future estimations will lead to its correction, bringing it closer to the real value. Indeed, realist ontologies are built according to a fallibilist methodology (Smith & Ceusters, 2010). They represent the state of the world according to our best knowledge at the present instant, and they can be corrected as our knowledge of the world is refined.

That being said, it is possible to represent in an ontology the process of measuring a temperature by means of the height of a mercury column in a thermometer. Similarly, one could represent the different estimation processes of the IDPs, and the results to which they led. Such processes are biomedical investigations. They should therefore be formalized in an ontology like OBI (Ontology for Biomedical Investigations; Brinkman et al., 2010), a prominent OBO Foundry candidate dedicated to these issues. This would be relevant for formalizing the different estimations given by various samples and gold standards. However, medical practitioners are first and foremost interested in the IDPs' values themselves, rather than in their estimations. We therefore deal here with the formalization of the IDPs' values.

With this clarification made, we can now consider the second difficulty mentioned at the end of Section 1, namely the formalization of possible-but-non-actual situations in BFO.

3 A FORMALIZATION OF INDICATORS OF DIAGNOSTIC PERFORMANCE IN APPLIED ONTOLOGIES

The sensitivity value has been interpreted as the proportion of people who would get a positive result to A among M's bearers in g. This definition thus involves the condition of performing the test A on the members of g. As we said, such a condition is never realized, because the test is performed (at best) on a sample of the population, not on the whole population g. The performance of test A on g's members is therefore a possible (leaving aside practical difficulties), non-actual condition. Interpreting specificity, PPV and NPV along the same lines would also imply such possible, non-actual conditions.

However, BFO is built according to the realist methodology, which implies that all the instances it recognizes should be actual entities (cf. Smith & Ceusters, 2010). Thus, one cannot directly represent such a possible-but-non-actual condition in an ontology based on BFO. In order to solve this difficulty, we introduce a strategy named "randomization". It makes it possible to formalize the probabilities of interest as assigned to an actual entity, namely a disposition, and thus to represent IDPs in a realist fashion that complies with BFO's spirit.

3.1 From proportions to objective probabilities: the randomization strategy

We first explain how the proportion of a subgroup in a group can be formalized as a probability value assigned to a disposition. This will later help explain how the proportion of a subgroup in a group undergoing a possible, non-actual condition can be formalized along similar lines.

Dispositions are entities that can exist without being manifested. An example of a disposition is the fragility of a glass, which can exist even when the glass does not break. We use Röhl & Jansen's (2011) model of dispositions in BFO. This model associates to every disposition instance one or several realization instances, and one or several trigger instances (a trigger is the specific process that can lead to a realization occurring). In this model, the fragility of a glass is a disposition of the glass to break (the breaking process is the realization) when it undergoes some kind of stress (the process of undergoing such a stress is the trigger); this disposition inheres in the glass. Starting from the definition of these entities and their relations at the instance level, Röhl & Jansen proceed to formalize them at the universal level. We have shown in a former article (Barton, Burgun & Duvauferrier, 2012) how to adapt this model to probabilistic dispositions. For example, a balanced coin instance is the bearer of a disposition instance to fall on heads (the realization process) when it is tossed (the trigger process). An objective probability of 1/2 can be assigned to this disposition. We now apply this model to the situation at hand.

Consider the prevalence Prev(g,M), which was defined above as the proportion of bearers of M in the actual population g. We can define the disposition $d_{g,M}$, borne by the group g, that a person randomly drawn from g has M. More specifically, we introduce two processes:
- $T_g$ denotes the process "randomly drawing a person from g".
- $R_{g,M}$ denotes the process "drawing by $T_g$ someone who has M".

The triggers of $d_{g,M}$ are instances of $T_g$ and its realizations are instances of $R_{g,M}$. Following Barton et al. (2012), one can thus define the probability assigned to the disposition⁴ $d_{g,M}$, which is the probability of randomly drawing someone who has M in g. This probability is equal to the proportion of individuals who have M in g, that is, to Prev(g,M). For instance, if 10% of the people in g are diseased, then the probability of randomly drawing a diseased person from g is 10%. Thus, the prevalence value can be identified with the objective probability assigned to the disposition $d_{g,M}$. We name this strategy the "randomization" of the proportion of M's bearers among g.

⁴ In Barton et al. (2012), a probability was assigned to a triplet (d, T, R) rather than to a disposition d. This was because we had to take into account dispositions that may have several classes of triggers or realizations (that is, multi-trigger and multi-track dispositions, cf. Röhl & Jansen, 2011). In the present situation, however, $d_{g,M}$ is single-trigger and single-track: all its triggers are instances of $T_g$, and all its realizations are instances of $R_{g,M}$. Therefore, the probability value assigned to ($d_{g,M}$, $T_g$, $R_{g,M}$) can, for practical purposes, be assigned directly to $d_{g,M}$.

The randomization strategy may not be necessary to formalize a prevalence. A prevalence characterizes a proportion in an actual group, and thus could be formalized as such in an ontology based on BFO. But this strategy can also be applied to proportions of people in groups subject to a possible, non-actual condition. It is thus relevant for formalizing sensitivity and the other IDPs.

Recall that the sensitivity value Se(g,A,M) was defined as the proportion of people who would get a positive result to A among M's bearers in g. This value can be "randomized" as follows. We define $d_{Se,g,A,M}$ as the disposition to randomly draw someone who is tested positive by A, among the individuals of g who have M. More specifically, we define two processes:
- $T_{Se,g,A,M}$ is the "performance of test A on the individuals in g, and random draw of an individual among those who have the disease M"⁵.
- $R_{Se,g,A,M}$ is the "drawing by $T_{Se,g,A,M}$ of someone who got a positive result to A".

The triggers of $d_{Se,g,A,M}$ are instances of $T_{Se,g,A,M}$, and its realizations are instances of $R_{Se,g,A,M}$. One can then define the sensitivity value Se(g,A,M) as the objective probability assigned to this disposition $d_{Se,g,A,M}$. Indeed, suppose that 15% of the diseased people in g would get a positive result by A. Then the probability of randomly drawing, among the diseased people in g, someone who would get a positive result by A is 15%.

⁵ In general, we cannot determine in practice with certainty which individuals of g have M and which do not; but the practical impossibility of realizing this trigger does not preclude defining this entity.

The specificity value can be defined along similar lines, as the probability assigned to an actual disposition borne by the group g, noted $d_{Sp,g,A,M}$ (and similarly for PPV and NPV). Although $d_{Se,g,A,M}$ and $d_{Sp,g,A,M}$ both inhere in g, they have different triggers and different realizations:
- $T_{Sp,g,A,M}$ is the "performance of test A on the individuals in g, and random draw of an individual among those who do not have the disease M".
- $R_{Sp,g,A,M}$ is the "drawing by $T_{Sp,g,A,M}$ of someone who got a negative result to A".

3.2 A formal model of indicators of diagnostic performance in ontologies

Let us now consider how to formalize these probability values in ontologies. First, a group g will be considered as any collection of humans (for more on collections, see Jansen & Schulz, 2011). $d_{Se,g,A,M}$ is a disposition individual inhering in the group g. A probability value can be assigned to this disposition using a datatype property has_probability_value. The sensitivity of A for M in g will be denoted $Se_{g,A,M}$ and, following OBCS, it will be defined as a data item. Thanks to the analysis above, we can now answer our original question and state what this sensitivity is about: $Se_{g,A,M}$ is_about $d_{Se,g,A,M}$. We can also introduce a relation has_diagnostic_value that relates a sensitivity to its value.

In our framework, the (diagnostic) value of a sensitivity $Se_{g,A,M}$ is the probability value assigned to the disposition $d_{Se,g,A,M}$. This can be formalized by stating that, if s is a sensitivity, then:

  s has_diagnostic_value p ⇔ ∃d (d instance_of Disposition ∧ s is_about d ∧ d has_probability_value p)

As $d_{Se,g,A,M}$ is an individual, it cannot be related directly to the universals A and M. However, it can be related to them indirectly, as follows.

First, $d_{Se,g,A,M}$ can be seen as an instance of a disposition universal symbolized as $D_{Se,A,M}$. This universal has as trigger the process universal $T_{Se,A,M}$, "performance of test A on the members of a group, and random draw of a person among those who have the disease M". It has as realization the process universal $R_{Se,A,M}$, defined as "drawing by $T_{Se,A,M}$ of someone who got a positive result to A".

We can then introduce two new relations, sensitivity_disposition_of_test and sensitivity_disposition_for_disease (abbreviated as se_of_test and se_for_disease), such that $D_{Se,A,M}$ se_of_test A and $D_{Se,A,M}$ se_for_disease M. These two relations are introduced for pragmatic reasons, namely ease of use. At a foundational level, $D_{Se,A,M}$ and M (resp. A) could be related through a complex array of relations and entities. This array would involve the relation has_trigger between $D_{Se,A,M}$ and $T_{Se,A,M}$, as well as a sequence of relations between $T_{Se,A,M}$ and M (resp. A). Such an analysis would raise theoretical issues, though, as instances of $D_{Se,A,M}$ can exist even if no instance of M or A exists. We would therefore face issues similar to the ones addressed by Röhl & Jansen (2011) and Schulz et al. (2014).

Finally, we introduce a class Sensitivity. It can be characterized as a subclass of Data item that is related to a disposition through the above-mentioned relations:

  s instance_of Sensitivity ⇒ s instance_of Data item ∧ ∃d ∃a ∃m (d instance_of Disposition ∧ a instance_of Test ∧ m instance_of Disease ∧ s is_about d ∧ d se_of_test a ∧ d se_for_disease m)

We can also introduce $Se_{A,M}$, the class of sensitivities of test A for disease M (in whatever group). It can be formalized as a subclass of Sensitivity related to M and A through the following relations:

  s instance_of $Se_{A,M}$ ⇒ s instance_of Sensitivity ∧ ∃d ∃a ∃m (d instance_of Disposition ∧ a instance_of A ∧ m instance_of M ∧ s is_about d ∧ d se_of_test a ∧ d se_for_disease m)

Figure 1 summarizes this formalization of sensitivity (with universals in boxes, instances in diamonds, and the numerical value assigned by datatype properties in a circle). Specificity, PPV and NPV can be formalized along similar lines, as data items about dispositions related to tests and diseases through relations that could be labeled sp_of_test, sp_for_disease, ppv_of_test, etc.

[Figure 1: Sensitivity of a test A for a disease M in a group g, with probability value 0.75.]

4 CONCLUSION

We have thus provided a practically tractable formalization of IDPs in a realist ontology, which clearly dissociates IDPs from their estimations (which are relative to a sample and a gold standard). It also solves the difficulty of considering possible, non-actual conditions in a realist ontology based on BFO.

Note that IDPs also raise other theoretical issues. For example, one may want to aggregate two sensitivity values Se(g,A,M) and Se(g',A,M), assigned to two different groups g and g', in order to reach a finer assessment of the sensitivity in a larger group. How to do this is a question for the meta-analyst, though, not for the ontologist, who is first and foremost concerned with representational issues.

This model could be extended in three directions:
- First, one could formalize the estimations of the IDPs and their relations to a given sample and gold standard.
- Second, the relations se_of_test and se_for_disease could be reduced to basic relations and entities already accepted in the OBO Foundry.
- Third, the model could be used by ontology-based diagnostic systems that compute positive or negative predictive values from the prevalence, sensitivity and specificity values; more generally, it could be articulated with medical Bayesian networks.

Because it takes into account the dependence of IDPs upon the group of people considered, the model could contribute to the development of precision medicine (Mirnezami, Nicholson & Darzi, 2012). Precision medicine is an emerging approach that takes into consideration patients' characteristics and dispositions, including individual variability in genes, in order to offer more personalized preventive, diagnostic and therapeutic strategies.

ACKNOWLEDGEMENTS

We would like to thank the audience at several seminars, as well as four anonymous reviewers, for their helpful comments. Adrien Barton thanks the Japan Society for the Promotion of Science for financial support.

REFERENCES

Barton, A., Burgun, A. and Duvauferrier, R. (2012) Probability assignments to dispositions in ontologies. In: Proc. 7th Int. Conf. Form. Ontol. Inf. Syst. (FOIS 2012) (M. Donnelly & G. Guizzardi, eds.), 3–14. Amsterdam: IOS Press.
Barton, A., Duvauferrier, R. and Burgun, A. (2015) Une analyse philosophique des indicateurs de performance des tests diagnostiques médicaux. Submitted manuscript.
Brenner, H. and Gefeller, O. (1997) Variation of sensitivity, specificity, likelihood ratios and predictive values with disease prevalence. Stat. Med., 16(9), 981–991.
Brinkman, R. R., Courtot, M., Derom, D., Fostel, J. M., He, Y., Lord, P., Malone, J., et al. (2010) Modeling biomedical experimental processes with OBI. J. Biomed. Semant., 1(Suppl 1), S7.
da Costa, P. C. G., Laskey, K. B. and Laskey, K. J. (2008) PR-OWL: A Bayesian ontology language for the semantic web. In: Uncertainty Reasoning for the Semantic Web I, 88–107. Springer.
Jansen, L. and Schulz, S. (2011) Grains, components and mixtures in biomedical ontologies. J. Biomed. Semant., 2(Suppl 4), S2.
Ledley, R. S. and Lusted, L. B. (1959) Reasoning foundations of medical diagnosis. Science, 130(3366), 9–21.
Mirnezami, R., Nicholson, J. and Darzi, A. (2012) Preparing for precision medicine. N. Engl. J. Med., 366(6), 489–491.
Park, H. B., Yokota, A., Gill, H. S., El Rassi, G. and McFarland, E. G. (2005) Diagnostic accuracy of clinical tests for the different degrees of subacromial impingement syndrome. J. Bone Joint Surg. Am., 87(7), 1446–1455.
Röhl, J. and Jansen, L. (2011) Representing dispositions. J. Biomed. Semant., 2(Suppl 4), S4.
Schulz, S., Martínez-Costa, C., Karlsson, D., Cornet, R., Brochhausen, M. and Rector, A. (2014) An ontological analysis of reference in health record statements. In: Proc. 8th Int. Conf. Form. Ontol. Inf. Syst. (FOIS 2014) (P. Garbacz & O. Kutz, eds.), 289–302. Amsterdam: IOS Press.
Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L. J., et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol., 25(11), 1251–1255.
Smith, B. and Ceusters, W. (2010) Ontological realism: A methodology for coordinated evolution of scientific ontologies. Appl. Ontol., 5(3), 139–188.
Soldatova, L. N., Rzhetsky, A., De Grave, K. and King, R. D. (2013) Representation of probabilistic scientific knowledge. J. Biomed. Semant., 4(Suppl 1), S7.
Zheng, J., Harris, M. R., Masci, A. M., Lin, Y., Hero, A., Smith, B. and He, Y. (2014) OBCS: The Ontology of Biological and Clinical Statistics. In: ICBO 2014, Houston, TX, USA.