A System for Ontologically-Grounded Probabilistic Matching Rita Sharma David Poole Clinton Smyth Georeference Online Ltd. and Department of Computer Science Georeference Online Ltd., Dept. of Computer Science, UBC University of British Columbia http://www.georeferenceonline.com http://www.cs.ubc.ca/∼rsharma/ http://www.cs.ubc.ca/∼poole/ Abstract more general terms than others) and different levels of de- tail (some have parts and sub-parts and some may be de- scribed holistically). Descriptions of mineral occurrences This paper is part of a project to match descrip- tions of real-world instances and probabilistic are recorded at varied levels of abstraction and detail be- models, both of which can be described at mul- cause some areas have been explored in more detail than others. There are some models that people spend careers in tiple level of abstraction and detail. We use an ontology to control the vocabulary of the appli- developing and that are described in great detail for those parts that the modeler cares about. Other models are less cation domain. This paper describes the issues well developed, and described only in general terms. Be- involved in probabilistic matching of hierarchi- cal description of models and instances using cause the instance and model descriptions are generated asynchronously, the levels of detail cannot be expected to Bayesian decision theory, which combines on- tologies and probabilities. We have two fielded match. We do, however, need to make decisions based on applications of this framework; one for landslide all of the information available. prediction and one for mineral exploration. This work has arisen from from an ongoing project in which we are building decision-making tools for mineral exploration (MineMatch) and hazard mapping (Hazard- 1 Introduction Match). MineMatch is similar in its goals to the Prospec- tor expert system [Hart, 1975], but builds on the develop- In many problem domains we need to match instances and ments in probabilistic reasoning and ontologies of the last models of real-world phenomena. For example, in geol- 30 years. In previous work [Smyth and Poole, 2004; Poole ogy, geological surveys of states, provinces and countries and Smyth, 2005], we described models using qualitative publish descriptions of mineral occurrences in their juris- probabilities, based on the kappa calculus, which measures diction; these form the instances in one of our applica- uncertainty in degree of “surprise”. In this paper, we de- tions. People spend careers describing probabilistic models velop an approach based on probability for making deci- of where different minerals can be found. There are two sions. main tasks we consider: In MineMatch we work with more than 25,000 instances of mineral occurrences that are described using various • given an instance, determine which models best fits taxonomies, including the British Geological Survey Rock it. This would be used, for example, by someone who Classification scheme1 and the Micronex taxonomy of has the mineral rights on a piece of land and wants to Minerals2 . We also work with more than 100 deposit know what mineral deposits may be there based on the type models, including those described by the US Geo- description of the property. logical Survey3 and the British Columbia Geological Sur- vey4. Similarly, in HazardMatch we work with tens of • given a model, determine which instances best match thousands of spatial instances (polygons) described using the model. This would be used by someone who has standard taxonomies of environmental modeling such as a model of where gold can be found, and they want to rock type, geomorphology and geological age. To date we find which piece of land is most likely to contain gold, based on their model. 1 http://www.bgs.ac.uk/bgsrcs/ 2 http://micronex.golinfo.com These models and instances are typically described by dif- 3 http://minerals.cr.usgs.gov/team/depmod.html ferent people at different levels of abstraction (some use 4 http://www.em.gov.bc.ca/Mining/Geolsurv/ have worked with approximately ten models of landslide room hazards which we compare with the spatial instances. This work is quite different to other work on combining bathroom bedroom livingroom probability and ontologies [Ding and Peng, 2004; Pool, Fung, Cannon and Aikin, 2005; Costa, Laskey and Laskey, tvroom 2005] because we are using the ontologies to construct a rich hypotheses space rather than (only) having probabil- masterbedroom kidsbedroom ities over the ontologies. The running example we use in this paper is one where we can describe apartments and/or houses and their models. Figure 1: Part of a taxonomic hierarchy of room types. 2 Models and Instances Instances are things in the world. We describe instances by The ontologies provide a hypothesis space over which we naming them and specifying their features (values on vari- can have probability distribution. We consider that prob- ous properties). For example, an instance could be a partic- abilistic models (scientific theories) that makes probabilis- ular rock outcrop, a volcano that used to exist, or apartment tic prediction about a domain will provide the uncertainty #103 at 555 Short St. A feature of that apartment could be knowledge about properties and relations. that its size is large and it contains two bathrooms. Models are concepts in someone’s head that describe some 4 Describing Model and Instances phenomenon of interest. For example, someone may have a model of what rocks are likely to contain gold, a model of We adopt the OWL [McGuinness and van Harmelen, 2004] where landslides may occur, or a model of an apartment terminology of describing domains in terms of individuals, that Sue would be content to live in. In the system we classes and properties. consider here, models are named and described in terms of probability distributions over the features of an instance 4.1 Instances that manifests that phenomenon. For example, the gold model will specify the probability over the features of a An instance is described by its value on various properties. particular instance that is likely to contain gold. The land- This can include its relationship to other individuals (e.g., slide model will specify the probability over the features its parts). We, however, do not only want to state positive for a particular location that predict whether that location facts, but also negative facts such as that an apartment does is prone to landslides. The model of Sue’s apartment will not contain a bedroom, or that the kitchen is a red colour specify the features that predict whether Sue would be ex- but is not a pink (without enumerating all of the non-pink pected to like a particular apartment. red colours). Thus we will represent instance descriptions with the quadruples of form: Given an instance and a model, the aim of matching, in the context of this paper, is to determine the probability that the hindividual, property, value, truthvaluei instance manifests the phenomenon of the model. where truthvalue is either present or absent For example, to say that an apartment has a master bed- 3 Ontologies room, but does not have a kid’s bedroom we could write: The models and instances are described at different levels hapt1, containRoom, masterbedroom, presenti of abstraction using ontologies. As part of the ontologies we assume that we have taxonomic hierarchies that specify hapt1, containRoom, kidsbedroom, absenti the vocabulary for different levels of abstraction. The tax- It is important to distinguish an instance from its descrip- onomic hierarchy defines the hierarchical relationship be- tion. An instance is a real physical thing that exists in the tween concepts. Figure 1 shows an example of a taxonomic world we are reasoning about (the real world at some time, hierarchy. A bedroom is a kind of room. A masterbedroom some temporally extended world, or even some imaginary is a kind of bedroom. In this figure, room is the topmost world). A description is a set of quadruples. class. We do not assume that the ontologies include uncertainty 4.2 Models about properties and relations. Ontologies are created and maintained by communities, which can agree on vocabu- Models describe abstract instances rather than any partic- lary, even if they do not agree on probabilities and models. ular instance. For example, apartment model Apt 13 may Apt 13 br1 and the arc connecting these two individuals represent p1 quadruple hApt 13, containsRoom, br1, p2i. hasSize p2 p3 containsRoom containsRoom large p4 br1 br2 hasType 5 Abstraction hierarchies and probabilities hasType containsBed p5 p6 bedRoom p7 masterBedRoom bed1 containsBed When matching a model with an instance, we need to take p8 hasType bed2 into consideration the type uncertainty (because the in- stance and model are at varied levels of abstraction). To softbed p9 hasType cope with type uncertainty, we consider that taxonomic hi- hardbed erarchies in the ontology are associated with probabilities. In particular, given a taxonomic hierarchy, we want a mech- Figure 2: A Semantic network representation of apartment anism that can compute P(C j |Ck ), where C j is the subClas- model Apt 13. sOf Ck . This is the probability that an individual is a C j given all that you know about it is that it is a Ck . describe features that Sue would love to have in an apart- We are not considering that the probabilities associated ment, e.g., she usually wants a master bedroom in her apart- with hierarchies are part of individual models. We are con- ment5 . In particular, a model describes an instance that ex- sidering them as a part of super model. hibits some phenomenon. It specifies what must be in an In this paper we consider only taxonomic hierarchies which instance, what cannot be there and what we would expect are trees and where we can compute P(C j |Ck ), as discussed to be there. in Section 5.1. We are working on techniques for comput- A model describes a set of related individuals. One of these ing P(C j |Ck ), when hierarchies are not trees, and where we individuals is the designated top-level individual. For ex- need to consider the problem of multiple inheritance, and ample, in model Apt 13 the individuals are the apartment, interdependence between subclasses. bedrooms, beds etc, and the designated top-level individual is the apartment. 5.1 Tree abstraction hierarchies A model is described in terms of quadruples of the form: Each class in a taxonomic hierarchy specifies a probability distribution over its immediate subclasses. That is, each hind, prop, val, probi link in the tree hierarchy is associated with a conditional probability. This is the probability that an individual is in where ind is an individual, prop is a property, val is either a class C j , given that all you know about it is that it is in a an individual or a class or a primitive value (depending on class Ck , and that C j is the immediate subClassOf Ck . For whether prop is an objecttype property, or a hasType prop- example, the class room in the hierarchy shown in Figure 1 erty, or a datatype property), and prob is the probability. has a probability distribution over its immediate subclasses. This quadruple specifies that an instance individual that has Suppose we have as part of this distribution: value val on property prop will matches the model individ- ual ind with probability prop. P(bedroom|room) = 0.3 Example The semantic network representation of part of an apartment model Apt 13 is shown in Figure 2. P(bedroom|room) represents the probability that a random The nodes represent individuals, classes and data types. room is a bedroom. The top object in a semantic network represents the Similarly, we can specify the probability of an immedi- individual that we are talking about. The individual ate subClassOf bedroom given bedroom, with probabilities Apt 13 in Figure 2 is an apartment. Each arc is la- such as: beled with a probability. The value val associated with probability prob, individual ind, and arc from ind to P(masterbedroom|bedroom) = 0.2 val, labeled with property prop, represent quadruple hind, prop, val, probi. For example, individuals Apt 13, P(masterbedroom|bedroom) represents the probability 5 For this paper do not think of these as preferences. We could that a room is a masterbedroom given all that you know have a similar matcher for preferences, but this paper is about about it is that it is a bedroom. models of uncertainty. Think of the model of what Sue would like as the probability that she will move into the apartment and The prior probability that an individual is in a class can be still be there after 6 months. This is, in fact, what the landlord is computed in a recursive manner by multiplying the prob- interested in. abilities up in the tree. The probability that an individual belongs to root class (room) is 1 (as it represents the set of the instance. We want to determine the posterior probabil- all individuals). That is, P(room) = 1. For example, given ity of M ∼ i given the i’s description, which specifies the the probability as above, P(masterbedroom) can be calcu- probability that the instance i manifests the phenomenon lated as follows: that the model is modeling. P(masterbedroom) In general, Mk ∼ i j represents that model individual Mk matches the instance individual i j , where a model individ- = P(masterbedroom|bedroom) × ual is one of the individuals described in the model (i.e., P(bedroom|room) × P(room) it is one of the first elements of a quadruple), and an in- = 0.2 × 0.3 stance individual is one of the individuals described in the instance description. In this representation, computing the probability that i ∈ Ck We cannot directly determine the match between model given that i ∈ C j is linear in depth difference of C j and Ck and instance unless we know which model individuals cor- and otherwise is not a function of the hierarchy’s size. respond to which instance individuals. We use Mk = i j to denote that model individual Mk corre- 6 Supermodel sponds to instance individual i j and Mk =⊥ to denote that individual Mk does not corresponds to any instance indi- As discussed in Section 4.2 a model describes a concrete vidual. A role assignment is a list of correspondence state- instance that matches that model. In particular, it speci- ments of the forms Mk = i j , Mk =⊥ such that each Mk ap- fies what must be in an instance, what cannot be there and pears exactly once in the list and each i j appears at most what we would expect to be there. However, a model does once. not specify what happens when the model doesn’t hold (as that depends on what other models there are, and the back- Note that match, ∼, does not define the role assignment. It ground probabilities). The role of the supermodel is to pro- defines the degree of match, given a role assignment. vide background information (that is beyond any model) Given a role assignment, the model description defines on how likely individual—property—value triples are. In a Bayesian network. The problem of matching a model particular, the super model contains the following: M with an instance i reduces to computing P(M ∼ i|observation) from the constructed Bayesian network, • the supermodel contains the probability distribution of where observation is the instance i’s description. each class in the tree abstraction hierarchies as dis- cussed in Section 5.1. 7.1 Construction of Bayesian network • the supermodel contains quadruples of the form: Given a role assignment, the semantic network defines a Bayesian network. We can construct it dynamically during hcl, prop, val, priori the inference as follows: where cl is a class in the taxonomic hierarchy, prop • there is a Boolean node Mk ∼ i j for each correspon- is a property, val is either an individual or a class or dence statement Mk = i j , where i j 6=⊥ of the role as- a primitive value, and prior is the prior (background) signment. probability. • there is a Boolean node for each correspondence state- That is, the prior probability that an individual of ment Mk =⊥ of the role assignment, which we will type cl has value val for property prop is avail- write hMk =⊥i. This node will be observed with value able from the supermodel. For example, quadruple true. hroom, hasColour, “green′′, 0.4i tells us that the prior probability of a random room has “green” colour is • for each individual Mk in the model description and 0.4. for each functional property prop such that prop is hasType or datatype, there is a random variable which we will write hMk , propi. The domain of hMk , propi 7 Probabilistic Matching is the range of property prop. One objective of the matcher is to rank the models or in- • for each individual Mk in the model description and stances given instance and model descriptions. The basic for each non-functional property prop such that prop problem is to match an instance with a model. When we is datatype or the range of prop is class (i.e., prop say that a model M matches an instance i, we write M ∼ i is hasType) and for each value V in the range of to mean that M matches with i. Note that M is the top-level prop, there is a Boolean variable, which we will write individual in the model and i is the top-level individual in hMk , prop,V i. • the parent of each Mk ∼ i j node, and each hMk =⊥i • The probability distribution for each hMk , prop,V i node, is node M p ∼ i p such that there is a directed node conditioned on its parent Mk ∼ i j is: edge from M p to Mk in the semantic network (i.e., P(hMk , prop,V i = true| Mk ∼ i j = true) = p quadruple M p , prop, Mk , prob exists in the model description). P(hMk , prop,V i = true| Mk ∼ i j = f alse) = prior where p is the probability associated with value • the parent of each hMk , Pi, and each hMk , P,V i node is V in the semantic network. That is, quadruple node Mk ∼ i j . hMk , prop,V, pi exists in the model description. The • the probability distribution of each Mk ∼ i j node prior probability prior is defined by the supermodel, conditioned on its parent M p ∼ i p is: i.e., quadruple hMk , prop,V, priori exists in the super- model. P( Mk ∼ i j = true| M p ∼ i p = true) = p Example Consider matching the apartment model P( Mk ∼ i j = true| M p ∼ i p = f alse) = prior Apt 13 as shown in Figure 2 with instance apt1 defined as where p is the probability associated with the indi- follows: vidual Mk in the semantic network. That is, quadru- ple M p , prop, Mk , prob is the part of the model de- hapt1, hasSize, “large′′i scription. The prior probability prior is taken from hapt1, containsRoom, R1, presenti quadruple M p , prop, Mk , prior that exists in the su- hR1,type, masterbedroom, presenti permodel. hR1, containsBed, b1, presenti hb1,type, bed, presenti • the probability distribution for each hMk =⊥i node hapt1, containsRoom, R2, presenti conditioned on its parent M p ∼ i p is: hR2,type, room, presenti P(hMk =⊥i = true| M p ∼ i p = true) = 1− p For the individual br1 of the model as shown in Figure2, P(hMk =⊥i = true| M p ∼ i p = f alse) = 1 − prior we can have the following possible mappings: where p is the probability associated with the indi- br1 = R1 vidual Mk in the semantic network. That is, quadru- br1 = R2 ple M p , prop, Mk , p is the part of model description. br1 =⊥ The prior probability prior is taken from quadruple When br1 maps to R1, we can have the following possible M p , prop, Mk , prior that exists in the supermodel. mappings for br2: • The domain of a hMk , propi node is the range of prop. To specify the conditional probability of br2 = R2 hMk , propi node conditioned on its parent Mk ∼ i j , br2 =⊥ we do not have the distribution over all the values of hMk , propi, rather, we have the probability for values When both model and instance have many individuals of that model cares about. The conditional probability the same types there are many possible role assignments. P(hMk , propi | Mk ∼ i j is: For the role assignment: Apt 13 = apt1, br1 = R1, bed1 = b1, br2 =⊥, the semantic network shown in Figure 2 de- – If the range of prop is class, fines a Bayesian network as shown in Figure 3. P(hMk , propi | Mk ∼ i j ) is: The Boolean variable hApt 13 ∼ apt1i denotes whether P(hMk , propi ∈ V | Mk ∼ i j = true) = prob model Apt 13 matches with instance apt1. The Boolean P(hMk , propi ∈ V | Mk ∼ i j = f alse) = prior variable hbr1 ∼ R1i represents whether individual br1 of the model matches with the individual R1 of the instance. – If prop is datatype property: The Boolean variable hbr2, ⊥i represents whether individ- ual br2 of the model does not map to any individual of P(hMk , propi = V | Mk ∼ i j = true) = prob instance. P(hMk , propi = V | Mk ∼ i j = f alse) = prior The conditional probabilities of the Bayesian network shown in Figure 3 are constructed using the supermodel where prob is the probability associated with quadru- and Apt 13’s description. Some of these probabilities are shown below: ple hMk , prop,V, probi in the model description. The prior probability prior is taken from the quadruple P(hApt 13 ∼ apt1i = true) = p0 hMk , prop,V, priori that exists in the supermodel. P(hApt 13, hasSizei = “large”| hApt 13 ∼ apt1i = true) = p1 Apt 13 p1 hasSize p2containsRoom containsRoom p3 large br1 br2 p4 p6 hasType hasType containsBed containsBed p5 p7 bedRoom bed1 masterBedRoom bed2 p8 p9 hasType hasType softbed hardbed + role assignmnet: Apt 13 = apt1, br1 = R1, bed1 = b1, br2 =⊥ hApt 13 ∼ apt1i hbr1 ∼ R1i hApt 13, hasSizei hbr2 =⊥i hbed1 ∼ b1i hbr1, hasTypei hb1, hasTypei Figure 3: A Bayesian network defined by the semantic network shown in Figure 2 for the role assignment: Apt 13 = apt1, br1 = R1, bed1 = b1, br2 =⊥. When hApt 13 ∼ apt1i = f alse, P(hApt 13, hasSizei = conditional probability of hMk , propi ∈ v is: “large”| hApt 13 ∼ apt1i = f alse) is the prior probability that Apt 13 is of size “large”. P(hMk , propi ∈ v| M p , prop ∈ V ) = 1.0 After constructing a Bayesian network, given a role assign- P(hMk , propi ∈ v| M p , prop 6∈ V ) = P(v|¬V ) ment, from the semantic network, we want to compute the Using Bayes rule, probability P(v|¬V )can be posterior probability of M ∼ i, i.e, P(M ∼ i|observation). computed as follows: The observation is the instance i’s description, which can be at different level of abstraction than M’s description. In P(v) − P(V) the Bayesian network shown in Figure 3, the model in- P(v|¬V ) = 1 − P(V) dividual br1 maps to instance individual R1. The model specifies hbr1,type, bedroom, p4i and instance specifies where P(v) and P(V ) are the probabilities of hR1,type, masterbedroom, presenti, which are at different classes v and V respectively. We can compute level of abstraction. To insert the evidence in the con- P(v) and P(V ) using the probabilities associated structed Bayesian network, we need to take this differ- with the abstraction hierarchies6 as discussed in ence into consideration. In particular, we need to map Section 5.1. the instance description to the evidence for the constructed Bayesian network. 7.3 Inference 7.2 Mapping an instance description to evidence We can compute the posterior probability of match, P(M ∼ i|observation), from the constructed Bayesian network us- An instance description is a set of quadruples of the forms ing any standard inference algorithms, e.g., VE [Zhang and hik , prop,V, presenti, and hik , prop,V, absenti. We map the Poole, 1994]. The posterior probability of match depends instance description to the evidence for the constructed on the role assignments of the individuals. We maximize it Bayesian network as follows: over all possible role assignments. • the quadruple hik , prop,V, presenti, if prop is non- 8 Conclusion functional, provides observation: hik , prop,V i = true for the constructed Bayesian network. In this paper, we have proposed a framework for decision making in rich domains, where we can describe the obser- • the quadruple hik , prop,V, absenti, if prop is non- vations (or instances) in the world at multiple levels of ab- functional, provides observation: hik , prop,V i = straction and detail and have probabilistic models at differ- f alse. ent levels of abstraction and detail, and be able to use them to make decisions. We can build knowledge-based deci- • the quadruple hik , prop, v, presenti, if prop is sion tools in various domains such as mineral exploration functional and datatype, provides observation: and hazard mappings, where we need to have probabilistic hik , propi = v. reasoning and rich ontologies. • the quadruple hik , prop, v, presenti, if prop is func- Acknowledgments tional objecttype or prop is hasType, provides obser- Rita Sharma was partly supported by a MITACS Postdoc- vation: hik , propi ∈ v. We have two cases: toral fellowship. – If the observation hik , propi ∈ v implies “true” or “false” for the node M p , prop , such that M p = References ik exists in the role assignmnet, we do the normal (usual) conditioning in the Bayesian network. Costa, P., Laskey, K. and Laskey, K. [2005]. Pr-owl: A – If the observation hik , propi ∈ v does not imply bayesian ontology language for the semantic web, “true” or “false” for the node M p , prop , such Proceedings of the ISWC Workshop on Uncertainty that M p = ik exists in the role assignment (i.e, v Reasoning for the Semantic Web. is the superclass of value V of M p , prop ), we Ding, Z. and Peng, Y. [2004]. A probabilistic extension to provide soft evidence for the node M p , prop in ontology language owl, Proceedings of the 37th An- the constructed Bayesian network. nual Hawaii International Conference on System Sci- For the soft conditioning, we create an observed ences (HICSS’04). child hMk , propi of M p , prop , Mk is the same type as M p . We observed hMk , propi ∈ v. The 6 These are provided by the supermodel. Hart, P. [1975]. Progress on a computer-based consultant, Proceedings of International Joint Conference on Ar- tificial Intelligence, pp. 831–841. McGuinness, D. and van Harmelen, F. [2004]. Owl web ontology language overview, W3C Recommendation 10 February 2004, W3C . Pool, M., Fung, F., Cannon, S. and Aikin, J. [2005]. Is it worth a hoot? qualms about owl for uncertainty rea- soning, Proceedings of the ISWC Workshop on Un- certainty Reasoning for the Semantic Web. Poole, D. and Smyth, C. [2005]. Type uncertainty in ontologically-grounded qualitative probabilistic matching, Eighth European Conference on Symbolic and Quantitative Approaches to Reasoning with Un- certainty (ECSQARU-2005), pp. 763–774. Smyth, C. and Poole, D. [2004]. Qualitative probabilis- tic matching with hierarchical descriptions, Proceed- ings of Ninth International Conference on the Prin- ciples of Knowledge Representation and Reasoning (KR-2004). Zhang, N. and Poole, D. [1994]. A simple approach to Bayesian network computation, Proc. of the 10th Candian Conference on Artificial Intelligence, pp. 171–178.