Implementing Uncertainty in a Logic Programming Framework

Trevor Martin
Artificial Intelligence Group, Department of Engineering Mathematics, University of Bristol, Bristol BS8 1TR, UK
email: trevor.martin@bris.ac.uk

Abstract. We briefly outline the need to incorporate uncertainty and flexibility into semantic web knowledge representation, and argue that support logic programming, a combination of soft computing and logic programming, is one way to address this problem. To retain consistency with emerging frameworks, we show how a support logic program modelling uncertainty in relations and rules can be compiled into crisp Horn clauses suitable for reasoning within a system that does not explicitly model uncertainty. In cases where there is uncertainty in attribute values, an extension to the inference mechanism is required.

Keywords: Soft semantic web, support logic, probabilistic rules, Fril, program transformation.

1 Introduction

Recent development of the semantic web has included the adoption of logic programming (in various guises) as a key element in processing semantic web rules. In turn, this has led to a re-emergence of the question of whether it is necessary to include an explicit mechanism for handling uncertainty in the representation, and how best to process that uncertainty. During the "boom" years of logic programming, much research effort (see e.g. [1-4]) went into combining logic programming with various uncertainty calculi, from both the theoretical and implementation viewpoints. Fril is a practical implementation of logic programming with uncertainty, which has been used in a number of applications (see for example chapter 8 of [5]). In this paper, we outline some of the design features of Fril in the belief that they contain relevant lessons for the semantic web. We focus on implementation aspects, not on properties such as expressibility or model-theoretic semantics, which are covered elsewhere [4, 6-8].

Perhaps the key issue in the implementation of an uncertain logic programming language is whether to incorporate the processing of uncertainty as a core part of the system, or to build a meta-level on top of an existing (crisp) implementation. The latter approach tends to be slower (as one is effectively building an interpreter) but has the advantage of compatibility with existing systems; the former approach gives the implementer more control and flexibility. Fril took the first route, and was implemented as an extended Warren Abstract Machine. In section 3 we outline the use of transformations to run uncertain logic programs within a standard system (without uncertainty in attribute values). This may be a more productive route, given the increasing commercial interest in implementing semantic-web related systems (e.g. Oracle 10 includes RDF management).

1.1 The Need for Uncertainty

The semantic web aims to bridge the human-machine communication gap: "… the theme is human beings doing the thinking and machines helping it work on a larger scale …" ([9], p. 187-8). In order to achieve this, the semantic web provides definitions for terms used in natural language, and a logic-style framework for reasoning. Representations based on first order logic have many advantages: logic is easily understood and provides a powerful computing formalism through logic programming. However, human language is far more subtle and expressive than a formal model, and deals with many concepts that are defined by common usage rather than by necessary and sufficient conditions.
Such vague concepts are widespread in attribute values (Mary's age is young), relations (80 mph is a safe speed on an open highway, Star Wars Episode II is a romantic movie) and rules (person X is likely to have high blood pressure IF person X is overweight), where the italicised terms are vague. The inherent vagueness of many natural language terms means there is often a mismatch between natural language and crisp logic-based representations. In common with many knowledge representation formalisms, the semantic web clearly requires a mechanism that can cope with this type of knowledge.

1.2 How should we deal with uncertainty?

From a mathematical perspective, it is difficult to argue against the use of probability as a tool for handling uncertainty. Indeed, it is relatively easy to show that if one is prepared to bet as an indication of one's level of certainty about an event, then it is not rational to base one's behaviour on anything other than the laws of probability. However, it is worth remembering the words of Einstein:

"As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality." ("Geometry and Experience", 1921)

The assumptions underlying probability include:
• the availability of precise definitions for events, and
• procedures to determine whether or not a given event has occurred.

Much human knowledge and communication is based on natural language, and one of the strengths of natural language is its capacity to convey a large amount of information relatively compactly. This relies on a shared understanding of terms, without necessarily sharing precisely the same definition of those terms. For instance, an Englishman announcing that he is "travelling to Europe" would be understood to mean somewhere on the opposite side of the Channel to England; an American saying the same thing would probably include the UK as a possible destination. The word "Europe" denotes a collection of countries, but its precise definition is elusive: is it a set of countries marked as Europe on a particular map, members of the European Union (now? in 1970? in 1975? in 2010?), countries eligible to enter European Championship football, countries eligible to enter the Eurovision Song Contest…? Each "source" has its own definition of the term, but we are able to understand and use the concept in communication without the inconsistency causing a problem. Thus when considering the integration of different sources, we may need to consider uncertainty in class memberships, in relations and in rules.

To illustrate, let us consider two ontologies which define the class youngPerson. Since young is not a precisely defined concept, it is likely that the two will adopt different standards; let us say ontology o1 defines all people aged 20 and under as belonging to the class o1:youngPerson, and ontology o2 defines all people aged 25 and under as belonging to the class o2:youngPerson. How should we combine these definitions? Two extremes would be to use the intersection or the union of the extensions:

c:youngPerson = o1:youngPerson ∩ o2:youngPerson
or
c:youngPerson = o1:youngPerson ∪ o2:youngPerson

where c denotes the combined ontology.
A more natural choice would be to acknowledge that the source of the problem is vagueness in the (combined) class definition, and to say that cases where the separate ontologies disagree should be modelled by an intermediate class membership or probability, related to the proportion of ontologies that would classify a particular instance as included in or excluded from the class. In the case above, there is no disagreement for instances with age 20 or less (i.e. o1's definition), or for ages above 25. However, for instances with age in the interval (20, 25] there is intermediate support for their membership in the combined class. This combination is not based on similarities between the labels, but on the fact that the ontologies may classify the same underlying instances into different classes. In the same way, one movie database might class a particular movie as "horror" whilst another labels it as "suspense". The degree of overlap in the categories can give support for rules integrating both sources [10].

2 Fril and Support Logic Programming

Fril [5] is a logic programming language incorporating uncertainty at a fundamental level. Whilst not (yet) available as a web tool, the framework and implementation are relevant to the handling of uncertainty within the semantic web. The calculus for handling uncertainty is based on mass assignment theory [5, 11, 12], which gives a coherent framework for dealing with both fuzzy and probabilistic uncertainty. Both the language and the underlying theory are dealt with in detail elsewhere [5, 13, 14]; the essential extension compared to a logic program is that clauses may be quantified by support pairs. These are interpreted as intervals containing conditional probabilities, so for the simple rule "p if q" we could write

p IF q : ([u1 v1] [u2 v2])

which is interpreted as stating that the conditional probabilities obey

u1 ≤ Pr(p|q) ≤ v1
u2 ≤ Pr(p|¬q) ≤ v2

The symbols p and q represent predicates with arguments, not simple propositions. A more general form of rule is also allowed, in which more complex interactions in the body of the rule can be taken into account. For example, given a rule

p IF q AND r

the general rule takes account of different probability intervals for p|qr, p|q¬r, p|¬qr and p|¬q¬r. This can be used to model Bayesian nets, although the inference is limited to the direction imposed by the rule structure, i.e. conclude the rule head from the rule body. For simplicity we only consider the basic Fril rule here, which in the above case would only consider intervals for p|qr and p|¬(qr). Clearly this leads to considerable efficiency savings as the number of goals in the rule body increases, and it has been found adequate in most practical problems. By convention [u2 v2] defaults to [0, 1], in which case it may be omitted, and [u1 v1] defaults to [1, 1].

Unit clauses (facts) may also be quantified by support pairs, for the case in which tuples do not completely satisfy a predicate. For example, take the relation

Fluent-Speaker ⊆ Person × Language

and consider whether the tuples (John, English), (John, French), (John, Spanish) satisfy this relation. If John is a native English speaker and knows no French, then the first two tuples respectively belong and do not belong to the relation. However, if John knows a little Spanish then the final tuple lies between these two extremes. In the voting model [12], each voter must decide whether or not a tuple belongs to the relation, and the (point-valued) support is then the proportion of voters who vote yes. This can be extended to give general support pairs [u v], where u is the proportion voting in favour, 1-v is the proportion voting against and v-u is the proportion of abstentions. Clearly it is meaningless to include a second support pair for unit clauses.
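To make this concrete, the following sketch (Python, purely illustrative; the function names and thresholds are ours, not part of any ontology standard) computes a point-valued support for membership of the combined youngPerson class as the proportion of sources voting in favour, in the spirit of the voting model; the crisp intersection and union of section 1.2 correspond to requiring all or any votes respectively.

    # Illustrative crisp definitions from the two ontologies (hypothetical).
    def o1_young(age):
        return age <= 20   # o1:youngPerson - aged 20 and under

    def o2_young(age):
        return age <= 25   # o2:youngPerson - aged 25 and under

    def vote_support(age, voters):
        """Point-valued support: the proportion of sources (voters)
        that classify the instance as a member of the class."""
        return sum(1 for v in voters if v(age)) / len(voters)

    voters = [o1_young, o2_young]
    for age in (18, 23, 30):
        print(age,
              vote_support(age, voters),     # 1.0, 0.5, 0.0
              all(v(age) for v in voters),   # intersection of extensions
              any(v(age) for v in voters))   # union of extensions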
Inference in Fril proceeds by constructing a proof tree, in the same depth-first manner as Prolog. The supports of the clauses used in the proof tree are then combined to give a support for the solution. By default, the support for a conjunction of goals is the product of the individual supports, i.e. given two facts

q1 : [x1 y1]
q2 : [x2 y2]

the support for the conjunction q1 AND q2 is [x1×x2, y1×y2]. In general, one can only say that the support lies in the interval [max(0, x1+x2-1), min(y1, y2)], but by assuming maximum entropy we can use the product to calculate the support for the conjunction (see [5] for discussion of this point).

Consider a single basic Fril rule of the form

p IF q1 AND q2 AND … AND qn : ([u1 v1] [u2 v2])

when the following facts are given or derivable:

qi : [xi yi]  1 ≤ i ≤ n

More generally, the facts may not completely match the terms in the rule, in which case the support pairs [xi yi] are determined using semantic unification. Let the combined support for the body of the rule be [x y]. A generalised Jeffrey's rule for support pairs is the basic inference rule of support logic, so that the inference is p : [z1 z2] where

z1 = u1×y + u2×(1-y)  if u1 ≤ u2
   = u1×x + u2×(1-x)  if u1 > u2

z2 = v1×x + v2×(1-x)  if v1 ≤ v2
   = v1×y + v2×(1-y)  if v1 > v2

For example, the case discussed above can be modelled by the probabilistic rules

c:youngPerson(X) IF o1:youngPerson(X) : [1, 1]
c:youngPerson(X) IF o2:youngPerson(X) : [0.5, 1]

where [0.5, 1] is the support, or interval probability, that the rule head c:youngPerson(X) is true given that the rule body o2:youngPerson(X) is true. The first rule has a support of [1, 1], which is the default value for a rule and is normally omitted for clarity. This approach can be extended to multiple rules, although we note that it is unable to deal with the case where two ontologies have no overlap in classification, since we are then effectively dealing with two different concepts. If all information is taken into account and negation can be properly handled within the rules, then it is possible to work without intervals; e.g. we could have a rule that X is in c:youngPerson with a probability of 0.5 if X is in o2:youngPerson and not in o1:youngPerson. The interval in the second rule arises because we ignore the information given by o1, and only consider o2. This projection onto one ontology simplifies computation considerably, at the expense of working with intervals rather than point values. Ding and Peng [15] take a slightly different approach, allowing limited interaction between rules and compiling the uncertainty into Bayesian nets. We note in passing that if we consider each ontology to correspond to an (equally likely) possible world, our approach has several links to the probabilistic logic programming described in [16]. We also note that this treatment assumes both ontologies are concerned with the same set of instances.

To answer a query, a standard logic programming proof tree is created, with extended unification to deal with uncertain attribute values; a probabilistic calculation is then carried out over the proof tree to determine the support for the conclusion. All proof paths are examined to determine the overall support.
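This inference step is easy to state in code. The sketch below (Python for illustration only, not Fril syntax) implements the generalised Jeffrey's rule exactly as given above; applied to the second youngPerson rule, with rule supports ([0.5, 1], [0, 1]) and a body known with certainty, it reproduces the interval [0.5, 1].

    def jeffrey(head_given_body, head_given_not_body, body):
        """Generalised Jeffrey's rule for support pairs.
        head_given_body     = [u1, v1] bounding Pr(p | q)
        head_given_not_body = [u2, v2] bounding Pr(p | not q)
        body                = [x, y]   bounding Pr(q)"""
        u1, v1 = head_given_body
        u2, v2 = head_given_not_body
        x, y = body
        z1 = u1 * y + u2 * (1 - y) if u1 <= u2 else u1 * x + u2 * (1 - x)
        z2 = v1 * x + v2 * (1 - x) if v1 <= v2 else v1 * y + v2 * (1 - y)
        return [z1, z2]

    # c:youngPerson(X) IF o2:youngPerson(X) : [0.5, 1]
    # (second pair defaults to [0, 1]); body true with certainty:
    print(jeffrey([0.5, 1], [0, 1], [1, 1]))   # -> [0.5, 1]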
If multiple proof paths are available for a conclusion, the overall support is the intersection of the supports from the individual proof paths. This approach differs from other fuzzifications of Prolog: uncertainty is expressed as a conditional probability of the rule head given the body, whereas most theoretical and practical treatments of uncertainty in logic programming associate uncertainty with the implication. As is well known from fuzzy control and multi-valued logics, there are very many plausible implication operators, and much time and effort has been devoted to arguing their relative merits. The support logic programming approach avoids this controversy.

3 Implementing the Support Logic Calculus

Inference in a support logic program proceeds by constructing a proof tree. The supports of the clauses used in the proof tree are combined to give a support for the solution. Three different cases need to be considered:
• combination of multiple proof paths for an atom
• conjunction of different atoms in a rule body
• conditional support for the rule head from the rule support and the support for the body

3.1 The comb operator

In normal logic programming, a single solution to a query is sufficient: it proves that a solution exists, and finds instantiations of the query variables corresponding to this solution. In support logic programming, the situation is different. Consider the rules

p IF q
p IF r

and let us suppose that q and r are true (with different supports). We have two proof paths for the conclusion p, with supports p : [x1 y1] from the first proof path and p : [x2 y2] from the second. By default, Fril assumes that the overall support must be consistent with both proof paths, and deduces

p : [max(x1, x2), min(y1, y2)]

subject to the support being a non-empty interval. If there is more than one proof path for an atom, we look for a support pair which is compatible with all of them, i.e. the intersection of the intervals. Thus if atom b has support [li ui] from proof tree i, the overall support for b is

∩i [li ui] = [maxi(li), mini(ui)]

If this interval is empty, there is an inconsistency in the program. We define the (binary) support combination operator, comb, as

comb([l1 u1], [l2 u2]) = [max(l1, l2), min(u1, u2)]

This is monotonic, commutative and associative:

comb(S1, S2) ⊆ comb(S3, S2) if S1 ⊆ S3
comb(S1, S2) = comb(S2, S1)
comb(S1, comb(S2, S3)) = comb(comb(S1, S2), S3)

for arbitrary support pairs S1, S2, S3 (each of the form [l, u]).

3.2 The conj operator

By default, the support for a conjunction of goals is the product of the individual supports, i.e. given two facts

b1 : [lb1 ub1]
b2 : [lb2 ub2]

the support for the conjunction b1 AND b2 is given by the conjunction operator

conj([lb1 ub1], [lb2 ub2]) = [lb1×lb2, ub1×ub2]

which again is monotonic, commutative and associative. In addition, we note that conj distributes over comb, i.e.

conj(comb(S1, S2), S3) = comb(conj(S1, S3), conj(S2, S3))

for arbitrary support pairs S1, S2, S3. Thus if we have two atoms b1 and b2, with k1 proof paths for b1 (with supports S1j, j=1…k1) and k2 proof paths for b2 (with supports S2j, j=1…k2), we can calculate the overall body support from the supports for the k1×k2 proof paths for the conjunction (b1 AND b2). Because of the associativity of comb and conj, this can be extended to any finite number of proof paths for a finite number of atoms in a rule body. In general, one can only say that the support for a conjunction lies in the interval [max(0, lb1+lb2-1), min(ub1, ub2)], but by assuming maximum entropy we can use the product to calculate the support for the conjunction (see [5] for discussion of this point).
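Both operators, and the distributivity property noted above, can be checked in a few lines of code. A minimal sketch (Python, illustrative only):

    def comb(s1, s2):
        """Intersect support intervals from two proof paths."""
        l, u = max(s1[0], s2[0]), min(s1[1], s2[1])
        if l > u:
            raise ValueError("empty support: the program is inconsistent")
        return (l, u)

    def conj(s1, s2):
        """Support for a conjunction under the maximum-entropy
        (product) assumption."""
        return (s1[0] * s2[0], s1[1] * s2[1])

    S1, S2, S3 = (0.25, 0.875), (0.5, 1.0), (0.5, 0.75)
    # conj distributes over comb:
    assert conj(comb(S1, S2), S3) == comb(conj(S1, S3), conj(S2, S3))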
3.3 The cond operator

Consider a single basic Fril rule of the form

h IF b1 AND b2 AND … AND bn : Sr

where h and bi (i=1…n) are atoms forming the head and body of the rule respectively. Support logic allows a rule support to be of the form (Shb, Shnb), where Shb is the support for the head given that the body is true and Shnb is the support for the head given that the body is not true. Suppose

Sr = (Shb, Shnb) = ([lhb uhb], [lhnb uhnb])

and the following facts are given or derivable by single proof trees:

bi : Sij  1 ≤ i ≤ n, 1 ≤ j ≤ ki

(where the body atom bi has ki proof paths). Let the combined support for the body of the rule be Sb = [lb ub], calculated by repeated application of the conj and comb operators defined above. If Sr is the rule support, the support for h is calculated using a generalised Jeffrey's rule for support pairs:

cond(Sr, Sb) = [lh uh]

where

lh = lhb×ub + lhnb×(1-ub)  if lhb ≤ lhnb
   = lhb×lb + lhnb×(1-lb)  if lhb > lhnb

uh = uhb×lb + uhnb×(1-lb)  if uhb ≤ uhnb
   = uhb×ub + uhnb×(1-ub)  if uhb > uhnb

(see [5] for discussion of the use of Jeffrey's rule as a special case of updating mass assignments). By case analysis of this expression, it is easy to see that the cond operator is monotonic in the body support, i.e. if Sb1 ⊆ Sb2 then cond(Sr, Sb1) ⊆ cond(Sr, Sb2), for a fixed rule support Sr and body supports Sb1, Sb2. Additionally,

cond(Sr, comb(Sb1, Sb2)) = comb(cond(Sr, Sb1), cond(Sr, Sb2))

hence we can calculate the support for a solution by finding a proof path, computing its support, then looking for another proof path and intersecting its support with that already found. This is equivalent to a depth-first search of the SLD tree [17], recording each solution and its support. In contrast, a straightforward formulation of support logic requires a breadth-first search of the tree, where the support at an or-node can only be calculated when all sub-trees rooted at that node have been evaluated. Instead of calculating

[lh uh] = cond(Sr, conj over i=1…n (comb over j=1…ki (Sij)))

we find

[lh uh] = comb over all proof-path selections (j1, …, jn), ji = 1…ki, of cond(Sr, conj over i=1…n (Si,ji))

where the n-ary operators are obvious extensions of the binary case. In Prolog terms, this corresponds to an "all-solutions" query at the root node only, as compared to all-solutions queries for each goal appearing in the search tree. The efficiency gain is significant.

We can transform a support logic program into a crisp logic program by adding a single argument to the head and to each body goal, to represent the support computation. Thus a rule

h(X,Y) IF b1(X,Z) AND b2(Z,Y) : Sr

becomes

h(cond(Sr, conj(S1,S2)), X, Y) IF b1(S1, X, Z) AND b2(S2, Z, Y)

and a fact

b1(a, b) : Sf

becomes

b1(Sf, a, b)

Placing the support as the first argument is an arbitrary but useful convention. A query then changes from ?h(X, Y) to ?h(S, X, Y), followed by evaluation of the support expression S for all proof paths. The compilation of supported clauses to crisp clauses with embedded support arguments is straightforward, as is the mechanism to find all solutions. Compilation (or run-time translation) of uncertain logic programs into standard logic programs has been implemented before (e.g. [13, 18]) by adding extra goals to the rule body.
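The support expressions produced by this transformation are straightforward to evaluate. A minimal sketch (Python, illustrative rather than Fril's implementation) of cond, and of the root-level comb over the supports of the individual proof paths, assuming each path's body support has already been conjoined:

    from functools import reduce

    def comb(s1, s2):
        # intersection of support intervals, as in section 3.1
        return (max(s1[0], s2[0]), min(s1[1], s2[1]))

    def cond(rule_support, body_support):
        """Generalised Jeffrey's rule: support for the rule head, given
        Sr = ([lhb, uhb], [lhnb, uhnb]) and body support Sb = [lb, ub]."""
        (lhb, uhb), (lhnb, uhnb) = rule_support
        lb, ub = body_support
        lh = lhb * ub + lhnb * (1 - ub) if lhb <= lhnb else lhb * lb + lhnb * (1 - lb)
        uh = uhb * lb + uhnb * (1 - lb) if uhb <= uhnb else uhb * ub + uhnb * (1 - ub)
        return (lh, uh)

    def head_support(rule_support, path_body_supports):
        """comb, over the proof paths found by depth-first search, of the
        head support computed for each path: the efficient ordering."""
        return reduce(comb, (cond(rule_support, sb) for sb in path_body_supports))

    Sr = ((0.8, 1.0), (0.0, 0.2))       # ([lhb uhb] [lhnb uhnb])
    paths = [(0.5, 1.0), (0.75, 1.0)]   # body supports of two proof paths
    print(head_support(Sr, paths))      # -> (0.6, 1.0)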
The disadvantage of that approach is that it creates an execution overhead and may disrupt compiler optimisations. In contrast, the method outlined above uses an extra argument and only marginally affects execution speed, by adding a single unification step. It fits into the crisp logic programming approach proposed for the semantic web. Hence we can write (or learn) support logic rules and facts, and evaluate uncertain conclusions, without having to extend the whole framework.

3.4 Uncertainty in Attribute Values

We have so far ignored the question of uncertain attribute values. For example, suppose we know that (i) Mary's age is 28 or 29, or (ii) Mary is in her late twenties. The first case is representable (but awkward) in logic; the second is essentially fuzzy, as different values satisfy "late twenties" to a greater or lesser degree. This can be represented by a fuzzy set, interpreted as a fuzzy constraint on values of the attribute: there should be a single value, but it is not precisely known. Fuzzy sets can also be used to represent uncertain values in rule bodies, e.g.

company X performed well in year Y IF turnover-of X in Y is high-turnover AND profit-of X in Y is about 10%

where the italicised terms (high-turnover, about 10%) are fuzzy sets. This requires us to generalise the unification process, since the presence of fuzzy attributes allows partial matching. For example, assume that the height of Bob is above_average and we have a query to identify tall people. Fril uses semantic unification to determine a conditional probability for the matching of two terms, and hence calculates a support pair for the goal. Essentially, a virtual rule is assumed:

ht of Bob is tall IF ht of Bob is above_average : S

where S is calculated from the probability of matching the fuzzy sets. Note that this process is asymmetric: the probability of tall given above_average is not necessarily the same as the probability of above_average given tall. This is obvious if one considers a crisp case. For example, define a small dice value as {1, 2} and an even value as {2, 4, 6}. Clearly

Pr(small | even) ≠ Pr(even | small)

This process of matching fuzzy sets is known as semantic unification, to distinguish it from the normal logic unification method, which is purely syntactic. Semantic unification is a fundamental part of Fril. A support pair is automatically calculated for the match, and incorporated into the overall calculation of support for the query, using probabilistic semantic unification. Further details are given in [5], and details of fast algorithms suitable for implementing semantic unification in a Warren Abstract Machine architecture are given in [19, 20].
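The asymmetry is easy to verify in the crisp dice case. A minimal sketch (Python; crisp sets under a uniform distribution over dice values only, whereas Fril's semantic unification generalises this to fuzzy sets via mass assignments [5, 19, 20]):

    def crisp_match(target, observed):
        """Pr(target | observed) for crisp sets under a uniform
        distribution: the fraction of the observed set inside target."""
        return len(set(target) & set(observed)) / len(set(observed))

    small, even = {1, 2}, {2, 4, 6}
    print(crisp_match(small, even))   # Pr(small | even) = 1/3
    print(crisp_match(even, small))   # Pr(even | small) = 1/2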
4 Summary

The aim of the semantic web is to make content both human and machine understandable. We claim that this means avoiding artificially precise definitions, and modelling human-understandable terms by adopting a representation that handles uncertainty naturally and efficiently. Considerable existing work has been devoted to incorporating uncertainty into logic programming: uncertain facts and rules can be executed efficiently within a standard framework, but uncertain attribute values require more fundamental modification.

References

1. Martin, T.P. and F. Arcelli Fontana, Logic Programming and Soft Computing - an Introduction, in Logic Programming and Soft Computing, T.P. Martin and F. Arcelli Fontana, Editors. 1998, RSP/Wiley: UK. p. 1-18.
2. Martin, T.P., J.F. Baldwin, and B.W. Pilsworth, Implementation of Fprolog - a Fuzzy Prolog Interpreter. Fuzzy Sets and Systems, 1987. 23: p. 119-129.
3. van Emden, M., Quantitative Deduction and its Fixpoint Theory. Journal of Logic Programming, 1986. 3: p. 37-53.
4. Kifer, M. and V.S. Subrahmanian, Theory of generalized annotated logic programming and its applications. Journal of Logic Programming, 1992. 12: p. 335-367.
5. Baldwin, J.F., T.P. Martin, and B.W. Pilsworth, FRIL - Fuzzy and Evidential Reasoning in AI. 1995, UK: Research Studies Press (John Wiley). 391.
6. Krajci, S., R. Lencses, and P. Vojtas, A comparison of fuzzy and annotated logic programming. Fuzzy Sets and Systems, 2004. 144(1): p. 173-192.
7. Vojtas, P., Fuzzy logic programming. Fuzzy Sets and Systems, 2001. 124(3): p. 361-370.
8. Damasio, C.V. and L.M. Pereira, Sorted Monotonic Logic Programs and their Embeddings, in IPMU-04. 2004. Perugia, Italy.
9. Berners-Lee, T., Weaving the Web. 1999, London: Texere. 272.
10. Martin, T.P. and Y. Shen, Improving access to multimedia using multi-source hierarchical meta-data, in Adaptive Multimedia Retrieval: User, Context, and Feedback. 2006, Springer. p. 266-278.
11. Baldwin, J.F., A Calculus for Mass Assignments in Evidential Reasoning, in Advances in the Dempster-Shafer Theory of Evidence, M. Fedrizzi, J. Kacprzyk, and R.R. Yager, Editors. 1992, John Wiley: NY.
12. Baldwin, J.F., The Management of Fuzzy and Probabilistic Uncertainties for Knowledge Based Systems, in Encyclopedia of AI, S.A. Shapiro, Editor. 1992, John Wiley. p. 528-537.
13. Baldwin, J.F., Support Logic Programming, in Fuzzy Sets - Theory and Applications, A. Jones, Editor. 1986, D. Reidel. p. 133-170.
14. Baldwin, J.F. and T.P. Martin, An Abstract Mechanism for Handling Uncertainty, in Uncertainty in Knowledge Bases, B. Bouchon-Meunier, R.R. Yager, and L.A. Zadeh, Editors. 1991, Springer Verlag. p. 126-135.
15. Ding, Z. and Y. Peng, A Probabilistic Extension to Ontology Language OWL, in 37th Hawaii Int. Conf. on System Sciences (HICSS'04). 2004, IEEE.
16. Dekhtyar, A. and V.S. Subrahmanian, Hybrid probabilistic programs. Journal of Logic Programming, 2000. 43(3): p. 187-250.
17. Lloyd, J.W., Foundations of Logic Programming. 2nd ed. 1987, Springer.
18. Baldwin, J.F., Evidential Support Logic Programming. Fuzzy Sets and Systems, 1987. 24: p. 1-26.
19. Baldwin, J.F. and T.P. Martin, Fast Operations on Fuzzy Sets in the Abstract Fril Machine, in First IEEE Int. Conf. on Fuzzy Systems. 1992, IEEE Press.
20. Baldwin, J.F., J. Lawry, and T.P. Martin, Efficient Algorithms for Semantic Unification, in Inf. Processing and Management of Uncertainty. 1996. Spain.