=Paper=
{{Paper
|id=Vol-2028/paper4
|storemode=property
|title=Towards a Domain-Independent Method for Evaluating and Scoring Analogical Inferences
|pdfUrl=https://ceur-ws.org/Vol-2028/paper4.pdf
|volume=Vol-2028
|authors=Joseph A Blass,Irina Rabkina,Kenneth D. Forbus
|dblpUrl=https://dblp.org/rec/conf/iccbr/BlassRF17
}}
==Towards a Domain-Independent Method for Evaluating and Scoring Analogical Inferences==
Joseph A Blass, Irina Rabkina, & Kenneth D. Forbus
Northwestern University, Evanston, IL, USA
{joeblass, irabkina}@u.northwestern.edu, forbus@northwestern.edu

Abstract. This paper proposes a domain-independent method to evaluate inferences for analogical reasoning, via a prototype system. The system assigns analogical inferences confidences based on the quality of the mapping and the system's confidence in the facts used to generate the inference. An initial implementation is applied to two domains.

Keywords: Analogical Reasoning, Inference Evaluation, Confidence.

1 Introduction

Any reasoning system which asserts facts through the processing and manipulation of previously known information ought to have a measure of confidence in the accuracy of those newly asserted facts. Even if a given reasoning technique is sound, inferred facts are only as accurate as the assumptions upon which they are based. For example, systems that reason via formal logic produce reliable inferences, but if the reasoning environment is complex enough, or a particular axiom is missing, contradictions may pass undetected. Furthermore, forward chaining systems overgenerate inferences, while backchaining systems are directed but require a known goal for reasoning. On the other hand, probabilistic systems such as Bayes Nets [1] are good at determining how likely a particular inference is, but require a lot of training data or carefully hand-tuned priors.

Analogy is a case-based reasoning technique that constructs an alignment between two cases, with a preference for shared structure, and uses that structure to make inferences from one case to another [2]. Inspired by human cognition, analogical reasoning does not require a fully articulated domain theory and can work from single examples and partial information. However, the inferences made by an analogical reasoning system may not be correct, and while there are evaluation measures based on the structure of the mapping and candidate inferences, all of the methods used in previous systems have been domain and/or task specific.

This paper proposes a unified approach to evaluating and scoring analogical inferences. It integrates logical reasoning, analogical reasoning, and probabilistic reasoning to provide confidence estimates for analogical inferences. We present an initial implementation and some experimental results as a proof of concept of these ideas.

Copyright © 2017 for this paper by its authors. Copying permitted for private and academic purposes. In Proceedings of the ICCBR 2017 Workshops. Trondheim, Norway.

1.1 SME, SAGE, and Cyc

The principles underlying our system are domain general. Our implementation uses the Structure-mapping Engine (SME, [3]) and a supplemented Cyc knowledge base [4]. What is important about the Cyc ontology for the present paper is that it provides microtheories. Microtheories serve as contexts, e.g. one microtheory might describe modern-day Chicago, while another describes Chicago as it was during the Fire. Microtheories can inherit from each other, e.g. when performing social reasoning, a common microtheory to include is HumanActivitiesMt, which, as its name suggests, describes things people commonly do. Microtheories enable locally consistent reasoning, even though the knowledge base (KB) taken as a whole is inconsistent, e.g. there are microtheories describing different, incompatible fictional worlds.
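To make the notion of a reasoning context concrete, here is a minimal Python sketch of microtheory inheritance, under our own simplifying assumptions (the dictionaries, the facts_in_context helper, and the example facts are illustrative stand-ins, not part of the Cyc or FIRE implementation):

```python
# Minimal sketch of context lookup over microtheories (illustrative only;
# the real system queries the Cyc KB and its microtheory hierarchy).

# Each microtheory maps to the microtheories it inherits from (hypothetical names).
INHERITS_FROM = {
    "ModernChicagoMt": ["HumanActivitiesMt"],
    "ChicagoFireMt": ["HumanActivitiesMt"],
    "HumanActivitiesMt": [],
}

# Facts asserted directly in each microtheory (made-up examples).
LOCAL_FACTS = {
    "HumanActivitiesMt": [("isa", "Cooking", "HumanActivity")],
    "ModernChicagoMt": [("cityPopulation", "Chicago", 2700000)],
    "ChicagoFireMt": [("eventOccursAt", "GreatChicagoFire", "Chicago")],
}

def facts_in_context(mt, seen=None):
    """Return all facts visible from microtheory `mt`: its own facts plus
    those of every microtheory it (transitively) inherits from."""
    if seen is None:
        seen = set()
    if mt in seen:
        return []
    seen.add(mt)
    facts = list(LOCAL_FACTS.get(mt, []))
    for parent in INHERITS_FROM.get(mt, []):
        facts.extend(facts_in_context(parent, seen))
    return facts

# Reasoning "in context" means querying only this visible set, so the two
# Chicago microtheories can disagree without making the whole KB inconsistent.
print(facts_in_context("ModernChicagoMt"))
```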
For analogical reasoning, we implement cases as microtheories, which enables reasoning to be done with respect to different cases locally. All reasoning is done with respect to a context, that is, a microtheory and all of the microtheories it inherits from.

SME [3] is a computational model of analogy that computes mappings between two structured cases, a base and a target. Each mapping includes correspondences between elements in the two cases, candidate inferences based on those correspondences, and a structural evaluation score calculated based on the structural similarity between the two cases. The higher the score, the more similar the cases and the more trusted the mapping. The Sequential Analogical Generalization Engine (SAGE [5]) uses SME mappings to create generalizations between cases. These generalizations can then be used as cases for further SME comparisons. Rather than keep only facts common to all generalized cases, SAGE generalizations are a joint distribution over the facts in all constituent cases. Each fact is stored in the generalization together with its probability, that is, the proportion of cases in that generalization that contain it. Facts whose probability falls below a preset threshold are excluded from the generalization. This scheme allows the generalization to maintain information about which facts are likely, not only which are universal. For example, consider a generalization composed of three cases that describe dogs: a Golden Retriever, a yellow Labrador, and a Dalmatian. The generalization will have the fact that a dog has 4 legs with probability 1.0 and the fact that it has yellow fur with a probability of 0.67. The inference evaluation system makes use of these probabilities, along with the structural evaluation score.

2 Inference Evaluation

When the system reasons its way to a new fact in a context, it can either be certain it is true, certain it is false, or somewhere in between. The system uses disjointness reasoning, logical contradiction and implication, and the parameters of SME mappings to determine its confidence that an inference is true. All reasoning is done with respect to the context in which the inference is to be asserted.

2.1 Disjointness Reasoning, Contradiction, and Implication

If the system has inferred that an entity is of a certain type, and there is already a contextualized assertion that it is of another type that is by definition disjoint from the first, the system simply rejects that inference. For example, if Fluffy is a dog, the system cannot assert that it is a cat unless it first retracts that it is a dog. In the Cyc knowledge base, certain collections are marked as disjoint collection types, such that if an entity is an instance of one of those types, it cannot be an instance of another. When our system detects that an inference is of the form (in-Context ?context (isa ?entity ?newType)), it gathers all the other declarations of that entity's type in context ?context. If any of those other types are disjoint with ?newType, then the system rejects the inference.

Inferences can also be rejected if they are contradicted by known implication rules. If there is a rule of the form A -> ~I, where I is the analogical inference, and A is known to be true, the inference can be rejected. Similarly, if there is a rule of the form I -> A, and A is explicitly known to be false, the inference can be rejected.
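As a rough illustration of the rejection checks just described, the sketch below encodes disjointness and rule-based contradiction over toy data structures; the table names (DISJOINT_TYPES), fact tuples, and the example rule are hypothetical stand-ins for Cyc's disjointWith assertions and KB rules, not the actual implementation:

```python
# Illustrative sketch of the rejection checks in Section 2.1 (not the real
# implementation, which queries Cyc disjointWith assertions and KB rules).

DISJOINT_TYPES = {("Dog", "Cat"), ("Cat", "Dog")}  # hypothetical disjoint collection pairs

def rejected_by_disjointness(inference, context_facts):
    """Reject (isa ?entity ?newType) if the context already asserts, for the
    same entity, a type that is disjoint with ?newType."""
    if inference[0] != "isa":
        return False
    _, entity, new_type = inference
    return any(fact[0] == "isa" and fact[1] == entity
               and (fact[2], new_type) in DISJOINT_TYPES
               for fact in context_facts)

def rejected_by_rules(inference, known_true, known_false, rules):
    """Rules are (antecedent, consequent, consequent_is_negated) triples.
    Reject I if some rule A -> ~I has A known true, or some rule I -> A has
    A known false."""
    for antecedent, consequent, negated in rules:
        if negated and consequent == inference and antecedent in known_true:
            return True   # A -> ~I, with A true
        if not negated and antecedent == inference and consequent in known_false:
            return True   # I -> A, with A false
    return False

# Fluffy is already a dog, so the analogical inference that Fluffy is a cat is rejected.
context = [("isa", "Fluffy", "Dog")]
print(rejected_by_disjointness(("isa", "Fluffy", "Cat"), context))  # True

# A made-up rule "boiled -> not splattered" rejects the splatter inference if boiling is known.
rules = [(("boiled", "Egg1"), ("splattered", "Egg1"), True)]
print(rejected_by_rules(("splattered", "Egg1"), {("boiled", "Egg1")}, set(), rules))  # True
```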
Implication is handled similarly: if there is a rule of the form A -> I, and A is true, the inference has been confirmed. Likewise, the inference is confirmed if there is a rule of the form ~A -> I, and A is known to be false. The confidence in the implied fact is a function of the confidence assigned to the facts used to imply it. Contrapositives of the rules for implication and contradiction are generated on the fly. We do not assume rules are sufficiently complete to generate all inferences generated by analogy. Even if they were, analogy would be useful for focusing logical reasoning. The system makes use of forward chaining in a targeted fashion, only for verification, which is more efficient than unconstrained forward chaining.

2.2 Inferences from Analogical Reasoning

When the system derives an inference using an analogical mapping, it may be able to directly prove or disprove it. Failing that, it is desirable to have a measure of the extent to which the inference is trusted. The normalized SME match score is one such signal. Another is the degree to which the facts the inference is based on (in the base and target cases) are trusted. If the base case is a SAGE generalization, then the fact probability in the generalization tells us how likely that fact is within that generalization. For a non-generalized case, the system does not know the extent to which the case itself is an outlier or whether any one fact in the case is core to the overall concept that the case encodes. Inferences from individual cases should be trusted less than high-probability generalization facts, since there is evidence from the generalization that the high-probability facts are more common. Putting it all together, analogical inferences are assigned confidence scores thus:

P(Inference) = MatchScore × BaseTrust × ∏_{t ∈ target support} P(t)

The BaseTrust is as described above: if the base case is a SAGE generalization, then

BaseTrust = ∏_{b ∈ base support} P(b)

and otherwise it is set to the default normalizing value (currently, 0.7).

Given this formula, confidence scores are always on the interval (0, 1): normalized match scores are on that interval, and fact confidence scores are on the interval (0, 1]. Confidence cannot be zero since zero-confidence facts are simply suppressed, rather than asserted with confidence zero. Normalized match scores are a measure of the degree of overlap between cases, rather than the total amount of information being mapped from one case to another. These can be low but are never zero: for a mapping to be generated at all, there must be some degree of overlap. We use a product, rather than, say, a sum, of these components to keep confidence on the (0, 1) interval. Intuitively it makes sense that a fact inferred from many facts should be trusted less than one inferred from only a few (if we are equally uncertain of the supporting facts). The more facts used to support an inference, the greater the chance that one of them is false and that the inference is therefore invalid. If confidence scores were allowed to be greater than one, then the confidence of inferences might grow as we moved further out along inference chains. The system uses a Truth Maintenance System in which a single argument marks a belief as in or out; this renders combining evidence from multiple arguments moot.

2.3 Implementation

In the current implementation, facts that are assumed (for example, the details of the case that is to be reasoned about) have a confidence of 1.
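The scoring scheme of Section 2.2 reduces to a few lines of arithmetic. The following is a minimal Python sketch, assuming simple lists of fact confidences as inputs; the function names and example numbers are ours for illustration, not the system's API:

```python
from math import prod

DEFAULT_TRUST = 0.7  # default normalizing value for non-generalized base cases

def base_trust(base_support_probs, base_is_generalization):
    """Product of SAGE fact probabilities if the base is a generalization,
    otherwise the default normalizing value."""
    if base_is_generalization:
        return prod(base_support_probs)
    return DEFAULT_TRUST

def inference_confidence(norm_match_score, base_support_probs,
                         target_support_confidences,
                         base_is_generalization=False):
    """P(Inference) = MatchScore * BaseTrust * product of P(t) over the
    target facts supporting the inference. Always falls in (0, 1)."""
    return (norm_match_score
            * base_trust(base_support_probs, base_is_generalization)
            * prod(target_support_confidences))

# A chained example: an inference supported only by assumed facts (confidence 1)
# scores higher than a later inference supported by that earlier, less-certain one.
first = inference_confidence(0.03, [1.0, 0.67], [1.0, 1.0],
                             base_is_generalization=True)
second = inference_confidence(0.05, [], [first, 1.0])
print(first, second)  # the second, deeper inference scores lower
```

Because every factor is at most 1, chaining inferences can only shrink confidence, which matches the behavior reported in the evaluation below.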
Our inference evaluator first tries to determine whether an inference is contradicted or implied; if neither applies, it checks whether that inference came from analogy and scores it appropriately, and otherwise assigns it the default normalizing score. Contradiction and implication are handled using backward chaining from axioms in the knowledge base, using resource bounds.

In our implementation, all inferences are given a confidence score and a reason for that score. The reason is the facts and axioms that were used to generate the score. For implied facts, the score is the product of the confidences of the facts that imply it (because perhaps those antecedents are not trustworthy). Contradicted facts are currently simply rejected, although in future implementations they will be scored based on the likelihood of the facts used to reject them. Confidence scoring for analogical inferences is described above.

SME mapping scores can be normalized in three different ways, all of which lie in the interval [0, 1]. The base normalized score is a measure of how much of the base case is mapped in the mapping, that is, how much of the base case overlaps with the target case. If the target is much larger than the base but the base is highly alignable with a subset of the target, the base normalized score will be quite high even if the match score is low. The target normalized score is the corresponding measure for how much of the target case is mapped, and the normalized score is the average of the base and target normalized scores. The default is the average normalized score. Base normalization tends to be used in recognition tasks, where covering the entire base is the criterion, whereas target normalization tends to be used in reasoning tasks, where finding precedents that can lead to inferences within a more complex situation is important.

3 Evaluation

We tested the confidence scoring and contradiction components of this initial implementation on two tasks: Analogical Chaining and Moral Reasoning. Analogical Chaining is a commonsense reasoning technique that elaborates a case description by repeated analogy to small cases called Common Sense Units (CSUs) [6]. These CSUs can be extracted automatically from natural language, and are thus easy to provide to the reasoning system. As analogical chaining uses analogical reasoning, it does not require a fully articulated domain theory or rules constructed by experts, can reason with partial knowledge, and can use the same case for prediction or explanation. Analogical chaining has been tested on questions from the Choice of Plausible Alternatives commonsense reasoning test (COPA, [7]). As analogical chaining asserts inferences by analogy, then asserts new inferences building on those previous inferences, it is very valuable to give it a measure of confidence in those inferences.

We examined the performance of the inference evaluation system on 11 COPA questions, whose internal representations were automatically extracted from the English text of the questions using EA-NLU [8]. These questions were selected because they require repeated analogical inference (i.e., chaining) to solve. The system had a case library of around 50 cases it could retrieve and reason with.
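To help interpret the small absolute confidence values reported below, here is a toy sketch of the three mapping-score normalizations described in Section 2.3; the counting of "mapped facts" is a simplification we introduce for illustration, not SME's actual computation:

```python
def normalized_scores(mapped_base_facts, base_size,
                      mapped_target_facts, target_size):
    """Toy version of the three normalizations from Section 2.3: how much of
    the base is covered, how much of the target is covered, and their average
    (the default)."""
    base_norm = mapped_base_facts / base_size
    target_norm = mapped_target_facts / target_size
    return base_norm, target_norm, (base_norm + target_norm) / 2.0

# A small base that aligns well with a subset of a much larger target: the
# base-normalized score is high, but the target-normalized score is low, which
# can pull the default (average) multiplier, and hence inference confidences, down.
print(normalized_scores(mapped_base_facts=9, base_size=10,
                        mapped_target_facts=9, target_size=60))
# -> (0.9, 0.15, 0.525)
```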
For every question tested, the confidence scores assigned to inferences were lower the further down they were along the inference chain; this means the inferences that enabled the system to answer the questions had lower confidence scores than the intermediate inferences used to infer them, reflecting the system's lower confidence the further out it went from established facts. Inference scores ranged from 0.02 (for an inference made only using facts from the COPA question itself) down to 1×10^-6 (for an inference several steps removed from the question facts). All but three questions involved no dead-end reasoning: analogical chaining found the correct answer for those questions without exploring any fruitless inference chains. We will examine two cases that involved dead-end reasoning in detail.

One question asks: "The egg splattered. What was the cause of this?" The answers are "I dropped it" and "I boiled it." The system first hypothesized that the egg splattering was caused by some unknown violent impact, and assigned that inference a confidence score of 0.01 (low inference scores are discussed below). It then hypothesized, as an alternative explanation for the egg splattering, that the egg hit the floor. This did not involve the abstract impact from the first inference, but was based only on the question facts. However, the mapping had a lower match score than the first, so it was given a confidence of 0.0004. The system then pursued, in separate reasoning contexts, explanations for the first two inferences. In the system's case library was a case describing how an object was violently impacted when it was hit with a rock, so it hypothesized that perhaps the unknown impact on the egg was caused by a rock. Despite being based on a higher-confidence inference (the first inference asserted, where p=0.01), this inference had a low match score and therefore resulted in a score of 2×10^-6. Finally, the system used a fourth case to explain the inference that the egg hit the floor by hypothesizing that it was dropped. Despite being based on a lower-confidence fact than the inference about the rock, this inference had a higher match score and thus received a confidence of 2×10^-5. While low, this score is still an order of magnitude higher than that of the dead-end hypothesis about the rock, which was based on the more highly-trusted initial inference.

Another question asks: "The truck crashed into the motorcycle on the bridge. What happens as a result?" The answers are "The motorcyclist died" and "The bridge collapsed". The automatically constructed question representations involve only one statement about a motorcycle (and no motorcyclist) but several statements about the crashing event (who was involved, where it happened, etc.). The system retrieves cases based on what is present in the case being reasoned about, so it began by reasoning about a familiar case involving a vehicle crashing. In that case the vehicle was an airplane, so the system first posited that perhaps the crash in question involved an airplane malfunctioning (p=8×10^-5). The system then retrieved a story about a child falling out of bed and crashing onto the floor. It used this case to posit that the crash was caused when the truck fell out of bed (p=9×10^-5). Building on the airplane inference, it hypothesized that the airplane lost power (p=1×10^-6), then that the motorcycle lost power (p=0.012), and finally, having exhausted its knowledge of crashes, that the motorcyclist dies (p=2×10^-5).
In this case the correct inference has a lower confidence score than all but one of the dead-end inferences. This example illustrates the pitfalls both of Analogical Chaining and of the inference evaluation system. After the system had posited the airplane, it was all too happy to continue reasoning about it, and the match scores were high enough along that reasoning chain (and low enough for the case that gave it the answer) that those erroneous inferences were scored much higher than the one that seems obvious to humans (humans, of course, have much prior knowledge the system lacks). The system can be led astray, masking the utility of useful inferences, if it marks even one incorrect inference as highly probable. Furthermore, it seems wrong to give the system a hard-and-fast rule stating that airplanes cannot be involved in car crashes. Such a situation may be extremely unlikely, which could be recognized by accumulating cases in a generalization about car and motorcycle crashes, but, as Hollywood has shown us, it is not impossible.

This raises an important point about the interplay between analogical reasoning and first-principles reasoning. Analogical learning can provide explicit evidence of what can happen, because analogical generalizations provide structured, relational probabilistic representations of what has happened. But analogical learning only implicitly gathers evidence about what cannot happen. First-principles reasoning is better at ruling out the kinds of things that are impossible (e.g. vehicles cannot fall out of beds because they cannot fit in them).

MoralDM is a computational moral reasoning system that makes decisions by analogy to moral generalizations [9,10]. In one experiment, generalizations are formed from cases that either involve the principle of double effect [11], or do not. This principle states that harm caused as a side effect of preventing a greater harm is morally acceptable, but not harm caused in order to prevent that greater harm. The canonical example illustrating this principle is that most people say it is morally acceptable to switch a trolley that will hit five people onto a side track where it will hit one person (double effect), but not to instead push someone in front of the trolley to save those same five people (not double effect). In these moral generalizations, the facts indicating whether the case involves double effect and which case-specific action should be taken have probability 1, whereas the facts specific to the case (for example, whether it is a trolley or a torpedo doing the harm, how many people are hurt, or what the mechanisms are to save those people) have lower probabilities.

We took the inferences made in reasoning about moral cases by analogy to these generalizations and checked them with the inference evaluator. This was both to test the generalization normalizing component and to get a sense of whether even highly trusted inferences have low scores (the mappings that generate these inferences have high unnormalized scores). While the scores for the high-confidence facts are still quite low (in all cases approximately 0.02), the scores for the low-confidence facts are much lower, corresponding to the lower proportion of constituent cases in which they appear. In the same mapping where the decision fact was scored at 0.02, for example, the fact about the form the harm took was scored at 0.005.
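These reported numbers are mutually consistent with the scoring formula of Section 2.2, assuming (as the comparison below suggests) that the two inferences share the same match score and target support; the 0.25 figure is implied by that reading, not reported directly:

```latex
\begin{align*}
S &= \mathit{MatchScore} \times \prod_{t} P(t) && \text{(shared, base-independent factor)}\\
0.7 \times S &= 0.014 \;\Rightarrow\; S = 0.02 && \text{(ungeneralized base, default trust 0.7)}\\
1.0 \times S &= 0.02 && \text{(decision fact: base-fact probabilities of 1.0)}\\
p_b \times S &= 0.005 \;\Rightarrow\; p_b = 0.25 && \text{(harm-form fact: implied product of base-fact probabilities)}
\end{align*}
```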
This example demonstrates the utility of taking generalization fact probability into account: had these inferences been made by analogy to an ungeneralized case, the inference evaluator would have given them both scores of 0.014. Using generalization probabilities gives the system a means to assess different inferences from the same mapping.

4 Related Work

Most previous work on analogical inference validation has been domain specific. For example, Ouyang and Forbus used first-principles reasoning within the physics problem solving domain to validate candidate inferences produced by SME [12]. While the validation improved their system's performance, a complete domain model had to be assumed. Similarly, Klenk and Forbus used a small set of hand-encoded heuristics to verify candidate inferences during transfer learning [13]. While these were not a complete domain model, the heuristics were specific to inferences that could be made in the test domain. While the system described in this paper allows for domain-specific verification (i.e. through implies statements), it is domain-general. Furthermore, unlike previous systems which rated an inference as true or false, the current system allows for intermediate rankings.

Similar intermediate rankings have been used to evaluate inferences derived by non-analogical reasoning systems. Examples include fuzzy logic networks [14], Bayesian Logic models (BLOG, [15]), and Markov Logic Networks (MLNs, [16]). By assigning a fuzzy truth space to antecedents, fuzzy logic networks are able to derive fuzzy truth values for inferred consequents. They allow for incomplete domain knowledge, but do require a handwritten set of rules. Fuzzy rules can be used in combination with data sampled in a particular space to rule in or out inferences made within that space [17]. Fuzzy logic networks assign qualitative truth values (e.g. "mostly true") to inferences, rather than calculating a quantitative confidence measure.

BLOG models and MLNs calculate numerical probabilities for inferences. BLOG models do so by defining a probability distribution over a set of possible worlds determined by prewritten axioms. A Metropolis-Hastings Markov chain Monte Carlo approach can then be used to make inferences from the distribution [18]. Using MCMC increases the time and computational cost of inference scoring in these models. MLNs take a different approach: they define a Markov Network over a set of first-order logic sentences and constants, such that a node exists for each grounding of each predicate and a connecting feature exists for every possible grounding of each sentence [16]. Weights are assigned to these features based on the likelihoods of the sentences they describe. A probability distribution is then specified over the ground network. The structure of the network can be learned, given the sentences and their possible groundings [19]. The disadvantage of MLNs is scaling: the network grows with additional predicates, as well as additional potential groundings. This also means that every potential grounding of every potential predicate must be present in the training set.

The presented inference evaluation technique could be used in other analogical reasoning systems that score (or could score) the quality of their matches (that is, which have a measure similar to SME's match score). For example, inferences in AMBR [20] have evidence accrued in favor of and against them, based on semantic and structural similarity.
Top-scoring hypotheses are asserted into the reasoning environment, but the amount of evidence in favor of them is not. If this evidence were stored as a confidence measure on facts as they are asserted, future inferences could be made not only on the basis of evidence for or against them, but also on the degree to which that evidence is itself believed.

In HDTP (e.g. [21]), analogical mappings are constructed via a process of anti-unification. For example, a formula p1(a) in a base and p2(a) in a target is replaced in the mapping by a general formula P(a), where P is a generalization of both predicates p1 and p2. A measure of the similarity of P to p1 and p2 could be used to score inferences made using formula P(a); the scores of those inferences could then be used to score future inferences made using those inferred facts. In HDTP, inferences are checked for logical consistency; expanding logical consistency checks for inferences is the next extension to be performed on our system.

5 Future Work

Even mappings with high unnormalized match scores, indicating a high-quality match, may have low normalized match scores, depending on the relative size of the cases and how much information is left out of the mapping. In the current implementation, low confidence scores assigned to analogical inferences were driven largely by low normalized mapping scores. Small cases with little structural overlap should yield low confidence scores, since the mappings used to generate the inferences are not seen by the system as being particularly reliable, informative mappings (as indicated by the low score). However, while analogical inferences should have lower scores than logically implied inferences, they should not be vanishingly low. One possibility is to use the highest normalized score as a multiplier in calculating inference confidence scores, rather than always using the same mapping score normalizing function. Each function provides different information, but a high score in either indicates that the mapping includes a high degree of overlap from one case to another. Scoring inferences using the highest normalization score will still involve incorporating the score of the match, the score of the justifying target facts, and the probability of the generalization facts.

Given the ubiquity of certain role predicates (objectActedOn, performedBy, etc.), analogical chaining can make some inferences that, to a human, seem quite silly. Having the ability to rule out those silly inferences using logical forms of commonsense knowledge is desirable but is not being done in the current implementation. The Cyc knowledge base contains millions of axioms, but we are currently only using a small subset (the disjointWith axioms). We plan to explore reasoning techniques that enable us to exploit more of this knowledge, especially horn clauses and implication statements, for constraint-checking (e.g. [22]).

Contradictions should perhaps be asserted with a confidence proportional to the scores of the facts contradicting them, rather than suppressed entirely. If facts are seen as relatively likely, then the contradiction is also likely. If contradictions are asserted, they must signal which facts contradict them, to keep reasoning consistent.

Many analogical inferences involve positing skolem entities. These are entities present in the base and participating in the candidate inference but which are not present in the target.
For example, the event in which the egg was impacted in the above example was posited as a skolem variable. Fundamentally, however, these are open variables, and implication can help resolve them. Contradiction works in a similar way, but instead can only rule out resolutions: just because a rule says that a particular individual cannot fill a role does not mean that no one can.

Finally, further testing is needed on a wider range of domains, as well as further empirical testing of the analogical inference confidence scoring. While we have verified that the implication and disjointness-based contradiction components of the inference evaluation system are functioning properly, these need to be tested empirically. We can thereafter examine accruing and weighting evidence for and against facts.

6 Conclusion

We presented an initial implementation of a system to evaluate analogical inferences, which have no guarantee of being correct. The system can identify certain inferences as being more likely than others, but further evaluation and extension of the system is needed. Nonetheless, this seems to be a promising direction for inference validation and assessment, and it points towards a method for resolving skolem variables in analogical inferences.

7 Acknowledgements

This research was supported by the Socio-Cognitive Architectures for Adaptable Autonomous Systems Program of the Office of Naval Research, N00014-13-1-0470.

References

1. Pearl, J., Russell, S.: Bayesian Networks. UCLA Cognitive Systems Laboratory, Technical Report R-277, November 2000. In M.A. Arbib (Ed.), Handbook of Brain Theory and Neural Networks, Cambridge, MA: MIT Press, pp. 157-160 (2003).
2. Gentner, D.: Structure-Mapping: A Theoretical Framework for Analogy. Cognitive Science, 7(2), 155-170 (1983).
3. Forbus, K. D., Ferguson, R. W., Lovett, A., Gentner, D.: Extending SME to Handle Large-Scale Cognitive Modeling. Cognitive Science (2016).
4. Lenat, D.: CYC: A large-scale investment in knowledge infrastructure. Comm. of the ACM, 38(11), 33-38 (1995).
5. McLure, M. D., Friedman, S. E., Forbus, K. D.: Extending Analogical Generalization with Near-Misses. Procs. of the 29th AAAI Conf. on Artificial Intelligence, Austin, TX (2015).
6. Blass, J. A., Forbus, K. D.: Analogical Chaining with Natural Language Instruction for Commonsense Reasoning. Procs. of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, pp. 4357-4363 (February 2017).
7. Roemmele, M., Bejan, C. A., Gordon, A. S.: Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning. AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning (March 2011).
8. Tomai, E., Forbus, K. D.: EA NLU: Practical Language Understanding for Cognitive Modeling. In FLAIRS Conference (March 2009).
9. Dehghani, M., Tomai, E., Forbus, K. D., Klenk, M.: An Integrated Reasoning Approach to Moral Decision-Making. Procs. of the 23rd AAAI Conference on Artificial Intelligence, pp. 1280-1286 (July 2008).
10. Blass, J. A., Forbus, K. D.: Moral Decision-Making by Analogy: Generalizations vs. Exemplars. Procs. of the 29th AAAI Conference on Artificial Intelligence, Austin, TX (2015).
11. Foot, P.: The Problem of Abortion and the Doctrine of Double Effect. Oxford Review 5, pp. 5-15 (1967).
12. Ouyang, T. Y., Forbus, K. D.: Strategy Variations in Analogical Problem Solving. Procs. of the 21st AAAI Conference on Artificial Intelligence, pp. 446-451 (July 2006).
13. Klenk, M., Forbus, K.: Analogical Model Formulation for Transfer Learning in AP Physics. Artificial Intelligence, 173(18), pp. 1615-1638 (2009).
14. Zadeh, L. A.: Fuzzy Logic. Computer, 21(4), pp. 83-93 (1988).
15. Milch, B., Marthi, B., Russell, S.: BLOG: Relational Modeling with Unknown Objects. ICML Workshop on Statistical Relational Learning and its Connections to Other Fields, pp. 67-73 (2004).
16. Richardson, M., Domingos, P.: Markov Logic Networks. Machine Learning, 62, 107-136 (2006).
17. Ughetto, L., Dubois, D., Prade, H.: Implicative and Conjunctive Fuzzy Rules - A Tool for Reasoning from Knowledge and Examples. In AAAI/IAAI, pp. 214-219 (July 1999).
18. Milch, B., Russell, S.: General-Purpose MCMC Inference over Relational Structures. Procs. of the 22nd Conference on Uncertainty in Artificial Intelligence, pp. 349-358 (July 2006).
19. Kok, S., Domingos, P.: Learning the Structure of Markov Logic Networks. Procs. of the 22nd International Conference on Machine Learning, pp. 441-448 (August 2005).
20. Kokinov, B., Petrov, A.: Integrating Memory and Reasoning in Analogy-Making: The AMBR Model. In The Analogical Mind: Perspectives from Cognitive Science, pp. 59-124 (2001).
21. Schwering, A., Krumnack, U., Kühnberger, K., Gust, H.: Analogical Reasoning with SMT and HDTP. 2nd European Cognitive Science Conference, Delphi, Greece, pp. 652-657 (2007).
22. Sharma, A. B., Forbus, K. D.: Automatic Extraction of Efficient Axiom Sets from Large Knowledge Bases. Procs. of the 27th AAAI Conf. on Artificial Intelligence (June 2013).