Cross-domain Correspondences for Explainable Recommendations

Aaron Stockdill, Daniel Raggi, Mateja Jamnik
University of Cambridge, UK
aaron.stockdill@cl.cam.ac.uk, daniel.raggi@cl.cam.ac.uk, mateja.jamnik@cl.cam.ac.uk

Grecia Garcia Garcia, Holly E. A. Sutherland, Peter C.-H. Cheng
University of Sussex, UK
g.garcia-garcia@sussex.ac.uk, h.sutherland@sussex.ac.uk, p.c.h.cheng@sussex.ac.uk

Advait Sarkar
Microsoft Research, Cambridge, UK
advait@microsoft.com

ABSTRACT
Humans use analogies to link seemingly unrelated domains. A mathematician might discover an analogy that allows them to use mathematical tools developed in one domain to prove a theorem in another. Someone could recommend a book to a friend, based on an understanding of their hobbies, drawing an analogy between them. Recommender systems typically rely on learning statistical correlations to uncover these cross-domain correspondences, but it is difficult to generate human-readable explanations for the correspondences discovered. We formalise the notion of 'correspondence' between domains, illustrating this through the example of a simple mathematics problem. We explain how we might discover such correspondences, and how a correspondence-based recommender system could provide more explainable recommendations.

KEYWORDS
representation, correspondence, analogy, concept mapping

ACM Reference Format:
Aaron Stockdill, Daniel Raggi, Mateja Jamnik, Grecia Garcia Garcia, Holly E. A. Sutherland, Peter C.-H. Cheng, and Advait Sarkar. 2020. Cross-domain Correspondences for Explainable Recommendations. In Proceedings of IUI workshop on Explainable Smart Systems and Algorithmic Transparency in Emerging Technologies (ExSS-ATEC'20). Cagliari, Italy, 8 pages.

Figure 1: The numbers 1 through to 5 as rows of dots.

Figure 2: The general relationship for n dots from the case n = 5; by counting the whole grid and halving the result, we have n(n + 1)/2 black-edged dots.

1 INTRODUCTION
When explaining a concept, people adapt their explanation for their audience [12]. This switch can be motivated by the concept itself, the audience's knowledge, or the motivation behind sharing the concept [11, 15]. Each factor must be balanced by the explainer before choosing a representation for the explainee.

Consider the problem of finding a closed form solution to the sum of the integers between 1 and n. The 'formal' presentation might be

    Σ_{i=1}^{n} i    (1)

which is compact and unambiguous. But it presupposes an understanding of higher mathematical notation, while providing no clues to the closed form solution. One might have to perform an induction, or identify an equivalent series and work algebraically with them; both are advanced techniques to reach the solution n(n + 1)/2. This algebraic representational system (hereafter a representation) is formally appropriate, but may not be cognitively appropriate.

We can re-represent Equation (1) as in Figure 1, but instantiated to n = 5, where numbers are rows of dots. Modulo generalisation, these representations show the same key information, but their cognitive features are different: the dot diagram hints at the closed form solution, whereas in the symbolic representation this is missing. Now, observing that we have a triangle, applying simple counting and reflection rules yields the solution in Figure 2.

Symbolic algebra and grids of dots are not equivalent representations: the former can encode many concepts that the latter cannot. Nor are the two forms of the problem equivalent: our remark 'modulo generalisation' exposes one difference. But the statements do capture the same important properties [13]: integers, associativity, commutativity, equality, etc.

A complication in preserving important properties across representations is that the properties take different forms. The symbol 2 becomes ◦◦, while + becomes stacking.
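The arithmetic behind Figures 1 and 2 is easy to check mechanically. As a quick illustration (ours, not part of the paper's rep2rep implementation), the following Python sketch contrasts the 'symbolic' summation with the grid-counting argument suggested by the dot diagram:

```python
def sum_symbolic(n):
    """Sum 1 + 2 + ... + n directly, one 'row of dots' at a time."""
    return sum(range(1, n + 1))

def sum_by_grid(n):
    """The dot-diagram argument: the triangle and its reflection
    tile an n-by-(n + 1) grid, so the triangle holds half its dots."""
    return n * (n + 1) // 2

# The two representations agree on every instance we try,
# including the n = 5 case drawn in Figures 1 and 2.
assert all(sum_symbolic(n) == sum_by_grid(n) for n in range(1, 200))
print(sum_by_grid(5))  # 15
```

The point of the example is exactly the paper's: both computations encode the same content, but only the second makes the closed form visible.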
Similar to analogy or structural similarity, we define correspondences [13] as links between properties which 'fill the same role' in their respective representations. Correspondences are thus both a heuristic for transformation between representations, and part of a justification for why a particular representation may be suitable for the given problem.

ExSS-ATEC'20, March 2020, Cagliari, Italy.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

In this paper we explore precisely what correspondences are and how we can discover them, with the aim of helping users to discover and understand analogies. Our work is part of a pipeline to automatically recommend alternative representations to help users solve mathematics problems. We envisage a complete intelligent tutoring system capable of understanding the individual student, and adapting to their needs and preferences dynamically, for each problem they are tasked to solve. This paper represents one step towards this goal.

We have an implementation of the concepts discussed in this paper; the code is available at https://github.com/rep2rep/robin.

2 MOTIVATION
Analogical reasoning is a powerful tool [7]. Higher-order correspondences describing relationships allow deeper understanding of both the source and target statements by exploiting their structure [5]. From education to scientific discovery, insight occurs when disparate ideas are brought together [2, 6, 8, 16].

We previously introduced in [13] a rep2rep framework for automatically analysing and suggesting to users a suitable alternative representation for their current problem. The aim is for such intelligent systems to tailor this suggested representation to the needs of the person solving the problem. Our framework enables one to describe problems, representations, and the correspondences between them in a way that can be input to an algorithm which then produces the appropriate recommendation. The framework provides templates which analysts can fill with many kinds of values: types, tokens, patterns, and other properties. Types describe the grammatical roles in a representation, while tokens are what fill these roles. Patterns are 'conceptual groupings' of properties which form recognisable units for expert users. Analysts can associate attributes to properties: these could be types, descriptions, counts, or other information. Our framework is purposefully general: representations are extremely diverse, and we want to encode as many as possible.

Correspondences between representations are catalogued by analysts. This passes the burden of discovering analogies to a human ahead of time; when our algorithm suggests an appropriate representation, it selects relevant correspondences from this set. This simplifies the recommendation task considerably, but it leaves analysts with a formidable cataloguing challenge, so even partially automating this step is a high priority. We shall explore our approach to this problem in Section 4.

The final recommendation from the rep2rep algorithm¹ is a list of potential representations ordered high-to-low by their formal score. We define the formal score vqr for representation r to encode problem q as

    vqr = Σ_{c ∈ C} sc · ic · matchqr(c)    (2)

where C is the set of correspondences, sc is the strength of correspondence c, ic is the importance of c for problem q, and matchqr(c) is an indicator function equal to 1 when c is satisfied by the properties of q and r, and 0 otherwise. Intuitively, we are counting reasons why representation r is good, and weighting those reasons by both how important they are to the problem q, and how 'good' those reasons are in general (the correspondence strength s).

¹ The recommendation is presently based only on the formal properties; ongoing work incorporates the user profile into the decision.

Previous work around structure-mapping constructs correspondences between two analogous concepts by matching their internal associations [4]. This requires well-defined internal structure—in our digitally catalogued world, such a requirement is increasingly easily met. But structure-mapping assumes we already know that two concepts are related; we just need to work out how they are related. When we are searching for an analogy, we do not know this. Thus, the purpose of our research is to discover related concepts through analogy.

3 CORRESPONDENCES
In this section, we formalise the idea of correspondence first introduced in [13].

3.1 Transformations
A correspondence associates an encoding of content in one representation to an encoding of the same content in another representation, along with a measure of how well that content is preserved between them. When both source and target representations are specified, the correspondences describe how they are analogous; when only the source representation is specified, the correspondences specify requirements. In the example from Section 1, dots fulfil the requirements: they can encode integers and summation. A representation like the lambda calculus would also fulfil the requirements, but a Venn diagram representation would not. If instead our problem required encoding trigonometric functions, our dot representation and Venn diagram representations would fail to satisfy the requirements, and not be recommended.

Formally, we define a correspondence as a triple

    ⟨ p1, p2, s ⟩    (3)

where p1 and p2 are formulae of properties and s is a real value between 0 and 1 [13]. The property formulae use the connectives and, or, and not with the expected interpretations. This allows for sophisticated relationships between representation properties. The value s denotes the strength of the correspondence: when s = 0, there is no correspondence, the content is unrelated; when s = 1, there is a perfect correspondence, the content is the same.

If we wanted to encode the correspondence 'Σ is like stacking either horizontally or vertically', we first identify the properties involved. Properties consist of a kind, a value, and attributes. In this case, we have a 'token' kind of property from an algebraic representation,

    p1 = (token, Σ, a1)

(with attributes abstracted to a1 for this example), while we have two 'pattern' properties from the dots representation,

    p2 = (pattern, stack-horizontal, a2)  and  p3 = (pattern, stack-vertical, a3)

(where the attributes are similarly abstracted). The correspondence would thus be

    ⟨ p1, p2 or p3, 0.9 ⟩.    (4)

Note the key word either in our specification: these target properties are not independent, so we do not write two correspondences. Instead, we use a disjunction connective to make the sufficiency of one property or the other explicit. This is a strong correspondence, and hence has a high strength of s = 0.9.

Our definition of correspondences makes them explainable. Because we can inspect which correspondences are activated, we can describe the analogy. Beyond saying two things are alike, correspondences encode precisely how the two things are alike; with strength, we can justify how much they are alike. Thus, a description of how and why an analogy was made could be explicated to the user: to represent summation, consider stacking the dots horizontally or vertically.

3.2 Property probabilities
If we consider words to be the atomic units of written English, each word has an occurrence probability derived from its frequency in a particular corpus. Similarly for representations, the property is the atomic unit; each has an occurrence probability derived from its frequency in a particular corpus.² Under a Bayesian interpretation, we can assign each property a prior probability. Thus each property p in a representation is associated with a probability Pr(p).

² For now we assign prior probabilities based on expert knowledge. In future work we will compute such occurrence probabilities from large corpora of problems and representations.

While having a baseline probability for individual properties is useful, context gives us further clues. For two analogous problems in different representations, knowing which properties are present in one will update our knowledge of the properties present in the other. Returning to our example of adding integers, observing the Σ operator in Equation (1) primes us to expect stacking—whether horizontal or vertical—in Figure 1. The conditional probability Pr(p2 | p1) expresses our knowledge about p2 after observing that p1 is present.

We define Pr(p2 | p1) as per the usual definition, adapted to our domain and corpus. For a representation A, we can write its problems as ai. Each problem can potentially be transformed into suitable problems TB(ai) in representation B. Note that TB(ai) is a set; more than one transformation might be appropriate. Using the count operator #, and sat(p, q) as a predicate on whether the property formula p is satisfied by the properties in question q, we have

    Pr(p2 | p1) = [ Σ_i #{ bj ∈ TB(ai) | sat(p1, ai) ∧ sat(p2, bj) } ] / [ Σ_i #{ bj ∈ TB(ai) | sat(p1, ai) } ].    (5)

Transformations can be imperfect; we consider a transformation appropriate if a human expert would be able to reach the same solution as under the original problem statement.

3.3 Strength
Correspondence strength must reflect the analogical similarity of the constituent property formulae. Naively, this is the conditional probability, but conditional probabilities are not directly comparable: Is there a big difference to the prior probability? How much could the probabilities be different?

To address these concerns, we define the strength of the correspondence ⟨ p1, p2, s ⟩ to be

    s = ( Pr(p2 | p1) − Pr(p2) ) / ( 1 − Pr(p2) ).    (6)

When Pr(p2 | p1) < Pr(p2), we redefine the correspondence to be between p1 and not p2:

    ( Pr(not p2 | p1) − Pr(not p2) ) / ( 1 − Pr(not p2) ) = ( Pr(p2) − Pr(p2 | p1) ) / Pr(p2).    (7)

Both are undefined when Pr(p1) or Pr(p2) are 0 or 1. We consider this an acceptable compromise, as this indicates either an 'impossible' property or a 'guaranteed' property. Neither case is useful in our framework.

Informally, Equation (6) states the increase in probability of observing p2 relative to the maximum potential increase of observing p2. The numerator is the difference between the informed and uninformed probability of p2, while the denominator is the difference between perfect knowledge of p2 and the uninformed probability of p2. This balances the need to measure the difference in probability against the confounding effect of p2 already having a high probability. Similarly, Equation (7) informally states the decrease in probability of p2 relative to the potential decrease of p2.

Correspondence strength is related to two existing concepts: mutual information [3], and Kullback–Leibler (KL) divergence [9]. Mutual information is a symmetric measure of how much information is shared between two distributions. This symmetry makes it inappropriate for our use-case: correspondences are not necessarily symmetric; one 'direction' can be stronger than the other. KL divergence is asymmetric, but not bound to the interval [0, 1]. Because properties behave as Bernoulli random variables, we can normalise the KL divergence to the interval [0, 1] with information content: KL(Pr(p1 | p2) ∥ Pr(p1)) / I(p1). But KL divergence forms a leaky abstraction. As we will see in Section 4, strength as defined in Equation (6) encapsulates relationships between property formulae only in terms of prior probabilities and strength; KL divergence requires calculations on posterior probabilities that are not necessarily available.

4 EXPLANATIONS
Correspondences are central to our representation recommendation process, so we need to understand how they are discovered and how they can be interpreted as explanations for the recommendation.

4.1 Discovering correspondences
Consider the correspondence between numbers and dots; specifically, between 2 and ◦◦: these fill the same role in their respective representations. Hence,

    ⟨ (token, 2, {hasType = number}), (token, ◦◦, {hasType = dot-arrangement}), 1.0 ⟩

Where does this come from? For a human this is 'obvious', but it must be deduced from somewhere. We propose five rules to automatically discover correspondences, split into three groups: identity, correspondence-based, and representation-based. What follows is a high-level summary; further technical details are in Appendix A. Figure 3 shows a diagrammatic interpretation of four of the rules.

Table 1: Some properties of the algebraic and dot representations. We will use these to build correspondences.

    Repr.       Properties
    Algebraic   (type, number, ∅)
                (token, 1, {hasType = number})
    Dot         (type, dot-arrangement, ∅)
                (token, ◦, {hasType = dot-arrangement})

Figure 3: A diagram of three representations, each with two properties. Some properties are related through an attribute (the dashed arrow); correspondences are solid arrows. Some example discoveries are: a1b1 and b1c1 generate a1c1 with [cmp]; a1b1 generates a2b2 with rule [val]; a2b2 generates a1b1 with rule [atr]; and a2b2 generates b2a2 with rule [rev].

The rule of identity states that two properties with the same kind and same value are perfectly corresponding:

    p1 ≡ p2
    ----------------- [idy]    (8)
    ⟨ p1, p2, 1.0 ⟩

where the relation ≡ is the string match on kinds and values. This rule implies that correspondences are reflexive with strength 1, and allows naturally overlapping representations (e.g., algebra and a tabular representation may reuse numbers) to map cleanly into one another.

Because these two representations are disjoint—no properties are equivalent, sharing a kind and value—the rule of identity [idy] cannot be used. We must provide an initial seed correspondence, something the analyst might notice quickly. For this, we use

    ⟨ (token, 1, {hasType = number}), (token, ◦, {hasType = dot-arrangement}), 1.0 ⟩

That is, 1 corresponds perfectly to ◦. From this seed we can apply rules to generate new correspondences. First, we apply the rule of reversal to associate ◦ with 1:

    ⟨ (token, 1, {hasType = number}), (token, ◦, {hasType = dot-arrangement}), 1.0 ⟩
    --------------------------------------------------------------------------- [rev]
    ⟨ (token, ◦, {hasType = dot-arrangement}), (token, 1, {hasType = number}), 0.9 ⟩

Notice the strength reduced slightly: this is because in the catalogue a 1 occurs more often than a ◦. (In this case, ◦◦ is not two ◦s; they are different dot arrangements.) Using the correspondence between ◦ and 1, we can discover correspondences between their attributes.

Correspondence-based rules build new correspondences from existing ones. The first is the rule of reversal,

    ⟨ p1, p2, s ⟩
    ----------------- [rev]    (9)
    ⟨ p2, p1, s′ ⟩

where

    s′ = s · ( Pr(p1) / (1 − Pr(p1)) ) · ( (1 − Pr(p2)) / Pr(p2) ).    (10)
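The reversal strength in Equation (10) is exactly what one obtains by flipping the conditional probability with Bayes' theorem and re-applying Equation (6). A small Python check (ours, using made-up prior and conditional probabilities, not values from the rep2rep catalogue) confirms the algebra:

```python
def strength(p2_given_p1, p2):
    """Correspondence strength, Equation (6): the gain in probability
    of p2 after observing p1, relative to the maximum possible gain."""
    return (p2_given_p1 - p2) / (1 - p2)

def reversed_strength(s, p1, p2):
    """Rule of reversal, Equation (10)."""
    return s * (p1 / (1 - p1)) * ((1 - p2) / p2)

# Hypothetical priors and conditional for two properties.
p1, p2 = 0.2, 0.3
p2_given_p1 = 0.6

s = strength(p2_given_p1, p2)

# Bayes' theorem gives the conditional in the other direction...
p1_given_p2 = p2_given_p1 * p1 / p2
# ...and Equation (6) applied to it agrees with Equation (10).
direct = strength(p1_given_p2, p1)
via_rule = reversed_strength(s, p1, p2)
assert abs(direct - via_rule) < 1e-12
```

This also makes visible why a correspondence can be strong in one direction and weak in the other: the modulating ratio Pr(p1)/Pr(p2) need not be 1.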
This allows us to 'walk backwards' along a correspondence. The second is the rule of composition,

    ⟨ p1, p2, s ⟩    ⟨ p2, p3, s′ ⟩
    -------------------------------- [cmp]    (11)
    ⟨ p1, p3, s · s′ ⟩

allowing us to chain together correspondences. Note that due to independence assumptions, the correspondence between p1 and p3 might be stronger than calculated; the product s · s′ is a conservative suggestion.

Finally, representation-based rules exploit the internal structure of each representation to suggest new 'parallel' correspondences, as illustrated in Figure 3 by a1b1 and a2b2. The two rules are dual: the rule of attributes

    ⟨ p1, p2, s ⟩    attr(p1, l, e1)    attr(p2, l, e2)
    -------------------------------------------------- [atr]    (12)
    ⟨ e1, e2, s ⟩

and the rule of values

    ⟨ e1, e2, s ⟩    attr(p1, l, e1)    attr(p2, l, e2)
    -------------------------------------------------- [val]    (13)
    ⟨ p1, p2, s ⟩

where attr(p, l, e) is a predicate asserting that property p has an attribute with label l and entry e. These rules allow us to use an 'internal relationship' (encoded with attributes) such as type information to build inter-representational correspondences.

Let us explore these rules in action using our examples of algebra and dots. Table 1 lists a subset of the properties from each representation.³

³ Taking this table as the universe of properties, there are eight potential correspondences, four of which are meaningful. Here we provide one, derive two, and leave the final as an exercise for the reader; there are at least two different derivations of the final correspondence.

Applying the rule of attributes, we have a correspondence between numbers and dot arrangements. The derivation is shown in Equation (14), where we abuse the attribute notation to state that the attributes share a label:

    ⟨ (token, 1, {hasType = number}), (token, ◦, {hasType = dot-arrangement}), 1.0 ⟩    hasType = hasType
    ---------------------------------------------------------------------------------- [atr]    (14)
    ⟨ (type, number, ∅), (type, dot-arrangement, ∅), 1.0 ⟩

We can continue applying rules in this way to discover new possible correspondences. A richer set of properties would allow for more rule applications and more discoveries.

4.2 Correspondences as explanations
Understanding both the definition and source of correspondences, we can interpret their function in making a recommendation. Correspondences provide two modes of explanation: descriptive explanation, and constructive explanation.

A descriptive explanation is useful when two structures are considered the same; for example, our statements about summing numbers in Equation (1) and arranging dots in Figure 1 are analogous. How they are analogous might be unclear, but the correspondences that link them form an explanation. Consider our correspondence

    ⟨ (token, 1, a1), (token, ◦, a2), 1.0 ⟩

linking 1 and ◦. Or consider the more sophisticated correspondence from Equation (4) linking the Σ operator with stacking. By informing the user that these are the strongest correspondences which are satisfied by both the source and target statements, we explain how they are analogous.

Conversely, we can consider constructive explanations. If a target structure is not known, we can use correspondences to describe what properties the analogous statement should have. Consider again Equation (1), our algebraic problem statement, but this time without knowing the equivalent statement in dots. Using the same correspondences from our descriptive explanation, we can suggest to a student that this problem might be solved by representing 1 as ◦, and to use either vertical or horizontal stacking to convey the Σ.

    (type, title, ∅)    (type, character, ∅)    (type, actor, ∅)
    (token, Blade Runner, {isThe = title})
    (token, Harrison Ford, {isThe = actor})
    (token, Rick Deckard, {isThe = character, playedBy = Harrison Ford})

Figure 4: Example properties for a Films category.

Such hints can support progress through the problem, and
potentially reveal to the student deep insights into numbers and summation.

Contrast our correspondences with alternative approaches: logical or statistical recommendations [1]. Logical approaches are restrictive, requiring strong guarantees about the domains while failing to capture the inherently fuzzy nature of the connections people make between domains. In particular, analogies which are merely 'good enough'—in the sense that they have the shape of similarity without being rigorously correct—cannot be reasoned with formally. Statistical approaches solve the problems associated with logical approaches, but obscure the underlying reasoning. Without any internal structure, the suggestions are opaque to the user: there is no justification or support for why this is analogous, and thus a good recommendation. Correspondences aim to allow a degree of uncertainty through strength while retaining sufficient formality through properties to provide valuable explanations.

5 DISCUSSION
Correspondences are one part of our representation recommendation pipeline. Previous work has shown the recommendations in general are meaningful [13]. But it is worth analytically examining the quality of correspondences in isolation. In this section we consider correspondences and the discovery rules with respect to their theoretical motivation and their generality. A discussion of the theoretical limitations of correspondences is in Appendix B.

5.1 Cognitive grounding
Human reasoning takes many forms, but broadly we can describe reasoning as deductive, inductive, or abductive [10]. The simplest to automate—captured by the identity, reversal, and transitivity rules—is deductive reasoning. Through operations on existing correspondences we can deduce new correspondences. Thus, correspondences form a sort of logic.

While the rules of identity, reversal and transitivity are motivated by deductive reasoning, the attribute- and value-correspondence rules are motivated by inductive reasoning. Specifically, we assume a meaningful structure must be preserved, and that this structure is exposed through relationships. By observing and following these relationships, we can infer new correspondences.

Foundational work by Gentner explored how analogical power is proportional to the relations preserved, compared to the superficial attributes⁴ preserved [5]. Indeed, Gentner defines an analogy to be 'a comparison in which relational predicates, but few or no object attributes, can be mapped from base to target' [5]. Our rules address this with the 'hasType' attribute: through the types of patterns or relation-tokens, we can extract these relation mappings. The extended type system accommodates higher-order relations. Our property framework can be extended beyond typing relations with other attributes, which act as meta-relations; these are automatically used by the existing discovery rules.

⁴ Attribute here is used in the manner of Gentner [5]; these are not what we define as attributes. All other uses of attribute outside direct quotes are according to our definition of attribute.

5.2 Generality
Correspondences, and the rules to discover them, were developed within our rep2rep representation recommendation framework. But the underlying idea—that analogical similarities across domains are discoverable—abstracts to a more general setting. We now define the necessary components to apply correspondences, and generalise the discovery rules on these components (full details are in Appendix C). First, we can observe that the rules [rev] and [cmp] are context-independent, so we can leave them alone. Second, we observe that 'properties' can be made opaque; instead we assume only a set of atoms. The [val] and [atr] rules relied on attributes from properties, so must be replaced. Instead, we define a single rule [rel], which is the same except for replacing attr(p, l, e) with R(p, e) for some relation R on atoms p and e. Finally, we keep the equivalence relation ≡ from the [idy] rule; for each setting we define it as a special equivalence relation on the atoms. In this way we leave behind the context of representation selection, and instead have a general correspondence framework for any domain.

To demonstrate this generalisation, we can encode the problem of recommending a product to a customer, rather than a representation to a student. Our sets of atoms are no longer representations, but product categories: books and films, for example. We use the rep2rep property structure to help us manage the encoding, so our atoms remain (kind, value, attributes) triples. Our kinds are limited to types and tokens, but the values are diverse, ranging over names, places and dates. We keep our relations as attributes, but use new labels such as 'isThe' or 'playedBy'. The equivalence relation ≡ is string equality on kinds and values. We give some sample properties in Figures 4 and 5.

    (type, title, ∅)    (type, character, ∅)    (type, writer, ∅)
    (token, The Caves of Steel, {isThe = title})
    (token, Isaac Asimov, {isThe = writer})
    (token, Elijah Baley, {isThe = character})

Figure 5: Example properties for a Books category.

In our simple example, we can apply the rule of identity to title, and also to character, because these two categories overlap. From this we can apply the rule of values using the 'isThe' attribute and the character correspondence, thus deducing that Rick Deckard corresponds to Elijah Baley. The derivation is:

Let p1 = (token, Rick Deckard, {isThe = character, playedBy = Harrison Ford})
and p2 = (token, Elijah Baley, {isThe = character}) in

    ⟨ (type, character, ∅), (type, character, ∅), 1.0 ⟩    attr(p1, isThe, character)    attr(p2, isThe, character)
    ------------------------------------------------------------------------------------------------ [val]
    ⟨ p1, p2, 1.0 ⟩

Over such constrained property sets this is trivial, but it exemplifies the application of our framework to a new domain.

6 SUMMARY AND CONCLUSIONS
We introduced and motivated a theory of correspondences: relationships between property formulae which 'fill the same role' across representations. Our theory of correspondences is grounded in propositional logic and probability, with rules for extending the set of correspondences in an interactive loop with human analysts. Our analysis demonstrates the cognitive basis of these discovery rules, and the generality of correspondences.

We described correspondences' use in building analogies for problem solving through a simple counting problem, and sketched how the same framework could be adapted to a domain such as product recommendation. Further applications include automated and interactive theorem proving, business visualisation tools, diagnostic and reporting systems, and many other domains where clear communication with justification is paramount.

In future work we will explore generalisations of the rules to work over formulae, automatic description generation from correspondences, and the philosophical position of correspondences in knowledge representation.

ACKNOWLEDGMENTS
Aaron Stockdill is supported by the Hamilton Cambridge International PhD Scholarship. This work was supported by the EPSRC Grants EP/R030650/1 and EP/R030642/1. We wish to thank the four anonymous reviewers, whose comments helped to improve the presentation of this paper.

REFERENCES
[1] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. 2013. Recommender systems survey. Knowledge-Based Systems 46 (2013), 109–132.
[2] Joan Condell, John Wade, Leo Galway, Michael McBride, Padhraig Gormley, Joseph Brennan, and Thiyagesan Somasundram. 2010. Problem solving techniques in cognitive science. Artificial Intelligence Review 34, 3 (Oct 2010), 221–234.
[3] Thomas M. Cover and Joy A. Thomas. 2005. Elements of Information Theory. John Wiley & Sons, Ltd, Hoboken, NJ, USA.
[4] Brian Falkenhainer, Kenneth D. Forbus, and Dedre Gentner. 1989. The structure-mapping engine: Algorithm and examples. Artificial Intelligence 41, 1 (1989), 1–63.
[5] Dedre Gentner. 1983. Structure-mapping: A theoretical framework for analogy. Cognitive Science 7, 2 (1983), 155–170.
[6] Dedre Gentner. 2002. Analogy in Scientific Discovery: The Case of Johannes Kepler. Springer US, Boston, MA, USA, 21–39.
[7] Naomi Goldblum and Shifra Glick. 2001. The Brain-Shaped Mind: What the Brain Can Tell Us About the Mind. Cambridge University Press, Cambridge, UK.
[8] Tom Hope, Joel Chan, Aniket Kittur, and Dafna Shahaf. 2017. Accelerating Innovation Through Analogy Mining. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '17). ACM, New York, NY, USA, 235–243.
[9] S. Kullback and R. A. Leibler. 1951. On Information and Sufficiency. The Annals of Mathematical Statistics 22, 1 (1951), 79–86.
[10] Gerhard Minnameier. 2010. The logicality of abduction, deduction, and induction. In Ideas in Action: Proceedings of the Applying Peirce Conference. Nordic Pragmatism Network, Helsinki, 239–251.
[11] Daniel Moody. 2009. The "Physics" of Notations: Toward a Scientific Basis for Constructing Visual Notations in Software Engineering. IEEE Transactions on Software Engineering 35, 6 (2009), 756–779.
[12] Johanna D. Moore. 1993. What Makes Human Explanations Effective? In Proceedings of the 15th Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates, Hillsdale, NJ, USA, 131–136.
[13] Daniel Raggi, Aaron Stockdill, Mateja Jamnik, Grecia Garcia Garcia, Holly E. A. Sutherland, and Peter C.-H. Cheng. 2019. Inspection and Selection of Representations. In Intelligent Computer Mathematics, Cezary Kaliszyk, Edwin Brady, Andrea Kohlhase, and Claudio Sacerdoti Coen (Eds.). Springer International Publishing, Cham, 227–242.
[14] J. A. Robinson. 1971. Computational logic: the unification algorithm. Machine Intelligence 6 (1971), 63–72.
[15] Iris Vessey. 1991. Cognitive Fit: A Theory-Based Analysis of the Graphs Versus Tables Literature. Decision Sciences 22, 2 (1991), 219–240.
[16] Rina Zazkis and Peter Liljedahl. 2004. Understanding Primes: The Role of Representation. Journal for Research in Mathematics Education 35, 3 (2004), 164–186.

A DISCOVERY RULES
In this paper we introduced five rules to discover new correspondences. Here we build on that brief overview.

A.1 Attribute and value
Attributes are used to suggest new correspondences based on existing correspondences. Looking again at our example, we might state that a 2 from our algebraic representation is like a ◦◦ from our dots representation. That is, we have the correspondence

    ⟨ (token, 2, {hasType = number}), (token, ◦◦, {hasType = dot-arrangement}), 1.0 ⟩

The two corresponding properties both have entries for the 'hasType' attribute; perhaps those entries themselves correspond? This is the first rule for correspondence discovery:

    ⟨ p1, p2, s ⟩    attr(p1, l, e1)    attr(p2, l, e2)
    -------------------------------------------------- [atr]    (15)
    ⟨ e1, e2, s ⟩

where attr(p, l, e) is a predicate asserting that property p has an attribute with label l and entry e. Similarly we can define the discovery rule over matching attributes:

    ⟨ e1, e2, s ⟩    attr(p1, l, e1)    attr(p2, l, e2)
    -------------------------------------------------- [val]    (16)
    ⟨ p1, p2, s ⟩

which is identical, but with the property (p) and entry (e) correspondences swapped.

By way of example, using our 2/◦◦ correspondence and the [atr] rule, we can derive a correspondence between number and dot arrangements. The original properties are in correspondence, they both have an attribute with a common label, and so we conclude that the values of those labels also correspond.

These rules, and in particular the [val] rule, expose the limitations of this system:⁵ such a rule will generate many nonsense results. All numbers have the type 'number', and all dot arrangements have the type 'dot-arrangement', but we do not want the Cartesian product of numbers and dot arrangements as correspondences! While unfortunate, this rule redeems itself when considering more complex types. Consider the binary operator +, a token in the symbolic algebra representation; it has type

    number × number → number.

Consider also stacking dots, a pattern in the dots representation; it has type

    dot-arrangement × dot-arrangement → dot-arrangement.

Given the correspondence between numbers and dot arrangements, automatically associating these is a valuable contribution to the correspondence set.⁶

A.2 Reversal and composition
Importantly, s does not necessarily equal s′: in one direction, the correspondence might be strong, while the reversed direction might be weaker. This is analogous to Bayes' Theorem: if it is raining, I can be very confident the grass is wet; if the grass is wet, I might suspect it is raining but cannot be sure.

To determine the reversed strength s′, we return to the definition of correspondence strength, Equation (6). We can show that

    s′ = s · ( Pr(p1) / (1 − Pr(p1)) ) · ( (1 − Pr(p2)) / Pr(p2) ).    (18)

That is, the reversed correspondence strength s′ is the original correspondence strength s modulated by the ratio between the probability of each property formula being satisfied or not. Our assumption that Pr(p2 | p1) ≥ Pr(p2) guarantees s′ will be between 0 and 1.

Similarly to reversing correspondences, we can compose correspondences. If we have that p1 corresponds to p2, and p2 corresponds to p3, then we can conclude that p1 in some way corresponds to p3:

    ⟨ p1, p2, s ⟩    ⟨ p2, p3, s′ ⟩
    -------------------------------- [cmp]    (19)
    ⟨ p1, p3, s · s′ ⟩

Consequently the strength of composed correspondences is monotonically decreasing; longer chains result in weaker correspondences.
This is a conservative estimate which results from Extending these rules to work on correspondences which contain the independence assumption: p1 and p3 might be more strongly property formulae is conceptually subtle, but in practice simple. corresponding, but this is expert knowledge which the analyst must Rather than requiring the antecedent correspondence to be ex- provide. actly between the properties or attributes under consideration, it While the reversal rule generalises to formulae trivially, the is sufficient that there be an implication relationship between the composition rule is more subtle. Specifically, we can move from re- property formulae in the correspondence, and the properties or quiring exact equality on property p2 to requiring only implication. attributes in the representation: from property to correspondence That is, we can rewrite the composition rule as on the left, and from correspondence to property on the right. For ⟨ p1, p2, s ⟩ p2 → p2′ ⟨ p2′ , p3, s ′ ⟩ example, we might use correspondences that contain the or and . ⟨ p1, p3, s · s ′ ⟩ and connectives in the [atr] rule: Knowing more about the properties and the representations these ⟨ p1 or pX , p2 and pY , s ⟩ attr(p1, l, e 1 ) attr(p2, l, e 2 ) formulae are acting over would allow even greater control: we ⟨ e 1, e 2, s ⟩ could move any unsatisfied properties from p2′ to p1 , for example. More generally, if we have a property pa , and a correspondence But without careful representation checks this would build repre- containing the left property formula pa′ , we can still use the [atr] sentation formulae which are heterogeneous—they could only be and [val] rules if and only if pa → pa′ . Conversely, if we have a satisfied by a blended representation. This is beyond the scope of property pb , and a correspondence containing the right property the project. formula pb′ , we need pb′ → pb . 
A.3 Identity A.2 Reversal and composition The final correspondence discovery rule is identity: Representation tables are not the only source of information that p1 ≡ p2 [idy] (20) we can use to suggest correspondences. The set of correspondences ⟨ p1, p2, 1.0 ⟩ itself can be used to discover correspondences through two rules: where p ≡ p ′ is true if and only if p and p have identical kinds and reversal, and composition. values. This rule of identity links representations that overlap; we A reversed correspondence is exactly as expected: if we know know that this is a correspondence between the two representations. p1 corresponds to p2 , then we can be infer that p2 corresponds to This can help bootstrap the other discovery rules: if properties have p1 as well. the same kind and value, but their attributes are different, then we ⟨ p1, p2, s ⟩ can apply the attribute rule, [atr], for example. [rev] (17) ⟨ p2, p1, s ′ ⟩ 5 We will explore this further in Appendix B. A.4 Order and strength 6 We have extended the rules in (15) and (16) to use unification [14] (treating all types These rules are applied repeatedly until either the analyst is satisfied, as unique type variables) for the Hindley-Milner–style types. or there are no more suggestions. But the rules have no inherent ExSS-ATEC’20, March 2020, Cagliari, Italy Stockdill et al. order, nor are correspondence derivations unique; the same cor- C GENERALISED RULES respondence can be discovered multiple times, and with different In Section 5, we briefly discussed how the correspondence frame- strengths. We leave it to the analyst to decide which strength is work generalises to domains outside problem solving. appropriate, but if they do decide to update the strength we have a Thus the overall structure required for correspondences is (S, Pr(· | problem: any correspondence previously derived from the updated ·), ≡, ∼). Set S consists of triples S = (AS , R S , PrS ). 
Let AS be a set correspondence will in turn need its strength updated. This need of atoms, R S a set of relations on A2S , . . . , AkS , and PrS : AS → [0, 1] propagates, carrying updated strengths through the correspondence be a probability function. Further, define a function Pr(· | ·) : set. AS × AT → [0, 1], where the subscripts S and T refer to tuples To address this, we maintain a dependency graph of derived in S. Similarly we define an equivalence relation ≡ : AS × AT . correspondences to track which correspondences to update. We We also require an equivalence relation ∼: R S × RT between the also use this graph to avoid cycles—ensuring a correspondence is relations of each triple. not used to derive itself. With this generalised structure, we can update our discovery rules for correspondences.7 Reversal and transitivity remain un- changed, as they are independent of the structure. Identity is simple: B INCOMPLETENESS AND UNSOUNDNESS a ≡b [idy] (21) Viewed as an analyst-support tool, our implementation of corre- ⟨ a, b, 1.0 ⟩ spondence discovery is ‘complete’, or more accurately saturating. So long as a and b are equivalent, then we can put them in cor- If there exists a correspondence which can be discovered by these respondence automatically. The equivalence relation should be rules, then our utility will suggest it to the analyst. The tool uses simple; a typical implementation would be equality. an unpruned graph search over states of the correspondence set, The two remaining discovery rules—for attributes, and for values— with edges defined by applying the rules to the correspondence set. collapse down to a single rule. We define the rule on relations as When considering the completeness and soundness of the rules, r ∼ r ′ r (a 1, . . . , an ) r ′ (b1, . . . , bn ) we consider the broader space of valid correspondences. 
No condi- ⟨ a 1 , b 1 , s 1 ⟩ · · · ⟨ a k , bk , s k ⟩ tions are both necessary and sufficient for a valid correspondence— [rel] (22) such conditions would constitute general knowledge—so our rules ⟨ ak +1, bk+1, s ′ ⟩ ··· ⟨ a n , bn , s ′ ⟩ will violate one or both of completeness and soundness. Although where we compute s ′ by both are violated, we have erred towards avoiding generating a lot k 1 Õ of unsound correspondences. This trade-off avoids overwhelming s′ = si . (23) n − 1 i=1 the analyst with many meaningless correspondences, at the cost of potentially missing rare-but-insightful correspondences. This rule states that if the equivalent predicate is satisfied by some Which correspondences get discovered (completeness) depends number of corresponding atoms, then the remaining atoms may on the starting set of correspondences. Assuming an empty set also correspond. We only consider the case where k ≥ 1. with totally disjoint representations—that is, there are no prop- Note in s ′ the difference between the sum limit k and the fraction erties with identical kinds and values—the search is immediately denominator n − 1. This reflects the proportion of the predicate terminated. Even beyond this degenerate case, correspondences already satisfied: if the predicate is almost complete, the strength which are ‘isolated’ are still unreachable. should be closer to that of the antecedent correspondences; if the Reaching these unreachable correspondences in a principled predicate is mostly inferred, the strength should be appropriately way is difficult. Assuming our descriptions of representations are weaker. The normalising term n − 1 occurs because we fix k ≥ 1, complete, we can generate the countably infinite stream of corre- leaving at most n − 1 degrees of freedom. In the binary case, this is spondences; then all such isolated correspondences are reachable. a copy of the antecedent correspondence strength, s ′ = s. 
But the correspondences generated will be mostly meaningless, Much like in the special case of attributes and values, we do not leaving the analyst in no better position than when they started. need a literal equality between the atoms in the relation and the Thus we will not attempt completeness until stricter correspon- atoms in the correspondences. We need only that for atom ai in the dence validity conditions are discovered. relation r and property formula ai′ in the left of the correspondence Conversely, we are unable to eliminate incorrect (unsound) cor- that ai → ai′ , and for the atom bi in the relation r ′ and property respondences from our suggestions. This is primarily an issue of formula bi′ in the right of the correspondence that bi′ → bi . meaning: the tokens and types in a representation are given sym- bols that are meaningful to an analyst. When two symbols occur in the same relations, they are effectively indistinguishable from synonyms. Using types for discovery presents an obvious example: both 2 and 76 have the same type, number, so when presented with ◦◦—of type dot arrangement, known to correspond to number—do we create a correspondence with 2, 76, both, or neither? For such simple examples, domain-specific knowledge can act as a heuristic, but more generally we hit the ‘general knowledge’ problem: how 7 In the following rules, we abstract away the matching on S , T ∈ S , etc. We assume do we know, a priori, which correspondence is more meaningful? that a ∈ AS , b ∈ AT , r ∈ R S , and r ′ ∈ RT .
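To make the attribute and value rules of Appendix A.1 concrete, here is a minimal sketch in Python. The `(kind, value, attributes)` encoding of properties and all function names are our own illustrative choices, not the authors' implementation.

```python
# Sketch of the [atr] and [val] discovery rules (Equations 15 and 16).
# A property is a (kind, value, attributes) triple; a correspondence is a
# (left, right, strength) triple. Names are illustrative only.

def atr(corr):
    """[atr]: from <p1, p2, s>, propose <e1, e2, s> for every attribute
    label that p1 and p2 share."""
    p1, p2, s = corr
    attrs1, attrs2 = p1[2], p2[2]
    for label in sorted(attrs1.keys() & attrs2.keys()):
        yield (attrs1[label], attrs2[label], s)

def val(entry_corr, p1, p2):
    """[val]: from <e1, e2, s> plus attr(p1, l, e1) and attr(p2, l, e2),
    propose <p1, p2, s>."""
    e1, e2, s = entry_corr
    for label in p1[2].keys() & p2[2].keys():
        if p1[2][label] == e1 and p2[2][label] == e2:
            yield (p1, p2, s)
            return  # one shared label is enough

# The running example: the token 2 corresponds to a pair of dots.
two = ("token", "2", {"hasType": "number"})
pair = ("token", "oo", {"hasType": "dot-arrangement"})
derived = list(atr((two, pair, 1.0)))
print(derived)  # [('number', 'dot-arrangement', 1.0)]
```

As the appendix warns, applying [val] naively over these type attributes would pair every number with every dot arrangement, so an implementation would surface these as suggestions for analyst review rather than accept them outright.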
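The strength arithmetic for reversal (Equation 18) and composition (rule [cmp]) is small enough to sketch directly. We assume the probabilities Pr(p1) and Pr(p2) come from the representation's probability function; the function names are ours.

```python
def reverse_strength(s, pr_p1, pr_p2):
    """[rev] via Equation (18):
    s' = s * (Pr(p1) / (1 - Pr(p1))) * ((1 - Pr(p2)) / Pr(p2)).
    Requires 0 < pr_p1 < 1 and 0 < pr_p2 < 1."""
    return s * (pr_p1 / (1.0 - pr_p1)) * ((1.0 - pr_p2) / pr_p2)

def compose_strength(*strengths):
    """[cmp]: a chain's strength is the product of its links, so
    longer chains are never stronger than shorter ones."""
    result = 1.0
    for s in strengths:
        result *= s
    return result

# When both properties are equally likely, reversal preserves strength.
print(reverse_strength(0.6, 0.5, 0.5))  # 0.6
# Chains decay monotonically.
print(compose_strength(0.9, 0.5))       # 0.45
```

Note that reversing twice returns the original strength (up to floating-point error), mirroring the symmetry of the odds ratios in Equation 18.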
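Appendix A.4's dependency graph can be sketched as follows. This is our illustrative reading: we assume a derived correspondence's strength is recomputed as the product of its antecedents' strengths (as in [cmp]); the paper does not prescribe a concrete data structure.

```python
from collections import defaultdict

class CorrespondenceStore:
    """Tracks which correspondences were derived from which, so an
    analyst's strength update propagates to all descendants."""

    def __init__(self):
        self.strength = {}
        self.parents = defaultdict(set)   # derived id -> antecedent ids
        self.children = defaultdict(set)  # antecedent id -> derived ids

    def add(self, cid, s, parents=()):
        self.strength[cid] = s
        for p in parents:
            self.parents[cid].add(p)
            self.children[p].add(cid)

    def update(self, cid, s, _seen=None):
        """Set a new strength and recompute every derived descendant.
        The _seen set stops a correspondence deriving itself via a cycle."""
        seen = _seen if _seen is not None else set()
        if cid in seen:
            return
        seen.add(cid)
        self.strength[cid] = s
        for child in self.children[cid]:
            derived = 1.0
            for p in self.parents[child]:
                derived *= self.strength[p]
            self.update(child, derived, seen)

store = CorrespondenceStore()
store.add("2~oo", 0.8)
store.add("+~stack", 0.5)
store.add("num~dots", 0.4, parents=("2~oo", "+~stack"))
store.update("2~oo", 0.5)          # analyst override
print(store.strength["num~dots"])  # 0.25
```

The `_seen` set plays the role of the paper's cycle check: a correspondence reached once in a propagation pass is never revisited, so it cannot be used to re-derive itself.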
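The generalised [rel] strength of Equation (23) is simple arithmetic; a sketch with our own function name, including the binary special case where s′ = s:

```python
def rel_strength(known_strengths, arity):
    """Equation (23): s' = (s1 + ... + sk) / (n - 1), where k pairs of
    atoms are already in correspondence out of an n-ary relation and
    the remaining n - k pairs are inferred."""
    k = len(known_strengths)
    if not 1 <= k < arity:
        raise ValueError("need 1 <= k < n")
    return sum(known_strengths) / (arity - 1)

# Binary relation, one known pair: the antecedent strength carries over.
print(rel_strength([0.7], 2))        # 0.7
# Ternary relation, one certain pair and one 0.7 pair: close to full strength.
print(rel_strength([1.0, 0.7], 3))   # 0.85
```

The guard `1 <= k < arity` encodes the paper's side conditions: at least one pair must already correspond (k ≥ 1), and there must be at least one pair left to infer.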
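Appendix B describes the tool as 'saturating': the rules are applied until no new correspondence appears. A minimal fixpoint loop, assuming each rule maps the current correspondence set to a set of candidates (the round limit is our own safeguard, not the paper's):

```python
def saturate(initial, rules, max_rounds=100):
    """Apply every rule to the current correspondence set until no new
    correspondence is produced (saturation), or the round limit is hit."""
    corrs = set(initial)
    for _ in range(max_rounds):
        new = set()
        for rule in rules:
            new |= set(rule(corrs)) - corrs
        if not new:          # fixpoint reached: the set is saturated
            return corrs
        corrs |= new
    return corrs

# A toy reversal rule that preserves strength, to show the loop saturates.
def rev(corrs):
    return {(b, a, s) for (a, b, s) in corrs}

result = saturate({("p1", "p2", 1.0)}, [rev])
print(sorted(result))  # [('p1', 'p2', 1.0), ('p2', 'p1', 1.0)]
```

This also illustrates the degenerate case from the appendix: starting from an empty set, no rule fires and the search terminates immediately.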