=Paper= {{Paper |id=Vol-2582/paper4 |storemode=property |title=Cross-domain Correspondences for Explainable Recommendations |pdfUrl=https://ceur-ws.org/Vol-2582/paper4.pdf |volume=Vol-2582 |authors=Aaron Stockdill,Daniel Raggi,Mateja Jamnik,Grecia Garcia Garcia,Holly E. A. Sutherland,Peter C.-H. Cheng,Advait Sarkar |dblpUrl=https://dblp.org/rec/conf/iui/StockdillRJGSCS20 }} ==Cross-domain Correspondences for Explainable Recommendations== https://ceur-ws.org/Vol-2582/paper4.pdf
                                         Cross-domain Correspondences
                                        for Explainable Recommendations
                 Aaron Stockdill                                     Grecia Garcia Garcia                                Advait Sarkar
                  Daniel Raggi                                       Holly E. A. Sutherland                          advait@microsoft.com
                 Mateja Jamnik                                        Peter C.-H. Cheng                               Microsoft Research
          aaron.stockdill@cl.cam.ac.uk                             g.garcia-garcia@sussex.ac.uk                         Cambridge, UK
           daniel.raggi@cl.cam.ac.uk                                h.sutherland@sussex.ac.uk
          mateja.jamnik@cl.cam.ac.uk                                 p.c.h.cheng@sussex.ac.uk
          University of Cambridge, UK                                University of Sussex, UK

ABSTRACT
Humans use analogies to link seemingly unrelated domains. A
mathematician might discover an analogy that allows them to use
mathematical tools developed in one domain to prove a theorem
in another. Someone could recommend a book to a friend, based
on understanding their hobbies, and drawing an analogy between
them. Recommender systems typically rely on learning statistical                          Figure 1: The numbers 1 through to 5 as rows of dots.
correlations to uncover these cross-domain correspondences, but it
is difficult to generate human-readable explanations for the corre-
spondences discovered. We formalise the notion of ‘correspondence’
between domains, illustrating this through the example of a simple
mathematics problem. We explain how we might discover such                                                                          n
correspondences, and how a correspondence-based recommender
system could provide more explainable recommendations.

KEYWORDS                                                                                                           n+1
representation, correspondence, analogy, concept mapping
                                                                                       Figure 2: The general relationship for n dots from the case
ACM Reference Format:                                                                  n = 5; by counting the whole grid and halving the result, we
                                                                                       have n(n + 1)/2 black-edged dots.
Aaron Stockdill, Daniel Raggi, Mateja Jamnik, Grecia Garcia Garcia, Holly E.
A. Sutherland, Peter C.-H. Cheng, and Advait Sarkar. 2020. Cross-domain Cor-
respondences for Explainable Recommendations. In Proceedings of IUI work-
shop on Explainable Smart Systems and Algorithmic Transparency in Emerging
                                                                                       them; both are advanced techniques to reach the solution n(n + 1)/2.
Technologies (ExSS-ATEC’20). Cagliari, Italy, 8 pages.
                                                                                       This algebraic representational system (hereafter a representation)
                                                                                       is formally appropriate, but may not be cognitively appropriate.
1    INTRODUCTION
                                                                                           We can re-represent Equation (1) as in Figure 1, but instantiated
When explaining a concept, people adapt their explanation for their                    to n = 5, where numbers are rows of dots. Modulo generalisation,
audience [12]. This switch can be motivated by the concept itself,                     these representations show the same key information, but their
the audience’s knowledge, or the motivation behind sharing the                         cognitive features are different: the dot diagram hints at the closed
concept [11, 15]. Each factor must be balanced by the explainer                        form solution whereas in the symbolic representation this is missing.
before choosing a representation for the explainee.                                    Now, observing that we have a triangle, applying simple counting
   Consider the problem of finding a closed form solution to the                       and reflection rules yields the solution in Figure 2.
sum of integers between 1 and n. The ‘formal’ presentation might                           Symbolic algebra and grids of dots are not equivalent representa-
be                                                                                     tions: the former can encode many concepts that the latter cannot.
                              Õn
                                  i                             (1)                    Nor are the two forms of the problem equivalent: our remark ‘mod-
                                       i=1                                             ulo generalisation’ exposes one difference. But the statements do
which is compact and unambiguous. But it presupposes an under-                         capture the same important properties [13]: integers, associativity,
standing of higher mathematical notation, while providing no clues                     commutativity, equality, etc.
to the closed form solution. One might have to perform an induc-                           A complication in preserving important properties across repre-
tion, or identify an equivalent series and work algebraically with                     sentations is that the properties take different forms. The symbol 2
                                                                                       becomes ◦◦, while + becomes stacking. Similar to analogy or struc-
ExSS-ATEC’20, March 2020, Cagliari, Italy                                              tural similarity, we define correspondences [13] as links between
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).                                     properties which ‘fill the same role’ in their respective representa-
                                                                                       tions. Correspondences are thus both a heuristic for transformation
ExSS-ATEC’20, March 2020, Cagliari, Italy                                                                                                        Stockdill et al.


between representations, and part of a justification for why a par-                   how important they are to the problem q, and how ‘good’ those
ticular representation may be suitable for the given problem.                         reasons are in general (the correspondence strength s).
   In this paper we explore precisely what correspondences are                           Previous work around structure-mapping constructs correspon-
and how we can discover them, with the aim of helping users to                        dences between two analogous concepts by matching their internal
discover and understand analogies. Our work is part of a pipeline to                  associations [4]. This requires well-defined internal structure—in
automatically recommend alternative representations to help users                     our digitally catalogued world, such a requirement is increasingly
solve mathematics problems. We envisage a complete intelligent                        easily met. But structure-mapping assumes we already know that
tutoring system capable of understanding the individual student,                      two concepts are related, we just need to work out how they are
and adapting to their needs and preferences dynamically, for each                     related. When we are searching for an analogy, we do not know this.
problem they are tasked to solve. This paper represents one step                      Thus, the purpose of our research is to discover related concepts
towards this goal.                                                                    through analogy.
                                                                                         We have an implementation of the concepts discussed in this
2     MOTIVATION                                                                      paper; the code is available at https://github.com/rep2rep/robin.
Analogical reasoning is a powerful tool [7]. Higher-order corre-
spondences describing relationships allow deeper understanding                        3     CORRESPONDENCES
of both the source and target statements by exploiting their struc-                   In this section, we formalise the idea of correspondence first intro-
ture [5]. From education to scientific discovery, insight occurs when                 duced in [13].
disparate ideas are brought together [2, 6, 8, 16].
   We previously introduced in [13] a rep2rep framework for auto-                     3.1    Transformations
matically analysing and suggesting to users a suitable alternative                    A correspondence associates an encoding of content in one repre-
representation for their current problem. The aim is for such intel-                  sentation to an encoding of the same content in another represen-
ligent systems to tailor this suggested representation to the needs                   tation, along with a measure of how well that content is preserved
of the person solving the problem. Our framework enables one to                       between them. When both source and target representations are
describe problems, representations, and the correspondences be-                       specified, the correspondences describe how they are analogous;
tween them in a way that can be input to an algorithm which then                      when only the source representation is specified, the correspon-
produces the appropriate recommendation. The framework pro-                           dences specify requirements. In the example from Section 1, dots
vides templates which analysts can fill with many kinds of values:                    fulfil the requirements: they can encode integers and summation. A
types, tokens, patterns, and other properties. Types describe the                     representation like the lambda calculus would also fulfil the require-
grammatical roles in a representation, while tokens are what fills                    ments, but a Venn diagram representation would not. If instead our
these roles. Patterns are ‘conceptual groupings’ of properties which                  problem required encoding trigonometric functions, our dot repre-
form recognisable units for expert users. Analysts can associate                      sentation and Venn diagram representations would fail to satisfy
attributes to properties: this could be types, descriptions, counts, or               the requirements, and not be recommended.
other information. Our framework is purposefully general: repre-                         Formally, we define a correspondence as a triple
sentations are extremely diverse, and we want to encode as many
as possible.                                                                                                       ⟨ p1, p2, s ⟩                             (3)
   Correspondences between representations are catalogued by                          where p1 and p2 are formulae of properties and s is a real value
analysts. This passes the burden of discovering analogies to a hu-                    between 0 and 1 [13]. The property formulae use the connectives
man ahead of time; when our algorithm suggests an appropriate                         and, or, and not with the expected interpretations. This allows
representation, it selects relevant correspondences from this set.                    for sophisticated relationships between representation properties.
This simplifies the recommendation task considerably, but it leaves                   Value s denotes the strength of the correspondence: when s = 0,
analysts with a formidable cataloguing challenge, so even partially                   there is no correspondence, the content is unrelated; when s = 1,
automating this step is a high priority. We shall explore our ap-                     there is a perfect correspondence, the content is the same.
                                                                                                                                         Í
proach to this problem in Section 4.                                                     If we wanted to encode the correspondence ‘a is like stacking
   The final recommendation from the rep2rep algorithm1 is a list                     either horizontally or vertically’, we first identify the properties
of potential representations ordered high-to-low by their formal                      involved. Properties consist of a kind, a value, and attributes. In
score. We define the formal score vqr for representation r to encode                  this case, we have a ‘token’ kind of property from an algebraic
problem q as                                                                          representation,
                                             q
                            Õ
                         vqr =          sc · ic · matchqr (c)                                                  p1 = (token, , a 1 )
                                                                                                                           Í
                                                                               (2)
                                 c ∈C                                                 (with attributes abstracted to a 1 for this example), while we have
where C is the set of correspondences, sc is the strength of corre-                   two ‘pattern’ properties from the dots representation,
                 q
spondence c, ic is the importance of c for problem q, and matchqr (c)
is an indicator function equal to 1 when c is satisfied by the proper-                                  p2 = (pattern, stack-horizontal, a 2 )
ties of q and r , and 0 otherwise. Intuitively, we are counting reasons                             and p3 = (pattern, stack-vertical, a 3 )
why representation r is good, and weighting those reasons by both                     (where the attributes are similarly abstracted). The correspondence
1 The recommendation is presently based only on the formal properties; ongoing work   would thus be
incorporates the user profile into the decision.                                                              ⟨ p1, p2 or p3, 0.9 ⟩.                   (4)
Cross-domain Correspondences for Explainable Recommendations                                                           ExSS-ATEC’20, March 2020, Cagliari, Italy


Note the key word either in our specification: these target properties                  To address these concerns, we define the strength of the corre-
are not independent, so we do not write two correspondences.                        spondence ⟨ p1, p2, s ⟩ to be
Instead, we use a disjunction connective to make the sufficiency of                                             Pr(p2 | p1 ) − Pr(p2 )
one property or the other explicit. This is a strong correspondence,                                       s=                          .                (6)
                                                                                                                    1 − Pr(p2 )
and hence has a high strength of s = 0.9.
                                                                                    When Pr(p2 | p1 ) < Pr(p2 ), we redefine the correspondence to be
   Our definition of correspondences makes them explainable. Be-
                                                                                    between p1 and not p2 :
cause we can inspect which correspondences are activated, we can
                                                                                             Pr(not p2 | p1 ) − Pr(not p2 ) Pr(p2 ) − Pr(p2 | p1 )
describe the analogy. Beyond saying two things are alike, correspon-                                                         =                     .    (7)
dences encode precisely how the two things are alike; with strength,                                1 − Pr(not p2 )                      Pr(p2 )
we can justify how much they are alike. Thus, a description of how                  Both are undefined when Pr(p1 ) or Pr(p2 ) are 0 or 1. We consider this
and why an analogy was made could be explicated to the user: to                     an acceptable compromise, as this indicates either an ‘impossible’
represent summation, consider stacking the dots horizontally or                     property, or a ‘guaranteed’ property. Neither case is useful in our
vertically.                                                                         framework.
                                                                                        Informally, Equation (6) states the increase in probability of ob-
3.2     Property probabilities                                                      serving p2 relative to the maximum potential increase of observing
If we consider words to be the atomic units of written English, we                  p2 . The numerator is the difference between the informed and un-
can compute an occurrence probability derived from its frequency                    informed probability of p2 , while the denominator is the difference
in a particular corpus. Similarly for representations, the property is              between perfect knowledge of p2 and the uninformed probability
the atomic unit; each has an occurrence probability derived from its                of p2 . This balances the need to measure the difference in proba-
frequency in a particular corpus.2 Under a Bayesian interpretation,                 bility against the confounding effect of p2 already having a high
we can assign each property a prior probability. Thus each property                 probability. Similarly, Equation (7) informally states the decrease in
p in a representation is associated with a probability Pr(p).                       probability of p2 relative to the potential decrease of p2 .
    While having a baseline probability for individual properties is                    Correspondence strength is related to two existing concepts:
useful, context gives us further clues. For two analogous problems                  mutual information [3], and Kullback-Leibler (KL) divergence [9].
in different representations, knowing which properties are present                  Mutual information is a symmetric measure of how much infor-
in one will update our knowledge of the properties present in other.                mation is shared between two distributions. This symmetry makes
Returning to our example of adding integers, observing the oper-
                                                                  Í                 it inappropriate for our use-case: correspondences are not neces-
ator in Equation (1) primes us to expect stacking—whether horizon-                  sarily symmetric, one ‘direction’ can be stronger than the other.
tal or vertical—in Figure 1. The conditional probability Pr(p2 | p1 )               KL divergence is asymmetric, but not bound to the interval [0, 1].
expresses our knowledge about p2 after observing that p1 is present.                Because properties behave as Bernoulli random variables, we can
    We define Pr(p2 | p1 ) as per the usual definition, adapted to our              normalise the KL divergence to the interval [0, 1] with information
domain and corpus. For a representation A, we can write problems                    content: KL(Pr(p1 | p2 ) ∥ Pr(p1 ))/I(p1 ). But KL divergence forms
ai . Each problem can potentially be transformed into suitable prob-                a leaky abstraction. As we will see in Section 4, strength as de-
lems TB (ai ) in representation B. Note that TB (ai ) is a set; more than           fined in Equation (6) encapsulates relationships between property
one transformation might be appropriate. Using the count operator                   formulae only in terms of prior probabilities and strength; KL di-
#, and sat(p, q) as a predicate on whether the property formula p is                vergence requires calculations on posterior probabilities that are
satisfied by the properties in question q, we have                                  not necessarily available.

                    Í      
                            b j ∈ TB (ai ) | sat(p1, ai ) ∧ sat(p2, b j )
                                                                                    4     EXPLANATIONS
                      i#
   Pr(p2 | p1 ) =                                                         .   (5)   Correspondences are central to our representation recommendation
                               i # b j ∈ TB (ai ) | sat(p 1 , ai )
                             Í 
                                                                                    process, so we need to understand how they are discovered and how
                                                                                    they can be interpreted as explanations for the recommendation.
Transformations can be imperfect; we consider the transformation
appropriate if a human expert would be able to reach the same
                                                                                    4.1    Discovering correspondences
solution as under the original problem statement.
                                                                                    Consider the correspondence between numbers and dots, specifi-
                                                                                    cally, between 2 and ◦◦: these fill the same role in their respective
3.3     Strength                                                                    representations. Hence,
Correspondence strength must reflect the analogical similarity of
                                                                                            ⟨ (token, 2, {hasType = number}),
the constituent property formulae. Naively, this is the conditional                              (token, ◦◦, {hasType = dot-arrangement}), 1.0 ⟩
probability, but conditional probabilities are not directly compara-
ble: Is there a big difference to the prior probability? How much                   Where does this come from? For a human this is ‘obvious’, but
could the probabilities be different?                                               it must be deduced from somewhere. We propose five rules to
                                                                                    automatically discover correspondences split into three groups:
                                                                                    identity, correspondence-based, and representation-based. What
2 For now we assign prior probabilities based on expert knowledge. In future work
                                                                                    follows is a high-level summary; further technical details are in
we will compute such occurrence probabilities from large corpora of problems and    Appendix A. Figure 3 shows a diagrammatic interpretation of four
representations.                                                                    of the rules.
ExSS-ATEC’20, March 2020, Cagliari, Italy                                                                                                                          Stockdill et al.


                                                                                          Table 1: Some properties of the algebraic and dot represen-
                          a1                                c1                            tations. We will use these to build correspondences.
                                             b1
                                                       c2
                         a2                 b2
                                                                                                  Repr.                               Properties
                                                                                                  Algebraic                    (type, number, œ)
Figure 3: A diagram of three representations, each with two                                                           (token, 1, {hasType = number})
properties. Some properties are related through an attribute,
                                                                                                  Dot                    (type, dot-arrangement, œ)
the dashed arrow. Correspondences are solid arrows. Some
                                                                                                                  (token, ◦, {hasType = dot-arrangement})
example discoveries are: a 1b1 and b1c 1 generate a 1c 1 with
[cmp]; a 1b1 generates a 2b2 with rule [val]; a 2b2 generates a 1b1
with rule [atr]; and a 2b2 generates b2a 2 with rule [rev].                                  Because these two representations are disjoint—no properties
                                                                                          are equivalent, sharing a kind and value—the rule of identity [idy]
   The rule of identity states that two properties with the same                          cannot be used. We must provide an initial seed correspondence,
kind and same value are perfectly corresponding:                                          something the analyst might notice quickly. For this, we use
                              p1 ≡ p2                                                              ⟨ (token, 1, {hasType = number}),
                                              [idy]               (8)                                   (token, ◦, {hasType = dot-arrangement}), 1.0 ⟩
                          ⟨ p1, p2, 1.0 ⟩
where the relation ≡ is the string match on kinds and values. This                        That is, 1 corresponds perfectly to ◦. From this seed we can apply
rule implies that correspondences are reflexive with strength 1, and                      rules to generate new correspondences. First, we apply the rule of
allows naturally overlapping representations (e.g., algebra and a                         reversal to associate ◦ with 1:
tabular representation may reuse numbers) to map cleanly into one                             ⟨ (token, 1, {hasType = number}),
another.                                                                                           (token, ◦, {hasType = dot-arrangement}), 1.0 ⟩
                                                                                                                                                    [rev]
   Correspondence-based rules build new correspondences from                                       ⟨ (token, ◦, {hasType = dot-arrangement}),
existing ones. The first is the rule of reversal,                                                       (token, 1, {hasType = number}), 0.9 ⟩
                            ⟨ p1, p2, s ⟩                                                 Notice the strength reduced slightly: this is because in the catalogue
                                             [rev]                (9)
                           ⟨ p2, p1, s ′ ⟩                                                a 1 occurs more often than a ◦. (In this case, ◦◦ is not two ◦s, they
where                                                                                     are different dot arrangements.)
                               Pr(p1 )       1 − Pr(p2 )                                     Using the correspondence between ◦ and 1, we can discover
                   s′ = s ·                ·             .       (10)
                             1 − Pr(p1 )        Pr(p2 )                                   correspondences between their attributes. Applying the rule of
This allows us to ‘walk backwards’ along a correspondence. The                            attributes, we have a correspondence between numbers and dot
second is the rule of composition,                                                        arrangements. The derivation is shown in Equation (14) on the
                      ⟨ p1, p2, s ⟩ ⟨ p2, p3, s ′ ⟩                                       following page, where we abuse the attribute notation to state the
                                                    [cmp]            (11)                 attributes share a label.
                            ⟨ p1, p3, s · s ′ ⟩
                                                                                             We can continue applying rules in this way to discover new
allowing us to chain together correspondences. Note that due to                           possible correspondences. A richer set of properties would allow
independence assumptions, the correspondence between p1 and p3                            for more rule applications and more discoveries.
might be stronger than calculated; the product s ·s ′ is a conservative
suggestion.                                                                               4.2      Correspondences as explanations
   Finally, representation-based rules exploit the internal structure
                                                                                          Understanding both the definition and source of correspondences,
of each representation to suggest new ‘parallel’ correspondences,
                                                                                          we can interpret their function in making a recommendation. Cor-
as illustrated in Figure 3 by a 1b1 and a 2b2 . The two rules are dual:
                                                                                          respondences provide two modes of explanation: descriptive expla-
the rule of attributes
                                                                                          nation, and constructive explanation.
           ⟨ p1, p2, s ⟩ attr(p1, l, e 1 ) attr(p2, l, e 2 )
                                                               [atr] (12)                    A descriptive explanation is useful when two structures are con-
                               ⟨ e 1, e 2, s ⟩                                            sidered the same; for example, our statements about summing num-
and the rule of values                                                                    bers in Equation (1) and arranging dots in Figure 1 are analogous.
           ⟨ e 1, e 2, s ⟩ attr(p1, l, e 1 ) attr(p2, l, e 2 )                            How they are analogous might be unclear, but the correspondences
                                                               [val] (13)
                               ⟨ p1, p2, s ⟩                                              that link them form an explanation. Consider our correspondence
where attr(p, l, e) is a predicate asserting that property p has an                                            ⟨ (token, 1, a 1 ), (token, ◦, a 2 ), 1.0 ⟩
attribute with label l and entry e. These rules allow us to use an
                                                                                          linking 1 and ◦. Or consider the more sophisticated correspondence
‘internal relationship’ (encoded with attributes) such as type infor-                                                    Í
                                                                                          from Equation (4) linking the operator with stacking. By inform-
mation to build inter-representational correspondences.
                                                                                          ing the user that these are the strongest correspondences which
   Let us explore these rules in action using our examples of al-
                                                                                          are satisfied by both the source and target statements, we explain
gebra and dots. Table 1 lists a subset of the properties from each
                                                                                          how they are analogous.
representation.3
3 Taking this table as the universe of properties, there are eight potential correspon-   final as an exercise for the reader; there are at least two different derivations of the
dences, four of which are meaningful. Here we provide one, derive two, and leave the      final correspondence.
Cross-domain Correspondences for Explainable Recommendations                                                       ExSS-ATEC’20, March 2020, Cagliari, Italy


                ⟨ (token, ◦, {hasType = dot-arrangement}), (token, 1, {hasType = number}), 1.0 ⟩          hasType = hasType
                                                                                                                                     [atr]               (14)
                                          ⟨ (type, number, œ), (type, dot-arrangement, œ), 1.0 ⟩


   Conversely we can consider constructive explanations. If a target
                                                                                 (type, title, œ)     (type, character, œ)        (type, actor, œ)
structure is not known, we can use correspondences to describe
what properties the analogous statement should have. Consider                              (token, Blade Runner, {isThe = title})
again Equation (1), our algebraic problem statement, but this time                        (token, Harrison Ford, {isThe = actor})
without knowing the equivalent statement in dots. Using the same                  (token, Rick Deckard, {isThe = character,
correspondences from our descriptive explanation, we can suggest                                         playedBy = Harrison Ford})
to a student that this problem might be solved by representing 1
as ◦, and to use either vertical or horizontal stacking to convey
     Í
the . Such hints can support progress through the problem, and                 Figure 4: Example properties for a Films category.
potentially reveal to the student deep insights into numbers and
summation.
   Contrast our correspondences with alternative approaches: log-               (type, title, œ)     (type, character, œ)        (type, writer, œ)
ical or statistical recommendations [1]. Logical approaches are                         (token, The Caves of Steel, {isThe = title})
restrictive, requiring strong guarantees about the domains while                          (token, Isaac Asimov, {isThe = writer})
failing to capture the inherently fuzzy nature of connections people
make between domains. In particular, analogies which are merely                         (token, Elijah Baley, {isThe = character})
‘good enough’—in the sense that they have the shape of similarity
without being rigorously correct—cannot be reasoned with formally.             Figure 5: Example properties for a Books category.
Statistical approaches solve the problems associated with logical
approaches, but obscure the underlying reasoning. Without any
internal structure, the suggestions are opaque to the user: there is     attributes4 preserved [5]. Indeed, Gentner defines an analogy to
no justification or support on why this is analogous, and thus a         be ‘a comparison in which relational predicates, but few or no
good recommendation. Correspondences aim to allow a degree of            object attributes, can be mapped from base to target’ [5]. Our rules
uncertainty through strength while retaining sufficient formality        address this with the ‘hasType’ attribute: through the types of
through properties to provide valuable explanations.                     patterns or relation-tokens, we can extract these relation mappings.
                                                                         The extended type system accommodates higher-order relations.
                                                                            Our property framework can be extended beyond typing rela-
5     DISCUSSION                                                         tions with other attributes, which act as meta-relations; these are
Correspondences are one part of our representation recommenda-           automatically used by the existing discovery rules.
tion pipeline. Previous work has shown the recommendations in
general are meaningful [13]. But it is worth analytically examining      5.2     Generality
the quality of correspondences in isolation. In this section we con-    Correspondences, and the rules to discover them, were developed
sider correspondences and the discovery rules with respect to their     within our rep2rep representation recommendation framework. But
theoretical motivation and their generality. A discussion about the     the underlying idea—that analogical similarities across domains are
theoretical limitations of correspondences is in Appendix B.            discoverable—abstracts to a more general setting. We now define
                                                                        the necessary components to apply correspondences, and gener-
                                                                        alise the discovery rules on these components (full details are in
5.1    Cognitive grounding                                              Appendix C).
                                                                           First, we can observe that the rules [rev] and [cmp] are context-
Human reasoning takes many forms, but broadly we can describe
                                                                        independent, so we can leave them alone. Second, we observe that
reasoning as deductive, inductive, or abductive [10]. The simplest
                                                                        ‘properties’ can be made opaque; instead we assume only a set of
to automate—captured by the identity, reversal, and transitivity
                                                                        atoms. The [val] and [atr] rules relied on attributes from prop-
rules—is deductive reasoning. Through operations on existing cor-
                                                                        erties, so must be replaced. Instead, we define a single rule [rel],
respondences we can deduce new correspondences. Thus, corre-
                                                                        which is the same except for replacing attr(p, l, e) with R(p, e) for
spondences form a sort of logic.
                                                                        some relation R on atoms p and e. Finally, we keep the equivalence
   While the rules of identity, reversal and transitivity are moti-
                                                                        relation ≡ from the [idy] rule; for each setting define it as a special
vated by deductive reasoning, attribute- and value-correspondence
                                                                        equivalence relation on the atoms. In this way we leave behind
rules are motivated by inductive reasoning. Specifically, we assume
                                                                        the context of representation selection, and instead have a general
a meaningful structure must be preserved, and this structure is
                                                                        correspondence framework for any domain.
exposed through relationships. By observing and following these
relationships, we can infer new correspondences.                         4 Attribute here is used in the manner of Gentner [5]; these are not what we define
   Foundational work by Gentner explored how analogical power is         as attributes. All other uses of attribute outside direct quotes are according to our
proportional to the relations preserved, compared to the superficial     definition of attribute.
ExSS-ATEC’20, March 2020, Cagliari, Italy                                                                                                              Stockdill et al.


    To demonstrate this generalisation, we can encode the problem of        REFERENCES
recommending a product to a customer, rather than a representation           [1] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. 2013. Recommender
to a student. Our sets of atoms are no longer representations, but               systems survey. Knowledge-Based Systems 46 (2013), 109–132.
                                                                             [2] Joan Condell, John Wade, Leo Galway, Michael McBride, Padhraig Gormley,
product categories: books and films, for example. We use the rep2rep             Joseph Brennan, and Thiyagesan Somasundram. 2010. Problem solving tech-
property structure to help us manage the encoding, so our atoms                  niques in cognitive science. Artificial Intelligence Review 34, 3 (Oct 2010), 221–234.
                                                                             [3] Thomas M. Cover and Joy A. Thomas. 2005. Elements of Information Theory. John
remain (kind, value, attributes). Our kinds are limited to types and             Wiley & Sons, Ltd, Hoboken, NJ, USA.
tokens, but the values are diverse, ranging over names, places and           [4] Brian Falkenhainer, Kenneth D. Forbus, and Dedre Gentner. 1989. The structure-
dates. We keep our relations as attributes, but use new labels such as           mapping engine: Algorithm and examples. Artificial Intelligence 41, 1 (1989),
                                                                                 1–63.
‘isThe’, or ‘playedBy’. The equivalence relation ≡ is string equality        [5] Dedre Gentner. 1983. Structure-mapping: A theoretical framework for analogy.
on kinds and values. We give some sample properties in Figures 4                 Cognitive Science 7, 2 (1983), 155–170.
and 5.                                                                       [6] Dedre Gentner. 2002. Analogy in Scientific Discovery: The Case of Johannes Kepler.
                                                                                 Springer US, Boston, MA, USA, 21–39.
    In our simple example, we can apply the rule of identity to              [7] Naomi Goldblum and Shifra Glick. 2001. The Brain-Shaped Mind: What the Brain
title, and also to character, because these two categories overlap.              Can Tell Us About the Mind. Cambridge University Press, Cambridge, UK.
                                                                             [8] Tom Hope, Joel Chan, Aniket Kittur, and Dafna Shahaf. 2017. Accelerating
From this we can apply the rule of value using the ‘isThe’ attribute             Innovation Through Analogy Mining. In Proceedings of the 23rd ACM SIGKDD
and character correspondence, thus deducing that Rick Deckard                    International Conference on Knowledge Discovery and Data Mining (KDD ’17).
corresponds to Elijah Baley. The derivation is:                                  ACM, New York, NY, USA, 235–243.
                                                                             [9] S. Kullback and R. A. Leibler. 1951. On Information and Sufficiency. The Annals
                                                                                 of Mathematical Statistics 22, 1 (1951), 79–86.
    Let p1 = (token, Rick Deckard, {isThe = character,                      [10] Gerhard Minnameier. 2010. The logicality of abduction, deduction, and induction.
                                                                                 In Ideas in action: Proceedings of the applying Peirce conference. Nordic Pragmatism
                                            playedBy = Harrison Ford})           Network Helsinki, 239–251.
                                                                            [11] Daniel Moody. 2009. The “Physics” of Notations: Toward a Scientific Basis for
    and p2 = (token, Elijah Baley, {isThe = character}) in                       Constructing Visual Notations in Software Engineering. IEEE Transactions on
                                                                                 Software Engineering 35, 6 (2009), 756–779.
        ⟨ (type, character, œ), (type, character, œ), 1.0 ⟩                 [12] Johanna D Moore. 1993. What Makes Human Explanations Effective?. In Pro-
                                                                                 ceedings of the 15th Annual Conference of the Cognitive Science Society. Lawrence
     attr(p1, isThe, character)        attr(p2, isThe, character)                Elbaum Associates, Hillsdale, NJ, USA, 131–136.
                                                                    [val]   [13] Daniel Raggi, Aaron Stockdill, Mateja Jamnik, Grecia Garcia Garcia, Holly E. A.
                             ⟨ p1, p2, 1.0 ⟩                                     Sutherland, and Peter C.-H. Cheng. 2019. Inspection and Selection of Representa-
                                                                                 tions. In Intelligent Computer Mathematics, Cezary Kaliszyk, Edwin Brady, Andrea
                                                                                 Kohlhase, and Claudio Sacerdoti Coen (Eds.). Springer International Publishing,
Over such constrained property sets this is trivial, but exemplifies             Cham, 227–242.
the application of our framework to a new domain.                           [14] JA Robinson. 1971. Computational logic: the unification algorithm. Machine
                                                                                 Intelligence 6 (1971), 63–72.
                                                                            [15] Iris Vessey. 1991. Cognitive Fit: A Theory-Based Analysis of the Graphs Versus
                                                                                 Tables Literature. Decision Sciences 22, 2 (1991), 219–240.
6     SUMMARY AND CONCLUSIONS                                               [16] Rina Zazkis and Peter Liljedahl. 2004. Understanding Primes: The Role of Repre-
                                                                                 sentation. Journal for Research in Mathematics Education 35, 3 (2004), 164–186.
We introduced and motivated a theory of correspondences: relation-
ships between property formulae which ‘fill the same role’ across
representations. Our theory of correspondences is grounded in               A      DISCOVERY RULES
propositional logic and probability, with rules for extending the           In this paper we introduced five rules to discover new correspon-
set of correspondences in an interactive loop with human analysts.          dences. Here we build on that brief overview.
Our analysis demonstrates the cognitive basis of these discovery
rules, and the generality of correspondences.                               A.1       Attribute and value
   We described correspondences’ use in building analogies for
                                                                            Attributes are used to suggest new correspondences based on ex-
problem solving through a simple counting problem, and sketched
                                                                            isting correspondences. Looking again at our example, we might
how the same frameworks could be adapted to a domain such as
                                                                            state that a 2 from our algebraic representation is like a ◦◦ from
product recommendation. Further applications include automated
                                                                            our dots representation. That is, we have the correspondence
and interactive theorem proving, business visualisation tools, di-
agnostic and reporting systems, and many other domains where                         ⟨ (token, 2, {hasType = number}),
clear communication with justification is paramount.                                      (token, ◦◦, {hasType = dot-arrangement}), 1.0 ⟩
   In future work we will explore generalisations of rules to work          The two corresponding properties both have entries for the ‘hasType’
over formulae, automatic description generation from correspon-             attribute; perhaps those entries themselves correspond? This is the
dences, and exploring the philosophical position of correspondences         first rule for correspondence discovery:
in knowledge representation.
                                                                                         ⟨ p1, p2, s ⟩     attr(p1, l, e 1 )    attr(p2, l, e 2 )
                                                                                                                                                    [atr]        (15)
                                                                                                             ⟨ e 1, e 2, s ⟩
ACKNOWLEDGMENTS
Aaron Stockdill is supported by the Hamilton Cambridge Interna-             where attr(p, l, e) is a predicate asserting that property p has an
tional PhD Scholarship. This work was supported by the EPSRC                attribute with label l and entry e. Similarly we can define the dis-
Grants EP/R030650/1 and EP/R030642/1. We wish to thank the four             covery rule over matching attributes:
anonymous reviewers, whose comments helped to improve the                                ⟨ e 1, e 2, s ⟩   attr(p1, l, e 1 )    attr(p2, l, e 2 )
                                                                                                                                                    [val]        (16)
presentation of this paper.                                                                                  ⟨ p1, p2, s ⟩
Cross-domain Correspondences for Explainable Recommendations                                                                        ExSS-ATEC’20, March 2020, Cagliari, Italy


which is identical, but with the property (p) and entry (e) corre-                          Importantly, s does not necessarily equal s ′ : in one direction, the
spondences swapped.                                                                         correspondence might be strong, while the reversed direction might
   By way of example, using our 2/◦◦ correspondence and the [atr]                           be weaker. This is analogous to Bayes’ Theorem: if it is raining, I
rule, we can derive a correspondence between number and dot                                 can be very confident the grass is wet; if the grass is wet, I might
arrangements. The original properties are in correspondence, they                           suspect it is raining but cannot be sure.
both have an attribute with a common label, and so we conclude                                 To determine the reversed strength s ′ , we return to the definition
that the values of those labels also correspond.                                            of correspondence strength, Equation (6). We can show that
   These rules, and in particular the [val] rule, expose the limi-                                                          Pr(p1 )     1 − Pr(p2 )
tations of this system:5 such a rule will generate many nonsense                                               s′ = s ·               ·             .                   (18)
                                                                                                                          1 − Pr(p1 )     Pr(p2 )
results. All numbers have the type ‘number’, and all dot arrange-
ments have the type ‘dot-arrangement’, but we do not want the                               That is, the reversed correspondence strength s ′ is the original
Cartesian product of numbers and dot arrangements as correspon-                             correspondence strength s modulated by the ratio between the
dences! While unfortunate, this rule redeems itself when consider-                          probability of each property formula being satisfied or not. Our
ing more complex types. Consider the binary operator +, a token                             assumption that Pr(p2 | p1 ) ≥ Pr(p2 ) guarantees s ′ will be between
in the symbolic algebra representation; it has type                                         0 and 1.
                                                                                               Similarly to reversing correspondences, we can compose cor-
                       number × number → number.                                            respondences. If we have that p1 corresponds to p2 , and p2 corre-
Consider also stacking dots, a pattern in the dots representation; it                       sponds to p3 , then we can conclude that p1 in some way corresponds
has type                                                                                    to p3 .
                                                                                                               ⟨ p1, p2, s ⟩ ⟨ p2, p3, s ′ ⟩
     dot-arrangement × dot-arrangement → dot-arrangement.                                                                                    [cmp]            (19)
                                                                                                                     ⟨ p1, p3, s · s ′ ⟩
Given the correspondence between numbers and dot arrangements,                                 Consequently the strength of composed correspondences is
automatically associating these is a valuable contribution to the                           monotonically decreasing; longer chains result in weaker corre-
correspondence set.6                                                                        spondences. This is a conservative estimate which results from
   Extending these rules to work on correspondences which contain                           the independence assumption: p1 and p3 might be more strongly
property formulae is conceptually subtle, but in practice simple.                           corresponding, but this is expert knowledge which the analyst must
Rather than requiring the antecedent correspondence to be ex-                               provide.
actly between the properties or attributes under consideration, it                             While the reversal rule generalises to formulae trivially, the
is sufficient that there be an implication relationship between the                         composition rule is more subtle. Specifically, we can move from re-
property formulae in the correspondence, and the properties or                              quiring exact equality on property p2 to requiring only implication.
attributes in the representation: from property to correspondence                           That is, we can rewrite the composition rule as
on the left, and from correspondence to property on the right. For                                          ⟨ p1, p2, s ⟩ p2 → p2′         ⟨ p2′ , p3, s ′ ⟩
example, we might use correspondences that contain the or and                                                                                                  .
                                                                                                                          ⟨ p1, p3, s · s ′ ⟩
and connectives in the [atr] rule:
                                                                                            Knowing more about the properties and the representations these
       ⟨ p1 or pX , p2 and pY , s ⟩ attr(p1, l, e 1 )            attr(p2, l, e 2 )          formulae are acting over would allow even greater control: we
                               ⟨ e 1, e 2, s ⟩                                              could move any unsatisfied properties from p2′ to p1 , for example.
More generally, if we have a property pa , and a correspondence                             But without careful representation checks this would build repre-
containing the left property formula pa′ , we can still use the [atr]                       sentation formulae which are heterogeneous—they could only be
and [val] rules if and only if pa → pa′ . Conversely, if we have a                          satisfied by a blended representation. This is beyond the scope of
property pb , and a correspondence containing the right property                            the project.
formula pb′ , we need pb′ → pb .
                                                                                            A.3     Identity
A.2       Reversal and composition                                                          The final correspondence discovery rule is identity:
Representation tables are not the only source of information that                                                     p1 ≡ p2
                                                                                                                                   [idy]                                (20)
we can use to suggest correspondences. The set of correspondences                                                  ⟨ p1, p2, 1.0 ⟩
itself can be used to discover correspondences through two rules:
                                                                                            where p ≡ p ′ is true if and only if p and p have identical kinds and
reversal, and composition.
                                                                                            values. This rule of identity links representations that overlap; we
   A reversed correspondence is exactly as expected: if we know
                                                                                            know that this is a correspondence between the two representations.
p1 corresponds to p2 , then we can be infer that p2 corresponds to
                                                                                            This can help bootstrap the other discovery rules: if properties have
p1 as well.
                                                                                            the same kind and value, but their attributes are different, then we
                          ⟨ p1, p2, s ⟩                                                     can apply the attribute rule, [atr], for example.
                                          [rev]               (17)
                          ⟨ p2, p1, s ′ ⟩

5 We will explore this further in Appendix B.
                                                                                            A.4     Order and strength
6 We have extended the rules in (15) and (16) to use unification [14] (treating all types   These rules are applied repeatedly until either the analyst is satisfied,
as unique type variables) for the Hindley-Milner–style types.                               or there are no more suggestions. But the rules have no inherent
ExSS-ATEC’20, March 2020, Cagliari, Italy                                                                                                          Stockdill et al.


order, nor are correspondence derivations unique; the same cor-         C     GENERALISED RULES
respondence can be discovered multiple times, and with different        In Section 5, we briefly discussed how the correspondence frame-
strengths. We leave it to the analyst to decide which strength is       work generalises to domains outside problem solving.
appropriate, but if they do decide to update the strength we have a         Thus the overall structure required for correspondences is (S, Pr(· |
problem: any correspondence previously derived from the updated         ·), ≡, ∼). Set S consists of triples S = (AS , R S , PrS ). Let AS be a set
correspondence will in turn need its strength updated. This need        of atoms, R S a set of relations on A2S , . . . , AkS , and PrS : AS → [0, 1]
propagates, carrying updated strengths through the correspondence       be a probability function. Further, define a function Pr(· | ·) :
set.                                                                    AS × AT → [0, 1], where the subscripts S and T refer to tuples
   To address this, we maintain a dependency graph of derived           in S. Similarly we define an equivalence relation ≡ : AS × AT .
correspondences to track which correspondences to update. We            We also require an equivalence relation ∼: R S × RT between the
also use this graph to avoid cycles—ensuring a correspondence is        relations of each triple.
not used to derive itself.                                                  With this generalised structure, we can update our discovery
                                                                        rules for correspondences.7 Reversal and transitivity remain un-
                                                                        changed, as they are independent of the structure. Identity is simple:
B     INCOMPLETENESS AND UNSOUNDNESS                                                                   a ≡b
                                                                                                                    [idy]                       (21)
Viewed as an analyst-support tool, our implementation of corre-                                     ⟨ a, b, 1.0 ⟩
spondence discovery is ‘complete’, or more accurately saturating.       So long as a and b are equivalent, then we can put them in cor-
If there exists a correspondence which can be discovered by these       respondence automatically. The equivalence relation should be
rules, then our utility will suggest it to the analyst. The tool uses   simple; a typical implementation would be equality.
an unpruned graph search over states of the correspondence set,             The two remaining discovery rules—for attributes, and for values—
with edges defined by applying the rules to the correspondence set.     collapse down to a single rule. We define the rule on relations as
    When considering the completeness and soundness of the rules,                      r ∼ r ′ r (a 1, . . . , an ) r ′ (b1, . . . , bn )
we consider the broader space of valid correspondences. No condi-                        ⟨ a 1 , b 1 , s 1 ⟩ · · · ⟨ a k , bk , s k ⟩
tions are both necessary and sufficient for a valid correspondence—                                                                            [rel]         (22)
such conditions would constitute general knowledge—so our rules                   ⟨ ak +1, bk+1, s ′ ⟩           ···      ⟨ a n , bn , s ′ ⟩
will violate one or both of completeness and soundness. Although        where we compute s ′ by
both are violated, we have erred towards avoiding generating a lot                                                    k
                                                                                                                 1 Õ
of unsound correspondences. This trade-off avoids overwhelming                                          s′ =             si .                                (23)
                                                                                                               n − 1 i=1
the analyst with many meaningless correspondences, at the cost of
potentially missing rare-but-insightful correspondences.                This rule states that if the equivalent predicate is satisfied by some
    Which correspondences get discovered (completeness) depends         number of corresponding atoms, then the remaining atoms may
on the starting set of correspondences. Assuming an empty set           also correspond. We only consider the case where k ≥ 1.
with totally disjoint representations—that is, there are no prop-          Note in s ′ the difference between the sum limit k and the fraction
erties with identical kinds and values—the search is immediately        denominator n − 1. This reflects the proportion of the predicate
terminated. Even beyond this degenerate case, correspondences           already satisfied: if the predicate is almost complete, the strength
which are ‘isolated’ are still unreachable.                             should be closer to that of the antecedent correspondences; if the
    Reaching these unreachable correspondences in a principled          predicate is mostly inferred, the strength should be appropriately
way is difficult. Assuming our descriptions of representations are      weaker. The normalising term n − 1 occurs because we fix k ≥ 1,
complete, we can generate the countably infinite stream of corre-       leaving at most n − 1 degrees of freedom. In the binary case, this is
spondences; then all such isolated correspondences are reachable.       a copy of the antecedent correspondence strength, s ′ = s.
But the correspondences generated will be mostly meaningless,              Much like in the special case of attributes and values, we do not
leaving the analyst in no better position than when they started.       need a literal equality between the atoms in the relation and the
Thus we will not attempt completeness until stricter correspon-         atoms in the correspondences. We need only that for atom ai in the
dence validity conditions are discovered.                               relation r and property formula ai′ in the left of the correspondence
    Conversely, we are unable to eliminate incorrect (unsound) cor-     that ai → ai′ , and for the atom bi in the relation r ′ and property
respondences from our suggestions. This is primarily an issue of        formula bi′ in the right of the correspondence that bi′ → bi .
meaning: the tokens and types in a representation are given sym-
bols that are meaningful to an analyst. When two symbols occur
in the same relations, they are effectively indistinguishable from
synonyms. Using types for discovery presents an obvious example:
both 2 and 76 have the same type, number, so when presented with
◦◦—of type dot arrangement, known to correspond to number—do
we create a correspondence with 2, 76, both, or neither? For such
simple examples, domain-specific knowledge can act as a heuristic,
but more generally we hit the ‘general knowledge’ problem: how          7 In the following rules, we abstract away the matching on S , T ∈ S , etc. We assume
do we know, a priori, which correspondence is more meaningful?          that a ∈ AS , b ∈ AT , r ∈ R S , and r ′ ∈ RT .