=Paper=
{{Paper
|id=None
|storemode=property
|title=Compact Crossbar Variable Binding for Neuro-Symbolic Computation
|pdfUrl=https://ceur-ws.org/Vol-764/paper04.pdf
|volume=Vol-764
|dblpUrl=https://dblp.org/rec/conf/nesy/PinkasLC11
}}
==Compact Crossbar Variable Binding for Neuro-Symbolic Computation==
Compact Crossbars of Multi-Purpose Binders for Neuro-Symbolic Computation

Work in Progress Report

Gadi Pinkas, Center for Academic Studies, Israel, gadip@mla.ac.il
Priscila M. V. Lima, Universidade Federal Rural do Rio de Janeiro, Seropédica, Brazil, priscilamvl@ufrrj.br
Shimon Cohen, Center for Academic Studies, Israel, shamon51@gmail.com
Abstract

We present a compact – yet expressive – multi-purpose, distributed binding mechanism, which is useful for encoding complex symbolic knowledge and computation using Artificial Neural Networks (ANNs) or Satisfiability (SAT) solvers. The technique is demonstrated by encoding unrestricted First Order Logic (FOL) unification problems as Weighted Max SAT problems and then translating the latter into ANNs (or learning them). It is capable of capturing the full expressive power of FOL, and of economically encoding a large Knowledge Base (KB) either as long-term synapses or as clamped units in Working Memory. Given a goal, the mechanism is capable of retrieving from the synaptic knowledge just what is needed, while creating novel, compound structures in the Working Memory. Two levels of size reduction are shown. First, we build a Working Memory using a pool of multi-purpose binders, based on the assumption that the number of bindings that are actually needed is far less than the number of all theoretically possible bindings. The second level of compactness is due to the fact that, in many symbolic representations, when two objects are bound there is a many-to-one relationship between them. This happens because, frequently, either only one value is pointed to by a variable, or only one variable points to a value. A crossbar binding network of n × k units with such a restriction can be transformed into an equivalent neural structure of size O(n log(k)). We show that, for performing unrestricted FOL unifications, the Working Memory created is only logarithmically dependent on the KB size; i.e., O(n log(k)). The variable binding technique described is inherently fault tolerant: there are no fatal failures when some random neurons become faulty, and the ability to cope with complex structures decays gracefully. Processing is distributed, and there is no need for central control even to allocate binders. The mechanism is general and can further be used for other applications, such as language processing, FOL inference and planning.

1 Introduction

1.1 The Binding Problem
Human cognition is capable of producing combinatorial structures. The general binding problem concerns how items that are encoded in distinct circuits of a massively parallel computing device (such as the brain or an ANN) can be combined in complex ways for perception, reasoning or action [Feldman 2010]. Consider, for example, a planning problem where the task is to pick up an object and move it from its current position to another place. In order to meet a goal, a "brain"-like device must be able to represent the object, its properties, its position and the ways to manipulate it, in such a way that the goal is achieved. The object and its properties must be bound together, and this rather complex structure should also be used in conjunction with other entities and rules, such as the action consequences (e.g., moving X from Y to Z clears position Y while occupying position Z). As another example, consider the sentence "Sally ate". In language processing, the verb "EAT" is a predicate with at least two roles: EAT("Sally", X). The noun "Sally" should be bound to the first role, while an existentially quantified variable (representing "something") should be bound to the second role. Once we get the information that "Sally ate salad", and knowing the rule EAT(Y,X) ⇒ DIGESTED(X), we should reason that "the salad is digested". In order to do that, we must bind the variable X to the noun "salad", while X must be bound to both EAT(·,X) and DIGESTED(X).

1.2 Connectionism and Variable Binding
Over the years, connectionist systems have been criticized for "propositional fixation" [McCarthy 1988]. In [Fodor, Pylyshyn 1988] connectionism was criticized for lacking the ability to construct combinatorial representations and to perform processes that are sensitive to complex structure. Exactly how compositionality can occur is a fundamental question in cognitive science, and the binding aspect of it has been identified as a key to any neural theory of language [Jackendoff 2002].
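The role binding and rule matching in the "Sally ate" example of Section 1.1 can be sketched as a substitution map produced by syntactic unification. This toy is our own illustration (the function names and the '?'-prefix variable convention are ours, not the paper's neural mechanism):

```python
# Toy syntactic unification over nested tuples; variables are strings
# starting with '?'. Illustrates binding X to "salad" in Section 1.1.

def resolve(t, subst):
    """Follow variable bindings to their current value."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst=None):
    """Return a substitution making t1 and t2 equal, or None on clash."""
    if subst is None:
        subst = {}
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.startswith('?'):
        return {**subst, t1: t2}
    if isinstance(t2, str) and t2.startswith('?'):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # constant/structure clash

def substitute(t, subst):
    """Apply a substitution to a term."""
    if isinstance(t, tuple):
        return tuple(substitute(a, subst) for a in t)
    return resolve(t, subst)

# "Sally ate salad" matched against the rule EAT(Y,X) => DIGESTED(X):
fact = ('EAT', 'Sally', 'salad')
rule_head = ('EAT', '?Y', '?X')
theta = unify(fact, rule_head)
print(substitute(('DIGESTED', '?X'), theta))  # ('DIGESTED', 'salad')
```

The substitution θ = {Y ↦ Sally, X ↦ salad} is exactly the set of bindings the paper's crossbar mechanism must represent distributively.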
Several attempts have been made to approach the variable binding problem in a connectionist framework [Shastri, Ajjanagadde 1993], [Browne, Sun 2000], [Zimmer et al. 2006], [Van der Velde, de Kamps 2006], [Barrett et al. 2008], [Velik 2010]; yet virtually all these suggestions have limitations related to limited expressiveness, size and memory requirements, central-control demands, lossy information, etc.

For example, compositionality can be provided using Holographic Reduced Representations [Plate 1995]; however, the convolution operation used is lossy, and errors are introduced as structures become more complex or as more operations are performed. The Blackboard Architecture [Van der Velde, de Kamps 2006] can form complex structures but does not manipulate those structures to perform cognition. Shastri's temporal binding has only limited FOL expressiveness and no mechanism for allocating temporal binders. Finally, all the above systems need numbers of neurons that are at best linear in the KB size, while some use many more neurons than that.[1] For FOL compositionality in ANNs see [Ballard 1986], [Pinkas 1992], [Shastri 1999], [Lima 2000], [Garcez, Lamb 2006]. For partial-FOL encodings in Satisfiability, see [Domingos 2008], [Clark et al. 2001].

[1] The Blackboard Architecture uses billions of neurons to represent thousands of atomic concepts; HRR production systems [Stewart, Eliasmith 2008] need about one million neurons.

The ability to represent combinatorial structures and to reason with them still presents challenges to theories of neuro-cognition [Marcus 2001], while the variable binding problem is fundamental to such ability [Feldman 2010].

1.3 Unification
In conventional computing, unification is a key operation for realizing inference, reasoning, planning and language processing. It is the main vehicle for conventional symbolic systems to match rules with facts, or rules with other rules. In unification, two or more distinct hierarchical entities (terms) are merged to produce a single, unified, tree-like structure. This unified structure adheres to the constraints of both original entities. Formally, unification is an operation which produces, from two or more logic terms, a set of substitutions that either identifies the terms or makes them equal modulo some equational theory. For connectionist approaches to unification see [Hölldobler 1990], [Weber 1992], [Komendantskaya 2010].
For ease of reading, we have chosen to demonstrate our compact variable binding mechanism on the more fundamental unification function, rather than on full FOL inference.

1.4 Artificial Neural Networks and SAT
ANNs may be seen as constraint satisfaction networks, where neuron-units stand for Boolean variables and the synapse weights represent constraints imposed on those variables. Any ANN may be seen as such a constraint network; yet, for ANNs with symmetric weights (e.g. Hopfield networks, Boltzmann Machines, MFT), a simple conversion has been shown for translating any Weighted Max SAT problem into a symmetric ANN and vice versa [Pinkas 1991]. Any such SAT problem can be compiled into an ANN which performs stochastic gradient descent on an energy function that essentially counts the number of unsatisfied logical constraints. The size of the generated network is linear in the size of the original formula, though additional hidden units may be required. In addition to compilation, the logical constraints of a network can be PAC-learnt using a Hebbian-like rule [Pinkas 1995]; thus, for small-size constraints, a network can efficiently learn its weights and structure from a training set composed of the satisfying models. The performance efficiency of this neural mechanism can be attributed to the similarity of symmetric ANNs to stochastic local search algorithms such as WALKSAT [Kautz et al. 2004]. Due to the tight relationship between ANNs and Weighted Max SAT, our methodology is to specify an ANN designed for a certain symbolic computation (e.g. unification) using a set of Boolean variables (the visible units) and a set of constraints, i.e., Boolean formulae designed to restrict the values of the visible units. The constraints specified are used to force the visible units to converge to a valid solution that satisfies as many (weighted) formulae as possible. We have written a compiler that translates such specifications into either weighted CNF (for Weighted Max SAT solvers) or an ANN with symmetric weights.
We believe that our fault-tolerant mechanism and methods for dynamically forming recursive structures will scale and be useful both for the engineering of massively parallel devices and for modeling high-level cognitive processes.
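The Max SAT-to-energy correspondence of Section 1.4 can be made concrete with a toy: an energy function that totals the weights of unsatisfied clauses, minimized by a WALKSAT-style stochastic local search. The clause set, weights and procedure below are our own illustration, not the paper's compiler:

```python
import random

# Clauses are lists of literals with a weight; literal (i, True) means
# variable i, (i, False) means its negation. The "energy" of an
# assignment is the total weight of unsatisfied clauses, so energy 0
# means a fully satisfying model.
clauses = [
    ([(0, True), (1, True)], 2.0),    # x0 v x1, weight 2
    ([(0, False), (2, True)], 1.0),   # ~x0 v x2
    ([(1, False), (2, False)], 1.5),  # ~x1 v ~x2
]
n_vars = 3

def energy(assign):
    """Total weight of clauses with no satisfied literal."""
    return sum(w for lits, w in clauses
               if not any(assign[i] == sign for i, sign in lits))

def walksat_step(assign, noise=0.2):
    """Flip one variable: mostly the greediest flip, sometimes random."""
    if random.random() < noise:
        i = random.randrange(n_vars)
    else:
        i = min(range(n_vars),
                key=lambda j: energy(assign[:j] + [not assign[j]] + assign[j+1:]))
    assign[i] = not assign[i]

random.seed(0)
state = [random.random() < 0.5 for _ in range(n_vars)]
for _ in range(100):
    if energy(state) == 0:
        break
    walksat_step(state)
print(state, energy(state))
```

A symmetric ANN doing stochastic descent on this energy behaves analogously: each unsatisfied constraint contributes its weight, and unit flips that reduce the total are preferred, with noise to escape local minima.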
2 Improving CrossBar Binding
The simplest, most naïve binding technique is crossbar binding. The term was mentioned in [Barrett et al. 2008], yet it was intuitively used by many connectionist systems in the past [Ballard 1986], [Anandan 1989] and in many SAT reductions, e.g., [Kautz et al. 2006]. Formally, we define crossbar binding as a Boolean matrix representation of a relation between two sets of items, using the characteristic matrix of the relation; i.e., if A contains m objects and B contains n objects, then the characteristic matrix R has m rows and n columns, containing m × n Boolean variables (neurons). We say that item i is bound to item j iff R(i,j)=1. In this naïve binding mechanism, a neuron must be allocated for each possible binding, and all theoretical combinations of two items must be pre-enumerated as rows and columns of the matrix. A crossbar matrix that needs to represent a complex tree or graph must bind together not just simple constituents, but all the compound entities representing partial trees (or sub-graphs). It is possible to represent a FOL KB this way, at the cost of using an enormous number of neurons and with an extremely localist approach. Even more frustrating is the fact that this technique is not suitable for dynamically creating novel, nested structures upon demand. The number of theoretical bindings, for all possible tree structures, grows exponentially with the number of constituent items and must be computed in advance.
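The naïve crossbar just defined can be written down directly; a minimal sketch (the item sets and names are ours) with one Boolean "neuron" per possible binding:

```python
# Characteristic matrix R of a relation between item sets A and B:
# R[i][j] == 1 iff item i of A is bound to item j of B.

A = ['EAT', 'Sally', 'salad']             # m = 3 objects
B = ['node1', 'node2', 'node3', 'node4']  # n = 4 objects
m, n = len(A), len(B)

# m x n Boolean matrix; every entry is a pre-allocated "neuron",
# so the network needs m * n units even if few bindings are used.
R = [[0] * n for _ in range(m)]

def bind(i, j):
    R[i][j] = 1

def bound(i, j):
    return R[i][j] == 1

bind(A.index('Sally'), B.index('node2'))
print(bound(A.index('Sally'), B.index('node2')), m * n)  # True 12
```

The cost the text complains about is visible here: the unit count is the product m × n over all pre-enumerated items, regardless of how many bindings are actually active.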
We improve this simplistic binding mechanism in several steps:

2.1 Using Binders as "pointers" to form Graphs
First, we introduce a special kind of entities called General Purpose Binders (GPBs). GPBs are similar to pointers, except that a single GPB can point to several objects, as the crossbar paradigm permits the implementation of arbitrary relations between binders and objects. In the special case where binders point to binders, arbitrary directed graphs can be built. In this scenario, we can interpret each GPB as a node in the graph and the crossbar as specifying the arcs of the graph (an adjacency matrix). In such a graph interpretation, each node may be labeled using a labeling crossbar that ties binders together with symbols such as predicates, functions or constants in FOL. Arcs can also be labeled, as the binder-to-binder crossbar may have a third dimension which relates one or more labels to each arc. This enables the formation of arbitrarily complex graph structures that can be used to represent language constituents and, in particular, FOL terms, predicates, literals and clauses. Unlike in the naïve crossbar approach, unrestricted graphs can be built directly out of simple constituents, with GPBs as the mechanism for gluing them together.
Because the binders are general-purpose entities, we can construct a working memory (WM) out of a pool of such binders, representing only those graphs that are actually needed for computing the goal. It turns out that this approach is consistent with cognitive theories, where a large KB is stored in synapses (long-term memory) and a smaller working memory is used for retrieving only a few KB items at a time. Only those items that are necessary to the process get to be retrieved from the synaptic KB. For example, if our purpose is to find a plan for a goal expressed in FOL, we need to design the WM with enough binders to represent a valid plan. We retrieve the facts and rules of the world from the KB only if they are required by the plan we desire to make.
To implement a pool of binders for FOL unification, the WM should contain three crossbar matrices: one for labeling nodes by symbols (predicates, functions, constants); a second for nesting the nodes in graphs and labeling the arcs according to the slots of the predicates and functions; and a third for retrieving items from the long-term memory where the KB is stored (e.g., terms, literals or clauses). This third matrix ties a binder to a KB item and triggers the constraints of that item to be activated, so that the binder node is forced to assume the structure of the KB item retrieved. The mechanism starts working as goal-activated constraints cause some binders to be tied to KB items and activate some KB constraints. Those constraints, in turn, activate other constraints, until the WM converges to a valid solution. When we implement unification problems, the size of the WM is O(n × k), where n is the maximal number of nodes in a solution, k is the size of the KB and n<
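The three-crossbar Working Memory layout described above can be sketched with dictionaries standing in for the Boolean matrices. Everything here (the names, the 'EAT/2' KB key, the slot labels) is our own illustration of the layout, not the paper's neural encoding:

```python
# Three "crossbars" of the WM, as plain mappings for illustration:
#   label   - node-labeling crossbar: binder -> symbol
#   arcs    - binder-to-binder crossbar, with a slot label per arc
#   kb_tie  - retrieval crossbar: binder -> KB item whose constraints fire

symbols = ['EAT', 'Sally', 'X']   # predicate, constant, variable
binders = [0, 1, 2]               # a small pool of GPB nodes

label = {}
arcs = {}
kb_tie = {}

# Represent the term EAT(Sally, X) as a labeled directed graph:
label[0] = 'EAT'
label[1] = 'Sally'
label[2] = 'X'
arcs[(0, 1)] = 'slot1'   # first argument role
arcs[(0, 2)] = 'slot2'   # second argument role
kb_tie[0] = 'EAT/2'      # binder 0 retrieves the KB entry for EAT/2

# Reading the structure back from the crossbars:
args = sorted((slot, label[dst])
              for (src, dst), slot in arcs.items() if src == 0)
print(label[0], args)  # EAT [('slot1', 'Sally'), ('slot2', 'X')]
```

In the paper's setting each of these mappings is a Boolean neuron matrix, and the unused binders in the pool remain free to be allocated to whatever novel structure the goal-activated constraints demand.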