=Paper=
{{Paper
|id=Vol-1419/paper0097
|storemode=property
|title=Connectionist Modeling of Part--Whole Analogy Learning
|pdfUrl=https://ceur-ws.org/Vol-1419/paper0097.pdf
|volume=Vol-1419
|dblpUrl=https://dblp.org/rec/conf/eapcogsci/GergelF15
}}
==Connectionist Modeling of Part--Whole Analogy Learning==
Peter Gergeľ (peter.gergel@gmail.com)
Faculty of Mathematics, Physics and Informatics, Comenius University
Mlynská dolina, 84248 Bratislava, Slovak Republic
Igor Farkaš (farkas@fmph.uniba.sk)
Faculty of Mathematics, Physics and Informatics, Comenius University
Mlynská dolina, 84248 Bratislava, Slovak Republic
Abstract

Analogical reasoning, along with inductive, deductive or abductive reasoning, belongs to the fundamental human mechanisms for environment exploration, learning, and problem solving. Modeling this ability using computer simulations is important, as it might offer a mechanistic explanation of these phenomena. In this work, we focus on part–whole analogies in a separation task where the analogical objects between two scenes show a mutual resemblance. In simulations using a simple recurrent network, we deal with the problem of geometrical analogies, inspired by the Analogator model (Blank, 1997). The original model had learning limitations hindering its full potential, combined with increased time and memory complexity. We propose model modifications that remove these limitations, leading to superior learning performance both in terms of speed and accuracy.

Keywords: geometrical analogies; part–whole relationship; simple recurrent network; learning

Introduction

Analogical reasoning is considered to be a fundamental cognitive ability that distinguishes humans from other animals.1 This mechanism enables us to explain new concepts in terms of old ones, to emphasize some aspects of situations, to generalize, to characterize situations, to explain or describe new phenomena, to provide ideas on how to behave in new circumstances, to understand new forms of humour, etc.

1 However, the evidence suggests that chimpanzees are also capable of analogical thinking to some degree (Gillan et al., 1981).

To create an analogy means to see an object or situation within one context as the same as a different object or situation within another context. Hall (1989) defines this process as a mapping of a source domain, which represents an old, known situation, onto a target domain, which represents a new, unknown situation. Objects between domains that possess the same properties are mapped onto each other. The whole process consists of four steps: (1) source identification, (2) evaluation of the degree of similarity, (3) information transfer from source to target, and (4) consolidation. That is, when one starts making an analogy, the first step is to recall a similar situation from the past that resembles the current situation (source identification). Then one evaluates whether the resemblance is adequate (degree of similarity) and, if so, one tries to use the knowledge about the past situation in the new situation (information transfer). The outcome of this process is then remembered (consolidation).

There are many types of analogical reasoning with different histories. A classical type is the proportional analogy “A is to B as C is to D”, which dates back to ancient Greece. For example, “white is to black as cold is to hot”, or “tree is to forest as department is to faculty”. The next type we could consider is a metaphor, a figure of speech that describes one object in terms of a second object: e.g. “He has a heart of stone”.

In our paper, we focus on part–whole proportional analogies where the analogical objects (parts) between the scenes (wholes) are the ones that share a common attribute; this could be the same role or position, the same or very similar structure, or similar sensory properties. One type of analogy in this category is the geometrical analogy, in which two scenes comprising geometrical objects are presented and one scene contains a selected object. The idea is to select an object in the second scene which is analogical to the selected object in the first scene in some way. The concept of learning and analogy making in people is still poorly understood, despite a plethora of existing computational models (Gentner and Forbus, 2011). Therefore, cognitive modelling is important as it may provide new hypotheses and explanations.

This type of geometrical analogy was dealt with by Evans (1968) using symbolic approaches and more recently by Lovett et al. (2009), who combined the sketch understanding program CogSketch (Forbus et al., 2008) with the symbolic model SME (Falkenhainer et al., 1986). A connectionist solution to these part–whole analogies was proposed by Blank (1997), whose doctoral thesis served as the main inspiration for our work. More recent connectionist models have not dealt with this type of analogy. Blank designed the Analogator model as a simple recurrent network that learns to reason analogically by seeing lots of analogical pairs. This represents quite a different approach, as many other models were explicitly designed for analogical reasoning and did not have to learn. However, Analogator had its limits, which prevented it from using its whole potential.

Geometrical analogies

Consider two scenes consisting of 3 objects with different shapes and colors. In the first scene (source) exactly one object is selected (figure) which differs in some way from the remaining two (ground).2

2 We use the terms “figure” and “ground”, even though this task does not evoke the experience of perceiving the figure as closer to the observer than the ground. Blank (1997) also used these terms in his model that we were inspired by.

The idea is to identify this unique property and to choose an object in the second scene (target) that differs
from the other objects in the same way as the selected object in the source.

Figure 1: Example of a geometrical analogy.

An example of this problem is given in Figure 1, where on the left (source) one object is selected beforehand (see the pointing arrow). On the right (target), the correct answer is also shown (blue square). We have constrained the problem to three different colors (red, blue, green) and three different shapes (circle, square, triangle). Hence, every scene consists of exactly three objects, which are placed in three possible positions (out of four corners). In Figure 1, the red triangle is analogical to the blue square because both differ in shape from the ground. Another possible example could be a pair of pictures where the analogical objects differ in color, or where the target is a rotation of the source.

We have chosen two types of analogies to investigate (following Blank): rotational (the target is a rotated version of the source) and categorical (the analogical objects differ in color or shape from the ground).

Rotational analogies In this type of analogy the target scene is made by rotating the source scene by 90°, 180°, 270° or 0° (no rotation applied) and changing the shape and color of the objects. In Figure 2, the blue triangle at bottom left is analogical to the red square at top right, assuming a rotation of 180°. Colors and shapes are not considered in this type of analogy because they could create ambiguity.

Figure 2: Example of a rotational analogy.

Let us now evaluate the overall number of unique rotational analogies. Because all objects in the scene have the same color and shape, there are exactly 9 possible options for choosing one object (3 colors × 3 shapes). Then there are 4 possible placements of the objects in the scene3 and 3 possible selections for the figure object. Therefore there are 9×4×3 = 108 unique source scenes, and for every source scene there are 9×4 = 36 target scenes. To sum up, we have 108×36 = 3888 unique pairs of scenes between which it is possible to find a rotational analogy.4

3 The number of all subsets of size 3 (3 objects) over the set of size 4 (4 options for placing objects in a scene) amounts to (4 choose 3) = 4.

4 Blank actually generated 46656 unique pairs because he did not constrain his experiments to using only one color and one shape in a scene. We did so because we would like to use both rotational and categorical analogies as an input for the same model, and we needed to avoid ambiguity in analogy making.

Categorical analogies In this case the analogical object differs from the ground by color or by shape. In the source scene the selected object differs by one property from the ground, either in color or in shape, while in the target scene it differs in both properties. The reason is that we want the network to notice the unique property in the source scene and then find it in the target scene. The network should learn to identify the required property and to ignore the rest. An example of a categorical analogy in which the analogical objects differ in color is shown in Figure 3. An example of a categorical analogy where the objects differ by shape is shown in Figure 1.

Figure 3: Example of a categorical analogy where analogical objects differ in color.

Scene representation

In order to approach the task, we must also choose the data representation for inputs and targets. Basically, there are two possible ways to represent the data: implicitly or explicitly. Both are explained in the following sections.

Implicit representation When using implicit representation, the structure of objects and their relations to others are not explicitly defined. Instead, this representation only defines the attributes of objects; it provides information neither about relationships between attributes, nor about which attributes belong to which objects. For example, a geometrical object is not defined by a set of points, but instead by the set of colors in a bitmap. The motivation behind using implicit representation is that in the real world there are probably no explicit representations, since every percept is obtained implicitly. In order to create an analogy, one has to extract the required attributes and to map them in a way that creates the analogy. If the input were explicit, extraction of attributes would no longer be required and the problem of analogy creation would be much simplified. Mitchell and Hofstadter (1995) state that perception of a situation is an important component in the process of creating analogies and it should be included in computational models of analogy making. The fact that perception is a part of analogy making was one of the most important contributions of cognitive science to understanding analogy making in humans.

Blank was inspired by the work of Halford et al. (1994), who used tensor representation to implicitly represent predicates and arguments.5

5 It is questionable whether this type of implicit representation is cognitively plausible, though.

Blank represented the scene with geometrical
objects as follows: for every object he created a 2×2 matrix X which defined the position of the object in the scene, and a vector y of length 6 to define the object's attributes. By using the outer product (X ⊗ y) he obtained a tensor of order 3 (6 matrices). An example representing a red square in the top right corner can be seen in Figure 4.

Figure 4: Tensor representation for red square top right.

To get the tensor representation for the whole scene, Blank generated tensor representations for each object and then summed up the corresponding matrices. A detailed example is shown in Figure 5. Furthermore, it is also necessary to represent the selected object. This can simply be done by providing another 2×2 matrix which defines the position of this object. To sum up, the whole input consists of the tensor representation of the source scene, of the position of a selected object, and also of the target scene.

Figure 5: Tensor representation of the whole scene.

Explicit representation Symbolic models of analogy making typically use this approach, which led to their criticism (Chalmers et al., 1992). We already mentioned that using explicit representation in analogy making simplifies the process to a large degree. Another problem with explicit representation is that there are many unique and equivalent means of representing the scenes, but not all of them would allow an analogy to be correctly formed. The reason why we decided to also use explicit representations is to compare them with implicit representations.

Our explicit representation abstracts from the problem of feature binding, which implies that all features are already linked to the corresponding objects. It uses a boolean vector of length 6 for the description of one object. Every element of this vector defines which attribute is present. The first three bits define the color and the last three bits define the shape. Figure 6 provides an example.

Figure 6: Explicit representation of green square.

To represent the whole scene, a vector representation is created for each object and all vectors are simply concatenated to form one vector of length 6×4 = 24. An example of this representation is shown in Figure 7. The position of a selected object is defined in the same way as in the implicit representation (Figure 4).

Figure 7: Explicit representation of the whole scene.

Analogator model

The connectionist model Analogator (Blank, 1997) is a domain-independent model that learns part–whole analogies from examples. The basic difference between Analogator and other models is that Analogator was not explicitly designed to make analogies. It is based on a simple recurrent network (Elman, 1990). The input layer is divided into two groups of neurons: the first group contains the representation of the whole scene, while the second group (context neurons) stores the selected object, or a copy of the hidden layer activation. The output layer also comprises two groups of units, where the first group contains the position of the selected object and the second group outputs the remaining two objects at the correct positions (see Figure 9).

Computation related to processing a source–target pair of inputs is executed in two steps (see Figure 8): In step 1, the network learns to properly separate, in the source scene, the figure (selected object) from the ground (the remaining two objects). The source scene with a selected object is presented to the network and the output layer activations are calculated (step 1a). Afterwards, the correct targets (figure–ground pair) are placed at the output and the weights are trained (step 1b). In step 2, the model learns to apply the previous transformation (from step 1) to the target scene. Here the target scene is placed at the input, the hidden layer activations from the previous step are copied into the context neurons, and the outputs are calculated (step 2a). The network weights are then updated to produce the analogical figure–ground pair at the output layers (step 2b).

The limitation of this architecture is that the number of neurons in the hidden layer and in the context part must be identical. Because the context part also stores the position of a selected object (a dual-purpose layer), four neurons (there are 4 places for the selected object) may be too restrictive for the hidden layer. On the other hand, increasing the number of neurons would introduce redundant neurons in representing the position of a selected object.
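The two scene encodings described in the Scene representation section (the implicit tensor code and the explicit 24-bit vector) can be sketched as follows. This is a minimal illustration, not Blank's code: the color/shape ordering and the corner indexing are our assumptions, and `grid=2` mirrors the 2×2 example of Figure 4 (the experiments reported below use a larger 7×7 position grid, hence 49 units).

```python
import numpy as np

COLORS = ["red", "green", "blue"]          # ordering is an assumption
SHAPES = ["circle", "square", "triangle"]  # ordering is an assumption

def attribute_vector(color, shape):
    """6-bit attribute vector: first 3 bits encode color, last 3 bits shape."""
    y = np.zeros(6)
    y[COLORS.index(color)] = 1.0
    y[3 + SHAPES.index(shape)] = 1.0
    return y

def explicit_scene(objects):
    """Explicit code: one 6-bit vector per corner, concatenated (4 x 6 = 24).
    `objects` maps a corner index 0..3 to (color, shape); empty corners stay zero."""
    scene = np.zeros((4, 6))
    for pos, (color, shape) in objects.items():
        scene[pos] = attribute_vector(color, shape)
    return scene.ravel()

def implicit_scene(objects, grid=2):
    """Implicit (tensor) code: outer product of a position matrix X with the
    attribute vector y, summed over all objects in the scene."""
    corners = [(0, 0), (0, grid - 1), (grid - 1, 0), (grid - 1, grid - 1)]
    tensor = np.zeros((grid, grid, 6))
    for pos, (color, shape) in objects.items():
        X = np.zeros((grid, grid))
        X[corners[pos]] = 1.0
        tensor += X[:, :, None] * attribute_vector(color, shape)  # X (outer) y
    return tensor.ravel()

# A hypothetical scene with three objects in three of the four corners:
scene = {0: ("red", "triangle"), 1: ("blue", "square"), 3: ("green", "circle")}
assert explicit_scene(scene).shape == (24,)
assert implicit_scene(scene).shape == (24,)
```

With `grid=2` both codes happen to have 24 elements; they differ in that the implicit code spreads each object's attributes over the position grid instead of binding them to a dedicated slot.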
Figure 8: Analogy making between the source scene and the target scene (Blank, 1997). For explanation, see the text.
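The two-step computation illustrated in Figure 8 can be sketched schematically as follows. This is a minimal sketch, not Blank's implementation: the sigmoid activation, the toy layer sizes, and the random scenes are assumptions, and the weight updates of steps 1b and 2b (plain backpropagation of the figure–ground targets) are only indicated in comments.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Toy sizes. In the basic model the context layer and the hidden layer must
# have the same size (here 4, the number of positions) -- exactly the
# restriction discussed in the text.
N_SCENE, N_CTX, N_HID, N_OUT = 24, 4, 4, 28

W_ih = rng.uniform(-0.05, 0.05, (N_HID, N_SCENE + N_CTX + 1))  # +1 for bias
W_ho = rng.uniform(-0.05, 0.05, (N_OUT, N_HID + 1))

def forward(scene, context):
    """One pass of the simple recurrent network: scene + context -> outputs."""
    x = np.concatenate([scene, context, [1.0]])
    hidden = sigmoid(W_ih @ x)
    output = sigmoid(W_ho @ np.concatenate([hidden, [1.0]]))
    return hidden, output

# Hypothetical binary scene vectors standing in for real encodings.
source = rng.integers(0, 2, N_SCENE).astype(float)
target = rng.integers(0, 2, N_SCENE).astype(float)
selected = np.array([1.0, 0.0, 0.0, 0.0])  # position of the figure in the source

# Step 1: separate figure from ground in the source scene (step 1a); step 1b
# would backpropagate the correct figure-ground targets through both matrices.
hidden1, out1 = forward(source, selected)

# Step 2: copy the step-1 hidden activations into the context neurons and
# apply the same separation to the target scene (step 2a); step 2b trains
# the weights toward the analogical figure-ground pair.
hidden2, out2 = forward(target, hidden1)
```

The copy of `hidden1` into the context input is what lets the network carry the figure–ground transformation discovered on the source over to the target scene.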
Modifications of the basic model

We have proposed and tested several modifications of the basic model which eliminate its restrictions. First, we split the input layer into three groups of neurons instead of two. The first group remained unchanged (the whole scene), while the second group was split to separately represent the position of the selected object (in step 1) and the context. In step 1 the context neurons are set to zeros (they are not required at that time), and in step 2 the hidden layer is copied into the context neurons while the neurons storing the position of a selected object (the second group) are set to zero.

Second, we applied a linear transformation to the hidden layer activations via Principal Component Analysis (PCA) to project the patterns into a lower dimension, and stored the result in the neurons representing the position of a selected object. In this model, the PCA layer served as the layer of context neurons. To implement online PCA learning, we used the Generalized Hebbian Algorithm (GHA; Sanger, 1989). A model with this modification is trained simultaneously with the BP and GHA algorithms.

Third, we introduced a second hidden layer to the model, assuming it would increase its ability to learn analogies, speed up convergence, and improve generalization. Combining these three modifications yielded a variety of models, as shown in Table 1.6 The architectures of these novel models can also be retrieved from Figure 9.

6 A combination of PCA and the split input layer would not make much sense, since both attempt to solve the same problem.

Training the models All models were trained on both types of analogy (rotational and categorical, the latter having two subtypes). All weights were initialized to random values from the uniform distribution (−0.05, 0.05). In each step, the standard error backpropagation (BP) algorithm was applied. Each epoch consisted of several thousands of input–target patterns (i.e. pairs of a source and a target scene). For interpretation of the network outcome, the output values were rounded to 1/0 (with the threshold 0.5). These values were then compared with the desired outputs to calculate the error. Learning was stopped after a fixed number of epochs.

Table 1: Overview of the tested models. Each model could be combined with explicit or implicit representations (B = basic, S = split, P = PCA, 2 = extra hidden layer).

Model   PCA   split inp.   hid2
AB1      –        –          –
AP1      X        –          –
AS1      –        X          –
AB2      –        –          X
AP2      X        –          X
AS2      –        X          X

Experiments

Because training a model on one type of analogy (either rotational or categorical) is an easy task (as was demonstrated by Blank), we decided to simultaneously train our models on both types of analogy. In order to analyse the impact of the
modifications, we also trained the basic models with both types of representation to be used as a reference. We trained each model five times to get an estimate of the average training and testing errors. Every simulation used different patterns for the training and testing sets but lasted the same number of epochs. The set of patterns was generated as follows: first, we generated all categorical analogies, from which we randomly selected a subset of 20000 patterns. Then we extended this set by all (3888) rotational analogies, which yields 23888 training patterns in total. We used a 90:10 ratio for the training and testing sets (five different splits). Parameters for BP were taken from Blank (1997), assuming they were appropriately selected: learning rate 0.1 and momentum 0.7. The overview of results is shown in Table 2. In case of PCA, the learning speed for GHA had to be set to a much lower value, 0.0001 (higher values hindered the performance, and values around 0.1 prevented the model from learning completely).

Figure 9: Proposed model modifications. The original model (Analogator) consists of solid boxes, novel elements are illustrated by dashed boxes.

Table 2: Results of the six models combined with implicit and explicit representations.

              implicit         explicit
Model      error    #ep     error    #ep
AB1       12.60%    400      2.2%    200
AP1        0.60%     75      2.0%    100
AS1        0.40%     75      1.0%     75
AB2        0.40%     50      0.1%     35
AP2        0.14%     35      0.0%    100
AS2        0.19%     45     0.04%     45

We started with the AB1-I model, which had at the input 49×6 = 294 neurons representing the scene, 49 neurons representing the position of the selected object in the source scene, and one bias input, together 344 input neurons. The hidden layer consisted of 49 + 1 = 50 neurons and the output layer contained 49 + 49 = 98 neurons. The AB1-I model did not manage to learn both types of analogy even when trained for a large number of epochs (∼1000). The average testing error was 12.6%, which roughly corresponds to 310 incorrectly made analogies. Therefore we considered the explicit representation (AB1-E model), in order to test its possible benefit in analogy learning. We used 24 input neurons representing the scene and 49 neurons representing the position of the selected object in the source scene, amounting to 24 + 49 + 1 = 74 input neurons (hence, a much smaller input size). The hidden and output layers were the same as in the previous experiment, i.e. 50 hidden and 98 output neurons. The AB1-E model revealed a significant improvement, but it still did not manage to simultaneously learn both types of analogy well enough; the testing error remained at about 2.2%. This model also required a smaller number of epochs to converge (∼200) and it also had a smaller time complexity (thanks to the smaller input).

Inclusion of the PCA module (AP1 model) led to a significantly improved performance in case of implicit representation, both in terms of accuracy (0.6%) and convergence time (75 epochs). Explicit representation did not help in this model, unlike in the basic model.

In case of the AS1 model, we split the input, so the hidden layer was no longer tied to the second group of inputs. Hence we increased the size of the hidden layer to 100 neurons. Simulations have shown that this model was able to successfully learn both types of analogy with an even smaller number of epochs. After 75 epochs the model converged to a testing error of 0.4%. Since explicit representation had a positive impact on accuracy in the basic model, we tested this effect also at this point, using the AS1-E model. It had identical parameters to the previous model, except for input and output sizes. The model showed slightly worse results: after 75 epochs, the testing error was 1.0% (vs. 0.4% in case of AS1-I). This experiment showed that the AS1 model does not benefit from either type of representation.

The last three models were equipped with two hidden layers, both with 100 neurons. In the AB2 model we used identical parameter settings as in AB1. Interestingly, both implicit and explicit versions of this model worked very well (errors 0.4% and 0.1%), whereas the latter had slightly faster convergence (35 vs 50 epochs).

As the best model, the combination of two hidden layers with PCA (AP2 models) led to perfect generalization (0%) in case of explicit representation, and quite a small error (0.14%) with implicit representation. The error-free AP2-E model took somewhat longer to converge (100 epochs), though.

Finally, the combination of input splitting with two hidden layers (AS2 models) also led to converging models with both types of representation: not the best ones (errors 0.19% and 0.04%), but very fast ones (45 epochs).

As a next step, we wanted to gain insight into a well trained model. We recorded hidden-layer activations after training, taken from step 1 (which serves as context information for step
2). We anticipated that the hidden layer had learned to organize its space by splitting the categories. The best separability could be seen in an AS1 model with 50 hidden neurons, trained on categorical analogies only (differing in color or shape), so we present it here (in other models, the hidden organization was not as well discernible). These patterns (with recorded category labels) were then submitted during training to a self-organizing map (SOM; Kohonen, 1982) with 25×29 neurons. For each 50-dimensional input pattern, the position of the best matching unit was recorded after training. Figure 10 displays both classes superimposed in the same map. It is evident that the classes are well separated, suggesting that the two subsets of hidden activation vectors are well separable in the hidden space, hence enabling error-free generalization.

Figure 10: The SOM winners for hidden-layer activation vectors of the trained AS1-I model, corresponding to two types of categorical analogy. Symbol 'x' denotes type 1 (differs in color), and symbol 'o' denotes type 2 (differs in shape).

Conclusion

In this article we focused on the problem of geometrical analogies, which are a subset of part–whole analogies, and we attempted to solve this problem using a connectionist system. The original Analogator model (Blank, 1997) was not able to learn both types of analogy (rotational and categorical), so we designed improvements of this model. We also tried using explicit representations, and showed that they had a positive impact in some of the models.

In total, we have evaluated the behaviour of the original model and five modifications. The best model combines PCA and two hidden layers, using explicit representation. This AP2-E model learned the task with a testing error of 0.0% in every simulation. Its minor drawback is the higher number of training epochs (in comparison to other models) and also a higher time complexity for each iteration. On the other hand, it only needed 8500 pairs for training (around 35%), as opposed to twice as many pairs in case of the other models.

We did not thoroughly search for optimal parameters, and used the ones from Blank (1997). Future research on this topic could also investigate our models on different part–whole analogies in order to test their potential.

Acknowledgments

This work was supported by projects no. 1/0898/14 (VEGA) and 076UK-4/2013 (KEGA).

References

Blank, D. (1997). Learning to See Analogies: A Connectionist Exploration. PhD thesis, Indiana University, Bloomington.

Chalmers, D. J., French, R. M., and Hofstadter, D. R. (1992). High-level perception, representation, and analogy: A critique of artificial intelligence methodology. Journal of Experimental and Theoretical Artificial Intelligence, 4:185–211.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2):179–211.

Evans, T. G. (1968). A program for the solution of a class of geometric-analogy intelligence-test questions. In Minsky, M. L., editor, Semantic Information Processing, pages 271–353. MIT Press, Cambridge, Massachusetts.

Falkenhainer, B., Forbus, K., and Gentner, D. (1986). The structure-mapping engine. In Proc. of the 5th National Conf. on Artificial Intelligence, pages 272–277, Philadelphia, PA.

Forbus, K., Usher, J., Lovett, A., Lockwood, K., and Wetzel, J. (2008). CogSketch: Open-domain sketch understanding for cognitive science research and for education. In Eurographics Workshop on Sketch-Based Interfaces and Modeling. The Eurographics Association.

Gentner, D. and Forbus, K. D. (2011). Computational models of analogy. Wiley Interdisciplinary Reviews: Cognitive Science, 2(3):266–276.

Gillan, D., Premack, D., and Woodruff, G. (1981). Reasoning in the chimpanzee: I. Analogical reasoning. Journal of Experimental Psychology: Animal Behavior Processes, 7(1):1–17.

Halford, G. S., Wilson, W., Guo, J., Gayler, R. W., Wiles, J., and Stewart, J. E. M. (1994). Connectionist implications for processing capacity limitations in analogies. In Advances in Connectionist and Neural Computation Theory, Volume 2: Analogical Connections, pages 363–445. Ablex Publishing, Norwood, New Jersey.

Hall, R. P. (1989). Computational approaches to analogical reasoning: A comparative analysis. Artificial Intelligence, 39:39–120.

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59–69.

Lovett, A., Tomai, E., Forbus, K. D., and Usher, J. M. (2009). Solving geometric analogy problems through two-stage analogical mapping. Cognitive Science, 33:1192–1231.

Mitchell, M. and Hofstadter, D. (1995). Perspectives on Copycat: Comparisons with recent work. In Fluid Concepts and Creative Analogies, pages 275–299. Basic Books, Inc., New York, NY.

Sanger, T. D. (1989). Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2:459–473.