=Paper=
{{Paper
|id=Vol-1144/paper6
|storemode=property
|title=Creating Visual Reactive Robot Behaviors Using Growing Neural Gas
|pdfUrl=https://ceur-ws.org/Vol-1144/paper6.pdf
|volume=Vol-1144
|dblpUrl=https://dblp.org/rec/conf/maics/Ferrer14
}}
==Creating Visual Reactive Robot Behaviors Using Growing Neural Gas==
Gabriel J. Ferrer
Department of Mathematics and Computer Science
Hendrix College
1600 Washington Ave.
Conway, Arkansas 72032
Abstract
Creating reactive robot behaviors that rely solely on visual input is tricky due to the well-known problems involved with computer vision. This paper presents a potential solution to this problem. The robot builds representations of its target environment using Growing Neural Gas. The robot programmer then specifies its behavior for each learned node. This approach is shown to have the potential to be effective for simplifying robot behavior programming in an indoor environment with low-cost hardware.
Programming reactive behaviors (Brooks 1986) on a robot equipped with basic sensors such as touch sensors and sonars is reasonably straightforward and well-understood. Despite considerable progress in computer vision techniques, use of camera images to direct reactive behaviors remains tricky. Considerable effort is often invested in computer vision algorithms that are specialized for a particular environment (Horswill 1994).

Figure 1: Portrait of the Robot
Unsupervised learning algorithms have proven to be popular tools for computer vision, especially neural network models such as the Self-Organizing Map (Kohonen 2001) and Growing Neural Gas (Fritzke 1995). These algorithms derive abstractions of subsets of image sets that can be used for later classification of previously unseen images.

This paper describes an application of unsupervised learning, specifically Growing Neural Gas (GNG), to the problem of programming reactive behaviors when using computer vision as the principal sensor for a mobile robot. Behavior programming is a two-stage process. First, the robot builds a GNG network as it explores its environment. Subsequently, the programmer specifies the desired action for each learned GNG cluster. Once this specification is complete, the robot selects its actions based on the GNG cluster that is the closest match to the current input image.

Learning the Environment

Unsupervised Learning

Unsupervised learning algorithms, such as k-means (Forgey 1965) (MacQueen 1967), the self-organizing map (SOM) (Kohonen 2001), and Growing Neural Gas (GNG) (Fritzke 1995), operate by partitioning their training inputs into clusters based on a distance metric. Each cluster is defined as a representative example of the input space. Each representative example is arithmetically derived (in an algorithm-dependent manner) from the training inputs.

Both k-means and self-organizing maps require a fixed number of clusters to be specified in advance. Growing neural gas adaptively determines the number of clusters based on its inputs. In the interest of allowing the inputs to determine the number of clusters, we selected GNG for this task.

Growing Neural Gas

Growing Neural Gas is an artificial neural network that is trained using an unsupervised learning algorithm. The network is an undirected weighted graph. In this implementation, both the network inputs and the network nodes are 2D grayscale images. When an input is presented to the network, the Euclidean distance between the input and each node is calculated. The node with the shortest distance relative to the input becomes the active node.

Each edge connects two nodes that are active in response to similar inputs. The edge weight reflects the frequency with which the pair has been responsive in tandem; it is reset to zero whenever this occurs. Large weights denote weak connections that are eventually purged. Nodes that lose all their edges are purged as well.
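As a concrete illustration, the representation just described (reference images as nodes, winner selection by Euclidean distance) might be sketched as follows. This is our own minimal sketch; the class and function names are illustrative, not taken from the paper's implementation.

```python
import math

class Node:
    """A GNG node: a reference image plus bookkeeping values."""
    def __init__(self, image):
        self.image = list(image)   # flat list of grayscale pixel values
        self.error = 0.0
        self.utility = 0.0

def distance(image_a, image_b):
    """Euclidean distance between two equal-sized grayscale images."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(image_a, image_b)))

def two_closest(nodes, input_image):
    """Return the closest (active) and second-closest nodes to the input."""
    ranked = sorted(nodes, key=lambda n: distance(n.image, input_image))
    return ranked[0], ranked[1]
```

The closest node becomes the active node; the runner-up is also needed, since the training step below maintains an edge between the two.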
Training

In each iteration of training, a single input is presented to the network. The network is then adjusted as follows:
• Identify the nodes that are the closest and second-closest matches to the input.
• Adjust edge weights.
• Update error and utility values.
• Create a new node.
• Purge edges and nodes.

In the training process, two nodes are identified: the node with the shortest distance to the input, and the node with the second-shortest distance. If an edge is present between them, its weight is reset to zero; otherwise, a new edge is created between them with a weight of zero. All other edges adjoining the winning node are increased by one.

The values of each image pixel p_xy for the winning node and each of its neighbors are updated as follows relative to each input pixel q_xy:

p_xy = p_xy + α(q_xy − p_xy)

The learning rate parameter α is significantly larger for the winning node in comparison to the lower value used for the neighboring nodes.

Error and Utility

Each node maintains error and utility values. The purpose of the error value is to identify nodes that, while they are frequently the best matching node for a variety of inputs, are still not very close matches. These nodes, then, represent parts of the input space that are not adequately covered by the current set of nodes. The purpose of the utility value is to identify nodes that are distinctive relative to their neighbors. Nodes with high utility are frequently much closer matches to many inputs than their neighbors. On each training iteration, these values are updated as follows:
• For the winning node:
  – Increase the error for the winning node by the Euclidean distance to the input.
  – Subtract the distance to the winning node from the distance to the second-best node. Add this difference to the utility.
• For each node n:
  – Reduce the error and utility values using a decay constant β (0 < β < 1) as follows:
    ∗ error_n = error_n − β × error_n
    ∗ utility_n = utility_n − β × utility_n

Creating New Nodes

An integer parameter λ controls the creation of new nodes. The frequency of node creation is inversely proportional to λ; low values imply frequent introduction of new nodes. Every λ iterations, a new node is created as follows:
• Find the node m with the largest error value in the network.
• Find its neighbor n with the largest error value among all of m's neighbors.
• Create an image by averaging the corresponding pixel values of m and n.
• Create a new node p using:
  – The averaged image
  – The mean of the errors of m and n
  – The maximum of the utility values of m and n
• Break the edge between m and n.
• Add an edge of weight zero between m and p, and another between n and p.
• Divide the error and utility values for each of m and n by two.

Purging Edges and Nodes

On each iteration, every edge whose weight exceeds a specified limit is purged. If any node has all its edges purged, that node is purged as well.

In addition, for every node the utility ratio is calculated (Fritzke 1997). The largest error of any node, e_max, is determined. The utility ratio for each node n with utility u_n is e_max / u_n. The node with the single largest utility ratio is the most useless node. If its ratio exceeds a parameter k, it is purged.

Training and Programming

Our robot is equipped with a controller that allows the programmer to drive it around its environment. As the robot is driven around, the first two images it acquires become the first two nodes of Growing Neural Gas. Each subsequent image acquired is used for one iteration of training the GNG network. Training ends upon a signal from the programmer. The GNG network is then saved for later use.

We slightly modified how the GNG learning algorithm decides to create new nodes. When the programmer changes the robot's command, it is typically in response to a stimulus the programmer perceives. Consequently, whatever the robot is sensing at that time has the potential to be very important. Following this intuition, a new GNG node is created and the λ iteration counter is reset to zero whenever the programmer changes the robot's movement.

When the GNG training process is complete, the programmer runs an application that shows all of the GNG nodes. Each node is annotated with a GUI component that allows the programmer to select an action to correspond to that node. A screenshot of the action selection application is given in Figure 2. When the programmer has selected an action for every node, a controller is generated. When the controller executes, as each image is acquired, it is presented to the GNG network. The action specified for the winning node is immediately executed.
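Assembled from the preceding sections, a single training iteration might be sketched roughly as follows. This is our own illustrative Python, not the paper's implementation; the class, method, and variable names are assumptions, and the constants mirror the parameter values reported in the Experiments section.

```python
import math

ALPHA_WIN, ALPHA_NB = 0.05, 0.0006  # learning rates (winner, neighbors)
BETA = 0.0005                       # per-iteration error/utility decay
MAX_WEIGHT = 100                    # edge-weight purge limit ("maximum edge age")
K = 4                               # maximum utility ratio
LAMBDA = 200                        # iterations per node creation

class Node:
    def __init__(self, image, error=0.0, utility=0.0):
        self.image, self.error, self.utility = list(image), error, utility

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class GNG:
    def __init__(self, first_image, second_image):
        self.nodes = [Node(first_image), Node(second_image)]
        self.edges = {}          # maps frozenset of two nodes -> weight
        self.steps = 0

    def neighbors(self, node):
        return [next(iter(p - {node})) for p in self.edges if node in p]

    def train_step(self, image):
        self.steps += 1
        # Closest and second-closest nodes to the input.
        win, second = sorted(self.nodes, key=lambda n: dist(n.image, image))[:2]
        d_win, d_second = dist(win.image, image), dist(second.image, image)
        # Increment edges adjoining the winner, then reset (or create)
        # the winner/runner-up edge with weight zero.
        for pair in self.edges:
            if win in pair:
                self.edges[pair] += 1
        self.edges[frozenset((win, second))] = 0
        # Pull the winner (strongly) and its neighbors (weakly) toward the input.
        for node, alpha in [(win, ALPHA_WIN)] + [(nb, ALPHA_NB)
                                                 for nb in self.neighbors(win)]:
            node.image = [p + alpha * (q - p) for p, q in zip(node.image, image)]
        # Error and utility updates, then decay for every node.
        win.error += d_win
        win.utility += d_second - d_win
        for n in self.nodes:
            n.error -= BETA * n.error
            n.utility -= BETA * n.utility
        if self.steps % LAMBDA == 0:
            self.create_node()
        self.purge()

    def create_node(self):
        # Split between the worst node m and its worst neighbor n.
        m = max(self.nodes, key=lambda n: n.error)
        if not self.neighbors(m):
            return
        n = max(self.neighbors(m), key=lambda nb: nb.error)
        avg = [(a + b) / 2 for a, b in zip(m.image, n.image)]
        p = Node(avg, (m.error + n.error) / 2, max(m.utility, n.utility))
        self.nodes.append(p)
        del self.edges[frozenset((m, n))]
        self.edges[frozenset((m, p))] = 0
        self.edges[frozenset((n, p))] = 0
        m.error /= 2; n.error /= 2
        m.utility /= 2; n.utility /= 2

    def purge(self):
        # Drop overweight edges, then any node left with no edges.
        self.edges = {p: w for p, w in self.edges.items() if w <= MAX_WEIGHT}
        self.nodes = [n for n in self.nodes if self.neighbors(n)]
        # Utility-based purge (Fritzke 1997): remove the node whose ratio
        # e_max / u_n is largest, if that ratio exceeds K.
        if len(self.nodes) > 2:
            e_max = max(n.error for n in self.nodes)
            worst = max(self.nodes,
                        key=lambda n: e_max / n.utility if n.utility > 0 else 0)
            if worst.utility > 0 and e_max / worst.utility > K:
                self.nodes.remove(worst)
                self.edges = {p: w for p, w in self.edges.items() if worst not in p}
```

Note that the winner and runner-up always share a freshly reset edge, so they can never be purged for lacking neighbors; the "weight" here plays the role that edge age plays in standard presentations of GNG.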
Experiments

Configuration and Training

The experimental goal was to train the robot to be familiar with a particular room, such that it could wander the room without hitting anything. The robot is a Lego Mindstorms NXT. To enable image processing, a Dell Inspiron Mini Netbook equipped with a webcam was placed atop the robot to control it via a USB cable. The configured robot is shown in Figure 1.

Figure 2: Selecting Actions for GNG Nodes

The webcam images were acquired at a size of 640x480 pixels. Each image was averaged to produce a grayscale image upon acquisition, with each pixel value ranging from 0 to 255. Each grayscale image was scaled down to 160x120 prior to being applied as an input to the GNG network. The first two GNG nodes are the first two images acquired. This configuration enabled images to be acquired and processed and learning to occur at about 15-16 frames per second. Given a robot velocity of 17 cm/s, it processes about one frame per centimeter of travel.

The GNG algorithm was parameterized as follows:
• Maximum edge age: 100
• Learning rate (winning node): 0.05
• Learning rate (neighboring nodes): 0.0006
• Maximum utility ratio (k): 4
• Iterations per node creation (λ): 200
• Error decay (β) per iteration: 0.0005

Most of the parameters were taken directly from the experiments described by (Holmstrom 2002). As noted above, every change of command results in the creation of a new node. The λ value of 200, then, was selected to ensure the creation of a new node after, at most, two meters of travel.

The GNG network was trained by driving the robot around the room for several minutes. The trained network had 26 nodes (depicted in Figure 2) by the completion of the run.

Action Selection

Actions were selected by visually examining the reference images for each GNG node and determining an appropriate action for the match. To keep things simple, only three actions were used: FORWARD, BACK_LEFT, and BACK_RIGHT.

For some reference images, the choice of action was clear. In Figure 3, we see an example where going forward is the obvious choice, as there is plenty of clear floor space. Figure 4 is an example of a clear choice of a turn, as the wall is very close.

Figure 3: Forward

Figure 4: Turn

In other cases, the choice was less clear, and it had to be altered after experimentation. The action for Figure 5 was originally FORWARD, but it was changed to a turn after it was observed that moving forward in that situation led to a collision. The action for Figure 6 was originally BACK_LEFT, but it was changed to FORWARD when it was observed that this action caused the robot to turn in the middle of the room. Subsequent reflection led to the conclusion that the reference image displays considerably more floor space than was originally thought when the first action was selected. Overall, four out of the 26 actions were changed during the course of experimentation.

Figure 5: Forward, later revised

Figure 6: Turn, later revised

Observed Robot Behavior

For the GNG network shown in Figure 2, three revisions were made to the original action selection, resulting in four total versions. (The actions shown in Figure 2 represent the final version.) The first version produced a robot behavior that countered virtually every forward motion with a backward motion. The result was a robot that rarely hit anything, but also made very little progress. Gradual refinement of the action selection produced a robot (Version 4) that avoided obstacles successfully while maintaining consistent forward motion. The GNG-based controller runs at about 10 frames per second, somewhat slower than the training phase.

Figure 7 illustrates the effect of refined action selection on forward motion. The vertical axis denotes the percentage of the time that the robot spent moving forward rather than turning. For the first three versions, the data is the average of two runs. (After two runs, the observed behavior was found sufficiently frustrating that actions were altered immediately in light of the observed behavior.) For the considerably more satisfying fourth version, the average of seven runs is given. The "Human" value denotes the percentage of forward motion for a single run of a human piloting the robot, maximizing forward motion while avoiding obstacles.

Figure 7: Forward Motion

Figure 8 shows the number of seconds the robot was able to run before striking an obstacle. All runs were for the fourth and final version of action selection depicted in Figure 2. For each run, the robot started moving from the same position in our lab. While it generally did well in avoiding obstacles, there were a couple of locations that consistently caused problems. The long duration for the second run reflects a situation in which the robot was fortunate to avoid the problematic areas for a considerable period of time.

A representative example of a problematic node is node 24, depicted in Figure 9. The lower part depicts open floorspace, while the upper part shows artifacts of several obstacles. Nodes 3, 10, and 16 exhibit similar ambiguities and caused similar problems. Node 18 (in Figure 3) was normally a strong contributor to forward motion; however, it appears that the checkerboard pattern on the floor may have led even this node into ambiguous situations. Node 17 exhibited
the same issue. The most ironic example is Node 2 (in Figure 6), which had originally been designated as a BACK_LEFT node but which was changed to FORWARD due to unwanted turns. Node 2 is a combination of floor and wall with a very ambiguous boundary; not all of the turns it triggered turned out to be "unwanted".

Figure 8: Time until collision

Figure 9: Node 24

What all of these nodes have in common is that they were otherwise used heavily to trigger productive forward motion. Unfortunately, several locations in our environment proved consistently problematic due to these ambiguities. Giving all of these nodes turn-action designations would have resulted in a significant loss of forward motion. We hypothesize that the underlying problem is that the GNG network needs more nodes to overcome these ambiguities.

Analysis

The results of this preliminary study are very promising. As long as the robot was not exposed to problematic locations, it avoided obstacles perfectly while maintaining significant forward motion. Devising and improving the action configuration proved to be straightforward; simple tweaks produced significant incremental performance improvements. We are considering the following avenues to improve performance.

Merging GNG Networks

Relying upon a single training run to produce a usable GNG network interferes with what should naturally be an iterative process of carefully debugging the robot's behavior. By using the initial GNG network as a starting point, additional training runs could be conducted in the physical vicinity of problematic locations in order to help introduce new nodes to overcome ambiguities.

A related approach would be to incorporate the ability to merge GNG networks derived from separate training runs. This would be an alternative means to enable the programmer to "patch" the robot's knowledge of troublesome areas while still retaining the positive aspects of the initially created network.

Implicit Action Selection

An alternative to manual action selection would be a two-phase piloting approach. In the first piloted run, the GNG network would be built. During the second run, popularity
of actions selected by the pilot would be tracked for each
winning node. In this way, action selection would be im-
plicit rather than explicit. This would be especially useful
in allowing this technique to scale to GNG networks with
significantly larger numbers of nodes.
Furthermore, this could enable the early identification
of ambiguous nodes. Nodes that map onto locations with
clear actions would exhibit uniformity in the actions selected
on their behalf. Nodes with conflicting action designations
could serve as an early signal for trouble, and a focus for
efforts to improve the underlying GNG network.
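The tallying scheme proposed above might be sketched as follows. The helper names, the interface assumed for the winner lookup, and the ambiguity threshold are all our own assumptions, not part of the paper's system.

```python
from collections import Counter, defaultdict

def tally_pilot_actions(winner_for, piloted_frames):
    """Count, for each winning GNG node, the actions a human pilot chose.

    winner_for: function mapping an image to the id of its winning node.
    piloted_frames: iterable of (image, pilot_action) pairs from a second run.
    """
    tallies = defaultdict(Counter)
    for image, action in piloted_frames:
        tallies[winner_for(image)][action] += 1
    return tallies

def implicit_policy(tallies):
    """Assign each node its most popular piloted action."""
    return {node: counts.most_common(1)[0][0]
            for node, counts in tallies.items()}

def ambiguous_nodes(tallies, threshold=0.7):
    """Flag nodes whose top action wins less than `threshold` of the votes.

    The 0.7 cutoff is a hypothetical choice for illustration only.
    """
    return [node for node, counts in tallies.items()
            if counts.most_common(1)[0][1] / sum(counts.values()) < threshold]
```

A node such as Node 2 or Node 24, matched sometimes while driving forward and sometimes while turning, would show a split tally and be flagged for attention.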
Related Work

Growing Neural Gas has been applied to robotic systems for numerous purposes beyond that proposed in this paper, including localization (Baldassarri et al. 2003) (Yan, Weber, and Wermter 2012), modeling the physical structure of the environment (Kim, Cho, and Kim 2003) (Shamwell et al. 2012), and control via gesture recognition (Yanik et al. 2012).

A representative example of a system that uses artificial neural networks to simplify the programming of reactive behaviors is (Cox and Best 2004). In their system, the programmer specifies motor settings for various combinations of sensor values that a simulated robot encounters. A multilayer perceptron is employed to generalize the motor settings to sensor combinations that had not been previously encountered. The simulated sensors are three infrared sensors and two bump sensors. It is not at all clear how this approach would scale to the much greater complexity of visual input. Our work, by exploiting the inherent intelligibility of the reference images of the GNG clusters, provides the programmer with meaningful abstractions of the visual input for which actions can then be coherently specified.
(Touzet 2006) employs self-organizing maps (Kohonen 2001) to build models of a physical robot's environment and the effect of performing actions in certain states. The desired behavior is then specified in terms of targeted sensor values. For the range sensors they employed, specific windows of target values are specified in order to induce the target behavior. When using computer vision as a sensor, creating such windows of target values is impractical. The GNG cluster reference images provide a usable alternative.

In (Gunderson and Gunderson 2008), an agent architecture is presented that is centered around the issue of reification. Each object to be perceived is represented by a PerCept that binds a sensor-derived signature to a symbolic component that can be manipulated by the symbolic reasoning engine. The authors describe how a chair, for example, can be described based on a pattern of readings from multiple sonars. When using camera input, the scheme described in this paper could provide a useful means of defining sensor signatures for objects that are to be visually identified, by assigning identification tags to cluster reference images rather than actions.

Conclusion

Growing Neural Gas is an effective technique for creating a model of a robot's environment that simplifies the programming of reactive behaviors relying on visual input. It runs sufficiently fast for real-time processing, both when learning and when selecting actions.

Significant work remains to be done to improve upon the results of this preliminary study. A particular focus will be made on investigating techniques for iterative refinement of the GNG network, as well as the incorporation of information from a piloted phase targeted at determining appropriate action selection.

References

Baldassarri, P.; Puliti, P.; Montesanto, A.; and Tascini, G. 2003. Self-organizing maps versus growing neural gas in a robotic application. In Mira, J., and Álvarez, J. R., eds., IWANN (2), volume 2687 of Lecture Notes in Computer Science, 201–208. Springer.

Brooks, R. A. 1986. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation 2(1).

Cox, P. T., and Best, S. M. 2004. Programming an autonomous robot controller by demonstration using artificial neural networks. In Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing, 157–159.

Forgey, E. 1965. Cluster analysis of multivariate data: Efficiency vs. interpretability of classification. Biometrics 21(768).

Fritzke, B. 1995. A growing neural gas network learns topologies. In Advances in Neural Information Processing Systems 7, 625–632. MIT Press.

Fritzke, B. 1997. A self-organizing network that can follow non-stationary distributions. In Gerstner, W.; Germond, A.; Hasler, M.; and Nicoud, J. D., eds., Proceedings of the Seventh International Conference on Artificial Neural Networks: ICANN-97, volume 1327 of Lecture Notes in Computer Science, 613–618. Berlin; New York: Springer-Verlag.

Gunderson, L. F., and Gunderson, J. P. 2008. Robots, Reasoning, and Reification. Springer.

Holmstrom, J. 2002. Growing neural gas: Experiments with GNG, GNG with utility and supervised GNG. Master's thesis, Uppsala University, Department of Information Technology, Computer Systems, Box 337, SE-751 05 Uppsala, Sweden.

Horswill, I. 1994. Visual collision avoidance by segmentation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 902–909. IEEE Press.

Kim, M. Y.; Cho, H.; and Kim, J. 2003. Obstacle modeling for environment recognition of mobile robots using growing neural gas network. International Journal of Control, Automation, and Systems 1(1).

Kohonen, T. 2001. Self-Organizing Maps. Springer, 3rd edition.

MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. In Proc. of the Fifth Berkeley Symposium on Math. Stat. and Prob., 281–296.

Shamwell, J.; Oates, T.; Bhargava, P.; Cox, M. T.; Oh, U.; Paisner, M.; and Perlis, D. 2012. The robot baby and massive metacognition: Early steps via growing neural gas. In ICDL-EPIROB, 1–2. IEEE.

Touzet, C. 2006. Modeling and simulation of elementary robot behaviors using associative memories. International Journal of Advanced Robotic Systems 3(2):165–170.

Yan, W.; Weber, C.; and Wermter, S. 2012. A neural approach for robot navigation based on cognitive map learning. In IJCNN, 1–8. IEEE.

Yanik, P.; Manganelli, J.; Merino, J.; Threatt, A.; Brooks, J. O.; Green, K. E.; and Walker, I. D. 2012. Use of Kinect depth data and growing neural gas for gesture based robot control. In PervasiveHealth, 283–290. IEEE.