=Paper= {{Paper |id=Vol-1144/paper6 |storemode=property |title=Creating Visual Reactive Robot Behaviors Using Growing Neural Gas |pdfUrl=https://ceur-ws.org/Vol-1144/paper6.pdf |volume=Vol-1144 |dblpUrl=https://dblp.org/rec/conf/maics/Ferrer14 }} ==Creating Visual Reactive Robot Behaviors Using Growing Neural Gas== https://ceur-ws.org/Vol-1144/paper6.pdf
          Creating Visual Reactive Robot Behaviors Using Growing Neural Gas

                                                           Gabriel J. Ferrer
                                          Department of Mathematics and Computer Science
                                                         Hendrix College
                                                       1600 Washington Ave.
                                                     Conway, Arkansas 72032



                            Abstract
     Creating reactive robot behaviors that rely solely on vi-
     sual input is tricky due to the well-known problems in-
     volved with computer vision. This paper presents a po-
     tential solution to this problem. The robot builds repre-
     sentations of its target environment using Growing Neu-
     ral Gas. The robot programmer then specifies its behav-
     ior for each learned node. This approach is shown to
     have the potential to be effective for simplifying robot
     behavior programming in an indoor environment with
     low-cost hardware.

Programming reactive behaviors (Brooks 1986) on a robot
equipped with basic sensors such as touch sensors and
sonars is reasonably straightforward and well-understood.
Despite considerable progress in computer vision tech-
niques, use of camera images to direct reactive behaviors
remains tricky. Considerable effort is often invested in com-
puter vision algorithms that are specialized for a particular
environment (Horswill 1994).

Figure 1: Portrait of the Robot

   Unsupervised learning algorithms have proven to be popular tools for computer vision, especially neural network models such as the Self-Organizing Map (Kohonen 2001) and Growing Neural Gas (Fritzke 1995). These algorithms derive abstractions of subsets of image sets that can be used for later classification of previously unseen images.
   This paper describes an application of unsupervised learning, specifically Growing Neural Gas (GNG), to the problem of programming reactive behaviors when using computer vision as the principal sensor for a mobile robot. Behavior programming is a two-stage process. First, the robot builds a GNG network as it explores its environment. Subsequently, the programmer specifies the desired action for each learned GNG cluster. Once this specification is complete, the robot selects its actions based on the GNG cluster that is the closest match to the current input image.

              Learning the Environment

Unsupervised Learning
Unsupervised learning algorithms, such as k-means (Forgey 1965) (MacQueen 1967), the self-organizing map (SOM) (Kohonen 2001), and Growing Neural Gas (GNG) (Fritzke 1995), operate by partitioning their training inputs into clusters based on a distance metric. Each cluster is defined as a representative example of the input space. Each representative example is arithmetically derived (in an algorithm-dependent manner) from the training inputs.
   Both k-means and self-organizing maps require a fixed number of clusters to be specified in advance. Growing Neural Gas adaptively determines the number of clusters based on its inputs. In the interest of allowing the inputs to determine the number of clusters, we selected GNG for this task.

                Growing Neural Gas
Growing Neural Gas is an artificial neural network that is trained using an unsupervised learning algorithm. The network is an undirected weighted graph. In this implementation, both the network inputs and the network nodes are 2D grayscale images. When an input is presented to the network, the Euclidean distance between the input and each node is calculated. The node with the shortest distance to the input becomes the active node.
   Each edge connects two nodes that are active in response to similar inputs. The edge weight reflects how long it has been since the pair was responsive in tandem; it is reset to zero whenever that occurs. Large weights therefore denote weak connections, which are eventually purged. Nodes that lose all their edges are purged as well.
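The matching step described above is a nearest-neighbor search over the node images. A minimal sketch follows; the array shapes and function names are illustrative, not taken from the paper's implementation:

```python
import numpy as np

# Sketch of the matching step: the node whose reference image has the
# smallest Euclidean distance to the input becomes the active node.
def find_winners(nodes, image):
    """Return indices of the closest and second-closest nodes to the input."""
    dists = [np.linalg.norm(node - image) for node in nodes]
    order = np.argsort(dists)
    return int(order[0]), int(order[1])
```

During training both the winner and the runner-up are needed; at runtime only the winner is used.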
Training In each iteration of training, a single input is presented to the network. The network is then adjusted as follows:
• Identify the nodes that are the closest and second-closest matches to the input.
• Adjust edge weights.
• Update error and utility values.
• Create a new node.
• Purge edges and nodes.
   In the training process, two nodes are identified: the node with the shortest distance to the input, and the node with the second-shortest distance. If an edge is present between them, its weight is reset to zero; otherwise, a new edge is created between them with a weight of zero. The weights of all other edges adjoining the winning node are increased by one.
   The values of each image pixel pxy for the winning node and each of its neighbors are updated as follows relative to each input pixel qxy:

                 pxy = pxy + α(qxy − pxy)

The learning rate parameter α is significantly larger for the winning node than the lower value used for the neighboring nodes.

Error and Utility Each node maintains error and utility values. The purpose of the error value is to identify nodes that, while they are frequently the best matching node for a variety of inputs, are still not very close matches. These nodes, then, represent parts of the input space that are not adequately covered by the current set of nodes. The purpose of the utility value is to identify nodes that are distinctive relative to their neighbors. Nodes with high utility are frequently much closer matches to many inputs than their neighbors.
   On each training iteration, these values are updated as follows:
• For the winning node:
  – Increase the error for the winning node by the Euclidean distance to the input.
  – Subtract the distance to the winning node from the distance to the second-best node. Add this difference to the utility.
• For each node n:
  – Reduce the error and utility values using a decay constant β (0 < β < 1) as follows:
    ∗ errorn = errorn − β × errorn
    ∗ utilityn = utilityn − β × utilityn

Creating New Nodes An integer parameter λ controls the creation of new nodes. The frequency of node creation is inversely proportional to λ; low values imply frequent introduction of new nodes.
   Every λ iterations, a new node is created as follows:
• Find the node m with the largest error value in the network.
• Find its neighbor n with the largest error value among all of m’s neighbors.
• Create an image by averaging the corresponding pixel values of m and n.
• Create a new node p using:
  – The averaged image
  – The mean of the errors of m and n
  – The maximum of the utility values of m and n
• Break the edge between m and n.
• Add an edge of weight zero between m and p, and another between n and p.
• Divide the error and utility values for each of m and n by two.

Purging Edges and Nodes On each iteration, every edge whose weight exceeds a specified limit is purged. If any node has all its edges purged, that node is purged as well.
   In addition, for every node the utility ratio is calculated (Fritzke 1997). The largest error of any node, emax, is determined. The utility ratio for each node n with utility un is emax/un. The node with the single largest utility ratio is the most useless node. If its ratio exceeds a parameter k, it is purged.

             Training and Programming
Our robot is equipped with a controller that allows the programmer to drive it around its environment. As the robot is driven around, the first two images it acquires become the first two nodes of Growing Neural Gas. Each subsequent image acquired is used for one iteration of training the GNG network. Training ends upon a signal from the programmer. The GNG network is then saved for later use.
   We slightly modified how the GNG learning algorithm decides to create new nodes. When the programmer changes the robot’s command, it is typically in response to a stimulus the programmer perceives. Consequently, whatever the robot is sensing at that time has the potential to be very important. Following this intuition, a new GNG node is created and the λ iteration counter is reset to zero whenever the programmer changes the robot’s movement.
   When the GNG training process is complete, the programmer runs an application that shows all of the GNG nodes. Each node is annotated with a GUI component that allows the programmer to select an action to correspond to that node. A screenshot of the action selection application is given in Figure 2. When the programmer has selected an action for every node, a controller is generated. When the controller executes, each image is presented to the GNG network as it is acquired, and the action specified for the winning node is immediately executed.
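Putting the preceding sections together, one training iteration can be condensed into the following sketch. The data structures, helper names, and default parameters here are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def train_step(nodes, edges, error, utility, image,
               alpha_win=0.05, alpha_nbr=0.0006, beta=0.0005):
    """One GNG training iteration, as described above.
    nodes: list of 2D float arrays; edges: dict {(i, j): weight} with i < j;
    error, utility: per-node lists of floats."""
    d = [np.linalg.norm(n - image) for n in nodes]
    order = np.argsort(d)
    w, s = int(order[0]), int(order[1])       # winner and runner-up
    win_edge = tuple(sorted((w, s)))

    # Reset (or create) the winner/runner-up edge; age the winner's others.
    edges[win_edge] = 0
    for e in edges:
        if w in e and e != win_edge:
            edges[e] += 1

    # Pull the winner strongly toward the input, its neighbors weakly.
    nodes[w] += alpha_win * (image - nodes[w])
    for (i, j) in edges:
        if w in (i, j):
            nbr = j if i == w else i
            nodes[nbr] += alpha_nbr * (image - nodes[nbr])

    # Error and utility bookkeeping, then decay for every node.
    error[w] += d[w]
    utility[w] += d[s] - d[w]
    for k in range(len(nodes)):
        error[k] -= beta * error[k]
        utility[k] -= beta * utility[k]
```

Node creation and purging would be invoked around this per-frame update, as described in the preceding sections.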
                      Experiments
Configuration and Training
The experimental goal was to train the robot to be familiar with a particular room, such that it could wander the room without hitting anything. The robot is a Lego Mindstorms NXT. To enable image processing, a Dell Inspiron Mini Netbook equipped with a webcam was placed atop the robot to control it via a USB cable. The configured robot is shown in Figure 1.
   The webcam images were acquired at a size of 640x480 pixels. Each image was averaged to produce a grayscale image upon acquisition, with each pixel value ranging from 0 to 255. Each grayscale image was scaled down to 160x120 prior to being applied as an input to the GNG network. The first two GNG nodes are the first two images acquired. This configuration enabled images to be acquired, processed, and learned from at about 15-16 frames per second. Given a robot velocity of 17 cm/s, the robot processes about one frame per centimeter of travel.
   The GNG algorithm was parameterized as follows:
• Maximum edge age: 100
• Learning rate (winning node): 0.05
• Learning rate (neighboring nodes): 0.0006
• Maximum utility ratio (k): 4
• Iterations per node creation (λ): 200
• Error decay (β) per iteration: 0.0005
   Most of the parameters were taken directly from the experiments described by Holmstrom (2002). As noted above, every change of command results in the creation of a new node. The λ value of 200, then, was selected to ensure the creation of a new node after, at most, two meters of travel.
   The GNG network was trained by driving the robot around the room for several minutes. The trained network had 26 nodes (depicted in Figure 2) by the completion of the run.

Figure 2: Selecting Actions for GNG Nodes

Action Selection
Actions were selected by visually examining the reference images for each GNG node and determining an appropriate action for the match. To keep things simple, only three actions were used: FORWARD, BACK_LEFT, and BACK_RIGHT.
   For some reference images, the choice of action was clear. In Figure 3, we see an example where going forward is the obvious choice, as there is plenty of clear floor space. Figure 4 is an example of a clear choice of a turn, as the wall is very close.

Figure 3: Forward

Figure 4: Turn
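At runtime, the generated controller amounts to a nearest-node lookup over the saved reference images. A minimal sketch, with illustrative names:

```python
import numpy as np

# Sketch of the generated controller: each acquired frame is matched
# against the stored GNG reference images, and the action assigned to
# the winning node is returned for immediate execution.
def choose_action(nodes, actions, frame):
    dists = [np.linalg.norm(n - frame) for n in nodes]
    return actions[int(np.argmin(dists))]
```

This lookup constitutes the entire decision cycle of the controller.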
   In other cases, the choice was less clear, and the action had to be altered after experimentation. The action for Figure 5 was originally FORWARD, but it was changed to a turn after it was observed that moving forward in that situation led to a collision. The action for Figure 6 was originally BACK_LEFT, but it was changed to FORWARD when it was observed that this action caused the robot to turn in the middle of the room. Subsequent reflection led to the conclusion that the reference image displays considerably more floor space than was originally thought when the first action was selected. Overall, four out of the 26 actions were changed during the course of experimentation.

Figure 5: Forward, later revised

Figure 6: Turn, later revised

Observed Robot Behavior
For the GNG network shown in Figure 2, three revisions were made to the original action selection, resulting in four total versions. (The actions shown in Figure 2 represent the final version.) The first version produced a robot behavior that countered virtually every forward motion with a backward motion. The result was a robot that rarely hit anything, but also made very little progress. Gradual refinement of the action selection produced a robot (Version 4) that avoided obstacles successfully while maintaining consistent forward motion. The GNG-based controller runs at about 10 frames per second, somewhat slower than the training phase.
   Figure 7 illustrates the effect of refined action selection on forward motion. The vertical axis denotes the percentage of the time that the robot spent moving forward rather than turning. For the first three versions, the data is the average of two runs. (After two runs, the observed behavior was found sufficiently frustrating that the actions were altered immediately in light of that behavior.) For the considerably more satisfying fourth version, the average of seven runs is given. The "Human" value denotes the percentage of forward motion for a single run of a human piloting the robot, maximizing forward motion while avoiding obstacles.

Figure 7: Forward Motion

   Figure 8 shows the number of seconds the robot was able to run before striking an obstacle. All runs were for the fourth and final version of action selection depicted in Figure 2. For each run, the robot started moving from the same position in our lab. While it generally did well in avoiding obstacles, there were a couple of locations that consistently caused problems. The long duration for the second run reflects a situation in which the robot was fortunate to avoid the problematic areas for a considerable period of time.
   A representative example of a problematic node is node 24, depicted in Figure 9. The lower part depicts open floorspace, while the upper part shows artifacts of several obstacles. Nodes 3, 10, and 16 exhibited similar ambiguities and caused similar problems. Node 18 (in Figure 3) was normally a strong contributor to forward motion; however, it appears that the checkerboard pattern on the floor may have led even this node into ambiguous situations.
Node 17 exhibited the same issue. The most ironic example is Node 2 (in Figure 6), which had originally been designated as a BACK_LEFT node but which was changed to FORWARD due to unwanted turns. Node 2 is a combination of floor and wall with a very ambiguous boundary; not all of the turns it triggered turned out to be "unwanted".

Figure 8: Time until collision

Figure 9: Node 24

   What all of these nodes have in common is that they were otherwise used heavily to trigger productive forward motion. Unfortunately, several locations in our environment proved consistently problematic due to these ambiguities. Giving all of these nodes turn-action designations would have resulted in a significant loss of forward motion. We hypothesize that the underlying problem is that the GNG network needs more nodes to overcome these ambiguities.

                        Analysis
The results of this preliminary study are very promising. As long as the robot was not exposed to problematic locations, it avoided obstacles perfectly while maintaining significant forward motion. Devising and improving the action configuration proved to be straightforward; simple tweaks produced significant incremental performance improvements. We are considering the following avenues to improve performance.

Merging GNG Networks
Relying upon a single training run to produce a usable GNG network interferes with what should naturally be an iterative process of carefully debugging the robot's behavior. By using the initial GNG network as a starting point, additional training runs could be conducted in the physical vicinity of problematic locations in order to help introduce new nodes to overcome ambiguities.
   A related approach would be to incorporate the ability to merge GNG networks derived from separate training runs. This would be an alternative means to enable the programmer to "patch" the robot's knowledge of troublesome areas while still retaining the positive aspects of the initially created network.

Implicit Action Selection
An alternative to manual action selection would be a two-phase piloting approach. In the first piloted run, the GNG network would be built. During the second run, the popularity of the actions selected by the pilot would be tracked for each winning node. In this way, action selection would be implicit rather than explicit. This would be especially useful in allowing this technique to scale to GNG networks with significantly larger numbers of nodes.
   Furthermore, this could enable the early identification of ambiguous nodes. Nodes that map onto locations with clear actions would exhibit uniformity in the actions selected on their behalf. Nodes with conflicting action designations could serve as an early signal of trouble, and a focus for efforts to improve the underlying GNG network.

                      Related Work
Growing Neural Gas has been applied to robotic systems for numerous purposes beyond that proposed in this paper, including localization (Baldassarri et al. 2003) (Yan, Weber, and Wermter 2012), modeling the physical structure of the environment (Kim, Cho, and Kim 2003) (Shamwell et al. 2012), and control via gesture recognition (Yanik et al. 2012).
   A representative example of a system that uses artificial neural networks to simplify the programming of reactive behaviors is (Cox and Best 2004). In their system, the programmer specifies motor settings for various combinations of sensor values that a simulated robot encounters. A multilayer perceptron is employed to generalize the motor settings to sensor combinations that had not been previously encountered. The simulated sensors are three infrared sensors and two bump sensors. It is not at all clear how this approach would scale to the much greater complexity of visual input. Our work, by exploiting the inherent intelligibility of the reference images of the GNG clusters, provides the programmer with meaningful abstractions of the visual input for which actions can then be coherently specified.
   (Touzet 2006) employs self-organizing maps (Kohonen 2001) to build models of a physical robot's environment and the effect of performing actions in certain states. The desired behavior is then specified in terms of targeted sensor values. For the range sensors they employed, specific windows of target values are specified in order to induce the target behavior. When using computer vision as a sensor, creating such windows of target values is impractical. The GNG cluster reference images provide a usable alternative.
   In (Gunderson and Gunderson 2008), an agent architecture is presented that is centered around the issue of reification. Each object to be perceived is represented by a PerCept that binds a sensor-derived signature to a symbolic component that can be manipulated by the symbolic reasoning engine. The authors describe how a chair, for example, can be described based on a pattern of readings from multiple sonars. When using camera input, the scheme described in this paper could provide a useful means of defining sensor signatures for objects that are to be visually identified, by assigning identification tags to cluster reference images rather than actions.

                       Conclusion
Growing Neural Gas is an effective technique for creating a model of a robot's environment that simplifies the programming of reactive behaviors relying on visual input. It runs sufficiently fast for real-time processing, both when learning and when selecting actions.
   Significant work remains to be done to improve upon the results of this preliminary study. A particular focus will be placed on investigating techniques for iterative refinement of the GNG network, as well as the incorporation of information from a piloted phase targeted at determining appropriate action selection.

                       References
Baldassarri, P.; Puliti, P.; Montesanto, A.; and Tascini, G. 2003. Self-organizing maps versus growing neural gas in a robotic application. In Mira, J., and Álvarez, J. R., eds., IWANN (2), volume 2687 of Lecture Notes in Computer Science, 201–208. Springer.
Brooks, R. A. 1986. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation 2(10).
Cox, P. T., and Best, S. M. 2004. Programming an autonomous robot controller by demonstration using artificial neural networks. In Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing, 157–159.
Forgey, E. 1965. Cluster analysis of multivariate data: Efficiency vs. interpretability of classification. Biometrics 21(768).
Fritzke, B. 1995. A growing neural gas network learns topologies. In Advances in Neural Information Processing Systems 7, 625–632. MIT Press.
Fritzke, B. 1997. A self-organizing network that can follow non-stationary distributions. In Gerstner, W.; Germond, A.; Hasler, M.; and Nicoud, J. D., eds., Proceedings of the Seventh International Conference on Artificial Neural Networks: ICANN-97, volume 1327 of Lecture Notes in Computer Science, 613–618. Berlin; New York: Springer-Verlag.
Gunderson, L. F., and Gunderson, J. P. 2008. Robots, Reasoning, and Reification. Springer.
Holmstrom, J. 2002. Growing neural gas: Experiments with GNG, GNG with utility and supervised GNG. Master's thesis, Uppsala University, Department of Information Technology, Uppsala, Sweden.
Horswill, I. 1994. Visual collision avoidance by segmentation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 902–909. IEEE Press.
Kim, M. Y.; Cho, H.; and Kim, J. 2003. Obstacle modeling for environment recognition of mobile robots using growing neural gas network. International Journal of Control, Automation, and Systems 1(1).
Kohonen, T. 2001. Self-Organizing Maps. Springer, 3rd edition.
MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. In Proc. of the Fifth Berkeley Symposium on Math. Stat. and Prob., 281–296.
Shamwell, J.; Oates, T.; Bhargava, P.; Cox, M. T.; Oh, U.; Paisner, M.; and Perlis, D. 2012. The robot baby and massive metacognition: Early steps via growing neural gas. In ICDL-EPIROB, 1–2. IEEE.
Touzet, C. 2006. Modeling and simulation of elementary robot behaviors using associative memories. International Journal of Advanced Robotic Systems 3(2):165–170.
Yan, W.; Weber, C.; and Wermter, S. 2012. A neural approach for robot navigation based on cognitive map learning. In IJCNN, 1–8. IEEE.
Yanik, P.; Manganelli, J.; Merino, J.; Threatt, A.; Brooks, J. O.; Green, K. E.; and Walker, I. D. 2012. Use of Kinect depth data and growing neural gas for gesture based robot control. In PervasiveHealth, 283–290. IEEE.