S. Krajči (ed.): ITAT 2018 Proceedings, pp. 116–123
CEUR Workshop Proceedings Vol. 2203, ISSN 1613-0073, © 2018 Rudolf J. Szadkowski, Petr Čížek, and Jan Faigl



              Learning Central Pattern Generator Network with Back-Propagation
                                         Algorithm

                                           Rudolf J. Szadkowski1 , Petr Čížek1 , and Jan Faigl1

                              Czech Technical University in Prague, Technicka 2, 16627 Prague, Czech Republic,
                                              {szadkrud, petr.cizek, faiglj}@fel.cvut.cz

An adaptable central pattern generator (CPG) that directly controls the rhythmic motion of a multi-legged robot must combine plasticity and sustainable periodicity. This combination requires an algorithm that searches the parametric space of the CPG and yields a non-stationary and non-divergent solution. We model the CPG with the pioneering Matsuoka's neural oscillator, which is (mostly) non-divergent and provides constraints ensuring non-stationarity. We embed these constraints into the CPG formulation, which we further implement as a layer of an artificial neural network. This enables the CPG to be learnable by the back-propagation algorithm while sustaining the desirable properties. Moreover, the proposed CPG can be integrated into more complex networks and trained under different optimization objectives. In addition to the theoretical properties of the developed system, its flexibility is demonstrated by the successful learning of the tripod motion gait and its practical deployment on a real hexapod walking robot.


1    Introduction

The movement of legged robots relies on the synchronized control of each of their joints. Since these joints are part of the same body, the velocity of each joint depends on the positions of all the robot's joints. The problem of generating such synchronized control signals gets harder with an increasing number of legs (or the number of joints per leg). A widely used generator of such signals is a system of interconnected Central Pattern Generators (CPGs). The system based on CPGs can be described as two or more coupled oscillators. CPGs appear in many vertebrates and insects, where they are responsible for controlling rhythmic motions, such as swimming, walking, or respiration [1, 2]. They also appear in biologically inspired robotics, where CPGs are used for the locomotion control of legged robots [3].

A CPG network can be modeled as a non-linear dynamic system with coupled variables. Such a non-linear dynamic system is parameterized in a way that it contains a stable limit cycle, but finding such a parametrization is difficult because an analytical description of the high-dimensional non-linear dynamic system is hard or impossible. Moreover, even a small change in the parameters can result in a sudden change of the system's qualitative properties, which can range from chaotic to stationary, and somewhere in between lies the desired periodic behavior.

Parameters of the CPG networks can be found experimentally (i.e., tuned manually or automatically by evolutionary algorithms [4]) or they can be heuristically designed. Such design-dependent methods make CPG networks difficult to scale to other robotic bodies or to adapt to the locomotion control in different environments. The scaling problem can be partially bypassed by pre-computing a trajectory for each foot tip and employing inverse kinematics to determine the control signals for the particular leg's joints [5, 6]. However, the inverse kinematics depends on the robot's body and requires identification of parameters that have to be manually fine-tuned to ensure a proper behavior.

The motivation for the presented approach is to develop a fully automatic CPG learning, and this paper explores the possibility of learning a CPG network modeled by Matsuoka's neural oscillators [7] with the back-propagation algorithm (BP). To boost the BP algorithm that learns the desired locomotion control for our multi-legged walking robot, we propose two methods for pruning the parameter space of the CPG network.

The particular contributions presented in the paper are as follows.

   • A normalization layer that prunes parametrizations with stable stationary solutions from the parameter space.

   • An inductive learning method that exploits the structure of the robot's body and further reduces the searched parametric space.

   • Experimental evaluation of the proposed learning using a real hexapod walking robot, for which the proposed CPG network learned by the designed algorithm exhibits successful locomotion control following the tripod gait, where the developed CPG network directly produces the control signal for each of the 18 actuators of the robot.


2    Related Work

Different biomimetic approaches to produce rhythmic patterns, including CPGs [1], Recurrent Neural Networks [8], or Self-Adjusting Ring Modules [9], have been studied and deployed for the locomotion control of robots [3] in recent years. These approaches differ mainly in the complexity of
the underlying model and have different levels of abstraction, ranging from biomechanical models [10] simulating membrane potentials and ion flows inside neurons, down to a model of two coupled neurons in a mutual inhibition [11]. Amongst them, the CPGs based on Matsuoka's neural oscillator [7] are the prevalent model. Further details on Matsuoka's model are given in Section 3, as we build on its properties [7, 12, 13] in our work.

Deployment of the CPG oscillators on legged robots is also particularly difficult because of the different kinematics and dynamics of each robot. A different amount of post-processing is used to translate the CPG outputs to joint coordinates. Namely, approaches using inverse kinematics [5, 6] suffer from the necessary hand fine-tuning of both the CPG parameters and the kinematics. Besides, existing approaches use a separate neural network as the motor control unit [11] or use the CPG outputs directly as joint angles [14]. Furthermore, CPGs can seamlessly switch between different output patterns, and thus different gaits [15], which further supports the direct joint control. In our work, we use a dedicated output layer to shape the outputs of the CPGs, as we assume simple transformations of the output signal are easier to learn by changing the parameters of the output layer, while the gait change is in charge of the CPG.

Parametrization of the oscillator can be found experimentally, e.g., using evolutionary algorithms with a fitness function minimizing energy consumption [11], maximizing the velocity [4], or using parameter optimization [16]. Besides, a modified back-propagation algorithm has been used on an adaptive neural oscillator in [17] to imitate an external periodic signal by its output signal, but it fails to sustain oscillations for complex waveforms. Further works on the parameter constraining of CPGs to maintain stable oscillations have been published [7, 12, 13, 16]; however, to the best of our knowledge, we are the first to teach a network of CPGs to perform a locomotion gait of a hexapod walking robot using back-propagation. Furthermore, we propose two methods to prune the space of possible CPG parameters.


3    Central Pattern Generator Network

The CPG network used in this paper is based on the Matsuoka's neural oscillator [7]. Matsuoka's neural oscillator is a pair of symmetrically connected adaptive neurons, extensor and flexor, that imitate the behavior of biological neurons, where, after peaking, the neuron starts to repolarize until its activation drops to the resting potential. The features of Matsuoka's neurons have been extensively studied; hence, the necessary conditions under which the neural network enters the stable stationary state [7], the effects of time-variant tonic input [12], and the approximation of the oscillator's fundamental frequency and amplitude [13] are well documented in the literature. The description of the particular CPG model used in this work is as follows.

Figure 1: CPG unit connected to the CPG network.

3.1   CPG Model

The dynamics of the CPG network with N units can be described by a set of equations

    T_r \dot{u}_i^e = -u_i^e - w_{fe} g(u_i^f) - \beta v_i^e - \sum_{j=1}^{N} w_{ij} g(u_j^e) + c_i^e,    (1)

    T_a \dot{v}_i^e = g(u_i^e) - v_i^e,    (2)

    T_r \dot{u}_i^f = -u_i^f - w_{fe} g(u_i^e) - \beta v_i^f - \sum_{j=1}^{N} w_{ij} g(u_j^f) + c_i^f,    (3)

    T_a \dot{v}_i^f = g(u_i^f) - v_i^f,    (4)

where the subscript i ∈ N denotes the particular CPG and the superscript µ ∈ {e, f} distinguishes the extensor and flexor neurons, respectively. Each tuple of the variables u_i^e, v_i^e describes the dynamics of the extensor neuron. The variable u_i^e represents the activation of the neuron, and v_i^e represents its self-inhibitory input, which makes the neuron adaptive. Similarly, u_i^f, v_i^f describe the dynamics of the flexor neuron. The function g is a rectifier

    g(x) = max(0, x)    (5)

that is an activation function that adds non-linearity to the system. Each neuron (i, µ) inhibits itself through the variable v_i^µ scaled by the parameter β > 0. The extensor-flexor pair (i.e., the CPG unit) mutually inhibits itself through the symmetric connection with the weight w_{fe} > 0. Finally, the CPG units are inter-connected with the symmetric inhibiting connections w_{ij} ∈ W for w_{ij} ≥ 0 and w_{ii} = 0, where W is a symmetric matrix. The only source of excitation for this CPG network is the tonic input c_i^e, c_i^f (≥ 0), which is given externally. In general, the tonic input may be time-dependent and can be used to regulate the output of the CPG network [12]. T_r and T_a (both > 0) are the reaction times of their respective variables. The structure of the CPG unit is visualized in Fig. 1.

All the equations (1), (2), (3), and (4) are differentiable except in the cases when u_i^µ = 0, since the rectifier is used as the activation function. However, we assume this will not cause any problems because the rectifier is used inside the Rectified Linear Units (ReLU), which are widely used in deep neural networks.
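As an aside, the dynamics (1)–(5) can be simulated with a simple forward-Euler integration. The following minimal NumPy sketch is ours, not from the paper; the function name, the initialization, and the step size are illustrative assumptions:

```python
import numpy as np

def simulate_cpg(W, w_fe, beta, Tr, Ta, c, steps=1000, dt=0.01):
    """Forward-Euler integration of the CPG network, eqs. (1)-(4).

    W  : (N, N) symmetric inter-unit inhibition weights, W[i, i] = 0
    c  : (2, N) tonic inputs; row 0 = extensor, row 1 = flexor
    Returns the activation trajectory u of shape (steps, 2, N).
    """
    g = lambda x: np.maximum(0.0, x)       # rectifier, eq. (5)
    N = W.shape[0]
    rng = np.random.default_rng(0)
    u = 0.01 * rng.random((2, N))          # small random initial activations
    v = np.zeros((2, N))                   # self-inhibitory variables
    out = np.empty((steps, 2, N))
    for t in range(steps):
        # row 0 is the extensor, row 1 the flexor; u[::-1] swaps the two rows,
        # implementing the mutual extensor-flexor inhibition term w_fe * g(u)
        du = (-u - w_fe * g(u[::-1]) - beta * v - g(u) @ W.T + c) / Tr
        dv = (g(u) - v) / Ta
        u = u + dt * du
        v = v + dt * dv
        out[t] = u
    return out
```

For parametrizations satisfying the conditions derived in Section 4.1, the trajectories should oscillate rather than converge to a fixed point; the inhibition-only coupling keeps the activations bounded by the tonic input.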
Note that, except for the tonic inputs c^e, c^f, only inhibiting connections are used, because such a system is less prone to become chaotic or divergent [13].

Figure 2: The CPG network connected to the output layer Out. Notice the output y is not fed back to the network. Also notice that the self-inhibitory input v is not connected to Out.

3.2   Output layer

In this work, we consider the self-inhibitory inputs v^e, v^f as hidden variables; we do not work with them outside of the CPG network. The output layer combines the activation variables u^e, u^f with the affine transformation

    y = W_{out} u + b_{out},    (6)

where u = (u^e, u^f), and W_{out} ∈ R^{N×2N}, b_{out} ∈ R^{N×1} are the learnable parameters. The connection of the CPG network and the output layer is illustrated in Fig. 2.

The main advantage of having W_{out} and b_{out} as learnable parameters is that the BP algorithm can scale and translate the limit cycle formed by the CPG network. Here, we assume that these transformations are easier to learn by changing the parameters of the output layer than by changing the parameters of the CPG network. It is because a change of any parameter of the CPG network can generally cause a non-linear change in the amplitude, frequency, and shift of the generated signals [6]. Another advantage of the proposed output layer is that it can develop complex signals, as it can combine the outputs from different CPGs.


4    Proposed Locomotion Control Learning

In this section, we propose the normalization layer and the inductive learning method adapted to learning a CPG network for a hexapod walking robot, see Fig. 3a. Each leg of the robot has three joints called coxa, femur, and tibia (see Fig. 3b), for which an appropriate control signal has to be generated to control the locomotion of the robot. In total, the robot has 18 controllable joints, and depending on the control signals, the robot can move with various motion gaits [18], e.g., tripod, quadruped, wave, and pentapod. During the locomotion, each leg is either in a swing phase to reach a new foothold or in the stance phase in which it supports the body. The motion gait prescribes the order in which the swing and support phases alternate for the individual legs; hence, all the legs must work in coordination to simultaneously achieve the desired behavior. The hexapod walking robot is thus used for benchmarking the proposed learning method, where the CPG network has to learn to generate control signals that realize the locomotion control of the robot with the tripod motion gait.

Figure 3: (a) Hexapod robot with the numbered legs. (b) Schema of the leg. Each leg consists of three parts – Coxa, Femur, and Tibia.

4.1   Normalization layer

The proposed normalization layer is based on early experiments with randomly parametrized CPG networks, which in most cases end up oscillating or converge to a static behavior. The static behavior is caused by the stable fixed points that may appear in the corresponding dynamic system. Therefore, we propose to employ a sufficient condition for the CPG network to be free of stable fixed points.

Condition. For a CPG network of N units, if all the values of the tonic input c_i^µ, where i ∈ N and µ ∈ {e, f}, are from the range [c_min, c_max] and

    w_{fe} < \frac{c_{min}}{c_{max}} (1 + \beta) - \max_{i \in N} \sum_j w_{ij},    (7)

    w_{fe} > 1 + T_r / T_a,    (8)

then the CPG network has no stable fixed point.

Proof. First, we state the adapted theorem from [7].

Theorem. Assume that for some i and k (i ≠ k)

    c_i (1 + \beta) - \sum_j^{2N} a_{ij} c_j > 0,    (9)

    c_k (1 + \beta) - \sum_j^{2N} a_{kj} c_j > 0,    (10)

    a_{ik} > 1 + T_r / T_a,    (11)

then the CPG network has no stable fixed point. The term {a_{ij}} = A^{(2N,2N)} is a matrix of the form

    A = \begin{pmatrix} W & w_{fe} I \\ w_{fe} I & W \end{pmatrix}    (12)
Learning Central Pattern Generator Network with Back-Propagation Algorithm                                                                                119

      and c = (ce , c f ), where I is the identity matrix of the same               The condition (13) implies F > 0, because w f e must be
      dimensions as W .                                                             greater than zero and the following condition must hold
                                                                                    too
         Since the CPGs should act as independent units, it is
      intuitive that each extensor-flexor neuron pair (a CPG) is                                                   cmax N
                                                                                                                   cmin ∑
                                                                                                         1+β >            wi j .                  (21)
      able to oscillate on its own. Thus, a weaker form of the                                                          j
      theorem is used, where the following conditions must hold
      for each i-th CPG:                                                            Now, we define variable ε > 0 that
                      cei                  1 N                                                                   cmax N
                     cif
                         (1 + β ) −
                                           cf
                                              ∑ wi j cej > w f e             (13)                      1+β =
                                                                                                                 cmin ∑
                                                                                                                        wi j + ε                  (22)
                                            i      j                                                                  j

                     cif            1 N                                             and substitute the right side of (22) into F(cmin ) and
                                    cei ∑
                         (1 + β ) −       wi j c fj > w f e                  (14)
                     cei                j                                           F(cmax )
                                            w f e > 1 + Tr /Ta .             (15)                                    cmin
                                                                                                          F(cmax ) =      ε,                      (23)
                                                                                                                     cmax
         Now, we can focus on the effect of the tonic input c. For                                        F(cmin ) = ε.                           (24)
      any parametrization W, β , Tr , Ta , w f e we can find a vector
      c that would break these conditions. Let’s relax the prob-                    Since ccmax
                                                                                            min
                                                                                                ∈ (0, 1] and ε > 0, the expression F(cmax ) al-
      lem by clipping the values of c into the range [cmin , cmax ]                 ways minimizes (20). Therefore
      where cmin > 0. Then, it must become independent on the
      mutable c vector to simplify the system of conditions. This                                              c0i = cmax .                       (25)
      can be done by substituting c with such c−    i that minimizes                  After substituting c0i into (17) and then c−
                                                                                                                                 i into (13) we
      the left side expression of (13) or (14) for the i-th CPG.                    get
      W.l.o.g. we consider finding c− i just for (13) as
                                                                                                                    N
                                                                                                    cmin
                                     cei                1 N                                              (1 + β ) − ∑ wi j > w f e .              (26)
           c−
            i =      argmin           f
                                           (1 + β ) −
                                                        cif
                                                              ∑ wi j cej .   (16)                   cmax            j
                  c∈[cmin ,cmax ]2N ci                         j
                                                                                    Finally, to make this condition independent on the i-th
      Since all the parameters are positive and wii = 0, the min                    CPG, we can choose such an inequality (26) that has the
      argument in (16) decreases monotonically with decreasing                                            N
                                                                                    largest value of the ∑ wi j expression
      cei and increasing c fj values. Thus, we can substitute these                                       j
      variables with their respective extremes                                                                                         !
                                                                                                                                   N
                                                                                                     cmin
                                 e
                                c j ∈ R , j 6= i
                                        +                                                     wfe <
                                                                                                     cmax
                                                                                                          (1 + β ) − max
                                                                                                                     i∈N
                                                                                                                                ∑ wi j .          (27)
                                                                                                                                  j
                                ce = c
                                  i     min
                            c−
                             i = f                                          (17)   Combining (15) and (27) we get the desired (8) and (7).
                                c j = cmax , j 6= i
                                
                                 f                                                                                                      
                                 ci = c0i
that leaves just c_i^0 as the variable to minimize

$$F(c) = \frac{c_{min}}{c}(1 + \beta) - \frac{c_{max}}{c}\sum_j^N w_{ij}, \qquad (18)$$

$$c_i^0 = \underset{c_i^f \in [c_{min},\, c_{max}]}{\mathrm{argmin}}\, F(c_i^f). \qquad (19)$$

Notice that now, we are searching for a scalar value c_i^0 that minimizes the given expression.

The equation dF(c)/dc = 0 has a solution only if F has such parameters β, W, c_min, and c_max that make the function F constant. Since it is unlikely that such a parametrization will emerge during the learning, we consider that F does not have any local extrema in the range [c_min, c_max]. Therefore, the minimization (19) can be simplified to

$$c_i^0 = \mathrm{argmin}\{F(c_{min}), F(c_{max})\}. \qquad (20)$$

We integrate the conditions (7) and (8) into the BP framework by redefining the variables w_fe and β as functions

$$w_{fe}(\hat{w}_{fe}, T_r, T_a) = 1 + T_r/T_a + \exp(\hat{w}_{fe}), \qquad (28)$$

$$\beta(\hat{\beta}, w_{fe}, w^*) = (w_{fe} + w^*)\frac{c_{max}}{c_{min}} + \exp(\hat{\beta}) - 1, \qquad (29)$$

where ŵ_fe, β̂ ∈ R are new independent parameters and w^* is defined as

$$w^* = \max_{i \in N} \left( \sum_j w_{ij} \right). \qquad (30)$$

Then, the max operator is approximated by the differentiable smoothmax defined as

$$\mathrm{softmax}(x) = \frac{\exp(x)}{\sum \exp(x)}, \qquad (31)$$

$$\mathrm{smoothmax}(x) = \mathrm{softmax}(x) \cdot x. \qquad (32)$$
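The reparametrizations (28)–(29), the smoothmax surrogate (31)–(32), and the endpoint minimization (20) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation; the function names follow the paper's symbols, and everything else (argument order, the helper `c0`) is an assumption.

```python
import numpy as np

def w_fe(w_fe_hat, T_r, T_a):
    # Eq. (28): keeps w_fe above 1 + T_r/T_a for any real w_fe_hat.
    return 1.0 + T_r / T_a + np.exp(w_fe_hat)

def beta(beta_hat, w_fe_val, w_star, c_min, c_max):
    # Eq. (29): keeps beta above (w_fe + w*) * c_max/c_min - 1.
    return (w_fe_val + w_star) * c_max / c_min + np.exp(beta_hat) - 1.0

def smoothmax(x):
    # Eqs. (31)-(32): differentiable surrogate for max(x), used to
    # back-propagate through w* = max_i sum_j w_ij from Eq. (30).
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())        # shifted for numerical stability
    return float((e / e.sum()) @ x)

def c0(F, c_min, c_max):
    # Eq. (20): with no interior extrema of F on [c_min, c_max],
    # the minimizer is one of the two interval endpoints.
    return min((c_min, c_max), key=F)
```

For a well-separated input, `smoothmax` is close to the true maximum, while remaining differentiable with respect to every component, which is what the BP framework requires.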
Since all the parameters must be positive, the other parameters are defined as exponents of the underlying parameters as

$$T_a = \exp(\hat{T}_a),$$
$$T_r = \exp(\hat{T}_r), \qquad (33)$$
$$w_{ij} = \exp(\hat{w}_{ij}), \quad i \neq j,$$

where T̂_a, T̂_r, ŵ_ij ∈ R. The weights w_ij, i ≠ j, cannot reach zero during learning, but they can approach it.

The BP algorithm learns the proposed new parameters T̂_a, T̂_r, ŵ_ij, ŵ_fe, and β̂ that are later normalized by (28), (29), and (33).

4.2 Proposed Architecture and Inductive Learning

We propose to divide the CPG network into smaller sub-networks to reduce the searched parameter space. These sub-networks are independently learned and then merged into larger sub-networks until a single final network remains. The proposed learning of the CPG network is performed in three phases. First, we learn a single CPG to generate a signal for one joint, which gives us the shared parameters (w_fe, T_a, T_r, β). Then, six triplets of CPGs are learned to generate a control signal for the particular leg. Therefore, for each leg k ∈ [1, ..., 6], we get the parameters W^k and W_out^k, b_out^k. In the final phase, we connect all six CPG sub-networks into one. We choose to connect the CPG sub-networks only by the coxa-CPGs, as it is assumed this is enough for each CPG sub-network to synchronize. Therefore, for the subspace u^e = (u^e_coxa,1, ..., u^e_coxa,6, u^e_femur,1, ..., u^e_tibia,6) (and similarly for u^f), W ∈ R^(18×18) is organized as follows

$$W = \begin{pmatrix} W_{coxa,coxa} & W_{coxa,femur} & W_{coxa,tibia} \\ W_{femur,coxa} & 0 & W_{femur,tibia} \\ W_{tibia,coxa} & W_{tibia,femur} & 0 \end{pmatrix},$$

where W_ij, i ≠ j, is the matrix of the connections between the i-th and j-th joints that can be expressed as

$$W_{ij} = \begin{pmatrix} w_{ij}^1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & w_{ij}^6 \end{pmatrix},$$

where the weights {w_ij^k} = W^k are taken from the matrices parametrizing the previously learned CPG sub-networks.

For the rearranged vector u = (u^e_1, u^f_1, ..., u^e_6, u^f_6), the term W_out ∈ R^(18×36) is composed of the matrices W_out^k of the previously learned CPG networks that control the k-th leg

$$W_{out} = \begin{pmatrix} W_{out}^1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & W_{out}^6 \end{pmatrix}.$$

All the zeroes in the W and W_out matrices are unlearnable constants imposing a structure onto the CPG network.

4.3 Objective Function

The utilized loss function of the CPG network is defined as a positive distance of the output vector from the desired one

$$\mathcal{L}(y(t), d(t)) = \|y(t) - d(t)\|, \qquad (34)$$

where d(t) ∈ [0, 1]^18 is the target signal for each of the 18 robot's actuators at the time t.

During an early evaluation of the proposed learning, we observed that in many cases, the output signal has undesired lower-frequency harmonics. This caused the output signal to fit the target signal only for a couple of the first periods. We propose to address this issue by an additional term to the objective function (34)

$$+\,\|r - \omega\|, \qquad (35)$$

where r ∈ R^+ is a new hyperparameter and ω is an approximation of the fundamental frequency of the CPG oscillations that can be expressed as [13]

$$\omega = \frac{1}{T_a}\sqrt{\frac{(T_r + T_a)\beta - T_r w_{fe}}{T_r w_{fe}}}. \qquad (36)$$

The hyperparameter r should be equal to the fundamental frequency of the desired signal. However, since (36) is just an approximation, it might lead to undesired local minima. Therefore, we propose to switch off the regularization once the term (35) is lower than a predefined threshold.

5 Experimental evaluation

The proposed learning method has been experimentally verified using the rmsprop [19] algorithm, which is commonly used to learn recurrent neural networks. Since the following experiments are meant to benchmark and map the problems of the CPG network learning, we use a constant tonic input c = 1; therefore, c_min = c_max = 1. The initial state (u^e_init, v^e_init, u^f_init, v^f_init) is set to u^e_init = 0.1, u^f_init = −0.1, and v^e_init = v^f_init = 0. The target signal is formed of eighteen sequences of joint angles that were recorded for a course of five tripod gait cycles. The hexapod robot was driven by a default regular gait based on [20], which is suitable for traversing flat terrains, and it uses the inverse kinematics for following the prescribed triangular leg foot-tip trajectory. This 4.7-second-long record of all joint signals is sampled to 2350 equidistant data points, and each signal is further normalized in the range [0, 1], smoothed using Gaussian convolution to filter out signal peaks, and finally downsampled by the factor of 3.

Preliminary experiments have shown that the process of learning profoundly depends on the initial parameters, and in some runs, the BP algorithm seems to get stuck in local minima from which the learning becomes very slow. This observation is consistent with [17]. The performance of the
BP algorithm has been improved by adding the regularization term (35). After that, the learning is performed in the following three consecutive steps.

First, each single CPG unit is learned to generate the sinusoid sin(t/2) that has the same frequency as the fundamental frequency of the desired control signal, which is deterministically set to 3 Hz. The CPG is learned in 2000 epochs, each back-propagating a batch of 50 data points. Note that the number of needed epochs depends on the initial random parametrization.

Next, the parameters of the sinusoid generator are retrained to generate the desired joint control signals. The generator of each joint control is learned with 2000 epochs. We experimented with the stability of the learned limit cycle of the first leg by perturbing it, see Fig. 4a. Finally, the joint CPGs are connected as described in Sec. 4, with the non-diagonal values of W_coxa,coxa initialized to 0.5, and learned with 4000 epochs. We experimented with the stability of this final CPG network, and the results are depicted in Fig. 4b.

Figure 4: Squared signal errors of the first leg (a) and body (b) caused by perturbations of +0.3, +0.5, +0.7, and +0.9. The perturbations have been introduced only at the start by adding a constant value to all the trajectory components. For the leg CPG network (a), after 500 iterations, the errors vanished except for the +0.9 perturbation. The body CPG network (b) does not recover even after 500 iterations, except for the +0.3 perturbation.

A comparison of the desired control signal of the first leg and the learned signal is depicted in Fig. 5. The learned signal has a similar shape and the same frequency as the original signal. The binding between different triplets of the legs, which is the most difficult part, is shown in Fig. 6. We can see that the learned trajectory has a similar structure to the desired limit cycles. The trajectory also stays within its limit cycle; the trajectory was generated by six gait cycles and therefore traveled the limit cycle multiple times.

Figure 5: Signals controlling the joints of the first leg. Green ones are desired, and blue ones are generated by the CPG network. The time evolution is on the left, while the projected phase-space trajectories are on the right. Each row corresponds to one joint, i.e., coxa, femur, and tibia. In the phase-space column, the variable pairs from the top are coxa-femur, femur-tibia, and tibia-coxa.

Figure 6: Synchronization of multiple legs. On the left, the coxa-control trajectory for legs 1, 2, and 3 depicted in Fig. 3a. The original trajectory moves almost diagonally in the pictured cube and is "wrapped" by the learned trajectory. On the right, the tibia-control trajectory for legs 1, 2, and 3.

We deployed the resultant CPG locomotion controller on the real hexapod (see Fig. 3a) and compared it with the original controller [20] in 10 trials. The robot was requested to crawl on a flat surface for 10 s and then stop. The velocity of the robot was estimated using an external visual localization system based on the tracking of a visual marker [21] running at 25 Hz. Moreover, the robot's
stability was measured as the smoothness of the locomotion using an XSens MTi-30 inertial measurement unit (IMU) attached to the robot trunk. The variances of the vertical acceleration (Acc_z) and of the orientation (pitch and roll angles) of the robot's body are the selected indicators of the locomotion stability.

The recorded robot trajectories visualized in Fig. 7 show that there is a transition effect for our CPG locomotion controller at the beginning of the trajectory, where the CPG network starts to oscillate, which makes the robot's initial acceleration lower; however, the overall locomotion is smoother, as the velocity deviation is smaller.

The quantitative results are listed in Table 1 as the average values of the indicators. The results indicate that the performance of the CPG locomotion controller is similar to the implementation [20] based on the inverse kinematics (IKT).

Figure 7: (a) Used experimental setup of the robot with the tracking marker. (b) Visualization of the performed trajectories using the proposed CPG locomotion control and the reference controller.

Table 1: Experimental results

                 Unit            IKT [20]     CPG (Ours)
  Velocity       [m·s−1]         0.18±0.03    0.15±0.01
  Accz var.      [m·s−2]         18.31        23.03
  Pitch var.     ×10−3 [rad]     0.14         0.23
  Roll var.      ×10−3 [rad]     0.25         0.28

5.1 Lessons Learned and Discussion

During the experimental evaluation of the proposed learning of the CPG network, a couple of good practices on how to learn the sinusoid generator came up as follows.

1. It is better to learn the network in batches containing at most two periods.

2. If the CPG network is restarted to the initial state, it is good to ignore the transient states.

3. Since it is not important at which place the system enters the limit cycle, it is suitable to phase-shift the target signal so as to minimize the distance from the output signal.

The combination of sub-networks into one network has two difficulties. The parameters (w_fe, T_a, T_r, β) must be the same for the whole CPG network, but the sub-networks are trained independently; so, they can end up with different parameters. In our case, the parameters are similar because all the CPG sub-networks are based on one CPG sub-network. Thus, the BP algorithm is able to adjust them during the learning of the complete network. Another difficulty is the choice of the initial W_coxa,coxa weights. The higher the weights are, the stronger the coupling between the legs is. However, if the weight values are too high, the constraint (7) would be violated. Therefore, we used (7) to choose the initial W_coxa,coxa weights.

Even though the robustness is not the objective of the learning algorithm, it is a property of a single Matsuoka's oscillator [22]. This property translated well into our 3-unit CPG network (see Fig. 4a), where the network can recover from perturbations. In the real world, robustness helps to quickly react to simple temporal events, e.g., servo errors or feedback from the environment.

In this work, we chose a simple model with c_min = c_max = 1, i.e., we have a constant tonic input. A time-variant tonic input, however, introduces dynamic changes, as we can see in Fig. 8. In future work, we would like to use the tonic input to control the output of the CPG network dynamically.

6 Conclusion

In this paper, we propose a new methodology for learning a CPG network modeled by symmetrically connected neural oscillators. The method is based on a combination of the back-propagation learning algorithm, a normalization layer, and a regularization term, where the normalization layer prunes the parameter space of the CPG network from the undesired non-periodic results, and thus helps to speed up the learning process. The advantage of the proposed solution over the previous work on the CPG-based locomotion control is in the scalability of the method that enables creating a CPG network that can directly control each actuator without the need to employ the inverse kinematics. The proposed method has been successfully deployed in the locomotion control of the real hexapod walking robot.

The main properties of the proposed methodology arise from the idea that the proposed CPG network for the hexapod locomotion control is based on an architecture of the CPG connections that imitates the structure of the robot. The CPG is inductively learned by learning its parts and merging them. Therefore, the proposed method is promising to be easily extendable to other multi-legged robot bodies. Furthermore, since the proposed CPG network is learnable by the back-propagation algorithm, it can be integrated into more complex neural networks supporting back-propagation, which is a subject of our future work.

Acknowledgments – This work was supported by the
Czech Science Foundation (GAČR) under research project No. 18-18858S. The support of the Grant Agency of the CTU in Prague under grant No. SGS16/235/OHK3/3T/13 to Rudolf Szadkowski is also gratefully acknowledged.

Figure 8: Output of three randomly generated CPG units with c_min = 2 and c_max = 4. After each 50 iterations of the total 150 iterations, the tonic input is set to c^e = c^f = 2; c^e = 2, c^f = 4; and c^e = 2, c^f = 8, respectively. Note that the last setup violates the c_max constraint.

References

[1] E. Marder and D. Bucher, "Central pattern generators and the control of rhythmic movements," Current Biology, vol. 11, no. 23, pp. R986–R996, 2001.

[2] E. Marder, D. Bucher, D. J. Schulz, and A. L. Taylor, "Invertebrate central pattern generation moves along," Current Biology, vol. 15, no. 17, pp. 685–699, 2005.

[6] G. Zhong, L. Chen, Z. Jiao, J. Li, and H. Deng, "Locomotion control and gait planning of a novel hexapod robot using biomimetic neurons," IEEE Transactions on Control Systems Technology, vol. 26, no. 2, pp. 624–636, 2018.

[7] K. Matsuoka, "Sustained oscillations generated by mutually inhibiting neurons with adaptation," Biological Cybernetics, vol. 52, no. 6, pp. 367–376, 1985.

[8] B. A. Pearlmutter, "Gradient calculations for dynamic recurrent neural networks: A survey," IEEE Transactions on Neural Networks, vol. 6, no. 5, pp. 1212–1228, 1995.

[9] M. Hild and F. Pasemann, "Self-Adjusting Ring Modules (SARMs) for Flexible Gait Pattern Generation," in IEEE International Joint Conference on Artificial Intelligence (IJCAI), 2007, pp. 848–852.

[10] J. Hellgren, S. Grillner, and A. Lansner, "Computer simulation of the segmental neural network generating locomotion in lamprey by using populations of network interneurons," Biological Cybernetics, vol. 68, no. 1, pp. 1–13, Nov 1992.

[11] S. Steingrube, M. Timme, F. Wörgötter, and P. Manoonpong, "Self-organized adaptation of a simple neural circuit enables complex robot behaviour," Nature Physics, vol. 6, no. 3, pp. 224–230, 2010.

[12] K. Matsuoka, "Mechanisms of frequency and pattern control in the neural rhythm generators," Biological Cybernetics, vol. 56, no. 5, pp. 345–353, 1987.

[13] K. Matsuoka, "Analysis of a neural oscillator," Biological Cybernetics, vol. 104, no. 4, pp. 297–304, 2011.

[14] A. J. Ijspeert, A. Crespi, and J.-M. Cabelguen, "Simulation and robotics studies of salamander locomotion," Neuroinformatics, vol. 3, no. 3, pp. 171–195, 2005.

[15] W. Chen, G. Ren, J. Zhang, and J. Wang, "Smooth transition between different gaits of a hexapod robot via a central pattern generators algorithm," Journal of Intelligent & Robotic Systems, vol. 67, no. 3, pp. 255–270, Sep 2012.

[16] L. Righetti and A. J. Ijspeert, "Design methodologies for central pattern generators: an application to crawling humanoids," in Robotics: Science and Systems, 2006, pp. 191–198.

[17] K. Doya and S. Yoshizawa, "Adaptive neural oscillator using continuous-time back-propagation learning," Neural Networks, vol. 2, pp. 375–385, 1989.

[18] N. Porcino, "Hexapod gait control by a neural network," in IEEE International Joint Conference on Neural Networks (IJCNN), 1990, pp. 189–194.

[19] T. Tieleman and G. Hinton, "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude," COURSERA: Neural Networks for Machine Learning, vol. 4,
                                                                               no. 2, pp. 26–31, 2012.
       [3] A. J. Ijspeert, “Central pattern generators for locomotion
           control in animals and robots: A review,” Neural Networks,     [20] J. Mrva and J. Faigl, “Tactile sensing with servo drives
           vol. 21, no. 4, pp. 642–653, 2008.                                  feedback only for blind hexapod walking robot,” in 10th
                                                                               International Workshop on Robot Motion and Control (Ro-
       [4] R. D. Beer, H. J. Chiel, and J. C. Gallagher, “Evolution and
                                                                               MoCo), 2015, pp. 240–245.
           analysis of model CPGs for walking: II. General principles
           and individual variability,” Journal of Computational Neu-     [21] E. Olson, “AprilTag: A Robust and Flexible Visual Fiducial
           roscience, vol. 7, no. 2, pp. 119–147, 1999.                        System,” in IEEE International Conference on Robotics
                                                                               and Automation (ICRA), 2011, pp. 3400–3407.
       [5] H. Yu, H. Gao, L. Ding, M. Li, Z. Deng, and G. Liu,
           “Gait Generation With Smooth Transition Using CPG-             [22] M. M. Williamson, “Robot arm control exploiting natural
           Based Locomotion Control for Hexapod Walking Robot,”                dynamics,” Ph.D. dissertation, Massachusetts Institute of
           IEEE Transactions on Industrial Electronics, vol. 63, no. 9,        Technology, 1999.
           pp. 5488–5500, 2016.