<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Learning Central Pattern Generator Network with Back-Propagation Algorithm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rudolf J. Szadkowski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petr Čížek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Faigl</string-name>
          <email>faiglj@fel.cvut.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Czech Technical University in Prague</institution>
          ,
          <addr-line>Technicka 2, 16627 Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>2203</volume>
      <fpage>116</fpage>
      <lpage>123</lpage>
      <abstract>
        <p>An adaptable central pattern generator (CPG) that directly controls the rhythmic motion of a multi-legged robot must combine plasticity with sustainable periodicity. This combination requires an algorithm that searches the parametric space of the CPG and yields a non-stationary and non-divergent solution. We model the CPG with the pioneering Matsuoka's neural oscillator, which is (mostly) non-divergent and provides constraints ensuring non-stationarity. We embed these constraints into the CPG formulation, which we further implement as a layer of an artificial neural network. This makes the CPG learnable by the back-propagation algorithm while sustaining the desirable properties. Moreover, the proposed CPG can be integrated into more complex networks and trained under different optimization objectives. In addition to the theoretical properties of the developed system, its flexibility is demonstrated by the successful learning of the tripod motion gait and its practical deployment on a real hexapod walking robot.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The movement of legged robots relies on the synchronized
control of all of their joints. Since the joints are part of
the same body, the velocity of each joint depends on
the positions of all the robot's joints. The problem of
generating such synchronized control signals gets harder with
an increasing number of legs (or joints per leg).
A widely used generator of such signals is a system of
interconnected Central Pattern Generators (CPGs), which
can be described as two or more coupled oscillators.
CPGs appear in many vertebrates and
insects, where they are responsible for controlling rhythmic
motions such as swimming, walking, or respiration [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
They also appear in biologically inspired robotics, where
CPGs are used for the locomotion control of legged robots [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>A CPG network can be modeled as a non-linear
dynamic system with coupled variables. Such a system must be
parameterized so that it contains a stable limit cycle, but finding such a
parametrization is difficult because an analytical description of a
high-dimensional non-linear dynamic system is hard or
impossible to obtain. Moreover, even a small change in the
parameters can cause a sudden change in the system's
qualitative properties, which can range from chaotic to stationary,
with the desired periodic behavior lying somewhere in between.</p>
      <p>
        Parameters of CPG networks can be found
experimentally (i.e., tuned manually or automatically by
evolutionary algorithms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) or they can be heuristically
designed. Such design-dependent methods make CPG
networks difficult to scale to other robotic bodies or to
adapt to locomotion control in different environments.
The scaling problem can be partially bypassed by
precomputing a trajectory for each foot tip and employing
inverse kinematics to determine the control signals for the
particular leg's joints [
        <xref ref-type="bibr" rid="ref5">5, 6</xref>
        ]. However, the inverse
kinematics depends on the robot's body and on the identification of
parameters that have to be manually fine-tuned to
ensure proper behavior.
      </p>
      <p>The motivation for the presented approach is to develop
fully automatic CPG learning. This paper explores the
possibility of learning a CPG network modeled by
Matsuoka's neural oscillators [7] with the back-propagation
algorithm (BP). To boost the BP algorithm that learns the
desired locomotion control for our multi-legged walking
robot, we propose two methods for pruning the parameter
space of the CPG network.</p>
      <p>The particular contributions presented in this paper are
the following.</p>
      <p>• A normalization layer that prunes the parameter
space of parametrizations with stable stationary
solutions.
• An inductive learning method that exploits the
structure of the robot's body and further reduces the searched
parametric space.
• An experimental evaluation of the proposed learning
using a real hexapod walking robot, for which the
CPG network learned by the designed
algorithm exhibits successful locomotion control
following the tripod gait, with the CPG network
directly producing the control signal for each of the 18
actuators of the robot.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Different biomimetic approaches to producing rhythmic patterns, including CPGs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
Recurrent Neural Networks [8], and Self-Adjusting Ring
Modules [9], have been studied
and deployed for the locomotion control of robots [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in recent
years. These approaches differ mainly in the complexity of
the underlying model and span different levels of
abstraction, ranging from biomechanical models [10] simulating
membrane potentials and ion flows inside neurons down
to a model of two coupled neurons in mutual
inhibition [11]. Amongst them, CPGs based on Matsuoka's
neural oscillator [7] are the prevalent model.
Further details on Matsuoka's model are given in Section 3,
as we build on its properties [7, 12, 13] in our work.
      </p>
      <p>
        Deployment of CPG oscillators on legged robots is
also particularly difficult because of the different kinematics
and dynamics of each robot. Different amounts of
post-processing are used to translate the CPG outputs to joint
coordinates. Namely, approaches using inverse
kinematics [
        <xref ref-type="bibr" rid="ref5">5, 6</xref>
        ] suffer from the necessary hand fine-tuning of both
the CPG parameters and the kinematics. Besides,
existing approaches use a separate neural network as
a motor control unit [11] or use the CPG outputs directly as joint
angles [14]. Furthermore, CPGs can seamlessly switch
between different output patterns, and thus different gaits [15],
which further supports direct joint control. In our
work, we use a dedicated output layer to shape the
outputs of the CPGs, as we assume simple transformations of the
output signal are easier to learn by changing the parameters of
the output layer, while the gait change remains in charge of the
CPG.
      </p>
      <p>
        A parametrization of the oscillator can be found
experimentally, e.g., using evolutionary algorithms with a fitness
function minimizing energy consumption [11] or
maximizing velocity [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], or using parameter optimization [16].
Besides, a modified back-propagation algorithm has been
used on an adaptive neural oscillator in [17] to imitate an
external periodic signal with its output signal, but it fails
to sustain oscillations for complex waveforms. Further
works on constraining the parameters of CPGs to maintain
stable oscillations have been published [7, 12, 13, 16];
however, to the best of our knowledge, we are the first to teach
a network of CPGs to perform a locomotion gait of a
hexapod walking robot using back-propagation. Furthermore,
we propose two methods to prune the space of possible
CPG parameters.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Central Pattern Generator Network</title>
      <p>The CPG network used in this paper is based on
Matsuoka's neural oscillator [7], a pair of symmetrically connected adaptive neurons,
the extensor and the flexor, that imitate the behavior of biological
neurons, where, after peaking, the neuron starts to
repolarize until its activation drops to the resting potential. The features
of Matsuoka's neurons have been extensively studied; hence, the
necessary conditions under which the neural network
enters a stable stationary state [7], the effects of a time-variant
tonic input [12], and approximations of the oscillator's
fundamental frequency and amplitude [13] are well documented
in the literature. The description of the particular CPG
model used in this work is as follows.</p>
      <p>The i-th CPG unit is governed by the equations

Tr du_i^e/dt = c_i^e − u_i^e − β v_i^e − w_fe g(u_i^f) − ∑_j w_ij g(u_j^e), (1)
Ta dv_i^e/dt = g(u_i^e) − v_i^e, (2)
Tr du_i^f/dt = c_i^f − u_i^f − β v_i^f − w_fe g(u_i^e) − ∑_j w_ij g(u_j^f), (3)
Ta dv_i^f/dt = g(u_i^f) − v_i^f, (4)

where the subscript i ∈ N denotes the particular CPG and
the superscript μ ∈ {e, f} distinguishes the extensor and
flexor neurons, respectively. The tuple of variables
u_i^e, v_i^e describes the dynamics of the extensor neuron: the
variable u_i^e represents the activation of the neuron, and v_i^e
represents its self-inhibitory input, which makes the neuron
adaptive. Similarly, u_i^f, v_i^f describe the dynamics of the
flexor neuron. The function g is the rectifier</p>
      <p>g(x) = max(0, x), (5)

which is the activation function that adds non-linearity to the
system. Each neuron (i, μ) inhibits itself through the
variable v_i^μ scaled by the parameter β &gt; 0. The extensor-flexor
pair (i.e., the CPG unit) mutually inhibits itself through
the symmetric connection with the weight w_fe &gt; 0.
Finally, the CPG units are inter-connected with symmetric
inhibiting connections w_ij ∈ W with w_ij ≥ 0 and w_ii = 0,
where W is a symmetric matrix. The only source of
excitation for this CPG network is the tonic input c_i^e, c_i^f (≥ 0),
which is given externally. In general, the tonic input may
be time-dependent and can be used to regulate the output
of the CPG network [12]. Tr and Ta (both &gt; 0) are the
reaction times of their respective variables. The structure of
the CPG unit is visualized in Fig. 1.</p>
      <p>All the equations (1), (2), (3), and (4) are differentiable
except in the cases when u_i^μ = 0, since the rectifier is used as
the activation function. However, we assume this will not
cause any problems, because the same rectifier is used inside the
Rectified Linear Units (ReLU) that are widely used in
deep neural networks.</p>
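      <p>To make the dynamics (1)-(4) concrete, the following sketch integrates a single extensor-flexor pair with the forward Euler method. It is a minimal illustration, not code from the paper; the parameter values are assumed, chosen only to satisfy the oscillation condition w_fe &gt; 1 + Tr/Ta.</p>
      <preformat>
```python
import numpy as np

def g(x):
    # Rectifier (5), the activation function of the CPG neurons.
    return np.maximum(0.0, x)

def simulate_cpg_unit(Tr=0.25, Ta=0.5, beta=2.5, w_fe=2.5, c=1.0,
                      dt=0.002, steps=10000):
    """Forward-Euler integration of one Matsuoka extensor-flexor pair.

    The parameters are assumed example values; they satisfy the
    oscillation condition w_fe > 1 + Tr/Ta (here 2.5 > 1.5).
    """
    u_e, v_e, u_f, v_f = 0.1, 0.0, -0.1, 0.0  # asymmetric start breaks symmetry
    ys = np.empty(steps)
    for k in range(steps):
        # Activation dynamics (1), (3): tonic input c, self-inhibition beta*v,
        # and mutual inhibition w_fe between extensor and flexor.
        du_e = (c - u_e - beta * v_e - w_fe * g(u_f)) / Tr
        du_f = (c - u_f - beta * v_f - w_fe * g(u_e)) / Tr
        # Adaptation dynamics (2), (4).
        dv_e = (g(u_e) - v_e) / Ta
        dv_f = (g(u_f) - v_f) / Ta
        u_e += dt * du_e
        u_f += dt * du_f
        v_e += dt * dv_e
        v_f += dt * dv_f
        ys[k] = g(u_e) - g(u_f)  # one common choice of unit output
    return ys

y = simulate_cpg_unit()
```
</preformat>
      <p>With these assumed parameters, the pair settles into sustained antiphase oscillations after a short transient.</p>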
      <p>Note that, except for the tonic inputs c_i^e, c_i^f, only
inhibiting connections are used, because such a system is less prone
to becoming chaotic or divergent [13].
In this work, we consider the self-inhibitory inputs v^e, v^f
as hidden variables; we do not work with them outside of
the CPG network. The output layer combines the
activation variables u^e, u^f with the affine transformation

y = W_out u + b_out, (6)

where u = (u^e, u^f) and W_out ∈ R^{N×2N}, b_out ∈ R^{N×1} are the
learnable parameters. The connection of the CPG network
and the output layer is illustrated in Fig. 2.</p>
      <p>The main advantage of having W_out and b_out as learnable
parameters is that the BP algorithm can scale and
translate the limit cycle formed by the CPG network. Here, we
assume that these transformations are easier to learn by
changing the parameters of the output layer than by
changing the parameters of the CPG network, because a change
of any parameter of the CPG network can generally cause
a non-linear change in the amplitude, frequency, and shift
of the generated signals [6]. Another advantage of the
proposed output layer is that it can develop complex signals,
as it can combine the outputs of different CPGs.</p>
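      <p>The affine output layer (6) is a single matrix-vector product; the sketch below is an illustrative assumption (N = 3 units and random parameter values stand in for learned ones):</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)
N = 3  # number of CPG units (illustrative choice)

# Activations of the extensor and flexor neurons, u = (u^e, u^f).
u = np.concatenate([rng.uniform(0, 1, N), rng.uniform(0, 1, N)])  # shape (2N,)

# Learnable output-layer parameters: W_out in R^{N x 2N}, b_out in R^{N}.
W_out = rng.normal(size=(N, 2 * N))
b_out = rng.normal(size=N)

# Output layer (6): y = W_out u + b_out, one signal per joint.
y = W_out @ u + b_out
assert y.shape == (N,)
```
</preformat>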
    </sec>
    <sec id="sec-4">
      <title>4 Proposed Locomotion Control Learning</title>
      <p>In this section, we propose the normalization layer and
the inductive learning method adapted to learning a CPG
network for a hexapod walking robot, see Fig. 3a. Each leg
of the robot has three joints, called coxa, femur, and tibia
(see Fig. 3b), for which an appropriate control signal has
to be generated to control the locomotion of the robot. In
total, the robot has 18 controllable joints, and
depending on the control signals, the robot can move with various
motion gaits [18], e.g., tripod, quadruped, wave, and
pentapod. During locomotion, each leg is either in a swing
phase, to reach a new foothold, or in a stance phase, in
which it supports the body. The motion gait prescribes the
order in which the swing and stance phases alternate for the
individual legs; hence, all the legs must work in
coordination to achieve the desired behavior. The
hexapod walking robot is thus used for benchmarking the
proposed learning method, where the CPG network has to
learn to generate control signals that realize the
locomotion control of the robot with the tripod motion gait.
The proposed normalization layer is based on early
experiments with randomly parametrized CPG networks, which
in most cases either end up oscillating or converge to a static
behavior. The static behavior is caused by stable fixed
points that may appear in the corresponding dynamic
system. Therefore, we propose to employ a sufficient
condition for the CPG network to be free of stable fixed points.</p>
      <sec id="sec-4-1">
        <title>4.1 Normalization Layer</title>
        <p>Condition. For a CPG network of N units, if all the values
of the tonic input c_i^μ, where i ∈ N and μ ∈ {e, f}, are from
the range [c_min, c_max] and

w_fe &lt; (c_min/c_max)(1 + β) − max_{i∈N} (∑_j w_ij), (7)
w_fe &gt; 1 + Tr/Ta, (8)

then the CPG network has no stable fixed point.</p>
        <p>Proof. First, we state an adapted theorem from [7].</p>
        <p>Theorem. Assume that for some i and k (i ≠ k)

c_i(1 + β) − ∑_j^{2N} a_ij c_j &gt; 0, (9)
c_k(1 + β) − ∑_j^{2N} a_kj c_j &gt; 0, (10)
a_ik &gt; 1 + Tr/Ta, (11)

then the CPG network has no stable fixed point. The term
{a_ij} = A is the 2N×2N matrix of the form

A = [ W, w_fe I; w_fe I, W ], (12)

and c = (c^e, c^f), where I is the identity matrix of the same
dimensions as W.</p>
        <p>Since the CPGs should act as independent units, it is
intuitive that each extensor-flexor neuron pair (a CPG) is
able to oscillate on its own. Thus, a weaker form of the
theorem is used, where the following conditions must hold
for each i-th CPG:

(c_i^e/c_i^f)(1 + β) − (1/c_i^f) ∑_j w_ij c_j^e &gt; w_fe, (13)
(c_i^f/c_i^e)(1 + β) − (1/c_i^e) ∑_j w_ij c_j^f &gt; w_fe, (14)
w_fe &gt; 1 + Tr/Ta. (15)</p>
        <p>Now, we can focus on the effect of the tonic input c. For
any parametrization W, β, Tr, Ta, w_fe, we can find a vector
c that breaks these conditions. Let us relax the
problem by clipping the values of c into the range [c_min, c_max],
where c_min &gt; 0. Then, the system of conditions must be made
independent of the mutable vector c. This
can be done by substituting c with the c_i^− that minimizes
the left side of (13) or (14) for the i-th CPG.
W.l.o.g., we consider finding c_i^− just for (13), as

c_i^− = argmin_{c ∈ [c_min, c_max]^{2N}} (c_i^e/c_i^f)(1 + β) − (1/c_i^f) ∑_j w_ij c_j^e. (16)

Since all the parameters are positive and w_ii = 0, the min
argument in (16) decreases monotonically with decreasing
c_i^e and increasing c_j^e values. Thus, we can substitute these
variables with their respective extremes,

c_i^e = c_min, c_j^e = c_max, (18)

which leaves just c_i^f = c as the variable to minimize:

F(c) = (c_min/c)(1 + β) − (c_max/c) ∑_j w_ij, (17)
c_i^0 = argmin_{c_i^f ∈ [c_min, c_max]} F(c_i^f). (19)

Notice that now we are searching for a scalar value c_i^0 that
minimizes the given expression.</p>
        <p>The equation dF(c)/dc = 0 has a solution only if F has such
parameters β, W, c_min, and c_max that make the function F
constant. Since it is unlikely that such a parametrization
will emerge during the learning, we consider that F does not
have any local extremes in the range [c_min, c_max].
Therefore, the minimization (19) can be simplified to

c_i^0 = argmin{F(c_min), F(c_max)}. (20)</p>
        <p>The condition (13) implies F &gt; 0, because w_fe must be
greater than zero, and the following condition must hold
too:

1 + β &gt; (c_max/c_min) ∑_j w_ij. (21)

Now, we define a variable ε &gt; 0 such that

1 + β = (c_max/c_min) ∑_j w_ij + ε, (22)

and substitute the right side of (22) into F(c_min) and
F(c_max):

F(c_max) = (c_min/c_max) ε, (23)
F(c_min) = ε. (24)

Since c_min/c_max ∈ (0, 1] and ε &gt; 0, the expression F(c_max)
always minimizes (20). Therefore,

c_i^0 = c_max. (25)

After substituting c_i^0 into (17) and then c_i^− into (13), we
get

(c_min/c_max)(1 + β) − ∑_j w_ij &gt; w_fe. (26)

Finally, to make this condition independent of the i-th
CPG, we can choose the inequality (26) that has the
largest value of the ∑_j w_ij expression:

w_fe &lt; (c_min/c_max)(1 + β) − max_{i∈N} (∑_j w_ij). (27)

Combining (15) and (27), we get the desired (8) and (7).</p>
        <p>We integrate the conditions (7) and (8) into the BP
framework by redefining the variables w_fe and β as the
functions

w_fe(ŵ_fe, Tr, Ta) = 1 + Tr/Ta + exp(ŵ_fe), (28)
β(β̂, w_fe, w*) = (w_fe + w*) c_max/c_min + exp(β̂) − 1, (29)

where ŵ_fe, β̂ ∈ R are new independent parameters and w*
is defined as

w* = max_{i∈N} (∑_j w_ij). (30)

Then, the max operator is approximated by the
differentiable smoothmax defined via

softmax(x) = exp(x) / ∑ exp(x). (31)

Since all the parameters must be positive, the other parameters
are defined as exponents of underlying parameters:

Ta = exp(T̂a), Tr = exp(T̂r), w_ij = exp(ŵ_ij), i ≠ j, (33)

where T̂a, T̂r, ŵ_ij ∈ R. The weights w_ij, i ≠ j cannot reach
zero during learning, but they can approach it.</p>
        <p>The BP algorithm learns the proposed new
parameters T̂a, T̂r, ŵ_ij, ŵ_fe, and β̂, which are then normalized
by (28), (29), and (33).</p>
      </sec>
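      <p>The reparametrization (28)-(30) and (33) can be checked numerically; the sketch below (with illustrative input values, not taken from the paper) maps unconstrained parameters to constrained CPG parameters and verifies that conditions (7) and (8) then hold by construction:</p>
      <preformat>
```python
import math

def normalize(w_fe_hat, beta_hat, Tr_hat, Ta_hat, w_star, c_min, c_max):
    """Map unconstrained parameters to CPG parameters satisfying (7) and (8).

    w_star stands for max_i sum_j w_ij as in (30); here it is given directly.
    """
    Ta = math.exp(Ta_hat)                      # (33): positivity by construction
    Tr = math.exp(Tr_hat)                      # (33)
    w_fe = 1 + Tr / Ta + math.exp(w_fe_hat)    # (28): guarantees w_fe > 1 + Tr/Ta
    beta = (w_fe + w_star) * c_max / c_min + math.exp(beta_hat) - 1  # (29)
    return w_fe, beta, Tr, Ta

c_min, c_max, w_star = 1.0, 1.0, 0.5  # constant tonic input, as in the experiments
w_fe, beta, Tr, Ta = normalize(0.3, -0.2, -1.0, -0.5, w_star, c_min, c_max)

# Condition (8): w_fe exceeds 1 + Tr/Ta.
assert w_fe > 1 + Tr / Ta
# Condition (7): (c_min/c_max)(1 + beta) - w_star stays above w_fe.
assert (c_min / c_max) * (1 + beta) - w_star > w_fe
```
</preformat>
      <p>By design, substituting (29) into (7) leaves a strictly positive margin of (c_min/c_max) exp(β̂), so the check succeeds for any real-valued inputs.</p>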
      <sec id="sec-4-2">
        <title>4.2 Proposed Architecture and Inductive Learning</title>
        <p>We propose to divide the CPG network into smaller
sub-networks to reduce the searched parameter space. These
sub-networks are learned independently and then merged
into larger sub-networks until a single final network
remains. The proposed learning of the CPG network is
performed in three phases. First, we learn a single CPG
to generate a signal for one joint, which gives us the
shared parameters (w_fe, Ta, Tr, β). Then, six triplets of
CPGs are learned to generate the control signal for the
particular leg. Therefore, for each leg k ∈ [1, . . . , 6], we
get the parameters W^k and W_out^k, b_out^k. In the final phase,
we connect all six CPG sub-networks into one. We
choose to connect the CPG sub-networks only by the coxa-CPGs,
as it is assumed this is enough for each CPG
sub-network to synchronize. Therefore, for the subspace u^e =
(u^e_coxa,1, . . . , u^e_coxa,6, u^e_femur,1, . . . , u^e_tibia,6) (and similarly for
u^f), W ∈ R^{18×18} is organized as follows.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Coupling and Output Matrices</title>
        <p>The coupling matrix has the block structure

W = [ W_coxa,coxa, W_coxa,femur, W_coxa,tibia;
      W_femur,coxa, 0, W_femur,tibia;
      W_tibia,coxa, W_tibia,femur, 0 ],

where W_ij, i ≠ j, is the matrix of the connections between
the i-th and j-th joints, expressed as the diagonal matrix

W_ij = diag(w_ij^1, . . . , w_ij^6),

where the weights {w_ij} = W^k are taken from the matrices
parametrizing the previously learned CPG sub-networks.</p>
        <p>For the rearranged vector u = (u_1^e, u_1^f, . . . , u_6^e, u_6^f), the
term W_out ∈ R^{18×36} is composed of the matrices W_out^k of
the previously learned CPG networks that control the k-th
leg:

W_out = diag(W_out^1, . . . , W_out^6).</p>
      </sec>
      <sec id="sec-4-7">
        <title>Learning Objective and Frequency Regularization</title>
        <p>All the zeroes in the W and W_out matrices are unlearnable
constants imposing a structure onto the CPG network. The
objective function (34) fits the network output to the target
signal d(t) ∈ [0, 1]^{18} for each of the 18 robot's actuators at
time t.</p>
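      <p>A sketch of how the block matrices described above can be assembled (the per-leg weight values are random placeholders standing in for the pretrained sub-network parameters; the block layout follows the text):</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(1)
LEGS = 6
# One non-negative weight w_ij^k per leg k for each joint pair i != j
# (joints: 0 = coxa, 1 = femur, 2 = tibia); placeholder values.
w = {(i, j): rng.uniform(0.1, 0.5, LEGS)
     for i in range(3) for j in range(3) if i != j}

def block(i, j):
    # Inter-joint blocks W_ij = diag(w_ij^1, ..., w_ij^6); the shared key
    # (min, max) keeps W_ij = W_ji, so W stays symmetric.
    return np.diag(w[(min(i, j), max(i, j))])

W_coxa_coxa = np.full((LEGS, LEGS), 0.5)   # inter-leg coupling via coxa-CPGs
np.fill_diagonal(W_coxa_coxa, 0.0)         # w_ii = 0

zero = np.zeros((LEGS, LEGS))
W = np.block([
    [W_coxa_coxa, block(0, 1), block(0, 2)],
    [block(1, 0), zero,        block(1, 2)],
    [block(2, 0), block(2, 1), zero],
])  # 18 x 18, ordering (coxa_1..6, femur_1..6, tibia_1..6)

# W must be a symmetric, non-negative matrix with zero diagonal.
assert W.shape == (18, 18)
assert np.allclose(W, W.T)
assert np.all(np.diag(W) == 0)

# W_out in R^{18x36}: block-diagonal in the per-leg readouts W_out^k (3 x 6).
W_out = np.zeros((18, 36))
for k in range(LEGS):
    W_out[3 * k:3 * k + 3, 6 * k:6 * k + 6] = rng.normal(size=(3, 6))
```
</preformat>
      <p>The zero blocks remain unlearnable constants; only the non-zero entries would be exposed to the BP algorithm.</p>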
        <p>During early evaluations of the proposed learning, we
observed that in many cases the output signal has
undesired lower-frequency harmonics. This caused the output
signal to fit the target signal only for a couple of the first
periods. We propose to address this issue by an additional
term in the objective function (34),

+ ||r − ω||, (35)

where r ∈ R+ is a new hyperparameter and ω is an
approximation of the fundamental frequency of the CPG
oscillations that can be expressed as [13]

ω = (1/Ta) √( ((Tr + Ta)β − Tr w_fe) / (Tr w_fe) ). (36)

The hyperparameter r should be equal to the
fundamental frequency of the desired signal. However, since (36)
is just an approximation, it might lead to undesired local
minima. Therefore, we propose to switch off the
regularization once the term (35) drops below a predefined
threshold.</p>
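        <p>The frequency approximation (36) is a closed-form expression in the normalized parameters; a small sketch computing it (the parameter values are assumed for illustration):</p>
      <preformat>
```python
import math

def fundamental_frequency(Tr, Ta, beta, w_fe):
    """Approximate fundamental frequency (36) of the Matsuoka oscillations [13].

    The square-root argument is positive whenever (Tr + Ta) * beta
    exceeds Tr * w_fe, which holds for the example values below.
    """
    return math.sqrt(((Tr + Ta) * beta - Tr * w_fe) / (Tr * w_fe)) / Ta

# Assumed example parameters; the regularizer (35) would penalize |r - omega|.
omega = fundamental_frequency(Tr=0.25, Ta=0.5, beta=2.5, w_fe=2.5)
```
</preformat>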
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Experimental Evaluation</title>
      <p>
        The proposed learning method has been experimentally
verified using the rmsprop [19] algorithm, which is
commonly used to learn recurrent neural networks. Since the
following experiments are meant to benchmark and map
the problems of CPG network learning, we use a constant
tonic input c = 1; therefore, c_min = c_max = 1. The initial
state (u_init^e, v_init^e, u_init^f, v_init^f) is set to u_init^e = 0.1, u_init^f = −0.1,
and v_init^e = v_init^f = 0. The target signal is formed of eighteen
sequences of joint angles that were recorded over a course
of five tripod gait cycles. The hexapod robot was driven by
a default regular gait based on [20], which is suitable for
traversing flat terrains and uses inverse kinematics
to follow the prescribed triangular leg foot-tip
trajectory. This 4.7-second-long record of all joint signals is
sampled to 2350 equidistant data points, and each signal
is further normalized to the range [0, 1], smoothed using
a Gaussian convolution to filter out signal peaks, and finally
downsampled by a factor of 3.
      </p>
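      <p>The preprocessing of the recorded target signals can be sketched as follows. This is a reconstruction under stated assumptions: the smoothing width sigma is a guess, scipy's gaussian_filter1d stands in for the paper's Gaussian convolution, and the input here is a synthetic stand-in for the recorded joint angles.</p>
      <preformat>
```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def preprocess(signals, sigma=3.0, factor=3):
    """signals: (18, 2350) raw joint angles sampled over 4.7 s.

    Normalize each signal to [0, 1], smooth it with a Gaussian kernel
    to filter out peaks, and downsample by the given factor.
    """
    lo = signals.min(axis=1, keepdims=True)
    hi = signals.max(axis=1, keepdims=True)
    normed = (signals - lo) / (hi - lo)            # per-signal range [0, 1]
    smoothed = gaussian_filter1d(normed, sigma=sigma, axis=1)
    return smoothed[:, ::factor]                   # keep every 3rd sample

# Synthetic stand-in for the recorded 18-channel, 2350-sample signal.
raw = np.sin(np.linspace(0, 10 * np.pi, 2350))[None, :] * np.arange(1, 19)[:, None]
target = preprocess(raw)
assert target.shape == (18, 784)
```
</preformat>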
      <p>Preliminary experiments have shown that the process of
learning profoundly depends on the initial parameters, and in
some runs, the BP algorithm seems to get stuck in local
minima, from which the learning becomes very slow. This
observation is consistent with [17]. The performance of the
BP algorithm has been improved by adding the
regularization term (35). After that, the learning is performed in the
three following consecutive steps.</p>
      <p>First, each single CPG unit is learned to generate the
sinusoid sin(t/2), which has the same frequency as the
fundamental frequency of the desired control signal,
deterministically set to 3 Hz. The CPG is learned in
2000 epochs, each back-propagating a batch of 50
data points. Note that the number of needed epochs
depends on the initial random parametrization.</p>
      <p>Next, the parameters of the sinusoid generator are
retrained to generate the desired joint control signals. The
generator of each joint control is learned over 2000
epochs. We experimented with the stability of the learned
limit cycle of the first leg by perturbing it, see Fig. 4a.
Finally, the joint CPGs are connected as described in Section 4,
with the non-diagonal values of W_coxa,coxa initialized to 0.5,
and learned over 4000 epochs. We experimented with the
stability of this final CPG network, and the results are depicted
in Fig. 4b.</p>
      <p>A comparison of the desired control signal of the first
leg and the learned signal is depicted in Fig. 5. The learned
signal has a similar shape and the same frequency as the
original signal. The binding between the different leg triplets,
the most difficult part, is shown in Fig. 6. We can
see that the learned trajectory has a structure similar to the
desired limit cycles. The trajectory also stays within its
limit cycle; the trajectory was generated over six gait cycles
and therefore traveled the limit cycle multiple times.</p>
      <p>We deployed the resulting CPG locomotion controller
on the real hexapod (see Fig. 3a) and compared it with the
original controller [20] in 10 trials. The robot was
requested to crawl on a flat surface for 10 s and then stop.
The velocity of the robot was estimated using an
external visual localization system based on the tracking of a visual
marker [21] running at 25 Hz. Moreover, the robot's
stability was measured as the smoothness of the locomotion,
using an XSens MTi-30 inertial measurement unit (IMU)
attached to the robot trunk. The variances of the vertical
acceleration (Accz) and the orientation (pitch and roll angles)
of the robot's body are the selected indicators of
locomotion stability.</p>
      <p>The recorded robot trajectories visualized in Fig. 7 show
that there is a transition effect for our CPG locomotion
controller at the beginning of the trajectory, where the
CPG network starts to oscillate, which makes the robot's
initial acceleration lower; however, the overall locomotion is
smoother, as the velocity deviation is smaller.</p>
      <p>The quantitative results are listed in Table 1 as
the average values of the indicators. The results indicate that the
performance of the CPG locomotion controller is similar
to that of the implementation [20] based on inverse kinematics
(IKT).</p>
      <sec id="sec-5-2">
        <title>Discussion</title>
        <p>Table 1: Experimental results, reporting the average velocity, Accz variance, pitch variance, and roll variance for both controllers.</p>
        <p>During the experimental evaluation of the proposed
learning of the CPG network, a couple of good practices for
learning the sinusoid generator emerged.
1. It is better to learn the network in batches containing
at most two periods.
2. If the CPG network is restarted to the initial state, it
is good to ignore the transient states.
3. Since it is not important where the system
enters the limit cycle, it is suitable to phase-shift the
target signal so as to minimize its distance from the
output signal.</p>
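        <p>The third practice, aligning the target to the point where the system enters the limit cycle, can be sketched as a search over circular shifts (an illustrative reconstruction, not the paper's exact procedure):</p>
      <preformat>
```python
import numpy as np

def best_phase_shift(target, output):
    """Return the circular shift of `target` minimizing the L2 distance to `output`.

    Both signals are assumed to be one period long and equally sampled.
    """
    errors = [np.linalg.norm(np.roll(target, s) - output)
              for s in range(len(target))]
    return int(np.argmin(errors))

t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
target = np.sin(t)
output = np.sin(t - 2 * np.pi * 25 / 100)   # output lags the target by 25 samples
shift = best_phase_shift(target, output)
assert shift == 25
```
</preformat>
      <p>Training against the shifted target then penalizes only the waveform mismatch, not the irrelevant entry phase.</p>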
        <p>The combination of the sub-networks into one network has two
difficulties. The parameters (w_fe, Ta, Tr, β) must be the
same for the whole CPG network, but the sub-networks
are trained independently, so they can end up with
different parameters. In our case, the parameters are similar
because all the CPG sub-networks are based on one CPG
sub-network; thus, the BP algorithm is able to adjust them
during the learning of the complete network. Another
difficulty is the choice of the initial W_coxa,coxa weights. The
higher the weights, the stronger the coupling
between the legs. However, if the weight values are too high,
the constraint (7) would be violated. Therefore, we used
(7) to choose the initial W_coxa,coxa weights.</p>
        <p>Even though robustness is not an objective
of the learning algorithm, it is a property of the single
Matsuoka oscillator [22]. This property translated well into
our 3-unit CPG network (see Fig. 4a), where the network
can recover from perturbations. In the real world,
robustness helps to react quickly to simple temporal events, e.g.,
servo errors, or to feedback from the environment.</p>
        <p>In this work, we chose a simple model with c_min =
c_max = 1, i.e., a constant tonic input. A
time-variant tonic input, however, introduces dynamic changes,
as we can see in Fig. 8. In future work, we would
like to use the tonic input to dynamically control the output of the CPG
network.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>In this paper, we propose a new methodology for
learning a CPG network modeled by symmetrically connected
neural oscillators. The method is based on a combination
of the back-propagation learning algorithm, a normalization
layer, and a regularization term, where the normalization
layer prunes the parameter space of the CPG network
of the undesired non-periodic solutions and thus helps to
speed up the learning process. The advantage of the
proposed solution over the previous work on CPG-based
locomotion control is the scalability of the method, which
enables creating a CPG network that can directly
control each actuator without the need to employ
inverse kinematics. The proposed method has been
successfully deployed in the locomotion control of a real
hexapod walking robot.</p>
      <p>The main properties of the proposed methodology arise
from the idea that the proposed CPG network for
hexapod locomotion control is based on an architecture of
CPG connections that imitates the structure of the robot.
The CPG is inductively learned by learning its parts and
merging them. Therefore, the proposed method promises
to be easily extendable to other multi-legged robot
bodies. Furthermore, since the proposed CPG network
is learnable by the back-propagation algorithm, it can be
integrated into more complex neural networks supporting
back-propagation, which is a subject of our future work.</p>
      <p>Acknowledgments – This work was supported by the
Czech Science Foundation (GAČR) under research project
No. 18-18858S. The support of the Grant Agency of the
CTU in Prague under grant No. SGS16/235/OHK3/3T/13
to Rudolf Szadkowski is also gratefully acknowledged.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Marder</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Bucher</surname>
          </string-name>
          , “
          <article-title>Central pattern generators and the control of rhythmic movements</article-title>
          ,
          <source>” Current Biology</source>
          , vol.
          <volume>11</volume>
          , no.
          <issue>23</issue>
          , pp.
          <fpage>R986</fpage>
          -
          <lpage>R996</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Marder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Schulz</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , “
          <article-title>Invertebrate central pattern generation moves along</article-title>
          ,
          <source>” Current Biology</source>
          , vol.
          <volume>15</volume>
          , no.
          <issue>17</issue>
          , pp.
          <fpage>R685</fpage>
          -
          <lpage>R699</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Ijspeert</surname>
          </string-name>
          , “
          <article-title>Central pattern generators for locomotion control in animals and robots: A review</article-title>
          ,
          <source>” Neural Networks</source>
          , vol.
          <volume>21</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>642</fpage>
          -
          <lpage>653</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Beer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Chiel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Gallagher</surname>
          </string-name>
          , “
          <article-title>Evolution and analysis of model CPGs for walking: II. General principles and individual variability</article-title>
          ,
          <source>” Journal of Computational Neuroscience</source>
          , vol.
          <volume>7</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>119</fpage>
          -
          <lpage>147</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Deng</surname>
          </string-name>
          , and G. Liu, “
          <article-title>Gait Generation With Smooth Transition Using CPG-Based Locomotion Control for Hexapod Walking Robot</article-title>
          ,
          <source>” IEEE Transactions on Industrial Electronics</source>
          , vol.
          <volume>63</volume>
          , no.
          <issue>9</issue>
          , pp.
          <fpage>5488</fpage>
          -
          <lpage>5500</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>