S. Krajči (ed.): ITAT 2018 Proceedings, pp. 116–123 CEUR Workshop Proceedings Vol. 2203, ISSN 1613-0073, c 2018 Rudolf J. Szadkowski, Petr Čížek, and Jan Faigl Learning Central Pattern Generator Network with Back-Propagation Algorithm Rudolf J. Szadkowski1 , Petr Čížek1 , and Jan Faigl1 Czech Technical University in Prague, Technicka 2, 16627 Prague, Czech Republic, szadkrud,petr.cizek,faiglj@fel.cvut.cz An adaptable central pattern generator (CPG) that di- Parameters of the CPG networks can be found ex- rectly controls the rhythmic motion of multilegged robot perimentally (i.e., tuned manually or automatically by must combine plasticity and sustainable periodicity. This evolutionary algorithms [4]) or they can be heuristically combination requires an algorithm that searches the para- designed. Such design-dependent methods make CPG metric space of the CPG and yields a non-stationary and networks difficult to scale on other robotic bodies or non-divergent solution. We model the CPG with the pi- adapt to the locomotion control in different environments. oneering Matsuoka’s neural oscillator which is (mostly) The scaling problem can be partially bypassed by pre- non-divergent and provides constraints ensuring non- computing a trajectory for each foot tip and employing in- stationarity. We embed these constraints into the CPG verse kinematics to determine the control signals for the formulation which we further implemented as a layer of particular leg’s joints [5, 6]. However, the inverse kine- an artificial neural network. This enables the CPG to be matic depends on the robot’s body, and identification of learnable by back-propagation algorithm while sustaining the parameters that have to be manually fine-tuned to en- the desirable properties. Moreover, the proposed CPG can sure a proper behavior. be integrated into more complex networks and trained un- The motivation for the presented approach is to develop der different optimization objectives. In addition to the a fully automatic CPG learning and this paper explores the theoretical properties of the developed system, its flexibil- possibility of learning a CPG network modeled by Mat- ity is demonstrated in successful learning of the tripod mo- suoka’s neural oscillators [7] with back-propagation al- tion gait with its practical deployment on the real hexapod gorithm (BP). To boost the BP algorithm that learns the walking robot. desired locomotion control for our multi-legged walking robot, we propose two methods pruning the parameter space of the CPG network. 1 Introduction The particular contributions presented in the paper are considered as follows. The movement of legged robots relies on synchronized • A normalization layer that prunes the parameter control of each its joint. Since these joints are part of space from parametrization with stable stationary so- the same body, the velocity of each joint is dependent on lutions. the position of all robot’s joints. The problem of generat- ing such synchronized control signals gets harder with in- • An inductive learning method that exploits the struc- creasing number of legs (or the number of joints per leg). ture of robot’s body and further reduces the searched A widely used generator of such signals is a system of parametric space. interconnected Central Pattern Generators (CPGs). The system based on CPGs can be described as two or more • Experimental evaluation of the proposed learning us- coupled oscillators. CPGs appear in many vertebrates and ing real hexapod walking robot for which the pro- insects where they are responsible for controlling rhythmic posed CPG network learned by the designed algo- motions, such as swimming, walking or respiration [1, 2]. rithm exhibits successful locomotion control follow- It also appears in biologically inspired robotics, where ing tripod gait, where the developed CPG network CPGs are used for locomotion control of legged robots [3]. directly produces the control signal for each of 18 ac- A CPG network can be modeled as a non-linear dy- tuators of the robot. namic system with coupled variables. Such a non-linear dynamic system is parameterized in the way that it con- tains a stable limit cycle, but finding such a parametriza- 2 Related Work tion is difficult because an analytical description of the high-dimensional non-linear dynamic system is hard or Different biomimetic approaches including CPGs [1], Re- impossible. Moreover, even a small change in the param- current Neural Networks [8] or Self-Adjusting Ring Mod- eters can result in a sudden change of the system’s quali- ules [9] to produce rhythmic patterns have been studied tative properties that can range from chaotic to stationary and deployed for locomotion control of robots [3] in recent and somewhere between is the desired periodic behavior. years. These approaches differ mainly in the complexity of Learning Central Pattern Generator Network with Back-Propagation Algorithm 117 the underlying model and have different levels of abstrac- Extensor neuron Flexor neuron tion ranging from biomechanical models [10] simulating to other CPGs membrane potentials and ion flows inside neurons, down d vie d Ta wij wij vif Ta to a model of two coupled neurons in a mutual inhibi- dt dt tion [11]. Amongst them, the CPGs based on Matsuoka’s β β neural oscillator [7] are being used as the prevalent model. Further details on the Matsuoka’s model are in Section 3 d uei wf e wf e d Tr ufi Tr as we built on its properties [7, 12, 13] in our work. dt dt cei cfi Deployment of the CPG oscillators on legged robots is from other CPGs also particularly difficult because of different kinematics and dynamics of each robot. A different amount of post- processing is used to translate the CPG outputs to joint Figure 1: CPG unit connected to the CPG network. coordinates. Namely, approaches using inverse kinemat- ics [5, 6] suffer from necessary hand fine-tuning of both the parameters of CPG as-well-as kinematics. Besides, ex- 3.1 CPG Model isting approaches are using the separate neural network as The dynamics of the CPG network with N units can be motor control unit [11] or use CPG outputs directly as joint described by a set of equations angles [14]. Furthermore, CPGs can seamlessly switch be- N tween different output patterns, thus different gaits [15] Tr u̇ei = −uei − w f e g(uif ) − β vei − ∑ wi j g(uej ) + cei , (1) which further supports the direct joint control. In our j=1 work, we use a dedicated output layer to shape the out- Ta v̇ei = g(uei ) − vei , (2) puts of CPGs as we assume simple transformations of the N output signal are easier to learn by changing parameters of Tr u̇if = −uif − w f e g(uei ) − β vif − ∑ wi j g(u fj ) + cif , (3) the output layer while the gait change is in charge of the j=1 CPG. Parametrization of the oscillator can be found experi- Ta v̇if = g(uif ) − vif , (4) mentally, e.g., using evolutionary algorithms with fitness where the subscript i ∈ N denotes the particular CPG and function minimizing energy consumption [11], maximiz- the superscript µ ∈ {e, f } distinguishes the extensor and ing the velocity [4], or using parameter optimization [16]. flexor neurons, respectively. Each tuple of the variables Besides, a modified back-propagation algorithm has been uei , vei describes the dynamics of the extensor neuron. The used on an adaptive neural oscillator in [17] to imitate an variable uei represents activation of the neuron and vei rep- external periodic signal by its output signal, but it fails resents its self-inhibitory input, which makes this neuron to sustain oscillations for complex waveforms. Further adaptive. Similarly uif , vif describe the dynamics of the works on the parameter constraining of CPGs to maintain flexor neuron. The function g is a rectifier stable oscillations have been published [7,12,13,16]; how- ever, to the best of our knowledge we are the first to teach g(x) = max(0, x) (5) a network of CPGs to perform a locomotion gait of a hexa- pod walking robot using back-propagation. Furthermore, that is an activation function that adds non-linearity to the we propose two methods to prune the space of possible system. Each neuron (i, µ) inhibits itself through the vari- µ CPG parameters. able vi scaled by the parameter β > 0. The extensor-flexor pair (i.e., the CPG unit) mutually inhibits itself through the symmetric connection with the weight w f e > 0. Fi- 3 Central Pattern Generator Network nally, the CPG units are inter-connected with the symmet- ric inhibiting connections wi j ∈ W for wi j ≥ 0 and wii = 0, The CPG network used in this paper is based on the Mat- where W is a symmetric matrix. The only source of exci- suoka’s neural oscillator [7]. Matsuoka’s neural oscillator tation for this CPG network is the tonic input cei , cif (≥ 0) is a pair of symmetrically connected adaptive neurons, ex- which is given externally. In general, the tonic input may tensor, and flexor, that imitate the behavior of biological be time-dependent and can be used to regulate the output neurons where after peaking, the neuron starts to repolar- of the CPG network [12]. Tr and Ta (both > 0) are reac- ize until its activation drops to resting potential. Features tion times for their respective variables. The structure of of Matsuoka’s neurons were extensively studied; hence, the CPG unit is visualized in Fig. 1. necessary conditions under which the neural network en- All the equations (1), (2), (3), and (4) are differentiable µ ters the stable stationary state [7], effects of time-variant except the cases when ui = 0, since the rectifier is used as tonic input [12], and approximation of oscillator’s funda- the activation function. However, we assume this will not mental frequency and amplitude [13] are well documented cause any problems because the rectifier is used inside the in the literature. The description of the particular CPG Rectified Linear Units (ReLU), which are widely used in model used in this work is as follows. deep neural networks. 118 Rudolf J. Szadkowski, Petr Čížek, and Jan Faigl θC θT ur m Fe Coxa ia θF Tib (a) (b) Figure 2: The CPG network connected to the output layer Out. Notice the output y is not fed back to the network. Figure 3: (a) Hexapod robot with the numbered legs. (b) Also notice that the self-inhibitory input u is not connected Schema of the leg. Each leg consists of three parts – Coxa, to Out. Femur, and Tibia. Note that except tonic inputs cei , cif , there are used only order in which the swing and support phases alternate for inhibiting connections, because such a system is less prone individual legs; hence, all the legs must work in coordina- to become chaotic or divergent [13]. tion to simultaneously achieve the desired behavior. The hexapod walking robot is thus used for benchmarking the proposed learning method, where the CPG network has to 3.2 Output layer learn to generate control signals that realize the locomo- tion control of the robot with the tripod motion gait. In this work, we consider the self-inhibitory inputs ve , v f as hidden variables, we do not work with them outside of the CPG network. The output layer combines the activa- 4.1 Normalization layer tion variables ue , u f with the affine transformation The proposed normalization layer is based on early exper- y = Wout u + bout , (6) iments with randomly parametrized CPG networks which in most cases ends up oscillating or converges to a static where u = (ue , u f ) and Wout ∈ RN×2N , bout ∈ RN×1 are the behavior. The static behavior is caused by the stable fixed learnable parameters. The connection of the CPG network points that may appear in the corresponding dynamic sys- and the output layer is illustrated in Fig. 2. tem. Therefore, we propose to employ a sufficient condi- The main advantage of having Wout and bout as learnable tion for the CPG network to be free of stable fixed points. parameters are that the BP algorithm can scale and trans- Condition. For a CPG network of N units, if all the values late the limit cycle formed by the CPG network. Here, we µ of the tonic input ci , where i ∈ N and µ ∈ {e, f }, are from assume that these transformations are easier to learn by the range [cmin , cmax ] and changing the parameters of the output layer than by chang- ! ing parameters of the CPG network. It is because a change cmin N of any parameter of the CPG network can generally cause wfe < (1 + β ) − max ∑ wi j , (7) cmax i∈N j a non-linear change in the amplitude, frequency, and shift of the generated signals [6]. Another advantage of the pro- w f e > 1 + Tr /Ta (8) posed output layer is that it can develop complex signals then the CPG network has no stable fixed point. as it can combine outputs from different CPGs. Proof. First, we state adapted theorem from [7]. 4 Proposed Locomotion Control Learning Theorem. Assume that for some i and k (i 6= k) 2N In this section, we propose the normalization layer and in- ci (1 + β ) − ∑ ai j c j > 0, (9) ductive learning method adapted to learning a CPG net- j work for a hexapod walking robot, see Fig. 3a. Each leg 2N of the robot has three joints called coxa, femur, and tibia ck (1 + β ) − ∑ ak j c j > 0, (10) (see Fig. 3b) for which an appropriate control signal has j to be generated to control the locomotion of the robot. In aik > 1 + Tr /Ta , (11) the total, the robot has 18 controllable joints and depend- ing on the control signals; the robot can move with various then the CPG network has no stable fixed point. The term motion gaits [18], e.g., tripod, quadruped, wave, and pen- {ai j } = A(2N,2N) is a matrix of the form tapod. During the locomotion, each leg is either in a swing   W w f eI phase to reach a new foothold or in the stance phase in A= (12) which it supports the body. The motion gait prescribes the w f eI W Learning Central Pattern Generator Network with Back-Propagation Algorithm 119 and c = (ce , c f ), where I is the identity matrix of the same The condition (13) implies F > 0, because w f e must be dimensions as W . greater than zero and the following condition must hold too Since the CPGs should act as independent units, it is intuitive that each extensor-flexor neuron pair (a CPG) is cmax N cmin ∑ 1+β > wi j . (21) able to oscillate on its own. Thus, a weaker form of the j theorem is used, where the following conditions must hold for each i-th CPG: Now, we define variable ε > 0 that cei 1 N cmax N cif (1 + β ) − cf ∑ wi j cej > w f e (13) 1+β = cmin ∑ wi j + ε (22) i j j cif 1 N and substitute the right side of (22) into F(cmin ) and cei ∑ (1 + β ) − wi j c fj > w f e (14) cei j F(cmax ) w f e > 1 + Tr /Ta . (15) cmin F(cmax ) = ε, (23) cmax Now, we can focus on the effect of the tonic input c. For F(cmin ) = ε. (24) any parametrization W, β , Tr , Ta , w f e we can find a vector c that would break these conditions. Let’s relax the prob- Since ccmax min ∈ (0, 1] and ε > 0, the expression F(cmax ) al- lem by clipping the values of c into the range [cmin , cmax ] ways minimizes (20). Therefore where cmin > 0. Then, it must become independent on the mutable c vector to simplify the system of conditions. This c0i = cmax . (25) can be done by substituting c with such c− i that minimizes After substituting c0i into (17) and then c− i into (13) we the left side expression of (13) or (14) for the i-th CPG. get W.l.o.g. we consider finding c− i just for (13) as N cmin cei 1 N (1 + β ) − ∑ wi j > w f e . (26) c− i = argmin f (1 + β ) − cif ∑ wi j cej . (16) cmax j c∈[cmin ,cmax ]2N ci j Finally, to make this condition independent on the i-th Since all the parameters are positive and wii = 0, the min CPG, we can choose such an inequality (26) that has the argument in (16) decreases monotonically with decreasing N largest value of the ∑ wi j expression cei and increasing c fj values. Thus, we can substitute these j variables with their respective extremes ! N cmin  e c j ∈ R , j 6= i  + wfe < cmax (1 + β ) − max i∈N ∑ wi j . (27)  j ce = c i min c− i = f (17) Combining (15) and (27) we get the desired (8) and (7). c j = cmax , j 6= i   f  ci = c0i We integrate the conditions (7) and (8) into the BP that leaves just c0i as the variable to minimize framework by redefining the variables w f e and β as func- tions cmin cmax N c ∑ F(c) = (1 + β ) − wi j , (18) w f e (ŵ f e , Tr , Ta ) = 1 + Tr /Ta + exp(ŵ f e ), (28) c j cmax β (β̂ , w f e , w∗ ) = (w f e + w∗ ) + exp(β̂ ) − 1, (29) c0i = argmin F(cif ). (19) cmin f ci ∈[cmin ,cmax ] where ŵ f e , β̂ ∈ R are new independent parameters and w∗ Notice that now, we are searching a scalar value c0i that is defined as ! minimizes the given expression. N The equation dF(c) dc = 0 has a solution only if F has such w∗ = max i∈N ∑ wi j . (30) j parameters β ,W, cmin , and cmax that make the function F constant. Since it is unlikely that such a parametrization Then, the max operator is approximated by the differen- will emerge during the learning, we consider F does not tiable smoothmax defined as have any local extremes in the range [cmin , cmax ]. There- exp(x) fore, the minimization (19) can be simplified to softmax(x) = , (31) ∑ exp(x) c0i = argmin{F(cmin ), F(cmax )}. (20) smoothmax(x) = softmax(x)x. (32) 120 Rudolf J. Szadkowski, Petr Čížek, and Jan Faigl Since all the parameters must be positive, other parameters 4.3 Objective Function are defined as exponent of the underlying parameter as The utilized loss function of the CPG network is defined Ta = exp(T̂a ), as a positive distance of the output vector from the desired one Tr = exp(T̂r ), (33) wi j = exp(ŵi j ), i 6= j, L (y(t), d(t)) = ky(t) − d(t)k , (34) where T̂a , T̂r , ŵi j ∈ R. The weights wi j , i 6= j cannot reach where d(t) ∈ [0, 1]18 is the target signal for each of 18 zero during learning, but they can approach it. robot’s actuators at the time t. During early evaluation of the proposed learning, we The BP algorithm learns the proposed new param- observed that in many cases, the output signal has unde- eters T̂a , T̂r , ŵi j , ŵ f e , and β̂ that are later normalized sired lower frequency harmonics. This caused the output by (28), (29), and (33). signal to fit the target signal only for a couple of the first periods. We propose to address this issue by an additional 4.2 Proposed Architecture and Inductive Learning term to the objective function (34) We propose to divide the CPG network into smaller sub- + kr − ωk , (35) networks to reduce the search parameter space. These sub-networks are independently learned and then merged where r ∈ R+ is a new hyperparameter and ω is an ap- into larger sub-networks until a single final network re- proximation of the fundamental frequency of the CPG os- mains. The proposed learning of the CPG network is cillations that can be expressed as [13] performed in three phases. First, we learn a single CPG s 1 (Tr + Ta )β − Tr w f e to generate a signal for one joint which gives us the ω= . (36) shared parameters (w f e , Ta , Tr , β ). Then, six triplets of Ta Tr w f e CPGs are learned to generate a control signal for the par- The hyperparameter r should be equal to the fundamen- ticular leg. Therefore, for each leg k ∈ [1, . . . , 6], we tal frequency of the desired signal. However, since (36) get parameters W k and Wout k , bk . In the final phase, out is just an approximation; it might lead to undesired local we connect all six CPG sub-networks into one. We minima. Therefore, we propose to switch off the regu- choose to connect CPG sub-networks only by coxa-CPGs larization once the term (35) is lesser than a predefined as it is assumed this is enough for each CPG sub- threshold. network to synchronize. Therefore, for the subspace ue = (uecoxa,1 , . . . , uecoxa,6 , uef emur,1 , . . . , utibia,1 e ) (and similarly for f u ), W ∈ R 18×18 is organized as follows 5 Experimental evaluation   Wcoxa,coxa Wcoxa, f emur Wcoxa,tibia The proposed learning method has been experimentally  W =  W f emur,coxa 0  W f emur,tibia  , verified using rmsprop [19] algorithm, which is com- monly used to learn recurrent neural networks. Since the Wtibia,coxa Wtibia, f emur 0 following experiments are meant to benchmark and map problems of the CPG network learning, we use a constant where Wi j , i 6= j is the matrix of the connections between tonic input c = 1. Therefore, cmin = cmax = 1. The initial the i-th and j-th joints that can be expressed as f f f state (ueinit , veinit , uinit , vinit ) is set to ueinit = 0.1, uinit = −0.1,   f f w1i j 0 0 and vinit = vinit = 0. The target signal is formed of eighteen Wi j =  0 ··· 0 , sequences of joint angles that were recorded for a course 0 0 w6i j of five tripod gait cycles. The hexapod robot was driven by a default regular gait based on [20], which is suitable for where the weights {wkij } = W k are taken from the matrices traversing flat terrains, and it uses the inverse kinematics parametrizing the previously learned CPG sub-networks. for following the prescribed triangular leg foot-tip trajec- For the rearranged vector u = (ue1 , u1f , . . . , ue6 , u6f ), the tory. This 4.7 seconds long record of all joint signals is term Wout ∈ R18×36 is composed of the matrices Wout k of sampled to 2350 equidistant data points, and each signal the previously learned CPG network that controls the k-th is further normalized in the range [0, 1], smoothed using leg Gaussian convolution to filter out signal peaks, and finally  1  downsampled by the factor of 3. Wout 0 0 Preliminary experiments have shown that the process of Wout =  0 ··· 0 . 6 learning profoundly depends on initial parameters and in 0 0 Wout some runs, the BP algorithm seems to stuck in local min- All the zeroes in the W and Wout matrices are unlearnable ima from which the learning becomes very slow. This ob- constants imposing a structure onto the CPG network. servation is consistent with [17]. The performance of the Learning Central Pattern Generator Network with Back-Propagation Algorithm 121 0.5 Perturbation +0.3 0.4 +0.5 0.3 +0.7 +0.9 0.2 coxa[rad] 0.50 0.50 0.1 0.25 0.25 0.0 44 45 46 −1.2 −1.0 −0.8 −0.6 0 20 40 60 500 520 540 560 iterations femur[rad] −0.75 −0.75 −1.00 −1.00 (a) −1.25 −1.25 44 45 46 1.2 1.4 1.6 Perturbation tibia[rad] 1.50 1.50 4 +0.3 +0.5 1.25 1.25 +0.7 2 +0.9 44 45 46 0.2 0.4 0.6 t[sec] [rad] 0 0 20 40 60 500 520 540 560 iterations Figure 5: Signals controlling the joints of the first leg. (b) Green ones are desired, and blue ones are generated by the CPG network. The time evolution is on the left, while Figure 4: Squared signal errors of the first leg (a) and body projected phase-space trajectories are on the right. Each (b) caused by perturbations. The perturbations have been row corresponds to one joint, i.e., coxa, femur, and tibia. introduced only at the start by adding a constant value to In the phase-space column, the variable pairs from the top all the trajectory components. For the leg CPG network are coxa-femur, femur-tibia, and tibia-coxa. (a) after 500 iterations, the errors vanished except for +0.9 perturbation. The body CPG network (b) does not recover even after 500 iterations except for +0.3 perturbation. learned trajectory 1.6 original trajectory 1.6 1.4 1.2 BP algorithm has been improved by adding the regulariza- tion term (35). After that, the learning is performed in the 0.2 1.4 1.6 three following consecutive steps. 1.4 1.2 1.2 First, each single CPG unit is learned to generate the 0.4 sinusoid sin(t/2) that has the same frequency as the fun- damental frequency of the desired control signal, which 0.6 is deterministically set to 3 Hz. The CPG is learned in 2000 epochs, each back-propagating a batch of size 50 data-points. Note that the number of the needed epochs 0.2 depends on the initial random parametrization. 0.4 0.2 Next, the parameters of the sinusoid generator is re- 0.4 0.6 trained to generate the desired joint control signals. The 0.6 generator of each joint control is learned with 2000 epochs. We experimented with the stability of the learned limit cycle of the first leg by perturbing it, see Fig. 4a. Fi- Figure 6: Synchronization of multiple legs. On the left, nally, the joints CPGs are connected as described in Sec. 4 the coxa-control trajectory for legs 1, 2, and 3 depicted in with non-diagonal values of Wcoxa,coxa initialized to 0.5, Fig. 3a. The original trajectory moves almost diagonally and learned with 4000 epochs. We experimented with the in the pictured cube and is “wrapped” by the learned tra- stability of this final CPG network and results are depicted jectory. On the right, the tibia-control trajectory for the in Fig. 4b. legs 1, 2, and 3. A comparison of the desired control signal of the first leg and the learned signal is depicted in Fig. 5. The learned signal has a similar shape and the same frequency as the We deployed the resultant CPG locomotion controller original signal. Binding between different triplets of the on the real hexapod (see Fig. 3a) and compared with the legs, the most difficult part is shown in Fig. 6. We can original controller [20] in 10 trials. The robot was re- see that the learned trajectory has a similar structure to the quested to crawl on flat surface for 10 s and then stop. desired limit cycles. The trajectory also stays within its The velocity of the robot was estimated using an exter- limit cycle; the trajectory was generated by six gait-cycles, nal visual localization system based on tracking of visual therefore, traveled the limit cycle multiple times. marker [21] running with 25 Hz. Moreover, the robot’s 122 Rudolf J. Szadkowski, Petr Čížek, and Jan Faigl Combination of sub-networks into one network has two difficulties. The parameters (w f e , Ta , Tr , β ) must be the 0.5 0.0 same for the whole CPG network, but the sub-networks y [m] are trained independently; so, they can end up with dif- −0.5 Original trajectory Learned trajectory ferent parameters. In our case, the parameters are similar −2.0 −1.5 −1.0 −0.5 0.0 because all the CPG sub-networks are based on one CPG x [m] sub-network. Thus, the BP algorithm is able to adjust them (a) (b) during the learning of the complete network. Another dif- ficulty is the choice of the initial Wcoxa,coxa weights. The Figure 7: (a) Used experimental setup of the robot with higher the weights are, the stronger is the coupling be- the tracking marker. (b) Visualization of the performed tween the legs. However, if the weight values are too high, trajectories using the proposed CPG locomotion control the constraint (7) would be violated. Therefore, we used and the reference controller. (7) to choose the initial Wcoxa,coxa weights. Even though that the robustness is not the objective stability was measured as smoothness of the locomotion of the learning algorithm, it is a property of single Mat- using an XSens MTi-30 inertial measurement unit (IMU) suoka’s oscillator [22]. This property translated well into attached to the robot trunk. The variances in vertical ac- our 3-unit CPG network (see Fig. 4a) where the network celeration (Accz ) and the orientation (pitch and roll angles) can recover from perturbations. In the real world, robust- of the robot’s body are the selected indicators of the loco- ness helps quickly react to simple temporal events, e.g., motion stability. servo errors, or feedback from the environment. The recorded robot trajectories visualized in Fig. 7 show In this work, we chose a simple model with cmin = that there is a transition effect for our CPG locomotion cmax = 1, i.e., we have a constant tonic input. The time- controller at the beginning of the trajectory where the variant tonic input; however, introduces dynamic changes CPG network starts to oscillate which makes the robot ini- as we can see in Fig. 8. In the future work, we would tial acceleration lower; however, the overall locomotion is like to use the tonic input to control the output of the CPG smoother, as the velocity deviation is smaller. network dynamically. The quantitative results are listed in Table 1 as aver- age values of the indicators. The results indicate that the 6 Conclusion performance of the CPG locomotion controller is similar to the implementation [20] based on inverse kinematics In this paper, we propose a new methodology for learn- (IKT). ing a CPG network modeled by symmetrically connected neural oscillators. The method is based on a combination Table 1: Experimental results of the back-propagation learning algorithm, normalization layer, and regularization term, where the normalization Unit IKT [20] CPG (Ours) layer prunes the parameters spaces of the CPG network from the undesired non-periodic results, and thus help to Velocity [m·s−1 ] 0.18±0.03 0.15±0.01 speed up the learning process. The advantage of the pro- Accz var. [m·s−2 ] 18.31 23.03 posed solution over the previous work on the CPG-based Pitch var. ×10−3 [rad] 0.14 0.23 locomotion control is in the scalability of the method that Roll var. ×10−3 [rad] 0.25 0.28 enables to create such a CPG network that can directly control each actuator without the need to employ the in- verse kinematics. The proposed method has been success- fully deployed in the locomotion control of the real hexa- 5.1 Lessons Learned and Discussion pod walking robot. During the experimental evaluation of the proposed learn- The main properties of the proposed methodology arise ing of the CPG network, a couple of good practices how from the idea that the proposed CPG network for the hexa- to learn the sinusoid generator came up as follows. pod locomotion control is based on the architecture of the CPG connections that imitates the structure of the robot. 1. It is better to learn the network in batches containing The CPG is inductively learned by learning its parts and at most two periods. merging them. Therefore, the proposed method is promis- 2. If the CPG network is restarted to the initial state, it ing to be easily extendable to other multi-legged robot is good to ignore the transient states. bodies. Furthermore, since the proposed CPG network is learnable by the back-propagation algorithm, it can be 3. Since it is not important at which place the system integrated into more complex neural networks supporting enters the limit cycle, it is suitable to phase-shift the back-propagation, which is a subject of our future work. target signal; so, to minimize the distance from the output signal. Acknowledgments – This work was supported by the Learning Central Pattern Generator Network with Back-Propagation Algorithm 123 [6] G. Zhong, L. Chen, Z. Jiao, J. Li, and H. Deng, “Loco- motion control and gait planning of a novel hexapod robot using biomimetic neurons,” IEEE Transactions on Control Systems Technology, vol. 26, no. 2, pp. 624–636, 2018. 4 [7] K. Matsuoka, “Sustained oscillations generated by mutu- ue − uf 2 ally inhibiting neurons with adaptation,” Biological Cyber- 0 netics, vol. 52, no. 6, pp. 367–376, 1985. 0 20 40 60 80 100 120 140 [8] B. A. Pearlmutter, “Gradient calculations for dynamic re- iterations current neural networks: A survey,” IEEE Transactions on (a) Neural networks, vol. 6, no. 5, pp. 1212–1228, 1995. [9] M. Hild and F. Pasemann, “Self-Adjusting Ring Modules (SARMs) for Flexible Gait Pattern Generation.” in IEEE 4 International Joint Conference on Artificial Intelligence ue − uf 2 (IJCAI), 2007, pp. 848–852. 0 [10] J. Hellgren, S. Grillner, and A. Lansner, “Computer simula- tion of the segmental neural network generating locomotion 0 20 40 60 80 100 120 140 iterations in lamprey by using populations of network interneurons,” Biological Cybernetics, vol. 68, no. 1, pp. 1–13, Nov 1992. (b) [11] S. Steingrube, M. Timme, F. Wörgötter, and P. Manoon- pong, “Self-organized adaptation of a simple neural circuit 2 enables complex robot behaviour,” Nature physics, vol. 6, ue − uf 1 no. 3, pp. 224–230, 2010. 0 [12] K. Matsuoka, “Mechanisms of frequency and pattern con- 0 20 40 60 80 100 120 140 trol in the neural rhythm generators,” Biological Cybernet- iterations ics, vol. 56, no. 5, pp. 345–353, 1987. (c) [13] ——, “Analysis of a neural oscillator,” Biological Cyber- netics, vol. 104, no. 4, pp. 297–304, 2011. Figure 8: Output of three randomly generated CPG units [14] A. J. Ijspeert, A. Crespi, and J.-M. Cabelguen, “Simulation with cmin = 2; cmax = 4. After each 50 iterations of total and robotics studies of salamander locomotion,” Neuroin- 150 iterations, the tonic input is set to ce = c f = 2; ce = formatics, vol. 3, no. 3, pp. 171–195, 2005. 2, c f = 4; ce = 2, c f = 8, respectively. Note that the last [15] W. Chen, G. Ren, J. Zhang, and J. Wang, “Smooth transi- setup violates the cmax constraint. tion between different gaits of a hexapod robot via a cen- tral pattern generators algorithm,” Journal of Intelligent & Robotic Systems, vol. 67, no. 3, pp. 255–270, Sep 2012. Czech Science Foundation (GAČR) under research project [16] L. Righetti and A. J. Ijspeert, “Design methodologies for No. 18-18858S. The support of the Grant Agency of the central pattern generators: an application to crawling hu- CTU in Prague under grant No. SGS16/235/OHK3/3T/13 manoids,” in Robotics: Science and Systems, 2006, pp. to Rudolf Szadkowski is also gratefully acknowledged. 191–198. [17] K. Doya and S. Yoshizawa, “Adaptive neural oscillator using continuous-time back-propagation learning,” Neural References Networks, vol. 2, pp. 375–385, 12 1989. [1] E. Marder and D. Bucher, “Central pattern generators [18] N. Porcino, “Hexapod gait control by a neural network,” in and the control of rhythmic movements,” Current Biology, IEEE International Joint Conference on Neural Networks vol. 11, no. 23, pp. R986–R996, 2001. (IJCNN), 1990, pp. 189–194. [2] E. Marder, D. Bucher, D. J. Schulz, and A. L. Taylor, “In- [19] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide vertebrate central pattern generation moves along,” Current the gradient by a running average of its recent magnitude,” Biology, vol. 15, no. 17, pp. 685–699, 2005. COURSERA: Neural networks for machine learning, vol. 4, no. 2, pp. 26–31, 2012. [3] A. J. Ijspeert, “Central pattern generators for locomotion control in animals and robots: A review,” Neural Networks, [20] J. Mrva and J. Faigl, “Tactile sensing with servo drives vol. 21, no. 4, pp. 642–653, 2008. feedback only for blind hexapod walking robot,” in 10th International Workshop on Robot Motion and Control (Ro- [4] R. D. Beer, H. J. Chiel, and J. C. Gallagher, “Evolution and MoCo), 2015, pp. 240–245. analysis of model CPGs for walking: II. General principles and individual variability,” Journal of Computational Neu- [21] E. Olson, “AprilTag: A Robust and Flexible Visual Fiducial roscience, vol. 7, no. 2, pp. 119–147, 1999. System,” in IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 3400–3407. [5] H. Yu, H. Gao, L. Ding, M. Li, Z. Deng, and G. Liu, “Gait Generation With Smooth Transition Using CPG- [22] M. M. Williamson, “Robot arm control exploiting natural Based Locomotion Control for Hexapod Walking Robot,” dynamics,” Ph.D. dissertation, Massachusetts Institute of IEEE Transactions on Industrial Electronics, vol. 63, no. 9, Technology, 1999. pp. 5488–5500, 2016.