S. Krajči (ed.): ITAT 2018 Proceedings, pp. 116–123
CEUR Workshop Proceedings Vol. 2203, ISSN 1613-0073, © 2018 Rudolf J. Szadkowski, Petr Čížek, and Jan Faigl



              Learning Central Pattern Generator Network with Back-Propagation
                                         Algorithm

                                           Rudolf J. Szadkowski1 , Petr Čížek1 , and Jan Faigl1

                              Czech Technical University in Prague, Technicka 2, 16627 Prague, Czech Republic,
                                              {szadkrud, petr.cizek, faiglj}@fel.cvut.cz

An adaptable central pattern generator (CPG) that directly controls the rhythmic motion of a multi-legged robot must combine plasticity and sustainable periodicity. This combination requires an algorithm that searches the parametric space of the CPG and yields a non-stationary and non-divergent solution. We model the CPG with the pioneering Matsuoka's neural oscillator, which is (mostly) non-divergent and provides constraints ensuring non-stationarity. We embed these constraints into the CPG formulation, which we further implement as a layer of an artificial neural network. This enables the CPG to be learnable by the back-propagation algorithm while sustaining the desirable properties. Moreover, the proposed CPG can be integrated into more complex networks and trained under different optimization objectives. In addition to the theoretical properties of the developed system, its flexibility is demonstrated by the successful learning of the tripod motion gait and its practical deployment on a real hexapod walking robot.


1    Introduction

The movement of legged robots relies on the synchronized control of each of their joints. Since these joints are part of the same body, the velocity of each joint depends on the positions of all the robot's joints. The problem of generating such synchronized control signals gets harder with an increasing number of legs (or the number of joints per leg). A widely used generator of such signals is a system of interconnected Central Pattern Generators (CPGs). The system based on CPGs can be described as two or more coupled oscillators. CPGs appear in many vertebrates and insects, where they are responsible for controlling rhythmic motions, such as swimming, walking, or respiration [1, 2]. They also appear in biologically inspired robotics, where CPGs are used for the locomotion control of legged robots [3].

A CPG network can be modeled as a non-linear dynamic system with coupled variables. Such a non-linear dynamic system is parameterized in a way that it contains a stable limit cycle, but finding such a parametrization is difficult because an analytical description of the high-dimensional non-linear dynamic system is hard or impossible. Moreover, even a small change in the parameters can result in a sudden change of the system's qualitative properties, which can range from chaotic to stationary, and somewhere in between lies the desired periodic behavior.

Parameters of the CPG networks can be found experimentally (i.e., tuned manually or automatically by evolutionary algorithms [4]) or they can be heuristically designed. Such design-dependent methods make CPG networks difficult to scale to other robotic bodies or to adapt to the locomotion control in different environments. The scaling problem can be partially bypassed by pre-computing a trajectory for each foot tip and employing inverse kinematics to determine the control signals for the particular leg's joints [5, 6]. However, the inverse kinematics depends on the robot's body and requires identification of parameters that have to be manually fine-tuned to ensure a proper behavior.

The motivation for the presented approach is to develop a fully automatic CPG learning, and this paper explores the possibility of learning a CPG network modeled by Matsuoka's neural oscillators [7] with the back-propagation algorithm (BP). To boost the BP algorithm that learns the desired locomotion control for our multi-legged walking robot, we propose two methods for pruning the parameter space of the CPG network.

The particular contributions presented in the paper are as follows.

   • A normalization layer that prunes parametrizations with stable stationary solutions from the parameter space.

   • An inductive learning method that exploits the structure of the robot's body and further reduces the searched parametric space.

   • Experimental evaluation of the proposed learning using a real hexapod walking robot, for which the proposed CPG network learned by the designed algorithm exhibits successful locomotion control following the tripod gait, where the developed CPG network directly produces the control signal for each of the 18 actuators of the robot.


2    Related Work

Different biomimetic approaches to produce rhythmic patterns, including CPGs [1], Recurrent Neural Networks [8], or Self-Adjusting Ring Modules [9], have been studied and deployed for the locomotion control of robots [3] in recent years. These approaches differ mainly in the complexity of
the underlying model and have different levels of abstraction, ranging from biomechanical models [10] simulating membrane potentials and ion flows inside neurons, down to a model of two coupled neurons in a mutual inhibition [11]. Amongst them, the CPGs based on Matsuoka's neural oscillator [7] are the prevalent model. Further details on Matsuoka's model are given in Section 3, as we build on its properties [7, 12, 13] in our work.

Deployment of the CPG oscillators on legged robots is also particularly difficult because of the different kinematics and dynamics of each robot. A different amount of post-processing is used to translate the CPG outputs to joint coordinates. Namely, approaches using inverse kinematics [5, 6] suffer from the necessary hand fine-tuning of both the CPG parameters and the kinematics. Besides, existing approaches use a separate neural network as the motor control unit [11] or use the CPG outputs directly as joint angles [14]. Furthermore, CPGs can seamlessly switch between different output patterns, and thus different gaits [15], which further supports the direct joint control. In our work, we use a dedicated output layer to shape the outputs of the CPGs, as we assume simple transformations of the output signal are easier to learn by changing the parameters of the output layer, while the gait change is in charge of the CPG.

Parametrization of the oscillator can be found experimentally, e.g., using evolutionary algorithms with a fitness function minimizing energy consumption [11], maximizing the velocity [4], or using parameter optimization [16]. Besides, a modified back-propagation algorithm has been used on an adaptive neural oscillator in [17] to imitate an external periodic signal by its output signal, but it fails to sustain oscillations for complex waveforms. Further works on the parameter constraining of CPGs to maintain stable oscillations have been published [7, 12, 13, 16]; however, to the best of our knowledge, we are the first to teach a network of CPGs to perform a locomotion gait of a hexapod walking robot using back-propagation. Furthermore, we propose two methods to prune the space of possible CPG parameters.


3    Central Pattern Generator Network

The CPG network used in this paper is based on the Matsuoka's neural oscillator [7]. Matsuoka's neural oscillator is a pair of symmetrically connected adaptive neurons, extensor and flexor, that imitate the behavior of biological neurons, where, after peaking, the neuron starts to repolarize until its activation drops to the resting potential. The features of Matsuoka's neurons have been extensively studied; hence, the necessary conditions under which the neural network enters the stable stationary state [7], the effects of time-variant tonic input [12], and the approximation of the oscillator's fundamental frequency and amplitude [13] are well documented in the literature. The description of the particular CPG model used in this work is as follows.

Figure 1: CPG unit connected to the CPG network.

3.1   CPG Model

The dynamics of the CPG network with N units can be described by a set of equations

    T_r \dot{u}_i^e = -u_i^e - w_{fe} g(u_i^f) - \beta v_i^e - \sum_{j=1}^{N} w_{ij} g(u_j^e) + c_i^e,    (1)

    T_a \dot{v}_i^e = g(u_i^e) - v_i^e,    (2)

    T_r \dot{u}_i^f = -u_i^f - w_{fe} g(u_i^e) - \beta v_i^f - \sum_{j=1}^{N} w_{ij} g(u_j^f) + c_i^f,    (3)

    T_a \dot{v}_i^f = g(u_i^f) - v_i^f,    (4)

where the subscript i ∈ N denotes the particular CPG and the superscript µ ∈ {e, f} distinguishes the extensor and flexor neurons, respectively. Each tuple of the variables u_i^e, v_i^e describes the dynamics of the extensor neuron. The variable u_i^e represents the activation of the neuron, and v_i^e represents its self-inhibitory input, which makes the neuron adaptive. Similarly, u_i^f, v_i^f describe the dynamics of the flexor neuron. The function g is a rectifier

    g(x) = max(0, x)    (5)

that is an activation function that adds non-linearity to the system. Each neuron (i, µ) inhibits itself through the variable v_i^µ scaled by the parameter β > 0. The extensor-flexor pair (i.e., the CPG unit) mutually inhibits itself through the symmetric connection with the weight w_{fe} > 0. Finally, the CPG units are inter-connected with the symmetric inhibiting connections w_{ij} ∈ W for w_{ij} ≥ 0 and w_{ii} = 0, where W is a symmetric matrix. The only source of excitation for this CPG network is the tonic input c_i^e, c_i^f (≥ 0), which is given externally. In general, the tonic input may be time-dependent and can be used to regulate the output of the CPG network [12]. T_r and T_a (both > 0) are the reaction times of their respective variables. The structure of the CPG unit is visualized in Fig. 1.

All the equations (1), (2), (3), and (4) are differentiable except in the cases when u_i^µ = 0, since the rectifier is used as the activation function. However, we assume this will not cause any problems because the rectifier is used inside the Rectified Linear Units (ReLU), which are widely used in deep neural networks.
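As an aside, the dynamics (1)–(5) can be simulated with a simple forward-Euler integration. The following minimal NumPy sketch is ours, not from the paper; the function name, the initialization, and the step size are illustrative assumptions:

```python
import numpy as np

def simulate_cpg(W, w_fe, beta, Tr, Ta, c, steps=1000, dt=0.01):
    """Forward-Euler integration of the CPG network, eqs. (1)-(4).

    W  : (N, N) symmetric inter-unit inhibition weights, W[i, i] = 0
    c  : (2, N) tonic inputs; row 0 = extensor, row 1 = flexor
    Returns the activation trajectory u of shape (steps, 2, N).
    """
    g = lambda x: np.maximum(0.0, x)       # rectifier, eq. (5)
    N = W.shape[0]
    rng = np.random.default_rng(0)
    u = 0.01 * rng.random((2, N))          # small random initial activations
    v = np.zeros((2, N))                   # self-inhibitory variables
    out = np.empty((steps, 2, N))
    for t in range(steps):
        # row 0 is the extensor, row 1 the flexor; u[::-1] swaps the two rows,
        # implementing the mutual extensor-flexor inhibition term w_fe * g(u)
        du = (-u - w_fe * g(u[::-1]) - beta * v - g(u) @ W.T + c) / Tr
        dv = (g(u) - v) / Ta
        u = u + dt * du
        v = v + dt * dv
        out[t] = u
    return out
```

For parametrizations satisfying the conditions derived in Section 4.1, the trajectories should oscillate rather than converge to a fixed point; the inhibition-only coupling keeps the activations bounded by the tonic input.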
Note that, except for the tonic inputs c^e, c^f, only inhibiting connections are used, because such a system is less prone to become chaotic or divergent [13].

Figure 2: The CPG network connected to the output layer Out. Notice the output y is not fed back to the network. Also notice that the self-inhibitory input v is not connected to Out.

3.2   Output layer

In this work, we consider the self-inhibitory inputs v^e, v^f as hidden variables; we do not work with them outside of the CPG network. The output layer combines the activation variables u^e, u^f with the affine transformation

    y = W_{out} u + b_{out},    (6)

where u = (u^e, u^f), and W_{out} ∈ R^{N×2N}, b_{out} ∈ R^{N×1} are the learnable parameters. The connection of the CPG network and the output layer is illustrated in Fig. 2.

The main advantage of having W_{out} and b_{out} as learnable parameters is that the BP algorithm can scale and translate the limit cycle formed by the CPG network. Here, we assume that these transformations are easier to learn by changing the parameters of the output layer than by changing the parameters of the CPG network. It is because a change of any parameter of the CPG network can generally cause a non-linear change in the amplitude, frequency, and shift of the generated signals [6]. Another advantage of the proposed output layer is that it can develop complex signals, as it can combine the outputs from different CPGs.


4    Proposed Locomotion Control Learning

In this section, we propose the normalization layer and the inductive learning method adapted to learning a CPG network for a hexapod walking robot, see Fig. 3a. Each leg of the robot has three joints called coxa, femur, and tibia (see Fig. 3b), for which an appropriate control signal has to be generated to control the locomotion of the robot. In total, the robot has 18 controllable joints, and depending on the control signals, the robot can move with various motion gaits [18], e.g., tripod, quadruped, wave, and pentapod. During the locomotion, each leg is either in a swing phase to reach a new foothold or in the stance phase in which it supports the body. The motion gait prescribes the order in which the swing and support phases alternate for the individual legs; hence, all the legs must work in coordination to simultaneously achieve the desired behavior. The hexapod walking robot is thus used for benchmarking the proposed learning method, where the CPG network has to learn to generate control signals that realize the locomotion control of the robot with the tripod motion gait.

Figure 3: (a) Hexapod robot with the numbered legs. (b) Schema of the leg. Each leg consists of three parts – Coxa, Femur, and Tibia.

4.1   Normalization layer

The proposed normalization layer is based on early experiments with randomly parametrized CPG networks, which in most cases end up oscillating or converge to a static behavior. The static behavior is caused by the stable fixed points that may appear in the corresponding dynamic system. Therefore, we propose to employ a sufficient condition for the CPG network to be free of stable fixed points.

Condition. For a CPG network of N units, if all the values of the tonic input c_i^µ, where i ∈ N and µ ∈ {e, f}, are from the range [c_min, c_max] and

    w_{fe} < \frac{c_{min}}{c_{max}} (1 + \beta) - \max_{i \in N} \sum_j w_{ij},    (7)

    w_{fe} > 1 + T_r / T_a,    (8)

then the CPG network has no stable fixed point.

Proof. First, we state the adapted theorem from [7].

Theorem. Assume that for some i and k (i ≠ k)

    c_i (1 + \beta) - \sum_j^{2N} a_{ij} c_j > 0,    (9)

    c_k (1 + \beta) - \sum_j^{2N} a_{kj} c_j > 0,    (10)

    a_{ik} > 1 + T_r / T_a,    (11)

then the CPG network has no stable fixed point. The term {a_{ij}} = A^{(2N,2N)} is a matrix of the form

    A = \begin{pmatrix} W & w_{fe} I \\ w_{fe} I & W \end{pmatrix}    (12)
Learning Central Pattern Generator Network with Back-Propagation Algorithm                                                                                119

      and c = (ce , c f ), where I is the identity matrix of the same               The condition (13) implies F > 0, because w f e must be
      dimensions as W .                                                             greater than zero and the following condition must hold
                                                                                    too
         Since the CPGs should act as independent units, it is
      intuitive that each extensor-flexor neuron pair (a CPG) is                                                   cmax N
                                                                                                                   cmin ∑
                                                                                                         1+β >            wi j .                  (21)
      able to oscillate on its own. Thus, a weaker form of the                                                          j
      theorem is used, where the following conditions must hold
      for each i-th CPG:                                                            Now, we define variable ε > 0 that
                      cei                  1 N                                                                   cmax N
                     cif
                         (1 + β ) −
                                           cf
                                              ∑ wi j cej > w f e             (13)                      1+β =
                                                                                                                 cmin ∑
                                                                                                                        wi j + ε                  (22)
                                            i      j                                                                  j

                     cif            1 N                                             and substitute the right side of (22) into F(cmin ) and
                                    cei ∑
                         (1 + β ) −       wi j c fj > w f e                  (14)
                     cei                j                                           F(cmax )
                                            w f e > 1 + Tr /Ta .             (15)                                    cmin
                                                                                                          F(cmax ) =      ε,                      (23)
                                                                                                                     cmax
         Now, we can focus on the effect of the tonic input c. For                                        F(cmin ) = ε.                           (24)
      any parametrization W, β , Tr , Ta , w f e we can find a vector
      c that would break these conditions. Let’s relax the prob-                    Since ccmax
                                                                                            min
                                                                                                ∈ (0, 1] and ε > 0, the expression F(cmax ) al-
      lem by clipping the values of c into the range [cmin , cmax ]                 ways minimizes (20). Therefore
      where cmin > 0. Then, it must become independent on the
      mutable c vector to simplify the system of conditions. This                                              c0i = cmax .                       (25)
      can be done by substituting c with such c−    i that minimizes                  After substituting c0i into (17) and then c−
                                                                                                                                 i into (13) we
      the left side expression of (13) or (14) for the i-th CPG.                    get
      W.l.o.g. we consider finding c− i just for (13) as
                                                                                                                    N
                                                                                                    cmin
                                     cei                1 N                                              (1 + β ) − ∑ wi j > w f e .              (26)
           c−
            i =      argmin           f
                                           (1 + β ) −
                                                        cif
                                                              ∑ wi j cej .   (16)                   cmax            j
                  c∈[cmin ,cmax ]2N ci                         j
                                                                                    Finally, to make this condition independent on the i-th
      Since all the parameters are positive and wii = 0, the min                    CPG, we can choose such an inequality (26) that has the
      argument in (16) decreases monotonically with decreasing                                            N
                                                                                    largest value of the ∑ wi j expression
      cei and increasing c fj values. Thus, we can substitute these                                       j
      variables with their respective extremes                                                                                         !
                                                                                                                                   N
                                                                                                     cmin
                                 e
                                c j ∈ R , j 6= i
                                        +                                                     wfe <
                                                                                                     cmax
                                                                                                          (1 + β ) − max
                                                                                                                     i∈N
                                                                                                                                ∑ wi j .          (27)
                                                                                                                                  j
                                ce = c
                                  i     min
                            c−
                             i = f                                          (17)   Combining (15) and (27) we get the desired (8) and (7).
                                c j = cmax , j 6= i
                                
                                 f                                                                                                      
                                 ci = c0i
that leaves just c_i^0 as the variable to minimize

$$F(c) = \frac{c_{min}}{c}(1 + \beta) - \frac{c_{max}}{c}\sum_j^N w_{ij}, \qquad (18)$$

$$c_i^0 = \underset{c_i^f \in [c_{min},\, c_{max}]}{\mathrm{argmin}}\, F(c_i^f). \qquad (19)$$

Notice that now, we are searching for a scalar value c_i^0 that minimizes the given expression.

The equation dF(c)/dc = 0 has a solution only if F has such parameters β, W, c_min, and c_max that make the function F constant. Since it is unlikely that such a parametrization will emerge during the learning, we consider that F does not have any local extrema in the range [c_min, c_max]. Therefore, the minimization (19) can be simplified to

$$c_i^0 = \mathrm{argmin}\{F(c_{min}), F(c_{max})\}. \qquad (20)$$

We integrate the conditions (7) and (8) into the BP framework by redefining the variables w_fe and β as functions

$$w_{fe}(\hat{w}_{fe}, T_r, T_a) = 1 + T_r/T_a + \exp(\hat{w}_{fe}), \qquad (28)$$

$$\beta(\hat{\beta}, w_{fe}, w^*) = (w_{fe} + w^*)\frac{c_{max}}{c_{min}} + \exp(\hat{\beta}) - 1, \qquad (29)$$

where ŵ_fe, β̂ ∈ R are new independent parameters and w^* is defined as

$$w^* = \max_{i \in N} \left( \sum_j w_{ij} \right). \qquad (30)$$

Then, the max operator is approximated by the differentiable smoothmax defined as

$$\mathrm{softmax}(x) = \frac{\exp(x)}{\sum \exp(x)}, \qquad (31)$$

$$\mathrm{smoothmax}(x) = \mathrm{softmax}(x) \cdot x. \qquad (32)$$
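The reparametrizations (28)–(29), the smoothmax surrogate (31)–(32), and the endpoint minimization (20) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation; the function names follow the paper's symbols, and everything else (argument order, the helper `c0`) is an assumption.

```python
import numpy as np

def w_fe(w_fe_hat, T_r, T_a):
    # Eq. (28): keeps w_fe above 1 + T_r/T_a for any real w_fe_hat.
    return 1.0 + T_r / T_a + np.exp(w_fe_hat)

def beta(beta_hat, w_fe_val, w_star, c_min, c_max):
    # Eq. (29): keeps beta above (w_fe + w*) * c_max/c_min - 1.
    return (w_fe_val + w_star) * c_max / c_min + np.exp(beta_hat) - 1.0

def smoothmax(x):
    # Eqs. (31)-(32): differentiable surrogate for max(x), used to
    # back-propagate through w* = max_i sum_j w_ij from Eq. (30).
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())        # shifted for numerical stability
    return float((e / e.sum()) @ x)

def c0(F, c_min, c_max):
    # Eq. (20): with no interior extrema of F on [c_min, c_max],
    # the minimizer is one of the two interval endpoints.
    return min((c_min, c_max), key=F)
```

For a well-separated input, `smoothmax` is close to the true maximum, while remaining differentiable with respect to every component, which is what the BP framework requires.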
Since all the parameters must be positive, the other parameters are defined as exponents of the underlying parameters as

$$T_a = \exp(\hat{T}_a),$$
$$T_r = \exp(\hat{T}_r), \qquad (33)$$
$$w_{ij} = \exp(\hat{w}_{ij}), \quad i \neq j,$$

where T̂_a, T̂_r, ŵ_ij ∈ R. The weights w_ij, i ≠ j, cannot reach zero during learning, but they can approach it.

The BP algorithm learns the proposed new parameters T̂_a, T̂_r, ŵ_ij, ŵ_fe, and β̂ that are later normalized by (28), (29), and (33).

4.2 Proposed Architecture and Inductive Learning

We propose to divide the CPG network into smaller sub-networks to reduce the searched parameter space. These sub-networks are independently learned and then merged into larger sub-networks until a single final network remains. The proposed learning of the CPG network is performed in three phases. First, we learn a single CPG to generate a signal for one joint, which gives us the shared parameters (w_fe, T_a, T_r, β). Then, six triplets of CPGs are learned to generate a control signal for the particular leg. Therefore, for each leg k ∈ [1, ..., 6], we get the parameters W^k and W_out^k, b_out^k. In the final phase, we connect all six CPG sub-networks into one. We choose to connect the CPG sub-networks only by the coxa-CPGs, as it is assumed this is enough for each CPG sub-network to synchronize. Therefore, for the subspace u^e = (u^e_coxa,1, ..., u^e_coxa,6, u^e_femur,1, ..., u^e_tibia,6) (and similarly for u^f), W ∈ R^(18×18) is organized as follows

$$W = \begin{pmatrix} W_{coxa,coxa} & W_{coxa,femur} & W_{coxa,tibia} \\ W_{femur,coxa} & 0 & W_{femur,tibia} \\ W_{tibia,coxa} & W_{tibia,femur} & 0 \end{pmatrix},$$

where W_ij, i ≠ j, is the matrix of the connections between the i-th and j-th joints that can be expressed as

$$W_{ij} = \begin{pmatrix} w_{ij}^1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & w_{ij}^6 \end{pmatrix},$$

where the weights {w_ij^k} = W^k are taken from the matrices parametrizing the previously learned CPG sub-networks.

For the rearranged vector u = (u^e_1, u^f_1, ..., u^e_6, u^f_6), the term W_out ∈ R^(18×36) is composed of the matrices W_out^k of the previously learned CPG networks that control the k-th leg

$$W_{out} = \begin{pmatrix} W_{out}^1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & W_{out}^6 \end{pmatrix}.$$

All the zeroes in the W and W_out matrices are unlearnable constants imposing a structure onto the CPG network.

4.3 Objective Function

The utilized loss function of the CPG network is defined as a positive distance of the output vector from the desired one

$$\mathcal{L}(y(t), d(t)) = \|y(t) - d(t)\|, \qquad (34)$$

where d(t) ∈ [0, 1]^18 is the target signal for each of the 18 robot's actuators at the time t.

During an early evaluation of the proposed learning, we observed that in many cases, the output signal has undesired lower-frequency harmonics. This caused the output signal to fit the target signal only for a couple of the first periods. We propose to address this issue by an additional term to the objective function (34)

$$+\,\|r - \omega\|, \qquad (35)$$

where r ∈ R^+ is a new hyperparameter and ω is an approximation of the fundamental frequency of the CPG oscillations that can be expressed as [13]

$$\omega = \frac{1}{T_a}\sqrt{\frac{(T_r + T_a)\beta - T_r w_{fe}}{T_r w_{fe}}}. \qquad (36)$$

The hyperparameter r should be equal to the fundamental frequency of the desired signal. However, since (36) is just an approximation, it might lead to undesired local minima. Therefore, we propose to switch off the regularization once the term (35) is lower than a predefined threshold.

5 Experimental evaluation

The proposed learning method has been experimentally verified using the rmsprop [19] algorithm, which is commonly used to learn recurrent neural networks. Since the following experiments are meant to benchmark and map the problems of the CPG network learning, we use a constant tonic input c = 1; therefore, c_min = c_max = 1. The initial state (u^e_init, v^e_init, u^f_init, v^f_init) is set to u^e_init = 0.1, u^f_init = −0.1, and v^e_init = v^f_init = 0. The target signal is formed of eighteen sequences of joint angles that were recorded for a course of five tripod gait cycles. The hexapod robot was driven by a default regular gait based on [20], which is suitable for traversing flat terrains, and it uses the inverse kinematics for following the prescribed triangular leg foot-tip trajectory. This 4.7-second-long record of all joint signals is sampled to 2350 equidistant data points, and each signal is further normalized in the range [0, 1], smoothed using Gaussian convolution to filter out signal peaks, and finally downsampled by the factor of 3.

Preliminary experiments have shown that the process of learning profoundly depends on the initial parameters, and in some runs, the BP algorithm seems to get stuck in local minima from which the learning becomes very slow. This observation is consistent with [17]. The performance of the
BP algorithm has been improved by adding the regularization term (35). After that, the learning is performed in the following three consecutive steps.

First, each single CPG unit is learned to generate the sinusoid sin(t/2) that has the same frequency as the fundamental frequency of the desired control signal, which is deterministically set to 3 Hz. The CPG is learned in 2000 epochs, each back-propagating a batch of 50 data points. Note that the number of needed epochs depends on the initial random parametrization.

Next, the parameters of the sinusoid generator are retrained to generate the desired joint control signals. The generator of each joint control is learned with 2000 epochs. We experimented with the stability of the learned limit cycle of the first leg by perturbing it, see Fig. 4a. Finally, the joint CPGs are connected as described in Sec. 4, with the non-diagonal values of W_coxa,coxa initialized to 0.5, and learned with 4000 epochs. We experimented with the stability of this final CPG network, and the results are depicted in Fig. 4b.

Figure 4: Squared signal errors of the first leg (a) and body (b) caused by perturbations of +0.3, +0.5, +0.7, and +0.9. The perturbations have been introduced only at the start by adding a constant value to all the trajectory components. For the leg CPG network (a), after 500 iterations, the errors vanished except for the +0.9 perturbation. The body CPG network (b) does not recover even after 500 iterations, except for the +0.3 perturbation.

A comparison of the desired control signal of the first leg and the learned signal is depicted in Fig. 5. The learned signal has a similar shape and the same frequency as the original signal. The binding between different triplets of the legs, which is the most difficult part, is shown in Fig. 6. We can see that the learned trajectory has a similar structure to the desired limit cycles. The trajectory also stays within its limit cycle; the trajectory was generated by six gait cycles and therefore traveled the limit cycle multiple times.

Figure 5: Signals controlling the joints of the first leg. Green ones are desired, and blue ones are generated by the CPG network. The time evolution is on the left, while the projected phase-space trajectories are on the right. Each row corresponds to one joint, i.e., coxa, femur, and tibia. In the phase-space column, the variable pairs from the top are coxa-femur, femur-tibia, and tibia-coxa.

Figure 6: Synchronization of multiple legs. On the left, the coxa-control trajectory for legs 1, 2, and 3 depicted in Fig. 3a. The original trajectory moves almost diagonally in the pictured cube and is "wrapped" by the learned trajectory. On the right, the tibia-control trajectory for legs 1, 2, and 3.

We deployed the resultant CPG locomotion controller on the real hexapod (see Fig. 3a) and compared it with the original controller [20] in 10 trials. The robot was requested to crawl on a flat surface for 10 s and then stop. The velocity of the robot was estimated using an external visual localization system based on the tracking of a visual marker [21] running at 25 Hz. Moreover, the robot's
stability was measured as the smoothness of the locomotion using an XSens MTi-30 inertial measurement unit (IMU) attached to the robot trunk. The variances of the vertical acceleration (Acc_z) and of the orientation (pitch and roll angles) of the robot's body are the selected indicators of the locomotion stability.

The recorded robot trajectories visualized in Fig. 7 show that there is a transition effect for our CPG locomotion controller at the beginning of the trajectory, where the CPG network starts to oscillate, which makes the robot's initial acceleration lower; however, the overall locomotion is smoother, as the velocity deviation is smaller.

The quantitative results are listed in Table 1 as the average values of the indicators. The results indicate that the performance of the CPG locomotion controller is similar to the implementation [20] based on the inverse kinematics (IKT).

Figure 7: (a) Used experimental setup of the robot with the tracking marker. (b) Visualization of the performed trajectories using the proposed CPG locomotion control and the reference controller.

Table 1: Experimental results

                 Unit            IKT [20]     CPG (Ours)
  Velocity       [m·s−1]         0.18±0.03    0.15±0.01
  Accz var.      [m·s−2]         18.31        23.03
  Pitch var.     ×10−3 [rad]     0.14         0.23
  Roll var.      ×10−3 [rad]     0.25         0.28

5.1 Lessons Learned and Discussion

During the experimental evaluation of the proposed learning of the CPG network, a couple of good practices on how to learn the sinusoid generator came up as follows.

1. It is better to learn the network in batches containing at most two periods.

2. If the CPG network is restarted to the initial state, it is good to ignore the transient states.

3. Since it is not important at which place the system enters the limit cycle, it is suitable to phase-shift the target signal so as to minimize the distance from the output signal.

The combination of sub-networks into one network has two difficulties. The parameters (w_fe, T_a, T_r, β) must be the same for the whole CPG network, but the sub-networks are trained independently; so, they can end up with different parameters. In our case, the parameters are similar because all the CPG sub-networks are based on one CPG sub-network. Thus, the BP algorithm is able to adjust them during the learning of the complete network. Another difficulty is the choice of the initial W_coxa,coxa weights. The higher the weights are, the stronger the coupling between the legs is. However, if the weight values are too high, the constraint (7) would be violated. Therefore, we used (7) to choose the initial W_coxa,coxa weights.

Even though the robustness is not the objective of the learning algorithm, it is a property of a single Matsuoka's oscillator [22]. This property translated well into our 3-unit CPG network (see Fig. 4a), where the network can recover from perturbations. In the real world, robustness helps to quickly react to simple temporal events, e.g., servo errors or feedback from the environment.

In this work, we chose a simple model with c_min = c_max = 1, i.e., we have a constant tonic input. A time-variant tonic input, however, introduces dynamic changes, as we can see in Fig. 8. In future work, we would like to use the tonic input to control the output of the CPG network dynamically.

6 Conclusion

In this paper, we propose a new methodology for learning a CPG network modeled by symmetrically connected neural oscillators. The method is based on a combination of the back-propagation learning algorithm, a normalization layer, and a regularization term, where the normalization layer prunes the parameter space of the CPG network from the undesired non-periodic results, and thus helps to speed up the learning process. The advantage of the proposed solution over the previous work on the CPG-based locomotion control is in the scalability of the method that enables creating a CPG network that can directly control each actuator without the need to employ the inverse kinematics. The proposed method has been successfully deployed in the locomotion control of the real hexapod walking robot.

The main properties of the proposed methodology arise from the idea that the proposed CPG network for the hexapod locomotion control is based on an architecture of the CPG connections that imitates the structure of the robot. The CPG is inductively learned by learning its parts and merging them. Therefore, the proposed method is promising to be easily extendable to other multi-legged robot bodies. Furthermore, since the proposed CPG network is learnable by the back-propagation algorithm, it can be integrated into more complex neural networks supporting back-propagation, which is a subject of our future work.

Acknowledgments – This work was supported by the
Czech Science Foundation (GAČR) under research project No. 18-18858S. The support of the Grant Agency of the CTU in Prague under grant No. SGS16/235/OHK3/3T/13 to Rudolf Szadkowski is also gratefully acknowledged.

Figure 8: Output of three randomly generated CPG units with c_min = 2 and c_max = 4. After each 50 iterations of the total 150 iterations, the tonic input is set to c^e = c^f = 2; c^e = 2, c^f = 4; and c^e = 2, c^f = 8, respectively. Note that the last setup violates the c_max constraint.

References

[1] E. Marder and D. Bucher, "Central pattern generators and the control of rhythmic movements," Current Biology, vol. 11, no. 23, pp. R986–R996, 2001.

[2] E. Marder, D. Bucher, D. J. Schulz, and A. L. Taylor, "Invertebrate central pattern generation moves along," Current Biology, vol. 15, no. 17, pp. 685–699, 2005.

[6] G. Zhong, L. Chen, Z. Jiao, J. Li, and H. Deng, "Locomotion control and gait planning of a novel hexapod robot using biomimetic neurons," IEEE Transactions on Control Systems Technology, vol. 26, no. 2, pp. 624–636, 2018.

[7] K. Matsuoka, "Sustained oscillations generated by mutually inhibiting neurons with adaptation," Biological Cybernetics, vol. 52, no. 6, pp. 367–376, 1985.

[8] B. A. Pearlmutter, "Gradient calculations for dynamic recurrent neural networks: A survey," IEEE Transactions on Neural Networks, vol. 6, no. 5, pp. 1212–1228, 1995.

[9] M. Hild and F. Pasemann, "Self-Adjusting Ring Modules (SARMs) for Flexible Gait Pattern Generation," in IEEE International Joint Conference on Artificial Intelligence (IJCAI), 2007, pp. 848–852.

[10] J. Hellgren, S. Grillner, and A. Lansner, "Computer simulation of the segmental neural network generating locomotion in lamprey by using populations of network interneurons," Biological Cybernetics, vol. 68, no. 1, pp. 1–13, Nov 1992.

[11] S. Steingrube, M. Timme, F. Wörgötter, and P. Manoonpong, "Self-organized adaptation of a simple neural circuit enables complex robot behaviour," Nature Physics, vol. 6, no. 3, pp. 224–230, 2010.

[12] K. Matsuoka, "Mechanisms of frequency and pattern control in the neural rhythm generators," Biological Cybernetics, vol. 56, no. 5, pp. 345–353, 1987.

[13] K. Matsuoka, "Analysis of a neural oscillator," Biological Cybernetics, vol. 104, no. 4, pp. 297–304, 2011.

[14] A. J. Ijspeert, A. Crespi, and J.-M. Cabelguen, "Simulation and robotics studies of salamander locomotion," Neuroinformatics, vol. 3, no. 3, pp. 171–195, 2005.

[15] W. Chen, G. Ren, J. Zhang, and J. Wang, "Smooth transition between different gaits of a hexapod robot via a central pattern generators algorithm," Journal of Intelligent & Robotic Systems, vol. 67, no. 3, pp. 255–270, Sep 2012.

[16] L. Righetti and A. J. Ijspeert, "Design methodologies for central pattern generators: an application to crawling humanoids," in Robotics: Science and Systems, 2006, pp. 191–198.

[17] K. Doya and S. Yoshizawa, "Adaptive neural oscillator using continuous-time back-propagation learning," Neural Networks, vol. 2, pp. 375–385, 1989.

[18] N. Porcino, "Hexapod gait control by a neural network," in IEEE International Joint Conference on Neural Networks (IJCNN), 1990, pp. 189–194.

[19] T. Tieleman and G. Hinton, "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude," COURSERA: Neural Networks for Machine Learning, vol. 4,
                                                                               no. 2, pp. 26–31, 2012.
       [3] A. J. Ijspeert, “Central pattern generators for locomotion
           control in animals and robots: A review,” Neural Networks,     [20] J. Mrva and J. Faigl, “Tactile sensing with servo drives
           vol. 21, no. 4, pp. 642–653, 2008.                                  feedback only for blind hexapod walking robot,” in 10th
                                                                               International Workshop on Robot Motion and Control (Ro-
       [4] R. D. Beer, H. J. Chiel, and J. C. Gallagher, “Evolution and
                                                                               MoCo), 2015, pp. 240–245.
           analysis of model CPGs for walking: II. General principles
           and individual variability,” Journal of Computational Neu-     [21] E. Olson, “AprilTag: A Robust and Flexible Visual Fiducial
           roscience, vol. 7, no. 2, pp. 119–147, 1999.                        System,” in IEEE International Conference on Robotics
                                                                               and Automation (ICRA), 2011, pp. 3400–3407.
       [5] H. Yu, H. Gao, L. Ding, M. Li, Z. Deng, and G. Liu,
           “Gait Generation With Smooth Transition Using CPG-             [22] M. M. Williamson, “Robot arm control exploiting natural
           Based Locomotion Control for Hexapod Walking Robot,”                dynamics,” Ph.D. dissertation, Massachusetts Institute of
           IEEE Transactions on Industrial Electronics, vol. 63, no. 9,        Technology, 1999.
           pp. 5488–5500, 2016.