Adaptive neural network based control of balancing robot in real time mode A I Glushchenko1, V A Petrov1 and K A Lastochkin1 1 Automated and information control systems department, Stary Oskol technological institute n.a. A.A Ugarov (branch) NUST “MISIS”, Stary Oskol, Russia Abstract. The scope of this research is to control a balancing robot in real time mode. This problem is solved with the help of a neural network, which is trained online and used as a controller. A method to develop such controller is proposed. Particularly, the neural network structure is chosen, restrictions is developed to determine situations when to train the network online, an algorithm is proposed to define the sign of change in the network weights. The obtained controller is compared to a linearly quadratic regulator (LQR) using a real balancing robot on LEGO EV3 platform. Experiments are conducted in two modes. The first of them is to keep the robot at an unstable equilibrium mode. The second one is both to make the robot follow the user’s setpoint for the state coordinates and stabilize the plant. The obtained results show that the control quality is improved comparing to the LQR controller, since the system with the neural network is able to adapt to real conditions of the experiments. As far as the first type of experiments is concerned, the robot controlled by the network has covered the distance, which is by 1630 radians shorter comparing to the LQR controller. 1. Introduction In this research, the problem of a balancing two-wheeled robot control is solved. This robot is a physical implementation of an inverted pendulum on a trolley. Various methods to control such plant are developed [1-3]. However, this task has not been fully solved by now. At the same time, the efficiency improvement of the balancing robots control would allow to use them to create vehicles for people with disabilities, robotic loaders for storage facilities and planetary rovers for the space industry [4]. In general, the existing control algorithms of the considered plant can be divided into two groups. Classic control methods include PID [5], LQR controllers [6] and H∞ theory [7]. The main problem of their implementation for the real balancing robots is that the parameters of the above mentioned controllers are calculated using linearized plant models and not adjusted during operation. This may result in the control system instability due to the existence of the plant nonlinearities and non-stationarity, and changes in the environment conditions. To some extent, the above mentioned problem could be overcome with the help of the intelligent control methods. Among such methods, neural networks and fuzzy logic are commonly applied for the balancing robot control. They have the ability for online training and / or allow to take knowledge about the plant into consideration. They also have adaptive properties. Neural networks in a control loop are used both as regulators and, in order to adjust the existing controllers, neural network tuners. For example, a neural network adjuster of the PID controller for the balancing robot is developed in [8]. In [9, 10] a method of neural network controller synthesis is proposed. The disadvantages of these techniques are as follows. As for the systems using off-line training, it is the complexity of obtaining 168 of a training set (either an accurate model of the plant is needed, or the samples are formed with the help of an existing controller, and this will not allow to create a more accurate regulator). As for the systems using online training, it is the absence of restrictions on both the learning rate for the neural network and time moments when to train it. Such systems do not also take into account the features of the control objects in question (a priori knowledge about them). For these reasons, the considered control systems can be used for the models of the inverted pendulums, but their application to the real control object faces certain difficulties. Another promising intelligent method is a fuzzy logic, which is often implemented as fuzzy controllers [11-13]. The disadvantage of the fuzzy logic is that the values of the normalization coefficients of the input and output values of the fuzzy controller are to be experimentally found. It is also possible that these coefficients will have to be adjusted during plant functioning. Thus, only a small part of the above mentioned algorithms, both classical and intelligent, can be implemented for a real control object – the balancing robot. But, in general, the neural networks application seems to be the most promising to solve the considered problem because of their ability to approximate dependencies and be trained online. In this research, it is proposed to combine a neural network, functioning as a controller, with a base of conditions and restrictions on the online training of such network. This combination of intelligent approaches will provide the controller with the adaptive properties, and it will not have the above mentioned disadvantages. 2. Control object description A real balancing robot based on the LEGO EV3 platform has been selected as a control object. Its overall view is shown in Figure 1, its kinematic scheme (a side view) – in Figure 2, its top view – in Figure 3. Figure 1. Robot overall view. Figure 2. Robot kinematic scheme (side view). Figure 3. Robot top view. Considered control object is described by seven coordinates in the state space. θ (theta) is the mean angle of wheels turning (an average value between the angular positions θl and θr of the left and right wheels) θ = 0.5 · (θl + θr), θ’ (theta_dot) is the speed of the wheels rotation, θint (theta_int) is the integral of θ, ψ (psi) is the body pitch angle, ψ’ (psi_dot) is the speed of ψ value change, φ (phi) is the robot yaw angle, φ’(phi_dot) is the speed of φ value change. The task is to control all seven state 169 coordinates at the same time. In this case, the control action for the robot, which has only two actuators, is the vector of voltage for the left and right motors (u = (ul, ur)). 3. Neural network controller synthesis The first step of the neural network controller synthesis is the choice of the network structure, as well as the selection of input and output values. As it is mentioned above, the balancing robot is a multi-loop control object. So it requires simultaneous control of all seven state coordinates for normal operation. As far as the classic theory of automatic control is concerned, the controller for such objects is developed in the form of a certain matrix of coefficients for all components of the control system negative feedback. Such matrix is called a linearly quadratic regulator (LQR) [14]. This controller obtains the vector consisting of the control errors of all state coordinates E = [eθint eθ eψ eθ’ eψ’ eφ eφ’]T. Its output is a control action vector u = (ul, ur). The parameters matrix of the LQR controller, which components remain constants during the whole period of the plant functioning, is calculated on the basis of the control object state space model [15] by minimization of the squared optimality criterion (1).  J  0.5   ( E T QE  u T Ru)dt  min , 0 (1) where Q(7,7) and R(2,2) are positively defined identity matrixes. The structure of the neural network for the proposed controller is chosen in such a way that the mathematical operations in its output layer repeat such operations of the LQR controller. The control error vector E containing seven elements is sent to the input of the neural network, because the neural controller is to have no less information than the considered optimal controller has. This optimal controller is used as a basis to develop nonlinear adaptive neural network controller. So the number of input neurons is chosen to be seven. The hidden layer also requires seven neurons to transmit the vector E to the output layer. The number of output neurons corresponds to the number of the control actions used in the control system under consideration. It equals to two. The hidden and output layers have linear activation functions to repeat the mathematical operations of the LQR controller. The next step after choosing the structure of the neural network is to train it. There are different techniques to implement that [16]. And, first of all, they can be divided into methods of offline and online training. The aim of the first of them is to ensure the correct functioning of the network at the moment of its integration into the balancing robot control system. The best course of action in this case is to use the parameters values found for the classical controller – LQR. The network weights are artificially set in such a way that the elements of the weight matrix of the output layer LW(7,2) coincide with the corresponding elements of the parameters matrix of the LQR controller. The weight matrix of the hidden layer IW(7,7) is an identity matrix. Considering that the linear activation functions are used in the hidden and output layers, the biases are equaled to zero. Thus, the calculated matrix of weights of the output layer LW is shown as (2). 0.644 1.242 59.38 1.391 7.1 0.677 0.179  LW   - 0.179  . 0.644 1.242 59.38 1.391 7.1 - 0.677 (2) The neural network of the selected structure is depicted in Figure 4. Taking the non-stationarity of the control object into consideration, the next step of the controller synthesis is to arrange the online training of the neural network. 4. Online neural network training Online training of the chosen neural network is performed according to the backpropagation algorithm as one of the most widely used methods [16]. At the same time, the training error (according to which the weights are adjusted) is calculated as follows. The error is ½ of the sum of the squared distances between the current control object state coordinates value and the required ones – equation (3). 170 Figure 4. Neural network controller structure. It is proposed to calculate it using the coordinates θ’, ψ’, φ’, since their values are directly obtained from the sensors of the real robot (a gyroscope and two encoders). ET (t )  0.5  (e2 (t )  e2  (t )  e2 (t )) (3) Considering the real control object, the error value (3) is not possible to be reduced to zero due to the measurement error and physical features of the robot. So the permissible level Ne of the training error (3) should be selected. It has been equaled to eight for the experiments in this research due to the amplitude of the sensors noise. Thus, the basic rule used to execute the online training is formulated as follows: the neural network controller is trained if the total error ET(t) is greater than Ne units. Learning rates for the output and hidden layers are experimentally equaled to ηLW = 10-7; ηIW = 10-7 respectively. These values are not going to be corrected during experiments. The process of functioning of the balancing robot can be divided broadly into two main modes: 1) stabilization mode, when the control system is required to ensure the stability of the control object under the condition that the setpoint values for all state coordinates are equaled to nil; 2) the mode when the user defines the setpoint values of the coordinates θ’ and/or φ’. Each mode needs its own “best” set of the neural network weights (controller parameters) values. The control error of some coordinates can influence the control quality in different ways for different modes. So the algorithm to select the sign of change in neural network output layer weights is developed and shown in Figure 5. Here ET(t) is the training error (3), Ref θ’ is the setpoint value of θ’ state coordinate, ∆ωθint is the change in output layer weight, which is responsible for θint coordinate, ∆ωθ is the change in value of the output layer weight, which is responsible for θ coordinate, ∆ωθ’ is the change in value of the output layer weight, which is responsible for θ’ coordinate. This algorithm allows to adjust the current weight value so as to reduce the error (3) and provide the required quality of control, taking into account the current functioning mode of the control object. 171 Figure 5. Algorithm of sign selection of change in value of output layer weights. The algorithm shown in Figure 5 is based on data of LQR controller functioning obtained experimentally and a priori known data of the control object. A posteriori, it has been found that if the current robot functioning mode is to follow the user's setpoint, and the output layer weight for the control error of the coordinate θint is going to zero, then the control quality will be improved. But if the stabilization mode is current, then the same actions will have vice versa effect on the control quality. It is also a priori known that the signs of changes in output layer weights, which are responsible for θ, θint and θ’ coordinates, must be of the same sign. 5. Experiments with real balancing robot The robot control systems have been developed in the MatLab Simulink software. The connection of the LEGO EV3 microprocessor to the MatLab Simulink to conduct experiments with the real control object was established using an Ethernet cable. Discretization interval (Ts) of EV3 was 0.004 s. The same time interval was set for calculations during the simulations in Matlab. 5.1. Stabilization mode experiments The neural network controller has been required to provide better control quality in comparison with the LQR controller. The setpoint values for all the state coordinates were equaled to nil. The duration of the experiment was 195 seconds. Obtained results for six of seven state coordinates for LQR is shown in Figure 6, a neural network controller – in Figure 7. Transients curves for the coordinate φ’ for both controllers can be described as constant oscillations of one radian/s amplitude with mean value of zero radian/s and are not shown in Figure 6 and Figure 7. Evaluation of the proposed neural network controller effectiveness was made by comparison of the final values of the coordinates θint for both considered controllers. This coordinate showed the total distance covered by the robot during the experiment. As this is the stabilization mode, then the shorter the distance, the better. 172 Figure 6. Transients of state coordinates obtained with LQR controller (stabilization mode). Figure 7. Transients of state coordinates obtained with neural network controller (stabilization mode). 173 Having analyzed θint curves in Figure 6 and Figure 7, the conclusion was made that the robot controlled by the neural network controller covered a shorter distance (20 radians in 195 seconds) than the one with the LQR controller (1650 radians in 195 seconds). There is also a noticeable decrease in the absolute value of the final value of ψ coordinate. This increases the stability of the control object. Visual observations during the experiments with both controllers confirmed the data obtained from the sensors. Figure 8 shows a graph of the training error change throughout the experiment, and Figure 9 shows the change in the weight coefficients for neurons, which are responsible for the coordinates θ, θint, θ’. Figure 8. ET(t) curve for experiment with neural network controller. Figure 9. Change in output layer weights, which are responsible for θ, θint, θ’ coordinates (stabilization mode). 174 The graph in Figure 8 shows a decrease of ET(t) value after 35 seconds of the experiment. It is not depicted in Figure 8, but ET(t) curve for the LQR controller did not change its amplitude and kept the same form during the whole experiment as in Figure 8 before the 35 th second. The graph in Figure 9 illustrates the results of the online training according to the developed algorithm (Figure 5). Thus, experimental results confirm the effectiveness of the proposed neural network controller and the algorithm to choose the sign of the changes in weights for the stabilization mode. 5.2. Experiment combining stabilization and setpoint following modes – mixed mode This mode of the balancing robot functioning combines the two main modes described earlier. This means that at some moments the robot operates in the stabilization mode, while at other moments – in the mode of the user’s setpoint following. It is very common for real two-wheeled vehicles. The aim of this experiment was to test the ability of the neural network controller to reconfigure the weights according to the developed algorithm in Figure 5. In this case, the comparison of transient processes quality obtained with the help of the neural network and LQR controllers was not performed due to the complexity of accurate reproduction of the experiment. Figure 10 shows the transients for the control system with the neural network controller. The graph with θ (theta) curve is divided into sections, each of which corresponds to a certain mode – stabilization one (number “0”) and setpoint following one (number “1”). During the experiment, the θ’ setpoint value was changed from zero (stabilization mode) to one (setpoint following mode) and back. Figure 10. Neural network controller results obtained in mixed mode. Changes in the weights of the neurons, which were responsible for the coordinates θ, θint, θ’, in mixed mode with a division into corresponding sections for each particular mode (number “0” is the stabilization mode; number “1” – the setpoint following mode) are shown in Figure 11 and Figure 12. It can be concluded from the curves in these figures that: 1) the weights decreased in the stabilization mode, 2) the weights increased in the setpoint following mode. That corresponds to the logic of the developed algorithm depicted in Figure 3. Figure 13 shows the training error ET(t) curve with the division into the above described sections. For each mode section the final value of ET(t) became close to the required value of 8 units as a result of neural network training. The smallest value of error was achieved at time moments of 420-470 seconds. It can be concluded from Figure 10 that the best quality of control of the balancing robot is 175 achieved at these moments of time. This resulted in a minimum value of amplitude oscillations of θ’ and ψ’ coordinates. Figure 11. Change in weigh, which is responsible for θint, in mixed mode. Figure 12. Change in weighs, which are responsible for θ and θ’, in mixed mode. 176 Figure 13. Training error ET(t) in mixed mode. Thus, the results of the experiment in the mixed mode demonstrate the change in the output layer weights in accordance with the proposed algorithm and the reduction of the training error ET(t) making it close to the required value. These mean that the proposed neural network controller is effective to control the balancing robot functioning in different modes. 6. Conclusion In this research, the neural network controller synthesis method to control the balancing robot in real time was developed. The set of restrictions on the online training of such regulator was proposed, the algorithm to choose the sign of change in output layer weights was devised. The obtained method allowed to improve the quality of control of the robot both for the stabilization and setpoint following robot functioning modes. As for the stabilization mode, the experimental results showed the effectiveness of the developed controller as the control system with the neural network controller covered shorter distance (20 radian in 195 seconds) comparing to the LQR controller (1650 radian in 195 seconds). The scope of further research is to expand the base of restrictions on online network training through: 1) the development of stability assessment criterion for the system with the neural network controller of the balancing robot on the basis of the second Lyapunov’s method, 2) modification of the algorithm to choose the sign of change in the output layer weights. 7. References [1] Semenov M E, Solovyov A M and Meleshenko P A 2015 Elastic inverted pendulum with backlash in suspension: stabilization problem Nonlinear Dynamics 82 pp 677–688. [2] White W and Fales R 1999 Control of double inverted pendulum with hydraulic actuation: a case study Proc. American Control Conference (San Diego) (IEEE) pp 495–499. [3] Spong M W 1995 The swing up control problem for the acrobat IEEE Control Systems Magazine 15 pp 72–79. [4] Chan R P M., Stol K A and Halkyard C R 2013 Review of modelling and control of two- wheeled robots Annual Reviews in Control 37 1 pp 89-103 [5] Sung H C 2015 Balancing Robot Control and Implementation. Master's thesis (Texas: A & M University). [6] Sun L and Gan J 2010 Researching of two-wheeled self-balancing robot base on LQR combined with PID. Proc. 2nd International Workshop on Intelligent Systems and Applications (Wuhan) (IEEE) pp 1-5. 177 [7] Ruan X and Chen J 2010 H∞ robust control of self-balancing two-wheeled robot Proc. 8th World Congress on Intelligent Control and Automation (Jinan) (IEEE) pp 6524-6527. [8] Ren T J, Chen T C and Chen C J 2008 Motion control for a two-wheeled vehicle using a self- tuning PID controller Control Engineering Practice 16 3 pp 365-375. [9] Noh J S, Lee G H and Jung S 2010 Position control of a mobile inverted pendulum system using radial basis function network International Journal of Control, Automation and Systems 8 1 pp 157-162. [10] Jung S and Kim S S 2008 Control experiment of a wheel-driven mobile inverted pendulum using neural network IEEE Transactions on Control Systems Technology 16 2 pp 297-303. [11] Nasir A N K et al. 2011 Performance comparison between fuzzy logic controller (FLC) and PID controller for a highly nonlinear two-wheels balancing robot Proc. First International Conference on Informatics and Computational Intelligence (Bandung) (IEEE) pp 176-181. [12] Wu J, Zhang W 2011 Design of fuzzy logic controller for two-wheeled self-balancing robot 6th International Forum on Strategic Technology (Harbin) vol 2 (IEEE) pp 1266-1270. [13] Azizan H et al. 2010 Fuzzy control based on LMI approach and fuzzy interpretation of the rider input for two wheeled balancing human transporter 2010 Proc. 8th IEEE International Conference on Control and Automation (Xiamen) (IEEE) pp 192-197. [14] Zhou K, Doyle J C and Glover K 1996 Robust and optimal control (New Jersey: Prentice hall). [15] Yamamoto Y 2008 NXTway-GS Model-Based Design-Control of self-balancing two-wheeled robot built with LEGO Mindstorms NXT (Cybernet Systems Co., Ltd). [16] Reed R and Marks II R J 1999 Neural smithing: supervised learning in feedforward artificial neural networks (MIT Press). Acknowledgments This work was supported by the Russian Foundation for Basic Research. Grant No 18-47-310003. 178