The simulator and neuro-controller for small satellite at- titude development Nataliya Shakhovska[0000-0002-6875-8534], Dmytro Kozii, Pavlo Mukalov Lviv Polytechnic National University, Lviv 79013, Ukraine nataliya.b.shakhovska@lpnu.ua, dmytruto@gmail.com, pmykalov@gmail.com Abstract. The paper describes the realization of simulator and neuro-controller for small satellite attitude. The main types of neuro-controllers are analyzed. A problem of proper neuroemulator choosing for neurocontroller training is ana- lyzed. A new criterion on the basis of local control gradients analysis for input neuroemulator's neurons is proposed. Results of numerical simulations of neu- rocontroller training by a gradient descent method is given. Keywords: satellite, neuro-controller, learning rate, attitude 1 Introduction Neural control is a kind of adaptive control when artificial neural networks (NN) are used as building blocks of control systems. Neural networks have a number of unique properties that make them a powerful tool for building control systems: the ability to learn from examples and to summarize data, the ability to adapt to changing proper- ties of the object of control and the environment, the suitability for the synthesis of nonlinear regulators. Over the past 20 years, a large number of neurological methods have been developed, the most popular among them are Model Reference Adaptive Neurocontrol and Adaptive Critics [2]. The method of neural control with a reference model, also known as a "circuit with neurotransmitter and neuro-controller" or "reciprocal distribution in time," was pro- posed in the early 1990s [1], [3 – 5]. This method does not require knowledge of the mathematical model of the control object. Instead, a separate neural network, a neu- romuscular, studies the direct dynamics of the control object and then it is used to calculate derivatives when training a neuro-controller. At the same time, the trained neuro-emulators with the lowest mean square error of the simulation of the control object usually chooses from the set of trained neuro-emulators. However, is this crite- rion best if the neural network is used for further training another neural network, connected sequentially to the first, and not actually for modeling the control object? The paper presents neurocontroller development for satellite rotation control. 2 State of arts NN was proposed in 1943. McCullock and Pitts as the result of studying the structure and activity of biological neurons. A typical structure of the automatic control system with the PID-regulator and the NN as an automatic adjustment unit is considered in the work [6]. NN acts as a func- tional transformation, which for each set of input signals the coefficients for the PID regulator are produced. The most complicated part of the design of an NN-based regulator is the training procedure, which reduces to the identification of unknown NN parameters, such as weighting factors and displacement of neurons. For NN train- ing, the gradient search method uses a minimum criterion function, which depends on the parameters of the neurons. The search process is inertial, at each iteration, the search for all coefficients of the network occurs: first for the output layer, then for the previous and so on to the first. The length of the learning process is a key issue when using NN methods for PID regulators [7]. In addition, when applying NN, there are difficulties due to the impos- sibility of predicting regulation errors for incoming actions that were not included in the set of training sequences by determining the structure of neurons in the network, the duration of training, the range and the number of training actions. The main purpose of NM training is to choose the weighting factors of such a net- work to ensure consistency between input and output values. The neuron with the input p = {p1, p2, ..., pr} is shown in Fig. 1. The initial value is equal to the scalar product of the vector W on the input vector p, the bias value b is added to the weighted sum of inputs [8]. Output signal is: n  w11  p1  w12  p 2  ...  w1R  p R  b (1) р1 w11 р2 w12 … а n Inputs p2 w1R  f b Fig. 1. Neuron structure The choice of NN architecture is to determine the number of layers, the number of neurons in each of the layers, the form of the activation function of each layer, and information about the topological links of the neurons. Single-layer NNs are not suit- able for solving complex problems [9], but combining several neurons into one or more layers has great potential. The two-layer NN, which in the first layer contains a sigmoidal activation function, and in the second one linear, can be trained to ap- proximate any function with finite number of breakpoints with arbitrary accuracy [9]. The purpose of identification is to determine the operator of the model, which con- verts the input action of the controlled object to the output value. Different identifica- tion methods are possible depending on the various forms of representation of mathematical models in the form of ordinary differential equations, difference equa- tions, convolution equations [10], and others. However, none of the proposed methods is universal. The paper [11] considers the use of NN as an alternative tool for the identification of dynamic objects. The use of NN is based on the fact that in practice modern elec- tric drives are multi-mass systems with nonlinear links. Relevant linearized models built based on transfer functions, cannot always adequately reflect the state of the electric drive in all modes of its operation. The equivalence of a nonlinear system and its linear approximation will be equal in a limited time interval, and when transition- ing the output system from one mode to another, it is expedient to use the lineariza- tion method and obtain a new linear system. The paper [12] proposed the use of recurrent multi-layer N with external inputs NARX. u(t) z-1 y (t) z-q z-1 z-q Fig. 2. The recurrent multilayer neural network NARX The training model is given as y ( n  1)  f ( y ( n ),..., y ( n  q  1), u ( n )),..., u ( n  q  1) , (2) where у(п) is output vector, u(n) is input vector, п is the discrete time moment, q is power of the system. Such a NN, which has feedback with single delays, allows constructing on its basis a model of a dynamic object of arbitrary complexity. Using this method requires veri- fication of trained NF for adequacy with the use of new data not included in the train- ing sample. Such NN is associated with the possibility of re-training NN [12]. The Matlab [13] Neural Network Toolbox application suite contains the most popular neurocontrollers (NPCs) with  Neural Predictive Control (NPC),  Nonlinear Auto Regressive Moving Average (NARMA-L2) model,  Model Reference Controller (MRC). In [14], a mathematical description of predictive neurorization using MATLAB sys- tem tools is presented. In [15], the NARMA-L2 controller is used for automatic con- trol of the vessel on a variable course. When solving the problem of guidance and stabilization of the armament of a light armored machine, the NARMA-L2 neuro regulator is used in the contour of speed. As the authors note, NARMA-L2 acts as a relay regulator, whose output is switched to opposite limits, resulting in significant fluctuations in speed (up to 40% of the maximum). However, these neuro-regulators are not connected with physical model of object. The purpose of this work is to build model and neuro-controller to control small satellite with default amount of reaction wheels. 3 Materials and methods The main tasks of the paper are to create: 1. Simple simulator of satellite rotations, controlled by 3 or 4 reaction wheels, placed in different configurations. The simulation model will be configurable and easy to read. 2. An Artificial Intelligence (AI) learning module which will trigger the simulator and learn autonomously from the behavior of the simulated satellite, how to control its rotations. 3. The AI module, after trained for different configurations of wheels, will get com- mands with desired 3D rotation speeds and control the wheels to achieve the de- sired rotation. 3.1 Satellite simulator design Simulator is developed using C++ programming language. The satellite simulator is created to solve the next tasks:  To provide physical model of physical object;  To provide physical model of satellite with reaction wheels for rotation control;  To provide possibility to control satellite using reaction wheels during simulation. Simulator is divided to the such layers of logical implementations:  Core of simulation,  Satellite simulation. Core is a general simulation that grants us encapsulated logic for creating and moving of material object. It also allows us to configure simulation and to log information about all objects in simulation. Satellite simulation extends material object logic with reaction wheels and physical facts (friction, gravity, gyroscope effect etc.). The class diagram is given in Fig. 3. Fig. 3. Satellite simulator class diagram Design layers:  Contracts shows main entities of simulator and grants low coupling between their implementations. Contracts consists of abstractions;  Core implements contracts. It contains primary physical model and Simulation entity.  Satellite simulation extends Core with a dynamic of reaction wheel and satellite. Entities:  Point - provides an abstract point for further implementations;  MassPoint - point which has mass and movement vector;  Object - provides enumeration of points which interact with each other;  ReactionWheel - inherited from MassPoint instances, is used for changing rotation speed of satellite by changing its angular momentum;  Satellite - inherited from Object, provides simulated Satellite of arbitrary form, which moves and rotates using thrusters(ForcePoint) and reaction wheels;  Simulator - provides enumeration of Object instances and configuration of scenario of their behavior. The sphere in the Fig. 4 is a space, which limits the set of material points of the ob- ject. The center of mass is note center of the sphere, because its coordinates depends of coordinates and masses off other points. Fig. 4. The center of mass explanation 3.2 Neuro-controller design The controller is created to solve the next tasks:  Generate samples of satellite rotations.  Train the Neural Network model, which can predict expected Energy on reaction wheel.  Set the energy on reaction wheel based on training result. During of training, the neural network must monitor and remember the dependence of the control signal u(k-1) on the next value of the reaction of the control object that was before in the state X(k-1). The values of the control signals and responses of the object are recorded and, on this basis, a training sample is formed. U   Pi , Ti i 1 : Pi   y (i) X (i  1)  , Ti  u (i ) M T (3) We used and desired reaction. In the training mode neural network must find and remember the dependence of control signal u ( k  1) , in state before S ( k  1) . When the object is controlled, the inverse neuro-emulator is connected as a controller and it is receiving the rr ( k ) value from input r ( k  1) : rr ( k )   r ( k  1) X ( k )  . T (4) The class diagram is given in Fig. 5. Fig. 5. Class controller diagram Inputs in the control network is the satellite state (speed for each axes). The output is the control signal (torque) u(t). This is energy level for each rotation wheel. We used mini-batch gradient descent algorithm for neural network training. The structure of neural network The neural network structure for this task looks like this:  Input layer – 3 neurons (for speed by x,y,z),  Hidden layer – 15 full-connected neurons with sigmoid activation function,  Output layer – n neuron with predicted energy level, where n is equal amount of rotation wheels,  The bias is used too. The architecture of neuro-controller is chosen experimentally and given in Fig. 6. Fig. 6. Neural network architecture We used mini-batch gradient descent in NN. The goal of the algorithm is to find model parameters (e.g. coefficients or weights) that minimize the error of the model on the training dataset. It does this by making changes to the model that move it along a gradient or slope of errors down toward a minimum error value. This gives the algorithm its name of “gradient descent.” Mini-batch gradient descent is a trade-off between stochastic gradient descent and batch gradient descent. In mini-batch gradient descent, the cost function (and there- fore gradient) is averaged over a small number of samples, from around 10-500. This is opposed to the SGD batch size of 1 sample, and the BGD size of all the training samples. Mini-batch gradient descent finally takes the best of both worlds and performs an update for every mini-batch of n training examples: θ=θ−η⋅∇θJ(θ;x(i:i+n);y(i:i+n)). (5) This allows us ─ to reduces the variance of the parameter updates, which can lead to more stable convergence; ─ can make use of highly optimized matrix optimizations common to state-of-the-art deep learning libraries that make computing the gradient w.r.t. a mini-batch very efficient. Common mini-batch sizes range between 50 and 256, but can vary for different applications. 4 Results 4.1 Stack of technologies For neuro-controller realization we used 1. Eigen to provide vectors, matrixes, quaternions of different dimensions and work- ing with them (it was mostly used in simulator) [16]. 2. MiniDnn to provide neural network for creating controller of a satellite. Parameters of NN is saved in NeuralConfig.h. These neural network parameters were chosen experimentally. We provided more than 500 training experiments with differ- ent neural network configuration. In the best attempts mean loss was be equal 0.013 and parameters there was: NUMBEROFSAMPLES 1000 NUMBEROFHIDDENLAYERS 1 HIDDENLAYERSLENGTH 15 LEARNINGRATE 0.0007 BATCHSIZE 200 EPOCH 40000 Table 1. Training results # BatchSize Epoch LearningRate Loss function 1 20 40000 0.0002 0.11 2 200 4000 0.0007 0.013 3 200 40000 0.002 0.008 4 200 40000 0.005 0.05 4 200 40000 0.001 0.07 Training time is appr 7 hours and 20 min. The computer configuration is given be- low: Intel Core i3 (3,4 Ghz), 2 cores, NVidia GeForce, GT630, 2Gb 5 Conclusions To sum up this article described how we could use neural networks for controlling satellites. Neural controllers is a very powerful method that allows us automate differ- ent processes and improve accuracy of its results. An experimental study of the proposed criterion of 500 neuro-controllers was con- ducted, which showed its effectiveness compared to the traditional method (Loss function value is less than 0.05) of selecting neurotransmitters based on the least square root error method on the test data voter. In the framework of further research, it is planned to test this criterion, along with other methods of neuro-control, which include the stage of preliminary neuro- identification of the control object: predictive model neuro management and hybrid neuro-PID control as well as using the Kalman cube filter. 6 References 1. Narendra, K. S., & Parthasarathy, K.: Identification and control of dynamical systems us- ing neural networks. IEEE Transactions on neural networks, 1(1), 4-27 (1990) 2. Prokhorov, D. V., & Wunsch, D. C.: Adaptive critic designs. IEEE transactions on Neural Networks, 8(5), 997-1007 (1997) 3. Feldkamp, L. A., & Puskorius, G. V..:Training controllers for robustness: multi-stream DEKF. In Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94) 4, 2377-2382 (1994) 4. Prokhorov, D. V.: Toyota Prius HEV neurocontrol and diagnostics. Neural Networks, 21(2-3), 458-465 (2008) 5. Haykin, S. S., Haykin, S. S., Haykin, S. S., Elektroingenieur, K., & Haykin, S. S.: Neural networks and learning machines (Vol. 3). Upper Saddle River: Pearson. (2009) 6. Kawafuku, R., Sasaki, M., & Kato, S.: Self-tuning PID control of a flexible micro-actuator using neural networks. In SMC'98 Conference Proceedings. 1998 IEEE International Con- ference on Systems, Man, and Cybernetics (Cat. No. 98CH36218) Vol. 3, 3067-3072 (1998). 7. Burakov, M. V., & Kurbanov, V. G.: Neuro-PID control for nonlinear plants with variable parameters. ARPN Journal of Engineering and Applied Sciences, 12(4), 1226-1229 (2017). 8. Anderson, D. F., Ermentrout, B., & Thomas, P. J.: Stochastic representations of ion chan- nel kinetics and exact stochastic simulation of neuronal dynamics. Journal of computa- tional neuroscience, 38(1), 67-82 (2015) 9. Zhernova, P. Y., Deineko, A. O., Bodyanskiy, Y. V., & Riepin, V. O.: Adaptive Kernel Data Streams Clustering Based on Neural Networks Ensembles in Conditions of Uncer- tainty About Amount and Shapes of Clusters. In 2018 IEEE Second International Confer- ence on Data Stream Mining & Processing (DSMP) 7-12 (2018) 10. Bodyanskiy, Y., Boiko, O., Zaychenko, Y., & Hamidov, G.: Evolving GMDH-neuro-fuzzy system with small number of tuning parameters. In 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) 1321-1326 (2017) 11. Ramachandran, R., Madasamy, B., Veerasamy, V., & Saravanan, L.: Load frequency con- trol of a dynamic interconnected power system using generalised Hopfield neural network based self-adaptive PID controller. IET Generation, Transmission & Distribution, 12(21), 5713-5722 (2018) 12. Lee, C. C., Sheridan, S. C., Barnes, B. B., Hu, C., Pirhalla, D. E., Ransibrahmanakul, V., & Shein, K.:. The development of a non-linear autoregressive model with exogenous input (NARX) to model climate-water clarity relationships: reconstructing a historical water clarity index for the coastal waters of the southeastern USA. Theoretical and Applied Cli- matology, 130(1-2), 557-569 (2017) 13. Medvedev V. S., Potjomkin V. G.: Nejronnye seti. MATLAB 6. Moscow, Dialog-MIFI, 496 p (2002) 14. Hwang, C. L., & Jan, C.: Recurrent-neural-network-based multivariable adaptive control for a class of nonlinear dynamic systems with time-varying delay. IEEE transactions on neural networks and learning systems, 27(2), 388-401 (2016) 15. Yang, Y., Xiang, C., Gao, S., & Lee, T. H.: Data‐driven identification and control of nonlinear systems using multiple NARMA‐L2 models. International Journal of Robust and Nonlinear Control, 28(12), 3806-3833 (2018) 16. Pukach, P., Il'kiv, V., Nytrebych, Z., Vovk, M., Shakhovska, N., & Pukach, P.: Galerkin Method and Qualitative Approach for the Investigation and Numerical Analysis of Some Dissipative Nonlinear Physical Systems. In 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT) Vol. 1,. 143-146 (2018)