<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Intelligent Control of Morphing Aircraft Based on Soft Actor-Critic Algorithm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shaojie Ma</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xuan Zhang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuhang Wang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Junpeng Hui</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhu Han</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Beijing Institute of Space Long</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Key Laboratory of Digital Earth Science Aerospace Information Research Institute Chinese Academy of Sciences Beijing</institution>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Research and Development Center, China Academy of Launch Vehicle Technology</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>148</fpage>
      <lpage>154</lpage>
      <abstract>
        <p>A morphing aircraft can optimize its flight performance by changing its aerodynamic shape. However, the deformation poses a great challenge to the control system, whose most prominent characteristics are strong nonlinearity and large uncertainty. Therefore, an intelligent control method based on the Soft Actor-Critic algorithm is proposed. First, the state space, action space and reward function required by the algorithm are designed. Then the training efficiency of the algorithm is improved through network pre-training. Mathematical simulations prove that the control strategy keeps the altitude and velocity stable during deformation, and that it is strongly robust both to the uncertainty caused by deformation and to complex external disturbances.</p>
      </abstract>
      <kwd-group>
        <kwd>Intelligent control</kwd>
        <kwd>Morphing aircraft</kwd>
        <kwd>Flight control</kwd>
        <kwd>Soft Actor-Critic</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>A morphing aircraft is a type of aircraft that can autonomously change its aerodynamic shape according to different flight states and task requirements. The aerodynamic and structural parameters of a morphing aircraft change nonlinearly during deformation, which makes the aircraft dynamics model strongly nonlinear. The relative motion between the wings, the body and the surrounding air produces additional disturbances, which gives the model large uncertainty.</p>
      <p>
        For the flight control of morphing aircraft, a commonly used approach is switching linear parameter-varying robust control based on linearized models [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ]. However, linearization partly loses the nonlinear characteristics of the morphing aircraft model. Therefore, nonlinear control methods have become the main research direction [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Reference [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] realized adaptive control of morphing aircraft based on dynamic inversion, but such methods depend heavily on model accuracy. Reference [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] therefore designed a controller based on active disturbance rejection control theory, which is strongly robust to disturbances during deformation; however, active disturbance rejection control involves many parameters, which increases the complexity of the controller design.
      </p>
      <p>
        With the development of intelligent control theory, deep reinforcement learning is increasingly applied to complex control tasks and shows good performance [
        <xref ref-type="bibr" rid="ref6 ref7">6-7</xref>
        ]. Reference [8] applied the Soft Actor-Critic algorithm to fault-tolerant flight control. Reference [9] designed a composite controller combining the deep deterministic policy gradient algorithm with a traditional controller. Reference [10] proposed a fixed-time disturbance rejection controller whose parameters are tuned with the aid of the twin delayed deep deterministic policy gradient algorithm.
      </p>
      <p>Based on this, this paper proposes a controller for morphing aircraft based on the Soft Actor-Critic algorithm. Taking a variable-sweep UAV as the object, a longitudinal mathematical model is first established that accounts for its multi-rigid-body structure. Then the state space, action space, reward function and network structure required by the algorithm are designed within the framework of the Markov decision process. Finally, the control accuracy and strong robustness of the proposed control strategy are verified by mathematical simulations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. MATHEMATICAL MODEL OF MORPHING AIRCRAFT</title>
      <p>In this paper, a class of variable-sweep aircraft is considered. The model is similar to the ones studied in references [11-12]. The longitudinal motion of the morphing aircraft can be described by

$\dot{V} = (-X + P\cos\alpha - mg\sin\gamma + F_{sx})/m$
$\dot{\gamma} = (Y + P\sin\alpha - mg\cos\gamma - F_{sy})/(mV)$
$\dot{h} = V\sin\gamma$  (1)
$\dot{\vartheta} = \omega_z$
$\dot{\omega}_z = (M_z + M_{sz} - S_x g\cos\vartheta - \dot{I}_z\omega_z)/I_z$</p>
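      <p>A minimal Euler-integration sketch of a model of this form is given below. All masses, reference areas and aerodynamic coefficients are made-up placeholders rather than the paper's data, and the deformation-induced terms $F_{sx}$, $F_{sy}$, $M_{sz}$ and $\dot{I}_z$ are omitted for brevity:</p>

```python
import numpy as np

# Illustrative Euler integration of the longitudinal dynamics (1);
# every numeric constant below is a placeholder, not the paper's data.
def step(state, P, dz, dt=0.01, m=50.0, Iz=20.0, g=9.81):
    """state = [V, gamma, h, theta, wz]; angle of attack alpha = theta - gamma."""
    V, gamma, h, theta, wz = state
    alpha = theta - gamma
    Q = 0.5 * 1.225 * V**2                        # dynamic pressure (sea-level rho)
    S, L = 0.8, 0.5                               # placeholder area / chord
    X = (0.02 + 0.1 * alpha**2) * Q * S           # drag
    Y = (0.1 + 4.0 * alpha) * Q * S               # lift
    Mz = (-0.5 * alpha - 0.05 * wz + 0.8 * dz) * Q * S * L  # pitch moment
    dV = (-X + P * np.cos(alpha) - m * g * np.sin(gamma)) / m
    dgamma = (Y + P * np.sin(alpha) - m * g * np.cos(gamma)) / (m * V)
    dh = V * np.sin(gamma)
    dtheta = wz
    dwz = Mz / Iz                                 # deformation terms omitted here
    return state + dt * np.array([dV, dgamma, dh, dtheta, dwz])

# one second of flight from roughly the paper's initial condition
s = np.array([30.0, 0.0, 1000.0, 0.02, 0.0])
for _ in range(100):
    s = step(s, P=60.0, dz=0.0)
```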
      <p>where $V$, $\gamma$, $\alpha$, $h$ denote the velocity, flight path angle, angle of attack and altitude, respectively; $\vartheta$, $\omega_z$ denote the pitch angle and pitch rate, respectively; $m$, $I_z$ represent the mass and moment of inertia of the aircraft, respectively; $g$ is the gravitational acceleration; and $P$ is the thrust of the engine. $X$, $Y$, $M_z$ denote the drag force, lift force and pitching moment, respectively, which are given as

$X = [C_{x0}(\chi) + C_x^{\alpha}(\chi)\alpha + C_x^{\alpha^2}(\chi)\alpha^2]QS$
$Y = [C_{y0}(\chi) + C_y^{\alpha}(\chi)\alpha]QS$  (2)
$M_z = [C_{m0}(\chi) + C_m^{\alpha}(\chi)\alpha + C_m^{\delta_z}(\chi)\delta_z + C_m^{\omega_z}(\chi)\omega_z]QSL$</p>
      <p>where $Q = 0.5\rho V^2$ is the dynamic pressure, $S$ is the wing reference area, $L$ is the mean aerodynamic chord, and $C_{x0}(\chi)$, $C_x^{\alpha}(\chi)$, $C_x^{\alpha^2}(\chi)$, $C_{y0}(\chi)$, $C_y^{\alpha}(\chi)$, $C_{m0}(\chi)$, $C_m^{\alpha}(\chi)$, $C_m^{\delta_z}(\chi)$, $C_m^{\omega_z}(\chi)$ are the aerodynamic derivatives, which can be formulated as polynomial functions of the sweep angle $\chi \in [0°, 45°]$.</p>
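      <p>Evaluating a derivative at a given sweep angle then reduces to a polynomial lookup; in the sketch below the polynomial coefficients are arbitrary placeholders, not the paper's fitted values:</p>

```python
import numpy as np

# Each aerodynamic derivative as a polynomial in the sweep angle chi (deg).
# Coefficients are placeholders (highest power first, as np.polyval expects).
coef = {
    "Cy0":  [1.2e-5, -9.0e-4, 0.31],
    "Cy_a": [-3.0e-5, 1.1e-3, 4.2],
    "Cm0":  [2.0e-6, -4.0e-4, 0.012],
}

def derivative(name, chi_deg):
    """Evaluate one derivative at a sweep angle, clipped to [0, 45] deg."""
    chi = np.clip(chi_deg, 0.0, 45.0)
    return np.polyval(coef[name], chi)

# the derivatives vary smoothly with sweep angle during morphing
cy0_unswept = derivative("Cy0", 0.0)
cy0_swept = derivative("Cy0", 45.0)
```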
      <p>$F_{sx}$, $F_{sy}$, $M_{sz}$ are the inertial forces and moment caused by deformation, and $S_x$ is the static moment in the body frame, which varies with the sweep angle $\chi$. They are given as

$F_{sx} = (\dot{\omega}_z\sin\alpha + \omega_z^2\cos\alpha)S_x + 2\dot{S}_x\omega_z\sin\alpha - \ddot{S}_x\cos\alpha$
$F_{sy} = (\dot{\omega}_z\cos\alpha - \omega_z^2\sin\alpha)S_x + 2\dot{S}_x\omega_z\cos\alpha + \ddot{S}_x\sin\alpha$  (3)
$M_{sz} = (\dot{V}\sin\alpha + V\dot{\alpha}\cos\alpha - V\omega_z\cos\alpha)S_x$
$S_x = 2m_1 r_1 + m_3 r_3$

where $m_1$, $m_3$ represent the mass of each wing and of the body, respectively, and $r_1$, $r_3$ denote the positions of the corresponding components in the body frame.</p>
    </sec>
    <sec id="sec-3">
      <title>3. DESIGN OF CONTROLLER BASED ON SOFT ACTOR-CRITIC ALGORITHM</title>
    </sec>
    <sec id="sec-4">
      <title>3.1. Principle of Soft Actor-Critic Algorithm</title>
      <p>Soft Actor-Critic (SAC) is a deep reinforcement learning algorithm based on the Actor-Critic (AC) framework, which uses deep neural networks to represent the policy $\pi$ and the action-value function $Q(s,a)$. SAC uses a stochastic policy network $\pi(a|s;\theta_\pi)$ that outputs the mean and variance of the action and obtains the action command by sampling, which improves the exploration of the algorithm. SAC uses two critic networks $Q_i(s,a|\theta_{Q_i})$ to reduce the estimation error of the Q-function, an idea inherited from Double Q-Learning. Moreover, as an off-policy algorithm, SAC maintains a replay buffer and two target critic networks $\bar{Q}_i(s,a|\bar{\theta}_{Q_i})$. In addition, SAC encourages exploration by maximizing the entropy-regularized cumulative reward rather than the plain cumulative reward. These features make it a widely used reinforcement learning algorithm for continuous control.</p>
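      <p>The exponentially smoothed target-network update that underlies SAC's stability can be sketched as follows, with toy weight arrays and an assumed smoothing factor $\tau = 0.005$:</p>

```python
import numpy as np

# Soft (Polyak) target update: theta_bar <- (1 - tau) * theta_bar + tau * theta.
# Weights here are toy arrays; tau = 0.005 is an assumed small smoothing factor.
def soft_update(target, source, tau=0.005):
    return [(1 - tau) * t + tau * s for t, s in zip(target, source)]

critic = [np.ones((4, 4)), np.zeros(4)]   # toy "critic" weights and biases
target = [np.zeros((4, 4)), np.zeros(4)]  # target starts from zero
for _ in range(10):
    target = soft_update(target, critic)
# the target weights creep toward the critic weights instead of jumping
```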
      <p>When updating the Bellman equation, SAC takes the minimum of the two target critic networks, so the loss function of the critic networks can be given as

$L(\theta_{Q_i}) = \frac{1}{N}\sum_{j=1}^{N}\left(y_j - Q_i(s_j, a_j|\theta_{Q_i})\right)^2$

$y_t = r_{t+1} + \gamma\left[\min_{i=1,2}\bar{Q}_i(s_{t+1}, a_{t+1}|\bar{\theta}_{Q_i}) - \alpha\log\pi(a_{t+1}|s_{t+1};\theta_\pi)\right]$

$\theta_{Q_i,t+1} = \theta_{Q_i,t} - \lambda_Q\nabla_{\theta_{Q_i}}L(\theta_{Q_i})$

where $N$ denotes the batch size, $\lambda_Q$ is the learning rate of the critic networks, $a_{t+1}$ is the next action corresponding to the next state, and $\alpha$ is the temperature parameter. Similarly, the loss function of the policy network can be given as

$J(\theta_\pi) = \frac{1}{N}\sum_{j=1}^{N}\left[\alpha\log\pi(a_j|s_j;\theta_\pi) - \min_{i=1,2}Q_i(s_j, a_j)\right]$

$\theta_{\pi,t+1} = \theta_{\pi,t} - \lambda_\pi\nabla_{\theta_\pi}J(\theta_\pi)$

where $\lambda_\pi$ is the learning rate of the policy network. Besides, SAC also provides a method to adjust the temperature parameter automatically, with the loss function

$J(\alpha) = \frac{1}{N}\sum_{j=1}^{N}\left[-\alpha\log\pi(a_j|s_j;\theta_\pi) - \alpha\bar{H}\right]$

$\alpha_{t+1} = \alpha_t - \lambda_\alpha\nabla_\alpha J(\alpha)$

where $\bar{H}$ is the target entropy, which lets the algorithm dynamically find the lowest temperature that still ensures a certain minimum entropy, and $\lambda_\alpha$ is the learning rate of $\alpha$. SAC updates the target networks by exponential smoothing rather than by direct replacement as in DDPG, which makes the target networks change more slowly and stably and improves the stability of the algorithm.</p>
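      <p>The clipped double-Q target with the entropy bonus can be illustrated with toy numbers (the function name and all values here are for illustration only):</p>

```python
# Critic target y_t: minimum of the two target-critic estimates plus the
# entropy bonus -alpha * log pi; gamma and alpha are assumed toy values.
def critic_target(r, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2, done=False):
    if done:
        return r  # terminal transition: no bootstrap term
    return r + gamma * (min(q1_next, q2_next) - alpha * logp_next)

y = critic_target(r=1.0, q1_next=10.0, q2_next=9.5, logp_next=-1.2)
# uses the smaller critic estimate (9.5) to curb overestimation
```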
    </sec>
    <sec id="sec-5">
      <title>3.2. Design of Controller</title>
      <p>In the longitudinal plane, the aircraft is controlled in altitude and velocity. Because the aircraft has a complex continuous action space, a randomly initialized policy network can hardly guarantee flight stability, and the poor quality of the samples collected in the initial training episodes makes the training efficiency extremely low. Therefore, network pre-training is adopted in this paper: a traditional controller is first fitted through deep learning, and the learned deep neural network is used as the initial policy network of SAC. The training structure is shown in Figure 1. The policy network serves directly as the aircraft controller, and the action output by the network is the command to the aircraft's control actuators.</p>
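      <p>The pre-training step can be sketched as supervised imitation of a traditional controller. In the sketch below, a linear least-squares fit stands in for the deep-learning fit, and the "traditional controller" and its logged data are synthetic placeholders:</p>

```python
import numpy as np

# Fit (state, action) pairs logged from a stand-in "traditional controller";
# the resulting parameters would seed the SAC policy network.
rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 7))      # synthetic 7-dim flight states
K_true = rng.normal(size=(7, 2))         # placeholder controller gains
actions = states @ K_true                # logged actions [delta_z, T]

# supervised fit: minimize ||states @ K - actions||^2
K_fit, *_ = np.linalg.lstsq(states, actions, rcond=None)
# K_fit reproduces the stand-in controller and can initialize the policy
```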
      <p>The control model is transformed into Markov decision process, then the state space, action space,
reward function and deep neural network structure are designed under this framework.</p>
    </sec>
    <sec id="sec-6">
      <title>3.2.1. State space and action space</title>
      <p>Drawing on traditional controller design ideas, and considering the influence of deformation on the model, the seven-dimensional state vector is designed as

$s = [\Delta h, \dot{h}, \Delta V, \dot{V}, \vartheta, \omega_z, \chi]$  (7)</p>
      <p>The actuator of the altitude channel is mainly the elevator, and that of the velocity channel is mainly the adjustable-thrust engine, so the action vector is designed as

$a = [\delta_z, T]$  (8)</p>
    </sec>
    <sec id="sec-7">
      <title>3.2.2. Reward function</title>
      <p>To ensure that the aircraft accurately tracks the altitude and velocity commands while reducing the control energy demand, the reward function is designed as

$r = \lambda_h |\Delta h| + \lambda_V |\Delta V| + \lambda_\delta |\delta_z| + \lambda_T |T| + r_1 + r_2 + r_d$  (9)

where the first four terms are penalties on the tracking errors, the elevator deflection angle and the thrust: as the altitude tracking error, velocity tracking error, elevator deflection and thrust increase, the penalty grows, with $\lambda_i\ (i = h, V, \delta, T)$ the respective weights. The terms $r_1$, $r_2$ are sparse rewards for tracking accuracy, applied when the tracking error is below a threshold. $r_d$ is the penalty for state divergence: in this paper, when the altitude tracking error exceeds 500 m, the states are judged divergent and the episode is ended.</p>
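      <p>A reward of this shape can be sketched as follows; all weights and thresholds are placeholder values, not the paper's tuned ones:</p>

```python
# Reward sketch: weighted penalties on tracking error and control effort,
# sparse accuracy bonuses, and a large divergence penalty. Placeholder values.
W = dict(h=1.0, V=1.0, dz=0.1, T=0.01)

def reward(dh, dV, dz, T, tol_h=5.0, tol_V=1.0):
    r = -(W["h"] * abs(dh) + W["V"] * abs(dV) + W["dz"] * abs(dz) + W["T"] * abs(T))
    if abs(dh) < tol_h:
        r += 1.0          # sparse altitude-accuracy bonus
    if abs(dV) < tol_V:
        r += 1.0          # sparse velocity-accuracy bonus
    if abs(dh) > 500.0:   # states judged divergent: episode would end
        r -= 100.0
    return r
```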
    </sec>
    <sec id="sec-8">
      <title>3.2.3. Deep neural networks</title>
      <p>All networks used in this paper are back-propagation neural networks. The input layer of the policy network has 7 neurons corresponding to the 7-dimensional state vector; the hidden part has 2 fully connected layers of 256 neurons each, with ReLU activation; and the output layer produces the mean and variance of the action. The two critic networks have the same structure: the input layer has 9 neurons corresponding to the 7-dimensional state vector and the 2-dimensional action vector; the hidden part has 3 fully connected layers of 64 neurons each, again with ReLU activation; and the output layer has 1 neuron corresponding to the action-value function.</p>
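      <p>The described policy network can be sketched as a plain NumPy forward pass; the random initialization and the log-variance output head are assumptions, and a real implementation would use a deep learning framework:</p>

```python
import numpy as np

# Policy network sketch: 7 inputs, two 256-unit ReLU hidden layers,
# outputs a mean and a log-variance per action dimension (assumed head).
rng = np.random.default_rng(1)

def layer(n_in, n_out):
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

W1, b1 = layer(7, 256)
W2, b2 = layer(256, 256)
W3, b3 = layer(256, 4)            # [mean, log-variance] for 2 actions

def policy_forward(s):
    h1 = np.maximum(s @ W1 + b1, 0.0)   # ReLU
    h2 = np.maximum(h1 @ W2 + b2, 0.0)  # ReLU
    out = h2 @ W3 + b3
    return out[:2], out[2:]             # mean, log-variance

mean, log_var = policy_forward(np.zeros(7))
```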
    </sec>
    <sec id="sec-9">
      <title>4. NUMERICAL SIMULATION</title>
      <p>The initial simulation states are $h_0 = 1000\,\mathrm{m} \pm 5\,\mathrm{m}$ and $V_0 = 30\,\mathrm{m/s} \pm 5\,\mathrm{m/s}$, with the remaining initial states scaled by a random factor of 0.995-1.005, and the initial altitude and velocity commands are 1000 m and 30 m/s, respectively. The change of the sweep angle is shown in Figure 2.</p>
      <p>The control step is 10 ms, the network update step is 100 ms, and the simulation time of each episode is 100 s. The algorithm training parameters and the weights of the reward function are listed in TABLE 1.</p>
    </sec>
    <sec id="sec-10">
      <title>4.2. Control Performance Analyses of the Controller</title>
      <p>To verify the effectiveness of the control policy obtained by training, simulations under nominal and deviation states are carried out based on the longitudinal motion model of the morphing aircraft.</p>
    </sec>
    <sec id="sec-11">
      <title>4.2.1. Simulation under nominal states</title>
      <p>Figure 4 shows the altitude and velocity tracking results, the elevator deflection angle and the thrust. The altitude command changes from 1000 m to 1050 m, while the velocity command remains 30 m/s. The blue line is the SAC-optimized controller; the green dotted line is the pre-trained controller.</p>
      <p>The integral absolute errors of altitude and velocity before optimization are 167.1219 m and 114.5735 m/s, respectively; after optimization they are reduced to 7.4009 m and 7.0559 m/s. The control accuracy is thus greatly improved, and the impact of deformation is greatly reduced.</p>
    </sec>
    <sec id="sec-12">
      <title>4.2.2. Simulation under deviation states</title>
      <p>To verify the robustness of the proposed control policy to complex external disturbances, a 20% aerodynamic deviation, a 15% structural deviation, and a 10% density disturbance are considered. Figure 5 shows the altitude and velocity tracking results.</p>
      <p>The integral absolute errors of altitude and velocity under the deviation states are 10.2310 m and 7.9277 m/s, respectively. This shows that the trained control policy achieves stable control under the deviation states and maintains altitude and velocity accuracy during the deformation transition, which proves its robustness to external disturbances.</p>
    </sec>
    <sec id="sec-13">
      <title>5. Conclusions</title>
      <p>Aiming at the strongly nonlinear dynamics of morphing aircraft and the complex internal and external disturbances arising during deformation, and taking a class of variable-sweep aircraft as an example, an altitude and velocity controller was designed based on the SAC deep reinforcement learning algorithm. Network pre-training was adopted to ensure stable control in the initial stage of training and to improve sample quality. The simulation results show that the proposed control policy greatly improves the accuracy of altitude and velocity control and is strongly robust to both internal and external model uncertainties during deformation. However, this paper only performed verification based on mathematical simulation; practical application verification remains to be carried out.</p>
    </sec>
    <sec id="sec-14">
      <title>6. REFERENCES</title>
      <p>[8] K Dally, E V Kampen, "Soft actor-critic deep reinforcement learning for fault tolerant flight control," AIAA Scitech 2022 Forum, San Diego, CA &amp; Virtual, January 2022.</p>
      <p>[9] X Huang, J Liu, Ch Jia, et al, "Deep deterministic policy gradient algorithm for UAV control," Acta Aeronautica et Astronautica Sinica, vol 42, pp. 524688, 2021.</p>
      <p>[10] Y Liu, H Wang, T Wu, et al, "Attitude control for hypersonic reentry vehicles: An efficient deep reinforcement learning method," Applied Soft Computing, vol 123, pp. 108865, 2022.</p>
      <p>[11] Z Wu, J Lu, Q Zhou, et al, "Modified adaptive neural dynamic surface control for morphing aircraft with input and output constraints," Nonlinear Dynamics, vol 87, pp. 2367-2383, 2017.</p>
      <p>[12] L Gong, Q Wang, Ch Hu, et al, "Switching control of morphing aircraft based on Q-learning," Chinese Journal of Aeronautics, vol 33, pp. 672-687, 2020.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K</given-names>
            <surname>Boothe</surname>
          </string-name>
          ,
          <string-name>
            <surname>K Fitzpatrick</surname>
          </string-name>
          , R Lind, “
          <article-title>Controllers for disturbance rejection for a linear input-varying class of morphing aircraft,”</article-title>
          46th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics &amp; Materials Conference, Austin, Texas, April,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W</given-names>
            <surname>Jiang</surname>
          </string-name>
          , Ch Dong,
          <string-name>
            <given-names>T</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <surname>Q Wang</surname>
          </string-name>
          , “
          <article-title>Smooth switching LPV robust control for morphing aircraft,” Control and Decision</article-title>
          , vol
          <volume>31</volume>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>72</lpage>
          , January,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M</given-names>
            <surname>Ran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ch</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al, “
          <article-title>Research status and future development of morphing aircraft control technology</article-title>
          ,
          <source>” Acta Aeronautica et Astronautica Sinica</source>
          , vol
          <volume>43</volume>
          , pp.
          <fpage>527449</fpage>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T</given-names>
            <surname>Lombaerts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            <surname>Kaneshige</surname>
          </string-name>
          , S Schuet, “
          <article-title>Dynamic inversion based full envelope flight control for an VTOL vehicle using a unified framework,” AIAA Scitech 2020 Forum, Orlando</article-title>
          , FL, January,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H</given-names>
            <surname>Song</surname>
          </string-name>
          , L Jin, “
          <article-title>Dynamic modeling and stability control of folding wing aircraft</article-title>
          ,”
          <source>Chinese Journal of Theoretical and Applied Mechanics</source>
          , vol
          <volume>52</volume>
          , pp.
          <fpage>1548</fpage>
          -
          <lpage>1559</lpage>
          , November,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <surname>R Mancuso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R</given-names>
            <surname>West</surname>
          </string-name>
          , et al, “
          <article-title>Reinforcement learning for UAV attitude control</article-title>
          ,
          <source>” ACM Transactions on Cyber-Physical Systems</source>
          , vol
          <volume>3</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H</given-names>
            <surname>He</surname>
          </string-name>
          , et al, “
          <article-title>Deterministic policy gradient with integral compensator for robust quadrotor control</article-title>
          ,
          <source>” IEEE Transactions on Systems, Man, and Cybernetics: Systems</source>
          , vol
          <volume>50</volume>
          , pp.
          <fpage>3713</fpage>
          -
          <lpage>3725</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>