<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Adaptive Federated Learning for Electric Power Inspection with UAV System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yu Liang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruifan Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xun Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Junjie Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xinkai Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xuehe Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Artificial Intelligence, Sun Yat-sen University</institution>
          ,
          <addr-line>Zhuhai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>234</fpage>
      <lpage>241</lpage>
      <abstract>
        <p>With the rapid development of the national power grid, the demand for efficient and reliable power supply is increasing. As labor costs and the size of the power grid grow, Unmanned Aerial Vehicle (UAV) power inspection offers a new and efficient way of detecting power grid abrasion. By analyzing the data collected by the UAVs, a smart detection and maintenance service can be provided. To improve model robustness and accuracy, data collected from different companies and regions are required, which may violate data privacy policies. As a distributed machine learning technique, Federated Learning (FL) can collaboratively train global models without sharing private data. In this article, to protect data privacy between different systems and to optimize model accuracy and convergence performance for non-Independently-and-Identically-Distributed (non-IID) data, we propose an adaptive method that jointly adjusts the learning rate and gradient based on the idea of FL. By recording global gradient information and using momentum to accelerate the training process, our method adaptively controls the local gradient and learning rate in the training of local models and is more robust to local minima. Finally, we verify through experiments the superiority of our model over the generic FL model for non-IID data.</p>
      </abstract>
      <kwd-group>
        <kwd>Adaptive Federated Learning</kwd>
        <kwd>Non-IID Data</kwd>
        <kwd>UAV</kwd>
        <kwd>Smart Grid</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>model parameters and intermediate results without exposing the local source data to each other. FL
offers a unique way to balance data sharing and data privacy protection, making data
"available but not visible".</p>
      <p>Previous distributed models often assume that the participants hold
Independently-and-Identically-Distributed (IID) datasets. However, such an ideal scenario rarely arises in
practice. Participants usually differ considerably from one another, so their data are often
non-IID in practical problems. For example, in a power grid distributed across provinces, the
climate in different regions causes different losses to cables. As a result, the server and
participants need more rounds of communication to reach the required model accuracy.
Unsatisfactory convergence performance, high communication cost, and the need to guarantee privacy all pose
challenges to optimizing FL models trained on non-IID data.</p>
    </sec>
    <sec id="sec-2">
      <title>1.2. Related work and Our idea</title>
      <p>
        In current FL research, many training methods for models based on non-IID data have been
proposed. FedAVG [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is the most classical and widely used federated optimization method, and it
can effectively reduce the communication cost compared with the traditional stochastic gradient
descent (SGD) model. However, because it uses relatively static parameters, its convergence behavior fluctuates
greatly across different optimization problems, so it does not always converge well
on data with higher heterogeneity [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In response to this problem, some adaptive federated optimization methods have attracted
extensive attention. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] proposed a dynamic learning rate (DLR), which improves the FedAVG algorithm
by optimizing the local learning rate to adapt to the fading channel and achieve efficient aggregation of
wireless data. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed an adaptive data enhancement framework for imbalanced distributed
training data to reduce communication traffic and accelerate convergence. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] synthesized the current
general ideas of adaptive optimization for FL and summarized adaptive methods such as
FedADAM, FedADAGRAD, and FedYOGI, which adaptively adjust the learning rate.
      </p>
      <p>All the related works listed above inform our research; nevertheless, few articles discuss the
integrated adaptive optimization of both the learning rate and the gradient. In the training of deep
learning networks, the gradient and the learning rate are both highly significant factors. In theory, the
convergence performance of the model can be greatly improved by adapting them in
local training and global aggregation. A more specific method is presented in the next section.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Adaptive Optimization</title>
      <p>
        Federated Learning (FL) was proposed by B. McMahan et al. in 2016 as a decentralized
machine learning mechanism [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] in which a model is trained jointly by a central server coordinating a set of
distributed participating devices (which we refer to as clients). It avoids the direct aggregation of source data
and protects the privacy of user data.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.1. Generic Algorithm</title>
      <p>
        The FedAVG algorithm is a classical federated learning algorithm [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It builds the
basic idea of federated learning on the stochastic gradient descent (SGD) algorithm.
      </p>
      <p>In each communication round, each client uses its local source data to perform one step of gradient
descent on the current model: it subtracts its local gradient at the current iteration, scaled by the learning rate of the model, from the current parameters to obtain its own model parameters.
After receiving the local parameters, the server averages them to update the global parameters.</p>
      <p>The updated global parameters are synchronized to all clients for the next round of local training,
and the process is repeated (shown in Figure 1).</p>
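      <p>The round described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this article: the function names, the toy gradients, and the dataset sizes are hypothetical, and weighting the average by local sample count is an assumption taken from the common formulation of FedAVG.</p>
      <preformat>
```python
from typing import List, Sequence


def local_step(weights: Sequence[float], grad: Sequence[float], lr: float) -> List[float]:
    """One gradient-descent step on a client: w - lr * g."""
    return [w - lr * g for w, g in zip(weights, grad)]


def weighted_average(client_weights: Sequence[Sequence[float]],
                     num_samples: Sequence[int]) -> List[float]:
    """Server aggregation: average client parameters, weighted by sample count.

    The sample-count weighting is an assumption of this sketch.
    """
    total = sum(num_samples)
    dim = len(client_weights[0])
    return [
        sum(n * w[i] for w, n in zip(client_weights, num_samples)) / total
        for i in range(dim)
    ]


# Toy round with two clients and a two-parameter model.
global_w = [1.0, -2.0]
client_grads = [[0.5, 0.0], [-0.5, 1.0]]  # hypothetical local gradients
client_sizes = [30, 10]                   # hypothetical local dataset sizes

# Each client starts from the same global parameters and takes one local step.
local_ws = [local_step(global_w, g, lr=0.1) for g in client_grads]

# The server aggregates the local parameters into the next global model.
new_global_w = weighted_average(local_ws, client_sizes)
print(new_global_w)
```
      </preformat>
      <p>In this sketch the client with more data pulls the global parameters further toward its own update, which is one reason heterogeneous clients can steer the global model unevenly.</p>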
      <p>
        Compared with the generic FedSGD model, the FedAVG algorithm is considerably more accurate and,
to a certain extent, more robust to non-IID data. However, because it uses a static learning rate, gradient, and other
parameters in local training, its convergence is still slow on the more imbalanced
non-IID data found in practical problems. There is still room for improvement in this case [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Recently, the idea of adaptively updating static parameters has spawned many FL optimization
methods that achieve faster convergence on non-IID datasets while ensuring robustness.
FedADAM [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is one of the most advanced algorithms. Based on the SGD model, it adaptively adjusts
the learning rate according to the momentum information of the gradient by tracking
the first and second moments of the gradient of the model, and it uses Adam’s update
method to iterate its global parameters [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Compared with standard SGD, these features enable
FedADAM to converge faster and be more robust to local minima [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
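      <p>To make the server-side moment tracking concrete, the following sketch applies an Adam-style update to the averaged client change, in the spirit of FedADAM. It is an illustration under stated assumptions: the function name, the default hyperparameter values, and the zero initialization of both moments are choices of this sketch and are not taken from this article.</p>
      <preformat>
```python
import math
from typing import List, Sequence, Tuple


def fedadam_server_step(
    global_w: Sequence[float],
    client_ws: Sequence[Sequence[float]],
    m: Sequence[float],
    v: Sequence[float],
    lr: float = 0.1,      # server learning rate (assumed value)
    beta1: float = 0.9,   # first-moment decay (assumed value)
    beta2: float = 0.99,  # second-moment decay (assumed value)
    tau: float = 1e-3,    # small constant for numerical stability (assumed value)
) -> Tuple[List[float], List[float], List[float]]:
    """One Adam-style server update from the clients' local parameters."""
    dim = len(global_w)
    # Pseudo-gradient: average change of the clients relative to the global model.
    delta = [
        sum(cw[i] - global_w[i] for cw in client_ws) / len(client_ws)
        for i in range(dim)
    ]
    # Exponential moving averages of the pseudo-gradient and of its square.
    new_m = [beta1 * mi + (1 - beta1) * d for mi, d in zip(m, delta)]
    new_v = [beta2 * vi + (1 - beta2) * d * d for vi, d in zip(v, delta)]
    # Per-coordinate step: large second moments shrink the effective learning rate.
    new_w = [
        w + lr * mi / (math.sqrt(vi) + tau)
        for w, mi, vi in zip(global_w, new_m, new_v)
    ]
    return new_w, new_m, new_v


# Toy usage: a one-parameter model and two clients.
w, m, v = [0.0], [0.0], [0.0]
w, m, v = fedadam_server_step(w, client_ws=[[1.0], [3.0]], m=m, v=v)
print(w, m, v)
```
      </preformat>
      <p>The division by the square root of the second moment is what makes the step size adaptive: coordinates whose updates fluctuate strongly receive smaller effective learning rates.</p>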
    </sec>
    <sec id="sec-5">
      <title>2.2. Our Adaptive Optimized Method</title>
      <p>Communication cost plays a dominant role in the optimization of federated learning. We aim to
reduce the number of communication rounds required to train the model by spending additional
computation, so as to achieve faster convergence. We therefore build on the FedADAM algorithm and
propose a new adaptive method that dynamically adjusts otherwise static parameters such
as the learning rate and gradient.</p>
      <p>The pseudocode of the method is presented in Algorithm 1 and illustrated schematically in Figure 2.</p>
      <p>Algorithm 1 shows our adaptive approach. Its inputs are the fraction of clients participating in each round,
the set of all clients, the hyperparameters of the model (which can be customized
before training), and the learning rate. A subset of clients is sampled at the beginning
of each global iteration. In each iteration, the global parameter is transferred to every involved client and
assigned as that client's local parameter.</p>
      <p>Since no historical gradient information has been recorded in the first global iteration, the
gradient of the loss function is used in place of the estimated local gradient
function in the first epoch.</p>
      <p>The gradients of the loss function differ across the participating clients,
so FedAVG may produce more divergent results and converge poorly when
it performs multiple local updates on a non-IID dataset. To mitigate the unstable convergence
of the model on non-IID data in practical problems, the model can be adaptively adjusted
by introducing the estimated gradient function, which keeps multiple local updates pointed in the
correct update direction and thus yields better results on non-IID data. After the first local update,
the central server calculates the estimated global gradient function from the global parameters of the
current and previous global iterations.</p>
      <p>This estimate is delivered to the participating clients in the next global
iteration, where it is used to update the estimated gradient function of each client in the remaining global iterations.</p>
      <p>As shown in Algorithm 1, the core idea of our adaptive method is to update the estimated gradient function
in each epoch of each client. The update uses the gradient of the loss function on the local dataset
at each epoch together with a preset hyperparameter that controls the adjustment. The local parameters are then
updated at each epoch using the result.</p>
      <p>
        In addition, to accelerate convergence and improve the performance of the
model on non-IID data, the local parameters obtained at the given epoch are used in an update that
follows the idea of Adam [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>During the aggregation phase, the central server receives the parameters after local updates from the
involved clients and aggregates their values to calculate the weighted average of the local
parameters of the involved clients.</p>
      <p>Figure 3: Testing accuracy of FedAVG, FedADAM, and Our Adaptive Method on the MNIST dataset with
non-IID and IID data.</p>
      <p>Figure 4: Testing accuracy of FedAVG, FedADAM, and Our Adaptive Method on the CIFAR-10 dataset with
non-IID data.</p>
      <p>At the end of each global iteration, the first and second moment estimates
are calculated with Adam’s method, and the global parameters are then updated accordingly.</p>
      <p>Such a design has the following advantages:</p>
      <p>a) Source data in most practical problems (e.g., electric power inspection with a UAV system,
as in our setting) is often distributed inconsistently, so the model is trained on
non-IID data. By using the global gradient information to adaptively update the local parameters, faster
and more robust convergence can be achieved.</p>
      <p>b) Following the idea of FedADAM, we calculate the moment estimates from the recorded gradient
history to adaptively regulate the learning rate during training. This is expected to
accelerate network training and damp oscillations.</p>
      <p>c) The adaptive adjustment of parameters such as gradient and learning rate does not involve direct
transmission of source data, which ensures data privacy of participants in federated learning.</p>
      <p>d) Our adaptive adjustment focuses on optimization for non-IID data, where
participating clients can obtain more stable parameters through more local training. In this way,
more of the training takes place on the client side, which greatly reduces the number of
communication rounds and lowers the cost of communication between the server and the clients.</p>
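      <p>The gradient correction described in this section can be illustrated with a short sketch. The update rules of Algorithm 1 are not reproduced here, so everything below is one hypothetical reading of the prose: the global gradient is estimated from the change between two consecutive global parameter vectors, and the local gradient is blended with that estimate through a mixing weight. The names estimate_global_gradient, corrected_gradient, step_scale, and alpha are assumptions of this sketch.</p>
      <preformat>
```python
from typing import List, Sequence


def estimate_global_gradient(prev_global: Sequence[float],
                             curr_global: Sequence[float],
                             step_scale: float) -> List[float]:
    """Estimate the global descent direction from two consecutive global models.

    Assumption of this sketch: the estimate is (previous - current) / step_scale.
    """
    return [(p - c) / step_scale for p, c in zip(prev_global, curr_global)]


def corrected_gradient(local_grad: Sequence[float],
                       global_est: Sequence[float],
                       alpha: float) -> List[float]:
    """Blend the local gradient with the global estimate (hypothetical rule).

    alpha = 0 keeps the plain local gradient; alpha = 1 follows the global estimate.
    """
    return [g + alpha * (ge - g) for g, ge in zip(local_grad, global_est)]


def local_update(weights: Sequence[float], local_grad: Sequence[float],
                 global_est: Sequence[float], lr: float, alpha: float) -> List[float]:
    """One local epoch: step along the corrected gradient."""
    direction = corrected_gradient(local_grad, global_est, alpha)
    return [w - lr * d for w, d in zip(weights, direction)]


# Toy usage with two parameters.
prev_w, curr_w = [1.0, 1.0], [0.8, 1.2]
g_est = estimate_global_gradient(prev_w, curr_w, step_scale=0.1)
new_w = local_update(curr_w, local_grad=[4.0, 0.0], global_est=g_est, lr=0.1, alpha=0.5)
print(g_est, new_w)
```
      </preformat>
      <p>The point of such a blend is that a client whose local gradient disagrees with the recent global direction is pulled back toward it, which limits how far local models diverge during multiple local epochs.</p>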
    </sec>
    <sec id="sec-6">
      <title>3. Experiment and Comparison</title>
      <p>In the following, we compare Our Adaptive Method with FedAVG and FedADAM to show the
superiority of Our Adaptive Method under local multi-epoch training.</p>
      <p>
        At present, no data on electric power inspection with UAVs is publicly available on the Internet.
Without loss of generality, we selected the MNIST [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and CIFAR-10 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] datasets,
which are widely used in machine learning, for simulation. We divide each dataset into
IID data and non-IID data, construct a CNN model, and select different combinations of local
epochs and communication rounds for training.
      </p>
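      <p>The exact partitioning scheme is not specified above. As an illustration only, the sketch below shows one common way to split a labelled dataset into IID and non-IID client partitions: a shuffled split for the IID case and a label-sorted shard split for the non-IID case. The function names, the number of clients, and the two-shards-per-client setting are assumptions of this sketch, and synthetic labels stand in for MNIST or CIFAR-10.</p>
      <preformat>
```python
import random
from collections import Counter
from typing import List, Sequence


def iid_partition(labels: Sequence[int], num_clients: int, seed: int = 0) -> List[List[int]]:
    """Shuffle all sample indices and deal them out evenly to the clients."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    return [idx[k::num_clients] for k in range(num_clients)]


def label_shard_partition(labels: Sequence[int], num_clients: int,
                          shards_per_client: int = 2, seed: int = 0) -> List[List[int]]:
    """Sort indices by label, cut them into shards, and give each client a few shards.

    Each client therefore sees only a few classes. Samples that do not fill a
    whole shard are dropped in this sketch.
    """
    idx = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    shard_size = len(idx) // num_shards
    shards = [idx[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    random.Random(seed).shuffle(shards)
    return [
        [i for shard in shards[k * shards_per_client:(k + 1) * shards_per_client] for i in shard]
        for k in range(num_clients)
    ]


# Synthetic stand-in for a 10-class dataset: 100 samples per class.
labels = [c for c in range(10) for _ in range(100)]
iid = iid_partition(labels, num_clients=5)
non_iid = label_shard_partition(labels, num_clients=5)

for k, part in enumerate(non_iid):
    # Show which classes each non-IID client holds.
    print(k, sorted(Counter(labels[i] for i in part)))
```
      </preformat>
      <p>With these settings every shard contains a single class, so each non-IID client holds exactly two classes, while each IID client receives a shuffled sample of the whole dataset.</p>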
      <p>In the experiment, we set the learning rate and the hyperparameter for the local update of Our Adaptive
Method to fixed values. Our Adaptive Method can achieve the expected result.</p>
      <p>By analyzing the results of the algorithm, the following conclusions can be obtained:</p>
      <p>a) For IID data with only one local training epoch, the performance of Our Adaptive
Method is not necessarily better than that of generic FedAVG and FedADAM, and may even be worse. The
reason is that the main purpose of our adaptive improvement is to optimize the convergence speed and
robustness of the model on non-IID data. To save communication cost, more
local training epochs must run on the clients. Under this circumstance, our method achieves
much faster and more robust convergence on non-IID data and at the same time significantly reduces the
communication cost between the server and the clients, which is more meaningful for solving
practical problems.</p>
      <p>b) As the number of local epochs increases, both FedAVG and FedADAM
converge more slowly, to different degrees, while Our Adaptive Method keeps stable
convergence performance. This is because, for a non-IID dataset, the local data often differ greatly among
participants. After the participating clients have performed many local updates,
the differences between their local models grow larger and larger, so
convergence slows down considerably during server aggregation. In this case, Our Adaptive
Method maintains higher accuracy and faster convergence through adaptive adjustment of
parameters.</p>
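      <p>The effect of many local updates on heterogeneous clients can be reproduced on a toy problem. The sketch below is not one of the experiments reported here: it runs plain parameter averaging on two hypothetical clients with one-dimensional quadratic losses of different curvature, and compares the point reached with one local step per round against the point reached with fifty. All names and numbers are assumptions chosen for illustration.</p>
      <preformat>
```python
from typing import Sequence


def fedavg_limit(curvatures: Sequence[float], centers: Sequence[float],
                 lr: float, local_steps: int, rounds: int = 200) -> float:
    """Run equal-weight FedAVG on losses 0.5 * a * (w - c)^2 and return the final w."""
    w = 0.0
    for _ in range(rounds):
        local_ws = []
        for a, c in zip(curvatures, centers):
            wk = w
            for _ in range(local_steps):
                wk -= lr * a * (wk - c)  # gradient of 0.5 * a * (wk - c)^2
            local_ws.append(wk)
        w = sum(local_ws) / len(local_ws)  # server averages the local models
    return w


curvatures = [1.0, 4.0]  # hypothetical clients with different loss curvature
centers = [0.0, 1.0]     # each client's own minimizer

# Minimizer of the average loss: curvature-weighted mean of the centers.
optimum = sum(a * c for a, c in zip(curvatures, centers)) / sum(curvatures)

w_one = fedavg_limit(curvatures, centers, lr=0.1, local_steps=1)
w_many = fedavg_limit(curvatures, centers, lr=0.1, local_steps=50)
print(optimum, w_one, w_many)
```
      </preformat>
      <p>With one local step per round the averaged model settles at the minimizer of the average loss, whereas with fifty local steps each client runs almost to its own minimizer and the average lands near the midpoint of the two centers instead. This drift of local models is the kind of discrepancy that a correction toward the global gradient is meant to reduce.</p>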
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>In this article, we propose an adaptive FL method that uses momentum and an adaptive gradient to
optimize the convergence performance of the model. To achieve fast convergence, we introduce a
new local gradient that accounts for the difference between the local gradient and the historical global
gradient. Furthermore, by tracking the first and second moment estimates of the gradients of the model
parameters, our algorithm adjusts the learning rate adaptively. Finally, we perform simulation
experiments on the MNIST and CIFAR-10 datasets to verify that our model achieves faster
convergence and higher accuracy than the widely used FL algorithms on non-IID data. This is of
great significance for accelerating the convergence of the model and reducing the communication cost in
practical problems with imbalanced data distribution.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgment</title>
      <p>This work was funded by the Innovation and Entrepreneurship Training Program for College
Students of Sun Yat-sen University (Project number: 202211500).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Sun</surname>
          </string-name>
          and
          <string-name>
            <given-names>L. Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al,
          <article-title>"Application of intelligent identification technology in UAV power inspection,"</article-title>
          <source>The Journal of New Industrialization</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>McMahan</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Moore</surname>
          </string-name>
          , et al.,
          <article-title>"Communication-efficient learning of deep networks from decentralized data,"</article-title>
          <source>in Artificial intelligence and statistics</source>
          , pp.
          <fpage>1273</fpage>
          -
          <lpage>1282</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          , et al.,
          <article-title>"On the Convergence of FedAvg on Non-IID Data,"</article-title>
          <source>in International Conference on Learning Representations</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al.,
          <article-title>"Learning rate optimization for federated learning exploiting over-the-air computation,"</article-title>
          <source>IEEE Journal on Selected Areas in Communications</source>
          , vol.
          <volume>39</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>3742</fpage>
          -
          <lpage>3756</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Duan</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al.,
          <article-title>"Self-balancing federated learning with global imbalanced data in mobile systems,"</article-title>
          <source>IEEE Transactions on Parallel and Distributed Systems</source>
          , vol.
          <volume>32</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>71</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Reddi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Charles</surname>
          </string-name>
          , et al.,
          <article-title>"Adaptive federated optimization,"</article-title>
          <source>in International Conference on Learning Representations</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>"Adam: A method for stochastic optimization,"</article-title>
          <source>in International Conference on Learning Representations</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T. M. H.</given-names>
            <surname>Hsu</surname>
          </string-name>
          , et al.,
          <article-title>"Measuring the effects of non-identical data distribution for federated visual classification,"</article-title>
          arXiv preprint arXiv:1909.06335,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Mills</surname>
            ,
            <given-names>J</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.,
          <article-title>"User-oriented multi-task federated deep learning for mobile edge computing,"</article-title>
          arXiv preprint arXiv:2007.09236,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          ,
          <article-title>"MNIST handwritten digit database,"</article-title>
          <year>2010</year>
          . [Online]. Available: MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Nair</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>"CIFAR-10 (Canadian Institute for Advanced Research),"</article-title>
          <year>2010</year>
          . [Online]. Available: CIFAR-10 and CIFAR-100 datasets (toronto.edu)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>