<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.3390/s21248282</article-id>
      <title-group>
        <article-title>Boosting Methods for Federated Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roberto Esposito</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirko Polato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Aldinucci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica, Università di Torino</institution>
          ,
          <addr-line>Corso Svizzera 185, 10145 Torino</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>51</volume>
      <fpage>02</fpage>
      <lpage>05</lpage>
      <abstract>
        <p>Federated Learning (FL) has been proposed to develop better AI systems without compromising the privacy of final users and the legitimate interests of private companies. Initially deployed by Google to predict text input on mobile devices, FL has since been adopted in many other industries. Since its introduction, Federated Learning has mainly exploited the inner workings of neural networks and other gradient descent-based algorithms by exchanging either the weights of the model or the gradients computed during learning. While this approach has been very successful, it rules out applying FL in contexts where other models are preferred, e.g., because they are easier to interpret or known to work better. This paper proposes to leverage distributed versions of the AdaBoost algorithm to acquire strong federated models. In contrast with previous approaches, our proposal does not put any constraint on the client-side learning models and does not rely on the inner workings of the learning algorithms used in the clients. We perform a large set of experiments on ten UCI datasets, comparing the algorithms in six non-iidness settings. Results show that the approach is effective: in the IID setting, results are often close to the theoretical optimum (i.e., the performance of AdaBoost on the complete dataset). In non-IID settings, results depend heavily on the severity of the non-IIDness.</p>
      </abstract>
      <kwd-group>
        <kwd>federated learning</kwd>
        <kwd>cross-silo</kwd>
        <kwd>boosting</kwd>
        <kwd>adaboost</kwd>
        <kwd>ensemble learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Recent years have been characterized by crucial advances in artificial intelligence and machine
learning systems, by the widespread availability of massive computational resources, and by the
availability of huge datasets. The consequent deployment of AI and ML methods throughout
many industries has been a welcome innovation that has nonetheless raised new concerns
about the fairness of the results and the privacy of the involved data. As a result, it is often
the case that data is dispersed into many isolated islands, and ML practitioners are forbidden
by laws and by the legitimate owners from collecting, fusing, and ultimately using the data to
improve their systems. While protecting the privacy of users and the competitive advantages
of companies is arguably a fair objective, it nonetheless hampers the development of learning
models that, by leveraging all the available data, could make a difference in the quality of life of
many people who are subjected to the decisions made using AI systems.</p>
      <p>
        Federated Learning (FL) has been proposed by McMahan et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as a way out of this
conundrum, i.e., as a way to develop better AI systems without compromising the privacy of
final users and the legitimate interests of private companies.
      </p>
      <p>FL is a learning paradigm where multiple parties (clients) collaborate in solving a machine
learning task using their private data under the coordination of an aggregator (a.k.a. server or
coordinator). Each client’s local data is not exchanged or transferred to any participant. The
learning happens in rounds where model updates are computed by clients in isolation using
local and private data, then aggregated on the server, then broadcast to the clients for the next
round.</p>
      <p>
        There are two main federated settings: cross-device and cross-silo. In cross-device FL, the
parties are edge devices (e.g., smart devices and laptops) and can be numerous (on the order of
thousands or even millions). Parties are considered unreliable and limited in computational
power. In the cross-silo FL setting, the involved parties are instead organizations; the number
of parties is limited, usually in the range [2, 100]. Given the nature of the parties, it can also be
assumed that communication and computation are not real bottlenecks.
      </p>
      <p>
        Since its introduction [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Federated Learning has mainly exploited the inner workings of neural
networks and other gradient descent-based algorithms by exchanging either the weights of the
model or the gradients computed during learning. While this approach has been very successful,
it rules out applying FL in contexts where other models would be preferred, either because they
are more interpretable or because they are known to work better. For instance, in medical studies,
data often comes in tabular form, and examples are few and distributed
among several medical centers that must respect strict privacy constraints. Moreover, medical
doctors often need to be able to interpret the inferred models. In these situations, decision
trees or rule-based systems are often justifiably preferred to neural networks, but they cannot
be readily applied without collecting the data in one single place (e.g., [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), which makes the
whole process hard or impossible to implement due to the aforementioned privacy constraints.
      </p>
      <p>
        This is a position paper based on the work in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], where we proposed a series of cross-silo FL
algorithms for classication based on distributed versions of the AdaBoost algorithm [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7 ref8">4, 5, 6, 7, 8</xref>
        ]
allowing gradient-free federated learning. The algorithms pose minimal constraints on the
learning settings of the clients, thus allowing a federation of models not specically designed for
FL, such as decision trees and SVMs. While there is no technical barrier to using our approach
in cross-device federated learning settings, we have not conducted experiments to clarify the
issue. Our intuition is that the approach will best work with reliable clients that own many
examples, and when communication cost is not high. We, therefore, believe that they are best
suited for cross-silo settings and leave to future work investigating alternatives more kin to
cross-device environments.
      </p>
      <p>The main contributions of this work are:</p>
      <p>i) we propose two new FL algorithms inspired by the distributed AdaBoost literature, namely
DistBoost.F and PreWeak.F;</p>
      <p>ii) we introduce a third algorithm (AdaBoost.F) purposely developed for FL;</p>
      <p>iii) we present a comprehensive evaluation of our solutions on ten UCI datasets and six data
distribution settings.</p>
      <p>For reproducibility purposes, all the code used to perform the experiments in this paper is
available at https://github.com/ml-unito/federation_boosting.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Ensemble Learning addresses the problem of strengthening the performance of a learning
algorithm by iterating it and combining the results. Ensemble Learning is often employed by
practitioners because it requires almost no parameters and can be used along with off-the-shelf
algorithms to obtain strong models that are usually very robust to overfitting [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. It is not
surprising, then, that at the beginning of this century a large swath of research was devoted
to the topic and that many flavors of ensemble learning were proposed during those years
(e.g., Bagging [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Boosting and its variants [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Stacking [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], ECOC [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], etc.). In this context,
the original boosting algorithm from Schapire [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is fundamental because, by constructively
solving the weak learnability problem [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], it spawned massive interest in the field and laid the
basis for the development of AdaBoost [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], arguably the best-known algorithm in the field. The
main idea in Schapire’s boosting algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and hence in AdaBoost, is that, under the assumption
that the base learning algorithm (the weak learner) is always strictly better than random
guessing, one can manipulate the distribution over the examples to force the weak learner to focus on
specific portions of the example space. This can then be used to drive down the error of the
ensemble exponentially fast. AdaBoost appears particularly interesting as a candidate tool for
FL, as it effectively combines classifiers that may be learned independently by the FL clients.
Furthermore, it could be argued that, as long as at least one of the clients can find a model
that is slightly better than random guessing over the complete dataset, AdaBoost should be
able to drive the error of the ensemble on the training set to its theoretical minimum regardless of
other factors (such as the possible non-iidness of the data distribution).
      </p>
      <p>
        Most of the FL literature focuses on gradient-based methods, with very few exceptions.
The work in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] proposes Federated Forest, a lossless federated version of the classical Random Forest
(RF) algorithm for vertically partitioned data. In this method, trees are built on node splits
selected by the aggregator, which repeatedly asks clients for the impurity index and picks the
minimum. Federated Forest guarantees privacy preservation mainly by encoding features and
labels. However, label encoding may fail in the case of binary classification tasks. A very
different approach to learning RFs is presented in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] where the federation is managed using
Blockchain technology that guarantees security even against adversarial participants. Vertical FL
(VFL) is the learning setting in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], which presents a federated algorithm for classification/regression
trees based on Multi-Party Computation [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The authors also describe possible extensions of
the methodology to gradient-boosting trees and linear regression. In [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], the VFL setting is
considered in the context of kernel-based methods. The authors propose a privacy-preserving
protocol to build dot-product kernel matrices, showing the technique’s effectiveness on top-N
recommendation tasks. To the best of our knowledge, we are the first to propose federated
versions of AdaBoost where the (weak) classifiers can be induced by any learning algorithm.
      </p>
      <p>
        As briey mentioned in the introduction, two of the algorithms presented in this paper are
based on a distributed version of AdaBoost, namely DistBoost [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and PreWeak [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which we will
describe in Section 3. In [18], a distributed agnostic boosting algorithm is described. Unlike
AdaBoost, the method uses a non-exponential multiplicative weight update rule that is
further adjusted using the Bregman projection. Here, we propose a federated adaptation of
AdaBoost, and we would argue that a similar methodology may also apply to the approach
in [18]. Boosting-based FL has been little studied in the literature. All published works on the
topic focus on gradient-boosting trees [19, 20], and most of them are designed for vertically
partitioned data [21, 22, 23, 24]. Homomorphic encryption and secret sharing schemes are used
to guarantee privacy, with the only exceptions being [21, 19], which use a differentially private approach.
The cross-silo setting is considered in both [21] and [24] (decentralized FL).
      </p>
      <p>We dierentiate from these previous works because our federated boosting algorithms can be
used with any weak learner, and our setting is horizontal FL. Even if our work focuses on a very
specic case (classication in a vertical setting) in federated learning, we believe the techiques
proposed could be extended and generalized to cover other learning tasks (e.g., regression,
clustering, . . . ) and FL settings.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Ensemble Learning based Federated Learning</title>
      <p>In this work, we consider a cross-silo FL setting: we assume that the clients are reliable
and have enough computational power as well as a stable and secure connection [25, 26]. Under
these assumptions, our proposals expect a certain degree of synchronicity between the clients
and the aggregator. However, all the proposed techniques can easily handle client failures, for
instance, by using a timeout that excludes unresponsive clients from the current federated
round.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Freund and Schapire formally proved that, provided that the weak learner can induce a
decision rule which is consistently better than random guessing, AdaBoost reduces the ensemble
error over the training set exponentially fast in the number T of combined weak models. It is
worth emphasizing that this is the only constraint posed by the algorithm. As shown by Freund
and Schapire [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], this holds true even when the weak learner behaves adversarially towards the
ensemble learner. While this is not relevant in most scenarios, it matters in federated learning,
where each weak learner only works with a subset of the available data. In a sense, each weak
learner can be regarded as a malevolent learner that tries to make the ensemble fail on the part
of the data it does not own. This argument shows that, as long as at least one client can produce a model
better than random guessing over the entire dataset, a distributed version of AdaBoost, modified to
guarantee that no information about the local dataset is exchanged, should be able to drive the
ensemble error to its minimum exponentially fast. This is the main idea underlying our work.
      </p>
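      <p>For reference, the classical training-error bound behind the "exponentially fast" claim can be written as below for binary AdaBoost. The notation is introduced here for illustration and is an assumption with respect to the text: H is the final ensemble, m the number of training examples, ε_t the weighted error of the t-th weak model, and γ_t its edge over random guessing.</p>
      <preformat>
```latex
\frac{1}{m}\sum_{i=1}^{m} \mathbf{1}\left[H(x_i) \neq y_i\right]
\;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1-\epsilon_t)}
\;=\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^{2}}
\;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big),
\qquad \gamma_t = \tfrac{1}{2} - \epsilon_t .
```
      </preformat>
      <p>If every weak model has an edge of at least γ, the bound becomes exp(-2Tγ²), which decreases exponentially in T.</p>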
      <p>
        In the past, there have been several attempts to build distributed versions of AdaBoost [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
In these works, the main aim was to distribute the computation; there was no attempt to
provide privacy over the data and, indeed, all clients were supposed to hold the complete
dataset. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we have shown how to adapt two of these algorithms to work in an FL setting
and also proposed an additional original algorithm. The main contributions were to provide
mechanisms to cope with the fact that different clients hold different portions of the dataset,
which has repercussions on the way the distribution over the examples is handled (e.g., how
weights are normalized). For a detailed description of how the algorithms work, we refer
to the original publication [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]; for details of their implementation in actual (not simulated) FL
environments, please refer to [27, 28]. Here we only provide a brief summary of the ideas on
which the algorithms are based. A common trait of the algorithms is that care is taken to ensure
that the necessary statistics over the examples are computed in a privacy-preserving way. To
do that, all clients maintain unnormalized statistics over the examples and communicate them
to the aggregator. The aggregator collects all statistics and uses them to compute a common
normalization factor. The normalization factor can then be used to properly compute the ε
and α values that are central to the working of algorithms based on AdaBoost. The α terms
are then broadcast to all clients so that they can update their local set of statistics, and the
process repeats.
      </p>
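      <p>The normalization scheme summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from [<xref ref-type="bibr" rid="ref3">3</xref>]: the function names (client_statistics, aggregate, update_local_weights), the binary AdaBoost form of α, and the toy data are all hypothetical, and the weighted error ε is assumed to lie strictly between 0 and 1.</p>
      <preformat>
```python
import math

def client_statistics(weights, mistakes):
    """Unnormalized statistics a client shares: total weight and weight on errors."""
    total = sum(weights)
    wrong = sum(w for w, m in zip(weights, mistakes) if m)
    return total, wrong

def aggregate(stats):
    """Aggregator side: common normalization factor, then epsilon and alpha."""
    norm = sum(total for total, _ in stats)
    epsilon = sum(wrong for _, wrong in stats) / norm
    # Assumption: binary AdaBoost form of alpha; requires epsilon strictly in (0, 1).
    alpha = 0.5 * math.log((1.0 - epsilon) / epsilon)
    return norm, epsilon, alpha

def update_local_weights(weights, mistakes, alpha):
    """Client side: up-weight mistakes, down-weight hits; weights stay unnormalized."""
    return [w * math.exp(alpha if m else -alpha) for w, m in zip(weights, mistakes)]

# Two hypothetical clients; True marks an example the current weak hypothesis gets wrong.
clients = [
    ([1.0, 1.0, 1.0], [True, False, False]),
    ([1.0, 1.0, 1.0, 1.0, 1.0], [False, False, True, False, False]),
]
stats = [client_statistics(w, m) for w, m in clients]
norm, epsilon, alpha = aggregate(stats)
updated = [update_local_weights(w, m, alpha) for w, m in clients]
print(norm, epsilon, round(alpha, 4))
```
      </preformat>
      <p>Only the pair (total weight, weight on errors) leaves each client in this sketch; individual examples and their weights stay local.</p>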
      <p>DistBoost.F: At each round, all clients build weak hypotheses over their local datasets; the
hypotheses are sent to the aggregator, which forms a bagged ensemble and uses it as the
weak hypothesis for the current round. That weak hypothesis is transferred to each client
so that they can measure the performance of the newly learnt hypothesis on their local data and
communicate the results back to the aggregator (this is needed to let every client update the weight
distribution).</p>
      <p>PreWeak.F: In an initial step, all clients train an AdaBoost classifier over their local datasets. In
this step, a fixed number T of weak hypotheses are built in each client without exchanging
any information with the aggregator. Once all clients have finished, all the learnt weak
hypotheses are transmitted to the aggregator, which starts a global AdaBoost process. In
this step, only the weak hypotheses already learnt in the previous step are considered as
candidates to be added to the ensemble. At the end of each round, the selected hypothesis
is communicated to the clients to allow the computation of the statistics necessary to
maintain the global distribution of weights.</p>
      <p>AdaBoost.F: At each round, each client builds a weak hypothesis, which is communicated to
the aggregator. The aggregator distributes these hypotheses to the clients so that they can
compute the necessary statistics over their local datasets. These statistics are then used to
pick the best weak hypothesis, which is then added to the ensemble.</p>
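      <p>The selection step of AdaBoost.F can be illustrated with a short sketch. It assumes that "best" means the lowest global weighted error and that each client reports its total weight together with the unnormalized weighted error of every candidate hypothesis; the function name pick_best_hypothesis and the numbers are hypothetical.</p>
      <preformat>
```python
def pick_best_hypothesis(reports):
    """reports[c] = (total_weight, [weighted error of hypothesis h on client c, ...])."""
    norm = sum(total for total, _ in reports)
    n_hyp = len(reports[0][1])
    # Global weighted error of each candidate: sum of local errors over the common norm.
    global_errors = [sum(errs[h] for _, errs in reports) / norm for h in range(n_hyp)]
    best = min(range(n_hyp), key=global_errors.__getitem__)
    return best, global_errors

# Hypothetical round with 3 clients and 3 candidate hypotheses (one built by each client).
reports = [
    (4.0, [0.5, 2.0, 1.5]),
    (3.0, [1.5, 0.5, 1.0]),
    (3.0, [2.0, 1.5, 0.5]),
]
best, errors = pick_best_hypothesis(reports)
print(best, errors)
```
      </preformat>
      <p>In this toy round each hypothesis looks best on the client that built it, but the aggregator ranks candidates by their error over the whole federation.</p>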
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>
        We compare the federated algorithms introduced in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], namely DistBoost.F, PreWeak.F, and
AdaBoost.F, with the centralized algorithm SAMME [29] (multiclass AdaBoost). For all methods,
we fix the number of weak learners (federated rounds) to T = 300. As weak learners, we
employ Decision Trees with up to 10 leaves (as in [29]). However, it is worth mentioning that
the proposed algorithms are agnostic to the choice of the weak learner; better still,
nothing prevents building a system where each client adopts a different model. The simulated
federation contains 10 clients, which is a standard choice [26] in the cross-silo setting. We
assumed that all clients correctly participated in all rounds during the simulation.
      </p>
      <p>[Figure 1 (plot omitted); surviving panel title: letters.]</p>
      <p>
        We evaluated the methods on the following 10 datasets from the UCI [30] repository: adult,
kr-vs-kp, forestcover, splice, vehicle, segmentation, sat, pendigits, vowel, letter. The datasets have
been distributed across the clients using six different data distributions. Besides the iid case
(uniform data distribution), we also consider the following types of non-iidness: quantity skew,
prior shift (pathological, Dirichlet, and labels quantity), and covariate shift [26]. For more details
about the implementation of these data distributions, please refer to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It is worth noting that,
for every type of skew, each client has at least two examples of different classes.
      </p>
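      <p>As an illustration of one of the prior shift settings, the sketch below splits a labeled dataset across clients with Dirichlet-distributed class proportions. It is not the procedure used in [<xref ref-type="bibr" rid="ref3">3</xref>]: the function names, the concentration value alpha = 0.5, and the toy labels are assumptions, and the sketch does not enforce the guarantee that every client receives examples of at least two classes.</p>
      <preformat>
```python
import random
from collections import Counter

def dirichlet(alpha, k, rng):
    """Sample a k-dimensional symmetric Dirichlet vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def dirichlet_label_skew(labels, n_clients, alpha, seed=0):
    """Assign example indices to clients; each class is split with Dirichlet proportions."""
    rng = random.Random(seed)
    parts = [[] for _ in range(n_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        props = dirichlet(alpha, n_clients, rng)
        start = 0
        for c, p in enumerate(props):
            # The last client takes the remainder so that no example is dropped.
            end = len(idx) if c == n_clients - 1 else start + int(p * len(idx))
            parts[c].extend(idx[start:end])
            start = end
    return parts

labels = [i % 3 for i in range(300)]  # toy labels: 3 balanced classes
parts = dirichlet_label_skew(labels, n_clients=10, alpha=0.5)
for c, part in enumerate(parts):
    print(c, dict(Counter(labels[i] for i in part)))
```
      </preformat>
      <p>Smaller values of alpha concentrate each class on fewer clients, which makes the label skew more severe.</p>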
      <p>The methods have been compared using standard classification metrics such as accuracy,
precision, recall, and F1. For space reasons, we only report the F1 score, which is the harmonic mean
of precision and recall and is therefore sensitive to how the examples are distributed across the classes.</p>
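      <p>For concreteness, the F1 score can be computed as in the sketch below. The averaging over classes is an assumption: the text does not state how per-class scores are combined in the multiclass case, and the macro average is used here only for illustration. The function names and the toy predictions are hypothetical.</p>
      <preformat>
```python
def f1_per_class(y_true, y_pred, cls):
    """F1 of one class: harmonic mean of its precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    classes = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, c) for c in classes) / len(classes)

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
print(round(macro_f1(y_true, y_pred), 4))
```
      </preformat>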
      <p>Each experiment has been repeated five times. The reported results are the averages (with
their standard deviation) over these runs. The Python implementation of the methods and their
evaluation is available at https://github.com/ml-unito/federation_boosting.</p>
      <p>4.1. Results</p>
      <p>We start by investigating how beneficial the federations built by the proposed algorithms are. To
do that, we need to evaluate the performance of a possible competitor built only on local data.
Thus, for each data distribution setting, we ran the SAMME algorithm on each client, using only the
local data for training, and recorded the F1 score over a fixed independent test set.</p>
      <p>Figure 1 shows, for all the clients and all the data distributions, the difference in F1 score (ΔF1)
between the local run of SAMME (local SAMME in the following) and the best F1 score achieved
by one of the federated algorithms on the pendigits and the letters datasets. The lower (more
negative) ΔF1 is for a given point, the more beneficial the federation is for the corresponding
client and data distribution setting.</p>
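      <p>[Table 1: Average F1 scores (± standard deviation) for all methods, datasets, and data distributions. Rows are grouped into binary and multiclass datasets (adult, forestcover, kr-vs-kp, splice, vehicle, segmentation, sat, pendigits, vowel, letter), followed by an average rank row; each row reports one entry per model (SAMME, DistBoost.F, PreWeak.F, AdaBoost.F).]</p>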
      <sec id="sec-4-2">
        <p>Barring small dierences in the actual numbers, the two experiments narrate the same story.
The rst thing to notice is that participating in the federation is generally benecial to all clients,
especially in non-iid data distributions.</p>
        <p>An interesting observation is that, in the quantity skew scenario, clients with many examples
(the head of the power law) can reach F1 scores that are even higher than those of the federation. This
is reasonable because those clients are close to having all the available data; i.e., they run in a
setting similar to running SAMME over the fused dataset, which is generally better than having
to deal with the split dataset scenario. We can also observe that the scenarios with a prior
shift (i.e., Labels Quantity, Dirichlet, and Pathological) are the most challenging ones. This is
particularly apparent for the label quantity skew and the pathological label skew where, by
design, we assign only a small subset of labels per client. We note that, contrary to what the
figure might suggest, in absolute terms the performance of local SAMME in the label quantity
skew case is worse than in the pathological skew case: the corresponding points (⊗ symbols)
appear higher (w.r.t. ΔF1) because the federation does not perform well in this particular case.
This is particularly apparent for the pendigits dataset, where the label quantity skew is not as
detrimental to the performance as in the letters dataset.</p>
        <p>In the uniform data distribution case, the federation is either only slightly useful (pendigits) or
slightly detrimental (letters).</p>
        <p>Table 1 provides the average F1 scores (± standard deviation) for all methods, datasets,
and types of skew. Overall, the performance of PreWeak.F and AdaBoost.F is significantly better
than that of DistBoost.F. We can observe that, in general, the federation tends to achieve F1 scores
very close to the centralized SAMME on datasets with few labels (e.g., 2 and 3), even in non-iid
settings. Clearly, as the number of classes increases, the prior shift scenario becomes more and
more challenging. The Labels Quantity skew is the most demanding setting because each client
only has two labels. Thus, their weak classifiers are not good enough to be boosted effectively.</p>
        <p>Overall, we believe that the evidence presented here is enough to conclude that the
approach is beneficial and that DistBoost.F does not perform as well as the other two algorithms.
There is evidence, albeit not conclusive, that PreWeak.F outperforms AdaBoost.F in terms of
performance and that PreWeak.F might suffer more than AdaBoost.F from overfitting problems.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The possibility of applying federated learning beyond gradient-based methods may broaden
the adoption of this methodology. In this paper, we exploit ideas from the distributed boosting
literature to propose three algorithms, DistBoost.F, PreWeak.F, and AdaBoost.F, which allow, for
the first time, the federation of parties without putting constraints on the type of models
learned in the clients. Indeed, to the best of our knowledge, our proposal is also the first to
allow each client to choose a different local model.</p>
      <p>Our experiments show that the federation works. The generalization error of the federation
is driven down by the three algorithms and, except in trivial cases, the federated model largely
outperforms the models that could have been learned locally. Experiments also show that
non-iid data distributions can harm the quality of the federated model. Specifically, when an
extreme skew on the labels is present, the federation might suffer, especially when the problem
is multi-class and the number of possible labels is large. We leave as future work a comparison
between our approach and traditional (gradient-based) federated algorithms. The comparison
would also allow us to assess how much the problems we observed in some non-iid settings are
specific to our methodology.</p>
      <p>This work opens the door to many possible future directions. In future work, we aim to perform an in-depth
analysis of the security and privacy aspects of these algorithms. As mentioned,
we would also like to compare their behavior against gradient-based alternatives.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>McMahan</surname>
          </string-name>
          et al.,
          <article-title>Communication-ecient learning of deep networks from decentralized data, in: Articial intelligence and statistics</article-title>
          , PMLR,
          <year>2017</year>
          , pp.
          <fpage>1273</fpage>
          -
          <lpage>1282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
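          <string-name><given-names>F.</given-names> <surname>D'Ascenzo</surname></string-name>,
          <string-name><given-names>O.</given-names> <surname>De Filippo</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Gallone</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Mittone</surname></string-name>,
          <string-name><given-names>M. A.</given-names> <surname>Deriu</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Iannaccone</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Ariza-Solé</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Liebetrau</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Manzano-Fernández</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Quadri</surname></string-name>, et al.,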
          <article-title>Machine learning-based prediction of adverse events following an acute coronary syndrome (praise): a modelling study of pooled datasets</article-title>
          ,
          <source>The Lancet</source>
          <volume>397</volume>
          (
          <year>2021</year>
          )
          <fpage>199</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Polato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aldinucci</surname>
          </string-name>
          ,
          <article-title>Boosting the federation: Cross-silo federated learning without gradient descent</article-title>
          ,
          <source>2022 International Joint Conference on Neural Networks (IJCNN)</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Freund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Schapire</surname>
          </string-name>
          ,
          <article-title>Game theory, on-line prediction and boosting</article-title>
          ,
          <source>in: Proceedings of the ninth annual conference on Computational learning theory, 1996</source>
          , pp.
          <fpage>325</fpage>
          -
          <lpage>332</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Freund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Schapire</surname>
          </string-name>
          ,
          <article-title>A decision-theoretic generalization of on-line learning and an application to boosting</article-title>
          ,
          <source>Journal of computer and system sciences 55</source>
          (
          <year>1997</year>
          )
          <fpage>119</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Freund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schapire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Abe</surname>
          </string-name>
          ,
          <article-title>A short introduction to boosting</article-title>
          ,
          <source>Journal-Japanese Society For Artificial Intelligence</source>
          <volume>14</volume>
          (
          <year>1999</year>
          )
          <fpage>1612</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lazarevic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Obradovic</surname>
          </string-name>
          ,
          <article-title>Boosting algorithms for parallel and distributed learning</article-title>
          ,
          <source>Distributed and Parallel Databases</source>
          <volume>11</volume>
          (
          <year>2002</year>
          )
          <fpage>203</fpage>
          -
          <lpage>229</lpage>
          . URL: https://doi.org/10.1023/A:1013992203485. doi:10.1023/A:1013992203485.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Reyzin</surname>
          </string-name>
          ,
          <article-title>Improved algorithms for distributed boosting</article-title>
          ,
          <source>in: 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>806</fpage>
          -
          <lpage>813</lpage>
          . doi:10.1109/ALLERTON.2017.8262822.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <article-title>Bagging predictors</article-title>
          ,
          <source>Machine learning</source>
          <volume>24</volume>
          (
          <year>1996</year>
          )
          <fpage>123</fpage>
          -
          <lpage>140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Wolpert</surname>
          </string-name>
          ,
          <article-title>Stacked generalization</article-title>
          ,
          <source>Neural networks</source>
          <volume>5</volume>
          (
          <year>1992</year>
          )
          <fpage>241</fpage>
          -
          <lpage>259</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E. B.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Dietterich</surname>
          </string-name>
          ,
          <article-title>Error-correcting output coding corrects bias and variance</article-title>
          ,
          <source>in: Machine learning proceedings 1995, Elsevier</source>
          ,
          <year>1995</year>
          , pp.
          <fpage>313</fpage>
          -
          <lpage>321</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Schapire</surname>
          </string-name>
          ,
          <article-title>The strength of weak learnability</article-title>
          ,
          <source>Machine learning</source>
          <volume>5</volume>
          (
          <year>1990</year>
          )
          <fpage>197</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <article-title>Federated forest</article-title>
          (
          <year>2019</year>
          ). doi:10.1109/TBDATA.2020.2992755. arXiv:1905.10053.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L. A. C.</given-names>
            <surname>de Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. Antonio F.</given-names>
            <surname>Rebello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. F.</given-names>
            <surname>Camilo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C. B.</given-names>
            <surname>Guimarães</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. C. M. B.</given-names>
            <surname>Duarte</surname>
          </string-name>
          ,
          <article-title>DFedForest: Decentralized federated forest</article-title>
          ,
          <source>in: 2020 IEEE International Conference on Blockchain (Blockchain)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>90</fpage>
          -
          <lpage>97</lpage>
          . doi:10.1109/Blockchain50366.2020.00019.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Ooi</surname>
          </string-name>
          ,
          <article-title>Privacy preserving vertical federated learning for tree-based models</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>13</volume>
          (
          <year>2020</year>
          )
          <fpage>2090</fpage>
          -
          <lpage>2103</lpage>
          . URL: https://doi.org/10.14778/3407790.3407811. doi:10.14778/3407790.3407811.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cramer</surname>
          </string-name>
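          ,
          <string-name>
            <given-names>I. B.</given-names>
            <surname>Damgård</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Nielsen</surname>
          </string-name>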
          ,
          <source>Secure Multiparty Computation and Secret Sharing</source>
          , Cambridge University Press,
          <year>2015</year>
          . doi:10.1017/CBO9781107337756.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Polato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gallinaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Aiolli</surname>
          </string-name>
          ,
          <article-title>Privacy-preserving kernel computation for ver-</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>