<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Split Ways: Privacy-Preserving Training of Encrypted Data Using Split Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tanveer Khan</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Khoa Nguyen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonis Michalas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>RISE Research Institutes of Sweden</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Tampere University</institution>
          ,
          <addr-line>Tampere</addr-line>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Split Learning (SL) is a new collaborative learning technique that allows participants, e.g., a client and a server, to train machine learning models without the client sharing raw data. In this setting, the client initially applies its part of the machine learning model to the raw data to generate activation maps and then sends them to the server to continue the training process. Previous works in the field demonstrated that reconstructing activation maps could result in privacy leakage of client data. In addition, existing techniques that mitigate the privacy leakage of SL do so at a significant cost in accuracy. In this paper, we improve upon previous works by constructing a protocol based on U-shaped SL that can operate on homomorphically encrypted data. More precisely, in our approach, the client applies Homomorphic Encryption (HE) to the activation maps before sending them to the server, thus protecting user privacy. This is an important improvement that reduces privacy leakage in comparison to other SL-based works. Finally, our results show that, with the optimum set of parameters, training with HE data in the U-shaped SL setting only reduces accuracy by 2.65% compared to training on plaintext. In addition, raw training data privacy is preserved.</p>
      </abstract>
      <kwd-group>
<kwd>Homomorphic Encryption</kwd>
        <kwd>Privacy-preserving Machine Learning</kwd>
        <kwd>Split Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Vision AI systems have proven to surpass people in recognizing abnormalities such as tumours on X-rays and ultrasound scans [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In addition, machines can reliably make diagnoses equal to those of human experts. All the evidence indicates that we can now build systems that achieve human expert performance in analyzing medical data: systems allowing humans to send their medical data to a remote AI service and receive an accurate automated diagnosis. An intelligent and efficient AI healthcare system of this type offers great potential, since it can improve human health while also having an important social impact. However, these opportunities come with certain pitfalls, mainly concerning privacy. With this in mind, we have designed a system that analyzes images in a privacy-preserving way. More precisely, we show how encrypted images can be analyzed with high accuracy without leaking information about their actual content. While this is still far from our big dream (namely automated AI diagnosis), we believe it is an important step that will eventually pave the way towards our ultimate goal.
      </p>
      <p>
        Contributions The main contributions are:
        • We designed a simplified version of the 1D CNN model presented in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and use it to classify ECG signals [8] in both local and SL settings. More specifically, we construct a U-shaped split 1D CNN model and experiment using plaintext activation maps (PAMs) sent from the client to the server. Through the U-shaped 1D CNN model, clients do not need to share the input training samples or the ground-truth labels with the server; this is an important improvement that reduces privacy leakage compared to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
        • We constructed the HE version of the U-shaped SL. In the encrypted U-shaped SL, the client encrypts the activation maps using HE and sends them to the server. The advantage of the HE-encrypted U-shaped SL over the plaintext U-shaped SL is that the server performs its computation over encrypted activation maps (EAMs).
        • To assess the applicability of our framework, we performed experiments on a heartbeat dataset (MIT-BIH [8]). We experimented with activation maps of size 256 for both plaintext and homomorphically encrypted activation maps, and we measured the model’s performance in terms of training duration, test accuracy, and communication cost.
      </p>
    </sec>
    <sec id="sec-related">
      <title>2. Related Work</title>
      <p>
        The SL approach proposed by Gupta and Raskar [9] offers a number of significant advantages over FL. Similar to FL [10], SL does not share raw data. In addition, it has the benefit of not disclosing the model’s architecture and weights. For example, [9] predicted that reconstructing raw data on the client-side while using SL would be difficult. In addition, the authors of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] employed the SL model in healthcare applications to protect users’ personal data. Vepakomma et al. found that SL outperforms FL in terms of accuracy [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Initially, it was believed that SL is a promising approach in terms of client raw data protection, since only intermediate activation maps are shared between the parties. However, different studies showed the possibility of privacy leakage in SL. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the authors analyzed the privacy leakage of SL and found considerable leakage from the split layer in a 2D CNN model. Furthermore, the authors mentioned that it is possible to reduce the distance correlation between the split layer and the raw data by slightly scaling the weights of all layers before the split. This type of scaling works well in models with a large number of hidden layers before the split.
      </p>
      <p>
        The work of Abuadbba et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is the first study exploring whether SL can deal with time-series data. It investigates (i) whether SL can achieve the same model accuracy for a 1D CNN model compared to the non-split version and (ii) whether it can be used to protect privacy in sequential data. According to the results, SL can be applied to such a model without degrading its classification accuracy. As for the second question, the authors proposed a privacy assessment framework and proved that it is possible to reconstruct the raw data (personal ECG signals) in the 1D CNN model using SL. They suggested three metrics: visual invertibility, distance correlation, and dynamic time warping. The results showed that directly adopting SL in 1D CNN models for time-series data can result in significant privacy leakage. Two mitigation techniques were employed to limit the potential privacy leakage in SL: (i) increasing the number of layers before the split on the client-side and (ii) applying differential privacy to the split-layer activation before sending the activation map to the server. However, both techniques suffer from a loss of model accuracy, particularly when differential privacy is used. The strongest differential privacy can increase the dissimilarity between the activation map and the corresponding raw data; however, it degrades the classification accuracy significantly, from 98.9% to 50%.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], during the forward propagation, the client sends the PAMs to the server, and the server can easily reconstruct the original raw data from the activated vector of the split layer, leading to clear privacy leakage. In our work, we constructed a training protocol where, instead of sending PAMs, the client first encrypts the activation maps using HE and then sends said maps to the server. In this way, the server is unable to reconstruct the original raw data, but it can still perform computations on the EAMs and realize the training process.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. Architecture</title>
      <p>In this section, we first describe the non-split version, or
local model, of the 1D CNN used to classify the ECG
signal. Then, we discuss the process of splitting this local
model into a U-shaped split model. Furthermore, we also
describe the involved parties (a client and a server) in
the training process of the split model, focusing on their
roles and the parameters assigned to them throughout
the training process.</p>
      <p>[Figure 1: The 1D CNN model, consisting of 1D Convolution, Leaky ReLU, Max Pooling, fully connected, and Softmax layers, split between the client side and the server side.]</p>
      <sec id="sec-2-1">
        <title>3.1. 1D CNN Local Model Architecture</title>
        <p>
          We first implement and successfully reproduce the local
model results [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This model contains two Conv1D
layers and two FC layers. The optimal test accuracy that
this model achieves is 98.9%. We implement a simplified
version where the model has one less FC layer compared
to the model from [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Our local model consists of all
the layers of Figure 1 without any split between the client
and the server. As can be seen in Figure 1, we limit our
model to two Conv1D layers and one linear layer, as we
aim to reduce computational costs when HE is applied
to activation maps in the model’s split version. Reducing
the number of FC layers leads to a drop in the accuracy
of the model. The best test accuracy we obtained after
training our local model for 10 epochs with a batch size
of 4 is 92.84%. Although reducing the number of layers
affects the model’s accuracy, it is not within our goals to
demonstrate how successful our ML model is for this task;
instead, our focus is to construct a split model where
training and evaluation on encrypted data are comparable to
training and evaluation on plaintext data.
        </p>
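        <p>For concreteness, the following is a minimal PyTorch sketch of such a simplified local model. The channel counts, kernel sizes, and the input length of 128 are illustrative assumptions chosen so that the flattened activation map has size 256, as used in section 5; they are not necessarily the exact values of our implementation (see the repository linked in section 5).</p>
        <preformat>
# Sketch: simplified local 1D CNN with two Conv1D layers and one FC layer.
import torch
import torch.nn as nn

class Local1DCNN(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 4, kernel_size=5, padding=2),   # Conv1D block 1
            nn.LeakyReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(4, 8, kernel_size=5, padding=2),   # Conv1D block 2
            nn.LeakyReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(256, n_classes)      # the single FC layer

    def forward(self, x):
        a = self.features(x)        # (batch, 8, 32) for inputs of length 128
        a = a.flatten(start_dim=1)  # activation map of size (batch, 256)
        return self.classifier(a)   # Softmax is applied by the loss function

x = torch.randn(4, 1, 128)          # a batch of 4 ECG segments of length 128
print(Local1DCNN()(x).shape)        # torch.Size([4, 5])
        </preformat>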
        <p>In section 5, we detail the results for the non-split
version and compare them with the split version.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. U-shaped Split 1D CNN Model</title>
        <p>The SL protocol consists of two parties: the client
and the server. We split the local 1D CNN into multiple
parts, where each party trains its part(s) and
communicates with the other to complete the overall training
procedure. More specifically, we construct the U-shaped
split 1D CNN in such a way that the first few layers as
well as the last layer are on the client side, while the
remaining layers are on the server side.</p>
        <p>Actors in the Split Learning Model As mentioned
earlier, in our SL setting, we have two involved parties:
the client and the server. Each party plays a specific role
and has access to certain parameters. More specifically,
their roles and accesses are described as:
• Client: In the plaintext version, the client holds the two
Conv1D layers and can access their weights and biases
in plaintext. The other layers (Max Pooling layers, Leaky
ReLU layers, Softmax layer) do not have weights and
biases. Apart from these, in the HE encrypted version,
the client is also responsible for generating the context
for HE and has access to all context parameters: the
polynomial modulus, the coefficient modulus, the scaling
factor ∆, the public key pk, and the secret key sk. Note
that for both training on plaintext and on EAMs, the raw
data examples x and their corresponding labels y
reside on the client side and are never sent to the server
during the training process.
• Server: In our model, the computation performed on the
server side is limited to only one linear layer. Hence, the
server can exclusively access the weights and biases of
this linear layer. Regarding the HE context parameters,
the server has access to the polynomial modulus, the
coefficient modulus, ∆, and pk shared by the client, but
not to the sk. Not holding the sk, the server cannot
decrypt the EAMs sent from the client. The
hyperparameters shared between the client and the server are
the learning rate, the batch size, the number of batches
to be trained, and the number of training epochs.</p>
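        <p>As an illustration, the client-side context generation in TenSEAL (the HE library we use, see section 5) might look as follows; a minimal sketch using the best-performing parameter set from section 5, not the exact code of our implementation (which is in the linked repository).</p>
        <preformat>
# Sketch: the client generates the CKKS context and a public copy for the server.
import tenseal as ts

ctx_pri = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=4096,          # polynomial modulus
    coeff_mod_bit_sizes=[40, 20, 20],  # coefficient modulus
)
ctx_pri.global_scale = 2**21           # scaling factor (Delta)
ctx_pri.generate_galois_keys()         # enables ciphertext rotations

ctx_pub = ctx_pri.copy()               # same parameters and pk ...
ctx_pub.make_context_public()          # ... but the sk is dropped
assert not ctx_pub.is_private()        # safe to share with the server
        </preformat>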
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Split Model Training Protocols</title>
      <p>In this section, we first present the protocol for
training the U-shaped split 1D CNN on PAMs, followed by
the protocol for training the U-shaped split 1D CNN on
EAMs.</p>
      <sec id="sec-3-1">
        <title>4.1. Training U-shaped Split Learning with Plaintext Activation Maps</title>
        <p>We use algorithm 1 and algorithm 2 to train the
U-shaped split 1D CNN reported in subsection 3.2. First,
the client and server start the socket initialization process
and synchronize the hyperparameters (learning rate,
batch size, number of batches, number of epochs). They
also initialize the weights and biases of their respective
layers according to Φ.
During the forward propagation phase, the client
forward-propagates the input x up to the split layer and
sends the activation map a^(c) to the server. The server
continues the forward propagation and sends its output
a^(s) back to the client. Next, the client applies the
Softmax function to a^(s) to get ŷ and computes the loss
E = ℒ(ŷ, y).</p>
        <sec id="sec-3-1-1">
          <title>Algorithm 1: Client Side</title>
          <p>Initialization:
  socket initialized with port and address; s.connect()
  synchronize the hyperparameters (learning rate, batch size, number of batches, number of epochs) with the server
  initialize the client-side weights and biases according to Φ; set the activation maps and gradients to zero
Training:
  for each epoch do
    for each batch (x, y) generated from the dataset do
      Forward propagation:
        compute a^(c) by forward-propagating x through the Conv1D, Leaky ReLU, and Max Pooling layers
        send a^(c) to the server; receive a^(s)
        compute ŷ = Softmax(a^(s)) and the loss E = ℒ(ŷ, y)
      Backward propagation:
        compute ∂E/∂a^(s) and send it to the server; receive ∂E/∂a^(c)
        backpropagate to the first hidden layer and update the client-side weights and biases</p>
          <p>The client starts the backward propagation by
calculating ∂E/∂a^(s) and sending this gradient of the
error w.r.t. a^(s) to the server. The server continues the
backward propagation, calculates ∂E/∂a^(c), and sends it
to the client. After receiving ∂E/∂a^(c) from the server,
the backward propagation continues to the first hidden
layer on the client side. Note that the exchange of
information between the client and the server in these
algorithms takes place in plaintext: the client sends the
activation maps a^(c) to the server in plaintext and receives
the output of the linear layer a^(s) from the server in
plaintext, as can be seen in algorithm 1 and algorithm 2.
Sharif et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] showed that the exchange of
PAMs between client and server using SL reveals
important information regarding the client’s raw sequential
data. Later, in subsection 5.1, we show in detail how
passing the forward activation maps from the client to the
server in plaintext results in information leakage.</p>
          <p>To mitigate this privacy leakage, we propose a
protocol where the client encrypts the activation maps before
sending them to the server, as described in subsection 4.2.</p>
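          <p>The plaintext round trip can be simulated in a single process, as in the following PyTorch sketch; in the actual protocol, a^(c), a^(s), and the gradients are serialized over a socket, and the layer shapes here are the illustrative ones from section 3.</p>
          <preformat>
# Sketch: one plaintext U-shaped training step (optimizer steps omitted).
import torch
import torch.nn as nn

client_net = nn.Sequential(
    nn.Conv1d(1, 4, 5, padding=2), nn.LeakyReLU(), nn.MaxPool1d(2),
    nn.Conv1d(4, 8, 5, padding=2), nn.LeakyReLU(), nn.MaxPool1d(2),
    nn.Flatten(),
)
server_net = nn.Linear(256, 5)

x, y = torch.randn(4, 1, 128), torch.randint(0, 5, (4,))

# Forward: client -> server -> client (labels never leave the client).
a_c = client_net(x)                        # client activation map a^(c)
a_c_sent = a_c.detach().requires_grad_()   # what the server receives
a_s = server_net(a_c_sent)                 # server linear layer output a^(s)
a_s_sent = a_s.detach().requires_grad_()   # what the client receives

loss = nn.CrossEntropyLoss()(a_s_sent, y)  # Softmax + loss on the client

# Backward: client -> server -> client.
loss.backward()                            # fills dE/da^(s) on the client
a_s.backward(a_s_sent.grad)                # server continues; fills dE/da^(c)
a_c.backward(a_c_sent.grad)                # client finishes backpropagation
          </preformat>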
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Training U-shaped Split 1D CNN with</title>
      </sec>
      <sec id="sec-3-3">
        <title>Encrypted Activation Maps</title>
        <p>The protocol for training the U-shaped 1D CNN with
homomorphically EAMs consists of four phases:
initialization, forward propagation, classification, and
backward propagation. The initialization phase only takes
place once at the beginning of the procedure, whereas the
other phases repeat until the model iterates through
all epochs. Each of these phases is described in detail
in the following subsections.</p>
        <sec id="sec-3-3-1">
          <title>Algorithm 2: Server Side</title>
          <p>Initialization:
  socket initialized with port and address; s.connect()
  synchronize the hyperparameters with the client
  initialize the server-side weights and biases according to Φ
Training:
  for each epoch do
    for each batch do
      receive a^(c) from the client; compute a^(s) and send it to the client
      receive ∂E/∂a^(s) from the client; compute ∂E/∂a^(c) and send it to the client
      update w^(s) and b^(s)</p>
          <p>Initialization The initialization phase consists of
socket initialization, context generation, and random
weight loading. The client first establishes a socket
connection to the server and synchronizes the four
hyperparameters with the server, as shown in algorithm 3
and algorithm 4. These parameters must be synchronized
on both sides so that the two parts are trained in the
same way. Also, the weights on the client and the server
are initialized with the same set of corresponding weights
as in the local model, to accurately assess and compare
the influence of SL on performance. On both the client
and the server sides, the layers are initialized using the
corresponding parts of Φ. The activation maps, the output
tensors of the Conv1D layers, and the gradients are
initially set to zero. In this phase, the generated context
is a specific object that holds the encryption keys pk and
sk of the HE scheme, as well as additional parameters,
namely the polynomial modulus, the coefficient modulus,
and the scaling factor ∆.</p>
          <p>Further information on the HE parameters and how
to choose the best-suited parameters can be found in
TenSEAL’s benchmarks tutorial1. As shown in
algorithm 3 and algorithm 4, the context is either public
(ctxpub) or private (ctxpri), depending on whether it holds
the secret key sk. Both ctxpub and ctxpri have the same
parameters, though ctxpri holds a sk and ctxpub does not.
In the initialization phase, the client sends ctxpub to the
server, so the server has access to the public parameters
and pk but never to the sk. After that, the client and the
server proceed to the forward and backward propagation
phases.</p>
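          <p>A sketch of this exchange with TenSEAL follows; the variable names are hypothetical, and ctxpri is the private context generated as in section 3.</p>
          <preformat>
# Sketch: the client serializes ctxpub; the server rebuilds it without the sk.
import tenseal as ts

ctx_pri = ts.context(ts.SCHEME_TYPE.CKKS, 4096, coeff_mod_bit_sizes=[40, 20, 20])
ctx_pri.global_scale = 2**21
ctx_pri.generate_galois_keys()

ctx_pub = ctx_pri.copy()
ctx_pub.make_context_public()
payload = ctx_pub.serialize()            # bytes sent over the socket
server_ctx = ts.context_from(payload)    # server-side reconstruction
assert not server_ctx.is_private()       # the server never obtains sk
          </preformat>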
          <p>Forward propagation The forward propagation
starts on the client side. The client first zeroes out the
gradients for the batch of data (x, y). He then calculates
the activation maps a^(c) from x, as can be seen in
algorithm 3, where each block consists of a Conv1D layer
followed by the combination of the Max Pooling and
Leaky ReLU functions.</p>
          <p>The Conv1D layer can be described as follows: given
a 1D input signal that contains C channels, where each
channel x(i) is a 1D array (i ∈ {1, . . . , C}), a Conv1D
layer produces an output that contains C′ channels. The
j-th output channel y(j), where j ∈ {1, . . . , C′}, is2
y(j) = b(j) + ∑_{i=1}^{C} w(i, j) ⋆ x(i),   (1)
where w(i, j) are the weights, b(j) are the biases of the
Conv1D layer, and ⋆ is the 1D cross-correlation
operation. The ⋆ operation can be described as
z(k) = (w ⋆ x)(k) = ∑_{m=0}^{M−1} w(m) · x(k + m),   (2)
where z(k) denotes the k-th element of the output vector
z, k starts at 0, and M is the size of the 1D weight kernel.</p>
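          <p>To make equations (1) and (2) concrete, the following sketch checks a direct implementation of the cross-correlation against PyTorch’s conv1d; all sizes are made up for illustration.</p>
          <preformat>
# Sketch: verify equations (1)-(2) against torch.nn.functional.conv1d
# with illustrative sizes C=2 input channels, C'=1 output channel, M=3.
import torch
import torch.nn.functional as F

x = torch.randn(2, 8)        # input: C=2 channels of length 8
w = torch.randn(1, 2, 3)     # weights w(i, j): shape (C', C, M)
b = torch.randn(1)           # bias b(j)

out_len = x.shape[1] - w.shape[2] + 1
y_manual = torch.empty(1, out_len)
for k in range(out_len):     # equation (2), summed over channels as in (1)
    y_manual[0, k] = b[0] + (w[0] * x[:, k:k + 3]).sum()

y_torch = F.conv1d(x.unsqueeze(0), w, b).squeeze(0)
assert torch.allclose(y_manual, y_torch, atol=1e-5)
          </preformat>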
          <p>The final output activation map of the client-side
layers is a^(c). The client then homomorphically encrypts
a^(c) and sends the EAM to the server. The server
receives the encrypted a^(c) and performs its part of the
forward propagation, which is a linear layer evaluated on
HE encrypted data (algorithm 4):
a^(s) = a^(c) w^(s) + b^(s).   (3)
After that, the server sends a^(s) to the client. Upon
reception, the client decrypts a^(s), performs Softmax on
it to produce the predicted output ŷ, and calculates the
loss E (algorithm 3). Having finished the forward
propagation, we may move on to the backward propagation
part of the protocol.</p>
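          <p>As an illustration of equation (3), the following TenSEAL sketch evaluates a linear layer on an encrypted vector; the weight and bias stay in plaintext on the server, the sizes match the 256-element activation maps of section 5, and the variable names are hypothetical.</p>
          <preformat>
# Sketch: the server's linear layer evaluated on an encrypted a^(c).
import tenseal as ts
import torch

ctx = ts.context(ts.SCHEME_TYPE.CKKS, 4096, coeff_mod_bit_sizes=[40, 20, 20])
ctx.global_scale = 2**21
ctx.generate_galois_keys()

a_c = torch.randn(256)                       # client activation map a^(c)
w_s, b_s = torch.randn(256, 5), torch.randn(5)

enc_a_c = ts.ckks_vector(ctx, a_c.tolist())  # client encrypts a^(c)
enc_a_s = enc_a_c.mm(w_s.tolist()) + b_s.tolist()   # server: equation (3)

a_s = torch.tensor(enc_a_s.decrypt())        # client decrypts a^(s)
print(torch.allclose(a_s, a_c @ w_s + b_s, atol=1e-2))  # approximately True
          </preformat>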
          <p>Backward propagation After calculating the loss E,
the client starts the backward propagation by computing
∂E/∂ŷ and then ∂E/∂a^(s) and ∂E/∂w^(s) using the chain
rule (algorithm 3). Specifically, the client calculates
∂E/∂a^(s) = (∂E/∂ŷ) · (∂ŷ/∂a^(s)),   (4)
∂E/∂w^(s) = (∂E/∂a^(s)) · (∂a^(s)/∂w^(s)).   (5)
Following that, the client sends ∂E/∂a^(s) and ∂E/∂w^(s)
to the server. Upon reception, the server computes
∂E/∂b^(s) by simply setting ∂E/∂b^(s) = ∂E/∂a^(s), based
on equation (3). The server then updates the weights and
biases of its linear layer according to
w^(s)(t) = w^(s)(t−1) − η ∂E/∂w^(s), b^(s)(t) = b^(s)(t−1) − η ∂E/∂b^(s).   (6)
Next, the server calculates
∂E/∂a^(c) = (∂E/∂a^(s)) · (∂a^(s)/∂a^(c)),   (7)
and sends ∂E/∂a^(c) to the client. After receiving
∂E/∂a^(c), the client calculates the gradients of E with
respect to the weights and biases of the Conv1D layers
using the chain rule, which can generally be described as
∂E/∂w^(c) = (∂E/∂a^(c)) · (∂a^(c)/∂w^(c)),   (8)
∂E/∂b^(c) = (∂E/∂a^(c)) · (∂a^(c)/∂b^(c)).   (9)
Finally, after calculating the gradients ∂E/∂w^(c) and
∂E/∂b^(c), the client updates w^(c) and b^(c) using the
Adam optimization algorithm [11].
1https://bit.ly/3KY8ByN
2https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html</p>
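          <p>A plaintext sketch of how the client can derive the gradients of equations (4) and (5) with autograd, using its own plaintext copy of a^(c); sizes are illustrative.</p>
          <preformat>
# Sketch: client-side start of the backward pass (equations (4)-(5)).
import torch
import torch.nn.functional as F

a_c = torch.randn(4, 256)                    # client's plaintext copy of a^(c)
a_s = torch.randn(4, 5, requires_grad=True)  # decrypted server output a^(s)
y = torch.randint(0, 5, (4,))

loss = F.cross_entropy(a_s, y)  # Softmax + loss: E = L(y_hat, y)
loss.backward()                 # equation (4): dE/da^(s)

grad_a_s = a_s.grad             # sent to the server
grad_w_s = a_c.t() @ grad_a_s   # equation (5): dE/dw^(s), also sent
grad_b_s = grad_a_s.sum(dim=0)  # dE/db^(s): derived by the server from (3)
          </preformat>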
        </sec>
        <sec id="sec-3-3-2">
          <title>Algorithm 3: Client Side</title>
          <p>Context Initialization:
  ctxpri ← generate the CKKS context (polynomial modulus, coefficient modulus, ∆) holding pk and sk
  ctxpub ← copy of ctxpri without the sk
  send ctxpub to the server
Training:
  for each epoch do
    for each batch (x, y) generated from D do
      run the forward and backward propagation described above, encrypting a^(c) before sending it and decrypting a^(s) upon reception</p>
          <p>Note that in the backward pass, by sending both ∂E/∂a^(s)
and ∂E/∂w^(s) to the server, we help the server keep its
parameters in plaintext and prevent the multiplicative
depth of the HE computation from growing out of bound;
however, this leads to a privacy leakage of the activation
maps.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Performance Analysis</title>
      <p>
        We evaluate our method on the MIT-BIH dataset [8].
MIT-BIH We use the pre-processed dataset from [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
which is based on the MIT-BIH arrhythmia (abnormal
heart rhythm) database [8]. The processed dataset
contains 26,490 heartbeat samples that belong to 5 different
classes.
      </p>
      <sec id="sec-4-1">
        <title>Algorithm 4: Server Side</title>
      </sec>
      <sec id="sec-4-2">
        <title>Context Initialization:</title>
        <p>.(ctxpub)
for e in E do</p>
        <p>The neural nets are constructed using the PyTorch
library version 1.8.1+cu102. For HE algorithms, we
employ the TenSEAL library version 0.3.10. We perform our
experiments in the localhost setting. The open-source
implementation of our work is publicly available3.</p>
        <p>In terms of hyperparameters, we train all networks
for 10 epochs with a learning rate of 0.001 and a training
batch size of 4. For the split neural network with HE
activation maps, we use the Adam optimizer for the client
model and mini-batch gradient descent for the server.</p>
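        <p>In PyTorch terms, this optimizer split can be sketched as follows; the module shapes are illustrative stand-ins for the two halves of the network rather than our exact architecture.</p>
        <preformat>
# Sketch: Adam on the client-side layers, mini-batch SGD on the server's layer.
import torch
import torch.nn as nn

client_part = nn.Sequential(nn.Conv1d(1, 4, 5, padding=2),
                            nn.Conv1d(4, 8, 5, padding=2))
server_part = nn.Linear(256, 5)

client_opt = torch.optim.Adam(client_part.parameters(), lr=0.001)
server_opt = torch.optim.SGD(server_part.parameters(), lr=0.001)
        </preformat>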
        <p>We use a GPU for the networks trained on plaintext. For
the U-shaped SL model on HE activation maps, we train
the client model on a GPU and the server model on a CPU.</p>
        <p>
          Visual Invertibility In the SL model, the activation
maps are sent from the client to the server to continue the
training process. A visual representation of the activation
maps reveals a high similarity between certain activation
maps and the input data from the client, as demonstrated
in Figure 4 for the models trained on the MIT-BIH dataset.
The figure indicates that, compared to the raw input data
from the client (the first row of Figure 4), some
activation maps (as plotted in the second row of Figure 4) have
exceedingly similar patterns. This phenomenon clearly
compromises the privacy of the client’s raw data. The
authors of [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] quantify the privacy leakage by measuring
the correlations between the activation maps and the
raw input signal using two metrics: distance correlation
and Dynamic Time Warping. This approach allows
        </p>
        <p>them to measure whether their solutions mitigate privacy
leakage. Since our work uses HE, said metrics are
unnecessary, as the activation maps are encrypted.</p>
        <p>We experiment with activation maps of size
[batch size, 256] for the MIT-BIH dataset. We denote
the 1D CNN model with an activation map sized
[batch size, 256] as m1.</p>
        <p>Training Locally Results when training m1 locally on
the MIT-BIH plaintext dataset are shown in Figure 3. The
neural network learns quickly and is able to decrease
the loss drastically from epochs 1 to 5. From epochs 6-10,
the loss begins to plateau. After training for 10 epochs,
we test the trained neural network on the test dataset
and get 88.06% accuracy. Training the model locally on
plaintext takes 4.8 seconds per epoch on average.</p>
        <p>
          U-shaped Split Learning using Plaintext Activation
Maps Our experiments show that training the U-shaped
split model on plaintext (reported in subsection 3.2) produces
the same results in terms of accuracy compared to local
training for model m1. This result is similar to the
findings of [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Even though the authors of [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] only used the
vanilla version of the split model, they too found that,
compared to training locally, accuracy was not reduced.
        </p>
        <p>We will now discuss the training time and
communication overhead of the U-shaped split models and compare
them to their local versions. For the split version of m1,
each training epoch takes 8.56 seconds on average, hence
43.9% longer than local training. The U-shaped split
models take longer to train due to the communication
between the client and the server. The communication cost
for one epoch of training split m1 is 33.06 Mb.
3https://github.com/khoaguin/HESplitNet</p>
        <p>U-shaped Split 1D CNN with Homomorphic
Encrypted Activation Maps We train the split neural
network m1 on the MIT-BIH dataset using EAMs
according to subsection 4.2. To encrypt the activation maps
on the client side (i.e. before sending them to the server),
we experiment with five different sets of HE parameters
for model m1. Additionally, we perform experiments
using different combinations of HE parameters. Table 1
shows the results in terms of training time, testing
accuracy, and communication overhead for the neural
networks with different configurations. For the U-shaped SL
version on plaintext, we captured all communication
between client and server. For training split models on
EAMs, we approximate the communication overhead for
one training epoch by taking the average communication
of training on the first ten batches of data and multiplying
that by the total number of training batches.</p>
        <p>For the m1 model, the best test accuracy was 85.41%,
obtained when using the HE parameters with polynomial
modulus 4096, coefficient modulus [40, 20, 20], and
scale ∆ = 2^21. The accuracy drop was 2.65%
compared to training the same network on plaintext. This
set of parameters achieves higher accuracy compared
to the bigger sets of parameters with polynomial modulus
8192, while requiring much lower training time and
communication overhead. The result when using the first
set of parameters with polynomial modulus 8192 is close
(85.31%), but with a much longer training time (3.67 times
longer) and communication overhead (8.43 times higher).</p>
        <p>Our experiments show that training on EAMs can
produce optimistic results, with accuracy dropping by 2-3%
for the best sets of HE parameters.</p>
        <p>The sets of parameters with polynomial modulus 8192
achieve the second-highest test accuracy, though they
incur the highest communication overhead and the longest
training time. The sets of parameters with polynomial
modulus 4096 offer a good trade-off, as they produce
on-par accuracy with polynomial modulus 8192 while
requiring significantly less communication and training
time. Experimental results show that the smallest set of
HE parameters, polynomial modulus 2048, coefficient
modulus [18, 18, 18], and ∆ = 2^16, requires the least
amount of communication and training time.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>This paper focused on how to train ML models in a privacy-preserving way using a combination of split learning and homomorphic encryption. We constructed protocols by which a client and a server can collaboratively train a model without revealing significant information about the raw data. As far as we are aware, this is the first time split learning is used on encrypted data.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was funded by the HARPOCRATES EU research project (No. 101069535) and the Technology Innovation Institute (TII), UAE, for the project ARROWSMITH.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Michalas</surname>
          </string-name>
          ,
          <article-title>Blind faith: Privacy-preserving machine learning using function approximation</article-title>
          ,
          <source>in: 2021 IEEE Symposium on Computers and Communications (ISCC)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Vepakomma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raskar</surname>
          </string-name>
          ,
          <article-title>Reducing leakage in distributed deep learning for sensitive health data</article-title>
          , arXiv:
          <year>1812</year>
          .
          <volume>00564</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vepakomma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raskar</surname>
          </string-name>
          ,
          <article-title>Detailed comparison of communication efficiency of split learning and federated learning</article-title>
          , arXiv preprint arXiv:
          <year>1909</year>
          .
          <volume>09145</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Vepakomma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Swedish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raskar</surname>
          </string-name>
          ,
          <article-title>Split learning for health: Distributed deep learning without sharing raw patient data</article-title>
          , arXiv preprint arXiv:
          <year>1812</year>
          .
          <volume>00564</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Cheon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Homomorphic encryption for arithmetic of approximate numbers</article-title>
          ,
          <source>in: International Conference on the Theory and Application of Cryptology and Information Security</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>409</fpage>
          -
          <lpage>437</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abuadbba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Thapa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Camtepe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nepal</surname>
          </string-name>
          ,
          <article-title>Can we use split learning on 1d cnn models for privacy preserving training?</article-title>
          ,
          <source>in: Proceedings of the 15th ACM Asia Conference on Computer and Communications Security</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>305</fpage>
          -
          <lpage>318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wooldridge</surname>
          </string-name>
          , The Road to Conscious Machines:
          <article-title>This paper focused on how to train ML models in a The Story of AI, Pelican Books, Penguin Books Limprivacy-preserving way using a combination of split ited, 2020. learning and homomorphic encryption</article-title>
          .
          <source>We constructed</source>
          [8]
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Moody</surname>
          </string-name>
          , R. G. Mark,
          <article-title>The impact of the mit-bih protocols by which a client and a server could collabora- arrhythmia database, IEEE Engineering in Medicine tively train a model without revealing significant infor-</article-title>
          and
          <source>Biology Magazine</source>
          <volume>20</volume>
          (
          <year>2001</year>
          )
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          .
          <article-title>mation about the raw data. As far as we are aware</article-title>
          , this [9]
          <string-name>
            <given-names>O.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raskar</surname>
          </string-name>
          ,
          <article-title>Distributed learning of deep is the first time split learning is used on encrypted data. neural network over multiple agents</article-title>
          ,
          <source>Journal of Network and Computer Applications</source>
          <volume>116</volume>
          (
          <year>2018</year>
          ). Acknowledgments [10]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          , Y. Cheng, Y. Kang,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Federated learning</article-title>
          ,
          <source>Synthesis Lectures on Artificial This work was funded by the HARPOCRATES EU re- Intelligence and Machine Learning</source>
          <volume>13</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>207</lpage>
          .
          <article-title>search project (No. 101069535) and</article-title>
          the Technology In- [11]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochasnovation Institute (TII), UAE, for the project ARROW- tic optimization</article-title>
          ,
          <source>arXiv preprint arXiv:1412.6980 SMITH</source>
          . (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>