=Paper=
{{Paper
|id=Vol-3856/paper-5
|storemode=property
|title=Low-Latency Privacy-Preserving Deep Learning Design via Secure MPC
|pdfUrl=https://ceur-ws.org/Vol-3856/paper-5.pdf
|volume=Vol-3856
|authors=Ke Lin,Yasir Glani,Ping Luo
|dblpUrl=https://dblp.org/rec/conf/aisafety/0003GL24
}}
==Low-Latency Privacy-Preserving Deep Learning Design via Secure MPC==
Ke Lin¹, Yasir Glani¹ and Ping Luo¹,*

¹Tsinghua University, 30 Shuangqing Rd., Haidian District, Beijing, 100084, China

*Corresponding author.
leonard.keilin@gmail.com (K. Lin); yasirglani@gmail.com (Y. Glani); luop@tsinghua.edu.cn (P. Luo)
ORCID: 0009-0002-5376-7881 (K. Lin); 0000-0003-0060-4771 (Y. Glani); 0000-0001-6171-3811 (P. Luo)

The IJCAI-2024 AISafety Workshop

Abstract
Secure multi-party computation (MPC) facilitates privacy-preserving computation between multiple parties without leaking private information. While most secure deep learning techniques utilize MPC operations to achieve feasible privacy-preserving machine learning on downstream tasks, the overhead of computation and communication still hampers their practical application. This work proposes a low-latency secret-sharing-based MPC design that reduces unnecessary communication rounds during the execution of MPC protocols. We also present a method for improving the computation of commonly used nonlinear functions in deep learning by integrating multivariate multiplication and coalescing different packets into one to maximize network utilization. Our experimental results indicate that our method is effective in a variety of settings, with a speedup in communication latency of 10∼20%.

Keywords
Multi-party computation, deep learning, privacy-preserving
1. Introduction

Secure multi-party computation (MPC) [1, 2] enables parties to compute securely over their private data without revealing the data to each other. Secure MPC offers a privacy-preserving property, which makes it suitable for most privacy-sensitive domains, such as medical research and finance. With the development of deep learning techniques, the ability of neural models to capture important information from large datasets raises concerns regarding the surveillance of individuals [3]. In this context, secure MPC has clear applications to secure machine learning and deep learning. While MPC-based deep learning frameworks have achieved significant performance in general scenarios, most works suffer from limitations caused by (1) network communication, due to the exchange of intermediate information during MPC execution, and (2) excessive computation introduced by complex MPC protocols. Since the computation of MPC protocols is largely determined by their sophisticated design, optimizing the protocols themselves tends to be difficult and infeasible. Thus, some studies [4] concentrate on improving the communication stage of MPC protocols to make them more practical. In this paper, we present an approach to reduce the communication latency of MPC protocols through optimized multivariate multiplication.

In general, privacy-preserving deep learning frameworks usually adopt secret-sharing-based techniques to avoid extensive computational overheads [5, 6, 7, 8]. Consequently, secret-sharing-based methods require multiple exchanges of intermediate results to achieve collaborative MPC operations. As these MPC techniques are based on linear computations, such as addition and multiplication, modern deep learning techniques that inherently rely on linear algebra benefit significantly from them. Considering this heavy dependency on linear operations, our research aims to reduce unnecessary communication rounds during the execution of MPC protocols, following [9].

Our main contributions are as follows:

• We propose a low-latency secret-sharing-based method for computing multivariate multiplications and univariate polynomials that uses network communication efficiently and reduces unnecessary communication rounds on the fly.

• We improve the computation of nonlinear functions by integrating the proposed multivariate multiplication and coalescing different packets into one single packet to maximize network utilization.

• We conducted experiments to evaluate the effectiveness of our method in the context of models of varying sizes, networks with different latency and bandwidth, the accuracy of downstream classification tasks, and the number of participants involved. The results indicate an overall improvement of 10∼20% in communication latency.

2. Background

2.1. Arithmetic Secret Sharing Based Scheme

Our setup is primarily focused on arithmetic operations, so we represent all inputs and intermediate results in terms of linear secret sharing between 𝑛 parties, specifically in the context of additive secret-sharing schemes.

Apart from the general (𝑛, 𝑡)-Shamir secret sharing scheme [10], which relies on degree-𝑡 polynomials over 𝑛 parties, we adopt the simple arithmetic secret sharing scheme based on (𝑛, 0)-Shamir secret sharing. In other words, we share a scalar value 𝑥 ∈ Z/𝑄Z across the set of 𝑛 parties 𝒫, where Z/𝑄Z denotes a ring with 𝑄 elements, following the notation of [5]. The sharing of 𝑥 is defined as [𝑥] = {[𝑥]𝑝}𝑝∈𝒫, where [𝑥]𝑝 is party 𝑝's share of 𝑥. The ground-truth value 𝑥 can be reconstructed from the sum of the shares of all parties, i.e. 𝑥 = ∑𝑝∈𝒫 [𝑥]𝑝.

When parties wish to share a value 𝑥, they generate a pseudorandom zero-share that sums to 0, and the party that possesses the value adds 𝑥 to its own share in secret. To represent floating-point numbers, we adopt a fixed-point encoding that maps any floating-point number 𝑥𝐹 to a fixed-point representation 𝑥. That is, each 𝑥 is the result of multiplying a floating-point number 𝑥𝐹 by a scaling factor 𝐵 = 2^𝐿 and rounding to the nearest integer, i.e. 𝑥 = ⌊𝐵𝑥𝐹⌉, where 𝐿 is the precision of the fixed-point encoding. To decode a ground-truth floating-point value 𝑥𝐹 from 𝑥, we compute 𝑥𝐹 ≈ 𝑥/𝐵.
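As a concrete illustration, the following is a minimal Python sketch of this (𝑛, 0) additive scheme together with the fixed-point encoding. The ring size Q = 2**64 and precision L = 16 are illustrative choices of ours, not parameters fixed by the paper.

```python
# A minimal sketch of (n, 0) additive secret sharing with fixed-point
# encoding. Q and L below are assumed values for illustration only.
import random

Q = 2**64          # ring size |Z/QZ|
L = 16             # fixed-point precision bits
B = 2**L           # scaling factor B = 2^L

def encode(x_f: float) -> int:
    """Encode a float as a ring element: x = round(B * x_f) mod Q."""
    return round(x_f * B) % Q

def decode(x: int) -> float:
    """Decode a ring element back to a float, x_f ~= x / B (signed)."""
    if x >= Q // 2:            # interpret the upper half of the ring as negative
        x -= Q
    return x / B

def share(x: int, n_parties: int) -> list[int]:
    """Split x into n additive shares that sum to x modulo Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares

def reveal(shares: list[int]) -> int:
    """Reconstruct the ground-truth value as the sum of all shares."""
    return sum(shares) % Q

# Example: share 3.25 among 3 parties and reconstruct it.
shares = share(encode(3.25), 3)
assert decode(reveal(shares)) == 3.25
```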
2.2. Arithmetic Secret Sharing Based MPC

It is noteworthy that arithmetic secret shares are homomorphic and can be used to implement secure MPC, especially in the context of linear computation.

Addition. The sum of two secret-shared values [𝑥] and [𝑦] can be computed directly as [𝑧] = [𝑥] + [𝑦], where each party 𝑝 ∈ 𝒫 computes [𝑧]𝑝 = [𝑥]𝑝 + [𝑦]𝑝 without any multi-party communication.

Multiplication. Two secret-shared values [𝑥] and [𝑦] are multiplied using a random Beaver triple [11] generated by a Trusted Third Party (TTP): a triplet ([𝑎], [𝑏], [𝑎𝑏]). Note that the Beaver triple can be shared with each party in advance. The parties first calculate [𝜖] = [𝑥] − [𝑎] and [𝛿] = [𝑦] − [𝑏]. The masked values [𝜖] and [𝛿] are then revealed to all parties (denoted Reveal(·)) without compromising any information, since the ground-truth values 𝑎, 𝑏 remain unknown to every party except the TTP. The final result is computed as [𝑥𝑦] = [𝑎𝑏] + 𝜖[𝑏] + [𝑎]𝛿 + 𝜖𝛿. Algorithm 1 illustrates the multiplication using Beaver triples.

Algorithm 1 Beaver Multiplication Mul([𝑥], [𝑦])
Input: Secret-shared inputs [𝑥], [𝑦]; Beaver triple ([𝑎], [𝑏], [𝑎𝑏]).
Output: [𝑥𝑦].
1: ◁ Compute masked values
2: [𝜖] ← [𝑥 − 𝑎] = [𝑥] − [𝑎]
3: [𝛿] ← [𝑦 − 𝑏] = [𝑦] − [𝑏]
4: ◁ Reveal 𝜖 and 𝛿 through one round of communication
5: 𝜖 ← Reveal([𝜖])
6: 𝛿 ← Reveal([𝛿])
7: return 𝜖𝛿 + 𝜖[𝑏] + [𝑎]𝛿 + [𝑎𝑏]
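A minimal sketch of Algorithm 1 is shown below, reusing the sharing helpers from the Section 2.1 sketch. The TTP is simulated locally by dealing a random triple, and fixed-point truncation after multiplication is omitted for brevity; this illustrates the identity 𝑥𝑦 = 𝜖𝛿 + 𝜖𝑏 + 𝑎𝛿 + 𝑎𝑏, not CrypTen's actual API.

```python
# A minimal sketch of Beaver multiplication on additive shares; the TTP is
# simulated in-process by dealing shares of a random triple (a, b, ab).
def beaver_mul(x_shares: list[int], y_shares: list[int]) -> list[int]:
    n = len(x_shares)
    # "TTP": deal shares of a random triple (a, b, ab).
    a, b = random.randrange(Q), random.randrange(Q)
    a_sh, b_sh, ab_sh = share(a, n), share(b, n), share(a * b % Q, n)
    # Each party masks its input shares: [eps] = [x] - [a], [delta] = [y] - [b].
    eps_sh = [(x - ai) % Q for x, ai in zip(x_shares, a_sh)]
    dlt_sh = [(y - bi) % Q for y, bi in zip(y_shares, b_sh)]
    # One communication round: reveal eps and delta to everyone.
    eps, dlt = reveal(eps_sh), reveal(dlt_sh)
    # [xy] = eps*dlt + eps*[b] + [a]*dlt + [ab]; the public term eps*dlt
    # is added by a single designated party (here, party 0).
    out = [(eps * bi + ai * dlt + abi) % Q
           for ai, bi, abi in zip(a_sh, b_sh, ab_sh)]
    out[0] = (out[0] + eps * dlt) % Q
    return out

# Example: 6 * 7 == 42 on shares.
assert reveal(beaver_mul(share(6, 3), share(7, 3))) == 42
```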
Linear functions. Functions composed of linear operations can be implemented by combining additions and multiplications. Common operations in deep learning, such as the element-wise product and convolution, fit into this linear paradigm.

Nonlinear functions. Since nonlinear functions are inherently infeasible in the standard arithmetic secret-sharing scheme, most works use approximation methods to simulate their outcomes. In particular, Taylor expansion, Newton-Raphson, and Householder methods are commonly used to approximate nonlinear functions using only linear operations. Reciprocal functions, exponential functions, loss functions, kernel functions, and other functions useful in deep learning are all calculated this way.

2.3. Notations

This section summarizes the notations used throughout this work. We denote [𝑥] as a secret sharing of 𝑥. Reveal([𝑥]) means that the ground-truth value 𝑥 is revealed to every party involved in the computation through one round of communication. Since most linear operations apply equally to element-wise and matrix operations, 𝑥 may also denote a vector, matrix, or even a tensor where there is no ambiguity.

3. Related Work

To achieve communication-efficient MPC, various approaches have been developed to optimize both the number of communication rounds and the throughput of communication. Ishai and Kushilevitz [12] propose a new representation of polynomials for round-efficient secure computation, dividing high-degree polynomials into multiple low-degree polynomials that are easy to solve. Mohassel and Franklin [13] perform operations directly on polynomials, such as polynomial multiplication and division. Dachman-Soled et al. [14] improve the evaluation of multivariate polynomials whose variables are held as private inputs by different parties. Lu et al. [4] propose an efficient method for evaluating high-degree polynomials with arbitrary numbers of variables. While this line of research focuses on improving the calculation of polynomials, our study aims to develop a communication-efficient and effective MPC system for modern deep learning frameworks by leveraging the arithmetic-tuple computation of Krips et al. [9]. Our system is not confined to computing polynomials within finite rings, as in previous studies.

In recent years, several privacy-preserving deep learning frameworks have emerged to enable the secure inference of neural network models. Wagh et al. [8] implement a maliciously secure 3-party MPC protocol building on SecureNN [15] and ABY3 [16]. Knott et al. [5] provide flexible machine-learning APIs with a rich set of functions for secure deep learning. Li et al. [7] present a fast, performant, and privacy-preserving MPC Transformer inference framework. Our low-latency linear MPC implementation is built on top of Knott et al.'s CrypTen framework and provides a significant improvement in communication latency.

4. Methodology

4.1. Multivariate Multiplication

Beaver triples show how to multiply two variables with pre-shared triplets, so a classic multiplication of multiple variables, such as [𝑥𝑦𝑧], requires several rounds of binary multiplication, i.e. Mul(Mul([𝑥], [𝑦]), [𝑧]). This naive implementation, however, introduces additional communication rounds during the on-the-fly Reveal process. In general, an 𝑛-ary multiplication requires 𝑛 − 1 rounds of communication.

To reduce the communication rounds involved in multivariate multiplications, we extend the basic binary Beaver triple into a general 𝑛-ary Beaver triple, so that only one round of communication is required throughout the entire process.

Assume the 𝑛 inputs are represented as {[𝑥1], . . . , [𝑥𝑛]}. The precomputed, pre-shared information required by the extended protocol consists of the auxiliary sets 𝒜1, . . . , 𝒜𝑛. Here 𝒜1 := {[𝑎1], . . . , [𝑎𝑛]} is the set of 𝑛 auxiliary shared values used to blind the inputs, similar to Beaver's idea. Then 𝒜𝑖 (𝑖 ≥ 2) is defined as the set of shared degree-𝑖 cross-terms of the variables in 𝒜1. For example, 𝒜2 := {[𝑎𝑖𝑎𝑗] | 𝑖 ≠ 𝑗 ∧ 1 ≤ 𝑖, 𝑗 ≤ 𝑛}, 𝒜3 := {[𝑎𝑖𝑎𝑗𝑎𝑘] | 𝑖 ≠ 𝑗 ≠ 𝑘 ∧ 1 ≤ 𝑖, 𝑗, 𝑘 ≤ 𝑛}, and so on. Similar to the construction of [𝜖] and [𝛿] in Section 2.2, we define the difference between the inputs and the masks as [𝛿𝑖] := [𝑥𝑖] − [𝑎𝑖]. The secret-shared [𝛿𝑖] is then made public across all parties without leaking the ground-truth value of either 𝑥𝑖 or 𝑎𝑖. The improvement of our method originates from the following equation:

$$\prod_{i=1}^{n} x_i = \prod_{i=1}^{n} (\delta_i + a_i) = \prod_i \delta_i + \sum_i \delta_i \frac{\prod a_m}{a_i} + \sum_{i,j,\, i \neq j} \delta_i \delta_j \frac{\prod a_m}{a_i a_j} + \cdots + \prod_i a_i \tag{1}$$

Here we informally use the fractional notation, such as (∏ 𝑎𝑚)/𝑎𝑖, to denote the product of all the mask terms except certain ones; this fractional form does not involve any actual division. Each secret-shared term [(∏ 𝑎𝑚)/(𝑎𝑖 · · · 𝑎𝑗)] can be found in the auxiliary sets 𝒜1, . . . , 𝒜𝑛, which are preshared across all parties.

The adaptation of Equation 1 to the secret-sharing scheme is as follows:

$$\Bigl[\prod_{i=1}^{n} x_i\Bigr] = \prod_i \delta_i + \sum_i \delta_i \Bigl[\frac{\prod a_m}{a_i}\Bigr] + \sum_{i,j,\, i \neq j} \delta_i \delta_j \Bigl[\frac{\prod a_m}{a_i a_j}\Bigr] + \cdots + \Bigl[\prod_i a_i\Bigr] \tag{2}$$

Since Equation 2 is linear in the secret-shared terms, all communications can be conducted in parallel, i.e. in a single round of communication. In this case, we can simply reveal all the masked differences 𝛿𝑖 and compute the sharing of the final result in constant round complexity. The protocol is formally described in Algorithm 2.

Algorithm 2 Multivariate Beaver Multiplication of 𝑛 inputs Mul([𝑥1], [𝑥2], . . . , [𝑥𝑛])
Input: Secret-shared inputs {[𝑥1], . . . , [𝑥𝑛]}; auxiliary sets 𝒜1, . . . , 𝒜𝑛.
Output: [𝑥1𝑥2 · · · 𝑥𝑛].
1: ◁ Compute masked values
2: for 𝑖 ∈ [1, 𝑛] do
3:   [𝛿𝑖] ← [𝑥𝑖 − 𝑎𝑖] = [𝑥𝑖] − [𝑎𝑖]
4: end for
5: ◁ Reveal 𝛿𝑖 through one round of communication
6: parallel for 𝑖 ∈ [1, 𝑛] do
7:   𝛿𝑖 ← Reveal([𝛿𝑖])
8: end parallel for
9: ◁ Compute the result using the preshared 𝒜1, . . . , 𝒜𝑛
10: return ∏𝑖 𝛿𝑖 + ∑𝑖 𝛿𝑖 [(∏ 𝑎𝑚)/𝑎𝑖] + ∑𝑖,𝑗,𝑖≠𝑗 𝛿𝑖𝛿𝑗 [(∏ 𝑎𝑚)/(𝑎𝑖𝑎𝑗)] + · · · + [∏ 𝑎𝑖]

The total number of communication rounds is thus reduced from 𝑛 − 1 to a constant 1 for an 𝑛-ary multiplication, but the overall size of the communicated data grows from linear to exponential. In the naïve implementation, the data size of an 𝑛-ary multiplication is only 3(𝑛 − 1), for a total transmission of 𝑛 − 1 Beaver triples; in the multivariate implementation, 2^𝑛 − 1 values must be transmitted for the auxiliary sets 𝒜1, . . . , 𝒜𝑛. Therefore, in practice, there is a trade-off between communication latency and throughput.
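To make the 𝑛-ary protocol concrete, below is a minimal sketch of Algorithm 2 for the 3-ary case [𝑥1𝑥2𝑥3], again reusing the helpers from the Section 2.1 sketch with a locally simulated TTP. It expands (𝛿1 + 𝑎1)(𝛿2 + 𝑎2)(𝛿3 + 𝑎3) exactly as in Equation 2; the three reveals, which a real deployment would batch into one round, are performed sequentially here for clarity.

```python
# A minimal sketch of 3-ary multivariate Beaver multiplication; the "TTP"
# deals shares of a1, a2, a3 and of all cross-terms (the sets A2 and A3).
from itertools import combinations

def beaver_mul3(xs: list[list[int]]) -> list[int]:
    """xs = [shares(x1), shares(x2), shares(x3)]; returns shares of x1*x2*x3."""
    n_parties = len(xs[0])
    a = [random.randrange(Q) for _ in range(3)]
    a_sh = [share(ai, n_parties) for ai in a]
    # Auxiliary sets: A2 = shares of a_i*a_j, A3 = shares of a1*a2*a3.
    pair_sh = {(i, j): share(a[i] * a[j] % Q, n_parties)
               for i, j in combinations(range(3), 2)}
    triple_sh = share(a[0] * a[1] * a[2] % Q, n_parties)
    # Mask the inputs and reveal all deltas (one round in a real protocol).
    d = [reveal([(x - am) % Q for x, am in zip(xs[i], a_sh[i])])
         for i in range(3)]
    # [xyz] = d1 d2 d3 + sum d_j d_k [a_i] + sum d_k [a_i a_j] + [a1 a2 a3]
    out = list(triple_sh)
    for p in range(n_parties):
        out[p] = (out[p]
                  + d[1] * d[2] * a_sh[0][p] + d[0] * d[2] * a_sh[1][p]
                  + d[0] * d[1] * a_sh[2][p]
                  + d[2] * pair_sh[(0, 1)][p] + d[1] * pair_sh[(0, 2)][p]
                  + d[0] * pair_sh[(1, 2)][p]) % Q
    out[0] = (out[0] + d[0] * d[1] * d[2]) % Q   # public term, added once
    return out

# Example: 2 * 3 * 4 == 24 on shares, in a single reveal round.
assert reveal(beaver_mul3([share(2, 3), share(3, 3), share(4, 3)])) == 24
```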
4.2. Univariate Polynomials

The formal form of a univariate polynomial is 𝑃(𝑥) = ∑ 𝑏𝑖 𝑥^𝑖 for 𝑖 = 0, . . . , 𝑛, where 𝑏𝑖 is the coefficient of the degree-𝑖 term. Univariate polynomials enable efficient evaluation and manipulation of polynomial expressions. Following Damgård et al. [17], we can compute all the required powers [𝑥^𝑖] in parallel using multivariate multiplications and then multiply them by the corresponding plaintext coefficients. Despite its benefits, this trick has the disadvantage of exponentially increasing the size of the transmitted data, which becomes unbearable once the exponent exceeds 5.

In practice, the method is implemented by computing a tuple of base terms and then multiplying the whole tuple by a fixed term iteratively, as in exponentiation by squaring or the fast modular exponentiation algorithm. In other words, a tuple 𝑔 = (1, 𝑥, . . . , 𝑥^(𝑚−1)) of size ‖𝑔‖ = 𝑚 is multiplied by 𝑥^‖𝑔‖ repeatedly to iterate over all the 𝑥^𝑖 terms. An overview of the implementation of univariate polynomials is given in Algorithm 3. Note that 𝑏𝑠:𝑒 denotes the subvector of 𝑏 from position 𝑠 to 𝑒.

Algorithm 3 Univariate Polynomial Poly([𝑥], 𝑏)
Input: Secret-shared input [𝑥]; coefficients 𝑏 = (𝑏0, 𝑏1, . . . , 𝑏𝑛); base-term tuple size ‖𝑔‖.
Output: ∑ 𝑏𝑖 [𝑥^𝑖] for 𝑖 = 0, . . . , 𝑛.
1: ◁ Construct base terms
2: parallel for 𝑖 ∈ [1, ‖𝑔‖] do
3:   [𝑥^𝑖] ← Mul([𝑥], . . . , [𝑥]) ◁ [𝑥] multiplied 𝑖 times
4: end parallel for
5: 𝑔 ← (1, [𝑥], . . . , [𝑥^(‖𝑔‖−1)])
6: ◁ Iterative exponentiation
7: 𝑡 ← 0
8: for 𝑖 ∈ [0, ⌊𝑛/‖𝑔‖⌋ − 1] do
9:   𝑠 ← 𝑖 · ‖𝑔‖
10:  𝑒 ← (𝑖 + 1) · ‖𝑔‖
11:  𝑡 ← 𝑡 + 𝑏𝑠:𝑒 · 𝑔
12:  ◁ Vectorized Beaver multiplication
13:  𝑔 ← [𝑥^‖𝑔‖] · 𝑔
14: end for
15: return 𝑡
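The blocked iteration of Algorithm 3 can be sketched in plaintext as follows; in the secure version, every product below is a (vectorized) Beaver multiplication on shares rather than an integer multiply. The names g_size and coeffs, and the padding assumption, are ours.

```python
# A minimal plaintext sketch of Algorithm 3's blocked polynomial evaluation
# over the ring Z/QZ (Q as in the earlier sketch).
def poly_eval_blocked(x: int, coeffs: list[int], g_size: int) -> int:
    """Evaluate sum_i coeffs[i] * x**i by sliding a tuple of g_size base terms."""
    assert len(coeffs) % g_size == 0, "pad coeffs with zeros to a multiple of g_size"
    g = [pow(x, i, Q) for i in range(g_size)]   # base terms (1, x, ..., x^{m-1})
    step = pow(x, g_size, Q)                    # x^{||g||}, reused in every block
    t = 0
    for blk in range(len(coeffs) // g_size):
        b = coeffs[blk * g_size:(blk + 1) * g_size]       # b_{s:e}
        t = (t + sum(bi * gi for bi, gi in zip(b, g))) % Q
        g = [step * gi % Q for gi in g]                   # shift tuple by x^{||g||}
    return t

# Example: 1 + 2x + 3x^2 + 4x^3 at x = 5 with a tuple of size 2.
assert poly_eval_blocked(5, [1, 2, 3, 4], 2) == (1 + 2*5 + 3*25 + 4*125) % Q
```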
4.3. Nonlinear Approximations

In this section, we go one step further and optimize commonly used nonlinear functions by leveraging the parallelism of the proposed multivariate multiplication.

Exponentiation. Since exponential functions grow at a geometric rate, approximations based on series expansions generally suffer a significant loss of accuracy, as the exact magnitude of the input is unknown. Consequently, we resort to the iterative limit approximation, which can exploit multivariate multiplication effectively:

$$e^{[x]} = \lim_{n \to \infty} \Bigl(1 + \frac{[x]}{d^n}\Bigr)^{d^n}$$

During each iteration, the 𝑑-th power of the previous result is calculated. As the number of iterations 𝑛 increases, the result approaches the true value.
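A minimal plaintext sketch of this limit iteration is given below; securely, each 𝑑-th power is one 𝑑-ary multivariate multiplication round. The iteration count n_iters is an illustrative assumption (the paper fixes only 𝑑 = 3, in Section 6.1).

```python
# A minimal plaintext sketch of the iterative limit approximation of exp.
def exp_approx(x: float, d: int = 3, n_iters: int = 8) -> float:
    """Approximate e**x via (1 + x / d**n)**(d**n), computed as n successive
    d-th powers (one multivariate multiplication per round when on shares)."""
    y = 1.0 + x / (d ** n_iters)
    for _ in range(n_iters):
        y = y ** d
    return y

print(exp_approx(1.0))   # ~= 2.7181, approaching e as n_iters grows
```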
Logarithm. The calculation of logarithms relies on higher-order iterative methods for better convergence, i.e. the Householder method for 𝑦 = ln 𝑥:

$$[h_n] = 1 - [x]\, e^{-[y_n]}, \qquad [y_{n+1}] = [y_n] - \sum_{k=1}^{\infty} \frac{1}{k}\, [h_n^k]$$

Note that the implementation of the logarithm is a combination of exponentiation and univariate polynomials; the truncation degree of the polynomial determines the precision of the output.

Reciprocal. The reciprocal function 𝑦 = 1/𝑥 is calculated using the Newton-Raphson method with an initial guess 𝑦0:

$$[y_{n+1}] = [y_n](2 - [x][y_n]) = 2[y_n] - [x][y_n][y_n]$$

Trigonometry. Trigonometric functions can be treated as a special case of exponentiation with 𝑑 = 2. The sine and cosine functions are calculated in the field of complex numbers:

$$[\sin x] = \operatorname{Im}([e^{ix}]), \qquad [\cos x] = \operatorname{Re}([e^{ix}])$$

Using the nonlinear functions above, we can calculate most of the loss functions used in deep learning, such as the sigmoid, tanh, and cross-entropy functions. Various other common nonlinear functions, such as the softmax and kernel functions, can also be computed from the exponential and reciprocal functions.
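The three iterations above are sketched in plaintext below. In the secure versions, every multiplication becomes a (multivariate) Beaver multiplication on shares; the iteration counts and initial guesses are our illustrative assumptions, while the truncation 𝑘 = 8 matches the paper's setting in Section 6.1.

```python
# Minimal plaintext sketches of the Householder, Newton-Raphson, and
# complex-exponential iterations used for log, reciprocal, and sin/cos.
import math

def log_approx(x: float, y0: float = 0.0, iters: int = 5, k_max: int = 8) -> float:
    """Householder iteration for ln x: y <- y - sum_{k=1..k_max} h^k / k,
    with h = 1 - x * e^{-y} (securely: one exp plus a univariate polynomial)."""
    y = y0
    for _ in range(iters):
        h = 1.0 - x * math.exp(-y)
        y -= sum(h ** k / k for k in range(1, k_max + 1))
    return y

def reciprocal_approx(x: float, y0: float, iters: int = 10) -> float:
    """Newton-Raphson for 1/x: y <- 2y - x*y*y, one 3-ary multivariate
    multiplication per iteration; converges when |1 - x*y0| < 1."""
    y = y0
    for _ in range(iters):
        y = 2 * y - x * y * y
    return y

def sincos_approx(x: float, n_iters: int = 12) -> tuple[float, float]:
    """(sin x, cos x) from (1 + i*x/2**n)**(2**n), the d = 2 case of the
    iterative exponential, evaluated by repeated complex squaring."""
    z = complex(1.0, x / 2 ** n_iters)
    for _ in range(n_iters):
        z = z * z
    return z.imag, z.real

print(log_approx(2.0))              # ~= 0.6931 = ln 2
print(reciprocal_approx(4.0, 0.1))  # ~= 0.25
print(sincos_approx(1.0))           # ~= (0.8415, 0.5403)
```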
4.4. Communication Coalescing

The key to achieving low-latency secret-sharing computation is to reduce the total number of communication rounds among the parties. While we have introduced a latency-friendly implementation of the basic math operations, other kinds of communication, such as precision checking, still require additional but independent communication rounds.

In general, the communication involved in multiple math operations can be abstracted as a communication graph, or, more strictly, a communication tree. Accordingly, we observe that independent communications that do not affect downstream results can be deferred and combined into one single round of communication. We refer to this process as communication coalescing; it eliminates unnecessary rounds of communication and improves the utilization of network bandwidth.
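A minimal sketch of the coalescing idea is shown below: independent Reveal requests are queued and flushed as one batched message instead of paying one round each. The CoalescingChannel class and its API are illustrative inventions of ours, not CrypTen's interface; it reuses the share/reveal helpers from the Section 2.1 sketch.

```python
# A minimal sketch of communication coalescing: independent reveals are
# deferred and opened together in a single batched communication round.
class CoalescingChannel:
    def __init__(self):
        self.pending = []            # queued share-lists awaiting one round

    def defer_reveal(self, shares: list[int]) -> int:
        """Queue a reveal; returns a handle into the batched result."""
        self.pending.append(shares)
        return len(self.pending) - 1

    def flush(self) -> list[int]:
        """One communication round: all queued values are revealed together,
        conceptually concatenated into a single packet per party."""
        opened = [reveal(shares) for shares in self.pending]
        self.pending.clear()
        return opened

# Two independent reveals cost one round instead of two:
chan = CoalescingChannel()
h1 = chan.defer_reveal(share(7, 3))
h2 = chan.defer_reveal(share(9, 3))
opened = chan.flush()
assert (opened[h1], opened[h2]) == (7, 9)
```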
5. Security Analysis

The correctness of the multivariate multiplication follows directly from Equations 1 and 2. As univariate polynomials are implemented with the same method as the extended fast modular exponentiation algorithm, their correctness likewise follows from the correctness and security of multivariate multiplication. Coalescing mechanisms only alter the ordering of communication rounds without modifying the payload, and are therefore equally reliable and secure.

Multivariate computations are as secure as traditional Beaver multiplications under semi-honest conditions. Intuitively, since each 𝑎𝑖 is chosen at random by the TTP, the value 𝛿𝑖 = 𝑥𝑖 − 𝑎𝑖 is indistinguishable from a random number. Consequently, the disclosure of [𝛿𝑖] does not reveal any critical information about 𝑥𝑖. This holds even if multiple parties, excluding the TTP, collude.

To state the security of multivariate multiplication formally, we denote [𝑥]𝑝 as the secret share of 𝑥 held by party 𝑝 ∈ 𝒫. The global equations of the multivariate system are as follows:

$$\begin{aligned} \sum_{p \in \mathcal{P}} [x_i]_p &= x_i \\ \sum_{p \in \mathcal{P}} [a_i]_p &= a_i \\ \sum_{p \in \mathcal{P}} [a_i a_j]_p &= a_i a_j \\ &\;\;\vdots \\ \sum_{p \in \mathcal{P}} [a_1 \cdots a_n]_p &= a_1 \cdots a_n \\ [x_i]_p - [a_i]_p &= [\delta_i]_p \end{aligned} \tag{3}$$

where [𝛿𝑖]𝑝 is known to every party 𝑝 ∈ 𝒫. From each party's view, these 2^𝑛 + 2𝑛 − 1 equations have Θ(2^𝑛 ‖𝒫‖) unknown variables, which indicates the difficulty of determining the exact value of 𝑥𝑖, as shown in [18].

A party's view consists of all the values it can obtain during its execution. The following theorem then holds:

Theorem 1. Let {𝑥′𝑖} and {𝑥″𝑖} be random values. The distribution of the view of each party is identical when 𝑥𝑖 = 𝑥′𝑖 or 𝑥𝑖 = 𝑥″𝑖.

This guarantees the security of multivariate multiplication by ensuring indistinguishability between the random distribution and the distribution of each party's view.

6. Experiments

6.1. Experimental Setup

We use CrypTen [5] as the base MPC deep learning framework, which already provides naïve implementations of secret-sharing-based computations. In most of our experiments, we use 3-party MPC on CPUs. Additionally, we allow at most 4-ary multiplication as stated in Section 4.1, and we set 𝑑 = 3 for exponentiation and 𝑘 = 8 for the logarithm as described in Section 4.3.

To measure performance, we run experiments with deep learning models of different sizes: (a) Linear Support Vector Classification (LinearSVC) with L2 penalty; (b) LeNet [19], with shallow convolutional and linear layers along with ReLU activations; (c) ResNet-18 [20], with multiple convolutional, linear, pooling, and activation layers; and (d) a Transformer encoder [21], with a single multi-head attention layer and BatchNorm [22] in place of LayerNorm [23]. We employ several classification datasets, adapted appropriately to the specific models: MNIST [19], CIFAR-10 [24], ImageNet [25], and Sentiment140 [26].

Each experiment is conducted in a simulated multi-node environment using Docker; the TTP runs in an independent environment separate from the normal parties. To simulate different network environments with respect to bandwidth and latency, we use the docker-tc tool to adjust the Docker network settings accordingly.

6.2. Metrics

To provide a comprehensive evaluation of our proposed method, we adopt metrics covering a variety of perspectives:

• 𝑡comp: the computation time for evaluating a single data sample in one round.

• 𝑡comm: the communication time associated with evaluating a single data sample in one round.

• Size of transmitted data: the size of the network packets transmitted when evaluating a single data sample in one round.

• Accuracy: the classification accuracy when evaluated on a particular dataset.

6.3. Latency & Throughput

To assess the efficiency of our proposed method, we simulate networks with different latencies: (a) network 𝑁low with 0.1 ms latency; (b) network 𝑁med with 5 ms latency; and (c) network 𝑁high with 40 ms latency. All of these networks have a bandwidth of 1 Gbps. Our simulated multi-node setting includes 3 nodes plus an additional TTP by default.

Table 1: Communication latency and transmitted data under different network settings. Latency is measured in milliseconds and data in MiB.

| Model | Dataset | 𝑡comp | Data Size (Naïve / Ours) | 𝑡comm, 𝑁low (Naïve / Ours) | 𝑡comm, 𝑁med (Naïve / Ours) | 𝑡comm, 𝑁high (Naïve / Ours) |
|---|---|---|---|---|---|---|
| LinearSVC | MNIST | 0.032 | 0.022 / 0.024 | 0.082 / 0.059 | 0.604 / 0.568 | 4.339 / 4.026 |
| LeNet | CIFAR-10 | 0.900 | 38.562 / 41.647 | 1.373 / 1.182 | 7.673 / 6.983 | 53.010 / 47.868 |
| ResNet-18 | ImageNet | 110.447 | 11571.294 / 12612.711 | 219.541 / 185.293 | 1681.685 / 1407.570 | ~11 760 / ~9 870 |
| Transformer | Sentiment140 | 7.824 | 771.787 / 893.421 | 12.190 / 9.715 | 78.624 / 61.877 | 559.645 / 421.413 |

As shown in Table 1, the computation cost of each model is negligible compared to the communication cost in the medium- and high-latency network settings. We therefore focus on the communication costs associated with our proposed method.

Compared to the naïve method implemented in CrypTen, the data size of our method from Section 4.1 remains close to the baseline, since it does not introduce a substantial amount of additional communication payload when the maximum number of input variables is set appropriately. For instance, a 3-ary or 4-ary multiplication does not significantly increase the total size of communication.

It is noteworthy that our proposed method reduces the communication cost in every network setting compared to the naïve MPC implementation. Overall, we achieve an improvement of 10∼20%, a significant performance gain for practical purposes in high-latency environments.

Furthermore, we observe that our proposed method behaves differently across neural architectures. Figure 1 breaks down the communication share of a ResNet basic block and an attention block.

Figure 1: Communication percentage of different models. (a) ResNet basic block: Conv2D 49.8%, BatchNorm 25.0%, ReLU 19.1%, other 6.1%. (b) Attention block: Softmax 60.5%, GeLU 23.1%, MatMul 14.4%, other 2.0%.

As can be seen, the attention mechanism is constrained by the communication bottleneck of its Softmax operation, while the CNN is constrained by the communication of its convolutional operations. Because our method specifically optimizes nonlinear functions, attention-based models show a significant latency improvement of almost 25%. This also explains the more limited improvement of only 8∼15% in traditional machine-learning models and CNN-based models.

6.4. Evaluation

In this section, we examine the side effects of and factors in the basic settings, such as downstream task accuracy, the number of parties involved, and the trade-off between network latency and bandwidth.

To evaluate the drop in accuracy, we compare our method with both the original plaintext baseline and the naïve implementation without the low-latency design. Table 2 shows that, in relatively small scenarios, both the naïve implementation and our method match the perfect performance of the baselines. Nevertheless, both MPC-based implementations obtain lower accuracy than the baseline in complex scenarios, and our method performs slightly worse than the naïve implementation. We hypothesize that the multivariate multiplication introduces additional precision requirements, which in turn reduce accuracy.

Table 2: Classification accuracy using different methods.

| Model | Dataset | Origin (%) | Naïve (%) | Ours (%) |
|---|---|---|---|---|
| LinearSVC | MNIST | 100.00 | 100.00 | 100.00 |
| LeNet | CIFAR-10 | 100.00 | 100.00 | 100.00 |
| ResNet-18 | ImageNet | 69.30 | 61.58 | 60.10 |
| Transformer | Sentiment140 | 59.87 | 58.55 | 57.74 |

The throughput and latency of MPC-based methods are also affected by the number of parties involved in the computation. From Figure 2, it can be seen that the communication data size of both methods increases linearly with the number of parties. Latency, however, tends to worsen as more parties are involved.

Figure 2: Transmission data and latency of the naïve and our proposed methods for different numbers of parties: (a) size of transmitted data w.r.t. the number of parties; (b) latency w.r.t. the number of parties. The experiment uses the LeNet model on CIFAR-10 over a medium-latency network.

Moreover, Figure 3 illustrates how network bandwidth affects communication costs. When sufficient bandwidth is available, our method still reduces the network latency. It is important to note, however, that when the bandwidth becomes the bottleneck, our method is no longer effective in reducing the overall costs of communication. This indicates that bandwidth remains an important factor in a multi-node MPC setting, especially as the number of nodes grows.

Figure 3: Latency of the naïve and our proposed methods with respect to network bandwidth. The experiment uses the LeNet model on CIFAR-10 over a medium-latency network.

7. Discussion

Since the proposed multivariate multiplication operates over a finite ring, it can suffer precision issues that lead to incorrect results. Fortunately, a loss in precision does not significantly affect the overall performance of deep learning, since the loss can be interpreted as random noise and distortion in the input data.

Moreover, our proposed method applies only to functions built from linear MPC operations. To avoid heavy communication, a modern MPC-based deep learning framework would also involve other protocols, such as homomorphic encryption [27], garbled circuits [28], and function secret sharing [29]. Though these approaches may require less communication, our approach can still be seamlessly integrated with the current secret-sharing framework and achieve a latency improvement of ~20% without adding excessive computational workload.

8. Conclusion

This study proposes a secret-sharing-based MPC method that accelerates the linear computation required in deep learning through better communication utilization. By means of the multivariate multiplication and communication coalescing mechanisms, we reduce the number of unnecessary communication rounds during the execution of both linear and nonlinear deep learning functions. Our experiments demonstrate that the proposed methods achieve an overall latency improvement of 10∼20% over the naïve MPC implementation, while throughput and downstream task performance remain comparable to the naïve implementation, demonstrating the method's validity and efficiency. We hope this work will inspire future improvements in privacy-preserving deep learning techniques and lead to more practical MPC applications.

Acknowledgments

This work is supported by the National Key R&D Program of China under grant No. 2022YFB2703001.
References

[1] I. Damgård, V. Pastro, N. Smart, S. Zakarias, Multiparty computation from somewhat homomorphic encryption, in: R. Safavi-Naini, R. Canetti (Eds.), CRYPTO 2012, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 643–662. doi:10.1007/978-3-642-32009-5_38.
[2] M. Keller, MP-SPDZ: A versatile framework for multi-party computation, in: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, CCS '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1575–1590. doi:10.1145/3372297.3417872.
[3] X. Liu, L. Xie, Y. Wang, J. Zou, J. Xiong, Z. Ying, A. V. Vasilakos, Privacy and security issues in deep learning: A survey, IEEE Access 9 (2021) 4566–4593. doi:10.1109/ACCESS.2020.3045078.
[4] D. Lu, A. Yu, A. Kate, H. Maji, Polymath: Low-latency MPC via secure polynomial evaluations and its applications, Proceedings on Privacy Enhancing Technologies 2022 (2021). doi:10.2478/popets-2022-0020.
[5] B. Knott, S. Venkataraman, A. Hannun, S. Sengupta, M. Ibrahim, L. van der Maaten, CrypTen: Secure multi-party computation meets machine learning, NIPS 34 (2021) 4961–4973. doi:10.48550/arXiv.2109.00984.
[6] S. Tan, B. Knott, Y. Tian, D. J. Wu, CryptGPU: Fast privacy-preserving machine learning on the GPU, in: IEEE S&P, 2021. doi:10.1109/SP40001.2021.00098.
[7] D. Li, H. Wang, R. Shao, H. Guo, E. Xing, H. Zhang, MPCFormer: Fast, performant and private transformer inference with MPC, in: The 11th International Conference on Learning Representations, 2023. doi:10.48550/arXiv.2211.01452.
[8] S. Wagh, S. Tople, F. Benhamouda, E. Kushilevitz, P. Mittal, T. Rabin, Falcon: Honest-majority maliciously secure framework for private deep learning, Proceedings on Privacy Enhancing Technologies 2021 (2020) 188–208. doi:10.2478/popets-2021-0011.
[9] T. Krips, R. Küsters, P. Reisert, M. Rivinius, Arithmetic tuples for MPC, Cryptology ePrint Archive (2022). URL: https://eprint.iacr.org/2022/667.
[10] A. Shamir, How to share a secret, Commun. ACM 22 (1979) 612–613. doi:10.1145/359168.359176.
[11] D. Beaver, Efficient multiparty protocols using circuit randomization, in: J. Feigenbaum (Ed.), CRYPTO 1991, Springer Berlin Heidelberg, Berlin, Heidelberg, 1992, pp. 420–432. doi:10.1007/3-540-46766-1_34.
[12] Y. Ishai, E. Kushilevitz, Randomizing polynomials: A new representation with applications to round-efficient secure computation, in: Proceedings 41st Annual Symposium on Foundations of Computer Science, 2000, pp. 294–304. doi:10.1109/SFCS.2000.892118.
[13] P. Mohassel, M. Franklin, Efficient polynomial operations in the shared-coefficients setting, in: M. Yung, Y. Dodis, A. Kiayias, T. Malkin (Eds.), PKC 2006, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 44–57. doi:10.1007/11745853_4.
[14] D. Dachman-Soled, T. Malkin, M. Raykova, M. Yung, Secure efficient multiparty computing of multivariate polynomials and applications, in: J. Lopez, G. Tsudik (Eds.), Applied Cryptography and Network Security, Springer Berlin Heidelberg, Berlin, Heidelberg, 2011, pp. 130–146. doi:10.1007/978-3-642-21554-4_8.
[15] S. Wagh, D. Gupta, N. Chandran, SecureNN: Efficient and private neural network training, Cryptology ePrint Archive (2018). URL: https://eprint.iacr.org/2018/442.
[16] P. Mohassel, P. Rindal, ABY3: A mixed protocol framework for machine learning, in: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS '18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 35–52. doi:10.1145/3243734.3243760.
[17] I. Damgård, M. Fitzi, E. Kiltz, J. B. Nielsen, T. Toft, Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation, in: S. Halevi, T. Rabin (Eds.), Theory of Cryptography, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 285–304. doi:10.1007/11681878_15.
[18] G. Couteau, A note on the communication complexity of multiparty computation in the correlated randomness model, in: EUROCRYPT 2019, Springer-Verlag, Berlin, Heidelberg, 2019, pp. 473–503. doi:10.1007/978-3-030-17656-3_17.
[19] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998) 2278–2324. doi:10.1109/5.726791.
[20] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: CVPR, 2016, pp. 770–778. doi:10.1109/CVPR.2016.90.
[21] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), NIPS, volume 30, Curran Associates, Inc., 2017. doi:10.5555/3295222.3295349.
[22] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: Proc. 32nd Int. Conf. Machine Learning, ICML'15, JMLR.org, 2015, pp. 448–456. doi:10.5555/3045118.3045167.
[23] J. L. Ba, J. R. Kiros, G. E. Hinton, Layer normalization, 2016. arXiv:1607.06450.
[24] A. Krizhevsky, G. Hinton, et al., Learning multiple layers of features from tiny images (2009).
[25] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: A large-scale hierarchical image database, in: CVPR, 2009, pp. 248–255. doi:10.1109/CVPR.2009.5206848.
[26] A. Go, R. Bhayani, L. Huang, Twitter sentiment classification using distant supervision, CS224N project report, Stanford 1 (2009) 2009.
[27] C. Gentry, Fully homomorphic encryption using ideal lattices, in: Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC '09, Association for Computing Machinery, New York, NY, USA, 2009, pp. 169–178. doi:10.1145/1536414.1536440.
[28] A. C. Yao, Protocols for secure computations, in: 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), 1982, pp. 160–164. doi:10.1109/SFCS.1982.38.
[29] E. Boyle, N. Gilboa, Y. Ishai, Function secret sharing, in: E. Oswald, M. Fischlin (Eds.), EUROCRYPT 2015, Springer Berlin Heidelberg, Berlin, Heidelberg, 2015, pp. 337–367. doi:10.1007/978-3-662-46803-6_12.