<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Curl: Private LLMs through Wavelet-Encoded Look-Up Tables</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Manuel B. Santos</string-name>
          <email>manuel.santos@nillion.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitris Mouris</string-name>
          <email>dimitris@nillion.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehmet Ugurbil</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stanislaw Jarecki</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Reis</string-name>
          <email>jose.reis@nillion.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shubho Sengupta</string-name>
          <email>ssengupta@meta.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel de Vega</string-name>
          <email>miguel@nillion.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Meta Platforms</institution>
          ,
          <addr-line>Menlo Park, California 94025</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Nillion</institution>
          ,
          <addr-line>Zählerweg 5, 6300 Zug</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent advancements in transformers have revolutionized machine learning, forming the core of large language models (LLMs). However, integrating these systems into everyday applications raises privacy concerns as client queries are exposed to model owners. Secure multiparty computation (MPC) allows parties to evaluate machine learning applications while keeping sensitive user inputs and proprietary models private. Due to inherent MPC costs, recent works introduce model-specific optimizations that hinder widespread adoption by machine learning researchers. CrypTen (NeurIPS'21) aimed to solve this problem by exposing MPC primitives via common machine learning abstractions such as tensors and modular neural networks. Unfortunately, CrypTen and many other MPC frameworks rely on polynomial approximations of the non-linear functions, resulting in high errors and communication complexity. This paper introduces Curl, an easy-to-use MPC framework that evaluates non-linear functions as lookup tables, resulting in better approximations and significant round and communication reduction. Curl exposes a similar programming model to CrypTen and is highly parallelizable through tensors. At its core, Curl relies on discrete wavelet transformations to reduce the lookup table size without sacrificing accuracy, which results in up to 19× round and communication reduction compared to CrypTen for non-linear functions such as logarithms and reciprocals. We evaluate Curl on a diverse set of LLMs, including BERT, GPT-2, and GPT Neo, and compare against state-of-the-art related works such as Iron (NeurIPS'22) and Bolt (S&amp;P'24), achieving at least 1.9× less communication and latency. Finally, we resolve a long-standing debate regarding the security of widely used probabilistic truncation protocols by proving their security in the stand-alone model. This is of independent interest as many related works rely on this truncation style.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models (LLMs)</kwd>
        <kwd>Privacy-Enhancing Technologies (PETs)</kwd>
        <kwd>Secure Multiparty Computation (MPC)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Large language models (LLMs) like GPT-2, GPT-4 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], BERT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and LLaMA [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] have emerged as prime
examples showcasing the capabilities of artificial intelligence. LLMs assist individuals and businesses
in everyday tasks, from machine translation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], to text generation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and question answering [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
among others. To generate human-like responses, LLMs have been trained on vast amounts of data and
continue learning through their interactions with users.
      </p>
      <p>
        However, as LLMs become increasingly integrated into human lives, privacy becomes a critical
concern as individuals frequently share sensitive information, including names, addresses, and credit
card numbers, or even financial and healthcare information [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. On top of that, there is a shift towards
personalized AI, with OpenAI adding a memory feature to ChatGPT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. For an LLM-powered assistant
to be able to understand individual preferences, habits, and workflows and provide tailored assistance,
it would require access to an immense amount of personal data. As this personalized data is retained
and used to continuously improve the LLMs, the possibility of a data breach or unauthorized access
poses huge risks to individuals’ privacy.
      </p>
      <p>
        Privacy-enhancing technologies (PETs) such as multiparty computation (MPC) [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] enable a
plethora of privacy-preserving machine learning (PPML) use-cases. In the most prominent setting,
a server possesses a proprietary model and aims to provide it as a service to clients for use with
their private data [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15">11, 12, 13, 14, 15</xref>
        ]. The objective is for clients to receive only the inference results
without gaining any knowledge about the model, while the model provider does not learn anything
about clients’ input. Another popular PPML use case enables multiple distrusting parties to securely
train a model over their joint sensitive data without revealing anything about their data to each other
[
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19 ref20">16, 17, 18, 19, 20</xref>
        ].
      </p>
      <p>
        Several MPC systems leverage linear secret sharing schemes as they allow linear operations to
be performed with minimal communication overhead [
        <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
        ]. However, non-linear operations (e.g.,
square roots, logarithms) often present greater challenges and require specialized techniques such as
polynomial approximations [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], look-up table evaluations [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], or other protocol-specific approaches
[
        <xref ref-type="bibr" rid="ref24 ref25">24, 25</xref>
            ]. Furthermore, incorporating domain knowledge can significantly enhance the efficiency and
effectiveness of these protocols by tailoring the parameters to better fit specific use cases [
        <xref ref-type="bibr" rid="ref26 ref27 ref28">26, 27, 28</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title>1.1. Related Work</title>
        <sec id="sec-1-1-1">
          <title>1.1.1. Focusing on Function Secret Sharing</title>
          <p>
            Recently, several secure computation works have emerged leveraging function secret sharing (FSS)
[
            <xref ref-type="bibr" rid="ref23">29, 23, 30, 31, 32</xref>
            ]. Pika [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ] extends the prior work of [29] by showcasing a novel approach to securely
evaluate look-up tables (LUTs) through FSS. While it demonstrates efficacy in benchmarking against
popular datasets like MNIST and CIFAR-10, its scalability poses challenges. As noted by Grotto [30], for
large LUTs, the computational cost may render the protocol infeasible due to the extensive number of
distributed point functions (DPF) evaluations required. In contrast, Curl circumvents this challenge by
eliminating the need for DPF evaluations, thanks to the integration of the discrete wavelet transform
(DWT) technique. On top of that, our technique can also benefit FSS-based frameworks by reducing the
size of the LUTs, resulting in less computation and communication. On the other hand, Grotto [30]
introduces innovative protocols leveraging custom splines and DPFs to handle a subset of functions
efficiently. Yet, it faces challenges in terms of computational and communication overhead compared to
alternatives like Sigma [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. Orca [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ] showcases the potential of GPU acceleration in FSS protocols,
particularly tailored for convolutional neural networks (CNNs). However, its suitability for other
architectures like transformers is questioned due to its reliance on heavy non-linearities [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ].
          </p>
          <p>
            Sigma [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ] builds on top of Orca and distinguishes itself by relying on minimal-sized LUTs through
protocols tailored for small ranges. Despite its efficiency from custom protocols with small-sized LUTs,
it demands significant computing and communication resources (e.g., 1 TB RAM and around 9 Gbps
communication link). Furthermore, its use of deterministic truncation, although claiming improved
security, falls short in speed compared to probabilistic truncation methods. Unfortunately, all FSS-based
works focus on the two-party setting as FSS becomes impractical with more than two parties.
          </p>
        </sec>
        <sec id="sec-1-1-2">
          <title>1.1.2. Focusing on Preprocessing</title>
          <p>
            The line of work initiated by Pika [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ] focuses on the two-party setting with an additional dealer party.
In this scenario, the preprocessing phase requires the dealer to prepare and distribute an O(2^n)-sized
element, where n is the input bit size. Since the dealer is assumed not to collude with either party,
reducing dependency on such trust assumptions is desirable. This has led to the development of protocols
that eliminate the need for a dealer. Notably, some previous works have aimed to achieve this in the
two-party setting, enabling both parties to independently generate the required preprocessing material [33, 34, 35].
          </p>
          <p>The one-time truth table (OTTT) protocol [33] employs a boolean circuit to represent the table
for every possible input, resulting in complexity exponential in the input bit size. The OP-LUT
protocol [34] attempts to enhance the OTTT preprocessing phase but only achieves improvements
for small bit sizes, with the communication cost remaining exponential. The SP-LUT protocol [34]
significantly enhances the preprocessing phase but modifies the online phase, leading to an exponential
communication cost in the bit size during the online phase. FLUTE [35] offers advancements over
previous works but still requires O(2^n) communication in the setup phase. Therefore, finding a
preprocessing phase with sub-exponential communication complexity while maintaining the efficiency
of the online phase remains an open challenge.</p>
        </sec>
        <sec id="sec-1-1-3">
          <title>1.1.3. Focusing on Fully-Homomorphic Encryption</title>
          <p>Fully homomorphic encryption (FHE) schemes have benefited significantly from the use of LUTs to
enhance performance and enable new applications. This concept was initially proposed by Ducas and
Micciancio [36], who devised a method to evaluate arbitrary binary gates in FHE using LUTs. Building
on this foundation, Chillotti et al. [37] implemented the evaluation of arbitrary functions through a tree
of leveled multiplexers, leading to the development of the Torus FHE (TFHE) scheme. Despite their
development of a fast bootstrapping mechanism, capable of execution in approximately 10 milliseconds
on a CPU, its adoption has been limited. This is primarily because the control inputs for the multiplexers
required fresh ciphertexts – meaning prior computation on them was not possible – and the approach
necessitated expressing programs as deterministic automata. A further improvement over TFHE was
presented in [38] with the introduction of the programmable bootstrapping (PBS) technique, allowing
for efficient and general-purpose LUT evaluation. To enhance adaptability, HELM [39] expanded on the
PBS technique and introduced a framework for automatically converting Verilog hardware description
language (HDL) into encrypted circuits. HELM employs three modes of operation. The first mode
exclusively processes binary gates. The second mode operates on integers using secure LUT evaluations.
The third, mixed mode, works with binary circuits and "bridges" over to integers for secure LUT
evaluations before returning to the binary domain. However, HELM is limited to very low-precision
LUTs. This limitation arises because converting an integer to multiple bits requires multiple n-to-1 LUTs.
Recently, Ripple [40] proposed using compression techniques based on discrete wavelet transform
(DWT) theory to decrease the size of LUTs, thereby accelerating the PBS technique for smooth functions.</p>
        </sec>
        <sec id="sec-1-1-4">
          <title>1.1.4. Focusing on Secure LLM Inference</title>
          <p>
            Secure frameworks for LLM inference have only recently gained traction due to the inherent complexity
of transformers compared to traditional neural networks. Techniques to ensure secure inference include
notable implementations like THE-X [41] and Iron [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ], which were among the first to use homomorphic
encryption for matrix multiplication and non-linear functions.
          </p>
          <p>
            Subsequently, several studies have explored MPC techniques for private transformer inference
systems [
            <xref ref-type="bibr" rid="ref27">27, 42</xref>
            ]. MPCFormer [
            <xref ref-type="bibr" rid="ref27">27</xref>
            ] leverages the CrypTen [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] MPC engine, while Puma [42] utilizes the
SecretFlow-SPU [43] MPC engine to assess MPC’s suitability for LLM inference. MPCFormer introduces
a distillation process where stronger models train weaker models to improve accuracy, compensating
for the limitations of using small (degree-2) approximations of non-linear functions. Nevertheless,
MPCFormer faces challenges due to the loss of approximation accuracy, necessitating fine-tuning the
models. On the other hand, Puma uses a piece-wise approximation of the GeLU activation function,
requiring comparisons and polynomials up to degree 6.
          </p>
          <p>
            Recent research has increasingly focused on hybrid solutions, combining the best techniques for
specific operations [
            <xref ref-type="bibr" rid="ref20 ref28">28, 20</xref>
            ]. Bolt [
            <xref ref-type="bibr" rid="ref28">28</xref>
            ] developed a solution integrating both fully homomorphic
encryption and MPC. Like Puma [42], Bolt uses a piece-wise approximation that requires comparison, which
increases round complexity and communication. To boost efficiency, Bolt employs word elimination
while maintaining accuracy. Additionally, it enhances MPCFormer’s polynomial approximations by
increasing the polynomial degree and mitigates the corresponding efficiency loss through Motzkin’s
polynomial pre-processing procedure. In contrast, Curl avoids both comparisons and polynomial
approximations by using an optimized lookup table protocol that reduces the round and communication
complexity typically added by polynomial approximations while maintaining accuracy.
          </p>
        </sec>
        <sec id="sec-1-1-5">
          <title>1.1.5. Focusing on Secure Truncation</title>
          <p>Machine learning tasks employ floating-point numbers to construct precise models [44]. However,
it is widely recognized that the use of floating-point numbers renders MPC impractical from an
efficiency standpoint. For this reason, most works on machine learning under MPC employ fixed-point
representation instead, requiring a truncation protocol to maintain constant precision.</p>
          <p>
            Among the available truncation methodologies – deterministic, nearest integer, and probabilistic –
the deterministic and probabilistic approaches (see Section 3) are the most widely used in the secure
computation setting [
            <xref ref-type="bibr" rid="ref24">24, 45</xref>
            ]. Moreover, a substantial body of research favors probabilistic truncation due
to its lower communication costs and resultant enhanced performance, e.g., [
            <xref ref-type="bibr" rid="ref11 ref13 ref16 ref18">46, 16, 47, 11, 48, 13, 18, 49</xref>
            ].
Despite these advantages, some recent works express concerns about probabilistic truncation, and opt
for more resource-intensive protocols to achieve deterministic truncation [
            <xref ref-type="bibr" rid="ref15">45, 50, 15</xref>
            ]. The concerns
raised with regard to probabilistic truncation include security and rounding errors.
          </p>
          <p>
            Correctness error. Before we address the above two concerns, we recall that a widely used and
inexpensive probabilistic truncation protocol over fields, due to Catrina and de Hoogh [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ], has a correctness error
with probability 2^−(k−|x|), where |x| is the maximum bitsize of the truncation input x and k is the ring bitwidth. This necessitated
using significantly larger rings, setting k ≥ |x| + s, where s is a statistical security parameter. Damgård
et al. [51] presented a probabilistic protocol with zero probability of correctness error, but it required
a non-constant-round bit-decomposition protocol. Subsequent works [
            <xref ref-type="bibr" rid="ref11">11, 52</xref>
            ] optimized [
            <xref ref-type="bibr" rid="ref24">24, 51</xref>
            ] and
showed a constant-round probabilistic truncation protocol with communication and computation costs
matching Catrina and de Hoogh [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ], but with no correctness error and allowing k ≥ |x| + 1. CrypTen [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]
proposed a protocol inspired by [53] for division by a public value, which can be reduced to a probabilistic
truncation with non-zero correctness error.
          </p>
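          <p>To make the probabilistic rounding concrete, the following plaintext simulation sketches two-party truncation by local shifting over Z_{2^k}, in the spirit of the protocols discussed above; the parameter names and the helper are our own illustration, not any cited protocol's specification:</p>

```python
# Plaintext simulation of two-party probabilistic truncation by local
# shifting over Z_{2^k}. Parameters (K, F, M) and names are illustrative.
import random

K, F, M = 64, 8, 16          # ring bits k, precision f, input bitsize m
MOD = 1 << K
random.seed(0)

def trunc_prob(x):
    """One trial: share x additively, truncate each share locally, reconstruct."""
    x0 = random.randrange(MOD)               # party 0's uniform share
    x1 = (x - x0) % MOD                      # party 1's share
    z0 = x0 >> F                             # local shift on share 0
    z1 = (-(((-x1) % MOD) >> F)) % MOD       # local shift on negated share 1
    return (z0 + z1) % MOD

x = 0b1010_1010_1010_1010                    # a 16-bit input, so k - m = 48
exact = x >> F                               # the exact truncation floor(x / 2^f)
results = {trunc_prob(x) for _ in range(1000)}
# The protocol outputs floor(x / 2^f) or floor(x / 2^f) + 1 (probabilistic
# rounding); a larger wrap-around error occurs only with probability ~2^-(k-m).
assert results.issubset({exact, exact + 1})
```

          <p>The off-by-one outcome is the "probabilistic" part: which of the two values appears depends on the random share, while the rare wrap-around event is what the 2^−(k−|x|) correctness-error bound captures.</p>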
          <p>
            Security. Recently, Li et al. [45] identified a security flaw in the security proof of the probabilistic
truncation of [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ], revealing a discrepancy between the information leaked by that protocol and by
the ideal probabilistic truncation functionality. Note that the same issue pertains to the k-optimal
probabilistic truncation of [52]. This led several recent papers, e.g., [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], to move away from the
inexpensive probabilistic truncation protocols because of concerns regarding their security. In this
paper, we show that the k-optimal probabilistic truncation protocol of [52] (as well as the original
probabilistic truncation of [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ]) securely realizes a natural ideal probabilistic truncation functionality
which we define. This enables us to capitalize on the eficiency benefits of probabilistic truncation
without compromising security (Section 3).
          </p>
          <p>
            Rounding Errors. The LLAMA paper [50] recently suggested the use of deterministic truncation,
claiming it improves inference accuracy. However, this assertion has not been backed by experimental
evidence, and earlier studies indicating a possible decrease in accuracy with probabilistic truncation
have not been confirmed through empirical testing [
            <xref ref-type="bibr" rid="ref11">49, 11</xref>
            ]. Intriguingly, Gupta et al. [44] demonstrated
that computation using 16-bit fixed-point with 12-bit precision and probabilistic truncation achieves
nearly equivalent accuracy to that of a 32-bit floating-point. Conversely, truncation to the nearest
integer fails to train adequately under such conditions. Additionally, although not the primary focus of
the work, Keller and Sun [54] provided experimental evidence indicating that in neural networks, there
is minimal disparity between probabilistic truncation and truncation to the nearest integer. Indeed,
out of 28 distinct experiments, 13 exhibited superior accuracy with probabilistic truncation, 13 with
truncation to the nearest integer, and 2 yielded identical accuracy.
          </p>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Our Contributions</title>
        <p>In this paper, we introduce Curl, a user-friendly MPC framework designed for efficiently evaluating
non-linear functions using lookup tables. Curl addresses three critical aspects essential to secure
protocols: efficiency, accuracy, and security.</p>
        <p>
          Curl employs a novel LUT compression technique using discrete wavelet transformation (DWT),
achieving efficient approximations with minimal errors. This effectively reduces the size of the original
LUTs while preserving the accuracy of non-linear functions, thus surpassing traditional polynomial
piece-wise approximation methods. By minimizing communication costs, Curl significantly enhances
end-to-end runtime performance and supports a wide range of activation functions including GeLU,
SiLU, Gaussian error function, Sigmoid, and Hyperbolic tangent. Additionally, Curl generalizes the
piece-wise approximation used within the GeLU protocol from [
          <xref ref-type="bibr" rid="ref15 ref28">28, 15</xref>
          ] to encompass any bounded odd
or even function that converges to a piecewise polynomial.
        </p>
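        <p>As a quick numeric check of the decomposition idea behind such GeLU protocols: the difference g(x) = GeLU(x) − ReLU(x) is a bounded even function that vanishes away from zero, which is what makes a small LUT on a short non-negative range sufficient. The helper names below are ours:</p>

```python
# Verify that g(x) = GeLU(x) - ReLU(x) is bounded, even, and vanishes
# away from 0 (sketch; helper names are illustrative).
import math

def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x):
    return max(0.0, x)

def g(x):
    return gelu(x) - relu(x)

xs = [i / 100 for i in range(-800, 801)]
# g is even: g(-x) == g(x) up to floating-point error.
assert all(abs(g(t) - g(-t)) < 1e-12 for t in xs)
# g is bounded and negligible far from 0, so a small LUT suffices.
assert max(abs(g(t)) for t in xs) < 0.2
assert abs(g(8.0)) < 1e-12
```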
        <p>Curl’s core technique extends to complex non-linear functions used by LLMs, ensuring both client
input data privacy and safeguarding the intellectual property of service providers, such as their models.
We evaluate Curl across a spectrum of LLMs including BERT (tiny, base, large), GPT-2, and GPT-Neo,
including both CPU and GPU backends. Our evaluation demonstrates advancements over
state-of-the-art frameworks, achieving at least 6.5 times fewer rounds and 1.9 times less communication overhead.</p>
        <p>Finally, Curl addresses the security concerns surrounding efficient probabilistic truncation by
introducing a natural ideal functionality, a corresponding protocol, and providing a simulation security
proof. Many works that rely on this truncation style were shown by [45] to have a security flaw, as
they cannot be simulated in the real-ideal paradigm. Our result is of independent interest since our
proof settles this debate and restores confidence in secure probabilistic truncation protocols.</p>
        <p>
          We summarize our contributions as follows:
• A novel framework based on DWT compression that achieves high accuracy with low round and
communication complexities. When applied to LLM inference, Curl demonstrates at least a 6.5×
reduction in round complexity and a 1.9× reduction in communication when compared against
existing methods.
• Curl builds on top of the user-friendly CrypTen [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] framework, offering high flexibility and
low adoption friction for developers and researchers, thereby democratizing access to secure
computation techniques. As a proof-of-concept, we implement a variety of LLM models and
numerous non-linear functions, all operable on both CPU and GPU.
• We introduce a novel natural ideal functionality for efficient probabilistic truncation and prove
the protocol of Escudero et al. [52] secure. This result is of independent interest.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <sec id="sec-2-1">
        <title>2.1. Arithmetic &amp; Binary Secret Sharing</title>
        <p>
          We consider two main algebraic structures: the ring Z_L where L = 2^ℓ for some bit width ℓ, and the
binary field Z_2. We consider additive secret sharing between a set of parties P in both Z_L and Z_2.
Arithmetic secret sharing. JxK denotes additive shares in the ring Z_L for parties P_0, . . . , P_{N−1}. This
means each party P_i owns a share JxK_i, where the sum of all shares is equal to x. Private addition
of secret-shared values is executed by local share addition, and private multiplication is implemented
through Beaver triples [55]. Fixed-point representation of a floating-point number x̃ is considered
using the standard integer encoding with nearest-integer approximation: x = ⌊x̃ · 2^f⌉, for precision
f [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. To decode, we simply divide by 2^f.
        </p>
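        <p>The arithmetic sharing and fixed-point encoding above can be sketched in plaintext Python as follows; the function names and parameters are illustrative, not Curl's API:</p>

```python
# Sketch of additive secret sharing over Z_L with fixed-point encoding.
import random

ELL = 64                      # ring bit width ell (an assumed parameter)
L = 1 << ELL                  # ring modulus L = 2^ell
F = 16                        # fixed-point precision f
random.seed(0)

def encode(x_float):
    """Fixed-point encode: x = round(x_float * 2^f) mod L."""
    return round(x_float * (1 << F)) % L

def decode(x):
    """Interpret as a signed ring element and divide by 2^f."""
    if x >= L // 2:
        x -= L
    return x / (1 << F)

def share(x, n_parties=3):
    """Split x into n additive shares summing to x mod L."""
    shares = [random.randrange(L) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % L)
    return shares

def reconstruct(shares):
    return sum(shares) % L

x, y = encode(3.25), encode(-1.5)
xs, ys = share(x), share(y)
# Private addition is purely local: each party adds its own shares.
zs = [(a + b) % L for a, b in zip(xs, ys)]
assert decode(reconstruct(zs)) == 3.25 + (-1.5)
```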
        <p>Binary secret sharing. Some secure operations profit from a secret sharing representation in the
binary domain. For this reason, we consider a binary secret sharing scheme for x ∈ Z_{2^ℓ}, denoted as
⟨⟨x⟩⟩. A binary secret share ⟨⟨x⟩⟩ is formed by secret sharing all the bits of x in Z_2, and each party P_i
owns a share ⟨⟨x⟩⟩_i. To reconstruct the secret, the parties P_0, . . . , P_{N−1} add their shares in Z_2 (i.e.,
equivalent to bitwise XOR).</p>
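        <p>The binary sharing above amounts to XOR-splitting the value; a minimal sketch, with illustrative names:</p>

```python
# XOR (binary) secret sharing of an ell-bit value; reconstruction is the
# bitwise XOR of all shares, i.e., addition in Z_2. Names are illustrative.
import random

ELL = 16
random.seed(1)

def bshare(x, n_parties=3):
    """Split x into XOR shares: x = s_0 XOR s_1 XOR ... XOR s_{n-1}."""
    shares = [random.randrange(1 << ELL) for _ in range(n_parties - 1)]
    acc = 0
    for s in shares:
        acc ^= s
    shares.append(x ^ acc)       # last share fixes the XOR to x
    return shares

def breconstruct(shares):
    acc = 0
    for s in shares:
        acc ^= s
    return acc

x = 0b1011_0010_1100_0110
assert breconstruct(bshare(x)) == x
# XOR of two shared values is purely local (share-wise XOR); AND requires
# interaction, which is why conversions between domains matter.
```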
        <p>
          Conversions. Arithmetic shares best suit arithmetic operations (+, ×), and binary shares perform best
in evaluating logical expressions (XOR, AND, right-shift and left-shift operations). For this reason,
following the line of research on mixed-mode secure computation [
          <xref ref-type="bibr" rid="ref12">52, 12, 56</xref>
          ], we make use of known
conversions between the arithmetic and the binary domain. We follow the approach provided by
Damgård et al. ([
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], S3) to convert from arithmetic shares to binary shares (A2B) and the procedure
using Algorithm 2 in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] to convert binary shares into arithmetic shares (B2A). We denote these
protocols by Π_A2B^(ℓ,m) and Π_B2A^(ℓ,m), respectively, where ℓ represents the input bitwidth and m the output
bitwidth.
        </p>
        <p>We note that the preprocessing procedure used for B2A conversion can be optimized using the
techniques proposed by Escudero et al. [52]. In particular, CrypTen’s approach requires ℓ tuples of
shares (JbK, ⟨⟨b⟩⟩) for a bit b, whereas [52] employs just one tuple of shares (JrK, ⟨⟨r⟩⟩) for a ring element r,
with the former being comparatively more costly than the latter.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Large Language Models (LLMs)</title>
        <p>
          In 2017, Vaswani et al. [57] introduced the Transformer architecture, a type of neural network that
utilizes attention mechanisms, enabling AI models to prioritize different segments of the input sequence
when generating an output. A transformer comprises an encoder-only architecture (like in BERT [
          <xref ref-type="bibr" rid="ref2">2, 58</xref>
          ]),
a decoder-only architecture (e.g., GPT), or both an encoder and a decoder [57]. The encoder takes the input sequence and
maps it to a lower-dimensional representation to generate a sequence of hidden states. The decoder
generates an output sequence by iteratively producing tokens from the previously generated hidden
states. In practice, multiple encoder and decoder blocks are used together to achieve higher accuracy.
Attention. A core component of transformers is self-attention, a mechanism that weighs the importance
of different parts of an input sequence when making predictions. Self-attention takes a query matrix
Q and a set of key-value matrix pairs K, V, which are all produced by linear layers, and produces an
output as follows, where d_k is the dimension of the keys:
        </p>
        <p>Attention(Q, K, V) = softmax((Q · K^T) / √d_k) · V.</p>
        <p>Usually, the inputs to the softmax are masked so that each token is only influenced by previous tokens,
thereby creating a causal relationship.</p>
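        <p>As a plaintext (non-MPC) reference for the attention equation and the causal masking just described, using NumPy:</p>

```python
# Plaintext NumPy reference for softmax(Q K^T / sqrt(d_k)) V with an
# optional causal mask. Illustrative only; this is not Curl's secure code.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)      # numerical stabilization
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False):
    """Compute softmax(Q K^T / sqrt(d_k)) V, optionally causally masked."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:
        # Mask strictly-upper entries so token i only attends to tokens <= i.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(Q, K, V, causal=True)
assert out.shape == (4, 8)
# With a causal mask, the first token attends only to itself.
assert np.allclose(out[0], V[0])
```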
        <p>Layers. In the context of secure LLM inference, all the components of transformers can be seen as
linear and non-linear layers. A linear layer consists of matrix multiplications and feed-forward neural
networks (FNN), which can be evaluated using fixed-point arithmetic. A crucial part of evaluating
the linear layers is to maintain the fixed-point precision using truncation (the precision of fixed-point
values doubles after multiplication, so truncation brings it back down). For instance, the matrix
multiplication Q · K^T in self-attention can be computed as multiple inner products, each of which
needs to be followed by a truncation. Next comes the FNN, which relies on an inner product followed
by an activation function or a non-linear layer.</p>
        <p>
          A non-linear layer, like the rectified linear unit (ReLU) [59], Gaussian error linear unit (GeLU) [60],
sigmoid linear unit (SiLU), softmax, etc., is not straightforward to evaluate securely. ReLU can be
implemented with a secure comparison (i.e., ReLU(x) = max(0, x)), while GeLU and other activation
functions are not as trivial. Several works like CrypTen [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] resort to polynomial approximations,
which have a significant impact on accuracy. More recent works like Sigma [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] approximate GeLU
using ReLU and then use lookup-table methods to evaluate the difference. Lastly, layer normalization
(LayerNorm) and root mean squared normalization (RMSNorm) require non-linear functions such as
reciprocals or reciprocal square roots.
        </p>
        <p>Transformer Block. A transformer block comprises several linear and non-linear layers. This usually
includes attention, linear layers, non-linear activation functions, and layer normalization.
Embedding Layers. Like any language model, LLMs start off with a word embedding layer and a
positional embedding layer. These layers are LUT protocols based on the token value and the position of
the token, respectively. As LUT protocols, they can also be thought of as multiplications with a one-hot
vector. To implement these under MPC, we have several options.</p>
        <p>Firstly, even though the token value is a hidden input, its position is public. While we keep the
weights hidden under MPC, with the public input we can do a plaintext lookup. With the hidden token
value, however, the lookup is not so straightforward. The client either has to secret share the token
as a one-hot vector, which can be quite large, or we can use a random one-hot vector generated by
the provider. The alternative, which is to run an equality test against every possible value, is also quite costly.
To relieve the client from sharing a one-hot vector for every token and to avoid running the costly
equality multiple times, we use the provider to generate the one-hot vector. This protocol then works
just like the LUT protocols, with the exception that the LUT is not public. Since the LUT, which is the
embedding weights matrix, is also private, we need to rely on Beaver triple multiplication.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. The Discrete Wavelet Transform (DWT)</title>
        <p>A discrete wavelet transformation (DWT) is a transformation that decomposes a discretely sampled
signal into approximation and detail coefficients [61]. The detail coefficients contain the information
about the error incurred by the approximation coefficients. Given the approximation and detail
coefficients, one can invert the application of a DWT to obtain the original signal. We convey this idea
in Fig. 1. Starting with the whole signal we can apply a DWT to obtain a first level of approximation
and detail coefficients. These first-level approximation and detail coefficients are enough to recover
the original signal. We can repeat this process on the approximation coefficients to obtain a second
level of approximation and detail coefficients. The second-level approximation and detail coefficients
together with the first-level detail coefficients allow us to recover the original signal. This process can
be repeated as necessary to reduce the size of the approximation coefficients.</p>
        <p>Figure 1: Overview of two DWT iterations. The signal is represented by the approximation (blue) and the details
(gray).</p>
        <p>There are many different wavelet families and each uses linear transformations (e.g., matrix
multiplication): Daubechies (Db) wavelets, Biorthogonal wavelets, and others. Let us explore in more detail two
types of wavelets: the Haar and the (5, 3) Biorthogonal transformations.
2.3.1. Haar DWT
A special case of the Db family is the Haar wavelet, also called Db1. In Haar, the approximation
coefficients are obtained by averaging every two consecutive samples of the original signal. One of the
properties of the Db wavelet of degree m (Dbm) is that if a signal encodes data of polynomials of degree
strictly below m, then the corresponding detail coefficients for those polynomial parts will be zero.</p>
        <p>More specifically, the Haar DWT is a linear transformation represented by a matrix H. When applied
to a one-dimensional signal x of length 2n (where n ∈ Z), this linear transformation is defined as:</p>
        <p>H · x = [a_0, . . . , a_{n−1}, d_0, . . . , d_{n−1}]ᵀ,</p>
        <p>where a and d correspond to the approximation and detail coefficients and are computed as:</p>
        <p>a_i = (x_{2i} + x_{2i+1}) / 2,   d_i = (x_{2i+1} − x_{2i}) / 2,</p>
        <p>for i = 0, 1, . . . , n − 1. We note that the Haar DWT operation H can be applied multiple k times. For
a vector x with size n divisible by 2^k,</p>
        <p>H^k · x = [a↶, d↶]ᵀ,</p>
        <p>where the approximation vector a↶ has size n/2^k and the detail vector d↶ amounts to the rest (i.e.,
n − n/2^k). Note that, since H is an orthogonal matrix (up to scaling), its inverse is obtained by its transpose. One can
see that, for i = 0, 1, . . . , n − 1, the inverse is given by x_{2i+1} = a_i + d_i and x_{2i} = a_i − d_i.</p>
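<p>To make the formulas above concrete, here is a minimal plaintext sketch (in Python, with hypothetical helper names) of one Haar level and its inverse, following the conventions a_i = (x_{2i} + x_{2i+1})/2 and d_i = (x_{2i+1} − x_{2i})/2:</p>

```python
def haar_dwt(x):
    """One Haar level: approximation a_i = (x[2i] + x[2i+1]) / 2 and
    detail d_i = (x[2i+1] - x[2i]) / 2, for a signal of even length."""
    n = len(x) // 2
    a = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(n)]
    d = [(x[2 * i + 1] - x[2 * i]) / 2 for i in range(n)]
    return a, d

def haar_idwt(a, d):
    """Invert one Haar level: x[2i] = a_i - d_i, x[2i+1] = a_i + d_i."""
    x = []
    for ai, di in zip(a, d):
        x.extend([ai - di, ai + di])
    return x

def haar_multilevel(x, k):
    """Apply the Haar DWT k times to the approximation part; the final
    approximation has size len(x) / 2**k."""
    a, details = list(x), []
    for _ in range(k):
        a, d = haar_dwt(a)
        details.append(d)
    return a, details
```

<p>Applying haar_multilevel to a signal of size n divisible by 2^k returns an approximation of size n/2^k together with the per-level details needed for exact reconstruction.</p>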
        <sec id="sec-2-3-1">
          <title>2.3.2. Biorthogonal DWT</title>
          <p>While the Dbm wavelets use an orthogonal matrix, the Biorthogonal wavelets require two different
matrices where the transpose of one matrix is the inverse of the other. One matrix, W, is used for the
decomposition of the signal and the other, W̃, is used for the reconstruction.</p>
          <p>In this work, we consider the (5, 3) Biorthogonal DWT defined by W, where the corresponding
approximation and detail coefficients are given by:</p>
          <p>a_i = (1/8) (−x_{2i−2} + 2 · x_{2i−1} + 6 · x_{2i} + 2 · x_{2i+1} − x_{2i+2}),</p>
          <p>d_i = (1/4) (x_{2i} − 2 · x_{2i+1} + x_{2i+2}),</p>
          <p>for i = 0, . . . , n − 1. Note that the (5, 3) Biorthogonal DWT computes the approximation and detail
coefficients with a weighted average of 5 and 3 points, respectively. The inverse transformation W^{−1} is
given by W̃, where W^{−1} = W̃ᵀ.</p>
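<p>A plaintext sketch of the analysis step above; the symmetric boundary extension is an assumption of this sketch, not specified in the text:</p>

```python
def bior53_analysis(x):
    """(5, 3) Biorthogonal analysis step:
    a_i = (1/8)(-x[2i-2] + 2*x[2i-1] + 6*x[2i] + 2*x[2i+1] - x[2i+2]),
    d_i = (1/4)( x[2i]   - 2*x[2i+1] +   x[2i+2]).
    Out-of-range samples are mirrored at the boundaries (an assumption
    of this sketch)."""
    def s(j):  # symmetric (mirror) extension
        j = abs(j)
        if j >= len(x):
            j = 2 * (len(x) - 1) - j
        return x[j]
    n = len(x) // 2
    a = [(-s(2 * i - 2) + 2 * s(2 * i - 1) + 6 * s(2 * i)
          + 2 * s(2 * i + 1) - s(2 * i + 2)) / 8 for i in range(n)]
    d = [(s(2 * i) - 2 * s(2 * i + 1) + s(2 * i + 2)) / 4 for i in range(n)]
    return a, d
```

<p>On a linear signal, the interior detail coefficients vanish, reflecting that d_i is a (scaled) second difference.</p>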
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Threat &amp; Computation Model</title>
        <p>Curl allows two or more computing parties (P_0, P_1, . . . ) to perform private inference and training of
machine-learning models. In the most prominent use case, one party (e.g., P_0) owns a trained model,
while another party (e.g., P_1) aims to utilize the model for inference on their private data. Both parties
wish to keep both the data and the model’s architecture private. For instance, imagine a scenario where
P_0 is a cybersecurity firm developing a malware detection machine learning model that they offer as a
service (MLaaS) to client organizations. The clients (P_1) would like to use the model in their internal
codebase while keeping sensitive implementation details private. In MLaaS, both the model owner
and the clients could benefit from outsourcing the model and their inputs, respectively, to an MPC
infrastructure and not participate in the computation.</p>
        <p>Curl is flexible and can facilitate multiple other use cases such as: (a) having different parties that
possess distinct sets of features jointly train a model without exchanging raw data, or (b) unveiling
correlations where one party retains feature data and another party holds corresponding labels without
revealing anything else, among others. For the sake of presentation, we assume a two-party protocol
and we focus on the inference setting (i.e., P_0 holds the private model and P_1 holds the sensitive inputs),
as demonstrated in Fig. 2.</p>
        <p>
          We assume the parties are semi-honest (i.e., follow the protocol specification) in the pre-processing
model. This threat model enables a wide range of realistic use cases of secure machine learning
[
          <xref ref-type="bibr" rid="ref12">62, 12, 63, 46, 64</xref>
          ], while pre-processing is used to generate correlated randomness which the parties
consume in the online phase. There is a plethora of ways to generate this randomness; one can use
general-purpose MPC [65, 66, 67], specialized protocols [68, 34, 35], or more commonly a trusted dealer
[
          <xref ref-type="bibr" rid="ref12 ref20">29, 69, 50, 20, 12, 49, 30</xref>
          ]. In Curl, we resort to the latter. Lastly, as long as not all the parties collude, no
information is leaked about the private inputs (or private models depending on the use case).
        </p>
        <p>Figure 2: The inference setting: P_0 secret-shares the private model, P_1 secret-shares the private inputs, and the parties jointly produce the final inference.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Secure Truncation</title>
      <p>In this work, we use truncation over the ring Z_{2^ℓ} and we define it as follows. For input JxK, it outputs
Jx′K where x′ = ⌊x/2^f⌋ and f is the precision of the fixed-point representation.</p>
      <p>The truncation operation is required whenever we work with fixed-point numbers. Multiplication of
two fixed-point rationals with precision f yields a fixed-point rational with precision 2f. Therefore,
we require truncation to go back to a fixed-point rational with precision f. As we analyze in Section 3.3,
the number of truncations required in a GPT-2 model is at least 2^20. Hence, it is paramount to use
an efficient truncation protocol. As mentioned in Section 1.1.5, works on secure computation over
fixed-point rationals use two types of truncation protocols: a deterministic truncation, where output
x′ is set exactly to ⌊x/2^f⌋; or a probabilistic truncation, where x′ = ⌊x/2^f⌋ + ε, where ε = 1 with
probability (x mod 2^f)/2^f and ε = 0 otherwise. Although implementing probabilistic truncations
incurs minimal costs, two concerns have been raised about it: security and the impact of rounding
errors in ML applications. In this section, we summarize the security issue raised by Li et al. [45] and
present a solution in the stand-alone secure computation framework [70, 71].</p>
      <sec id="sec-3-1">
        <title>3.1. Revisiting Li et al. [45]</title>
        <p>
          Li et al. [45] pointed out that some efficient probabilistic truncation protocols fail to meet the criteria
for standard simulation-based security [71], rendering them insecure in this framework. This includes
Catrina and de Hoogh’s [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], CrypTen’s [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], and other proposals including [
          <xref ref-type="bibr" rid="ref11">51, 11, 52</xref>
          ].
        </p>
        <p>
          Li et al. use a variant of the protocol of [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] to exemplify that the above protocols are not simulatable
as realizations of a standard ideal probabilistic truncation functionality, because in these protocols
the rounding error ε is a deterministic function of the input x and information which the protocol
reveals. Specifically, given the protocol execution, the rounding error ε is a deterministic function
of lsb(x) = (x mod 2^f), i.e., the last f bits of x. This violates an ideal functionality for probabilistic
truncation, where ε = 1 with probability lsb(x)/2^f (and ε = 0 with probability 1 − lsb(x)/2^f), but
whether ε becomes 1 or 0 is not revealed.
        </p>
        <p>
          We show that the probabilistic rounding protocols cited above realize a modified probabilistic
truncation functionality, which we define in Fig. 3. The modification is natural: The functionality picks
and reveals a “cutoff” point c chosen at random in Z_{2^f} = [0, ..., 2^f − 1], and sets the rounding error
as ε = 1 if lsb(x) > c, and as ε = 0 otherwise. In Appendix A we prove that the
probabilistic truncation protocol of Escudero et al. [52], included in Fig. 9, realizes this modified truncation
functionality. We note that several other probabilistic truncation protocols, e.g., [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], also realize this
functionality, after adjustments related to correctness error.
        </p>
        <p>Technically, we show an efficient simulator Sim s.t. for every input x and every set C of corrupted
parties,</p>
        <p>( Jx̃′K, c̃, View_C(JxK) ) ≈ ( Jx′K, c, Sim(JxK_C, Jx′K_C, c) ),</p>
        <p>where Jx̃′K and c̃ are the output and the cutoff point set by a protocol execution, View_C(JxK) is the
adversarial view of that protocol execution, Jx′K and c are the output and the cutoff point set by the
ideal functionality, and JxK_C and Jx′K_C are the shares of resp. JxK and Jx′K held by parties C.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Modified Probabilistic Truncation Functionality</title>
        <p>If x is an integer in Z_{2^ℓ}, we define lsb(x) = (x mod 2^f) as the f least significant bits of x, and trc(x) =
⌊x/2^f⌋ as the truncation of the last f bits of x. In particular it holds that x = trc(x) · 2^f + lsb(x).
We present the modified probabilistic truncation functionality ℱ^TruncPr_{ℓ,f}(JxK) in Fig. 3. Note that the
proposed functionality reveals the cutoff point c whereas the functionality considered by Li et al. ([45,
Functionality 2]) can be stated as following the same procedure except that it hides c. (In particular,
if ε is based on the random cutoff c as in Fig. 3 then ε = 1 with probability lsb(x)/2^f, as expected.)</p>
        <p>Note that this extra element conveys as much information as the correctness definition of the
probabilistic truncation itself, which has seen wide acceptance and use in the literature. By definition,
probabilistic truncation incurs a rounding error with probability p = lsb(x)/2^f. Therefore, for a fixed
input value x, one can estimate probability p given outputs x′ from multiple runs of this functionality,
and then approximate the f least significant bits of x as lsb(x) ≈ p · 2^f. Each run of functionality
ℱ^TruncPr_{ℓ,f}(JxK) reveals in addition the cutoff point c, which can also be used to approximate lsb(x)
because each c partitions the range [0, ..., 2^f − 1], and the bit ε = x′ − trc(x) reveals to which side of this
partition lsb(x) belongs.</p>
        <p>Functionality ℱ^TruncPr_{ℓ,f}(JxK)</p>
        <p>Input: Assume parties hold JxK, x ∈ Z_{2^ℓ} for x &lt; 2^{ℓ−1}.
1. Reconstruct x from sharing JxK.
2. Set trc(x) = ⌊x/2^f⌋ and lsb(x) = (x mod 2^f).
3. Pick cutoff point c at random in Z_{2^f}.
4. Set x′ = trc(x) + ε, where ε = 0 if lsb(x) ≤ c, and ε = 1 otherwise.
5. Generate a random sharing Jx′K of x′.
6. Output Jx′K and leak c to the adversary.</p>
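<p>In the clear, this functionality can be sketched in a few lines (plain Python, with hypothetical names; reconstructing x from JxK is modeled by passing x directly):</p>

```python
import random

def trunc_pr(x, ell, f, rng=random):
    """Modified probabilistic truncation functionality (Fig. 3), in the
    clear: returns (x_prime, c), where c is the leaked cutoff point."""
    trc = x // 2 ** f            # floor(x / 2^f)
    lsb = x % 2 ** f             # x mod 2^f
    c = rng.randrange(2 ** f)    # cutoff point, leaked to the adversary
    eps = 1 if lsb > c else 0    # rounding error
    return trc + eps, c
```

<p>Since c is uniform in Z_{2^f}, Pr[lsb(x) > c] = lsb(x)/2^f, matching the correctness definition of probabilistic truncation.</p>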
        <p>We prove the following theorem in Appendix A:</p>
        <p>Theorem 3.1. Protocol Π^TruncPr_{ℓ,f}(JxK) of Escudero et al. [52], shown in Fig. 9 in Appendix A, securely
realizes functionality ℱ^TruncPr_{ℓ,f}(JxK) shown in Fig. 3 with semi-honest security in the stand-alone security
framework [71].</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Comparison to CrypTen’s Truncation</title>
        <p>
          We chose to implement truncation using the protocol of Escudero et al. [52] instead of CrypTen’s
truncation [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] because of correctness and rounding errors in the latter protocol. First, as in the case of
the probabilistic truncation of [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] discussed in Section 1.1.5, CrypTen’s truncation incurs a correctness
error with probability 2^{−(ℓ−|x|)}, where |x| is the maximum bit size of the truncation input x.
        </p>
        <p>Consider the implications of the above for LLM inference. Assuming the maximum bit size of
fixed-point elements is 24 (8-bit integer part and 16-bit precision), after multiplication, the input bit size
doubles. Since ℓ is fixed at 64 bits, the correctness error occurs with a probability of 2^{−(64−48)} = 2^{−16}.
For instance, the GPT-2 model requires at least 2^20 inner products. (For GPT-2, d_model = 768, d_ff = 3072,
heads = 12, and d_k = 64. Considering an input of size n = 64, the token embedding requires d_model × n
inner products, each attention mechanism requires (d_k + d_model) × n, and each feed-forward layer requires
at least (d_ff + d_model) × n inner products. This amounts to at least 3,637,248 > 2^20 inner products.)
Assuming a truncation is executed
after each inner product, we expect 16 random errors, undermining inference correctness.</p>
        <p>We verified this experimentally by running 2^20 CrypTen truncations of 48-bit elements 30 times,
resulting in an average error of 15.83, which aligns with the expected value of 2^4 = 16. By contrast, the
probabilistic truncation protocol of Escudero et al. [52] is always correct.</p>
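<p>The expected error count above is a simple back-of-the-envelope calculation; a quick sketch (variable names are illustrative):</p>

```python
ell = 64                 # ring bit-width
max_bits = 48            # bit size of a truncation input after multiplication
n_truncs = 2 ** 20       # lower bound on the number of inner products in GPT-2

p_err = 2.0 ** -(ell - max_bits)     # per-truncation correctness-error probability
expected_errors = p_err * n_truncs   # 2^20 * 2^-16 = 2^4 = 16
```

<p>The measured average of 15.83 over 30 runs is consistent with this expectation.</p>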
        <p>Furthermore, CrypTen’s truncation mechanism exhibits worse rounding errors compared to the
functionality in Fig. 3 realized by the protocol of [52], even for N = 2 parties, increasing further if the number
of parties increases. CrypTen’s rounding error is linear in N because each party truncates its share of
the input value x locally, i.e., [x′]_i = ⌊[x]_i/2^f⌋. Since each [x′]_i can be 1 away from the rational value,
summing N of them results in an output x′ which can be up to N away from trc(x). In contrast, the value x′
output by the truncation protocol of [52] is at most 1 away from trc(x). For these reasons we opt for the
truncation protocol from Escudero et al. [52] instead of CrypTen’s truncation mechanism after each
multiplication.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Curl Overview</title>
      <p>As we mentioned in Section 2.3, DWT is used to decompose signals into approximation and detail
coefficients. DWT is particularly effective for signals that represent smooth functions (e.g., logarithm,
square root) since the detail coefficients are relatively small compared to the approximation coefficients.
This key observation allows us to set the detail coefficients to zero and invert the application of the
DWT to obtain an approximation of the original signal. The main benefit of this process is that for
smooth functions we acquire detailed approximations while requiring significantly fewer samples than
the original signals. We leverage this to develop a secure protocol with compressed LUTs achieving
high accuracy with low round and communication complexities.</p>
      <p>We begin by analyzing several DWT techniques, specifically Haar and Biorthogonal DWT, for
plaintext evaluation of LUTs and we translate the plaintext procedures into the secure setting. Next,
we apply our DWT LUT techniques and create Curl, an easy-to-use framework for privacy-preserving
machine learning that exposes a tensor-based programming model for both CPU and GPU backends.
Finally, we evaluate Curl on non-linear functions and LLMs securely, improving accuracy and reducing
communication compared to related works.</p>
      <sec id="sec-4-1">
        <title>4.1. Improving Approximations with DWT-Encoded LUTs</title>
        <p>
          Our key contribution lies in utilizing discrete wavelet transform (DWT) to securely evaluate a table.
We start by explaining the secure evaluation of a look-up table (LUT) and demonstrate the application
of both Haar and Biorthogonal DWT compression techniques for function evaluation using plaintext
values. Next, we extend those two techniques to the secure setting. Our protocols use some techniques
introduced in [
          <xref ref-type="bibr" rid="ref23">72, 33, 23</xref>
          ] but contrary to previous works, Curl can be easily extended to any number of
parties. For simplicity, we present our protocols in the two-party setting.
        </p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Secure LUT Evaluation</title>
          <p>Let F : R → R be a function and consider its fixed-point encoded counterpart t_F : Z_{2^ℓ} → Z_{2^ℓ}, with
precision f. The function t_F can be represented as a LUT in (Z_{2^ℓ})^{2^ℓ}, where t_F(x) corresponds to the
value at the x-th index. We briefly explain the method for securely computing t_F(JxK) = JyK, which is
depicted in Fig. 4.</p>
          <p>Before initiating the protocol, both parties P_0 and P_1 can locally produce a LUT representation t_F of
the function F. This needs to be performed only once and can be reused indefinitely. The
secure evaluation of the LUT begins with the dealer generating a random element r ∈ Z_{2^ℓ}. Then, it
generates a one-hot vector e with elements equal to 0 everywhere except in the r-th entry. Then, the
dealer secret-shares to both P_0 and P_1 the random value r along with the one-hot vector e. These steps
make up the preprocessing phase as they do not depend on the function input x. During the online
phase, the parties locally compute the offset value Δ given by Δ = x − r. This offset value is revealed to
both parties P_0 and P_1 as it is used to locally rotate the secret-shared vector e. Note that Δ cannot be
revealed to the dealer, otherwise it would be able to recover the underlying value of the input x. After
the parties rotate their shares of e by Δ, the underlying value 1 is placed at the (r + (x − r)) = x-th index.
Finally, the parties compute the inner product between the shares of the rotated vector and the public
LUT t_F. Indeed, this yields the share of t_F at index x. All operations are executed in the ring Z_{2^ℓ}.</p>
          <p>Figure 4: Secure LUT evaluation. The dealer samples r ← Z_{2^ℓ} and a vector e = [0, . . . , 0, 1, 0, . . . , 0] that is one-hot at r, and shares r and e to P_0 and P_1; the parties reveal Δ = x − r, rotate JeK by Δ, multiply with the LUT t_F, and obtain the share of the entry at the x-th index.</p>
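<p>The whole interaction can be sketched in the clear as follows (a toy simulation of both parties and the dealer in one process, with a small ring Z_{2^8} and hypothetical names; the real protocol runs the parties separately):</p>

```python
import random

ELL = 8              # toy input bit-width; the ring is Z_{2^ELL}
Q = 2 ** ELL

def share(v, rng):
    """Additively secret-share v between two parties in Z_Q."""
    s0 = rng.randrange(Q)
    return s0, (v - s0) % Q

def secure_lut_eval(table, x, rng=random):
    """Sketch of the secure LUT evaluation of Fig. 4; table is the
    public LUT t_F with 2^ELL entries."""
    # Dealer (preprocessing): random r and a one-hot vector e at index r.
    r = rng.randrange(Q)
    r0, r1 = share(r, rng)
    e0 = [rng.randrange(Q) for _ in range(Q)]
    e1 = [((1 if i == r else 0) - e0[i]) % Q for i in range(Q)]
    # Online phase: parties hold shares of x and reveal delta = x - r.
    x0, x1 = share(x, rng)
    delta = ((x0 - r0) + (x1 - r1)) % Q
    # Each party locally rotates its share of e by delta, so the hidden 1
    # moves from index r to index r + (x - r) = x ...
    rot0 = e0[-delta:] + e0[:-delta]
    rot1 = e1[-delta:] + e1[:-delta]
    # ... and takes the inner product with the public LUT.
    y0 = sum(t * s for t, s in zip(table, rot0)) % Q
    y1 = sum(t * s for t, s in zip(table, rot1)) % Q
    return (y0 + y1) % Q      # reconstruct t_F(x)
```

<p>The dealer never sees Δ, and each party sees only uniformly random shares plus Δ = x − r, which is itself uniform because r is.</p>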
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Look-Up Table Approximation</title>
        <p>As previously discussed, DWT provides a way to approximate signals. We observe that it is specifically
well-suited for smooth functions, such as exponential, sigmoid, logarithm, and square root, to name
a few. Additionally, it provides a compression mechanism that allows us to reduce bandwidth usage.
Next, we delve into two approaches for evaluating a DWT-approximated LUT in the plaintext setting.
This exploration will help inform the secure protocols for both Haar and (5, 3) Biorthogonal DWTs.
Approach 1. Recall from Section 2.3 that when DWT is applied to a vector x k times, we end up
with two vectors a↶ and d↶, where a↶ is 2^k times smaller than the original vector. By setting the
detail coefficients to zero, we define the approximation of x to be the approximation coefficients. This
effectively reduces the size of the original vector by 2^k.</p>
        <p>Similar to how we apply the DWT to a general vector v ∈ (Z_{2^ℓ})^{2^ℓ}, we apply it to the LUT
representation t_F. We define t↶F ∈ (Z_{2^ℓ})^{2^{ℓ−k}} to be the LUT containing only the approximation coefficients
of the original LUT after k DWT operations, i.e., H^k · t_F. Note that, in this process, the original LUT is
compressed by a factor of 2^k, while the original input value x remains within Z_{2^ℓ}. This leads to a
natural question: how do we find the original input x in the k-times compressed table t↶F? To locate x in
t↶F, we truncate x by k bits and find the corresponding index in t↶F. This process yields the following
approximation:</p>
        <p>t_F(x) ≈ t↶F(x ≫ k),</p>
        <p>which is depicted in Fig. 5. Observe that the output bitwidth of the compressed LUT t↶F is the same as
the output bitwidth of the original table t_F.</p>
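<p>A plaintext sketch of approach 1 for a smooth function (sqrt here; the fixed-point parameters are illustrative, not the ones used by Curl):</p>

```python
import math

ELL, F, K = 10, 5, 3   # toy input bit-width, fixed-point precision, DWT levels

# Original LUT: t_F(i) encodes sqrt(i / 2^F) in fixed point.
lut = [round(math.sqrt(i / 2 ** F) * 2 ** F) for i in range(2 ** ELL)]

def haar_approx_coeffs(t, k):
    """Keep only the Haar approximation coefficients after k levels."""
    a = list(t)
    for _ in range(k):
        a = [(a[2 * i] + a[2 * i + 1]) / 2 for i in range(len(a) // 2)]
    return [round(v) for v in a]

clut = haar_approx_coeffs(lut, K)       # 2^K-times smaller table

def approx(x):
    """Approach 1: t_F(x) is approximated by clut[x >> K]."""
    return clut[x >> K]

max_err = max(abs(approx(x) - lut[x]) for x in range(2 ** ELL))
```

<p>Even with a 2^3-times smaller table, the maximum absolute error stays within a few fixed-point units, dominated by the steep region of sqrt near zero.</p>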
        <p>Approach 2. From DWT theory, we can retrieve the original signal from the approximation and
detail coefficients by evaluating the inverse transform over them. Moreover, it is known that the
inverse transform provides a method to amplify a signal, by considering the original signal as the
approximation coefficients and setting the detail coefficients to zero. Concretely, we define the amplified LUT as</p>
        <p>t̃_F := W^{−k} · t↶F,   (1)</p>
        <p>and we obtain t_F(x) ≈ t̃_F(x) for any DWT type.</p>
        <p>Next, we explain both approaches for the Haar and (5, 3) Biorthogonal transformations. We discuss
how the structure of the DWT matrices affects the approximation, as this provides insight into translating
a plaintext approximation to a secure protocol. Finally, we extend these approaches to the secure setting
and present the corresponding protocols.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Haar-LUT Evaluation</title>
          <p>The two aforementioned approaches presented in Section 4.2 can be applied to the Haar DWT.
Considering vectors of size 2^ℓ, recall from Section 2.3.1 that the Haar DWT compresses data by computing
the average of two elements: a_i = (x_{2i} + x_{2i+1})/2, for i = 0, 1, . . . , 2^{ℓ−1} − 1. In other words, as i
represents the ℓ − 1 most significant bits (MSBs) of the original index, the index elements with the same
ℓ − 1 MSBs are averaged. Applying the DWT consecutively k times, the DWT averages the index elements
with the same ℓ − k MSBs.</p>
          <p>We observe that for the Haar DWT, both approaches 1 and 2 yield the same approximation. As
described in Section 2.3.1, the inverse Haar transformation is given by x_{2i+1} = a_i + d_i and x_{2i} = a_i − d_i.
Since the detail coefficients are set to 0, we observe that x_{2i} = x_{2i+1} = a_i, i.e., the elements from
indexes with the same ℓ − 1 MSBs are reconstructed to the same value. In other words, the inverse
transformation duplicates the entries of the indexes having the same ℓ − 1 MSBs. This yields the
same result as taking the ℓ − 1 MSBs of x and looking up the compressed table t↶F, i.e., more generally
t↶F(x ≫ k) = t̃_F(x).</p>
          <p>Secure Haar LUT Evaluation. The techniques described above can be used together to securely
compute an approximated function F through a LUT. The full protocol is formally described in Fig. 6.</p>
          <p>The secure LUT procedure of Fig. 4 can be adapted to be compatible with the DWT compression
technique. Assume input bitwidth ℓ, output bitwidth m, and the number of DWT transformations to be
k. The secure Haar LUT protocol, Π^Haar-LUT_{ℓ,m,k,F} (Fig. 6), is given as follows. During the preprocessing phase,
similar to the secure LUT evaluation, the dealer produces a secret-shared random one-hot vector. However,
to profit from DWT compression, the one-hot vector has reduced size, i.e., 2^{ℓ−k}, and r ∈ Z_{2^{ℓ−k}}. Following
approach 1, the look-up index is given by the ℓ − k MSBs of the input x. Therefore, the
shift Δ to be applied to the compressed LUT t↶F is computed using the ℓ − k MSBs of x and r.
(Fig. 6 parameters: ℓ, m for input and output bit-width; F is the function to be computed,
represented by the corresponding k-times compressed LUT t↶F.)</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>Dealer setup:</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>LUT evaluation for parties P_i, for i ∈ {0, 1}:</title>
      <p>4. P_i truncates the input value JxK by k: Jx*K = Jx ≫ kK. ◁ Truncation protocol [52, §5.1]
5. P_i reduces its share Jx*K by locally computing Jx*K mod 2^{ℓ−k}.
6. P_i computes JΔK = Jx*K − JrK and the parties reveal Δ.
7. P_i rotates JeK by Δ.</p>
      <p>8. P_i applies the inner product between t↶F and the rotated vector, and gets the share JyK.</p>
      <p>9. Output: JyK.</p>
      <p>The online phase starts with the parties truncating the shares of x by k and computing the modulo in Z_{2^{ℓ−k}}. As we
saw, the truncation from the DWT-LUT evaluation approach 1 can be executed with the Escudero et
al. [52, §5.1] probabilistic truncation protocol over rings. The modulo operation is required because the
shares of r belong to Z_{2^{ℓ−k}} while the shares of truncated x belong to Z_{2^ℓ}. We note that the modulo can be
executed locally as described below. The rest of the protocol is similar to the online phase of the secure
LUT evaluation (Fig. 4).</p>
      <p>
Local modulo. In the literature, several protocols evaluate the modulo operation over shares in a ring or
field [
        <xref ref-type="bibr" rid="ref24">24, 52</xref>
        ]. Those protocols are used whenever the space of the modulo operation is the same for both
inputs and outputs. However, we observe that the modulo operation required for our protocol needs
to reduce the ring size of the operand. More specifically, the shares Jx*K in Z_{2^ℓ} have to be reduced to
shares in Z_{2^{ℓ−k}}. This can be achieved locally by letting the parties compute the modulo operation over
their shares. For correctness, consider x = (JxK_1 + JxK_2) mod 2^ℓ and x′ = x mod 2^{ℓ−k}. We have:
      </p>
      <p>x′ = ((JxK_1 + JxK_2) mod 2^ℓ) mod 2^{ℓ−k}
= (JxK_1 + JxK_2) mod 2^{ℓ−k} ◁ 2^{ℓ−k} is a factor of 2^ℓ.
= ((JxK_1 mod 2^{ℓ−k}) + (JxK_2 mod 2^{ℓ−k})) mod 2^{ℓ−k}
= (Jx′K_1 + Jx′K_2) mod 2^{ℓ−k}.</p>
      <p>Thus, the shares of x′ can be computed locally from JxK.</p>
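<p>The local reduction can be checked in a few lines (plain Python sketch; the derivation holds because 2^{ℓ−k} divides 2^ℓ):</p>

```python
import random

def local_mod_reduce(share_val, ell, k):
    """Each party reduces its own share from Z_{2^ell} to Z_{2^(ell-k)}
    locally; no interaction is needed."""
    return share_val % 2 ** (ell - k)

# Correctness: reconstructing the reduced shares equals x mod 2^(ell-k).
rng = random.Random(0)
ell, k = 16, 5
for _ in range(1000):
    x = rng.randrange(2 ** ell)
    s0 = rng.randrange(2 ** ell)
    s1 = (x - s0) % 2 ** ell          # two-party additive sharing of x
    r0 = local_mod_reduce(s0, ell, k)
    r1 = local_mod_reduce(s1, ell, k)
    assert (r0 + r1) % 2 ** (ell - k) == x % 2 ** (ell - k)
```
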
      <sec id="sec-5-1">
        <title>4.2.2. Biorthogonal-LUT Evaluation</title>
        <p>Now, we explore the LUT DWT approximation for the (5, 3) Biorthogonal DWT. As described in
Section 2.3.2, the approximation coefficients are given by a weighted average of 5 elements and, similar
to Haar, approach 1 for the Biorthogonal compresses two indexes with the same ℓ − 1 MSBs to the
same index. On the other hand, approach 2 requires computing W^{−k} · t↶F =: t̃_F as shown in Eq. (1). For
simplicity, we set k = 1 and examine the structure of the inverted Biorthogonal DWT matrix W^{−1}.
To simplify the exposition, we present the matrix W^{−1} only with the elements that interact with the
approximation coefficients, i.e., t↶F; thus, the detail coefficients are effectively set to zero.</p>
        <p>Observe that the line indexes with the same ℓ − 1 MSBs constitute a 2 × 2 submatrix that combines two
elements from t↶F: its corresponding index element and the subsequent element. For instance, the
b-th block of t̃_F combines the b-th and the (b + 1)-th index element of t↶F. Note that the
LSB of the line index plays a role in the weights of the sub-matrices.</p>
        <p>Finally, consider the case k = 2 (we omit the 1/4 constant for readability). The 2 LSBs of the indices
define the weights of the 4 × 2 sub-matrices, and the ℓ − 2 MSBs of the indices define the elements of
the approximation LUT t↶F to be used: the b-th block of t̃_F combines (2^2 − j) · t↶F(b) + j · t↶F(b + 1),
where j ∈ {0, 1, 2, 3} is the value of the 2 LSBs of the index.</p>
        <p>We can generalize the above pattern for any k, which provides an approximation formula suited for
the secure setting:</p>
        <p>t̃_F(x) = (1/2^k) · ( (2^k − v(x)) · t↶F(u(x)) + v(x) · t↶F(u(x) + 1) ),   (2)</p>
        <p>where u(x) and v(x) represent the ℓ − k MSBs and the k LSBs of x, respectively. We recall that the ℓ − k
MSBs can be obtained by truncation and the k LSBs by the mod 2^k operation.</p>
        <p>Secure Biorthogonal LUT evaluation. Eq. (2) can be used to securely compute an approximated
function F through a look-up table. The full protocol is formally described in Fig. 7. Similarly, the
secure LUT procedure described before (Fig. 4) can be adapted to be compatible with the Biorthogonal
DWT compression technique. Assume input bitwidth ℓ, output bitwidth m, and the number of DWT
transformations to be k.</p>
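<p>In the clear, Eq. (2) is just a linear interpolation between two consecutive entries of the compressed table (a sketch with hypothetical names):</p>

```python
def bior_lut_approx(clut, x, k):
    """Plaintext evaluation of Eq. (2): interpolate between clut[u(x)]
    and clut[u(x) + 1] with weight v(x), the k LSBs of x."""
    u = x >> k                  # u(x): the ell - k MSBs of x
    v = x % 2 ** k              # v(x): the k LSBs of x
    return ((2 ** k - v) * clut[u] + v * clut[u + 1]) / 2 ** k
```

<p>For v(x) = 0 this returns clut[u(x)] exactly, i.e., it degenerates to the Haar-style lookup of approach 1.</p>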
        <p>The secure Biorthogonal LUT protocol, Π^Bior-LUT_{ℓ,m,k,F} (Fig. 7), is similar to Π^Haar-LUT_{ℓ,m,k,F} with the difference
that it requires two LUT evaluations of t↶F. As an optimization, note that these two LUT evaluations
do not require two different one-hot vectors. Since the LUT evaluation is executed on two consecutive
indices, we can use the same vector JeK to generate two shifted vectors: one shifted by Δ and the other
by Δ + 1. This allows us to use the same dealer setup phase as in the secure Haar LUT evaluation. The
online phase starts with the parties computing u(x) as the ℓ − k MSBs and v(x) as the k LSBs of x. More
specifically, u(x) is computed by truncating x by k (i.e., [52, §5.1]) and locally computing its modulo
in Z_{2^{ℓ−k}}, while v(x) is computed locally based on u(x) as x = 2^k · u(x) + v(x).
(Fig. 7 parameters: ℓ, m for input and output bit-width; F is the function to be computed,
represented by the corresponding k-times compressed LUT t↶F.)</p>
      </sec>
      <sec id="sec-5-2">
        <title>Dealer setup:</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>LUT evaluation for parties P_i, for i ∈ {0, 1}:</title>
      <p>4. P_i truncates the input value JxK by k, Ju(x)K := Jx ≫ kK. Then, P_i reduces its share Ju(x)K by
locally computing Ju(x)K mod 2^{ℓ−k}.
5. P_i computes the k LSBs of x by locally computing Jv(x)K := JxK − 2^k · Ju(x)K.
6. P_i computes JΔK = Ju(x)K − JrK and the parties reveal Δ.
7. P_i rotates JeK by Δ and Δ + 1.
8. P_i applies the inner product between t↶F and the rotated vectors; we denote the resulting shares by Jy_0K and Jy_1K.
9. P_i computes the expression JyK := (1/2^k) · ((2^k − Jv(x)K) · Jy_0K + Jv(x)K · Jy_1K).
10. Output: JyK.</p>
      <p>The rest of the protocol is similar to the online phase of the secure LUT evaluation (Fig. 4) on Δ and Δ + 1. In the
final steps 9 and 10, the parties compute the expression (2) as the output.</p>
      <p>
        Both Haar and Biorthogonal secure evaluation protocols can be used to directly evaluate a function
over a fixed interval. Furthermore, we note that our protocols can be combined with a piece-wise
approach commonly employed in prior works [
        <xref ref-type="bibr" rid="ref15 ref28">28, 15</xref>
        ] . We extend this method in Appendix B to
accommodate any bounded even or odd function (Fig. 11). We demonstrate the applicability of this
protocol to various functions, including s-shaped functions such as tanh, erf, and sigmoid, as well as
activation functions like GeLU and SiLU. Additionally, in Appendix B.3, we introduce a protocol for
both sin and cos.
      </p>
      <sec id="sec-6-1">
        <title>4.3. Complexity Analysis</title>
        <p>4.3.1. Memory
The secure DWT-LUT approach, using compressed tables, significantly reduces the memory required
for LUT handling. For instance, consider the interval (0, 4096) with a precision of 16 bits. A
conventional LUT protocol would typically demand a table size of approximately 2.15 GB. However, employing
the DWT technique allows us to reduce the original table size by a factor of approximately 2^20 while
maintaining high accuracy levels, as discussed in Section 5.1.2. The corresponding DWT-reduced LUT
occupies only about 1 KB of memory, demonstrating a substantial reduction in storage requirements without
compromising accuracy.</p>
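<p>The arithmetic behind these numbers can be reproduced directly (a quick sketch; we assume 64-bit table entries, with the interval and precision taken from the text above):</p>

```python
# Conventional LUT for the interval (0, 4096) with 16 fractional bits:
# 2^12 integer values times 2^16 fractional steps = 2^28 entries.
entries = 2**12 * 2**16
full_bytes = entries * 8               # assuming 64-bit ring elements per entry
full_gb = full_bytes / 1e9             # ~2.15 GB, as reported
compressed_bytes = full_bytes / 2**20  # DWT compression factor of ~2^20
# compressed_bytes is ~2 KB, i.e., the order of 1 KB reported in the text.
```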
        <sec id="sec-6-1-1">
          <title>4.3.2. Round &amp; Communication</title>
          <p>Both Haar and Biorthogonal protocols have the same preprocessing phase, and both start the online phase
by truncating and opening a single field element. However, the Biorthogonal protocol requires an additional
truncation and two share multiplications (see Fig. 7, Step 9). Observe that the probabilistic truncation
protocol of Escudero et al. [52, §5.1] over rings involves one round to open a ring element. Consequently,
while the Haar protocol operates within 2 rounds and transmits 2 ring elements, the Biorthogonal protocol is
more resource-intensive, requiring 4 rounds and transmitting 5 elements.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Experimental Results</title>
      <p>We implemented Curl by building on top of CrypTen and open-source it at https://github.com/jimouris/curl.
Curl natively supports both CPU and GPU backends; however, we focus on CPU experiments as
this is the most widely used setting in MPC. For completeness, we conduct some GPU experiments using
an NVIDIA GeForce RTX 4080. All parties run in separate processes on a c5n.9xlarge AWS instance
with 36 vCPUs and 96 GB of memory. As communication and round complexity are crucial metrics
for assessing MPC end-to-end runtime, we report these numbers extensively in all our benchmarks;
they can be used to estimate WAN performance.</p>
      <p>
        Furthermore, as Curl’s novelty lies in improving approximations through DWT-encoded lookup
tables, we focus on reporting the mean absolute errors (MAE) and mean relative errors (MRE) over the
expected result. Finally, we focus on private LLM inference using secret-shared models on secret-shared
data. Note that Curl can be used for private training in a similar way, which we omit due to space
constraints. We use 16 bits of precision, following related works that typically use precision between 9
and 17 bits [
        <xref ref-type="bibr" rid="ref15 ref23">46, 15, 73, 23</xref>
        ].
      </p>
      <sec id="sec-7-1">
        <title>5.1. Non-linear Functions</title>
        <p>
          We experimentally verified that the Biorthogonal protocol achieves better approximations than Haar
with almost identical latency and communication costs (Appendix C). Therefore, we use the Biorthogonal
protocol for almost all our experiments. We choose constrained ranges of input values to enable a
meaningful comparison with CrypTen. For wider ranges, such as (0, 2^12), CrypTen’s approximations
exhibit significantly higher errors. Since previous MPC works [
          <xref ref-type="bibr" rid="ref27 ref28">28, 27, 42</xref>
          ] rely on similar polynomial
approximations, we can argue that they all have similar error levels. For example, for the log function,
        </p>
        <p>CrypTen’s approximation yields an MAE of 1.09e+10, whereas the Biorthogonal protocol with compression
t = 21 achieves a substantially lower MAE of 2.66e-1. We note that the reported latency is
dominated by computation time, as the protocol is executed on a single machine.</p>
        <sec id="sec-7-1-1">
          <title>5.1.1. Choosing LUT Sizes</title>
          <p>We evaluate our secure DWT protocols on a set of smooth functions commonly used in
machine learning tasks: logarithm (used for perplexity and cross-entropy, which measure an LLM’s
performance), square root and inverse square root (used for layer normalization), sigmoid, and SiLU. In
Fig. 8 we investigate the trade-offs between latency on the left y-axes (green) and MAE on the right
y-axes (red) for the aforementioned non-linear functions. As a baseline, we show CrypTen’s latency
and MAE with dashed green and red lines, respectively. CrypTen does not support SiLU (which we
evaluate with two different protocols), so there we only show Curl’s latency and MAE. Of
course, bigger LUTs in Curl achieve better approximations but require more computation and thus
higher latency. However, the DWT reduces the size of the LUTs, which decreases the computation
complexity of Curl while still achieving good approximations. In a LAN setting, the total runtime is
expected to be dominated by computation time, whereas in a WAN the total runtime is mostly
dominated by communication. Therefore, we focus on achieving better errors than CrypTen while
keeping a similar latency level. As an example, consider running 265 × 265 log functions in a LAN
setting with 0.8ms ping-time and 3 Gbps bandwidth and in a WAN setting with 80ms ping-time and
200 Mbps bandwidth. We estimate³ Curl to take 180ms and CrypTen to take 313ms (1.8× more) in
LAN, and Curl to take 599ms and CrypTen to take 5,039ms (10.1× more) in WAN.</p>
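<p>This kind of estimate can be reproduced with a small helper that adds the computation time, the rounds multiplied by the ping-time, and the communication divided by the bandwidth (a sketch; the compute, round, and communication inputs below are placeholders, not the numbers from Table 1):</p>

```python
# runtime = computation + rounds * ping + data / bandwidth
def estimate_runtime_ms(compute_ms, rounds, comm_bytes, ping_ms, bandwidth_bps):
    transfer_ms = comm_bytes * 8 / bandwidth_bps * 1000
    return compute_ms + rounds * ping_ms + transfer_ms

# Hypothetical protocol costs (placeholders, not the Table 1 numbers):
lan = estimate_runtime_ms(150, 4, 10**6, ping_ms=0.8, bandwidth_bps=3e9)
wan = estimate_runtime_ms(150, 4, 10**6, ping_ms=80, bandwidth_bps=200e6)
```

Note how the same protocol costs are dominated by computation in the LAN setting but by rounds and communication in the WAN setting.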
          <p>For general and unbounded functions, we apply the secure Biorthogonal protocol (Fig. 7). Exceptionally,
we use the Haar-encoded protocol (Fig. 6) for the invsqrt function, as it achieves better errors whenever
the (0, 1) interval is included. For sqrt and invsqrt we consider the interval (0, 256). However, for
log we consider the interval (0, 64), as CrypTen’s MAE and MRE are of the order of 1e+6 for
(0, 256). This corresponds to an original LUT of size 2^24 for sqrt and invsqrt and of size 2^22 for log.
Observe the logarithm approximation in Fig. 8a, in which both the latency and MAE of Curl match those
of CrypTen for LUT sizes of 8 bits. In Figs. 8b and 8c, Curl’s MAE is always below CrypTen’s. For
these two functions, we use LUT sizes with 6 bits, as these provide lower latency for Curl. We set the
compression parameter to t = 18 for both sqrt and invsqrt and to 14 for log. For sigmoid, we use the
protocol presented in Fig. 11 and set the interval to (−64, 64). As analyzed in Appendix B.1, sigmoid
is not an odd function, but the last step of the protocol can be adapted to apply to any s-shape
function, including sigmoid. We observe that Curl achieves a better approximation than CrypTen for
LUT sizes with 6 or more bits, with runtime being marginally slower. Finally, for SiLU, we set the
interval to (−64, 64) and we use two different protocols: the approach described in Appendix B.2,
which is derived from the protocol in Fig. 11 (Fig. 8e), and the Biorthogonal protocol (Fig. 8f). We observe
that the Appendix B.2 approach achieves better MAE when compared to the Biorthogonal protocol.
The latency in Fig. 8e is less affected by an increase in the LUT size than that in Fig. 8f. The
same behavior occurs with GeLU and SiLU, as we can observe in Table 1. This is justified by the fact
that the approach from Appendix B.2 requires comparisons, whereas the Biorthogonal protocol does
not. Indeed, note that in all functions where a DWT-LUT protocol is applied, the latency decreases by
half when we halve the LUT size.
³We add the computation latency, the ping-time multiplied by the number of rounds, and the amount of information sent
divided by the bandwidth. We use the numbers reported in Table 1.</p>
        </sec>
        <sec id="sec-7-1-2">
          <title>5.1.2. Evaluations</title>
          <p>Having found the optimal trade-offs for LUT sizes, we compare Curl’s approximations with the DWT
technique to CrypTen’s polynomial approximations in terms of latency, communication, and errors.
In Table 1, we evaluate a set of commonly used non-linear functions, including logarithm, inverse
square root, the sigmoid activation function, as well as GeLU and SiLU. For unbounded functions, we
use the protocol from Fig. 7, while for bounded functions (e.g., sin) we use our optimized protocols
referenced in the table. We experimentally verified that, in most cases, the Biorthogonal
protocol outperforms the Haar DWT in terms of both MAE and MRE, with the only exception being
invsqrt. Observe that for the same LUT size, the latency for Haar is about 2/3 of the Biorthogonal
approach (compare sqrt’s 0.06s latency using Biorthogonal with invsqrt’s 0.04s latency using Haar). For
each function, we specify the evaluation domain for which we report the errors (i.e., MAE and MRE).</p>
          <p>Observe that all functions that utilize the protocols from Fig. 7 and Fig. 6 have constant rounds and
communication, regardless of the LUT size, as the one-hot vectors (which depend on the size) are
generated during preprocessing by the dealer. Latency is affected by the LUT size (e.g., sqrt vs. log)
since the LUT output value is accumulated with shares of zero, which are as many as the LUT size
(see the last step of Fig. 4). However, as discussed before, the end-to-end latency will be dominated by the
number of rounds and total communication size, for which Curl is almost always significantly better.
Regarding trigonometric functions (sin and cos), we use the approach proposed in Appendix B.3. We note
that CrypTen’s approximation of both functions already has a 1.2e+2 error for the range (−128, 128).
For this reason, we decrease the interval to (−64, 64). On the other hand, Curl’s approximation
keeps the same level of errors independently of the input range. Moreover, Curl achieves that with
lower latency, fewer rounds, and less communication when compared with CrypTen.</p>
        </sec>
      </sec>
      <sec id="sec-7-2">
        <title>5.2. LLM Inference</title>
        <p>We evaluate Curl with BERT Tiny (4 million parameters), BERT Base (110 million parameters), BERT
Large (340 million parameters), GPT-2 (124 million parameters), and GPT Neo (1.3 billion parameters).
In Table 2, we report the latency, number of rounds, and total communication for each LLM for 64
inputs. For completeness, we also evaluate BERT Tiny and Base on GPU for 64 elements and report
a 2–3× speedup compared to CPU (1.1 and 6.5 seconds, respectively). Finally, we use the QNLI
classification task from the GLUE benchmark [74] to evaluate the accuracy of our models. Curl achieves
an accuracy of 80.3% for encrypted BERT Tiny, which almost matches the plaintext accuracy (i.e.,
81.4%). This further verifies our claim that the probabilistic truncation protocol is sufficiently
accurate to evaluate LLMs. However, we observed that in LLMs a single LUT protocol performs poorly for
inverse square root, as the input range varies and the output of the function changes rapidly in (0, 1].
Thus, to achieve high accuracy we resort to a comparison and two LUTs: a dense one in (0, 1]
and a sparse one in (1, 256), getting the best of both worlds.</p>
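<p>The dense/sparse selection can be illustrated in plaintext (a sketch with hypothetical table sizes; the secure version replaces the comparison and the table lookups with their MPC counterparts):</p>

```python
import math

# Dense table on (0, 1] and sparse table on (1, 256) for 1/sqrt(x).
DENSE_N, SPARSE_N = 4096, 4096  # hypothetical sizes, not the paper's parameters
dense = [1.0 / math.sqrt((i + 1) / DENSE_N) for i in range(DENSE_N)]
sparse = [1.0 / math.sqrt(1 + (i + 0.5) * 255 / SPARSE_N) for i in range(SPARSE_N)]

def invsqrt_lut(x: float) -> float:
    if x > 1:  # sparse region (1, 256)
        return sparse[min(int((x - 1) * SPARSE_N / 255), SPARSE_N - 1)]
    return dense[min(int(x * DENSE_N), DENSE_N - 1)]  # dense region (0, 1]
```

A single uniform table would either waste entries above 1, where the function is almost flat, or lose precision below 1, where it changes rapidly.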
        <p>Next, we compare with Iron, MPCFormer, Puma, and Bolt in Table 3 for BERT Base with 128 items.
In terms of end-to-end latency, Curl outperforms all other frameworks by at least 1.5×, and by as much
as 20× in the case of Iron. More importantly, Curl achieves this with significantly fewer rounds
and less total communication. Specifically, Iron and Bolt require 8× and 6× more rounds than Curl,
respectively, while in terms of total communication, all other frameworks need more than 10 GBs (with
Iron needing 281 GBs) while Curl only needs 5.7 GBs. Although the end-to-end runtime in a real-world
instantiation would increase due to communication, Curl would not be affected as much since it
exhibits the lowest communication and number of rounds.
† Approximated since CrypTen could not fit GPT Neo in RAM.</p>
        <p>† In Bolt, WE stands for word elimination.</p>
        <p>Finally, in Table 4 we compare Curl with CrypTen for five LLMs with 64 inputs. This differs from
Tables 2 and 3 as CrypTen lacks a protocol to evaluate embeddings (Section 2.2); thus, in Table 4, we
skip embeddings for a fair comparison. Curl needs less than half the number of rounds CrypTen does for
all five models and almost half the total communication. Although Curl’s runtimes are already faster
than CrypTen’s, the difference in end-to-end latency would increase further in a realistic instantiation due
to communication.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>6. Concluding Remarks</title>
      <p>We have introduced Curl, a framework for privacy-preserving machine learning that evaluates non-linear
functions as lookup tables, resulting in better approximations and significant round and communication
reductions. Curl achieves this by relying on discrete wavelet transformations to reduce the lookup table
sizes without sacrificing accuracy, which has resulted in up to 19× round and communication reduction
compared to related works. We have evaluated Curl on five large language models such as BERT, GPT-2,
and GPT Neo (1.3 billion parameters), and have compared it against state-of-the-art related works,
significantly reducing latency, the number of communication rounds, and the total communication. On
top of that, Curl is easy to use, exposing a tensor-based programming model that machine learning
researchers and developers are familiar with, and it supports both CPU and GPU backends. Lastly,
we have introduced a new ideal functionality for probabilistic truncation protocols and proved their
security in the real-ideal paradigm. This is of independent interest, as truncation is a core component of
encrypted machine-learning models, and probabilistic truncation was previously considered insecure.
[29] E. Boyle, N. Chandran, N. Gilboa, D. Gupta, Y. Ishai, N. Kumar, M. Rathee, Function secret
sharing for mixed-mode and fixed-point secure computation, in: A. Canteaut, F.-X. Standaert
(Eds.), Advances in Cryptology – EUROCRYPT 2021, Part II, volume 12697 of Lecture Notes in
Computer Science, Springer, Heidelberg, Germany, Zagreb, Croatia, 2021, pp. 871–900. doi:10.
1007/978-3-030-77886-6_30.
[30] K. Storrier, A. Vadapalli, A. Lyons, R. Henry, Grotto: Screaming fast (2+1)-PC for Z_{2^n} via (2, 2)-DPFs,
in: ACM CCS 2023: 30th Conference on Computer and Communications Security, ACM Press,
2023, pp. 2143–2157. doi:10.1145/3576915.3623147.
[31] D. Mouris, P. Sarkar, N. G. Tsoutsos, PLASMA: Private, Lightweight Aggregated Statistics against
Malicious Adversaries, Proceedings on Privacy Enhancing Technologies 2024 (2024) 1–19. doi:10.
56553/popets-2024-0064.
[32] D. Mouris, C. Patton, H. Davis, P. Sarkar, N. G. Tsoutsos, Mastic: Private Weighted Heavy-Hitters
and Attribute-Based Metrics, Proceedings on Privacy Enhancing Technologies 2025 (2025) 1–30.
[33] Y. Ishai, E. Kushilevitz, S. Meldgaard, C. Orlandi, A. Paskin-Cherniavsky, On the power of correlated
randomness in secure computation, in: A. Sahai (Ed.), TCC 2013: 10th Theory of Cryptography
Conference, volume 7785 of Lecture Notes in Computer Science, Springer, Heidelberg, Germany,
Tokyo, Japan, 2013, pp. 600–620. doi:10.1007/978-3-642-36594-2_34.
[34] G. Dessouky, F. Koushanfar, A.-R. Sadeghi, T. Schneider, S. Zeitouni, M. Zohner, Pushing the
communication barrier in secure computation using lookup tables, in: ISOC Network and Distributed
System Security Symposium – NDSS 2017, The Internet Society, San Diego, CA, USA, 2017, pp.
1–15.
[35] A. Brüggemann, R. Hundt, T. Schneider, A. Suresh, H. Yalame, FLUTE: Fast and secure lookup
table evaluations, in: 2023 IEEE Symposium on Security and Privacy, IEEE Computer Society
Press, San Francisco, CA, USA, 2023, pp. 515–533. doi:10.1109/SP46215.2023.10179345.
[36] L. Ducas, D. Micciancio, FHEW: Bootstrapping homomorphic encryption in less than a second,
in: E. Oswald, M. Fischlin (Eds.), Advances in Cryptology – EUROCRYPT 2015, Part I, volume
9056 of Lecture Notes in Computer Science, Springer, Heidelberg, Germany, Sofia, Bulgaria, 2015,
pp. 617–640. doi:10.1007/978-3-662-46800-5_24.
[37] I. Chillotti, N. Gama, M. Georgieva, M. Izabachène, TFHE: Fast fully homomorphic encryption
over the torus, Journal of Cryptology 33 (2020) 34–91. doi:10.1007/s00145-019-09319-x.
[38] I. Chillotti, M. Joye, P. Paillier, Programmable bootstrapping enables efficient homomorphic
inference of deep neural networks, in: S. Dolev, O. Margalit, B. Pinkas, A. Schwarzmann (Eds.),
Cyber Security Cryptography and Machine Learning, Springer International Publishing, Cham,
2021, pp. 1–19.
[39] C. Gouert, D. Mouris, N. G. Tsoutsos, HELM: Navigating Homomorphic Encryption through Gates
and Lookup Tables, Cryptology ePrint Archive, Paper 2023/1382, 2023. https://eprint.iacr.org/2023/
1382.
[40] C. Gouert, M. Ugurbil, D. Mouris, M. de Vega, N. G. Tsoutsos, Ripple: Accelerating Programmable
Bootstraps for FHE with Wavelet Approximations, in: International Conference on Information
Security, Springer, 2024, pp. 1–20.
[41] T. Chen, H. Bao, S. Huang, L. Dong, B. Jiao, D. Jiang, H. Zhou, J. Li, F. Wei, THE-X:
Privacy-preserving transformer inference with homomorphic encryption, in: S. Muresan, P. Nakov,
A. Villavicencio (Eds.), Findings of the Association for Computational Linguistics: ACL 2022,
Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 3510–3520. doi:10.18653/
v1/2022.findings-acl.277.
[42] Y. Dong, W. jie Lu, Y. Zheng, H. Wu, D. Zhao, J. Tan, Z. Huang, C. Hong, T. Wei, W. Cheng, Puma:</p>
      <p>Secure inference of llama-7b in five minutes, arXiv preprint arXiv:2307.12533 (2023).
[43] J. Ma, Y. Zheng, J. Feng, D. Zhao, H. Wu, W. Fang, J. Tan, C. Yu, B. Zhang, L. Wang, SecretFlow-SPU:
A performant and User-Friendly framework for Privacy-Preserving machine learning, in: 2023
USENIX Annual Technical Conference (USENIX ATC 23), USENIX Association, Boston, MA, 2023,
pp. 17–33.
[44] S. Gupta, A. Agrawal, K. Gopalakrishnan, P. Narayanan, Deep learning with limited numerical
precision, in: Proceedings of the 32nd International Conference on International Conference on
Machine Learning - Volume 37, ICML’15, JMLR.org, 2015, p. 1737–1746.
[45] Y. Li, Y. Duan, Z. Huang, C. Hong, C. Zhang, Y. Song, Efficient 3PC for binary circuits with
application to Maliciously-Secure DNN inference, in: 32nd USENIX Security Symposium (USENIX
Security 23), USENIX Association, Anaheim, CA, 2023, pp. 5377–5394.
[46] P. Mohassel, Y. Zhang, SecureML: A system for scalable privacy-preserving machine learning, in:
2017 IEEE Symposium on Security and Privacy, IEEE Computer Society Press, San Jose, CA, USA,
2017, pp. 19–38. doi:10.1109/SP.2017.12.
[47] A. Patra, A. Suresh, BLAZE: Blazing fast privacy-preserving machine learning, in: ISOC Network
and Distributed System Security Symposium – NDSS 2020, The Internet Society, San Diego, CA,
USA, 2020, pp. 1–18.
[48] A. P. K. Dalskov, D. Escudero, M. Keller, Fantastic four: Honest-majority four-party secure
computation with malicious security, in: M. Bailey, R. Greenstadt (Eds.), USENIX Security 2021:
30th USENIX Security Symposium, USENIX Association, 2021, pp. 2183–2200.
[49] T. Ryffel, P. Tholoniat, D. Pointcheval, F. R. Bach, AriaNN: Low-interaction privacy-preserving
deep learning via function secret sharing, Proceedings on Privacy Enhancing Technologies 2022
(2022) 291–316. doi:10.2478/popets-2022-0015.
[50] K. Gupta, D. Kumaraswamy, N. Chandran, D. Gupta, LLAMA: A low latency math library
for secure inference, Proceedings on Privacy Enhancing Technologies 2022 (2022) 274–294.
doi:10.56553/popets-2022-0109.
[51] I. Damgård, D. Escudero, T. K. Frederiksen, M. Keller, P. Scholl, N. Volgushev, New primitives
for actively-secure MPC over rings with applications to private machine learning, in: 2019 IEEE
Symposium on Security and Privacy, IEEE Computer Society Press, San Francisco, CA, USA, 2019,
pp. 1102–1120. doi:10.1109/SP.2019.00078.
[52] D. Escudero, S. Ghosh, M. Keller, R. Rachuri, P. Scholl, Improved primitives for MPC over mixed
arithmetic-binary circuits, in: D. Micciancio, T. Ristenpart (Eds.), Advances in Cryptology –
CRYPTO 2020, Part II, volume 12171 of Lecture Notes in Computer Science, Springer, Heidelberg,
Germany, Santa Barbara, CA, USA, 2020, pp. 823–852. doi:10.1007/978-3-030-56880-1_29.
[53] S. Wagh, D. Gupta, N. Chandran, SecureNN: 3-party secure computation for neural network
training, Proceedings on Privacy Enhancing Technologies 2019 (2019) 26–49. doi:10.2478/
popets-2019-0035.
[54] M. Keller, K. Sun, Effectiveness of MPC-friendly Softmax Replacement, arXiv, 2020. doi:10.48550/ARXIV.2011.11202.</p>
      <p>[55] D. Beaver, Efficient multiparty protocols using circuit randomization, in: J. Feigenbaum (Ed.),
Advances in Cryptology – CRYPTO’91, volume 576 of Lecture Notes in Computer Science, Springer,
Heidelberg, Germany, Santa Barbara, CA, USA, 1992, pp. 420–432. doi:10.1007/3-540-46766-1_
34.
[56] A. Patra, T. Schneider, A. Suresh, H. Yalame, ABY2.0: Improved mixed-protocol secure two-party
computation, Cryptology ePrint Archive, Report 2020/1225, 2020. https://eprint.iacr.org/2020/1225.
[57] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin,
Attention is All you Need, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S.
Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 30,
Curran Associates, Inc., 2017, pp. 1–11.
[58] J. Yang, H. Jin, R. Tang, X. Han, Q. Feng, H. Jiang, S. Zhong, B. Yin, X. Hu, Harnessing the power of
LLMs in practice: A survey on ChatGPT and beyond, ACM Transactions on Knowledge Discovery
from Data (2023).
[59] K. Fukushima, Cognitron: A self-organizing multilayered neural network, Biological cybernetics
20 (1975) 121–136.
[60] D. Hendrycks, K. Gimpel, Gaussian error linear units (GELUs), arXiv preprint arXiv:1606.08415
(2016).
[61] P. Van Fleet, Discrete Wavelet Transformations: An Elementary Approach with Applications,</p>
      <p>Wiley, 2019.
[62] C. Juvekar, V. Vaikuntanathan, A. Chandrakasan, GAZELLE: A low latency framework for secure
neural network inference, in: W. Enck, A. P. Felt (Eds.), USENIX Security 2018: 27th USENIX
Security Symposium, USENIX Association, Baltimore, MD, USA, 2018, pp. 1651–1669.
[63] J. Liu, M. Juuti, Y. Lu, N. Asokan, Oblivious neural network predictions via MiniONN
transformations, in: B. M. Thuraisingham, D. Evans, T. Malkin, D. Xu (Eds.), ACM CCS 2017: 24th Conference
on Computer and Communications Security, ACM Press, Dallas, TX, USA, 2017, pp. 619–631.
doi:10.1145/3133956.3134056.
[64] M. S. Riazi, M. Samragh, H. Chen, K. Laine, K. E. Lauter, F. Koushanfar, XONN: XNOR-based
oblivious deep neural network inference, in: N. Heninger, P. Traynor (Eds.), USENIX Security
2019: 28th USENIX Security Symposium, USENIX Association, Santa Clara, CA, USA, 2019, pp.
1501–1518.
[65] A. C.-C. Yao, How to generate and exchange secrets (extended abstract), in: 27th Annual
Symposium on Foundations of Computer Science, IEEE Computer Society Press, Toronto, Ontario,
Canada, 1986, pp. 162–167. doi:10.1109/SFCS.1986.25.
[66] M. Keller, E. Orsini, P. Scholl, MASCOT: Faster malicious arithmetic secure computation with
oblivious transfer, in: E. R. Weippl, S. Katzenbeisser, C. Kruegel, A. C. Myers, S. Halevi (Eds.),
ACM CCS 2016: 23rd Conference on Computer and Communications Security, ACM Press, Vienna,
Austria, 2016, pp. 830–842. doi:10.1145/2976749.2978357.
[67] M. Keller, MP-SPDZ: A versatile framework for multi-party computation, in: J. Ligatti, X. Ou,
J. Katz, G. Vigna (Eds.), ACM CCS 2020: 27th Conference on Computer and Communications
Security, ACM Press, Virtual Event, USA, 2020, pp. 1575–1590. doi:10.1145/3372297.3417872.
[68] J. Doerner, a. shelat, Scaling ORAM for secure computation, in: B. M. Thuraisingham, D. Evans,
T. Malkin, D. Xu (Eds.), ACM CCS 2017: 24th Conference on Computer and Communications
Security, ACM Press, Dallas, TX, USA, 2017, pp. 523–535. doi:10.1145/3133956.3133967.
[69] E. Boyle, N. Gilboa, Y. Ishai, Secure computation with preprocessing via function secret sharing,
in: D. Hofheinz, A. Rosen (Eds.), TCC 2019: 17th Theory of Cryptography Conference, Part I,
volume 11891 of Lecture Notes in Computer Science, Springer, Heidelberg, Germany, Nuremberg,
Germany, 2019, pp. 341–371. doi:10.1007/978-3-030-36030-6_14.
[70] R. Canetti, Security and composition of multiparty cryptographic protocols, Journal of Cryptology 13 (2000) 143–202.</p>
      <p>
[71] Y. Lindell, How to simulate it - A tutorial on the simulation proof technique, Cryptology ePrint</p>
      <p>Archive, Report 2016/046, 2016. https://eprint.iacr.org/2016/046.
[72] B. Chor, O. Goldreich, E. Kushilevitz, M. Sudan, Private information retrieval, in: 36th Annual
Symposium on Foundations of Computer Science, IEEE Computer Society Press, Milwaukee,
Wisconsin, 1995, pp. 41–50. doi:10.1109/SFCS.1995.492461.
[73] S. Wagh, S. Tople, F. Benhamouda, E. Kushilevitz, P. Mittal, T. Rabin, Falcon: Honest-majority
maliciously secure framework for private deep learning, Proceedings on Privacy Enhancing
Technologies 2021 (2021) 188–208. doi:10.2478/popets-2021-0011.
[74] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, S. Bowman, GLUE: A multi-task benchmark and
analysis platform for natural language understanding, in: T. Linzen, G. Chrupała, A. Alishahi
(Eds.), Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting
Neural Networks for NLP, Association for Computational Linguistics, Brussels, Belgium, 2018, pp.
353–355. doi:10.18653/v1/W18-5446.
[75] S. Elfwing, E. Uchibe, K. Doya, Sigmoid-weighted linear units for neural network function
approximation in reinforcement learning, Neural Networks 107 (2018) 3–11. doi:10.1016/j.
neunet.2017.12.012.
[76] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language Models are Unsupervised</p>
      <p>Multitask Learners, 2019.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Proof of Theorem 3.1</title>
      <p>We consider the probabilistic truncation protocol of Escudero et al. [52], denoted Π_TruncPr(⟦x⟧) and
shown in Fig. 9. Below we prove Theorem 3.1, i.e., we show that protocol Π_TruncPr(⟦x⟧) securely realizes
the modified probabilistic truncation functionality ℱ_TruncPr(⟦x⟧) shown in Fig. 3 in Section 3. To
simplify notation, we assume below that ℓ = k − 1, in which case c̃ = c = (x + r̃) mod 2^k for
r̃ = 2^{k−1} b + 2^m r + r′. In the general case of ℓ &lt; k − 1 all arguments are the same, except that computation
is done over Z_{2^{ℓ+1}} instead of Z_{2^k}.</p>
      <p>Π_TruncPr(⟦x⟧):
1. Assume parties hold ⟦x⟧ ∈ Z_{2^k} for x &lt; 2^ℓ, ℓ &lt; k.
2. Assume parties hold (⟦r⟧, ⟦r′⟧, ⟦b⟧) from pre-processing, where r, r′, b are random in resp. Z_{2^{ℓ−m}}, Z_{2^m}, and Z_2.
3. Parties set ⟦c⟧ = 2^{k−(ℓ+1)} · (⟦x⟧ + 2^ℓ ⟦b⟧ + 2^m ⟦r⟧ + ⟦r′⟧) and publicly reconstruct c from ⟦c⟧.
4. Denote c̃ = 2^{−(k−(ℓ+1))} · c and note that c̃ = (x + r̃) mod 2^{ℓ+1}, where r̃ = 2^ℓ b + 2^m r + r′ &lt; 2^{ℓ+1}.
5. Parties compute ⟦u⟧ = ⟦b ⊕ c̃_ℓ⟧ = ⟦b⟧ + c̃_ℓ · (1 − 2⟦b⟧), where c̃_ℓ is the ℓ-th bit of c̃.
6. Parties output ⟦x′⟧ = 2^{ℓ−m} ⟦u⟧ + ⌊(c̃ mod 2^ℓ)/2^m⌋ − ⟦r⟧.</p>
      <p>Proof. The simulator receives the shares of the corrupted parties in sharings ⟦x⟧, ⟦r⟧, ⟦r′⟧, ⟦b⟧ and a random
value c^(l) ∈ Z_{2^m} from functionality ℱ_TruncPr(⟦x⟧). On these inputs the simulator picks a random
value c^(h) in Z_{2^{k−m}}, sets c = 2^m · c^(h) + c^(l), and simulates the reconstruction of secret-sharing
⟦c⟧ = ⟦x + r̃⟧ = ⟦x⟧ + 2^{k−1} ⟦b⟧ + 2^m ⟦r⟧ + ⟦r′⟧ into value c. Note that all other computations in the
protocol are local given the input shares and the revealed value c, so the simulator’s work is done.</p>
      <p>Note that Π_TruncPr(⟦x⟧) reveals only c, reconstructed from secret-sharing ⟦c⟧ = ⟦x + r̃⟧. Since
c = (x + r̃) mod 2^k and r̃ is random in Z_{2^k}, it follows that c is uniform in Z_{2^k} for every x, hence the
simulation above is perfect. Note also that since the simulator sets c as 2^m · c^(h) + c^(l), the output c^(l) in
the simulation agrees with the value c^(l) set by the functionality.</p>
      <p>It remains to argue that the distribution of x′ output by Π_TruncPr(⟦x⟧) is the same as in the
ideal-world functionality ℱ_TruncPr(⟦x⟧). In the ideal-world functionality, value x′ is a function of x and c^(l),
determined by the following formula:
x′ = x^(h) if c^(l) ≥ x^(l), and x′ = x^(h) + 1 otherwise. (3)
We argue that x′ is determined in the same way in protocol Π_TruncPr(⟦x⟧). Recall that for any v, we
denote v^(h) = ⌊v/2^m⌋ and v^(l) = (v mod 2^m), hence v = v^(h) · 2^m + v^(l). Correctness of the protocol
is based on the observation that if c = (x + r̃) mod 2^k then ĉ := c mod 2^{k−1} satisfies
ĉ = x + (2^m r + r′) − 2^{k−1} u over the integers, where u = b ⊕ c_{k−1}; equivalently,
ĉ = 2^m (x^(h) + (r − ũ)) + (x^(l) + r′), where ũ = u · 2^{(k−1)−m}. There
are two cases for how ĉ^(h) and ĉ^(l) = c^(l) relate to x^(h), r, ũ and x^(l), r′:
ĉ^(h) = x^(h) + (r − ũ) and c^(l) = x^(l) + r′, if x^(l) + r′ &lt; 2^m, (4)
ĉ^(h) = x^(h) + (r − ũ) + 1 and c^(l) = x^(l) + r′ − 2^m, if x^(l) + r′ ≥ 2^m. (5)
Since ⟦x′⟧ is set to ĉ^(h) − ⟦r − ũ⟧, equations (4) and (5) imply that:
x′ = ĉ^(h) − (r − ũ) = x^(h) if x^(l) + r′ &lt; 2^m, and x′ = x^(h) + 1 if x^(l) + r′ ≥ 2^m. (6)
Moreover, the same two equations imply that:
x^(l) + r′ &lt; 2^m if and only if c^(l) = x^(l) + r′ ≥ x^(l), and x^(l) + r′ ≥ 2^m if and only if c^(l) = x^(l) + (r′ − 2^m) &lt; x^(l). (7)
Equations (6) and (7) together imply the same expression of x′ according to x^(h) and c^(l) as in equation
(3). We conclude that x′ output by protocol Π_TruncPr(⟦x⟧) is set by the same formula as in the ideal
functionality ℱ_TruncPr(⟦x⟧).</p>
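<p>The correctness claim can also be checked numerically with a plaintext simulation of the protocol (a sketch assuming ℓ = k − 1 as above; randomness is sampled in the clear rather than secret-shared):</p>

```python
import random

def trunc_pr(x, k, m, rng):
    """One run of the (plaintext) TruncPr computation with l = k - 1."""
    l = k - 1
    r = rng.randrange(2**(l - m))
    r_prime = rng.randrange(2**m)
    b = rng.randrange(2)
    c = (x + 2**l * b + 2**m * r + r_prime) % 2**k   # opened masked value
    u = b ^ ((c // 2**l) % 2)                        # u = b XOR c_l
    return 2**(l - m) * u + (c % 2**l) // 2**m - r

rng = random.Random(0)
k, m = 32, 13
for _ in range(1000):
    x = rng.randrange(2**(k - 1))
    y = trunc_pr(x, k, m, rng)
    assert y in (x // 2**m, x // 2**m + 1)  # truncation, rounded down or up
```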
      <p>
        In the proof above, note that the randomness of the opened masked value is part of the view of the
adversary, so the rounding error is fixed in both the real and ideal worlds by the adversary’s view.
This argument can be extended to all probabilistic truncation protocols that follow the same pattern
of masking the input value with randomness. More specifically, the proof can be adapted for
protocols with non-zero correctness error, including [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and [51], and for other protocols with
perfect correctness, e.g., [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
    </sec>
    <sec id="sec-10">
      <title>B. Bounded functions</title>
      <p>
        Recall that an odd function F obeys F(−x) = −F(x) and an even function F is such that F(−x) = F(x).
In this section, we generalize the approach taken by Sigma for the GeLU function [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and apply it to any bounded odd or even function that converges to some constant c as
|x| → ∞ (e.g., s-shape functions). This method can be applied to several important functions: s-shape
functions used as activation functions (hyperbolic tangent tanh and error function erf); shifted s-shape
functions, like sigmoid; and more general functions such as GeLU and SiLU.
      </p>
      <p>
        The odd/even property guarantees that we only need to compute the function for positive values,
which effectively halves the size of the required LUT. To achieve this, we compute the sign of the input
and multiply it by its value, obtaining its absolute value. The convergence property ensures that, given
some acceptable error, we can clip the function to some interval [−B, B] and assign a constant value c
to all elements outside that interval. We note that [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] sets c = F(B), whereas we set the constant c to be
the limit value of the function F. Outside [−B, B], we claim that the error incurred by [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is bigger for
most of the inputs, except on a short interval. Intuitively, Fig. 10 shows the errors incurred by Curl’s and
Sigma’s approaches for the sigmoid function. Observe that the error following Sigma’s approach (red
area) is bigger than the error of Curl’s approach (green area) outside the interval [2, 2.76].
      </p>
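<p>The crossover point x ≈ 2.76 quoted above can be reproduced numerically: it is where Sigma's clipping error |σ(x) − σ(2)| equals Curl's error |1 − σ(x)|. A minimal plaintext sketch (the clipping point 2 and the crossover 2.76 come from the text; the rest is standard math):</p>

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# For x > 2, Sigma approximates sigmoid(x) by sigmoid(2); Curl uses the limit 1.
# The two errors are equal when sigmoid(x) = (1 + sigmoid(2)) / 2; inverting
# sigmoid(x) = c gives x = log(c / (1 - c)).
c = (1.0 + sigmoid(2.0)) / 2.0
x_cross = math.log(c / (1.0 - c))
print(round(x_cross, 2))  # prints 2.76
```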
      <p>
[Figure 10: errors of Curl's and Sigma's approximations of the sigmoid function σ(x) for x &gt; 2 (the in-figure annotation marks the region where [20] has lower error). Curl's approach sets the approximation σ̃(x) = 1 and Sigma's approach sets σ̃(x) = σ(2) for x &gt; 2. Sigma's approach error is lower in [2, 2.76) and Curl's approach error is lower outside that interval.]
      </p>
      <p>
[Figure 11: protocol ΠF,n,α,c,p(⟦x⟧). Parameters: n is the full bit-width, α defines the interval in which the bior-LUT protocol is applied, c is the constant value for the function outside the interval, p is the parity of F (p = 1 for odd functions and p = 0 for even functions), and L is the compression variable for the bior-LUT protocol.]
      </p>
      <p>[Figure 11, continued: using the share-conversion protocols ΠA2B and ΠB2A, the parties compute ⟦sgn(x)⟧ and ⟦|x|⟧ ← ⟦sgn(x)⟧ · ⟦x⟧ (step 5), and in the last step (12) output ⟦sgn(x)⟧ · ⟦y⟧, which is equivalent to ⟦y⟧ for even functions and ⟦−y⟧ otherwise.]</p>
      <p>The ΠF,n,α,c,p(⟦x⟧) protocol is formally shown in Fig. 11. Recall from Section 2.1 that ΠA2B and ΠB2A
represent the arithmetic-to-binary and binary-to-arithmetic conversion protocols. The protocol goes as
follows. For input value x, the parties start by computing the sign of x and its absolute value. Note
that the sign can be computed using the expression sgn(x) = 1 − 2 · (x &lt; 0) and the absolute
value of x can be computed as |x| = sgn(x) · x. The absolute value is then used to check whether x
lies within the predetermined interval [−α, α] or not. Then, the parties run the secure Biorthogonal protocol
ΠBior-LUT(⟦x⟧) to evaluate on |x|, returning y′. In case x ∈ [−α, α], the parties output y′; otherwise they
output the limit value c of the function F. This is effectively computed by the operation b · y′ + (1 − b) · c,
where b is the comparison bit for |x| &lt; α. In the last step, the output value is corrected in case the
function is odd, returning sgn(x) · y. This is set by the parity parameter p, which is p = 0 for even
functions and p = 1 for odd functions.</p>
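<p>The dataflow just described can be mirrored in plaintext as a sanity check of the logic (not of the MPC machinery). The hypothetical helper below models the Bior-LUT step as direct evaluation of F on |x|:</p>

```python
import math

def pi_f_plaintext(x: float, f, alpha: float, c: float, p: int) -> float:
    """Plaintext mirror of the Pi_F dataflow: sign, |x|, clip, sign-correct."""
    b_neg = 1 if x < 0 else 0            # comparison bit (x < 0)
    sgn = 1 - 2 * b_neg                  # sgn(x) = 1 - 2*(x < 0)
    ax = sgn * x                         # |x| = sgn(x) * x
    b_in = 1 if ax < alpha else 0        # comparison bit for |x| < alpha
    y_prime = f(ax)                      # stands in for the Bior-LUT evaluation
    y = b_in * y_prime + (1 - b_in) * c  # b*y' + (1-b)*c
    return sgn * y if p == 1 else y      # parity correction for odd F

# tanh is odd (p = 1) and converges to c = 1; clip at alpha = 8
out = pi_f_plaintext(-3.0, math.tanh, 8.0, 1.0, 1)
```

Inside the interval the result matches tanh(−3.0), and far outside it (e.g. x = −20) the protocol outputs the signed limit −1.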
      <sec id="sec-10-1">
        <title>B.1. S-shape functions</title>
        <p>In the previous section, we presented a protocol that can be applied to any bounded odd or even function
that is convergent. S-shape functions are bounded functions that converge to some value c for |x| → ∞
but are not necessarily odd or even. For example, tanh and erf are odd functions, but the
sigmoid function σ(x) is not. However, we note that we can transform any s-shape function into an odd
function by applying a shift. For example, the sigmoid function σ(x) can be shifted by 1/2, resulting in
an odd function: σ(x) − 1/2.</p>
        <p>Generally, observe that, for an s-shape function F, there exists a shift value s such that F(−x) − s =
−F(x) + s ⇔ F(−x) = 2 · s − F(x). So, setting b = (x &lt; 0), we can change the last step (12) of
protocol ΠF,n,α,c,p(⟦x⟧) to evaluate the function F:</p>
        <p>F(x) = b · F(−|x|) + (1 − b) · F(|x|) = s + sgn(x) · (F(|x|) − s),
where sgn(x) = 1 − 2 · b. Thus, for the sigmoid function, where s = 1/2, the last step (12) from
Fig. 11 becomes ⟦s⟧ + ⟦sgn(x)⟧ · ⟦y⟧. Note that for both tanh and erf, s = 0, recovering the initial
expression.</p>
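<p>The shift identity F(−x) = 2 · s − F(x) is easy to verify numerically for the three s-shape functions under the stated shifts (s = 1/2 for sigmoid, s = 0 for tanh and erf):</p>

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# F(-x) = 2*s - F(x): s = 1/2 for sigmoid, s = 0 for the odd functions
checks = [(sigmoid, 0.5), (math.tanh, 0.0), (math.erf, 0.0)]
max_res = max(
    abs(f(-x) - (2 * s - f(x)))
    for f, s in checks
    for x in (0.3, 1.7, 4.2)
)
```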
        <p>
          Parameters. Now, for each of the three s-shape functions, we need to define the following parameters:
the bit-width n, the interval [−α, α], the constant c, the parity p, and the compression value L. We use the
hardcoded bit-width value from CrypTen, n = 64, imposed by the PyTorch dependency. Also, c = p = 1
for all three s-shape functions. We note that [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] clips the interval in GeLU and accepts a constant
error on the order of ∼10^−5. We follow a similar approach and accept errors &lt; 10^−5. For the error
function (erf), hyperbolic tangent (tanh), and sigmoid, we set the interval to be [−4, 4], [−8, 8],
and [−16, 16], respectively. These intervals incur a maximum error on the order of 10^−8, 10^−7, and
10^−7, and the size of the original LUT is 2^18, 2^19, and 2^20, respectively.
        </p>
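<p>The quoted clipping errors can be reproduced directly: at each interval endpoint, the distance between the function and its limit c = 1 matches the stated orders of magnitude.</p>

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# error of replacing F(x) by its limit 1 outside the clipping interval
err_erf = 1.0 - math.erf(4.0)         # interval [-4, 4]   -> order 1e-8
err_tanh = 1.0 - math.tanh(8.0)       # interval [-8, 8]   -> order 1e-7
err_sigmoid = 1.0 - sigmoid(16.0)     # interval [-16, 16] -> order 1e-7
```

All three errors are well below the accepted 10^−5 threshold.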
      </sec>
      <sec id="sec-10-2">
        <title>B.2. Activation Functions: GeLU And SiLU</title>
        <p>In this section, we consider two popular activation functions: GeLU [60] and SiLU [75], given by</p>
        <p>GeLU(x) := (x/2) · (1 + erf(x/√2)),
where erf(·) denotes the Gauss error function, and</p>
        <p>SiLU(x) := x · sigmoid(x).</p>
        <p>These activation functions are sometimes considered smooth approximations of the classical ReLU
[59]. We observe that the functions DGeLU(x) := ReLU(x) − GeLU(x) and DSiLU(x) := ReLU(x) − SiLU(x)
are even functions, i.e., DGeLU(x) = DGeLU(−x) and DSiLU(x) = DSiLU(−x).</p>
        <p>Since these two difference functions are also bounded and converge to 0 for |x| → ∞, we can use
the ΠF,n,α,c,p protocol (Fig. 11) to compute both GeLU and SiLU by evaluating GeLU(x) = ReLU(x) −
ΠDGeLU(x) and SiLU(x) = ReLU(x) − ΠDSiLU(x).</p>
        <p>Note that ReLU can be computed as ReLU(x) = (1 − (x &lt; 0)) · x. Since we compute (x &lt; 0) in step 3
of the ΠF,n,α,c,p protocol (Fig. 11), we can reuse this value.</p>
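<p>The decomposition can be checked in plaintext: the difference functions are even, and subtracting them from ReLU recovers the activations. The helper names below are illustrative; the definitions follow the formulas above:</p>

```python
import math

def gelu(x: float) -> float:
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))   # x * sigmoid(x)

def relu(x: float) -> float:
    return x if x > 0 else 0.0

d_gelu = lambda x: relu(x) - gelu(x)  # even difference for GeLU
d_silu = lambda x: relu(x) - silu(x)  # even difference for SiLU

# evenness: D(-x) == D(x) for both difference functions
pts = (0.4, 1.3, 3.7)
evenness = max(
    max(abs(d_gelu(x) - d_gelu(-x)) for x in pts),
    max(abs(d_silu(x) - d_silu(-x)) for x in pts),
)
```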
        <p>
          Parameters. Again, we use the hardcoded bit-width value from CrypTen, n = 64, and c = p = 0. As noted
before, [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] clips the interval in GeLU and accepts a constant error on the order of ∼10^−5. We follow
a similar approach and accept errors &lt; 10^−5. For GeLU and SiLU we set the interval to be [−4, 4] and
[−16, 16], respectively. These intervals incur a maximum error on the order of 10^−8 and 10^−7, and the
size of the original LUT is 2^18 and 2^20, respectively.
        </p>
        <p>[Figure 12: protocol Πsin,n,f,L(⟦x⟧). Parameters: n is the full bit-width, f is the number of bits used for the fractional part, and L is the compression variable for the Biorthogonal protocol. The parties compute the sign and absolute value, ⟦sgn(x)⟧ and ⟦|x|⟧ ← ⟦sgn(x)⟧ · ⟦x⟧, then use the fixed-point encoding of 2π, ⌊2π · 2^f⌋, to compute ⟦|x′|⟧ ← Πdiv(⟦|x|⟧, ⌊2π · 2^f⌋), evaluate the LUT, and output ⟦sgn(x)⟧ · ⟦y⟧; for cos the protocol returns ⟦y⟧.]</p>
      </sec>
      <sec id="sec-10-3">
        <title>B.3. Trigonometric functions</title>
        <p>
          It is known that sin is an odd function and cos is an even function. However, these two functions do not
converge, making protocol ΠF,n,α,c,p (Fig. 11) unsuitable. Despite this, we can still use the core concepts
from that protocol and adapt them to these trigonometric functions. The key insight is that both
functions are periodic with a period of 2π. We leverage this periodicity by scaling these trigonometric
functions to have period 1. This is achieved by generating LUTs for sin(2π · x) and cos(2π · x), where
x ∈ [0, 1]. Then, for input x, we evaluate the LUT using the secure Biorthogonal protocol at the point
x/(2π). We execute the division (Πdiv) based on the probabilistic truncation
protocol over rings of Escudero et al. [52, §5.1]. The protocol for sin is formally described in Fig. 12. The cos protocol is similar but
does not require the sign correction in the last step.
        </p>
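<p>The period-1 rescaling can be sketched in plaintext: reduce x/(2π) modulo 1, index a LUT built for sin(2π · u) on u ∈ [0, 1), and sign-correct since sin is odd. The table size and the use of Python's % for the modular reduction are illustrative; the protocol performs the division with Πdiv:</p>

```python
import math

BITS = 12                               # illustrative LUT resolution: 2**12 entries
LUT = [math.sin(2 * math.pi * i / 2**BITS) for i in range(2**BITS)]

def sin_lut(x: float) -> float:
    sgn = -1.0 if x < 0 else 1.0        # sin is odd: sign correction at the end
    u = (abs(x) / (2 * math.pi)) % 1.0  # scale to the period-1 domain [0, 1)
    return sgn * LUT[int(u * 2**BITS)]
```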
      </sec>
      <sec id="sec-10-4">
        <title>B.4. The Softmax Function</title>
      <p>Recall that softmax is given by the following expression:</p>
      <p>
softmax(x)_i = e^{x_i − max(x)} / Σ_{j=0}^{K−1} e^{x_j − max(x)},
where x ∈ R^K and max is the maximum operation over the vector x. To be able to compute the
expression above, we need three main functions: maximum, natural exponentiation, and reciprocal.
The maximum operation is computed using the protocol implemented in CrypTen (Section C.1.4 of [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]).
We note that CrypTen adopts a tree-reduction algorithm by default but also provides a pairwise method with
constant round complexity.
        </p>
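<p>In plaintext, the three building blocks compose as follows (maximum, exponentiation of the shifted inputs, and one reciprocal of the sum). This is the standard numerically-stable softmax, shown only to fix the dataflow; it is not CrypTen's secure version:</p>

```python
import math

def softmax(x):
    m = max(x)                        # secure max: tree reduction in CrypTen
    e = [math.exp(v - m) for v in x]  # the entry attaining the max maps to e^0 = 1
    inv = 1.0 / sum(e)                # reciprocal of a value in [1, len(x)]
    return [v * inv for v in e]
```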
        <p>
          Reciprocal. From the softmax expression, we have that there exists at least one index i such that
e^{x_i − max(x)} = 1. Also, since there are at most K terms, the reciprocal function only needs to be
evaluated within the interval [1, K]. We set K = 64 as in the GPT-2 model [76]. Then, we apply the
secure ΠBior-LUT to the interval [1, 64].
        </p>
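<p>The claim that the reciprocal's argument always lands in [1, K] follows because one exponent is exactly 0 and every term is at most 1. A randomized check with K = 64:</p>

```python
import math
import random

random.seed(0)
K = 64
sums = []
for _ in range(1000):
    x = [random.uniform(-50.0, 50.0) for _ in range(K)]
    m = max(x)
    # softmax denominator: one term is exp(0) = 1 and each term is <= 1
    sums.append(sum(math.exp(v - m) for v in x))
```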
        <p>[Table: the functions log, reciprocal, sqrt, and invsqrt evaluated with Haar vs. Biorthogonal wavelets.]</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>C. Truncation with Haar vs Biorthogonal</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Achiam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Adler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Akkaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Aleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Altenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Altman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anadkat</surname>
          </string-name>
          , et al.,
          <source>GPT-4 technical report, arXiv preprint arXiv:2303.08774</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171-4186. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al., LLaMA: Open and efficient foundation language models, arXiv preprint arXiv:2302.13971 (2023).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] T. Goyal, J. J. Li, G. Durrett, News summarization and evaluation in the era of GPT-3, arXiv preprint arXiv:2209.12356 (2022).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nakano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Balaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kosaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Saunders</surname>
          </string-name>
          , et al.,
          <article-title>Webgpt: Browser-assisted question-answering with human feedback</article-title>
          ,
          <source>arXiv preprint arXiv:2112.09332</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Peris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dupuy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Majmudar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Smaili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Privacy in the time of language models</article-title>
          ,
          <source>in: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1291</fpage>
          -
          <lpage>1292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Working memory capacity of ChatGPT: An empirical study</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>38</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>10048</fpage>
          -
          <lpage>10056</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] O. Goldreich, S. Micali, A. Wigderson, How to play any mental game or a completeness theorem for protocols with honest majority, in: A. Aho (Ed.), 19th Annual ACM Symposium on Theory of Computing, ACM Press, New York City, NY, USA, 1987, pp. 218-229. doi:10.1145/28395.28420.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] A. C.-C. Yao, Protocols for secure computations (extended abstract), in: 23rd Annual Symposium on Foundations of Computer Science, IEEE Computer Society Press, Chicago, Illinois, 1982, pp. 160-164. doi:10.1109/SFCS.1982.38.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] A. P. K. Dalskov, D. Escudero, M. Keller, Secure evaluation of quantized neural networks, Proceedings on Privacy Enhancing Technologies 2020 (2020) 355-375. doi:10.2478/popets-2020-0077.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] B. Knott, S. Venkataraman, A. Hannun, S. Sengupta, M. Ibrahim, L. van der Maaten, CrypTen: Secure multi-party computation meets machine learning, in: M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems, volume 34, Curran Associates, Inc., 2021, pp. 4961-4973.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] N. Koti, M. Pancholi, A. Patra, A. Suresh, SWIFT: Super-fast and robust privacy-preserving machine learning, in: M. Bailey, R. Greenstadt (Eds.), USENIX Security 2021: 30th USENIX Security Symposium, USENIX Association, 2021, pp. 2651-2668.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] Z. Huang, W.-j. Lu, C. Hong, J. Ding, Cheetah: Lean and fast secure two-party deep neural network inference, in: K. R. B. Butler, K. Thomas (Eds.), USENIX Security 2022: 31st USENIX Security Symposium, USENIX Association, Boston, MA, USA, 2022, pp. 809-826.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] K. Gupta, N. Jawalkar, A. Mukherjee, N. Chandran, D. Gupta, A. Panwar, R. Sharma, SIGMA: Secure GPT inference with function secret sharing, Proceedings on Privacy Enhancing Technologies 2024 (2024) 61-79. doi:10.56553/popets-2024-0107.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] P. Mohassel, P. Rindal, ABY3: A mixed protocol framework for machine learning, in: D. Lie, M. Mannan, M. Backes, X. Wang (Eds.), ACM CCS 2018: 25th Conference on Computer and Communications Security, ACM Press, Toronto, ON, Canada, 2018, pp. 35-52. doi:10.1145/3243734.3243760.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] S. Tan, B. Knott, Y. Tian, D. J. Wu, CryptGPU: Fast privacy-preserving machine learning on the GPU, in: 2021 IEEE Symposium on Security and Privacy, IEEE Computer Society Press, San Francisco, CA, USA, 2021, pp. 1021-1038. doi:10.1109/SP40001.2021.00098.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Keller</surname>
          </string-name>
          , K. Sun,
          <article-title>Secure quantized training for deep learning</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>10912</fpage>
          -
          <lpage>10938</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Watson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wagh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Popa</surname>
          </string-name>
          ,
          <article-title>Piranha: A GPU platform for secure computation</article-title>
          , in:
          <string-name>
            <given-names>K. R. B.</given-names>
            <surname>Butler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Thomas</surname>
          </string-name>
          (Eds.),
          <source>USENIX Security 2022: 31st USENIX Security Symposium</source>
          , USENIX Association, Boston, MA, USA,
          <year>2022</year>
          , pp.
          <fpage>827</fpage>
          -
          <lpage>844</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jawalkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Basu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chandran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Orca: FSS-based secure training with GPUs</article-title>
          ,
          <source>Cryptology ePrint Archive, Report 2023/206</source>
          ,
          <year>2023</year>
          . https://eprint.iacr.org/2023/206.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>I.</given-names>
            <surname>Damgård</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pastro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Smart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zakarias</surname>
          </string-name>
          ,
          <article-title>Multiparty computation from somewhat homomorphic encryption</article-title>
          , in:
          <string-name>
            <given-names>R.</given-names>
            <surname>Safavi-Naini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Canetti</surname>
          </string-name>
          (Eds.),
          <source>Advances in Cryptology - CRYPTO 2012</source>
          , volume
          <volume>7417</volume>
          of Lecture Notes in Computer Science, Springer, Heidelberg, Germany, Santa Barbara, CA, USA,
          <year>2012</year>
          , pp.
          <fpage>643</fpage>
          -
          <lpage>662</lpage>
          . doi:10.1007/978-3-642-32009-5_38.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>D.</given-names>
            <surname>Escudero</surname>
          </string-name>
          ,
          <article-title>An introduction to secret-sharing-based secure multiparty computation</article-title>
          ,
          <source>Cryptology ePrint Archive, Report 2022/062</source>
          ,
          <year>2022</year>
          . https://eprint.iacr.org/2022/062.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wagh</surname>
          </string-name>
          ,
          <article-title>Pika: Secure computation using function secret sharing over rings</article-title>
          ,
          <source>Proceedings on Privacy Enhancing Technologies</source>
          <volume>2022</volume>
          (
          <year>2022</year>
          )
          <fpage>351</fpage>
          -
          <lpage>377</lpage>
          . doi:10.56553/popets-2022-0113.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>O.</given-names>
            <surname>Catrina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>de Hoogh</surname>
          </string-name>
          ,
          <article-title>Improved primitives for secure multiparty integer computation</article-title>
          , in:
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Garay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>De Prisco</surname>
          </string-name>
          (Eds.),
          <source>SCN 10: 7th International Conference on Security in Communication Networks</source>
          , volume
          <volume>6280</volume>
          of Lecture Notes in Computer Science, Springer, Heidelberg, Germany, Amalfi, Italy,
          <year>2010</year>
          , pp.
          <fpage>182</fpage>
          -
          <lpage>199</lpage>
          . doi:10.1007/978-3-642-15317-4_13.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>I.</given-names>
            <surname>Damgård</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fitzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kiltz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Toft</surname>
          </string-name>
          ,
          <article-title>Unconditionally secure constant-rounds multiparty computation for equality, comparison, bits and exponentiation</article-title>
          , in:
          <string-name>
            <given-names>S.</given-names>
            <surname>Halevi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rabin</surname>
          </string-name>
          (Eds.),
          <source>TCC 2006: 3rd Theory of Cryptography Conference</source>
          , volume
          <volume>3876</volume>
          of Lecture Notes in Computer Science, Springer, Heidelberg, Germany, New York, NY, USA,
          <year>2006</year>
          , pp.
          <fpage>285</fpage>
          -
          <lpage>304</lpage>
          . doi:10.1007/11681878_15.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Iron: Private inference on transformers</article-title>
          , in:
          <string-name>
            <given-names>S.</given-names>
            <surname>Koyejo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Belgrave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oh</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>35</volume>
          ,
          Curran Associates, Inc.,
          <year>2022</year>
          , pp.
          <fpage>15718</fpage>
          -
          <lpage>15731</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>MPCFormer: Fast, performant and private transformer inference with MPC</article-title>
          , in:
          <source>International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Möllering</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <article-title>Bolt: Privacy-preserving, accurate and efficient inference for transformers</article-title>
          , in:
          <source>2024 IEEE Symposium on Security and Privacy (SP)</source>
          ,
          <source>IEEE Computer Society</source>
          , Los Alamitos, CA, USA,
          <year>2024</year>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>133</lpage>
          . doi:10.1109/SP54263.2024.00130.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>