=Paper= {{Paper |id=Vol-1558/paper33 |storemode=property |title=Using Robust Estimation Theory to Design Efficient Secure Multiparty Linear Regression |pdfUrl=https://ceur-ws.org/Vol-1558/paper33.pdf |volume=Vol-1558 |authors=Fida K. Dankar,Sabri Boughorbel,Radja Badji |dblpUrl=https://dblp.org/rec/conf/edbt/DankarBB16 }} ==Using Robust Estimation Theory to Design Efficient Secure Multiparty Linear Regression== https://ceur-ws.org/Vol-1558/paper33.pdf
Using Robust Estimation Theory to Design Efficient Secure Multiparty Linear Regression

Fida K. Dankar, Sidra Medical and Research Center, Doha, Qatar, fdankar@sidra.org
Sabri Boughorbel, Sidra Medical and Research Center, Doha, Qatar, sboughorbel@sidra.org
Radja Badji, Sidra Medical and Research Center, Doha, Qatar, rbadji@sidra.org

ABSTRACT
Various biomedical research studies, such as large-population studies and studies on rare diseases, require sharing of data across multiple sources or institutions. Data sharing enables the collection of more cases for analysis and thus increases the statistical power of the study. However, combining data from various sources poses privacy risks. A number of protocols have been proposed in the literature to address the privacy concerns, but these protocols do not fully deliver on either privacy or complexity. The main reason lies in the methodology used to design these secure algorithms: it is based on translating regular algorithms into secure versions using cryptographic procedures and tricks, rather than on establishing robust theory for designing secure and communication-free distributed algorithms. In this paper, we use well-established theoretical results to design a secure and low-communication linear regression protocol. The method used is comprehensive and can be generalized to other estimators.

CCS Concepts
• Security and Privacy → Cryptography • Security and Privacy → Security services → Privacy-preserving protocols.

Keywords
Data sharing; secure multiparty computation; linear regression; information privacy.

1.   INTRODUCTION
To advance large-scale biomedical studies, and to strengthen the statistical power of research studies (such as complex associations), researchers frequently need to share data with colleagues from around the world. Such data sharing activities are contingent on the protection of patient anonymity. Traditionally, researchers would strip the data of its identifying information - such as names and unique IDs - before sharing it with each other. However, as many recent studies have shown [1]–[3], it is now possible to deduce the identity of research participants from genomic and/or clinical data that was considered anonymized.

In the face of these growing privacy concerns, researchers are looking more and more at cryptographic secure computations (also known as secure multiparty computations). These secure techniques enable data holders to compute a function over their data while keeping the data private. Only the final result of the computation is revealed to all parties. Although secure computation protocols exist for multiple data mining and statistical functions, these protocols are mostly inappropriate for real-world application due to their computational overhead. Communication cost is the main factor driving this inefficiency: extensive message passing between the different concerned parties is the main bottleneck of existing secure computations. The main reason lies in the methodology used to design these secure algorithms, which translates regular algorithms into secure versions using cryptographic procedures and tricks rather than building on robust theory for designing secure and communication-free distributed algorithms.

There is considerable research devoted to designing distributed algorithms that are communication free. In such algorithms, the data is partitioned into multiple subsets, the subsets are stored on different machines, processes run on the different subsets simultaneously, and the final outcome is calculated as a function of the individual ones, thus limiting communication to the final step. In this paper, we import some of this theory into the design of a secure and low-communication linear regression protocol. The method used is comprehensive and can be generalized to other estimators (geometric median, principal component analysis, etc.). The theory is described in the next section, followed by a description of the secure protocol. The paper concludes with a discussion of the limitations and future directions.

2.   THEORY
We introduce the classical setting of a linear regression problem. Let X = {x_ij} be an n×p matrix of features and Y = (y_1, ..., y_n)^T a corresponding n×1 response vector, where n is the number of samples and p is the number of features. The linear regression problem consists of solving the linear model Y = Xβ + ε [4]. To improve the computational efficiency of the linear regression problem, Wang et al. [5] designed a distributed linear regression algorithm. The algorithm, referred to as message, partitions the dataset into multiple subsets and processes these subsets in parallel. Formally, the data (X, Y) is divided horizontally into k subsets {(X^1, Y^1); ...; (X^k, Y^k)}, with X^i the n_i×p feature matrix for subset i and Y^i = (y^i_1, ..., y^i_{n_i})^T the corresponding n_i×1 response vector. The algorithm then executes the following two steps:
    1.   In the first step, the feature selection vector for each subset is calculated separately using the Lasso algorithm [6]. After that, the overall model is obtained by including the features selected by the majority of the subsets. Thus, if γ^i = (γ^i_1, ..., γ^i_p) is the feature selection vector for site i (with γ^i_j = 1 if feature j is included and 0 otherwise), then the overall vector is obtained by:
         γ = (mode(γ^1_1, ..., γ^k_1), ..., mode(γ^1_p, ..., γ^k_p))
    2.   In the second step, the coefficients of the selected features are estimated separately for each subset, and the final result is obtained by averaging the coefficient estimates of each feature. Thus, if β^i = (β^i_1, ..., β^i_p) is the feature coefficients vector for site i, then the overall vector is obtained by:
         β = (Average(β^1_1, ..., β^k_1), ..., Average(β^1_p, ..., β^k_p))

The message algorithm fares well not only in terms of computational time, but also in terms of feature selection performance. In fact, the authors prove that under certain assumptions (that will be detailed in the discussion section) the feature inclusion vector converges exponentially to the optimal model [5], a result that is better than applying Lasso to the whole dataset.

Privacy is not a concern in the message algorithm; in fact, sharing of intermediate and local results between the different subsets is supported in the algorithm. Prior work [7] demonstrated through multiple scenarios how identifying information can be inferred from local and intermediate statistics using various attacks.

This paper uses the theory behind the message algorithm to formulate an efficient and private linear regression algorithm. The proposed algorithm performs efficiently with regards to feature selection and model calculation under certain assumptions. The algorithm is presented in the next section and the assumptions are discussed in Section 5. The paper concludes with a discussion of future work.

3.   SECURE PROTOCOL
We assume that a dataset (X, Y) is horizontally distributed among k ≥ 2 data holders (or sites) S_1, ..., S_k. The different sites are interested in cooperatively performing linear regression on the union of their datasets; however, they are not willing to share their data. Only the final result of the computation should be revealed to all parties. The secure algorithm is depicted in Algorithms 1 and 2. It requires a semi-trusted third party whose role is the generation of keys for a public encryption cryptosystem as well as the decryption of some results.

The Paillier cryptosystem [8] will be used due to its nice homomorphic properties. These properties are introduced in the next subsection, followed by a presentation of the secure protocol.

3.1   Paillier Public Cryptosystem
In public key cryptosystems, the ciphertext (the encryption) c of a message m (usually an integer) is obtained by applying an encryption function: c = Enc_pk(m), where pk is the public encryption key. The ciphertext c can be decrypted using another key sk (referred to as the secret key) and a decryption function: m = Dec_sk(c). The Paillier public cryptosystem is additively homomorphic; as such, the sum of two messages can be obtained from their respective ciphertexts. For Paillier, this translates to Enc_pk(m_1 + m_2) = Enc_pk(m_1) × Enc_pk(m_2) [8]. Moreover, Paillier allows a limited form of homomorphic multiplication, in that we can multiply an encrypted message by a plaintext. It is done as follows: Enc(m_1)^{m_2} = Enc(m_1 m_2).

3.2   Protocol
The main algorithm is described in Algorithm 1; it can be roughly divided into 4 steps:

    1.   The third party generates the keys for the Paillier cryptosystem and propagates the public key to all parties.
    2.   Each party calculates its local feature selection vector via Lasso; the overall feature selection vector is then calculated via a secure mode protocol (Algorithm 2). The secure mode protocol retains all features that have an overall inclusion probability greater than ½. The calculated mode vector is then propagated to the different parties without leaking any information about individual feature selection vectors.

   Initialization
   1: Input n, p, k, {S_1, ..., S_k}, {(X^1, Y^1); ...; (X^k, Y^k)}, {n_1, ..., n_k} # n is the total sample size, p is the number of features, k is the number of sites, {S_1, ..., S_k} are the different sites, (X^i, Y^i) represents the data set held by site i, and n_i is the sample size at site i.
   Third party
   2: Propagates a public encryption key to all sites
   Each party calculates
   3: γ^i ∈ {0,1}^p # the feature inclusion model via Lasso
   All parties calculate
   4: γ = mode(γ^i, i ∈ {1, ..., k}) via the secure mode protocol
   Each party calculates
   5: β^i = ((X^i)^T X^i)^{-1} (X^i)^T Y^i # the estimated coefficients for features γ
   Each party calculates
   6: Enc(β^i/k)
   All parties calculate
   7: Enc(β) = Enc(Σ_{i=1}^k β^i/k) = Π_{i=1}^k Enc(β^i/k) # calculation is done sequentially: party 1 calculates Enc(β^1/k) and sends it to party 2; party 2 calculates Enc(β^1/k) × Enc(β^2/k) and sends it to party 3, and so on
   8: Enc(β) is sent to the third party
   Third party
   9: Decrypts and propagates β

                  Algorithm 1. Main algorithm

(c) 2016, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bordeaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.
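Setting encryption aside, the two statistical steps that the protocol secures (per-site Lasso selection with a majority vote, then per-site least squares with averaging) can be sketched in Python with NumPy. This is a minimal sketch with our own function names; the Lasso step is a bare-bones ISTA solver standing in for whatever Lasso implementation a site would actually use:

```python
import numpy as np

def lasso(X, y, lam=0.1, iters=500):
    """Minimal ISTA solver for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant
    for _ in range(iters):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return beta

def message(sites, lam=0.1):
    """sites: list of (X_i, y_i) pairs, one per data holder."""
    # Step 1: per-site feature selection, then majority vote (mode).
    votes = [np.abs(lasso(X, y, lam)) > 1e-6 for X, y in sites]
    gamma = np.sum(votes, axis=0) > len(sites) / 2
    # Step 2: per-site OLS on the selected features, then average.
    betas = [np.linalg.lstsq(X[:, gamma], y, rcond=None)[0]
             for X, y in sites]
    return gamma, np.mean(betas, axis=0)
```

In the secure protocol, the majority vote is replaced by the secure mode protocol and the plain averaging by the homomorphic aggregation of Algorithm 1, so the sites never exchange their γ^i or β^i in the clear.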
    3.   After receiving the overall feature selection vector, each party calculates the coefficients of the selected features separately. The encrypted average of these coefficients is then securely computed using Paillier encryption as follows: each site i calculates Enc(β^i/k), then all sites sequentially calculate Enc(Σ_{i=1}^k β^i/k) = Π_{i=1}^k Enc(β^i/k). The encrypted result is sent to the third party.
    4.   The third party decrypts and propagates the estimated feature coefficients.

The secure mode protocol is described in Algorithm 2. It computes the mode of the k feature inclusion vectors without revealing any information about the different sites (in other words, the inclusion information for the sites is kept confidential; only the mode is revealed to the different parties).

If ∂ = Σ_{i=1}^k γ^i is the sum of the feature inclusion vectors of the different parties, then w = ∂ − {k/2}^p (the vector obtained by subtracting k/2 at every position) indicates through its sign whether each feature should be included or not: if w is positive at position j (w[j] > 0), this implies that the majority of sites had 1 at the jth position in their feature inclusion model, and thus the overall feature inclusion vector should be set as γ[j] = 1; otherwise it is set to 0. Thus, in order to calculate the feature inclusion vector, it is enough to securely calculate w. However, as w presents aggregated information about the sites (the number of sites that include every feature), our secure protocol will instead calculate an obfuscated version, wx, where x is a positive integer whose factors are distributed among the different parties: x = Π_{i=1}^k x_i, where x_i is held by site i and is kept confidential. As x is positive, wx still indicates through its sign whether each feature should be included or not. The protocol is explained in detail in Algorithm 2.
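Setting the encryption aside, the sign trick just described can be checked with a few lines of plain Python (a sketch with our own names; the real protocol performs these steps on Paillier ciphertexts):

```python
import random

def mode_via_sign(gammas):
    """gammas: k binary feature-inclusion vectors, one per site."""
    k, p = len(gammas), len(gammas[0])
    # w = sum_i gamma^i - k/2; w[j] > 0 iff a strict majority chose feature j
    w = [sum(g[j] for g in gammas) - k / 2 for j in range(p)]
    # each site contributes a secret positive factor x_i, so x = prod x_i > 0
    x = 1
    for _ in range(k):
        x *= random.randint(1, 10**6)
    wx = [x * wj for wj in w]       # obfuscated: magnitudes no longer count votes
    return [1 if v > 0 else 0 for v in wx]

print(mode_via_sign([[1, 0, 1], [1, 1, 0], [1, 0, 0]]))  # -> [1, 0, 0]
```

Note that when k is even and a feature receives exactly k/2 votes, w[j] = 0 and the feature is excluded, matching the requirement of an inclusion probability strictly greater than ½.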
   Initialization
   1: θ = {1/2}^p
   Each party i
   2: Generates a random positive integer x_i # denote by x = Π_{i=1}^k x_i
   Each party calculates
   3: w^i = (γ^i − θ)
   4: Enc(w^i)
   All parties calculate
   5: Enc(w) = Enc(Σ_{i=1}^k w^i) = Π_{i=1}^k Enc(w^i) # note if w is positive at position j (w[j] > 0) this implies that the majority of sites had 1 at the jth position in their feature inclusion model
   Sequentially the parties calculate
   6: Enc(wx) = {Enc(w[1])^x, ..., Enc(w[p])^x} # party 1 calculates Enc(wx_1) = {Enc(w[1])^{x_1}, ..., Enc(w[p])^{x_1}} and sends it to party 2; party 2 calculates Enc(wx_1x_2) = {Enc(wx_1[1])^{x_2}, ..., Enc(wx_1[p])^{x_2}}, and so on
   7: Enc(wx) is sent to the third party
   Third party
   8: Decrypts Enc(wx)
   9: Propagates wx
   Each site
   10: Calculates γ as follows: γ[j] = 1 if wx[j] > 0, γ[j] = 0 otherwise

                 Algorithm 2. Secure mode protocol

3.3   Protocol Modification
It is important to note that the algorithm can be slightly changed to operate without the need for a third party. In that case, a threshold Paillier cryptosystem can replace the third party [9]–[11]. In a threshold cryptosystem, the secret decryption key is distributed among a preset number, t, of entities (the data holders in our case). For the decryption to occur, each of the t entities has to perform its share of the decryption. The decryption shares are then combined to obtain the final result. This way, the protocol will be robust against the corruption of at most t − 1 entities [12] (note that the current version works under the assumption that all sites are non-corruptible, because if one party is corrupted, then it can communicate with the third party to decrypt intermediate results).

The threshold Paillier cryptosystem can be set up through a trusted party that will generate and distribute the public and secret keys. The trusted party can then erase all information pertaining to the key generation. If no such trusted party is available, the keys can be generated using secure multiparty computations [13]. Although this requires more computation overhead from each data owner, it only has to be done once. As such, it is an acceptable tradeoff.

4.   COMPLEXITY
In this section, we evaluate the additional computational burden of the algorithm incurred to achieve security. This burden is evaluated on each participating party. The complexity will be expressed in terms of the number of messages and in terms of basic functional units. These units are homomorphic multiplication (HM) and homomorphic addition (HA). Assuming an instance of Paillier with modulus m² [8], HA is equivalent to multiplying two integers modulo m², and HM is equivalent to computing an exponentiation modulo m² where the exponent is at most m. As such, an HM operation is equivalent to log(m) HA. With these considerations, it follows that a message encryption is dominated by 2 HM, while a decryption is dominated by 1 HM.

The algorithm performs 2 encryptions per party (step 6 of the main algorithm and step 4 of the secure mode protocol) and one homomorphic multiplication per party (step 6 of the secure mode protocol). This amounts to a total of 5 HM per party. On the other hand, the number of messages passed per party is constant, amounting to 5 to 7 messages.
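The HM and HA units above are concretely modular exponentiations and multiplications modulo the square of the Paillier modulus. A toy Paillier instance makes this, and the homomorphic identities of Section 3.1, concrete (deliberately small fixed primes and our own helper names; real use requires a vetted library and proper key sizes):

```python
import math
import random

# Toy key pair: modulus n = p*q with small fixed primes (illustration only).
P, Q = 10007, 10009
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)        # lambda(n), the secret key
MU = pow(LAM, -1, N)                # valid because we take g = n + 1

def enc(m):
    """Enc_pk(m) = (n+1)^m * r^n mod n^2: two HM-sized exponentiations."""
    r = random.randrange(2, N)
    while math.gcd(r, N) != 1:      # r must be invertible modulo n
        r = random.randrange(2, N)
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def dec(c):
    """Dec_sk(c) = L(c^lambda mod n^2) * mu mod n, with L(u) = (u-1)//n."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

# Homomorphic addition (HA): multiply ciphertexts modulo n^2.
assert dec(enc(17) * enc(25) % N2) == 42
# Homomorphic multiplication by a plaintext (HM): exponentiate a ciphertext.
assert dec(pow(enc(7), 6, N2)) == 42
# Step 7 of Algorithm 1: a running product aggregates the parties' shares.
total = 1
for share in [12, 30, 9]:           # toy integer shares, one per party
    total = total * enc(share) % N2
assert dec(total) == 51
```

The per-party cost counted above shows up directly here: each `enc` call performs two modular exponentiations, each ciphertext-by-plaintext multiplication one, and each ciphertext product a single modular multiplication.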
Prior secure linear regression algorithms place much higher computational burdens on the participants [7], [12], [14]–[16]. The best of these [12] requires a number of messages per party that grows quadratically in the number of features p, and a number of HM operations per party that grows polynomially in p, whereas our protocol requires a constant number of messages and HM operations per party.

5.   DISCUSSION
Algorithm consistency for regression or classification is a well-studied property in the field of statistical learning. It gives a theoretical guarantee that an algorithm converges to an optimal algorithm when the number of training samples becomes very large. In practical situations, the number of samples is not that large, and therefore the (theoretical) consistency property is not of critical importance; what is more critical is the validation of the proposed algorithms on multiple diverse datasets, as it provides confidence in their performance on real datasets and under different settings.

In general, both consistency and good performance on experimental datasets are well-desired properties for regression algorithms. The message algorithm (which was used to derive our algorithm) was tried in various experiments using multiple datasets in [5], and it showed good consistency and good performance. Nonetheless, in what follows, we discuss the assumptions required to ensure the theoretical property of consistency. There are four assumptions for the consistency of the selection algorithm [5]:
     •          A.1 Consistency condition for estimation: this condition is usually required for high-dimensional model regression.
     •          A.2 Conditions on model parameters: this imposes a restriction on model noise, parameter estimates and the number of selected features.
     •          A.3 (Lasso) The strong irrepresentable condition.
     •          A.4 The sparse Riesz condition.
Here A.2 is a basic assumption that is required for high-dimensional model estimation [17], [18], while A.3 and A.4 are the specific conditions for model selection consistency in Lasso.

In [5], a theoretical analysis of assumptions A.1, A.3 and A.4 is proposed. This analysis indicates that it is possible to ensure the validity of these three assumptions under the assumption that n ≥ k(A + kC), where n is the total number of samples, k is the number of sites, and the number of samples per site is assumed to be equal (n_i = n/k). A and C are complex terms that depend on the training data and model parameters. The previous inequality guarantees that the three assumptions are valid in probability provided that k and n are chosen such that k = O(n).

The experimental results in the paper that introduced the message algorithm are obtained for both synthetic data and real-world data [5]. For the synthetic data, the number of features p is 10000, the number of sites k is 50 or 200, and the sample size n is varied between 2000 and 10000. For the first real-world dataset (household electric power consumption), the sample size is n = 2 million for training and n = 75259 for testing, with k = 200 sites. For the second real-world dataset (HIGGS classification), n = 11 million and k = 1000. For the previously described experimental settings, the median selection algorithm showed very good results. Table 1 summarizes the values of n and k chosen in [5].

Table 1. Choices of sample sizes (n) and number of sites (k) as in [5].

  Experimental data                      Number of samples (n)   Number of sites (k)
  Synthetic                              2000-10000              50, 200
  Household electric power consumption   75259, 2 million        200
  HIGGS classification                   11 million              1000

In summary, the previous experimental results showed that a ratio between the number of samples and the number of sites of about 50 might ensure the nice consistency property.

6.   FUTURE WORK
As future work, we plan to investigate the optimal choice of the number of sites as a function of the number of samples, to guarantee the trade-off between accurate regression and efficient computation. The consistency assumption gives an inequality condition between the number of sites and samples. We plan to look deeper into this relation and validate the theoretical findings using experimental data.

We also plan to test our algorithm on real data sets and compare it (i) to existing secure computation protocols, and (ii) to the ordinary Lasso technique (applied on the combined dataset) in terms of both accuracy and performance.

Additionally, we plan to extend our method to other estimators such as principal component analysis. This may require designing new secure protocols for dealing with intermediate statistical calculations among the different sites, analogous to the presented secure mode protocol.

7.   REFERENCES
[1]    Y. Erlich and A. Narayanan, "Routes for breaching and protecting genetic privacy," Nat. Rev. Genet., vol. 15, no. 6, pp. 409–421, 2014.
[2]    M. Naveed, E. Ayday, E. W. Clayton, J. Fellay, C. A. Gunter, J.-P. Hubaux, B. A. Malin, and X. Wang, "Privacy in the genomic era," ACM Comput. Surv., vol. 48, no. 1, p. 6, 2015.
[3]    E. Check Hayden, "Researchers wrestle with a privacy problem," Nature, vol. 525, no. 7570, pp. 440–442, Sep. 2015.
[4]    D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis, vol. 821. John Wiley & Sons, 2012.
[5]    X. Wang, P. Peng, and D. B. Dunson, "Median Selection Subset Aggregation for Parallel Inference," in Advances in Neural Information Processing Systems, 2014, pp. 2195–2203.
[6]    R. Tibshirani, "Regression shrinkage and selection via the lasso," J. R. Stat. Soc. Ser. B Methodol., pp. 267–288, 1996.
[7]    K. El Emam, S. Samet, L. Arbuckle, R. Tamblyn, C. Earle, and M. Kantarcioglu, "A secure distributed logistic regression protocol for the detection of rare adverse drug events," J. Am. Med. Inform. Assoc., vol. 20, no. 3, pp. 453–461, May 2013.
[8]    P. Paillier, "Public-key cryptosystems based on composite degree residuosity classes," in Advances in Cryptology - EUROCRYPT '99, 1999, pp. 223–238.
[9]    Y. Desmedt, "Threshold cryptosystems," in Advances in Cryptology - AUSCRYPT '92, J. Seberry and Y. Zheng, Eds. Springer Berlin Heidelberg, 1993, pp. 1–14.
[10]   C. Hazay, G. L. Mikkelsen, T. Rabin, and T. Toft, "Efficient RSA key generation and threshold Paillier in the two-party setting," in Topics in Cryptology - CT-RSA 2012, Springer, 2012, pp. 313–331.
[11]   R. Canetti and S. Goldwasser, "An efficient threshold public key cryptosystem secure against adaptive chosen ciphertext attack," in Advances in Cryptology - EUROCRYPT '99, 1999, pp. 90–106.
[12]   F. Dankar, "Privacy Preserving Linear Regression on Distributed Databases," Trans. Data Priv., vol. 8, pp. 3–28, 2015.
[13]   T. Nishide and K. Sakurai, "Distributed Paillier cryptosystem without trusted dealer," in Information Security Applications, Springer, 2011, pp. 44–60.
[14]   R. Hall, S. E. Fienberg, and Y. Nardi, "Secure multiple linear regression based on homomorphic encryption," J. Off. Stat., vol. 27, no. 4, p. 669, 2011.
[15]   V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft, "Privacy-Preserving Ridge Regression on Hundreds of Millions of Records," 2012.
[16]   F. Dankar, R. Brien, C. Adams, and S. Matwin, "Secure Multi-Party Linear Regression," in EDBT/ICDT Workshops, 2014, pp. 406–414.
[17]   P. Zhao and B. Yu, "On model selection consistency of Lasso," J. Mach. Learn. Res., vol. 7, pp. 2541–2563, 2006.
[18]   Y. Kim, S. Kwon, and H. Choi, "Consistent model selection criteria on high dimensions," J. Mach. Learn. Res., vol. 13, no. 1, pp. 1037–1057, 2012.