A probabilistic small model theorem to assess confidentiality of dispersed cloud storage (extended abstract) Marco Baldi1 , Ezio Bartocci2 , Franco Chiaraluce1 , Alessandro Cucchiarelli1 , Linda Senigagliesi1 , Luca Spalazzi1 , and Francesco Spegni1 1 Università Politecnica delle Marche, Ancona, Italy 2 TU Wien, Vienna, Austria Abstract. Recent developments in cloud architectures and security con- cerns have originated new models of online storage clouds based on data dispersal algorithms. According to these algorithms the data is divided into several slices that are distributed among remote and independent storage nodes. Ensuring confidentiality in this context is crucial: only legitimate users should access any part of information they distribute among storage nodes. We use parameterized Markov Decision Processes to model such a class of systems and Probabilistic Model Checking to assess the likelihood of breaking the confidentiality. We showed that a Small Model Theorem can be proven for a specific types of models, preserving PCTL? formulae. Finally, we report the result of applying our methodology to feasibly assess the security of existing dispersed cloud storage solutions. 1 Introduction New models of cloud storage protocols are emerging, that are based on data dispersal algorithms, where data is divided into several slices distributed among remote and independent storage node [4,5,6]. The main advantage of these tech- niques consists in their reliability since the dispersion is usually accompanied by redundancy. However, in this context, ensuring confidentiality is equally im- portant: only legitimate users should have access to any part of the information that was dispersed among the independent storage nodes. We developed an assessment methodology for dispersed storage clouds that can describe real-world implementations and the presence of a passive eaves- dropper trying to spill the slices and reconstruct the users secret. The proposed methodology relies on parameterized Markov Decision Pro- cesses to model such a class of systems and probabilistic model checking as a verification tool. Since many parameters contribute to the description of the system, we addressed it applying common bounded model checking techniques. For certain classes of models we were able to prove a small-model theorem, im- plying that certain security specifications hold irrespective of the actual number of storage providers. 2 The AONT-based dispersed storage clouds In dispersed storage clouds any user has some amount of cloud storage space assigned on independent storage nodes. In order to assure reliability on the one hand and security on the other, several authors have proposed schemata based on fragmentation, erasure coding, and encryption [4,5]. From a purely abstract point of view, all these algorithms can be character- ized by a set of parameters. Let x be the original file size (measured in bits) to be dispersed and let l · q be the size of each fragment called slice (when q = 8, l is the size of a slice in bytes). The parameters of interest are n and k, the former being the number of slices after the transformation of the original file, and the latter the minimum number of slices to recover the original file (i.e. n − k is the maximum number of lost (erased) slices that still enables file recover). In Fig. 1 we depicted AONT-RS [5], a popular schemata consisting in apply- ing the Reed-Solomon erasure code (Encoder ) to the All-or-Nothing-Transform (AONT and Slicer ). The produced fragments are finally dispatched among the providers (Dispatcher ). Attacker models. To the aim of assessing the security of dispersed stor- age clouds, we introduced a probabilistic attacker model. Assuming the user is connected to the Internet from its (wired or wireless) local area network (LAN), a passive eavesdropper on the user LAN has some chances to intercept the ex- changed slices. We have considered mainly two types of attackers: in one case the attacker can intercept each exchanged slice independently from the others (slice attacker [1,3]), while in the other he/she can choose to attack the storage providers trying to steal all the slices stored there (provider attacker [3]). In the case of a slice attacker on a wireless LAN, one can use a frequentistic approach and rely on the physical properties of the channel to determine the likelihood of successfully intercepting a slice (as we have shown in [1]). In the case of a provider attacker, one can use a more subjective approach to determine the likelihood of breaking a storage provider. In both cases such likelihoods are required as input parameters of the system models. 3 Assessment methodology Now we briefly describe how the user, links to storage nodes and attacker are modeled using MDPs. x+α k · (l · q) n · (l · q) document AONT Slicer Encoder Dispatcher (x bits) l · n1 · q l · n2 · q l · nm · q SP1 SP2 ... SPm random key (α bits) cloud Fig. 1: Block diagram of AONT-RS (with data length expressed in bits) [Ri ] nri > 0 ∧ nri0 = nri − 1 di : nw < n ∧ [Wi ] c0 = i c = i ∧ c0 = 0 ∧ nw0 = nw + 1 nra > 0 ∧ nra0 = nra − 1 ∧ 0 nr1 = ctr1 ∧ . . . 0 nra0 = n reads nw = n ∧ c = 0 ∧ nrm = ctrm nr1 = 0 ∧ . . . ∧ nrm = 0 nra = 0 ∧ nr1 = 0 ∧ . . . ∧ nrm = 0 Fig. 2: The User(d1 , . . . , dm ) 3.1 Modeling In Figs. 2, 3 and 4 we represent the relevant MDPs for the case of a system with a slice attacker. We use straight variable names for edge pre-conditions, while primed variable names are considered edge post-conditions. First, the user loops until all slices are dispatched to the storage providers (i.e. nw = n). Next, he/she loops until n reads attempts have been consumed (i.e. nra = 0). The link to the storage provider i either receives a written slice [Wi ] or sends a slice to be read [Ri ]. In both cases there is some chance ai that the slice is leaked to the attacker. The latter increments a counter of stolen slices ctra for each leak event [Li ]. Security assessment analysis. It is clear that assessing the security of such systems is a parametric problem. Indeed, by allowing an arbitrarily large num- ber of read operations by the user, the attacker has probability 1 of intercepting more than k slices (every read the attacker has one more chance of intercepting the missing slices, until it intercepts all of them). Similarly, assuming the secret is split into an arbitrarily large number of slices gives the attacker a negligible probability of succeeding in his/her attack. Between these ends lie all the pa- rameters values of the actual implementations of AONT-based algorithms. Very often such values are not bound to a clearly stated security metric. We employ bounded and probabilistic model checking to compute the likelihood of a suc- cessful attack for several parameter configurations. The obtinaed probabilities [Wi ] [Li ] [Wi ] ai : ctri < c ∧ leak1 = 0 ∧ . . . ∧ leakm = 0 ∧ ctra < n ∧ ctra0 = ctra + 1 1 − ai : ctri < c ∧ leaki0 = 0 ∧ leaki0 = 1 ∧ ctri0 = ctri ctri0 = ctri + 1 leaki0 = 0 [Li ] leaki = 0 [Ri ] 1 − ai : ctri > 0 ∧ leaki0 = 0 ∧ [Ri ] 0 ctri = ctri ai : ctri > 0 ∧ leak1 = 0 ∧ . . . ∧ leakm = 0 ∧ leaki = 1 ∧ ctri0 = ctri − 1 0 Fig. 3: The Linkci (ai ) Fig. 4: The Attacker are collected in a graph relating the probability of a successful attack with the parameter values. For m storage providers, the checked system is: MAONT m := User(d1 , . . . , dm ) k Linkn1 (a1 ) k . . . k Linknm (am ) k Attacker. Finally, the probabilistic model checker is repeatedly invoked to solve the fol- lowing problem varying the parameter values: Pmax (F (ctra ≥ k), MAONT m ). Small model theorem for node links. A small model theorem allows to verify a class of infinite state systems by only checking a finite size system. The key observation is that, in a system where slices are intercepted when traveling between user and storage nodes, two or more node links with the same attack probability are indistinguishable from a single node link having the same attack probability, modulo some technicalities. Since the attack probability is deter- mined by the physical properties of the employed LAN, the number of required node links reduces to the number of used LANs. We denote with Linkci (a) a generic link to storage provider i, having capac- ity of c slices and probability a of leaking a slice to the slice attacker, during reads or writes. We write ≈Γ denoting the (equivalence) relation of probabilistic bisimulation modulo action replacement [1]. Then we can state the following. Lemma 1 (Reduction [1]). For any natural numbers c, d, i, j, k > 0 such that i 6= j, any probability a. Given the MDPs Linkci (a), Linkdj (a), and Linkc+d k (a): Linkci (a) k Linkdj (a) ≈Γ Linkc+d k (a) where Γ renames Ri , Rj (resp. Wi , Wj , resp. Li , Lj ) to Rk (resp. Wk , resp. Lk ). Given a sorted list of numbers a1 , . . . , am s.t. a1 ≤ . . . ≤ am , let us call its distinction the list of indices i1 , . . . , iq+1 satisfying the following: – i1 = 1, iq+1 = m, and i1 < . . . < iq+1 , – ∀j ∈ [1, q].∀k ∈ [ij , ij+1 − 1]. aij = ak , and – ∀j ∈ [1, q]. aij < aij+1 . Such constraints mean that the list a1 , . . . , am can be partitioned into q sub- lists, each containing identical values, and each pair of lists containing distinct values. For example, the distinction of the sorted list of probabilities 0.00, 0.00, 0.05, 0.10, 0.10, 0.10, 0.15, is the list of indices 1, 3, 4, 7. The small model theorem states that there exists a cutoff to the number of Links to be considered in a system. Theorem 1 (Small model theorem). For any naturals m, c1 , . . . , cm > 0 and probabilities a1 , . . . , am . Given the MDPs Linkc11 (a1 ), . . . , Linkcmm (am ). For any MDP M and formula Φ ∈ PCTL? the following holds: M k Linkc11 (a1 ) k . . . k Linkcmm (am ) |= Φ ⇔ c ci M k Link1i1 (ai1 ) k . . . k Linkq q (aiq ) |= Φ where, for some 0 < q ≤ m, the list of indices i1 , . . . , iq is a distinction of the list a1 , . . . , am (assume w.l.o.g. that the latter is sorted), the dispatch probabilities are Pij+1 −1 Pij+1 −1 given by dij = k=i j dk while the capacities are defined as cij = k=i j ck . 4 Conclusions In our works we have introduced a novel formal probabilistic model to verify security properties of online storage clouds based on data dispersal algorithms. We have also considered different (probabilistic) models of attackers, namely some try to intercept the traveling slices while others try to attack the providers hosting the slices. In both cases their aim is to collect at least k slices, allowing to reconstruct the user’s secret. Based on this we designed a methodology for assessing security of dispersed cloud storage architectures. In the case of an attacker intercepting the exchanged slices, we were even able to prove a small model theorem for the number of storage providers to be model checked in the system. This in turn allowed us to measure the confidentiality of such systems for any number of storage providers in the network. Our methodology can be applied (1) a posteriori to measure the degree of security of an existing system w.r.t. a given specification, or (2) a priori in order to determine the best parameter values allowing the system to minimize the likelihood of an attack, from the considered intruder model. In [1,2,3] we have shown how to apply our methodology on scenarios where a slice attacker is present, both for assessing security of an existing system and for determining best parameter values. In particular, in [2] we exploited our methodology to find parameter values ensuring that the dispersed cloud storage reaches the degree of security known as perfect secrecy. Intuitively, the latter means that the slice attacker has equal likelihoods of intercepting k slices and of guessing the entire message content out of nothing. In [3] the methodology has been used to model scenarios with a provider attacker. References 1. Baldi, M., Bartocci, E., Chiaraluce, F., Cucchiarelli, A., Senigagliesi, L., Spalazzi, L., Spegni, F.: A probabilistic small model theorem to assess confidentiality of dis- persed cloud storage. In: Quantitative Evaluation of Systems - 14th International Conference, QEST 2017, Proceedings. pp. 123–139 (2017) 2. Baldi, M., Chiaraluce, F., Senigagliesi, L., Spalazzi, L., Spegni, F.: Security in heterogeneous distributed storage systems: A practically achievable information- theoretic approach. In: Computers and Communications (ISCC), 2017 IEEE Sym- posium on. pp. 1021–1028. IEEE (2017) 3. Baldi, M., Cucchiarelli, A., Senigagliesi, L., Spalazzi, L., Spegni, F.: Parametric and probabilistic model checking of confidentiality in data dispersal algorithms. In: High Performance Computing & Simulation (HPCS), 2016 International Conference on. pp. 476–483 (2016) 4. Li, M., Qin, C., Li, J., Lee, P.P.: CDstore: Toward reliable, secure, and cost-efficient cloud storage via convergent dispersal. IEEE Internet Comp. 20(3), 45–53 (2016) 5. Resch, J.K., Plank, J.S.: Aont-rs: blending security and performance in dispersed storage systems. In: Proceedings of the 9th USENIX conference on File and stroage technologies. pp. 14–14 (2011) 6. Shen, L., Feng, S., Sun, J., Li, Z., Wang, G., Liu, X.: Clouds: A multi-cloud storage system with multi-level security. In: Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing. pp. 703–716 (2015)