HEALTH+Z: Confidential Provider Selection in Collaborative Healthcare P2P Networks

Sergej Zerr*, Odyseas Papapetrou**, Elena Demidova*
* L3S Research Center, Hannover, Germany, {zerr,demidova}@L3S.de
** SoftNet lab, Technical University of Crete, Chania, Greece, papapetrou@softnet.tuc.gr

ABSTRACT
Many real-world applications in the healthcare domain would gain a substantial advantage from the sharing and search technologies available for P2P infrastructures if these technologies could provide the required confidentiality guarantees. Currently, DHT-based indexes, which are typically applied for effective and efficient information sharing and retrieval in P2P networks, do not offer sufficient confidentiality for patient data in healthcare networks and medical document archives. In this paper we discuss the challenges involved in securing patient data stored in a DHT-based index and present initial solutions to address these challenges.

1. INTRODUCTION
Patient data records in the healthcare domain are often naturally distributed over the archives of the corresponding doctors and healthcare facilities. Real-world applications using this data would gain a substantial advantage from the sharing and search technologies available for P2P infrastructures. The P2P paradigm enables efficient sharing and retrieval of information in distributed settings and promises unlimited scalability, easy maintenance, and robustness against network attacks and failures. A study [19] stressed the importance of P2P networks in medical informatics, especially for improving data sharing between doctors and hospitals, in national (US) as well as international contexts. However, considering the high sensitivity of personal confidential data, privacy-preserving mechanisms are indispensable in this context.

In this paper we illustrate the problem of efficient and confidential information sharing in a healthcare network using the following scenario: In case of emergency, information about the blood group, allergies and vaccinations of a patient must be collected from collaborating network peers and presented to an authorized emergency physician to enable rapid and informed treatment decisions. This information is naturally spread among several network peers, e.g., physicians, internists and hospitals that treated the patient in the past. In case of emergency these peers need to be efficiently identified and asked to provide the required information. However, knowledge of the content provider, in this case a doctor or a hospital, can also disclose insights into a patient's history to interested third parties. For instance, an insurance company, a bank or a potential employer might want to find out some data about the patient's history. The specific area of expertise of the corresponding specialist can give insights into the kind of potential diseases, while the number of medical peers corresponding to a person indirectly discloses illness frequency.

DHT-based indexes are the standard choice for efficient identification of content providers and for searching information in P2P networks in general. However, an ordinary DHT-based index does not provide sufficient confidentiality guarantees for healthcare data. Such an index is created using the inverted index data structure, which is then distributed over the network peers. An inverted index is a sequence of posting lists, each of which contains the IDs of all peers holding information about a specific term (which corresponds to a patient ID in our scenario). Table 1 shows an inverted index with four posting lists and seven posting list elements (elements for short). For instance, for patient John Doe the index includes information on one dentist, one urologist and one general practitioner who treated the patient in the past.

This information can be easily extracted from an ordinary inverted index and thus requires additional protection against unauthorized access. A naive solution would be to rely just on access control mechanisms on a trusted server. However, it is unlikely that all institutionally independent doctors and hospitals in a collaborative healthcare network can agree on a single trusted central authority to enforce access control on index entries. Moreover, centralized indexes are attractive targets for attack and need additional protection even if the index is encrypted. For example, even if the exact content of the elements is obscured, the length of a posting list corresponds to the number of doctors the patient visited in the past. Additionally, an adversary can scan posting lists on a compromised server to collect and count the IDs of the patients of a specific doctor.

In this paper we investigate the problem of building a DHT-based inverted index, HEALTH+Z, for secure provider selection in collaborative healthcare P2P networks. This index fulfills the following conditions: (i) any information published in the DHT can be accessed only by authorized participants; (ii) each participant can easily and inexpensively access all information she has authorization for; (iii) the solution must withstand adversaries; and (iv) the solution must be completely decentralized and stable even if some providers are unavailable, to allow scalability in large P2P networks. Our contribution is summarized as follows: (i) we formalize the problem of securing provider information stored in the DHT-based index: we describe the possible threats that need to be addressed by an acceptable solution and show what characteristics each acceptable solution should adhere to; (ii) we propose a solution for securing the DHT index. The solution combines several technologies which are required to fully secure the data: k-out-of-n encryption, encryption against statistical attacks, and policy-driven authorization; (iii) we perform a theoretical evaluation of the cost and security offered by the network. The paper is organized as follows: Section 2 discusses the threat model; Section 3 presents the HEALTH+Z index; Section 4 contains the evaluation; Section 5 describes related work; Section 6 provides a conclusion.

Table 1: A Patient-Doctor Inverted Index

Patient          Posting list
Papapetrou, O.   dentist: Peer P19, podiatrist: Peer P7
Zerr, S.         dentist2: Peer P30
Doe, Joe         urologist: Peer P40, dentist: Peer P19
Smith, Joe       dentist: Peer P19

[Figure 1: An Unsecured Inverted Index over DHT. Each peer on the DHT ring (P1, P7, P14, P19, P21, P30, P33, P40, P49, P57) holds a DHT finger table, a partition of the DHT index, and its own shared information; for example, the finger table lists P7 (192.3.11.2) with key range 7-13, P14 (195.32.1.14) with 14-18, P30 (111.27.2.2) with 30-32, and P57 (28.124.2.67) with 57-63.]

2. THREAT MODEL
HEALTH+Z targets the problem of supporting efficient provider selection for healthcare data distributed over a set of network peers. In order to provide an efficient, scalable and completely decentralized solution, this network makes use of a DHT-based index which is distributed among the network peers. Information stored in this index requires protection against unauthorized usage. The index needs to resist statistical attacks and achieve the privacy goals described in the following.

Attacks: To give a sense of the set of potential dangers, consider the following three goals of a potential attack on an index.

• Determine the number of peers sharing a patient's data on the network. The aggregate number of posting elements shared about a particular patient over the network corresponds to the number of peers that treated the patient in the past. For example, an adversary may observe that the number of peers sharing records of a patient exceeds the average number of peers for other patients and conclude an increased illness probability.
• Determine whether a patient record appears at a particular inaccessible site, or at any indexed site. For example, a patient record at a specialist's peer corresponds to an increased probability of a particular disease.
• Reconstruct the list of records shared by a particular peer on the network. The list of posting elements shared by the peer corresponds to the list of patients of this peer. For instance, a competitor peer may want to obtain such a list of patients.

Privacy Goal: HEALTH+Z focuses on attaining content privacy with respect to data d made searchable by some content provider p. That means that an adversary A should not be able to deduce that p is sharing data d unless A has been granted access to d by p. In addition, state-of-the-art techniques such as secure communication channels (e.g., HTTPS) should be used to provide confidentiality for the content of queries and updates. Query-privacy-preserving techniques like [16, 7] can be used to prevent an adversary from determining which searcher issued which particular queries. An adversary could determine the peers involved in a patient's history by examining query logs; for this reason HEALTH+Z does not store any query log information.

3. HEALTH+Z NETWORK
In this section we define the HEALTH+Z index structure, which provides confidentiality guarantees that hold even if a given number of the network peers are compromised or malicious, and analyze the characteristics of the index.

DHT as a Distributed Inverted Index: The HEALTH+Z network consists of a set of content providers $CP = \{cp_1, \ldots, cp_h\}$ (doctors or hospitals in our scenario) which share information about entities $E = \{e_1, \ldots, e_m\}$ (e.g., patients). For ease of presentation we assume that each content provider corresponds to one network peer $P_1, \ldots, P_h$. In order to enable efficient search, information about the entities is indexed using the HEALTH+Z distributed index, which is based on a Distributed Hash Table (DHT). DHTs are a family of distributed algorithms typically applied in mainstream P2P systems. As the name implies, the functionality of DHTs is similar to the functionality of traditional hash tables: they enable efficient distributed storage and retrieval of (key, value) pairs. Thereby an ordinary inverted index, like the one presented in Table 1, can be partitioned across several peers. Without loss of generality, a key is a number in the range $[0, 2^z)$, where z is a value specific to the DHT implementation (e.g., for the Chord DHT [20], z is 160). In our scenario, we want to use the patient names as keys. Therefore, patient names are converted to numeric representations by using a consistent hash function. There are several suitable consistent hash functions for converting any type of data to integers. In this work we use MD5 hashing, followed by a modulo with the maximum key value.

The process of retrieving all information for a patient involves two steps: (1) find all doctors that this patient has visited, and (2) contact the peers corresponding to these doctors to retrieve all relevant information. The first step, locating all relevant doctors, is performed using the DHT inverted index. The name of the patient is transformed to its numeric representation using the consistent hash function. Then, the peer responsible for holding this value in the DHT is located and contacted to retrieve the list of doctors that this patient visited. In the second step, the peers corresponding to these doctors are contacted directly, so that authorized clients (such as emergency doctors) can retrieve important information about the patient, e.g., allergies, medication, and past illnesses. The good scalability characteristics of DHTs make them suitable information sharing infrastructures for many mainstream applications. However, current DHT-based systems do not enable indexing information confidentially or restricting information access. Everything that is published in the DHT is by default accessible to all participating peers. In the following we show how the DHT can be secured so that only authorized peers can retrieve relevant information.

Confidential Distributed Indexing: A naive approach to locate the doctors of a particular patient would be to broadcast the query to all available peers, which leads to unacceptable latency in a larger network. As discussed above, an ordinary inverted index helps to precisely locate patients' medical records, but does not provide the required confidentiality guarantees. In order to index entities confidentially, HEALTH+Z modifies the index content as discussed in the following. Each posting list in this index is a bit map; as in an ordinary inverted index, this list corresponds to a patient, and each posting element (bit) in this map corresponds to a content provider. The bit is set to one if the corresponding provider shares information about the entity and to zero otherwise. Note that in general a posting element may contain additional data shared by the content provider. Here we consider the bit map to simplify the presentation. In fact, a non-encrypted HEALTH+Z index is an entity-provider incidence matrix, which is presented in Figure 2.

Figure 2: Entity-Provider Incidence Matrix

       cp1  cp2  cp3  ...  cph
e1      0    1    0   ...   0
e2      1    1    0   ...   0
...    ...  ...  ...  ...  ...

More formally, given the network H, index I, a content provider $cp_i$, and an entity $e_k$,

$$cp_i \in H \Rightarrow \forall e_k \in H : cp_i e_k \in I$$

Practically, this means that the index structure contains an entry for every content provider-entity pair. In order to protect the index against unauthorized usage, bit maps are encrypted using a k-out-of-n encryption scheme as discussed in the "Encryption" paragraph below. Because an entry exists for every pair, the presence of an encrypted entry in the index does not indicate that an entity is shared by the corresponding peer.

Encryption: In order to protect the index against unauthorized usage, posting elements are encrypted using the k-out-of-n encryption scheme [17]. Application of k-out-of-n encryption to distributed indexing was first proposed in [21]. In this scheme a single posting element (secret) is split into n parts (secret shares) such that at least k out of the n parts are required in order to reconstruct the secret. These secret shares are computed at the peer holding the plain information and then distributed over the network peers, such that only encrypted information is sent over the network and, even if index-holding peers are compromised or malicious, the plain-text information is not available to them. The querying user needs to be authorized by at least k peers in order to obtain enough shares to decrypt posting elements. Even if k-1 peers are compromised, it will not be possible to reconstruct the initial information. Figure 3 illustrates a part of a P2P network with peers $P_1$, $P_2$, $P_3$ and n=3. The posting list for the entity $e_1$ is encoded into three posting lists, each represented as a random vector. Each of those vectors is stored on a separate peer (i.e., $P_1$, $P_2$ and $P_3$). Assume k=2; then in order to decrypt the elements corresponding to the entity $e_1$ the user needs to be authorized by at least two peers out of $P_1$, $P_2$, $P_3$.

Figure 3: k-out-of-n Encryption of a Posting List

P1:    cp1  cp2  cp3  ...  cph
e1      1    0    1   ...   0

P2:    cp1  cp2  cp3  ...  cph
e1      0    0    1   ...   1

P3:    cp1  cp2  cp3  ...  cph
e1      0    1    1   ...   1

The encryption algorithm works as follows: All the operations described in the remainder of this paragraph are carried out in the finite field $\mathbb{Z}_p$. The secret splitting algorithm starts by choosing a large prime number p, such that any posting element (secret) to be shared is in $\mathbb{Z}_p$. In addition, each peer i is assigned a unique random value $x_i$ in $\mathbb{Z}_p$. We call this the x-coordinate of the peer. The numbers p and $x_i$ are made public, so all users know them.

To index an element $a_0$, its provider generates a pseudo-random polynomial f of degree k-1 whose constant term is $a_0$. The other coefficients $a_i$ are randomly picked from the field $\mathbb{Z}_p$. The secret share given to the ith peer is $f(x_i)$. Any k such shares are enough to reconstruct the polynomial. To decrypt an element, a user must obtain k of its secret shares and determine the coefficients of the polynomial f by solving a system of k linear equations.

This scheme avoids complex key management and does not require re-encryption of the data unless at least k peers in the network are compromised. Moreover, if an adversary learns some of the shares, proactive sharing techniques can be used to prevent the adversary from getting k shares [11]. With this technique, the shares are updated so that those already known become useless.

k-out-of-n encryption in HEALTH+Z replaces the replication typically performed in P2P networks. Unlike public networks, HEALTH+Z does not store any exact copies of the index, as all n parts of the encrypted secret differ. However, owing to the k-out-of-n encryption scheme, the network is resistant to failures of up to n-k of the peers which store any given part of the index. We discuss the overhead introduced by this scheme in the evaluation section.

Access Control: As in an ordinary P2P system, the index is partitioned across several peers according to entities, such that each network peer stores only a part of the index. In contrast to public P2P systems, this index is stored privately on the peers, and queries are answered only upon requests of authorized users. In order to perform access control on the index entries, HEALTH+Z makes use of standard authentication and authorization techniques.

Index Construction and Updates: Assume a network contains h content providers $cp_1, \ldots, cp_h$. At startup the index is empty. If the content provider $cp_i$ wants to share the data of entity $e_j$, it first searches for the entity $e_j$ as discussed above. In case the entity is not indexed, $cp_i$ receives an empty result. An empty result corresponds to the case of a new patient who was never indexed in the DHT by any content provider, either doctor or hospital. To insert the new entity into the index, $cp_i$ creates a new bit map of size h and sets the ith bit to one and all other bits to zero. Then, $cp_i$ encodes each posting element using the k-out-of-n encryption scheme and distributes the result over n network peers. Unlike ordinary P2P networks, where the set of peers changes dynamically, the set of content providers in HEALTH+Z is rather static due to the natural properties of the healthcare network. This set can be extended by adding a new column to the index; this is a rather expensive but infrequent operation and can be further optimized, e.g., by adding columns in batches of B bits. Thus each addition of columns accommodates an increase of B content providers in the index, and each posting list grows by B bits. In contrast, the bit maps in the index require frequent dynamic updates; the bit maps corresponding to the entities can be added and updated dynamically by the corresponding content providers. Each content provider only needs to update the column that corresponds to her peer. This update can be performed inexpensively, as it requires only a constant number of DHT lookups. Deletion of an entity is a rare operation whose frequency in most cases depends on the retention period of records (e.g., 10 years by German law). In order to delete an entity from the index, the corresponding bit map is simply removed.

Confidentiality Guarantees: The HEALTH+Z index provides strong and quantifiable confidentiality guarantees that hold even if all index entries stored on k-1 malicious peers are made public. On her compromised peer, an adversary A can examine index entries. As all posting lists have equal length and are represented as random bit vectors, she cannot determine the number of peers sharing a patient's data on the network. She cannot determine at which particular site the patient record appears, although she can conclude that the patient record appears at at least one indexed site (which is not sensitive information in the current setup, since it corresponds to the fact that a particular person visited a doctor at least once). Similarly, she cannot reconstruct the list of records shared by a particular peer on the network, as every peer corresponds to all patients in the index matrix. The k parameter in the k-out-of-n encryption defines the number of peers that share a secret about a particular posting list and need to be compromised by an adversary in order to break the encryption of posting elements.

There is a tradeoff between confidentiality preservation and retrieval efficiency. The higher the k value, the more secure the index. However, higher k values lead to increased network traffic and response time. In the most secure case, k is close or equal to the number of providers (doctors) within the network, and querying would essentially be performed by broadcasting the query. Smaller k values decrease the network cost as well as the security level. Thus k is a tunable parameter that can be adjusted during index creation with respect to the trust level within the network.

The n value determines the number of peers holding a particular index entry. Since k peers holding shares of a particular index entry are needed to reconstruct the entry, n-k is the number of peers that can be offline at a time while the network is still able to deliver enough shares.

4. EVALUATION
After discussing the HEALTH+Z architecture and confidentiality guarantees, we evaluate its storage requirements, query costs, and network usage for a network participant compared with an ordinary DHT, using a simulated data set. We created a simulated network with a reasonable size for a European country.

4.1 Experimental Data
We used data from the World Health Organization for Europe (footnote 1) in order to estimate the potential number of doctors and patients that our system has to manage. Figure 4 shows the number of physicians per European country. According to the data, the number of physicians in the majority of the European countries does not exceed 300,000, whereas 80,000,000 is a maximal estimate for the population. Both numbers correspond to Germany. The proportion of physicians with respect to the European population does not vary much, and the ratio of physicians to persons can be estimated as 1/450 on average. Using these boundaries we created a matrix index. We randomly assigned patients to doctors using the following estimations:

• We assumed a normal distribution of the number of doctors per patient
• We assumed that on average a person has data stored at 20 doctors and used this number as the mean of the distribution
• We assumed that patients are uniformly distributed over the doctors

Thus each patient was assigned to 20 randomly chosen doctors on average, and each doctor served on average 5,333 patients. Assuming one bit of storage per patient-doctor relation, the index requires 25 kBytes for each patient's bit map. The k-out-of-n encryption additionally increases this size by a factor of n.

Footnote 1: http://www.who.int/gho/health_workforce/physicians_density/en/

[Figure 4: Population and Number of Physicians per European Country]

4.2 Experimental Setup
With our experiments we compared network and storage costs for an unencrypted index and for various encrypted indices. Cost was measured as follows:

a. Network cost for creating the index from scratch. This cost occurs only once, when bootstrapping the network. This is the cost required for publishing all information of all content providers in the DHT.
b. Network cost for executing a query or for updating a record. This cost occurs every time a content provider needs to locate information for a patient (e.g., an emergency room doctor), or when a content provider adds a patient to her patient list.
c. Storage cost. Each content provider contributes to the DHT by holding a small part of the distributed inverted index. This cost is the storage cost incurred by each peer on average.

Note that our analysis does not include two additional cost factors: (a) the network overhead for maintaining the DHT connectivity between all content providers, and (b) the cost for storing the actual medical information at the content providers. The former factor is not included because DHTs were already evaluated independently, and they were found to be scalable and extensible [20]. The latter factor depends on the information that is kept for each patient, and it is orthogonal to the application; this cost occurs anyway in current medical systems.

Parameter Selection: The parameter k determines the security level within the network. However, in order to increase k, n also needs to be increased. The parameter n, in turn, determines the factor by which the index storage cost grows. We have to assume that varied and possibly old hardware is used by the network participants, and thus each peer should hold no more than around 50 MBytes of index data, which corresponds to k < 6 in our setup. We ran the experiments for an unencrypted matrix index and compared it with 12 setups that differ in the choice of k and n.

Network Cost for Index Creation: Figure 5 (Network Cost for Index Construction per Participant) summarizes the average network cost per peer for creating the index. We measure cost as the number of messages. The cost for both the unencrypted and the encrypted index is of the same order of magnitude, even with high encryption parameters, e.g., k=6 and n=6. As expected, this cost grows linearly with n.

[Figure 5: Network Cost for Index Construction per Participant]

Recall that this cost occurs only once, while bootstrapping the network. During this bootstrapping period, it is expected that the network will be more loaded than usual, because all content providers will be publishing information for all their patients at the same time. However, this bootstrapping process does not run under time constraints; therefore content providers can just wait for a couple of hours after installing the system before starting to use it.

Query and Update Overhead: The number of messages needed for the retrieval of a particular posting list increases by a factor of k compared with an ordinary DHT, because of the k-out-of-n encryption. However, even for k=6, retrieving the patient's information requires only 54 messages. Assuming ADSL speeds, this number of messages is negligible and can easily be handled in real time.

[Figure 6: Cost per Query or Update, per Participant]

Cost per update grows linearly with n. This happens because the content provider needs to locate the peers that hold all the n bit maps for the patient, and update one bit at each of them. For this update, the whole lists need not be retrieved. Similar to the query cost, this cost is also negligible and the update can be executed in real time.

Storage Overhead: Unlike in an ordinary DHT-based inverted index, in HEALTH+Z all posting lists have the same number of elements, which corresponds to the number of content providers. Encryption under Shamir's k-out-of-n scheme does not change the size of the posting elements, although the number of posting lists in the network increases by a factor of n. Figure 7 (Storage Cost per Participant) shows that the storage overhead increases linearly with n. For all the proposed setups, storage costs per peer do not exceed 60 MBytes. This storage overhead is negligible for today's off-the-shelf personal computers.

[Figure 7: Storage Cost per Participant]

Overall, the results of the experiments indicate that the matrix index scales for the given scenario and show that the network and storage costs are also reasonable.

5. RELATED WORK
The P2P paradigm promises unlimited scalability, easy maintenance, and robustness against network attacks and failures [3]. A recent study stressed the importance of P2P networks in medical informatics, especially for improving data sharing between doctors and hospitals, in national (US) as well as international contexts [19]. HEALTH+Z builds upon the existing work on information sharing and provider selection in P2P systems and enriches the DHT-based index structure used in P2P networks with the confidentiality guarantees required in medical applications.

Encryption is a standard technique for storing data confidentially [4, 9, 13]. Other techniques include suppressing and/or generalizing released data into less specific forms, so that they no longer uniquely represent individuals [8, 12]; k-anonymity is one popular form of generalization (e.g., [2, 14, 15]). Unfortunately, it is not possible to directly apply these techniques to secure an inverted index. Even if posting list entries are encrypted, they can leak critical statistical data. The problem of the sensitivity of the posting list length information was also stressed by [5].

The authors in [1, 21] considered protecting an inverted index when there is no single trusted central authority to enforce access control on posting list elements. Like µ-Serv, HEALTH+Z addresses the problem of confidential provider selection in a network. However, µ-Serv does not provide sufficient protection for data in the healthcare domain, as the adversary can still conclude that a certain percentage of the posting elements in the index are true positives, which enables indirect conclusions on the illness frequency of a person. Moreover, µ-Serv lengthens the querying process and wastes cycles at sites that do not contain query-relevant entries. For example, if x = 5%, the user must query 20 times as many sites to get the relevant results, which can lead to critical delays in medical emergency applications. In contrast, HEALTH+Z enables an authorized user to directly identify the corresponding peers.

Zerber [21], developed in our previous work, is an r-confidential inverted index which protects indexed data by means of frequency-based merging of posting elements related to several terms into one posting list. In order to provide confidentiality guarantees for the information stored in the index, it requires a training data set from which it can learn the document frequency distribution. However, the terms in the HEALTH+Z index are unique patient IDs, such that the required training information is not available in this scenario. In contrast, HEALTH+Z enables confidential provider selection in case no training information is available.

While many other researchers have addressed aspects of data confidentiality, none of their schemes are intended for an environment with many dynamic collaboration peers. For example, researchers have suggested ways to search encrypted text or tables stored on a remote untrusted server (e.g., [10, 18]). In a situation with many collaboration peers, encryption-based approaches are not easy to use or manage due to the encryption key management. Data owners and/or project group managers must generate and distribute keying material for all group members. If a key is lost, stolen, or even published, the index entries encrypted with it are compromised. When a key is compromised or a member leaves a group, the key must be revoked and all the content associated with that key must be re-encrypted and re-indexed. Modern group key management schemes, such as logical key trees [6] and broadcast encryption, reduce the costs associated with giving keys to members, but still require content re-encryption. Some approaches also require that the entire index for a particular collection of documents be regenerated by the collection owner every time an entry is added to or deleted from the index. Zerber [21] proposed the use of a k-out-of-n encryption scheme, which avoids key usage for data encryption. HEALTH+Z builds upon this encryption scheme.

6. CONCLUSION AND FUTURE WORK
In this paper we considered the challenges involved in building a confidential index in a P2P healthcare network and discussed initial solutions to address these challenges. Our experiments show that for the current setup it is feasible to maintain an incidence-matrix-based index with confidentiality guarantees within a P2P-like network. Such an index is protected against statistical attacks even if up to k-1 of the peers holding it are taken over by an adversary. One of the requirements of DHTs is that they need to withstand unexpected peer failures and disconnections. To withstand such events without losing data, DHTs employ data replication. The integration of replication into HEALTH+Z while keeping its confidentiality guarantees is an interesting direction for future research.

7. ACKNOWLEDGMENTS
This work is partly funded by the European Research Council under ALEXANDRIA (ERC 339233) and by the project "Gute Arbeit nach dem Boom" (Re-SozIT) funded by the German Federal Ministry of Education and Research (BMBF) (01UG1249C). Responsibility for the contents lies with the authors.

8. REFERENCES
[1] M. Bawa, R. J. Bayardo Jr., R. Agrawal, and J. Vaidya. Privacy-preserving indexing of documents on the network. The VLDB Journal, 2009.
[2] R. Bayardo and R. Agrawal. Data privacy through optimal k-anonymization. In ICDE '05.
[3] M. Bender, S. Michel, P. Triantafillou, G. Weikum, and C. Zimmer. Minerva: Collaborative p2p search. In VLDB '05.
[4] M. Blaze. A cryptographic file system for unix. In CCS '93.
[5] S. Büttcher and C. L. A. Clarke. A security model for full-text file system search in multi-user environments. In FAST '05.
[6] T. Cho, S.-H. Lee, and W. Kim. A group key recovery mechanism based on logical key hierarchy. J. Comput. Secur., 2004.
[7] N. L. Farnan, A. J. Lee, P. K. Chrysanthis, and T. Yu. Don't reveal my intension: Protecting user privacy using declarative preferences during distributed query processing. In ESORICS '11.
[8] B. Fung, K. Wang, and P. Yu. Top-down specialization for information and privacy preservation. In ICDE '05.
[9] M. Goodrich, R. Tamassia, and A. Schwerin. Implementation of an authenticated dictionary with skip lists and commutative hashing. In DISCEX '01.
[10] H. Hacigümüş, B. Iyer, C. Li, and S. Mehrotra. Executing sql over encrypted data in the database-service-provider model. In SIGMOD '02.
[11] A. Herzberg, S. Jarecki, H. Krawczyk, and M. Yung. Proactive secret sharing or: How to cope with perpetual leakage. In CRYPTO '95.
[12] V. S. Iyengar. Transforming data to satisfy privacy constraints. In KDD '02.
[13] M. Kallahalla, E. Riedel, R. Swaminathan, Q. Wang, and K. Fu. Plutus: Scalable secure file sharing on untrusted storage. In FAST '03.
[14] K. LeFevre, D. DeWitt, and R. Ramakrishnan. Mondrian multidimensional k-anonymity. In ICDE '06.
[15] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. L-diversity: privacy beyond k-anonymity. In ICDE '06.
[16] S. T. Peddinti and N. Saxena. Web search query privacy: Evaluating query obfuscation and anonymizing networks. J. Comput. Secur., 2014.
[17] A. Shamir. How to share a secret. Commun. ACM, 1979.
[18] D. X. Song, D. Wagner, and A. Perrig. Practical techniques for searches on encrypted data. In IEEE Symposium on Security and Privacy, 2000.
[19] W. W. Stead and H. S. Lin, editors; Committee on Engaging the Computer Science Research Community in Health Care Informatics; National Research Council. Computational Technology for Effective Health Care: Immediate Steps and Strategic Directions. The National Academies Press, 2009.
[20] I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking, 2003.
[21] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski. Zerber+r: Top-k retrieval from a confidential index. In EDBT '09.
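As an illustration of the key mapping described in Section 3 (MD5 hashing followed by a modulo with the maximum key value), the lookup of the peer responsible for a patient can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names `dht_key` and `responsible_peer` are hypothetical, the small value z = 6 is chosen only to match the key ranges shown in Figure 1, and the rule for assigning keys to peers is an assumption read off those key ranges.

```python
import bisect
import hashlib


def dht_key(patient_name: str, z: int) -> int:
    """Map a patient name to a DHT key in [0, 2**z): MD5 digest, then modulo."""
    digest = hashlib.md5(patient_name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** z)


def responsible_peer(key: int, peer_ids: list[int]) -> int:
    """Return the ID of the peer whose key range contains `key`.

    Assumption (read off the key ranges in Figure 1): a peer is responsible
    for the keys from its own ID up to the ID of the next peer, and keys
    below the smallest peer ID wrap around to the peer with the largest ID.
    """
    ring = sorted(peer_ids)
    idx = bisect.bisect_right(ring, key) - 1  # idx == -1 wraps to ring[-1]
    return ring[idx]


if __name__ == "__main__":
    # Peer IDs as they appear on the ring in Figure 1.
    peers = [1, 7, 14, 19, 21, 30, 33, 40, 49, 57]
    key = dht_key("Doe, Joe", z=6)
    print("key:", key, "-> peer P%d" % responsible_peer(key, peers))
```

In a real Chord deployment the key space is much larger (z = 160) and the responsible peer is found by routing over finger tables rather than by a local sorted list; the sketch only shows the name-to-key-to-peer mapping.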
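The k-out-of-n scheme described in the "Encryption" paragraph of Section 3 can be sketched as below. This is a hedged illustration of Shamir's secret sharing [17], not the HEALTH+Z code: the prime `P`, the function names, and the example x-coordinates are assumptions made for the sketch. The paper describes reconstruction as solving a system of k linear equations; the sketch uses Lagrange interpolation at zero instead, which recovers the same constant term $a_0$.

```python
import secrets

# Assumed field modulus for the sketch: the Mersenne prime 2**61 - 1.
P = 2 ** 61 - 1


def split_secret(secret: int, k: int, peer_xs: list[int], p: int = P) -> list[tuple[int, int]]:
    """Split `secret` into one share (x_i, f(x_i)) per peer x-coordinate.

    f is a random polynomial of degree k-1 over Z_p with f(0) = secret.
    """
    assert 0 <= secret < p and 1 <= k <= len(peer_xs)
    assert len(set(peer_xs)) == len(peer_xs) and all(0 < x < p for x in peer_xs)
    coeffs = [secret] + [secrets.randbelow(p) for _ in range(k - 1)]
    shares = []
    for x in peer_xs:
        y = 0
        for a in reversed(coeffs):  # Horner evaluation of f(x) mod p
            y = (y * x + a) % p
        shares.append((x, y))
    return shares


def reconstruct(shares: list[tuple[int, int]], p: int = P) -> int:
    """Recover f(0) from at least k shares by Lagrange interpolation at 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret


if __name__ == "__main__":
    # One posting element (bit = 1) shared 2-out-of-3, as in the Figure 3 example.
    shares = split_secret(1, k=2, peer_xs=[11, 22, 33])
    print(reconstruct(shares[:2]))
```

With k = 2 and n = 3 any two of the three shares recover the bit, while a single share is a uniformly random field element and reveals nothing about it.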
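The index construction and update steps from Section 3 ("Index Construction and Updates") can be sketched on the unencrypted entity-provider incidence matrix. This is a simplified sketch under stated assumptions: the index is held in a single in-memory dictionary rather than partitioned over a DHT, the k-out-of-n encoding of each bit is omitted, and all function names are hypothetical.

```python
def new_posting_list(h: int, i: int) -> list[int]:
    """Bit map of size h with only the bit of provider i (0-based) set to one."""
    bits = [0] * h
    bits[i] = 1
    return bits


def share_entity(index: dict[str, list[int]], entity: str, i: int, h: int) -> None:
    """Provider i starts sharing `entity`.

    A new entity gets a fresh bit map; for a known entity only the
    provider's own column is updated.
    """
    if entity not in index:
        index[entity] = new_posting_list(h, i)
    else:
        index[entity][i] = 1


def extend_providers(index: dict[str, list[int]], batch: int) -> None:
    """Add `batch` new provider columns (all zero) to every posting list."""
    for bits in index.values():
        bits.extend([0] * batch)


def delete_entity(index: dict[str, list[int]], entity: str) -> None:
    """Remove the bit map of an entity, e.g. after its retention period."""
    index.pop(entity, None)


if __name__ == "__main__":
    index: dict[str, list[int]] = {}
    share_entity(index, "e1", i=1, h=4)  # cp2 shares e1
    share_entity(index, "e2", i=0, h=4)  # cp1 shares e2
    share_entity(index, "e2", i=1, h=4)  # cp2 shares e2
    extend_providers(index, batch=2)     # two new providers join
    print(index)
```

Every posting list keeps the same length regardless of how many providers actually share the entity, which is the property the confidentiality argument relies on once the bits are encrypted.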
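As a worked check of the simulation parameters in Section 4.1, the average number of patients per doctor follows from the stated population bound, the mean of 20 doctors per patient, and the bound on the number of physicians. The calculation below uses only those three figures from the text and assumes the upper bounds are used as the actual sizes of the simulated network:

```latex
\frac{80{,}000{,}000 \ \text{patients} \times 20 \ \text{doctors per patient}}
     {300{,}000 \ \text{doctors}}
\approx 5{,}333 \ \text{patients per doctor}
```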