=Paper=
{{Paper
|id=None
|storemode=property
|title=Privacy Preservation for Location-Based Services Based on Attribute Visibility
|pdfUrl=https://ceur-ws.org/Vol-908/paper6.pdf
|volume=Vol-908
|dblpUrl=https://dblp.org/rec/conf/immoa/Mano0DI12
}}
==Privacy Preservation for Location-Based Services Based on Attribute Visibility==
Masanori Mano∗, Xi Guo, Tingting Dong (Graduate School of Information Science, Nagoya University; mano@db.itc.nagoya-u.ac.jp, guoxi@db.itc.nagoya-u.ac.jp, dongtt@db.itc.nagoya-u.ac.jp)
Yoshiharu Ishikawa (Information Technology Center / Graduate School of Information Science, Nagoya University; y-ishikawa@nagoya-u.jp)
∗ Current affiliation: NTT DOCOMO Inc.

ABSTRACT

To provide high-quality mobile services in a safe way, many techniques for location anonymity have been proposed in recent years. Advanced location-based services such as mobile advertisement services may use not only users' locations but also users' attributes. However, the existing location anonymization methods do not consider attribute information and may therefore provide low-quality privacy protection. In this paper, we propose the notion of visibility, which describes the degree to which an adversary can infer the identity of a user by observation. We then present an anonymization method that considers not only location information but also users' attributes. We describe several strategies for the anonymization process and evaluate them experimentally.

1. INTRODUCTION

1.1 Background

In recent years, location anonymization has become one of the important topics in location-based services and mobile computing [6]. The underlying issue is that a user generally has to send her location information to receive a high-quality service. If the service provider is an adversary, however, the detailed location information may be used for non-intended purposes; in an extreme case, the user's identity may be inferred by combining the location information with additional information sources. Location anonymization mitigates this problem to some extent, but it may degrade service quality, so an appropriate anonymization method is required.

1.2 Location-based services that use attribute information

For a typical location-based service that uses only location information, the conventional notion of location anonymity is effective for privacy protection. However, advanced location-based services may use additional attribute information such as a user's age, sex, and occupation. To illustrate our motivation, let us consider a mobile advertisement service.

In this service, we assume that a mobile user issues a request for an advertisement and the request is delivered to an appropriate advertiser. The advertiser then sends corresponding advertisements to the user; in this sense, the advertisement service is a pull-based service. The matching service (called the matchmaker) acts as a mediator between users and advertisers and uses the users' attribute information to select appropriate advertisers. Since the matchmaker charges advertisers for successful advertisements, advertisers would like to advertise effectively at low cost. If an advertiser can specify the type of the target users (e.g., young women), the effectiveness of the advertisement increases.

Figure 1 illustrates the overview of the mobile advertisement service assumed in this paper. The matchmaker between mobile users and advertisers is a trusted third party and manages each user's information as her profile. As described later, the matchmaker is responsible for anonymization.

[Figure 1: Location-based anonymization framework]
When a mobile user issues a request for a service (i.e., an advertisement), the matchmaker anonymizes the location and profile of the user and sends them to the advertisers. Appropriate advertisers then send corresponding advertisements to the user via the matchmaker. Through the obtained advertisements, the user can receive benefits such as coupons and discounts. In this paper, we focus on the anonymization part of this scenario.

1.3 Privacy issues

In this system architecture, we should note that an advertiser is not necessarily reliable and may be an adversary. If the exact location is disclosed to an adversary, there is a risk that the advertiser identifies the user by watching that location. We could apply a conventional location-based anonymization method to this problem, but the following issue arises once users' attributes are taken into account.

Assume that the users in Fig. 2 issue advertisement requests in the order u1, u2, ..., u5. Their profile information is also shown in the figure. The matchmaker needs to balance the requirements of users, who want to preserve their privacy, against those of advertisers, who want detailed user information to improve service quality. One idea is to apply the k-anonymization technique, which groups k users based on proximity. For example, given k = 3, we can form the group {u1, u2, u4}. If the matchmaker provides the users' profiles, the receiving advertiser learns that three persons with ages 23, 26, and 38 are requesting advertisements. The problem is that the advertiser can easily identify the user with age 38 by watching the target area.

[Figure 2: Users and their profiles]

If the matchmaker considers not only user proximity but also user profiles, we have another option. Consider the grouping {u1, u2, u5}. In this case, it is not easy to determine who corresponds to each profile entry. Therefore, this anonymization is better than the former one.

1.4 Research objectives

In this paper, we propose a location-based anonymization method that also considers users' attributes. For this purpose, an important point is whether each user attribute can be guessed by observation. To capture this idea, we introduce a new criterion called observability. In addition, since different users may have different privacy policies, we provide an anonymization method that considers users' preferences.

A preliminary version of this paper appeared in [7]. Here we have revised the problem setting, and the proposed method is entirely new.

2. RELATED WORK

2.1 Anonymization for location-based services

There have been many proposals on privacy preservation in location-based services. A popular approach in this field is spatial cloaking, in which an anonymizer constructs a cloaked region that contains the target users. For example, [4] uses the notion of k-anonymity [9], which is often used in database publishing. The notion of k-anonymity appears in many proposals, with variations such as the use of graph structures [3] and cell decompositions [1, 8]. In this paper, we extend the idea to our context.

Most anonymization methods for location-based services do not consider users' properties. One exception is [10], in which an attribute vector is constructed for each user based on her attribute values, and a user group is constructed based on the proximity of the vectors. The problem with this approach is that it does not consider differences between attributes in terms of observability, so attribute values tend to be over-generalized, resulting in low-quality services.
2.2 Classification of attributes

In traditional privacy-preservation methods based on k-anonymity [9], user attributes are classified into the following three categories:

• Sensitive attribute: it represents private information such as disease names.

• Identifier: it uniquely identifies individuals, e.g., names and addresses.

• Quasi-identifier: like age and sex, it does not identify individuals directly, but its combination with other attributes may reveal the identity.

In contrast to the assumption of traditional data publishing, an adversary in our context cannot easily identify individuals using quasi-identifiers and external information (e.g., a telephone directory), because it is difficult to determine which candidate users appear at the target location at the given time. Instead, visual observation is more problematic in our context: if an adversary watches the target area, he may be able to identify the person who requested the service.

For this reason, we need to enhance the traditional treatment of attributes. In the context of privacy protection in social networks, [5] considered two properties of attributes:

• Sensitivity: it describes how strongly the attribute is related to privacy violation. For example, "address" is more sensitive than "birthplace" because the latter is less useful for identifying people. [5] assumes that the sensitivity of each attribute does not depend on a specific user and takes a constant value in the system.

• Visibility: it is a criterion of how much detail a user is willing to disclose for the attribute. Visibility preferences depend on each user and each attribute; for example, different users may have different disclosure policies for "Birthdate".

The notion of visibility cannot be applied directly to our context. In a location-based service, an adversary can observe some user properties even if the user does not want that; in other words, visibility is not controllable. Instead, the observability of an attribute, i.e., how well its actual value can be estimated from observation, is more important. We describe this notion in detail later.

2.3 Personalized anonymization

In our context, a personalized privacy-protection mechanism is required because the acceptable exposure of a user profile depends on each user's preference. However, most existing data anonymization techniques do not consider personalization. [11] proposed a personalized privacy preservation method for a static database. In this method, a hierarchical taxonomy is constructed for each attribute, and every user can represent her preference by specifying the level of detail in the hierarchy for each attribute. In this paper, we extend this idea to our context.
3. OVERVIEW OF THE APPROACH

3.1 Objectives of anonymization

We employ the following policies to balance the trade-off between privacy preservation and service quality.

• Identification probability: this probability represents how strongly a user is related to a profile. A user prefers a low identification probability, but an advertiser expects a high identification probability for good service. We therefore assume that each user can specify a threshold for the identification probability in her profile. In our approach, the identification probability of an anonymization result should be as large as possible, subject to the constraint that it stays below the threshold.

• Attribute generalization: attribute generalization is a fundamental method for protecting privacy. However, excessive generalization results in low service quality, and the preferred degree of generalization depends on each user. Therefore, each user can specify a preferred disclosure level for each attribute; the anonymization algorithm must not violate this restriction and tries to group users with similar attribute values.

• Area size: a large cloaked region results in poor service quality. We assume that the system sets a maximum area size for a cloaked region.

3.2 Taxonomy for attribute domains

The taxonomy for an attribute domain is used in the process of generalization. We assume that a hierarchical taxonomy exists for each attribute domain. Figure 3 shows an example for the "Age" domain. The root node [any] at level 0 represents all the domain values, and the leaf nodes correspond to the most detailed information. Note that Fig. 3 only shows the descendants of node [20-39] for simplicity. We assume that taxonomies are also available for other domains (e.g., ZIP code).

[Figure 3: Taxonomy for the "Age" domain]

We also assume that each user can specify a disclosure level for each attribute. For example, a user with age 23 can specify node [20-29] as her disclosure level for the age domain. If the selected node is near the leaf level, the user can receive more personalized advertisements, but her privacy may not be well protected.

3.3 Profile

Each mobile user constructs a profile to represent her preferences on service quality and privacy levels. The trusted matchmaker maintains the profiles. An example set of user profiles is shown in Fig. 4.

Figure 4: Example of profiles

ID | Age (value, disclosure level) | Sex (value, disclosure level) | Threshold prob.
u1 | 23, [20-29] | M, [Any] | 0.4
u2 | 26, [20-39] | M, [M]   | 0.5
u3 | 22, [20-24] | F, [F]   | 0.6
u4 | 38, [30-39] | F, [Any] | 0.5
u5 | 24, [20-24] | M, [M]   | 0.5

The contents of a profile are as follows:

• Attribute value: the attribute value of the user (e.g., Age = 23 for user u1).

• Attribute disclosure level: the level given by specifying a taxonomy node (e.g., [20-29] for user u1's Age attribute).

• Threshold for the identification probability: the user requests that her identification probability stay below this value.

3.4 Attribute observability

We now introduce a new criterion called observability.

Definition 1 (Observability). Attribute observability is a measure of how well the actual value of an attribute can be guessed by visually observing the user.

For example, "Sex" is easy to guess, but "Birthplace" is difficult to estimate by observation; hence the observability of "Sex" is higher than that of "Birthplace". In this paper, we assume that the observability of an attribute domain (e.g., age) is represented by a probability and takes a system-wide constant value.

We take the following approach for the other two properties of attribute privacy:

• A user can specify the disclosure level of each attribute to reflect her preference on sensitivity. For example, if a user considers her age very sensitive, she can specify the [any] node in Fig. 3. Note that a user cannot fully control her sensitivity because an adversary may watch her directly.

• A user can control visibility by specifying the disclosure level of each attribute. If she selects a leaf-level node, the visibility is highest, but whether the attribute is actually observable depends on the attribute domain.

3.5 Matching degree

To use the notion of observability in an anonymization algorithm, we need a way to measure the observability of an attribute. We take the following approach: we measure the degree with respect to taxonomy nodes. For example, consider the attribute "Age". The attribute value age = 21 is highly related to node [20-24], but has little relationship with node [30-34]. We call the degree to which user ui and taxonomy node nk match their matching degree and define it as

match(ui → nk) = Pr(nk | ui).    (1)

When there are K nodes in a level of the taxonomy, the aggregated matching degree over that level is

Σ_{k=1..K} match(ui → nk) = Σ_{k=1..K} Pr(nk | ui).    (2)

In this paper, we assume that the matchmaker holds predefined matching degrees for all combinations of attribute values and taxonomy nodes. Figure 5 shows an example; due to space limitations, we omit the level-0 node [any] and show only some representative nodes.

Figure 5: Matching degrees

ID | l=1 [20-39] | l=2 [20-29] | l=2 [30-39] | l=3 [20-24] | l=3 [25-29] | l=3 [30-34] | l=3 [35-39]
u1 | 0.88 | 0.88 | 0.00 | 0.54 | 0.34 | 0.00 | 0.00
u2 | 1.00 | 0.90 | 0.10 | 0.38 | 0.52 | 0.10 | 0.00
u3 | 0.79 | 0.79 | 0.00 | 0.56 | 0.23 | 0.00 | 0.00
u4 | 0.64 | 0.00 | 0.64 | 0.00 | 0.00 | 0.11 | 0.53
u5 | 0.97 | 0.95 | 0.02 | 0.51 | 0.44 | 0.02 | 0.00

We assume that the attributes in a profile are independent. Therefore, the total matching degree of a profile can be calculated by multiplying the attribute-wise matching degrees.
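To make this concrete, the following is a minimal Python sketch of how a matchmaker might store and combine matching degrees. The table layout and function names are our own illustration, not the paper's; the numbers for u1 and u2 are copied from Fig. 5.

# Matching degrees Pr(node | user) for the "Age" attribute, from Fig. 5.
AGE_MATCH = {
    "u1": {"[20-39]": 0.88, "[20-29]": 0.88, "[30-39]": 0.00,
           "[20-24]": 0.54, "[25-29]": 0.34, "[30-34]": 0.00, "[35-39]": 0.00},
    "u2": {"[20-39]": 1.00, "[20-29]": 0.90, "[30-39]": 0.10,
           "[20-24]": 0.38, "[25-29]": 0.52, "[30-34]": 0.10, "[35-39]": 0.00},
}

def match(user, node, table=AGE_MATCH):
    # Eq. (1): match(ui -> nk) = Pr(nk | ui)
    return table[user][node]

def total_match(user, profile):
    # Under the attribute-independence assumption, the total matching
    # degree of a profile is the product of attribute-wise degrees.
    # `profile` is a list of (node, table) pairs, one per attribute.
    degree = 1.0
    for node, table in profile:
        degree *= table[user][node]
    return degree

print(match("u1", "[20-29]"))                       # 0.88
print(total_match("u1", [("[20-24]", AGE_MATCH)]))  # 0.54

With several attributes, a second table (e.g., for "Sex") would simply contribute another factor to the product.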
3.6 Identification probability

An identification probability is the probability that a user is identified when an observer watches the users in the target area given the anonymized profiles. If the identification probability is lower than the threshold specified by the user, the user's requirement is satisfied. As described below, identification probabilities are calculated from matching degrees.

3.6.1 Computing the identification probability for two users

We first consider the simpler case of two users (u1, u2) whose anonymized profiles are given in Fig. 6. An adversary does not know which user corresponds to which profile entry and therefore has to consider two cases, (u1:p1, u2:p2) and (u1:p2, u2:p1). Clearly,

Pr(u1:p1, u2:p2) + Pr(u1:p2, u2:p1) = 1.    (3)

Figure 6: Anonymized profiles

pid | Taxonomy node
p1  | [20-24]
p2  | [25-29]

To compute the probability, we use the following idea. We roll a die for each user ui; the die has one face per taxonomy node, and the occurrence probability of each face obeys the matching degree. In this example, we roll two dice for u1 and u2 at the same time, giving four possible outcomes: (u1:p1, u2:p1), (u1:p1, u2:p2), (u1:p2, u2:p1), and (u1:p2, u2:p2). The occurrence probability of (u1:p1, u2:p2) is

Pr(p1 | u1) × Pr(p2 | u2) = 0.54 × 0.52 = 0.281,    (4)

and that of (u1:p2, u2:p1) is

Pr(p2 | u1) × Pr(p1 | u2) = 0.34 × 0.38 = 0.129.    (5)

Since (u1:p1, u2:p1) and (u1:p2, u2:p2) are prohibited patterns (one profile entry cannot correspond to multiple users), we discard the outcomes in which they occur. The identification probabilities are therefore

Pr(u1:p1, u2:p2) = 0.281 / (0.281 + 0.129) = 0.69,    (6)
Pr(u1:p2, u2:p1) = 0.129 / (0.281 + 0.129) = 0.31.    (7)

3.6.2 Computing the identification probability in the general case

The basic idea is the same as in the two-user case. For example, if the number of users is three, we have to consider six assignment patterns.

For anonymization, we need the identification probability of each individual user. Suppose users u1, u2, u3 and profiles p1, p2, p3 are given. User u1 is only interested in whether her own identification probability is below her specified threshold and does not care about the identification probabilities of u2 and u3. As an example, the probability that user u1 is related to profile p1 is

Pr(u1:p1) = Pr(u1:p1, u2:p2, u3:p3) + Pr(u1:p1, u2:p3, u3:p2).    (8)

In the following, we use the term identification probability in this sense.
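The computation of Eqs. (3)–(8) can be written down directly: enumerate all bijections between users and profile entries, weight each by the product of the matching degrees, and normalize so that the admissible patterns sum to one. The sketch below is our own illustration under that reading; it reproduces the numbers of Eqs. (4)–(7) up to rounding.

from itertools import permutations

# Pr(p | u) for the anonymized profile nodes of Fig. 6 (values from Fig. 5).
MATCH = {"u1": {"[20-24]": 0.54, "[25-29]": 0.34},
         "u2": {"[20-24]": 0.38, "[25-29]": 0.52}}
USERS, PROFILES = ["u1", "u2"], ["[20-24]", "[25-29]"]  # p1, p2

def assignment_probs(users, profiles, match):
    # Weight every user -> profile bijection (prohibited patterns, where one
    # entry would serve two users, never arise here) and normalize.
    weights = {}
    for perm in permutations(profiles):          # one distinct entry per user
        w = 1.0
        for user, node in zip(users, perm):
            w *= match[user][node]               # the "dice" of Sec. 3.6.1
        weights[perm] = w
    total = sum(weights.values())
    return {perm: w / total for perm, w in weights.items()}

def identification_prob(user, node, users, profiles, match):
    # Eq. (8): marginal probability that `user` corresponds to entry `node`.
    probs = assignment_probs(users, profiles, match)
    i = users.index(user)
    return sum(p for perm, p in probs.items() if perm[i] == node)

probs = assignment_probs(USERS, PROFILES, MATCH)
# (u1:p1, u2:p2) -> 0.2808/0.4100 = 0.685 and (u1:p2, u2:p1) -> 0.315,
# i.e. the 0.69 / 0.31 of Eqs. (6)-(7) up to rounding.
for perm, p in sorted(probs.items()):
    print(perm, round(p, 3))
print(identification_prob("u1", "[20-24]", USERS, PROFILES, MATCH))

For N users the enumeration covers N! bijections, which is why the paper reports per-user marginals such as Eq. (8) rather than whole assignments.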
4. ANONYMIZATION ALGORITHM

Table 1 shows the symbols used to describe the algorithm. The algorithm consists of two components: profile generalization and user group construction.

Table 1: Symbols and their definitions

Symbol | Definition
ui     | Mobile user
pj     | Profile
nk     | Taxonomy node
uq     | User who requested an advertisement
uq.t   | Time when uq issued the request
uq.et  | Request duration time for uq
uq.th  | Threshold probability of uq
UR     | Set of users in a cloaked region
UC     | Candidate set for UR
HU     | Priority heap of users who requested advertisements
PR     | Profiles of the users in UR

4.1 Generalization of profiles

To lower the identification probability of each user, we generalize the user profiles in a target cloaked region. A profile is, as described above, a set of taxonomy nodes. Since we assume that attributes are independent, the process reduces to generalizing each attribute within its own taxonomy. Note that the minimum identification probability obtainable by generalization is 1/N when N users are in the candidate cloaked region.

Algorithm 1 shows the generalization procedure when N users exist in the cloaked region. LUB(n1, n2, ..., nN) returns the least upper bound of the taxonomy nodes n1, ..., nN for the target attribute. In Fig. 3, for example,

LUB([20-24], [25-29]) = [20-29]
LUB([20-24], [30-39], [40-]) = [any]
LUB([20-29], [20-24]) = [20-29].

generalize(ni, l) is a function that generalizes node ni to level l. Given the level of the least upper bound and the disclosure level specified by the user, Algorithm 1 employs the larger (more detailed) of the two.

Algorithm 1 Taxonomy Node Generalization
1: procedure generalizeNode
2:   ñ ← LUB(n1, n2, ..., nN)
3:   for all i such that 1 ≤ i ≤ N do
4:     n′i ← generalize(ni, max(ui.discl_level, ñ.level))
5:   end for
6:   return {n′1, n′2, ..., n′N}
7: end procedure
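The following sketch encodes the Fig. 3 taxonomy as a child-to-parent map and implements LUB and generalize under our reading of Algorithm 1 (root at level 0, so the max over level numbers picks the more detailed of the LUB level and the user's disclosure level). The encoding and helper names are illustrative, not the paper's.

PARENT = {"[20-24]": "[20-29]", "[25-29]": "[20-29]",
          "[30-34]": "[30-39]", "[35-39]": "[30-39]",
          "[20-29]": "[20-39]", "[30-39]": "[20-39]",
          "[20-39]": "[any]", "[40-]": "[any]"}
LEVEL = {"[any]": 0, "[20-39]": 1, "[40-]": 1, "[20-29]": 2, "[30-39]": 2,
         "[20-24]": 3, "[25-29]": 3, "[30-34]": 3, "[35-39]": 3}

def ancestors(node):
    # Path from the node up to the root, the node itself included.
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lub(*nodes):
    # Least upper bound: the deepest node covering all the inputs.
    common = set(ancestors(nodes[0]))
    for n in nodes[1:]:
        common &= set(ancestors(n))
    return max(common, key=LEVEL.get)

def generalize(node, level):
    # Climb until the node is at most as detailed as `level`.
    while LEVEL[node] > level:
        node = PARENT[node]
    return node

def generalize_nodes(nodes, discl_levels):
    # Algorithm 1: generalize n_i to level max(u_i.discl_level, LUB level).
    target = LEVEL[lub(*nodes)]
    return [generalize(n, max(lvl, target))
            for n, lvl in zip(nodes, discl_levels)]

print(lub("[20-24]", "[25-29]"))                         # [20-29]
print(lub("[20-24]", "[30-39]", "[40-]"))                # [any]
print(generalize_nodes(["[20-24]", "[25-29]"], [2, 1]))  # both -> [20-29]

The last call corresponds to the pair {u1, u2} with disclosure levels [20-29] and [20-39], and agrees with the group g2 in Fig. 7 below.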
4.2 User group construction

Algorithm 2 outlines the anonymization process when a user requests a service. At line 2, we insert the user id into the priority heap HU, which is ordered by the expiration time, i.e., the sum of the service request time and the duration time (line 3). At line 6, we check whether the bounding box of the grouped users exceeds the maximum size. generalizeProfile at line 7 generalizes the profiles, applying the generalizeNode function of Algorithm 1 to each attribute. Lines 8–13 check whether every identification probability is below the corresponding threshold; on success, we remove from the candidate set UC all sets S that contain one of the finished users. The procedure checkExpiration starting at line 17 checks and manages the expiration of user requests.

Algorithm 2 Anonymization
1: procedure anonymize(uq)
2:   Add the user id into HU
3:     ▷ heap entries {uq, uq.t + uq.et} are ordered by expiration time
4:   for all UR such that UR ∈ UC do
5:     UR ← UR ∪ {uq}
6:     if getMBRSize(UR) ≤ MAX_RECT_SIZE then
7:       PR ← generalizeProfile(UR)
8:       if ∀ui ∈ UR, ∀pj ∈ PR : Pr(ui : pj) ≤ ui.th then
9:         remove every S with S ∩ UR ≠ ∅ from UC
10:        return {UR, PR}
11:      else
12:        UC ← UC ∪ {UR}
13:      end if
14:    end if
15:  end for
16: end procedure
17: procedure checkExpiration
18:   while true do
19:     {u, deadline} ← pop(HU)
20:     if deadline ≤ now then            ▷ the request has expired
21:       remove all the sets that contain u from UC
22:     else
23:       push {u, deadline} back into HU; break
24:     end if
25:   end while
26: end procedure

We illustrate how the algorithm works using Fig. 2. Assume that the requests are issued in the order u1, u2, u3, u4, u5. The candidate maintenance process in the matchmaker is shown in Fig. 7, where "Ev" stands for "Event". We can see that the candidate cloaked regions accumulate during the process until the user group {u1, u2, u5}, which corresponds to a cloaked region, is output. Note that each candidate consists of users, their generalized profiles, and their identification probabilities.

Figure 7: Management of candidates

Ev   | Candidate groups
init | g0 = ∅
u1   | g1 = g0 ∪ {{u1[20-24] : 1.0}}
u2   | g2 = g1 ∪ {{u2[25-29] : 1.0}, {u1[20-29] : 0.5, u2[20-29] : 0.5}}
u3   | g3 = g2 ∪ {{u3[20-24] : 1.0}}
u4   | g4 = g3 ∪ {{u4[30-34] : 1.0}, {u1[20-29] : 1.0, u4[30-39] : 1.0}, {u2[20-39] : 0.91, u4[30-39] : 0.91}, {u1[20-29] : 0.55, u2[20-39] : 0.5, u4[30-39] : 0.95}}
u5   | g5 = g4 ∪ {{u5[20-24] : 1.0}, {u1[20-24] : 0.5, u5[20-24] : 0.5}, {u2[20-29] : 0.56, u5[20-24] : 0.56}, {u1[20-29] : 0.4, u2[20-29] : 0.37, u5[20-24] : 0.34}}
out  | {u1, u2, u5} is output; afterwards the candidate groups are g6 = {∅, {u3[20-24] : 1.0}, {u4[30-34] : 1.0}}

In the initial state, the candidate set is empty: UC = ∅. As requests arrive, the number of candidates grows, and the algorithm performs profile generalization and identification probability calculation. For example, since the threshold probability of u1 is 0.4 in Fig. 4, the anonymization is considered successful for u1 if her calculated identification probability does not exceed 0.4. Note that the maximum MBR size is a system parameter; user u3, who is far away from u1 and u2, is therefore never grouped with them. In the example of Fig. 2, we cannot obtain a satisfactory grouping until u4 arrives. When u5 requests a service, we obtain the group {u1, u2, u5}, which satisfies the identification probability constraints. The matchmaker sends the constructed group to an appropriate advertiser and then removes the candidates that include u1, u2, and u5 from UC. The remaining users u3 and u4 wait for forthcoming user requests.
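To make the control flow of Algorithm 2 concrete, here is a condensed, runnable Python version of the grouping loop. The data layout and the crude 1/N stand-in for the identification probability of Section 3.6 are ours; with the paper's real probability model the Fig. 7 run keeps accumulating candidates until {u1, u2, u5} qualifies.

MAX_RECT_SIZE = 1000 * 1000   # maximum MBR area (system parameter)

def mbr_area(users, pos):
    # Area of the minimum bounding rectangle of the users' positions.
    xs = [pos[u][0] for u in users]
    ys = [pos[u][1] for u in users]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def anonymize(uq, candidates, pos, threshold, ident_prob):
    # Try to extend every candidate group (the empty group included) with
    # uq; output the first extension in which every member's identification
    # probability stays within her threshold, otherwise keep the extension
    # as a future candidate.
    for group in list(candidates):
        extended = group + [uq]
        if mbr_area(extended, pos) > MAX_RECT_SIZE:
            continue
        if all(ident_prob(u, extended) <= threshold[u] for u in extended):
            # output: drop every candidate that shares a served user
            candidates[:] = [g for g in candidates
                             if not set(g) & set(extended)]
            return extended
        candidates.append(extended)
    return None

pos = {"u1": (10, 10), "u2": (40, 20), "u5": (30, 60)}
threshold = {"u1": 0.4, "u2": 0.5, "u5": 0.5}
uniform = lambda u, group: 1.0 / len(group)   # crude stand-in for Sec. 3.6
candidates = [[]]                             # UC, seeded with the empty group
for u in ["u1", "u2", "u5"]:
    print(u, "->", anonymize(u, candidates, pos, threshold, uniform))
    # with this stub the pair {u2, u5} already qualifies on the third request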
4.3 Processing strategies and evaluation criteria

The algorithm in Subsection 4.2 is the baseline (naive) algorithm: it outputs an anonymized group as soon as a group of users satisfying the constraints can be constructed. Other options are possible, for example postponing the decision, in the hope of a better grouping, until the earliest deadline among the users is reached. To select an appropriate strategy, we must decide how to evaluate an anonymization result. We employ the following evaluation criteria:

• Throughput: the fraction of users who can be anonymized among all requesting users. A large throughput is preferable.

• Quality (detailedness): from the perspective of an advertiser, detailed information is better. To evaluate detailedness, we use the average level of the taxonomy nodes after the anonymization process. For example, assume that we only have the "Age" attribute and there are two generalization results, r1 = {[20-24], [20-24], [25-29]} and r2 = {[20-24], [20-29], [20-29]}. Since the level of [20-24] and [25-29] is three and the level of [20-29] is two, the average levels of r1 and r2 are 3 and 2.33, respectively; hence r1 is better than r2 in quality. (A small sketch of this computation follows below.)
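The detailedness criterion is easy to state in code. This small sketch is our own, using the level numbering of Fig. 3, and reproduces the r1/r2 example.

LEVEL = {"[20-39]": 1, "[20-29]": 2, "[30-39]": 2,
         "[20-24]": 3, "[25-29]": 3, "[30-34]": 3, "[35-39]": 3}

def avg_level(result):
    # Average taxonomy level of the published nodes; higher = more detailed.
    return sum(LEVEL[n] for n in result) / len(result)

print(avg_level(["[20-24]", "[20-24]", "[25-29]"]))  # r1 -> 3.0
print(avg_level(["[20-24]", "[20-29]", "[20-29]"]))  # r2 -> 2.33...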
5. EXPERIMENTAL EVALUATION

5.1 Experimental setting

We evaluate the performance of the different strategies using synthetic data and simulation-based data. The synthetic data is generated from multiple two-dimensional Gaussians with different centers and variances. The simulation-based data is obtained from the road network of Oldenburg city used in Brinkhoff's moving objects generator [2]. Although the generator produces movement histories of moving objects, we only use the first appearance position of each object, since we do not consider user movement.

The basic settings of the simulation parameters are shown in Table 2. In the default setting, requests are issued according to a Poisson arrival process: every 1/100 second, a new user requests a service with probability parameter λ = 0.1 (if two users issue requests at the same time, one of them waits for the other's processing). Once a user issues a request, she does not issue another request later. In the simulation, we assume that the profiles contain only the "Age" attribute. The range of ages is from 20 to 39, and the matching degrees are set based on Fig. 5 (the missing entries of the figure are filled in). We extend the taxonomy shown in Fig. 3 and select disclosure levels from 1 (node [20-39]) to 3 (leaf nodes).

Table 2: Basic parameters and their settings

Name | Value
Number of users | 1000
Unit time of advertisement requests | 1/100 s
Advertisement request frequency | 10 times/s
Used attribute | Age
Range of user ages | [20, 39]
Disclosure levels | 1, 2, 3
Threshold probabilities | 0.3, 0.4, 0.5
Expiration duration | 10 s ± 10%
Maximum area of a cloaked region | 1000 × 1000

5.2 Strategies for anonymization

Based on the ideas in Subsection 4.3, we consider the following seven strategies:

• Naive: the algorithm of Algorithm 2. We process each user in arrival order and output a group immediately once it can be constructed.

• The following two strategies share one idea: we do not output a constructed group immediately but wait for a better group to appear.

  – Deadline-based: maintains the candidate groups until the earliest deadline among the current users approaches. When a new user arrives, we try to add the user to the existing candidate groups. If no existing group can absorb the user, we try to construct new groups from the existing ungrouped users based on Algorithm 2.

  – Lazy: similar to deadline-based, but where deadline-based first checks the existing groups that already satisfy the threshold probabilities when adding a new user, this strategy first checks the groups that do not yet satisfy them. The lazy strategy can be seen as a variation of naive that waits until the deadline and cares for users who are not yet in candidate groups.

• The following four strategies are also based on one common idea: they maintain all the candidate groups that satisfy the threshold probabilities, and when the earliest deadline among the users approaches, they select and output one group from the existing candidates. They differ in the selection rule:

  – Many-first: the group with the largest number of users among the groups containing the user.

  – Next-deadline-based: the group containing the user with the next-earliest deadline. The intuition is to care for the user whose deadline approaches in the near future.

  – Avg-deadline-based: the group with the earliest average deadline.

  – Threshold-based: the group containing the lowest threshold probability.

5.3 Experiment 1: Users' request frequencies

In this experiment, we vary the frequency of user requests and count the users whose anonymization processes succeed. A higher request frequency yields more users in the target area, so we can expect more groups to be generated. We consider four request frequencies: 5, 10, 50, and 100 requests per second. This experiment uses the synthetic data and the parameter settings of Table 2.

The result is shown in Fig. 8. The three methods naive, deadline-based, and lazy keep good throughputs as the request frequency increases. In contrast, many-first, next-deadline-based, avg-deadline-based, and threshold-based perform badly, especially at 50 and 100 requests per second. The reason is that these four methods maintain all the candidate groups, so the number of candidates grows rapidly with the number of users.

[Figure 8: Request frequencies and number of qualified users]

Figure 9 shows the number of users of two types: 1) users whose processing is delayed by more than 0.1 seconds because the preceding users' processing has not finished, and 2) users whose requests expire because the wait time reaches the deadline. We consider the four strategies naive, deadline-based, lazy, and many-first. Delays occur in deadline-based and especially in many-first; next-deadline-based, avg-deadline-based, and threshold-based behave almost the same as many-first. Since many-first, next-deadline-based, avg-deadline-based, and threshold-based keep all the groups that satisfy the threshold probabilities, the growing number of candidates results in delays for the requests.

[Figure 9: Request frequencies and delays]

5.4 Experiment 2: Changing the maximum area size

We perform experiments varying the maximum area size of a cloaked region (MAX_RECT_SIZE in Algorithm 2) from 500 × 500 to 2000 × 2000.

Figure 10 shows the number of qualified users for the synthetic data and the uniform attribute distribution. When the maximum size is 2000 × 2000, delays occur only for avg-deadline-based, resulting in a low number of qualified users. The numbers of qualified users are large for many-first, deadline-based, and threshold-based. Figure 11 shows how strongly user attributes are generalized: naive and lazy provide results of good quality, in which only moderate generalization is performed.

[Figure 10: Maximum area sizes and number of qualified users]

[Figure 11: Averaged attribute generalization levels]

Additionally, we performed similar experiments using the simulation-based dataset and correlated distributions; the trends were similar.

5.5 Experiment 3: Changing user conditions

In this experiment, we observe the behaviors when the deadline and identification parameters of Table 2 are changed, using the synthetic data. First, we change the deadline to 10 s ± 50%. Figure 12 shows the qualified users for each deadline setting.
We had anticipated that next-deadline-based and avg-deadline-based would perform well, but the results are different: deadline-based and many-first, which do not care about deadlines, perform well. Detailed analysis reveals that the deadline-aware strategies could output users whose deadlines were about to expire, but failed to output groups containing many users.

[Figure 12: Number of qualified users for each deadline setting]

Next, we restore the original deadline setting (10 s ± 10%) but add 0.2 to the threshold probabilities. Figure 13 shows the number of succeeded users for each threshold probability setting. In contrast to the case above, threshold-based, which tries to output low-threshold users first, shows a good result for the threshold setting of 0.3. However, it is worse than deadline-based and many-first, which ignore thresholds and try to output groups with many users. None of the strategies could form a group for users with threshold settings lower than 0.2.

[Figure 13: Number of qualified users for each threshold probability setting]

5.6 Discussion

In terms of throughput, many-first showed good performance. Compared to the strategies that consider deadlines and thresholds (avg-deadline-based, next-deadline-based, and threshold-based), the quality of its generated groups was also better.
However, these four strategies (many-first, next-deadline-based, avg-deadline-based, and threshold-based) share a common problem when the request frequency is high, due to the growth of the number of candidate groups. For such heavy-traffic cases, the naive strategy might be a better choice, since it achieves a high success rate at low cost. It may be possible to switch strategies depending on the traffic.

In terms of the availability of cloaked regions, lazy was good. Since this strategy does not generalize aggressively, the quality of its results was generally good, which is a desirable property for advertisers. In addition, the strategy can support many users without serious delays.

6. CONCLUSIONS

In this paper, we have proposed a new anonymization method for location-based services. Its feature is that it considers not only location information but also user attributes. For that purpose, we defined a new criterion called observability and introduced the notion of a matching degree. We proposed several variations of strategies and evaluated their performance experimentally. Future work includes the development of robust, high-throughput methods and a new algorithm that can anonymize users with low threshold settings.

7. ACKNOWLEDGMENTS

This research was partly supported by the Funding Program for World-Leading Innovative R&D on Science and Technology (First Program).

8. REFERENCES

[1] B. Bamba, L. Liu, P. Pesti, and T. Wang. Supporting anonymous location queries in mobile environments with PrivacyGrid. In Proc. WWW, pages 237–246, 2008.
[2] T. Brinkhoff. A framework for generating network-based moving objects. GeoInformatica, 6:153–180, 2002.
[3] B. Gedik and L. Liu. Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE Transactions on Mobile Computing, 7(1):1–18, 2008.
[4] M. Gruteser and D. Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In Proc. MobiSys, pages 31–42, 2003.
[5] K. Liu and E. Terzi. A framework for computing the privacy scores of users in online social networks. In Proc. ICDM, pages 288–297, 2009.
[6] L. Liu. Privacy and location anonymization in location-based services. SIGSPATIAL Special, 1(2):15–22, 2009.
[7] M. Mano and Y. Ishikawa. Anonymizing user location and profile information for privacy-aware mobile services. In Proc. 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks (LBSN '10), pages 68–75, 2010.
[8] M. F. Mokbel, C.-Y. Chow, and W. G. Aref. The New Casper: Query processing for location services without compromising privacy. In Proc. VLDB, pages 763–774, 2006.
[9] P. Samarati. Protecting respondents' identities in microdata release. IEEE TKDE, 13(6):1010–1027, 2001.
[10] H. Shin, V. Atluri, and J. Vaidya. A profile anonymization model for privacy in a personalized location based service environment. In Proc. MDM, pages 73–80, 2008.
[11] X. Xiao and Y. Tao. Personalized privacy preservation. In Proc. ACM SIGMOD, pages 229–240, 2006.