=Paper= {{Paper |id=None |storemode=property |title=Privacy Preservation for Location-Based Services Based on Attribute Visibility |pdfUrl=https://ceur-ws.org/Vol-908/paper6.pdf |volume=Vol-908 |dblpUrl=https://dblp.org/rec/conf/immoa/Mano0DI12 }} ==Privacy Preservation for Location-Based Services Based on Attribute Visibility== https://ceur-ws.org/Vol-908/paper6.pdf
Privacy Preservation for Location-Based Services Based on Attribute Visibility

Masanori Mano∗, Xi Guo, Tingting Dong
Graduate School of Information Science, Nagoya University
mano@db.itc.nagoya-u.ac.jp, guoxi@db.itc.nagoya-u.ac.jp, dongtt@db.itc.nagoya-u.ac.jp

Yoshiharu Ishikawa
Information Technology Center / Graduate School of Information Science, Nagoya University
y-ishikawa@nagoya-u.jp

ABSTRACT

To provide high-quality mobile services in a safe way, many techniques for location anonymity have been proposed in recent years. Advanced location-based services such as mobile advertisement services may use not only users' locations but also users' attributes. However, the existing location anonymization methods do not consider attribute information and may result in low-quality privacy protection. In this paper, we propose the notion of visibility, which describes the degree to which an adversary can infer the identity of a user by observation. We then present an anonymization method which considers not only location information but also users' attributes. We show several strategies for the anonymization process and evaluate them experimentally.

1. INTRODUCTION

1.1 Background

In recent years, location anonymization has become one of the important topics in location-based services and mobile computing [6]. The underlying issue is that a user generally must send her location information to receive a high-quality service. However, if the service provider is an adversary, the detailed location information may be used for non-intended purposes. In an extreme case, the user's identity may be estimated by combining the location information with additional information sources. The use of location anonymization would mitigate the problem, but it may degrade service quality; an appropriate anonymization method is therefore required.

∗Current affiliation: NTT DOCOMO Inc.

1.2 Location-based services that use attribute information

For a typical location-based service which utilizes only location information, the conventional notion of location anonymity is effective for privacy protection. However, advanced location-based services may use additional attribute information such as the user's age, sex, and occupation. To illustrate our motivation, let us consider an example of a mobile advertisement service.

In this service, we assume that a mobile user issues a request for an advertisement and it is delivered to an appropriate advertiser. The advertiser then sends corresponding advertisements to the user. In this sense, the advertisement service is a pull-based service. The matching service (called the matchmaker) plays the role of a mediator between users and advertisers, and uses users' attribute information for selecting appropriate advertisers. Since advertisers are charged by the matchmaker for each successful advertisement, they would like to run effective advertisements at low cost. If an advertiser can specify the type of target users (e.g., young women), the effectiveness of the advertisement would increase.

Figure 1 illustrates the overview of the mobile advertisement service assumed in this paper. The matchmaker between mobile users and advertisers is a trusted third party and manages each user's information as her profile. As described later, the matchmaker is responsible for anonymization. When a mobile user issues a request for a service (i.e., an advertisement), the matchmaker anonymizes the location and profile of the user and sends them to the advertisers. Appropriate advertisers then send corresponding advertisements to the user via the matchmaker. Through the obtained


advertisement, the user can receive benefits such as coupons and discounts. In this paper, we focus on the anonymization part of this scenario.

Figure 1: Location-based anonymization framework

1.3 Privacy issues

In this system architecture, we should note that an advertiser is not necessarily reliable and may itself be an adversary. If the exact location is disclosed to an adversary, there is a risk that the advertiser identifies the user by watching that location. For this problem, we could apply a conventional location-based anonymization method, but the following problem arises once we consider users' attributes.

Assume that the users in Fig. 2 issue advertisement requests in the order u1, u2, ..., u5. Their profile information is also shown in the figure. The matchmaker needs to consider the tradeoff between the requirements of users, who want to preserve their privacy, and advertisers, who want to know the details of user information to improve service quality. One idea is to apply the k-anonymization technique, which groups k users based on proximity. For example, given k = 3, we can perform anonymization as {u1, u2, u4}. If the matchmaker provides the users' profiles, the receiving advertiser would know that three persons with ages 23, 26, and 38 are requesting advertisements. The problem is that the advertiser can easily identify the user with age 38 by watching the target area.

Figure 2: Users and their profiles

If the matchmaker considers not only user proximity but also user profiles, we have another option. Consider the grouping {u1, u2, u5}. In this case, it is not easy to determine who corresponds to each profile entry. Therefore, this anonymization is better than the former one.

1.4 Research objectives

In this paper, we propose a location-based anonymization method that also considers users' attributes. For this purpose, an important point is whether each user attribute can be guessed from an observation. To represent this idea, we incorporate a new criterion called observability. In addition, since different users may have different privacy policies, we provide an anonymization method which considers users' preferences.

A preliminary version of this paper appeared in [7]. In this paper, we have revised the problem setting, and the proposed method is entirely new.

2. RELATED WORK

2.1 Anonymization for location-based services

There have been many proposals on privacy preservation in location-based services. A popular approach in this field is spatial cloaking, in which an anonymizer constructs a cloaked region which contains the target users. For example, [4] uses the notion of k-anonymity [9], which is often used in database publishing. The notion of k-anonymity is used in many proposals, and there are variations such as the use of graph structures [3] and cell decompositions [1, 8]. In this paper, we extend the idea to our context.

Most of the anonymization methods for location-based services do not consider users' properties. One exception is [10], in which an attribute vector is constructed for each user based on her attribute values. In the anonymization process, a user group is constructed based on the proximity of the vectors. The problem with this approach is that it does not consider differences among attributes in terms of observability, so attribute values tend to be over-generalized, which results in low-quality services.

2.2 Classification of attributes

In traditional privacy-preservation methods based on k-anonymity [9], user attributes are classified into the following three categories:

• Sensitive attribute: It represents private information such as disease names.

• Identifier: It is used for uniquely identifying individuals, such as names and addresses.

• Quasi-identifier: Like age and sex, it does not identify individuals directly, but its combination with other attributes may reveal the identity.

In contrast to the assumption of traditional data publishing, an adversary in our context cannot easily identify individuals using quasi-identifiers and external information (e.g., a telephone directory), because it is difficult to determine the candidate users who appear in the target location at the given time. Instead, visual observation is more problematic in our context. If an adversary watches the target area, he may be able to identify the person who requested the service.

For this problem, we need to enhance the traditional treatment of attributes. In the context of privacy protection in social networks, [5] considered two properties of attributes:

• Sensitivity: It describes how much the attribute is related to privacy violation. For example, "address" is more sensitive than "birthplace" because the latter is not so useful for identifying people. [5] assumes that the sensitivity of each attribute does not depend on a specific user and takes a constant value in the system.

• Visibility: It is used as a criterion of how detailed a value a user is willing to disclose for the attribute. The visibility preference depends on each user and each attribute. For example, different users may have different disclosure policies for "Birthdate".

The notion of visibility cannot be applied directly to our context. In a location-based service, an adversary can observe some of a user's properties even if the user does not want that; in other words, visibility is not controllable. Instead, the observability of an attribute, which means how well we can estimate the actual value of the attribute from an observation, is more important. We describe this notion in detail later.

2.3 Personalized anonymization

For our context, a personalized privacy-protection mechanism is required because the acceptable exposure of a user profile depends on each user's preference. However, most of the existing data anonymization techniques do not consider personalization. [11] proposed a personalized privacy-preservation method for a static database. In this method, a hierarchical taxonomy is constructed for each attribute. Every user can specify the level of detail in the hierarchy for each attribute and thereby represent her preference. In this paper, we extend the idea to our context.

3. OVERVIEW OF THE APPROACH

3.1 Objectives of anonymization

We employ the following policies to balance the trade-off between privacy preservation and service quality.

• Identification probability: This probability represents how strongly a user is related to a profile. A user prefers a low identification probability, but an advertiser would expect a high identification probability for good service. Thus, we assume that each user can specify a threshold on the identification probability in her profile. In our approach, the identification probability of an anonymization result should be as large as possible under the constraint that it remains smaller than the threshold.

• Attribute generalization: Attribute generalization is a fundamental method for protecting privacy. However, excessive generalization results in low service quality, and the preference on attribute generalization depends on each user. Therefore, we let each user specify a preferred disclosure level for each attribute; the anonymization algorithm must not violate this restriction while it tries to group users with similar attribute values.

• Area size: A cloaked region with a large size results in poor service quality. We assume that the system sets a maximum area size for a cloaked region.

3.2 Taxonomy for attribute domain

The taxonomy for an attribute domain is used in the process of generalization. We assume that there exists a hierarchical taxonomy for each attribute domain. Figure 3 shows an example for the "Age" domain. The root node [any] at level 0 represents all the domain values, and the leaf nodes correspond to the most detailed information. Note that Fig. 3 only shows the descendants of node [20-39] for simplicity. We assume that taxonomies are available for other domains (e.g., ZIP code) as well.

Figure 3: Taxonomy for "Age" domain

We also assume that each user can specify a disclosure level for each attribute. For example, consider a user with age 23. The user can specify node [20-29] as her disclosure level for the age domain. If the selected node is near the leaf level, the user can receive more personalized advertisements, but her privacy may not be well protected.

3.3 Profile

Each mobile user constructs a profile to represent her preferences on service quality and privacy levels. The trusted matchmaker maintains the profiles. An example of user profiles is shown in Fig. 4.

  ID   Age   Level     Sex   Level   Threshold prob.
  u1   23    [20-29]   M     [Any]   0.4
  u2   26    [20-39]   M     [M]     0.5
  u3   22    [20-24]   F     [F]     0.6
  u4   38    [30-39]   F     [Any]   0.5
  u5   24    [20-24]   M     [M]     0.5

Figure 4: Example of profiles

The contents of a profile are as follows:

• Attribute value: It represents the attribute value of the user (e.g., Age = 23 for user u1).

• Attribute disclosure level: The level is given by specifying a taxonomy node (e.g., [20-29] for user u1's Age attribute).

• Threshold for identification probability: The user requests that her identification probability be smaller than this value.

3.4 Attribute observability

We now introduce a new criterion called observability.

Definition 1 (Observability). Attribute observability is a measure of how well the actual value of an attribute can be guessed by visually observing the user.

For example, "Sex" is easy to guess, but "Birthplace" is difficult to estimate by observation. In this case, the observability of "Sex" is higher than that of "Birthplace". In this


paper, we assume that the observability of an attribute domain (e.g., age) is represented by a probability and takes a system-wide constant value.

We take the following approach for the other two properties of attribute privacy:

• A user can specify the disclosure level of each attribute to reflect her preference on sensitivity. For example, if a user considers her age very sensitive, she can specify the [any] node in Fig. 3. Note that a user cannot fully control her sensitivity because an adversary may watch the user directly.

• A user can control visibility by specifying the disclosure level of each attribute. If she selects a leaf-level node, the visibility is the highest, but whether the attribute is actually observable depends on the attribute domain.

3.5 Matching degree

To use the notion of observability in an anonymization algorithm, we need a method to measure the observability of an attribute. We take the following approach: we measure the degree with respect to taxonomy nodes. For example, consider the attribute "Age". The attribute value age = 21 is highly related to node [20-24], but has little relationship with node [30-34]. We call the degree to which user ui and taxonomy node nk match their matching degree, and define it as follows:

    match(ui → nk) = Pr(nk | ui).    (1)

When there are K nodes in a level of the taxonomy, the aggregated matching degree over that level is defined as follows:

    Σ_{k=1}^{K} match(ui → nk) = Σ_{k=1}^{K} Pr(nk | ui).    (2)

In this paper, we assume that the matchmaker holds predefined matching degrees between all combinations of attribute values and taxonomy nodes. Figure 5 shows an example. Due to limited space, we omit the level-0 node [any] and show only some representative nodes.

        l=1       l=2                 l=3
  ID   [20-39]   [20-29]  [30-39]   [20-24]  [25-29]  [30-34]  [35-39]
  u1    0.88      0.88     0.00      0.54     0.34     0.00     0.00
  u2    1.00      0.90     0.10      0.38     0.52     0.10     0.00
  u3    0.79      0.79     0.00      0.56     0.23     0.00     0.00
  u4    0.64      0.00     0.64      0.00     0.00     0.11     0.53
  u5    0.97      0.95     0.02      0.51     0.44     0.02     0.00

Figure 5: Matching degrees

In this paper, we assume that the attributes in a profile are independent. Therefore, the total matching degree can be calculated by multiplying the attribute-wise matching degrees.

3.6 Identification probability

An identification probability is the probability that a user is identified by watching the users in the target area together with the anonymized profiles. If the identification probability is lower than the threshold probability specified by the user, we can say that the requirement of the user is satisfied. As described below, an identification probability is calculated using matching degrees.

3.6.1 Computing identification probability for two users

We first consider the simpler case in which there are two users (u1, u2) and their anonymized profiles are given as in Fig. 6. Note that an adversary does not know which user corresponds to which profile entry. Therefore, the adversary should consider the two cases (u1: p1, u2: p2) and (u1: p2, u2: p1). Clearly, the following equation holds:

    Pr(u1: p1, u2: p2) + Pr(u1: p2, u2: p1) = 1.    (3)

  pid   Taxonomy node
  p1    [20-24]
  p2    [25-29]

Figure 6: Anonymized profiles

For computing the probability, we use the following idea. We roll a die for each user ui; the die has one face per taxonomy node, and the occurrence probability of each face obeys the matching degree. In this example, we roll two dice for u1 and u2 at the same time, and there are four possible outcomes: (u1: p1, u2: p1), (u1: p1, u2: p2), (u1: p2, u2: p1), and (u1: p2, u2: p2). The occurrence probability of (u1: p1, u2: p2) is calculated as

    Pr(p1|u1) × Pr(p2|u2) = 0.54 × 0.52 = 0.281,    (4)

and the probability of (u1: p2, u2: p1) is given as

    Pr(p2|u1) × Pr(p1|u2) = 0.34 × 0.38 = 0.129.    (5)

Since (u1: p1, u2: p1) and (u1: p2, u2: p2) are prohibited patterns (one profile entry does not correspond to multiple users), we discard the outcomes in which these patterns occur. Thus, the identification probabilities are given as

    Pr(u1: p1, u2: p2) = 0.281 / (0.281 + 0.129) = 0.69,    (6)
    Pr(u1: p2, u2: p1) = 0.129 / (0.281 + 0.129) = 0.31.    (7)

3.6.2 Computing identification probability for general case

The basic idea is similar to the two-user case. For example, if the number of users is three, we should consider six assignment patterns.

For the anonymization, we need to consider the identification probability of each user. Consider users u1, u2, u3 and profiles p1, p2, p3. User u1 is only interested in whether her own identification probability is lower than her specified threshold and does not care about the identification probabilities of u2 and u3. As an example, the probability that user u1 is related to profile p1 is calculated as

    Pr(u1: p1) = Pr(u1: p1, u2: p2, u3: p3) + Pr(u1: p1, u2: p3, u3: p2).    (8)

In the following, we use the term identification probability in this sense.

4. ANONYMIZATION ALGORITHM

Table 1 shows the symbols used for describing the algorithm. The algorithm consists of two components: profile generalization and user group construction.


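The identification probabilities of Section 3.6 can be reproduced directly: enumerate the one-to-one user-to-profile assignments (permutations), weight each by the product of matching degrees, and normalize, as in Eqs. (3) to (8). The sketch below uses the u1 and u2 rows of Fig. 5 for profiles p1 = [20-24] and p2 = [25-29]; the function name is ours, not from the paper.

```python
from itertools import permutations

# Matching degrees Pr(p_j | u_i), taken from the u1 and u2 rows of Fig. 5
# for the anonymized profiles p1 = [20-24] and p2 = [25-29] of Fig. 6.
match = {
    "u1": {"p1": 0.54, "p2": 0.34},
    "u2": {"p1": 0.38, "p2": 0.52},
}

def identification_prob(users, profiles, target_user, target_profile):
    """Pr(target_user : target_profile), in the sense of Eq. (8):
    the normalized weight of all one-to-one assignments that map
    target_user to target_profile."""
    idx = users.index(target_user)
    total = 0.0  # weight of all one-to-one assignments
    hit = 0.0    # weight of assignments mapping target_user to target_profile
    for assignment in permutations(profiles):
        w = 1.0
        for u, p in zip(users, assignment):
            w *= match[u][p]
        total += w
        if assignment[idx] == target_profile:
            hit += w
    return hit / total

p = identification_prob(["u1", "u2"], ["p1", "p2"], "u1", "p1")
# p is about 0.685; with the rounded intermediates of Eqs. (4)-(5),
# the paper reports 0.69 in Eq. (6).
```

Enumerating permutations automatically discards the "prohibited patterns" of Section 3.6.1, since a permutation never assigns one profile entry to two users.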
Table 1: Symbols and their definitions

  Symbol   Definition
  ui       Mobile user
  pj       Profile
  nk       Taxonomy node
  uq       User who requested an advertisement
  uq.t     The time when uq issued a request
  uq.et    Request duration time for uq
  uq.th    Threshold probability of uq
  UR       Set of users in a cloaked region
  UC       Candidate set for UR
  HU       Priority heap of users who requested advertisements
  PR       Profiles for users in UR

4.1 Generalization of profiles

To lower the identification probability of each user, we generalize the user profiles in a target cloaked region. A profile is, as described above, a set of taxonomy nodes. Since we assume that the attributes are independent, the process reduces to generalizing each attribute within its corresponding taxonomy. Note that the minimum identification probability obtainable by generalization is 1/N when N users are in the candidate cloaked region.

Algorithm 1 shows the generalization algorithm when N users exist in the cloaked region. LUB(n1, n2, ..., nN) returns the least upper bound of taxonomy nodes n1, ..., nN for the target attribute. In Fig. 3, for example, we get

    LUB([20-24], [25-29]) = [20-29]
    LUB([20-24], [30-39], [40-]) = [any]
    LUB([20-29], [20-24]) = [20-29].

generalize is a function which generalizes ni to the specified level. Given the least upper bound node and the disclosure level specified by the user, it employs the higher of the two for the generalization.

Algorithm 1 Taxonomy Node Generalization
 1: procedure generalizeNode
 2:    ñ ← LUB(n1, n2, ..., nN)

S's (the sets that contain the finished users) from the candidate set UC. Function checkExpiration, starting from line 17, checks and manages the expiration of user requests.

Algorithm 2 Anonymization
 1: procedure anonymize(uq)
 2:    Add user uq into HU
 3:        ▷ heap entries are ordered by {uq, uq.t + uq.et}
 4:    for all UR such that UR ∈ UC do
 5:        UR ← UR ∪ {uq}
 6:        if getMBRSize(UR) ≤ MAX_RECT_SIZE then
 7:            PR ← generalizeProfile(UR)
 8:            if ∀ui ∈ UR, ∀pj ∈ PR, Pr(ui: pj) ≤ ui.th then
 9:                ∀S ∈ UC such that S ∩ UR ≠ ∅, remove S from UC
10:                return {UR, PR}
11:            else
12:                UC ← UC ∪ UR
13:            end if
14:        end if
15:    end for
16: end procedure

17: procedure checkExpiration
18:    while true do
19:        {u, deadline} ← pop(HU)
20:        if deadline ≤ now then
21:            Remove all the sets that contain u from UC
22:        else
23:            Push {u, deadline} back into HU; break
24:        end if
25:    end while
26: end procedure

We illustrate how the algorithm works using Fig. 2. Assume that the requests are issued in the order u1, u2, u3, u4, u5. The process of candidate maintenance in the matchmaker is shown in Fig. 7, where "Ev" stands for "Event". We can see that the candidates of cloaked regions increase during the process until the output of the user group {u1, u2, u5}, which corresponds to a cloaked region. Note that each can-
3:    for all i such that 1 ≤ i ≤ N do                                didate of cloaked region consists of users, their profiles, and
4:        n0i ← generalize(ni , max(ui .discl level, ñ.level))       their identification probabilities.
5:    end for
6:    return {n01 , n02 , ..., n0N }
                                                                            Ev                         Candidate Groups
7: end procedure                                                           init    g0 = ∅
                                                                            u1     g1 = g0 ∪ {{u1 [20-24] : 1.0}}
                                                                            u2     g2 = g1 ∪ {{u2 [25-29] : 1.0},
                                                                                     {u1 [20-29] : 0.5, u2 [20-29] : 0.5}}
4.2 User group construction                                                u3      g3 = g2 ∪ {{u3 [20-24] : 1.0}}
                                                                           u4      g4 = g3 ∪ {{u4 [30-34] : 1.0},
  Algorithm 2 shows the outline of the anonymization pro-                            {u1 [20-29] : 1.0, u4 [30-39] : 1.0},
cess when a user requests a service. At line 2, we insert the                        {u2 [20-39] : 0.91, u4 [30-39] : 0.91},
                                                                                     {u1 [20-29] : 0.55, u2 [20-39] : 0.5, u4 [30-39] : 0.95}}
user id into priority heap HU . HU is ordered by the expi-
                                                                           u5      g5 = g4 ∪ {{u5 [20-24] : 1.0},
ration time, which is the sum of the service request time                            {u1 [20-24] : 0.5, u5 [20-24] : 0.5},
and the duration time. At line 5, we check whether the                               {u2 [20-29] : 0.56, u5 [20-24] : 0.56},
bounding box for the grouped users is larger than the max-                           {u1 [20-29] : 0.4, u2 [20-29] : 0.37, u5 [20-24] : 0.34}}
                                                                           out     {u1 , u2 , u5 } is output.
imum limit size. generalizeProfile at line 6 performs                              After the output, candidate groups are
generalization of profiles. It uses the aforementioned gen-                        g6 = {∅, {u3 [20-24] : 1.0}, {u4 [30-34] : 1.0}}.
eralizeNode function for node generalization. From line
7 to 12, we check whether the identification probability is
lower than the threshold. If it is successful, we remove all                      Figure 7: Management of candidates


                                                                 37
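As a concrete sketch of Algorithm 1, the following Python fragment (our own illustration, not the authors' implementation; the parent table, node names, and helper functions are all assumptions) encodes the age taxonomy of Fig. 3 and reproduces the LUB examples above.

```python
# Sketch of Algorithm 1 on a toy age taxonomy (our illustration;
# node names and helpers are hypothetical, not the authors' code).

# parent links: child -> parent; the root "[any]" has no parent
PARENT = {
    "[20-25]": "[20-29]", "[25-29]": "[20-29]",
    "[30-34]": "[30-39]", "[35-39]": "[30-39]",
    "[20-29]": "[any]", "[30-39]": "[any]", "[40-]": "[any]",
}

def path_to_root(node):
    """Return the path [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def level(node):
    """The root has level 1; each step down adds one (leaves are level 3)."""
    return len(path_to_root(node))

def lub(*nodes):
    """Least upper bound: the deepest common ancestor of the given nodes."""
    common = set(path_to_root(nodes[0]))
    for n in nodes[1:]:
        common &= set(path_to_root(n))
    return max(common, key=level)

def generalize(node, target_level):
    """Walk upward until the node is at (or above) the target level."""
    while level(node) > target_level:
        node = PARENT[node]
    return node

def generalize_node(nodes, discl_levels):
    """Algorithm 1: generalize each node using the LUB level and the
    user-specified disclosure level; the higher of the two levels wins."""
    n_tilde = lub(*nodes)
    return [generalize(n, max(d, level(n_tilde)))
            for n, d in zip(nodes, discl_levels)]
```

For instance, `lub("[20-25]", "[25-29]")` returns `"[20-29]"`, and `generalize_node(["[20-25]", "[25-29]"], [2, 2])` generalizes both leaves to `[20-29]`, which matches how u1 and u2 are generalized in Fig. 7.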
   At the initial state, the candidate set is empty: UC = ∅. As requests arrive, the number of candidates increases, and the algorithm performs profile generalization and identification probability calculation. For example, since the threshold probability of u1 is 0.4 in Fig. 4, if the calculated identification probability for u1 is at most 0.4, the anonymization is considered successful for u1. Note that the maximum size of the MBR is defined by a system parameter. Therefore, user u3, which is far away from u1 and u2, is not grouped with them.
   In the example of Fig. 2, we cannot get a satisfactory grouping even after u4 arrives. When u5 requests a service, we can construct the anonymization group {u1, u2, u5}, which satisfies the constraints on identification probabilities. The matchmaker sends the constructed group to an appropriate advertiser and then removes the candidates that include u1, u2, and u5 from UC. The remaining users u3 and u4 must wait for forthcoming user requests.

4.3 Processing strategies and evaluation criteria
   The algorithm shown in Subsection 4.2 is the baseline (naive) algorithm. It outputs an anonymized group as soon as a group of users that satisfies the constraints can be constructed. We can also consider other options, such as postponing the decision, in the hope of a better grouping, until the earliest deadline among the users is reached. To select an appropriate strategy, it is important to decide how to evaluate an anonymization result. We employ the following evaluation criteria:

   • Throughput: The ratio of users who can be anonymized among all the requesting users. A large throughput is preferable.

   • Quality (Detailedness): From the perspective of an advertiser, detailed information is better. To evaluate the detailedness, we use the average level of the taxonomy nodes after the anonymization process. For example, assume that we only have the "Age" attribute and there are two generalization results: r1 = {[20-24], [20-24], [25-29]} and r2 = {[20-24], [20-29], [20-29]}. Since the levels of [20-24] and [25-29] are three and the level of [20-29] is two, the average levels of r1 and r2 are 3 and 2.33, respectively. We can deduce that r1 is better than r2 in quality.

5. EXPERIMENTAL EVALUATION

5.1 Setting of experiments
   We evaluate the performance of different strategies using synthetic data and simulation-based data. The synthetic data is generated by multiple two-dimensional Gaussians with different centers and variances. The simulation-based data is obtained from the road network of Oldenburg city used in Brinkhoff's moving objects generator [2]. Although the generator produces movement histories of moving objects, we only use their first appearance places since we do not consider the movement of users.
   The basic settings of the simulation parameters are shown in Table 2. In the default setting, we assume that requests are issued based on Poisson arrivals: in every 1/100 second, a new user requests a service with probability parameter λ = 0.1 (if two users issue requests at the same time, one of the users waits for the other's process to finish). Once a user issues a request, she does not issue another request later. In the simulation, we assume that there is only the "Age" attribute in the profiles. The range of age is from 20 to 39, and the matching degrees are set based on Fig. 5 (the missing entries in the figure are filled in). We extend the taxonomy shown in Fig. 3 and select disclosure levels from 1 (node [20-39]) to 3 (leaf nodes).

           Table 2: Basic parameters and their settings
                Name                              Value
           Number of users                        1000
           Unit time of advertisement request     1/100 s
           Advertisement request frequency        10 times/s
           Used attribute                         Age
           Range of user age                      [20, 39]
           Disclosure level                       1, 2, 3
           Threshold probability                  0.3, 0.4, 0.5
           Expiration duration                    10 s ± 10%
           Maximum area of a cloaked region       1000 × 1000

5.2 Strategies for anonymization
   Based on the idea shown in Subsection 4.3, we consider the following seven strategies:

   • Naive: This is the algorithm in Algorithm 2. We process each user in arrival order and output a group immediately once we can construct it.

   • The following two strategies share the same idea: we do not output a constructed group immediately, but wait for the appearance of a better group.

      – Deadline-based: This strategy maintains the candidate groups until the earliest deadline of the current users approaches. If a new user arrives, we try to add this user to the existing candidate groups. If the existing groups cannot absorb the user, we try to construct new groups with the existing non-grouped users based on Algorithm 2.

      – Lazy: This is similar to deadline-based. When adding a new user, deadline-based first checks the existing groups that satisfy the threshold probabilities. In contrast, this strategy first checks the groups that do not satisfy the threshold probabilities. The lazy strategy can be seen as a variation of naive that waits for the deadline and takes care of users who are not in the current candidate groups.

   • The following four strategies are also based on a common idea: they maintain all the candidate groups that satisfy the threshold probabilities, and when the earliest deadline among the users approaches, they select one group from the existing candidates. They differ in which group is selected and output:

      – Many-first: The group with the largest number of users among the groups that contain the user.

      – Next-deadline-based: The group that contains the user with the next-earliest deadline. The intuition is that we take care of the user whose deadline approaches in the near future.
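The average-level quality measure of Subsection 4.3 is easy to verify numerically. The following is a minimal Python sketch (ours, not the authors' code; the level table simply encodes the node levels stated above, where [20-29] has level two and the leaf nodes level three):

```python
# Average taxonomy level as the quality (detailedness) measure of
# Subsection 4.3 -- a sketch, assuming the node levels of Fig. 3.

LEVEL = {"[any]": 1, "[20-29]": 2, "[30-39]": 2,
         "[20-24]": 3, "[25-29]": 3, "[30-34]": 3, "[35-39]": 3}

def avg_level(result):
    """Mean taxonomy level of a generalization result; higher = more detailed."""
    return sum(LEVEL[n] for n in result) / len(result)

r1 = ["[20-24]", "[20-24]", "[25-29]"]   # all leaf nodes
r2 = ["[20-24]", "[20-29]", "[20-29]"]   # partly generalized

# avg_level(r1) is 3.0 and avg_level(r2) is about 2.33,
# so r1 is judged the more detailed (higher-quality) result.
```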
      – Avg-deadline-based: The group with the earliest average deadline.

      – Threshold-based: The group that contains the lowest threshold probability.

5.3 Experiment 1: Users' request frequencies
   In this experiment, we change the frequency of user requests and check the number of users whose anonymization processes succeed. An increase in the request frequency results in a large number of users in the target area, and we can expect that many groups will be generated. We consider four request frequencies: 5, 10, 50, and 100 times per second. This experiment uses the synthetic data and the parameter settings shown in Table 2. The experimental result is shown in Fig. 8. Three methods, naive, deadline-based, and lazy, keep good throughputs as the request frequency increases. In contrast, many-first, next-deadline-based, avg-deadline-based, and threshold-based perform badly, especially at 50 and 100 times per second. The reason is that these four methods maintain all the candidate groups, so the number of candidates rapidly increases with the number of users.

   Figure 8: Request frequencies and number of qualified users

   Figure 9 shows the number of users of two types: 1) users whose process is delayed by more than 0.1 seconds because the foregoing users' processes have not finished, and 2) users whose process expires because the wait time reaches the deadline. We consider four strategies: naive, deadline-based, lazy, and many-first. We can see that delays happen in deadline-based and especially in many-first. Note that next-deadline-based, avg-deadline-based, and threshold-based have almost the same result as many-first. Since many-first, next-deadline-based, avg-deadline-based, and threshold-based keep all the groups that satisfy the threshold probabilities, the increase in the number of candidates results in delays for the requests.

   Figure 9: Request frequencies and delays

5.4 Experiment 2: Changing maximum area size
   We perform experiments by changing the maximum area size of a cloaked region (MAX_RECT_SIZE in Algorithm 2) from 500 × 500 to 2000 × 2000.
   Figure 10 shows the number of qualified users for the synthetic data and the uniform attribute distribution. When the maximum size is 2000 × 2000, delays happen only for avg-deadline-based and result in a low number of qualified users. The numbers of qualified users are large for many-first, deadline-based, and threshold-based. Figure 11 shows how user attributes are generalized. In this figure, naive and lazy provide results with good quality, in which only moderate generalization is performed.

   Figure 10: Maximum area sizes and number of qualified users

   Figure 11: Averaged attribute generalization levels

   Additionally, we performed similar experiments using the simulation-based dataset and the correlated distributions, but the trends were similar.

5.5 Experiment 3: Changing user conditions
   In this experiment, we observe the behaviors when we change the deadline and identification parameters in Table 2 using the synthetic data. First, we change the deadline to 10 s ± 50%. Figure 12 shows the qualified users for each deadline setting. We anticipated that next-deadline-based and avg-deadline-based would have good results, but the results are different: deadline-based and many-first, which do not care about deadlines, perform well. Detailed analysis reveals that the deadline-based strategies could output users whose deadlines were nearly expiring, but failed to output groups that contain many users.
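The default arrival model of Subsection 5.1, which governs the request frequencies varied in Experiment 1, can be sketched as a Bernoulli approximation of a Poisson process (our own illustration; the function and parameter names are hypothetical): at every 1/100-second tick a new user issues a request with probability λ = 0.1, i.e., about 10 requests per second on average.

```python
import random

# Sketch of the arrival model in Subsection 5.1 (our illustration):
# every 1/100 s a new user issues a request with probability lam = 0.1,
# which approximates Poisson arrivals at about 10 requests per second.

def simulate_arrivals(seconds, lam=0.1, ticks_per_second=100, seed=0):
    """Return the list of request times (in seconds) over the given horizon."""
    rng = random.Random(seed)  # fixed seed for a reproducible run
    times = []
    for tick in range(seconds * ticks_per_second):
        if rng.random() < lam:
            times.append(tick / ticks_per_second)
    return times

times = simulate_arrivals(seconds=100)
rate = len(times) / 100  # empirical requests per second, close to 10
```

Scaling λ up or down changes the load in the same way as the 5/10/50/100 requests-per-second settings of Experiment 1.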
   Figure 12: Number of qualified users for each deadline

   Next, we change the deadline setting back to the original one (10 s ± 10%), but add 0.2 to the threshold probabilities. Figure 13 shows the number of succeeded users for each threshold probability setting. In contrast to the case above, threshold-based, which tries to output groups with low thresholds first, shows a good result for the threshold setting of 0.3. However, it is worse than deadline-based and many-first, which do not care about thresholds and try to output groups with many users. None of the strategies could construct a group for users with threshold settings lower than 0.2.

   Figure 13: Number of qualified users for each threshold probability setting

5.6 Discussion
   In terms of throughput, many-first showed good performance. Compared to the strategies that consider deadlines and thresholds (avg-deadline-based, next-deadline-based, and threshold-based), the quality of its generated groups was also better. However, these four strategies share a common problem when the request frequency is high, due to the increase in the number of candidate groups. For such a heavy-traffic case, the naive strategy might be a better choice, since it can achieve a high success rate at low cost. It may be possible to switch strategies depending on the traffic.
   In terms of the availability of cloaked regions, lazy was good. Since this strategy does not perform generalization aggressively, the quality of the results was generally good. This is a good property for advertisers. In addition, the strategy can support many users without serious delays.

6. CONCLUSIONS
   In this paper, we have proposed a new anonymization method for location-based services. Its feature is that we consider not only location information but also user attributes. For that purpose, we defined a new criterion called observability and introduced the notion of a matching degree. We proposed several variations of strategies and evaluated their performance through experiments.
   Future work includes the development of a robust and high-throughput method and a new algorithm that can anonymize users with low threshold settings.

7. ACKNOWLEDGMENTS
   This research was partly supported by the Funding Program for World-Leading Innovative R&D on Science and Technology (First Program).

8. REFERENCES
[1] B. Bamba, L. Liu, P. Pesti, and T. Wang. Supporting anonymous location queries in mobile environments with PrivacyGrid. In Proc. WWW, pages 237–246, 2008.
[2] T. Brinkhoff. A framework for generating network-based moving objects. GeoInformatica, 6:153–180, 2002.
[3] B. Gedik and L. Liu. Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE Transactions on Mobile Computing, 7(1):1–18, 2008.
[4] M. Gruteser and D. Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In Proc. MobiSys, pages 31–42, 2003.
[5] K. Liu and E. Terzi. A framework for computing the privacy scores of users in online social networks. In Proc. ICDM, pages 288–297, 2009.
[6] L. Liu. Privacy and location anonymization in location-based services. SIGSPATIAL Special, 1(2):15–22, 2009.
[7] M. Mano and Y. Ishikawa. Anonymizing user location and profile information for privacy-aware mobile services. In Proc. the 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks (LBSN '10), pages 68–75, 2010.
[8] M. F. Mokbel, C.-Y. Chow, and W. G. Aref. The New Casper: Query processing for location services without compromising privacy. In Proc. VLDB, pages 763–774, 2006.
[9] P. Samarati. Protecting respondents' identities in microdata release. IEEE TKDE, 13(6):1010–1027, 2001.
[10] H. Shin, V. Atluri, and J. Vaidya. A profile anonymization model for privacy in a personalized location based service environment. In Proc. MDM, pages 73–80, 2008.
[11] X. Xiao and Y. Tao. Personalized privacy preservation. In Proc. ACM SIGMOD, pages 229–240, 2006.