Claudio Bettini, Sushil Jajodia, Pierangela Samarati, X. Sean Wang

PiLBA'08
Privacy in Location-Based Applications
Workshop co-located with ESORICS 2008
Malaga, Spain, October 9, 2008

Proceedings

Editors' addresses:

Claudio Bettini
D.I.Co., Università di Milano - Via Comelico 39, I-20135 Milano, Italy
bettini@dico.unimi.it

Sushil Jajodia
C.S.I.S., George Mason University - 4400 University Drive, Fairfax, VA 22030, USA
jajodia@gmu.edu

Pierangela Samarati
D.T.I., Università di Milano - Via Bramante 65, 26013 Crema, Italy
samarati@dti.unimi.it

X. Sean Wang
Dep. of C.S., University of Vermont - 33 Colchester Avenue, Burlington, VT 05405, USA
xywang@cs.uvm.edu

Copyright © 2008 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. Re-publication of material from this volume requires permission by the copyright owners.

Preface

Location-based applications in travel, logistics, health care, social networks and other industries already exist and are poised to proliferate. One of the critical issues for a widespread deployment of these applications is how to reconcile their effectiveness and quality with privacy concerns. The PiLBA'08 workshop was intended to bring together scientists from security and data management to discuss the most recent advances in the field.

These proceedings include the eight contributions on this topic selected from the submissions by the PC chairs with the help of the members of the program committee, and presented at the workshop. They include an extended abstract of the invited talk, two survey papers, and five research papers covering several complementary aspects of privacy in location-based applications and services.

The Organizing Committee would like to thank all the people who supported and helped in the organization of this event. A particular acknowledgment goes to the PC members, to Javier Lopez for his suggestions and support with local organization, and to Linda Pareschi for her help with the web site and with the preparation of these proceedings.
Claudio Bettini, Sushil Jajodia, Pierangela Samarati, X. Sean Wang
October 2008

Organizing Committee

Claudio Bettini - Università di Milano, Italy
Sushil Jajodia - George Mason University, USA
Pierangela Samarati - Università di Milano, Italy
Sean Wang - University of Vermont, USA

Publicity Chair

Linda Pareschi - Università di Milano, Italy

Programme Committee

Claudio Agostino Ardagna - Università di Milano, Italy
Vijay Atluri - Rutgers University, USA
Ying Cai - Iowa State University, USA
Reynold Cheng - University of Hong Kong, China
Josep Domingo-Ferrer - Rovira i Virgili University, Spain
Marco Gruteser - Rutgers University, USA
Urs Hengartner - University of Waterloo, Canada
Panos Kalnis - National University of Singapore, Singapore
George Kollios - Boston University, USA
Ling Liu - Georgia Institute of Technology, USA
Sergio Mascetti - Università di Milano, Italy
Dino Pedreschi - Università di Pisa, Italy
Daniele Riboni - Università di Milano, Italy
Cyrus Shahabi - University of Southern California, USA
Heng Xu - Penn State University, USA
Man Lung Yiu - Aalborg University, Denmark

External Reviewers

Baik Hoh
Ali Khoshgozaran
Houtan Shirani-Mehr
Xike Xie
Yan Zhang

Contents

Invited talk extended abstract: Safety and Privacy in Vehicular Communications
Josep Domingo-Ferrer and Qianhong Wu

Location Privacy in Location-Based Services: Beyond TTP-based Schemes
Agusti Solanas, Josep Domingo-Ferrer and Antoni Martínez-Ballesté

Privacy in Georeferenced Context-aware Services: A Survey
Daniele Riboni, Linda Pareschi and Claudio Bettini

Pattern-Preserving k-Anonymization of Sequences and its Application to Mobility Data Mining
Ruggero G. Pensa, Anna Monreale, Fabio Pinelli and Dino Pedreschi

On the Impact of User Movement Simulations in the Evaluation of LBS Privacy-Preserving Techniques
Sergio Mascetti, Dario Freni, Claudio Bettini, X. Sean Wang and Sushil Jajodia

A Multi-Path Approach for k-Anonymity in Mobile Hybrid Networks
Claudio Agostino Ardagna, Angelos Stavrou, Sushil Jajodia, Pierangela Samarati and Rhandi Martin

User Privacy in Transport Systems Based on RFID E-Tickets
Ahmad-Reza Sadeghi, Ivan Visconti, and Christian Wachsmann

The CURUPIRA-2 Block Cipher for Constrained Platforms: Specification and Benchmarking
Marcos Simplício Jr., Paulo S. L. M. Barreto, Tereza C. M. B. Carvalho, Cintia B. Margi, and Mats Näslund

Safety and Privacy in Vehicular Communications

Josep Domingo-Ferrer and Qianhong Wu

Universitat Rovira i Virgili, UNESCO Chair in Data Privacy, Dept. of Computer Engineering and Mathematics, Av. Països Catalans 26, E-43007 Tarragona, Catalonia
e-mail {josep.domingo,qianhong.wu}@urv.cat

Abstract. Vehicular ad hoc networks (VANETs) allow vehicles to disseminate messages about road conditions to other vehicles. As long as these messages are trustworthy, they can greatly increase traffic safety and efficiency. Hence, care must be taken to ensure that vehicle-generated messages do not convey inaccurate or false content. A natural way to proceed is to request endorsement by nearby vehicles of the content of a message originated by a certain vehicle. However, such message generation and peer-to-peer endorsement should not result in any privacy loss on the part of the vehicles co-operating in it. We survey the available solutions to this security-privacy tension and discuss their limitations. We sketch a new privacy-preserving system which guarantees message authentication through both a priori and a posteriori countermeasures.
1 Introduction

VANETs allow vehicles to broadcast messages to other vehicles in the vicinity. It has been suggested that each vehicle periodically send single-hop messages every 300 ms, reaching vehicles within a distance of 10 s of travel time (a distance range between 10 m and 300 m) [RH05]. This mechanism can be used to improve safety and optimize traffic. However, malicious vehicles can also exploit it by sending fraudulent messages for their own profit or just to jeopardize the traffic system. Hence, the system must be designed to ensure that each message comes from a trusted source and has not been tampered with after transmission.

Another critical concern in VANETs is driving privacy or vehicle anonymity. As noted in [Dot06], a lot about a driver can be inferred if the whereabouts and the driving pattern of a car can be tracked. Admittedly, attackers can also trace vehicles by using cameras or other physical tracking, but such physical attacks can only trace specific targets and are much more expensive than monitoring the communication in VANETs. This paper addresses the latter type of attack.

2 Countermeasures for securing VANETs

VANETs improve safety only if the messages sent by vehicles are trustworthy. Dealing with fraudulent messages is a thorny issue for safety engineers due to the self-organized nature of VANETs. The situation is worsened by the privacy requirements of vehicles since, in a privacy-preserving setting, the message generators, i.e. the vehicles, are anonymous in the sense that their identities are unknown. A number of schemes have been proposed to reduce fraudulent messages; such proposals fall into two classes, namely a posteriori and a priori.

2.1 A posteriori countermeasures

A posteriori countermeasures consist of taking punitive action against vehicles that have been proven to have originated fraudulent messages. To be compatible with privacy preservation, these countermeasures require the presence of a trusted third party able to open the identities of dishonest vehicles. The revoked vehicles can then be expelled from the system. Cryptographic authentication technologies have been extensively exploited to offer a posteriori countermeasures. Some proposals use regular digital signatures [RPH06,RH07,RPAJ07,AFWZ07]. In these proposals, vehicle privacy is provided by a pseudonym mechanism, in which certificate authorities (CAs) produce many pseudonyms for each vehicle, so that attackers cannot link the signatures a vehicle produces in different periods under different pseudonyms, unless the CAs open the identities of the vehicles. The pseudonym mechanism is not very efficient due to the heavy overhead of pseudonym generation and storage. Other schemes use sophisticated cryptographic technologies such as group signatures [GBW07] or ring signatures [LSHS07,GGT06]. The latter methods are more efficient, but those using ring signatures cannot trace malicious vehicles due to the unconditional anonymity of ring signatures. Along this research line, the scheme in [GBW07] seems the most efficient one that can provide revocable anonymity.

2.2 A priori countermeasures

A priori countermeasures attempt to prevent the generation of fraudulent messages. This approach is based on the assumption that most users are honest and will not endorse any message containing false data.
Another implicit assumption is the common-sense observation that the more people endorse a message, the more trustworthy it is. Along this research line, the schemes in [GGS04,ODS07,PP05,RAH06,DDSV08] exploit the assumption that there is a majority of honest vehicles in VANETs. Hence, these schemes introduce some form of threshold mechanism: a message is trusted if it has been verifiably endorsed by a number of vehicles above a certain threshold. Among these schemes, the proposals in [DDSV08] may be the most efficient while enabling anonymity of message originators. However, their scheme does not provide anonymity revocability, which may not suit applications in which anonymity must be revoked "for the prevention, investigation, detection and prosecution of serious criminal offences" [EP05].

3 Discussion

Unfortunately, neither a posteriori nor a priori countermeasures alone suffice to secure VANETs. By taking strict punitive action, a posteriori countermeasures can deter rational attackers who produce bogus messages for benefit or pranks. However, they are ineffective against irrational attackers such as terrorists. Even for rational attackers, the damage has already occurred by the time punitive action is taken. It seems that a priori countermeasures function better in this case because they prevent damage beforehand by letting the vehicles trust only messages endorsed by a number of vehicles. Although the underlying assumption that there is a majority of honest vehicles in VANETs generally holds, it cannot be excluded that a number of malicious vehicles greater than or equal to the threshold are present at specific locations. For example, this is very plausible if some criminal organization undertakes to divert traffic from a certain area by broadcasting messages informing that a road is barred. Furthermore, for convenience of implementation, existing schemes use an even stronger assumption: that the number of honest vehicles should in all cases be at least a preset threshold. But such a universally valid threshold does not exist in practice. Indeed, the threshold should somehow take the traffic density and the message scope into account: a low density of vehicles calls for a lower threshold, whereas a high density and a message relevant to the traffic of a whole city require a sufficiently high threshold.

The situation is aggravated by the anonymity technologies used in some proposals. A system preserves anonymity when it does not require the identity of its users to be disclosed. Without anonymity, attackers can trace all the vehicles by monitoring the communication in VANETs, which in turn can enable the attackers to mount serious attacks against specific targets. Hence, anonymity is a critical concern in VANETs. However, anonymity can also weaken a posteriori and a priori countermeasures. Indeed, attackers can send fraudulent messages without fear of being caught due to anonymity, and as a result, no punitive action can be taken against them. Furthermore, some proposals provide strong anonymity, i.e. unlinkability. Unlinkability implies that a verifier cannot distinguish whether two signatures come from the same vehicle or from two different vehicles, as the toy model below illustrates.
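The following toy model (ours, not from any of the cited schemes; the token construction is only a stand-in for a real unlinkable signature) shows why unlinkability is double-edged: a single key can produce arbitrarily many endorsements that look mutually unrelated.

```python
import hashlib
import secrets

def endorse_unlinkable(sk: bytes, message: bytes) -> bytes:
    # Toy stand-in for an unlinkable (ring/group) signature: the fresh
    # random nonce makes every token look unrelated to every other,
    # even when the same key endorses the same message repeatedly.
    nonce = secrets.token_bytes(16)
    return nonce + hashlib.sha256(sk + message + nonce).digest()

sk_attacker = secrets.token_bytes(32)
msg = b"road A7 is barred"
threshold = 3

# A single vehicle meets the endorsement threshold all by itself.
endorsements = {endorse_unlinkable(sk_attacker, msg) for _ in range(threshold)}
print(len(endorsements) >= threshold)  # True: 3 distinct-looking tokens

# A message-linkable tag is deterministic in (key, message), so
# duplicates betray self-endorsement without identifying honest signers.
tags = {hashlib.sha256(sk_attacker + msg).hexdigest() for _ in range(threshold)}
print(len(tags))  # 1: the threshold was met by a single signer
```

A deterministic per-message tag, i.e. a value that is a fixed function of the signing key and the message, would make such duplicates visible without revealing the key; this is in essence the idea exploited in Section 4 below.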
This feature may enable malicious vehicles to mount the so-called Sybil attack: a vehicle generates a fraudulent message and then endorses it herself by computing on it as many signatures as required by the threshold in use; since the signatures are unlinkable, no one can find out that all of them come from the same vehicle. Hence, carefully designed protocols are required to secure VANETs when incorporating anonymity.

4 Towards a combination of a priori and a posteriori countermeasures

Bearing in mind that enhancing safety and traffic efficiency is one of the main thrusts behind VANETs, we propose a new efficient system to balance public safety and vehicle privacy. Both a priori and a posteriori countermeasures are resorted to in order to thwart attackers. To the best of our knowledge, ours is the first system equipped with both types of countermeasures. We achieve this goal by drawing on the novel technology of message-linkable group signatures (MLGS). In an MLGS scheme, a vehicle stays anonymous if it produces two signatures on two different messages. However, if it produces two signatures on the same message, then it will be identified, which effectively thwarts the Sybil attack in a privacy-preserving system. This novel technology also enables us to realize a threshold-adaptive authentication in which the threshold can adaptively change in light of the context of messages, instead of having to be preset at the system design stage. Furthermore, a fast batch verification method is presented to speed up the validation of authenticated messages. Since vehicles periodically receive a large number of messages to be validated, such batch verification is critical to make authentication implementable in VANETs. Details on the new scheme will be given in [WD08].

Acknowledgments and disclaimer

This work was partly supported by the Spanish Government through projects CONSOLIDER INGENIO 2010 CSD2007-00004 "ARES" and TSI2007-65406-C03-01 "E-AEGIS", and by the Government of Catalonia under grant 2005 SGR 00446. The authors are with the UNESCO Chair in Data Privacy, but their views do not necessarily reflect the position of UNESCO nor commit that organization.

References

[RH05] M. Raya and J.-P. Hubaux. The security of vehicular ad hoc networks. In SASN'05, 2005.
[Dot06] F. Dötzer. Privacy issues in vehicular ad hoc networks. Lecture Notes in Computer Science, vol. 3856, pp. 197-209, 2006.
[RPH06] M. Raya, P. Papadimitratos and J.-P. Hubaux. Securing vehicular communications. IEEE Wireless Communications Magazine, vol. 13, no. 5, pp. 8-15, 2006.
[RH07] M. Raya and J.-P. Hubaux. Securing vehicular ad hoc networks. Journal of Computer Security, Special Issue on Security of Ad Hoc and Sensor Networks, vol. 15, no. 1, pp. 39-68, 2007.
[RPAJ07] M. Raya, P. Papadimitratos, I. Aad, D. Jungels and J.-P. Hubaux. Eviction of misbehaving and faulty nodes in vehicular networks. IEEE Journal on Selected Areas in Communications, vol. 25, no. 8, pp. 1557-1568, 2007.
[AFWZ07] F. Armknecht, A. Festag, D. Westhoff and K. Zeng. Cross-layer privacy enhancement and non-repudiation in vehicular communication. In 4th Workshop on Mobile Ad-Hoc Networks (WMAN), Bern, Switzerland, March 2007.
[GBW07] J. Guo, J.P. Baugh and S. Wang. A group signature based secure and privacy-preserving vehicular communication framework. In Mobile Networking for Vehicular Environments, pp. 103-108, 2007.
[LSHS07] X. Lin, X. Sun, P.-H. Ho and X. Shen. GSIS: A secure and privacy preserving protocol for vehicular communications. IEEE Transactions on Vehicular Technology, vol. 56, no. 6, pp. 3442-3456, 2007.
[GGT06] C. Gamage, B. Gras and A.S. Tanenbaum. An identity-based ring signature scheme with enhanced privacy. In Proceedings of the IEEE SecureComm Conference, pp. 1-5, 2006.
[GGS04] P. Golle, D. Greene and J. Staddon. Detecting and correcting malicious data in VANETs. In Proceedings of the 1st ACM International Workshop on Vehicular Ad Hoc Networks, pp. 29-37, 2004.
[PP05] B. Parno and A. Perrig. Challenges in securing vehicular networks. In Proceedings of the ACM Workshop on Hot Topics in Networks, 2005.
[ODS07] B. Ostermaier, F. Dötzer and M. Strassberger. Enhancing the security of local danger warnings in VANETs - A simulative analysis of voting schemes. In Proceedings of the Second International Conference on Availability, Reliability and Security, pp. 422-431, 2007.
[RAH06] M. Raya, A. Aziz and J.-P. Hubaux. Efficient secure aggregation in VANETs. In Proceedings of the 3rd International Workshop on Vehicular Ad Hoc Networks - VANET 06, pp. 67-75, 2006.
[DDSV08] V. Daza, J. Domingo-Ferrer, F. Sebe and A. Viejo. Trustworthy privacy-preserving car-generated announcements in vehicular ad hoc networks. IEEE Transactions on Vehicular Technology, accepted, July 2008.
[EP05] European Parliament. Legislative resolution on the proposal for a directive of the European Parliament and of the Council on the retention of data processed in connection with the provision of public electronic communication services and amending Directive 2002/58/EC (COM(2005)0438 - C6-0293/2005 - 2005/0182(COD)), 2005.
[WD08] Q. Wu and J. Domingo-Ferrer. Improved trustworthiness of vehicular communications with a priori and a posteriori countermeasures. Manuscript in preparation, 2008.

Location Privacy in Location-Based Services: Beyond TTP-based Schemes

Agusti Solanas, Josep Domingo-Ferrer, and Antoni Martínez-Ballesté

Rovira i Virgili University, Department of Computer Engineering and Maths, UNESCO Chair in Data Privacy, Av. Països Catalans 26, E-43007 Tarragona, Catalonia, Spain
{agusti.solanas,josep.domingo,antoni.martinez}@urv.cat

Abstract. Location-Based Services (LBS) are gaining importance due to the advances in mobile networks and positioning technologies. Nevertheless, the wide deployment of LBS can jeopardise the privacy of their users, so ensuring user privacy is paramount to the success of those services. This article surveys the most relevant techniques for guaranteeing location privacy to LBS users. The rigid dichotomy between schemes which rely on Trusted Third Parties (TTP-based) and those which do not (TTP-free) is emphasised. Also, the convenience of both approaches is discussed and some ideas on the future of location privacy in these services are sketched.

Keywords: Anonymisation/pseudonymisation in LBS, Trust management in LBS.

1 Introduction

The Information Society rests on the Information and Communications Technologies (ICT). Location-Based Services (LBS) are becoming an important ICT and will eventually be available anywhere, anytime. LBS provide users with highly personalised information accessible by means of a variety of mobile devices that are able to locate themselves, e.g. by using GPS or a fixed network infrastructure with GSM [1]. Mobile devices are ubiquitous and services related to the user's current location proliferate.
Examples of LBS are location-based tourist information [2], route guidance [3], emergency assistance [4], location-based advertising [5], etc.

The extensive deployment of ubiquitous technology is not without privacy drawbacks. By sending their locations, LBS users could endanger their security and privacy because, for example, an attacker could determine their location and track them. This tracking capability of attackers opens up many computer-aided crime possibilities (harassment, car theft, kidnapping, etc.). Also, if an attacker impersonates an LBS provider, the traffic patterns of LBS users could be influenced by false information, and the users' location could be compromised [6].

There are also other attacks which aim to identify users by means of the locations contained in their queries. By identifying users, attackers can link queries to real identities. In those ways, attackers can obtain detailed profiles of the users and send them undesired advertisements or even harass them. Two examples of such identification attacks are the restricted space identification (RSI) attack and the observation identification (OI) attack. The RSI attack consists in linking locations to identities by using queries which are submitted from a restricted space (e.g. if a user submits queries from his garage in a suburban house, it is easy to link those queries to his real identity by looking up who lives in that house, for example by means of a phonebook). Similarly, the OI attack links queries to identities by observing where users are (i.e. the attacker knows the user's location because she can see him) and correlating this information with the location contained in their queries [7].

Several countries have taken legal initiatives to cope with privacy problems related to electronic communications. In Europe, the European directive on Data Protection and Privacy [8] agrees on a set of measures to assure the privacy of the users of telecommunications technologies such as LBS. Similarly, the Wireless Privacy Protection Act [9] does the same in the US. Unfortunately, all these measures regulate well-established business models, but they can hardly be applied to the new LBS that arise in ad-hoc networks created and dismantled on the fly.

Although there are many other relevant topics related to LBS (e.g. profile anonymisation [10, 11], trajectory analysis [12, 13], privacy in location-based community services [14], etc.), in this article we concentrate on the methods to protect the location privacy of LBS users who send their location to an LBS provider.

1.1 Contribution and plan of this article

In this article, we provide a survey of the most relevant and recent schemes designed to offer location privacy to LBS users. We analyse, organise and classify them in two main groups: (i) TTP-based schemes and (ii) TTP-free schemes. Moreover, we sketch some ideas on the future of location privacy in LBS and some lines for future research.

The rest of the article is organised as follows. In Section 2 we suggest a classification of the methods for location privacy in LBS proposed in the literature. Section 3 is devoted to the analysis of TTP-based schemes and Section 4 studies TTP-free approaches. Finally, Section 5 contains a brief discussion and some suggestions for future research.
2 Classification of methods for location privacy in LBS

In the simplest form of communication between an LBS user (U) and an LBS provider (P), the former sends a simple query (Q) containing an ID, his location (L) and a request for information (I) that he wants to retrieve from P. Thus, a simple query sent from U to P can be Q = {ID_U, L, I} = {ID_U, x_U, y_U, "Where is the closest bus station?"} (cf. Figure 1.A).

[Fig. 1. A: Simple communication scheme with an LBS user and an LBS provider. B: Communication scheme between an LBS user, an intermediate trusted entity and an LBS provider. C: Communication scheme between a set of collaborative LBS users and an untrusted LBS provider. Note that in scheme C the location information is not the real one (L) but a perturbed one (L'), and no TTP is used.]

By sending their current locations to P, LBS users assume that P manages their data honestly and refrains from any misuse. However, LBS providers cannot always be trusted and more complex communication schemes are needed.

Most of the solutions proposed in the literature to address the location privacy problem are based on Trusted Third Parties (TTP), i.e. entities which fully guarantee the privacy of their users. Although this approach is widely accepted, it simply moves users' trust from LBS providers to intermediate entities. By doing so, LBS providers are no longer aware of the real locations and identities of the users; trust and, by extension, power are handed over to intermediate entities such as brokers, pseudonymisers or anonymisers. The problem is that users are not necessarily satisfied by completely trusting intermediate entities or providers, especially after the recent scandals related to the disclosure of personal data by this kind of trusted entities (cf. Figure 1.B). [Footnote 1: In Autumn 2007, several data privacy disasters happened in the UK connected to Her Majesty's Revenue and Customs (HMRC). Two computer disks full of personal data on 25 million British individuals disappeared; HMRC also lost another disk containing pension records of 15,000 people and a laptop containing personal data on 400 people. In 2006 in the US, data on 26.5 million people were stolen from the home of an employee of the Department of Veterans Affairs, and queries by 658,000 users were disclosed by the AOL search engine.]

The main difference between the simple communication scheme and the TTP-based one is that in the latter the set of intermediate entities can be expected to be smaller than the number of service providers. Therefore, intermediate entities can be well known and the risk of trusting a dishonest entity is lessened. However, due to the above-mentioned scandals, many users would prefer to trust nobody, which leads to TTP-free schemes. These represent a substantial change of paradigm (cf. Figure 1.C). Instead of trusting a third party, users collaborate to protect their privacy. As explained in Section 4, there is not even a need to trust the users one collaborates with.

Figure 2 depicts our proposed classification of location privacy methods.

[Fig. 2. Classification of location privacy methods for LBS: TTP-based methods (simple, policy-based, pseudonym-based, anonymity-based) and TTP-free methods (collaboration-based, obfuscation-based, PIR-based).]
The main aim of the classification is to emphasise the rigid dichotomy between these two paradigms: (i) TTP-based methods and (ii) TTP-free methods. In the following sections we review some of the most relevant representatives of TTP-based and TTP-free methods.

3 Privacy in TTP-based schemes

TTP-based schemes are very common because they are easy to understand and develop, and because, in general, they offer a reasonable trade-off between efficiency, accuracy and privacy. Moreover, some of the ideas used in these schemes arose in more mature fields like e-commerce.

In the simple scheme described in Section 2, users send their location information and queries directly to the LBS provider. In this scheme, whatever location privacy LBS users can get depends on the honest behaviour of the LBS provider. In the following sections we concentrate on some TTP-based schemes that aim to protect the location privacy of the users.

3.1 Policy-based schemes

Policy-based schemes are one step forward in LBS privacy with respect to the simple scheme. Although the conceptual framework is the same (i.e. a user submits queries to a provider), in this case providers adhere to a set of privacy policies known by users. Hence, if providers do not properly follow their privacy policies, users have the right to ask for compensation and/or take legal action against them.

Privacy policies are legal notices that contain statements defining what service providers can do with their users' personal data. Privacy policies are published by service providers, and users decide whether such policies are acceptable to them. These policies refer to many concepts, and specific languages are used to define them [15, 16]. Users reach an agreement with providers about which data are collected, what these data are used for, and how they can be distributed to third parties. In this kind of scheme, privacy is understood as the ability of individuals to decide when, what, and how information about them is disclosed to others. Ideally, users can choose amongst a variety of policies. So, depending on the selected policy, users can save some money but, in return, providers can distribute/sell some of their data.

These schemes are widely used on the Internet, e.g. by e-commerce sites which define their privacy policies in languages such as P3P (Platform for Privacy Preferences) [17]. They have been used for automotive telematics [18], and the Geopriv (Geographic Location/Privacy) Charter of the IETF proposes their use for LBS as well [19]. A recent study on the use of policies and access control techniques can be found in [20].

3.2 Pseudonymisers

Pseudonymisers are the simplest intermediate entity between LBS users and providers. Pseudonymisers receive queries from users and, prior to forwarding them to LBS providers, replace the real IDs of the users with fake ones (i.e. pseudonyms). In this way, the real user IDs remain hidden from the provider, but pseudonymisers must store the real IDs and their corresponding pseudonyms in order to forward the answers from the providers to the users. Clearly, users must completely trust pseudonymisers, because the latter see all the location information of the former. The main problem of this technique is that an attacker (e.g. the LBS provider herself) can infer the real identity of the user by linking the user location with e.g. a public telephone directory (e.g. by using the aforementioned RSI or OI attacks [7]).
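A minimal sketch of the pseudonymiser idea just described; the class names and the in-memory table are illustrative assumptions, and a real deployment would add authentication and secure channels:

```python
import uuid

class Pseudonymiser:
    """Toy TTP that replaces real user IDs with pseudonyms before
    forwarding queries to an (untrusted) LBS provider."""

    def __init__(self, provider):
        self.provider = provider
        self.real_to_nym = {}  # the TTP must keep this mapping secret

    def forward(self, real_id, location, request):
        nym = self.real_to_nym.setdefault(real_id, uuid.uuid4().hex)
        # The provider sees only the pseudonym -- but still the exact
        # location, which is what enables RSI/OI re-identification.
        return self.provider.answer(nym, location, request)

class EchoProvider:
    def answer(self, nym, location, request):
        return f"answer for {nym} near {location}: {request}"

ttp = Pseudonymiser(EchoProvider())
print(ttp.forward("alice", (41.1, 1.25), "Where is the closest bus station?"))
```

As the comment notes, the mapping table is exactly where all the trust is concentrated, and the exact location still leaks to the provider.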
3.3 Anonymisers

Anonymisers are the most sophisticated option in TTP-based location privacy. Instead of taking care of policies or users' identifiers, anonymisers assume that communications are anonymous, i.e. LBS providers do not require an ID to answer queries. [Footnote 2: If this assumption were not made, it would be easy to track a given LBS user by simply checking the ID or the pseudonym (as in the case of pseudonymisers).] Anonymisers aim to hide the users' true identity with respect to the emitted location information. In this section we concentrate on techniques that hide the location information of users, and we assume that identifier abstraction is already guaranteed.

A very common way to hide the real location of the users from the LBS provider is by using the k-anonymity property. k-Anonymity is an interesting approach to the conflict between information loss and disclosure risk, suggested by Samarati and Sweeney [21-24]. Although it was designed for application to databases by the Statistical Disclosure Control (SDC) community, k-anonymity has been adapted to LBS privacy. In this context, we say that the location of a user is k-anonymous if it is indistinguishable from the locations of k - 1 other users. So, the fundamental idea behind k-anonymisers is to replace the real location of the user by a cloaking area in which at least k users are located. Anonymisers transform a location (x, y) at time t into ([x1, x2], [y1, y2], [t1, t2]), where ([x1, x2], [y1, y2]) is a rectangular area containing (x, y) between times t1 and t2 such that t belongs to [t1, t2]. By doing so, LBS providers cannot easily determine which of the k users in the cloaking area is really submitting the query. Many examples of this kind of approach and other similar ones based on cloaking can be found in the literature [7, 25, 26].

One of the most recent advances in anonymisers is proposed in [27], which extends a previous anonymiser [25]. The proposed anonymiser allows a user to define his personal privacy requirements, i.e. the number k of users amongst which he wants to be anonymised, and the maximum delay and location perturbation he is willing to accept. The proposal is resilient against identification attacks such as RSI and OI. However, it has some important drawbacks which, as we explain in the next section, can be avoided by TTP-free approaches: (i) the architecture relies on a TTP, so that the user must completely trust the platform mediating between him and the LBS provider; (ii) it is assumed that LBS providers are not malicious but semi-honest, which might turn out to be too much of an idealisation; and (iii) the architecture is centralised, which makes it vulnerable to Denial of Service (DoS) attacks.

In [28] a similar method called PrivacyGrid is described. Although the anonymiser described in [27] and the PrivacyGrid approach are very similar, the latter seems to be more efficient due to the grid-based cloaking techniques (i.e. bottom-up, top-down and hybrid) that it uses. Moreover, PrivacyGrid adds the l-diversity property to the already considered k-anonymity. By doing so, the privacy of LBS users is improved. Although PrivacyGrid seems to improve on the proposal in [27], it mainly suffers from the same shortcomings.

Current research on anonymisers focuses on improving the efficiency of the intermediaries and designing highly personalised services able to guarantee the privacy of the users.
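To make the cloaking step concrete, here is a minimal sketch that grows a square box around the issuer until it covers at least k users. It is only an illustration of the idea: real anonymisers (e.g., [25, 27, 28]) use careful spatial indexing and also cloak the time interval, and the function name and step size are assumptions of ours.

```python
def cloak(issuer, others, k, step=0.001):
    """Grow a square box around the issuer until it contains at least
    k users (issuer included); returns ([x1, x2], [y1, y2])."""
    x, y = issuer
    half = step
    while True:
        box = ((x - half, x + half), (y - half, y + half))
        inside = sum(
            1 for (ux, uy) in [issuer] + others
            if box[0][0] <= ux <= box[0][1] and box[1][0] <= uy <= box[1][1]
        )
        if inside >= k:
            return box  # the provider sees this area, not (x, y)
        half += step    # expand until k-anonymity is reached

others = [(41.002, 1.003), (41.010, 1.001), (40.998, 0.997)]
print(cloak((41.0, 1.0), others, k=3))
```

The trade-off discussed above is visible here: a larger k (or a sparser neighbourhood) forces a larger box, i.e. more privacy at the cost of less accuracy.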
4 Privacy in TTP-free schemes

Due to the shortcomings of TTP-based schemes, other methods that do not rely on TTPs have been proposed. First, we consider the collaboration methods that aim to obtain the same results (e.g. k-anonymity, l-diversity, efficiency) as the TTP-based ones. Then, we turn to the methods based on the obfuscation of the real location without collaboration. Finally, we point out a new location privacy trend based on Private Information Retrieval (PIR).

4.1 Collaboration-based methods

In [29], the first collaborative TTP-free algorithm for location privacy in LBS is proposed. The user perturbs his location by adding zero-mean Gaussian noise to it. Then the user broadcasts his perturbed location and requests neighbours to return perturbed versions of their locations. Amongst the replies received, the user selects k - 1 neighbours such that the group formed by the locations of these neighbours and his own perturbed location spans an area A satisfying A_min < A < A_max, where A_min is a privacy parameter (the minimum required area for cloaking) and A_max is an accuracy parameter (the maximum area acceptable for cloaking). Finally, the user sends to the LBS the centroid of the group of k perturbed locations, including his own. Since users only exchange perturbed locations, they do not need to trust each other for privacy. On the other hand, the perturbations tend to cancel each other out in the centroid, so accuracy does not degrade (the average of k zero-mean perturbations with variance σ² has zero mean and variance σ²/k; see the short derivation below). This method does not achieve k-anonymity because the centroid is used by a single user only. In addition, due to the noise cancellation, users cannot use this method several times without changing their location.

In [30], a similar peer-to-peer scheme for location privacy is presented. Its main idea is to generate cloaking areas as in [29]: users must find other users in their cover range and share their location information. Once this information is known, users can send their queries to LBS providers using the cloaking area instead of their real locations. The main shortcoming of this proposal is that users must trust other users because they exchange their real locations. Thus, a malicious user can easily obtain and publish the location of other users. Although we classify this technique as TTP-free, it can also be understood as a distributed TTP-based scheme where each user is a TTP.

In [31], the authors propose a method based on Gaussian noise addition to compute a fake location that is shared by k users (unlike in [29]). Thus, all k users use the same fake location and the LBS provider is unable to distinguish one user from the rest, so that their location becomes k-anonymous. This method was extended to support non-centralised communications in [32]. The proposal is based on a stack of modules that progressively increase the privacy achieved by users. The basic module is equivalent to the method described in [30], where users have to trust each other because they share their location. Once they know the locations of other users, they can compute a centroid that they use as their fake location. In order to allow users to exchange their locations without trusting other peers, a second module that perturbs the location is added. This module adds Gaussian noise with zero mean to the real location of users.
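Spelling out the variance claim (a standard computation; here ℓ_i are the true locations, e_i the i.i.d. zero-mean perturbations with variance σ², and ℓ̄ the true centroid):

\[
\frac{1}{k}\sum_{i=1}^{k}(\ell_i + e_i) \;=\; \bar{\ell} + \frac{1}{k}\sum_{i=1}^{k} e_i,
\qquad
E\!\left[\frac{1}{k}\sum_{i=1}^{k} e_i\right] = 0,
\qquad
\mathrm{Var}\!\left(\frac{1}{k}\sum_{i=1}^{k} e_i\right) = \frac{\sigma^2}{k}.
\]

The same computation explains the repeated-use weakness discussed next: averaging many perturbed reports of one static user cancels the noise just as effectively.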
As explained above, the centroid of locations perturbed with zero-mean Gaussian noise is quite similar to the centroid of the unperturbed locations. However, if this procedure is repeated several times with static users (i.e. users who do not change their location substantially), their real location could be deduced because of the noise cancellation (this is the main problem of [29]). To prevent this, the protocol uses privacy homomorphisms [33] to guarantee that users cannot see the real locations of other users whilst still being able to compute the centroid. Finally, a module that arranges users in a chain is added to avoid denial-of-service attacks against the central user. At the end of the protocol, users are k-anonymous and their location privacy is secured. However, the main problem of this proposal is that it cannot provide a lower bound on the location error.

4.2 Obfuscation-based methods

Obfuscation is a TTP-free alternative to collaboration-based methods. Obfuscation can be understood as the process of degrading the quality of information about a user's location, with the aim of protecting that user's privacy [34]. Some methods like the ones described in previous sections (e.g. cloaking methods) can be understood as special kinds of obfuscation because they basically modify the location information in several ways to improve the user's privacy. However, we classify them in different categories because they need TTPs and/or achieve other properties such as k-anonymity or l-diversity.

In [35] an obfuscation method based on imprecision is presented. The space is modelled as a graph where vertices are locations and edges indicate adjacency. Hence, in order to report an imprecise location, the user sends a set of vertices instead of the single vertex in which he is located. The LBS provider cannot distinguish which of the vertices is the real one. The article proposes negotiation algorithms that allow users to increase the QoS whilst maintaining their privacy. The main problem of this technique is that users and providers must share the graph modelling the space (cf. [36] for a comprehensive approach to imprecision in location systems). Other recently proposed obfuscation methods can be found in [37], where the real location of LBS users is replaced by circular areas of variable centre and radius.

SpaceTwist [38] is the most recent proposal for non-collaborative TTP-free location privacy. SpaceTwist generates an anchor (i.e. a fake point) that is used to retrieve information on the k nearest points of interest from the LBS provider. After successive queries to the LBS provider, SpaceTwist is able to determine the interest point closest to the real location, whilst the LBS provider cannot derive the real location of the user. The main advantages of this method are: (i) no TTP and no collaboration are needed; (ii) the closest interest point is always found; (iii) the location of the user is hidden in a controlled area. However, due to the lack of collaboration, this method is not able to achieve the k-anonymity and/or l-diversity properties.

4.3 PIR-based methods

A totally different approach to TTP-free LBS privacy is proposed in [39]. In that article, Private Information Retrieval (PIR) is used to provide LBS users with location privacy.
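To give a flavour of what PIR achieves, the sketch below implements the classic two-server XOR-based PIR on a toy database of points of interest. Note that this is not the scheme of [39], which is a single-server computational variant; the two-server version (which assumes the servers do not collude) is only meant to convey the core idea that a server can answer a query without learning which record was requested.

```python
import secrets

def pir_query(db_size, index):
    """Client side: two random-looking bit vectors that differ only
    at the queried index; each vector alone is uniformly random."""
    q1 = [secrets.randbelow(2) for _ in range(db_size)]
    q2 = q1.copy()
    q2[index] ^= 1
    return q1, q2

def pir_answer(db, q):
    """Server side: XOR of the records selected by the bit vector.
    The server learns nothing about which record is wanted."""
    acc = 0
    for record, bit in zip(db, q):
        if bit:
            acc ^= record
    return acc

db = [101, 202, 303, 404]  # toy POI records encoded as integers
q1, q2 = pir_query(len(db), 2)
# XORing the two servers' answers recovers exactly record 2.
print(pir_answer(db, q1) ^ pir_answer(db, q2))  # -> 303
```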
Although the idea of using PIR techniques is promising, the proposed approach requires the LBS provider to co-operate with users by following the PIR protocol; this prevents the use of the method in real environments, where LBS providers simply answer queries containing a location or an area without any regard for location privacy. However, if this shortcoming were solved, and without significant computation and efficiency penalties, using collaborative PIR amongst peers (i.e. users) could be a really promising future research line.

5 Discussion and future work

In the sections above we have reviewed some of the most recent and relevant contributions to location privacy protection in LBS. There is a clear distinction between TTP-based schemes and TTP-free ones. Although TTP-based schemes are the most common, TTP-free schemes seem superior in terms of privacy due to the following shortcomings of intermediate TTPs: (i) TTPs are critical points which can be attacked; (ii) TTPs are bottlenecks; (iii) there must be many users subscribed to a TTP for the latter to be able to compute suitable cloaking regions (offering sufficient privacy and accuracy).

In general, TTP-based schemes are weak because users rely on a single trusted entity. This entity can be impersonated by a bogus TTP created by the attacker, in which case all the information shared by users with the bogus TTP falls into the hands of the attacker. A way to mount such an attack is to tamper with transmitters or use a more powerful signal.

Despite being inferior regarding privacy, TTP-based schemes are easier to implement than collaboration-based methods because none of the infrastructure required by users to circumvent the use of a TTP is necessary. However, obfuscation-based methods are also easy to implement. We believe that there is room in the market for both approaches.

The use of the k-anonymity and l-diversity properties must be carefully considered because in some scenarios they are insufficient to preserve the user's privacy [40].

In our opinion, there are a lot of opportunities for synergy between future work in PIR and TTP-free LBS privacy. Indeed, current PIR techniques face the (very serious) limitation of needing co-operation from the database server in following the PIR protocol. If practical PIR protocols are developed which do not need such co-operation, it will be possible to use them for TTP-free location privacy: if a query can be submitted to a non-co-operative commercial LBS server in such a way that the latter does not learn what the query is about (i.e. the location supplied by the user), then one obtains a TTP-free LBS privacy protocol.

Acknowledgements

The authors are solely responsible for the views expressed in this article, which do not necessarily reflect the position of UNESCO nor commit that organisation. This work was partly supported by the Spanish Ministry of Education through projects TSI2007-65406-C03-01 "E-AEGIS" and CONSOLIDER INGENIO 2010 CSD2007-00004 "ARES", and by the Government of Catalonia under grant 2005 SGR 00446. The first author thanks Dr. Paul A. Karger (IBM T. J. Watson Research Center) for his help in providing relevant references and material.

References

1. Drane, C., Macnaughtan, M., Scott, C.: Positioning GSM telephones. IEEE Communications Magazine 36(4) (April 1998) 46-54, 59
2. Simcock, T., Hillenbrand, S.P., Thomas, B.H.: Developing a location based tourist guide application. In Johnson, C., Montague, P., Steketee, C., eds.: ACSW Frontiers '03: Proceedings of the Australasian Information Security Workshop Conference on ACSW Frontiers 2003. Volume 21 of CRPIT., Darlinghurst, Australia, Australian Computer Society, Inc. (February 2003) 177-183
3. Yoo, K., Park, D., Rhee, B.: Development of a location-based dynamic route guidance system of Korea Highway Corporation. In Satoh, K., ed.: Proceedings of the Eastern Asia Society for Transportation Studies. Volume 5., Bangkok, Eastern Asia Society for Transportation Studies (2005) 1449-1463
4. Reed, J.H., Krizman, K.J., Woerner, B.D., Rappaport, T.S.: An overview of the challenges and progress in meeting the E-911 requirement for location service. IEEE Communications Magazine 36(4) (April 1998) 30-37
5. Kölmel, B., Alexakis, S.: Location based advertising. In: The First International Conference on Mobile Business, Athens, Greece (2002)
6. Karger, P.A., Frankel, Y.: Security and privacy threats to ITS. In: The Second World Congress on Intelligent Transport Systems. Volume 5., Yokohama, Japan (November 1995) 2452-2458
7. Gruteser, M., Grunwald, D.: Anonymous usage of location-based services through spatial and temporal cloaking. In: Proceedings of MobiSys 2003: The First International Conference on Mobile Systems, Applications, and Services, San Francisco, CA, USA, USENIX Association, ACM, Sigmobile (May 2003) 31-42
8. The European Parliament and the Council: Directive 2002/58/EC on privacy and electronic communications. Official Journal of the European Communities 201 (July 2002) 37-47
9. 108th Congress: H.R. 71: The Wireless Privacy Protection Act. In: United States House of Representatives (2003-4) http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=108_cong_bills&docid=f:h71ih.txt.pdf
10. Atluri, V., Shin, H.: Efficient security policy enforcement in a location based service environment. In Baker, S., Ahn, G., eds.: Data and Applications Security. Volume 4602 of LNCS., IFIP, Springer Berlin / Heidelberg (2007) 61-76
11. Shin, H., Atluri, V., Vaidya, J.: A profile anonymization model for privacy in a personalized location based service environment. In: 9th International Conference on Mobile Data Management, MDM'08 (2008) 73-80
12. Hoh, B., Gruteser, M.: Protecting location privacy through path confusion. In: Proceedings of the First International Conference on Security and Privacy for Emerging Areas in Communications Networks (2005) 194-205
13. Terrovitis, M., Mamoulis, N.: Privacy preservation in the publication of trajectories. In: 9th International Conference on Mobile Data Management, MDM'08 (2008) 65-72
14. Ruppel, P., Treu, G., Küpper, A., Linnhoff-Popien, C.: Anonymous user tracking for location-based community services. In Hazas, M., Krumm, J., Strang, T., eds.: Second International Workshop on Location- and Context-Awareness. Volume 3987 of LNCS., Springer Berlin / Heidelberg (2006) 116-133
15. Snekkenes, E.: Concepts for personal location privacy policies. In: ACM Conference on Electronic Commerce (2001) 48-57
16. Cranor, L.F.: P3P: Making privacy policies more useful. IEEE Security & Privacy 1(6) (2003) 50-55
17. W3C: Platform for Privacy Preferences (P3P) project. Webpage (October 2007) http://www.w3.org/P3P/
18. Duri, S., Gruteser, M., Liu, X., Moskowitz, P., Perez, R., Singh, M., Tang, J.M.: Framework for security and privacy in automotive telematics. In: Proceedings of the 2nd International Workshop on Mobile Commerce, ACM Press, New York, NY, USA (2002) 25-32
19. Schulzrinne, H., Tschofenig, H., Morris, J., Cuellar, J., Polk, J.: Geolocation policy. Technical report, Internet Engineering Task Force (June 2008) http://www.ietf.org/internet-drafts/draft-ietf-geopriv-policy-17.txt
20. Bauer, L., Cranor, L.F., Reeder, R.W., Reiter, M.K., Vaniea, K.: A user study of policy creation in a flexible access-control system. In Czerwinski, M., Lund, A.M., Tan, D.S., eds.: Proceedings of the 2008 Conference on Human Factors in Computing Systems, ACM (2008) 543-552
21. Samarati, P.: Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering 13(6) (2001) 1010-1027
22. Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report, SRI International (1998)
23. Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10(5) (2002) 571-588
24. Sweeney, L.: k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10(5) (2002) 557-570
25. Gedik, B., Liu, L.: A customizable k-anonymity model for protecting location privacy. In: Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS'05) (2005) 620-629
26. Cheng, R., Zhang, Y., Bertino, E., Prabhakar, S.: Preserving user location privacy in mobile data management infrastructures. In: 6th Workshop on Privacy Enhancing Technologies (PET). Volume 4258 of Lecture Notes in Computer Science., Springer Berlin / Heidelberg (2006) 393-412
27. Gedik, B., Liu, L.: Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE Transactions on Mobile Computing 7(1) (January 2008) 1-18
28. Bamba, B., Liu, L., Pesti, P., Wang, T.: Supporting anonymous location queries in mobile environments with PrivacyGrid. In: International World Wide Web Conference, WWW (2008) 237-246
29. Domingo-Ferrer, J.: Microaggregation for database and location privacy. In Etzion, O., Kuflik, T., Motro, A., eds.: Next Generation Information Technologies and Systems - NGITS. Volume 4032 of LNCS., Springer Berlin / Heidelberg (2006) 106-116
30. Chow, C., Mokbel, M.F., Liu, X.: A peer-to-peer spatial cloaking algorithm for anonymous location-based services. In: GIS '06: Proceedings of the 14th Annual ACM International Symposium on Advances in Geographic Information Systems, Arlington, Virginia, USA, ACM (November 2006) 171-178
31. Solanas, A., Martínez-Ballesté, A.: Privacy protection in location-based services through a public-key privacy homomorphism. In: Fourth European PKI Workshop: Theory and Practice, Palma de Mallorca, Spain. Lecture Notes in Computer Science, Springer Berlin / Heidelberg (2007) 362-368
32. Solanas, A., Martínez-Ballesté, A.: A TTP-free protocol for location privacy in location-based services. Computer Communications 31(6) (April 2008) 1181-1191
33. Okamoto, T., Uchiyama, S.: A new public-key cryptosystem as secure as factoring. In: Advances in Cryptology - EUROCRYPT'98. Volume 1403 of Lecture Notes in Computer Science., Springer Berlin / Heidelberg (1998) 308
34. Duckham, M., Kulik, L.: Location privacy and location-aware computing.
In: Dynamic and Mobile GIS: Investigating Changes in Space and Time. CRC Press (2007) 35-52
35. Duckham, M., Kulik, L.: A formal model of obfuscation and negotiation for location privacy. In: Pervasive Computing. Volume 3468 of LNCS., Springer Berlin / Heidelberg (2005) 152-170
36. Duckham, M., Mason, K., Stell, J., Worboys, M.: A formal approach to imperfection in geographic information. Computers, Environment and Urban Systems 25(1) (2001) 89-103
37. Ardagna, C.A., Cremonini, M., Damiani, E., De Capitani di Vimercati, S., Samarati, P.: Location privacy protection through obfuscation-based techniques. In Baker, S., Ahn, G., eds.: Data and Applications Security. Volume 4602 of LNCS., IFIP (2007) 47-60
38. Yiu, M.L., Jensen, C.S., Huang, X., Lu, H.: SpaceTwist: Managing the trade-offs among location privacy, query performance, and query accuracy in mobile services. In: IEEE 24th International Conference on Data Engineering, ICDE'08 (2008) 366-375
39. Ghinita, G., Kalnis, P., Khoshgozaran, A., Shahabi, C., Tan, K.: Private queries in location based services: Anonymizers are not necessary. In: SIGMOD '08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, ACM (June 2008) 121-132
40. Pareschi, L., Riboni, D., Bettini, C.: Protecting users' anonymity in pervasive computing environments. In: Sixth Annual IEEE International Conference on Pervasive Computing and Communication (PERCOM'08), IEEE Computer Society (2008) 11-19

Privacy in Georeferenced Context-aware Services: A Survey

Daniele Riboni, Linda Pareschi, and Claudio Bettini

EveryWare Lab, D.I.Co., University of Milano, via Comelico 39, I-20135 Milano, Italy
{riboni,pareschi,bettini}@dico.unimi.it

Abstract. Location based services (LBS) are a specific instance of a broader class of Internet services that are predicted to become popular in the near future: context-aware services. The privacy concerns that LBS have raised are likely to become even more serious when several context data, other than location and time, are sent to service providers as part of an Internet request. This paper provides a classification and a brief survey of the privacy preservation techniques that have been proposed for this type of service. After identifying the benefits and shortcomings of each class of techniques, the paper proposes a combined approach to achieve a more comprehensive solution for privacy preservation in georeferenced context-aware services.

1 Introduction

It is widely recognized that the success of context-aware services is conditional on the availability of effective privacy protection mechanisms (see, e.g., [1, 2]). Techniques for privacy protection have been thoroughly studied in the field of databases, in order to protect microdata released from large repositories. Recently, some of these techniques have been extended and integrated with new ones to preserve the privacy of users of Location Based Services (LBS) against possibly untrusted service providers as well as against other types of adversaries [3]. The domain of service provisioning based on the location and time of a request introduces novel challenges with respect to traditional privacy protection in microdata release. This is mainly due to the dynamic nature of the service paradigm, which requires a form of online privacy preservation technique as opposed to the offline one used, for example, in the publication of a view from a database.
In the case of LBS, specific techniques are also necessary to process the spatio-temporal information describing the location and time of a request, which is also very dynamic. On the other hand, location and time are only two of the possibly many parameters characterizing the context of an Internet service request. Indeed, context information goes far beyond location and time, including data such as personal preferences and interests, current activity, physiological and emotional status, and data collected from body-worn or environmental sensors, just to name a few. Privacy protection techniques specifically developed for LBS are often insufficient and/or inadequate when applied to generic context-aware services.

Consider, for instance, cryptographic techniques proposed for LBS (e.g., [4, 5]). These techniques provide strong privacy guarantees at the cost of high computational overhead on both the client and server side; moreover, they introduce expensive communication costs. Hence, while they may be profitably applied to simple LBS such as nearest neighbor services, it is unlikely that they would be practical for complex context-aware services. On the other hand, obfuscation techniques proposed for LBS (e.g., [6, 7]) are specifically addressed to location information; hence, those techniques cannot be straightforwardly applied to other contextual domains. With respect to techniques based on identity anonymity in LBS (e.g., [8, 9]), we point out that, since many other kinds of context data besides location may help an adversary identify the owner of those data, the amount of context data to be generalized in order to enforce anonymity is large. Hence, even if filtering techniques can be used to improve the service response, it could happen that, in order to achieve the desired anonymity level, context data become too general to provide the service at an acceptable quality level [10]. For this reason, specific anonymity techniques for generic context-aware services are needed.

Moreover, in pervasive computing environments, context-aware services can exploit data provided by sensors deployed in the environment that can constantly monitor context data. Hence, if those context sources are compromised, an adversary's inference abilities may increase by taking advantage of the observation of users' behavior and of up-to-date context information. Defense techniques for privacy preservation proposed for LBS do not consider this kind of inference capability, since location and time are the only contextual parameters that are taken into account. As a result, protecting against the above-mentioned kind of attacks requires new techniques.

In this paper we survey privacy protection techniques for georeferenced context-aware services. As depicted in Figure 1, the general privacy threat we are facing is the release of sensitive associations between a user's identity and the information that she considers private. The actual privacy risk certainly depends on the adversary's model; for the purpose of this survey, unless we mention specific attacks, we adopt the general assumption that an adversary may obtain service requests and responses as well as publicly available information. We distinguish the following types of defense techniques that can be used to counter the privacy threat.

◦ Network and cryptographic protocols.
These are mainly used to prevent an adversary from accessing the content of a request or response while it is transmitted, as well as to prevent a network address from identifying the location and/or the issuer of a request.
◦ Access control mechanisms. These are used to discriminate (possibly based on context itself) the entities that can obtain certain context information.
Fig. 1. The privacy threat
◦ Obfuscation techniques. Under this name we group the techniques, usually based on generalization or partial suppression, that limit the disclosure of private information contained in a request. Intuitively, they control the release of the second part of the association describing the privacy threat.
◦ Identity anonymization techniques. These are techniques that aim at avoiding the release of the first part of the association, i.e., the identity of the issuer. The goal is to make the issuer indistinguishable among a sufficiently large number of individuals.
This classification may apply as well to defenses against LBS privacy threats; however, our description of the available approaches and solutions focuses on those for more complex context-aware services. Sections 2, 3, 4, and 5 address each of the above types of defenses, respectively. Based on the weaknesses that emerged from the analysis of the existing techniques, in Section 6 we advocate the use of a combined approach, present preliminary proposals, and illustrate the general characteristics that a comprehensive combined approach may have. Section 7 concludes the paper.
2 Network and cryptographic protocols
The development of context-aware services received impetus from technological progress in the areas of wireless communications, mobile devices, and sensors. The use of wireless channels, and more generally of insecure channels, poses a first threat to users' privacy, since it makes it easier for an adversary to acquire service requests and responses by eavesdropping on the communication or analyzing traffic on the network. In the literature, several models have been proposed for privacy preservation in context-aware systems. While some of them rely on a centralized architecture with a single trusted entity in charge of ensuring the users' privacy, other models rely on a decentralized architecture in which mobile devices use direct communication channels with service providers. In both cases, two natural countermeasures for privacy attacks are: a) implementing secure communication channels so that no third party can obtain requests/responses while they are in transit, and b) avoiding the recognition of the client's network address, even by the service provider, which may be untrusted. In order to protect point-to-point communications, in addition to standard wireless security, different cryptographic techniques can be applied. One possibility is clearly for applications to rely on SSL to encrypt communication; an alternative (or additional) possibility is to provide authentication, authorization, and channel encryption through systems like Kerberos [11]. Kerberos is based on a centralized entity, the Key Distribution Center (KDC), in charge of authenticating clients and servers in the network and of providing them with the keys needed for encrypting their communications, as sketched below.
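As a toy illustration of the core idea (not the actual Kerberos protocol, which also involves tickets, nonces, and timestamps), the following sketch shows a KDC minting a session key for two parties that each share a long-term key with it; names and the use of the pyca/cryptography Fernet recipe are our own assumptions:

from cryptography.fernet import Fernet

# Long-term keys, shared out of band between the KDC and each principal.
kdc_keys = {"alice": Fernet.generate_key(), "lbs-server": Fernet.generate_key()}

def kdc_issue_session_key(client, server):
    """The KDC mints a fresh session key and wraps one copy for each party."""
    session_key = Fernet.generate_key()
    for_client = Fernet(kdc_keys[client]).encrypt(session_key)
    for_server = Fernet(kdc_keys[server]).encrypt(session_key)  # the "ticket"
    return for_client, for_server

# The client obtains both blobs and forwards the ticket to the server; each
# side unwraps the session key with its own long-term key and can then
# communicate over an encrypted channel without the KDC in the loop.
blob_c, blob_s = kdc_issue_session_key("alice", "lbs-server")
k_client = Fernet(kdc_keys["alice"]).decrypt(blob_c)
k_server = Fernet(kdc_keys["lbs-server"]).decrypt(blob_s)
assert k_client == k_server
msg = Fernet(k_client).encrypt(b"position update")
print(Fernet(k_server).decrypt(msg))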
The centralized model that inspires Kerberos does not protect against attacks aimed at acquiring control of the KDC itself. Specific solutions for communication protection also depend on the considered architecture and adversary model, and are outside the scope of this paper.
Different approaches ([12, 13]) aim at guaranteeing a certain degree of anonymity by working at the IP level. The Tarzan system ([12]) adopts a solution based on a network overlay that clusters nodes into subnetworks called domains on the basis of their IP addresses. IP hiding is achieved by substituting the sender's IP address with the pseudonym corresponding to its domain. Moreover, when a node needs to send a packet, its communications are filtered by a special server called a mimic, which is in charge of i) substituting the IP address and any other information that could reveal the sender's identity with the adequate pseudonym, and ii) setting up a virtual path (tunnel) that guarantees the encryption of the communication. Most solutions presented in the literature apply a combination of routing protocols for IP hiding and cryptographic techniques ([14]) to protect against eavesdropping on the communication channel. Onion Routing ([15]) implements both IP hiding and message encryption. In order to conceal the sender's IP address, each message travels towards the receiver via a series of proxies, called onion routers, which choose the next component of the path, setting an unpredictable route. Each router in the path re-encrypts the message before forwarding it to the next router. However, even these solutions suffer from attacks aimed at acquiring control of one or more nodes of the network. A different application of a privacy-preserving routing protocol is presented in [16]: the proposed solution has been designed to protect the user's privacy while moving in smart environments. This solution is based on a hierarchy of trusted servers where the leaves, called portals, are aware of the user's location, while internal nodes are aware of the services provided by the environment. The user accesses the network through a portal and, according to her privacy preferences, she is assigned to an internal node, called a lighthouse, that has the task of filtering and encrypting all the communications between the user and the service provider. The lighthouse does not know the user's position but is aware of the next hop in the server hierarchy composing the path to the user's portal. Similarly, the portal does not know which service the user is asking for, but it is aware of the path to the chosen lighthouse. Privacy preservation is achieved by decoupling position data from both the identity information and the other context parameters. However, this approach requires the servers in the hierarchy to be trusted, and it does not protect against privacy attacks performed by acquiring control of one of the nodes in the structure.
The use of cryptographic techniques can also be extended to hide from the service provider the exact request parameters as well as the response. This approach has been proposed in the area of LBS, where location information is often considered sensitive by users. In particular, solutions based on this approach aim at retrieving the nearest neighbor (NN) point of interest (poi) with respect to the user's position at the time of the request.
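Before turning to those NN solutions, the layered encryption at the core of Onion Routing, mentioned above, can be illustrated with a minimal sketch; the per-hop symmetric keys are a simplifying assumption (a real circuit negotiates them and also embeds next-hop addresses in each layer):

from cryptography.fernet import Fernet

# One symmetric key per onion router on the chosen path (hypothetical setup).
path = [("r1", Fernet.generate_key()),
        ("r2", Fernet.generate_key()),
        ("r3", Fernet.generate_key())]

def wrap(message: bytes) -> bytes:
    # Encrypt for the exit router first, then add one layer per earlier hop;
    # a real onion would also carry the next hop's address inside each layer.
    onion = message
    for _, key in reversed(path):
        onion = Fernet(key).encrypt(onion)
    return onion

onion = wrap(b"query: nearest pharmacy")
for name, key in path:   # each router peels exactly one layer in turn
    onion = Fernet(key).decrypt(onion)
print(onion)             # the plaintext emerges only at the exit router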
A first solution was proposed in [4]: the authors propose a form of encrypted query processing that combines a data structure suited for managing spatial information with a cryptographic scheme for secret sharing. On the server side, location data are handled through a directed acyclic graph (DAG) whose nodes correspond to the Voronoi regions obtained by a tessellation of the space with respect to the pois stored by the service provider. The query processing is performed according to the protocol proposed in [17], which allows a client to retrieve the correct Voronoi area without communicating its precise location. The drawback of this solution is that, in order to resolve a NN query, the user needs to send a number of queries proportional to the depth of the DAG instead of a single request. The consequent communication overhead impacts network traffic and response time, which are commonly considered important factors in mobile computing. Recently, a cryptographic approach inspired by the Private Information Retrieval (PIR) field was proposed in [5]. The service provider builds a Voronoi tessellation according to the stored pois, and superimposes on top of it a regular grid of arbitrary granularity. In order to obtain the response to a NN query, the privacy preservation mechanism relies on a PIR technique that is used to encrypt the user's query and to retrieve part of the location database without revealing spatial information. Some of the strong points of this solution are that location data are never disclosed, the user's identity is hidden among the identities of all users, and no trusted third party is needed to protect the users' privacy. However, since mobile devices are often characterized by limited computational capability, the query encryption and the answer processing performed on the client side have a strong impact on service response time, network traffic, and power consumption. In particular, when applied to context-aware services that perform adaptation on a wide set of heterogeneous context data, this technique may result in unacceptable computational overhead both on the client and on the server side.
3 Access control in context-aware systems
Pervasive computing environments call for techniques to control the release of data and the access to resources on the basis of the context of users, environment, and hardware/software entities. In general, the problem of access control [18] consists in deciding whether or not to authorize a requesting entity (subject) to perform a given action on a given resource (object). Access control mechanisms have been thoroughly studied in many fields, including operating systems, databases, and distributed systems. However, the characteristic features of pervasive environments introduce novel issues that must be taken into account to devise effective access control mechanisms. In particular, differently from centralized organizational domains, pervasive environments are characterized by the intrinsic decentralization of authorization decisions, since the object owners (users, services, infrastructures) are spread through the environment and may adopt different policies regarding the disclosure of private information. Hence, specific techniques that deal with the mobility and continuously changing context of the involved entities are needed to adapt authorizations to the current situation.
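In its most basic form, the decision problem just described is a function of subject, action, object, and current context. The following toy sketch (hypothetical predicates, not any specific system reviewed below) makes the shape of such context-aware decisions concrete, in the spirit of the bank example discussed in the next paragraphs:

from dataclasses import dataclass

@dataclass
class Context:
    hour: int          # hour of the day
    location: str      # symbolic location of the subject

def authorized(subject: str, action: str, obj: str, ctx: Context) -> bool:
    """Toy rule set: accounts are readable by clerks only on-site,
    during working hours; everything else is denied by default."""
    if obj == "customer-account" and action == "read":
        return (subject == "clerk"
                and 9 <= ctx.hour < 17
                and ctx.location == "bank-branch")
    return False

print(authorized("clerk", "read", "customer-account", Context(10, "bank-branch")))  # True
print(authorized("clerk", "read", "customer-account", Context(22, "home")))         # False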
To this aim, various techniques for context-aware access control have recently been proposed. Context-aware access control strategies fall into two main categories. The first category is that of techniques aimed at granting or denying access to resources considering the context of the requesting user and of the resource (see, e.g., [19–21]). The second category is that of techniques aimed at controlling the release of a user's context data on the basis of the context of the requesting entity and of the user herself. In this section we concentrate on techniques belonging to the latter category. Techniques belonging to the former category are outside the scope of this paper and will not be reviewed; however, we point out that, since those techniques imply the release of users' context data to the access control mechanism, they generally also adopt strategies to enforce users' privacy policies.
The proposed context-aware access control mechanisms can be roughly classified into those that derive from discretionary (DAC) [22] and those that derive from role-based (RBAC) [23] access control. In DAC systems, the owner of each object is in charge of stating policies that determine access privileges on the basis of the subject's identity. These techniques are well suited to domains in which subjects do not belong to a structured organization (e.g., they are well suited to generic Internet services), since they are released from the burden of managing groups or roles of subjects. On the other hand, techniques based on RBAC (in which access privileges depend on the subject's role) are well suited to structured organizational domains (like, e.g., hospitals and companies), since the definition of functional roles simplifies the management of access control policies. Other techniques related to access control in context-aware systems include the use of access-rights graphs and hidden constraints (e.g., [24]) as well as zero-knowledge proof theory [25] (e.g., [26]). These are called secret authorization mechanisms, since they allow an entity to certify to a verifier the possession of private information (e.g., context data) while revealing neither the authorization policies nor the secret data.
In the following we briefly describe the access control techniques for context-awareness derived from the DAC and RBAC models, respectively.
Techniques derived from DAC. Even early approaches to discretionary access control allowed the expression of conditions that constrain permissions on the basis of the spatial and temporal characterization of the subject. For instance, in a bank setting, access to customer accounts could be granted to authorized personnel only during working hours and only from machines located within the bank. More recently, access control techniques specifically addressing the protection of location information (e.g., [27]) have been proposed. However, the richness and dynamics of the contextual situations that may occur in pervasive and mobile computing environments call for the definition of formal languages to express complex conditions on a multitude of context data, as well as sufficiently expressive languages to represent the context itself. To this aim, Houdini [28] provides a comprehensive formal framework to represent dynamic context data, integrate them from heterogeneous sources, and share context information on the basis of users' privacy policies.
In particular, privacy policies can be expressed considering the context of the data owner (i.e., the user) and the context of the subject. As an example, a user of a service for locating friends could state a policy to disclose her current location to her friends only if her mood is good and her current activity is not working. Privacy policies in Houdini are expressed in a restricted logic programming language supporting rule chaining but no cycles. Rule preconditions express conditions on context data, while postconditions express permissions to access contextual information; reasoning with the resulting language has low computational complexity. Policy conflict resolution is based on explicit rule priorities.
Another relevant proposal, specifically addressed to the preservation of mobile customers' privacy, can be found in [29]. That work proposes an access control system aimed at controlling the release of private data based on time, location, and the customer's preferences. For instance, a user could state a policy to disclose her location and profile information only during the weekend and if she is in a mall, and only in exchange for a discount coupon on items in her shopping list. The proposed solution is based on an intermediary infrastructure in charge of managing the locations and profiles of mobile users and of enforcing their privacy policies. A specific index structure as well as algorithms are presented to efficiently enforce the proposed techniques.
Techniques derived from RBAC. Many other existing approaches to context-aware access control are based on an extension of the RBAC model. As anticipated above, RBAC systems are well suited to structured organizational domains. However, the baseline RBAC model is not adequate for pervasive and mobile computing domains, which are characterized by the dynamics of the situations that may determine the role played by a given entity in a given context. For this reason, various proposals have been made to extend RBAC policies with contextual conditions (see, e.g., [19]), and in particular with spatio-temporal constraints (e.g., [30]). More recently, this approach has been applied to the privacy protection of personal context data. A proposal in this sense is provided by the UbiCOSM middleware [31], which tackles the comprehensive issue with mechanisms to secure access not only to the services provided by ubiquitous infrastructures, but also to users' context data, based on contextual conditions and roles. The context model of UbiCOSM distinguishes between the physical dimension, which describes the spatial characterization of the user, and the logical dimension, which describes other data such as the user's current activity and device capabilities. For instance, the context TouristAtMuseum is composed of the physical context AtMuseum (characterized by the presence of the user within the physical boundaries of a museum) and the logical context Tourist (which defines the user's role as that of a tourist). Users can declare a policy to control the release of a personal context data item as the association between a permission and a context in which the permission applies. Simple context descriptions can be composed into more complex ones by means of logical operators, and may involve the situation of multiple entities.
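A minimal sketch of this kind of composition, reusing the TouristAtMuseum example (the predicates and data are hypothetical, not UbiCOSM's actual interfaces):

# Each context is a predicate over the current situation; composite contexts
# are built with logical operators, as in the TouristAtMuseum example.
def at_museum(s):   # physical dimension
    return s.get("location") == "museum"

def tourist(s):     # logical dimension
    return s.get("role") == "tourist"

def conj(*ctxs):    # composition by logical AND
    return lambda s: all(c(s) for c in ctxs)

tourist_at_museum = conj(at_museum, tourist)

# A release policy pairs a permission with the context where it applies.
policies = [("disclose:cultural-preferences", tourist_at_museum)]

situation = {"location": "museum", "role": "tourist"}
granted = [p for p, ctx in policies if ctx(situation)]
print(granted)  # ['disclose:cultural-preferences']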
For instance, in order to find other tourists who share her interests, a user could state a policy to disclose her cultural preferences to a person only if their current context is TouristAtMuseum and they are both co-located with a person who is a friend of them both.
Another system worth mentioning is CoPS [32], which provides fine-grained mechanisms to control the release of personal context data, as well as techniques to identify misuse of the provided information. In particular, policies in CoPS are organized in a hierarchical manner, on the basis of the priority level of the policy (i.e., organization-level, user-level, default). Permissions depend on the context and the role of the subject. CoPS supports both administrator-defined and user-defined roles. While the former reflect the hierarchical structure of the organization, the latter can be used to categorize entities into groups, in order to simplify policy management by users. The system adopts a conflict resolution mechanism based on priorities and on the specificity of access control rules. Moreover, a trigger mechanism can be set up to control the release of particular context data against the frequency of updates; this technique can be used, for instance, to notify the user in case someone tries to track her movements by continuously polling her location.
Open issues and remarks. As emerges from the above analysis of the state of the art, the main strong point of the techniques derived from DAC is the efficiency of the reasoning procedures they employ to evaluate the access privileges of the requesting entity at run time. This characteristic makes them very well suited to application domains characterized by strict real-time requirements, like telecommunication and Internet services. On the other hand, the role abstraction adopted by techniques derived from RBAC can be profitably exploited not only in structured organizational domains but also in open environments (like ambient intelligence systems), since heterogeneous entities can be automatically mapped to predefined roles on the basis of the contextual situation in order to determine their access privileges. Nevertheless, some open issues about context-aware access control systems are worth considering. In particular, as in generic access control systems, a formal model to represent policies and automatically recognize inconsistencies (especially in systems supporting the definition of negative authorizations) is needed; however, only some of the techniques proposed for context-aware computing face this issue. This problem is further complicated by the fact that the privacy policy of a subject may conflict with the privacy policy of an object owner. Proposed solutions for this issue include the use of techniques for secret authorization, as proposed in [24]. Moreover, an evident weakness of these systems is their rigidity: if strictly applied, an access control policy either grants or denies access to a given object. This weakness is alleviated by the use of obfuscation techniques (reported in Section 4), which disclose the required data at different levels of accuracy on the basis of the current situation. A further critical issue for context-aware access control systems is devising techniques to support end users in defining their own privacy policies. Indeed, manual policy definition by users is an error-prone and tedious task.
For this reason, straightforward techniques to support users' policy definition consist in making use of user-friendly interfaces and default policies, as in Houdini and CoPS, respectively. However, a more sophisticated strategy to address this problem consists in the adoption of statistical techniques to automatically learn privacy policies on the basis of the past decisions of the user. To this aim, [33] proposes the application of rough set theory to extract access control policies from the observation of the user's interaction with context-aware applications during a training period. As a final remark, we point out that context-aware access control systems do not protect privacy when the access to a service is considered private information in itself (e.g., because it reveals particular interests or habits of the user). To address this issue, techniques aimed at enforcing anonymity exist and are reviewed in Section 5.
4 Obfuscation of context data
In some cases, the strict application of access control mechanisms (i.e., either denying or allowing access to a given context data item in a given situation) may be too rigid a strategy. For instance, consider the user of a service that redirects incoming calls and messages on the basis of the current activity. Suppose that the service is not completely trusted by the user; then, since she considers her current activity (e.g., MeetingCustomers) sensitive information, both allowing and denying access to her precise current activity may be unsatisfactory. Indeed, denying access to that data would make it impossible to take advantage of the service, while allowing access could result in a privacy violation. In this case, a more flexible solution is to obfuscate [34] the private data before communicating it to the service provider, in order to decrease the sensitivity level of the data. For instance, the precise current activity MeetingCustomers could be obfuscated to the more generic activity BusinessMeeting. This solution is based on the intuition that each private data item is associated with a given sensitivity level, which depends on the precision of the data itself; generally, the less precise the data, the less sensitive it is. Obfuscation techniques have been applied to the protection of microdata released from databases (e.g., [35]). Several techniques based on obfuscation have also been proposed to preserve the privacy of the users of context-aware services. These techniques are generally coupled with an access control mechanism to tailor the obfuscation level to be enforced according to the trustworthiness of the subject and to the contextual situation. However, in this section we concentrate on works that specifically address context data obfuscation. The main research issue in this field is to devise techniques that provide adequate privacy preservation while retaining the usefulness of the data for context-awareness purposes. We point out that, differently from the techniques based on anonymity (reviewed in Section 5), the techniques considered in this section do not protect against the disclosure of the user's identity. Various obfuscation-based techniques to control the release of location information have recently been proposed (see, e.g., [36, 6, 7]), based on the generalization or perturbation of the user's precise position.
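As a toy illustration of generalization, a precise position can be snapped to the cell of a coarser grid before release; this is a sketch of the general idea only, not of any specific proposal from [36, 6, 7], and the coordinates are made up:

def generalize_position(lat: float, lon: float, cell_deg: float = 0.01):
    """Release only the bounding box of the grid cell containing the user,
    so any position inside the cell is indistinguishable from the others."""
    lat0 = (lat // cell_deg) * cell_deg
    lon0 = (lon // cell_deg) * cell_deg
    return (lat0, lon0, lat0 + cell_deg, lon0 + cell_deg)

# A roughly 1 km cell around a precise GPS fix.
print(generalize_position(45.47612, 9.22345))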
One of the first attempts to support privacy in generic context-aware systems through obfuscation mechanisms is the semantic eWallet [37], an architecture that supports context-awareness by means of techniques to retrieve users' context data while enforcing their privacy preferences. Users of the semantic eWallet may express preferences about the accuracy level of their context data based on the requester's identity and on the context of the request. The system supports both abstraction and falsification of context information. By abstraction, the user can decide to generalize the provided data, or to omit some details about it. For instance, a user involved in a BusinessMeeting could decide to disclose her precise activity to a colleague only during working hours and if they are both located within a company building; the activity would be generalized to Meeting in the other cases. On the other hand, by falsification the user can decide to deliberately provide false information in order to mask her precise current context in certain situations. For instance, a CEO could reveal to her secretary that she is currently AtTheDentist, while telling the other employees that she is involved in a BusinessMeeting. In the semantic eWallet, context data are represented by means of ontologies. Obfuscation preferences are encoded as rules whose preconditions include a precise context data item and the conditions for obfuscation, and whose postconditions express the obfuscated context data to be disclosed if the preconditions hold.
While in the semantic eWallet the mapping between precise and obfuscated information must be explicitly stated case by case, a more scalable approach to the definition of obfuscation preferences is proposed in [38]. That work copes with the multi-party ownership of context information in pervasive environments by proposing a framework to retrieve context information and distribute it on the basis of the obfuscation preferences stated by the data owner. It is worth noting that in the proposed framework the owner of the data is not necessarily the actual proprietor of the context source; instead, the data owner is the person whom the data refers to. For instance, the owner of the data provided by a server-side positioning system is the user, not the manager of the positioning infrastructure; hence, the definition of obfuscation preferences about personal location is left to the user. Obfuscation preferences are expressed by conditions on the current context, by specific context data, and by a maximum detail level at which that data can be disclosed in that context. The level of detail of a context data item refers to the specificity of that data according to a predefined obfuscation ontology. Context data in an obfuscation ontology are organized as nodes in a hierarchy, such that parent nodes represent more general concepts than their children; e.g., the activity MeetingCustomers has parent activity BusinessMeeting, which in turn has parent activity Working. For instance, an obfuscation preference could state that the user's current activity be disclosed with level 2 specificity when the requester is Bob and the request is made during working hours.
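A minimal sketch of this level-based generalization over an activity hierarchy (hypothetical data, mirroring the Working / BusinessMeeting / MeetingCustomers example, with the root at level 0):

# Parent links of a toy obfuscation ontology; "Activity" is the root.
parent = {"MeetingCustomers": "BusinessMeeting",
          "BusinessMeeting": "Working",
          "Working": "Activity",
          "Activity": None}

def level(node):
    return 0 if parent[node] is None else 1 + level(parent[node])

def generalize(node, max_level):
    """Walk up the hierarchy until the data is no more specific than allowed."""
    while level(node) > max_level:
        node = parent[node]
    return node

print(generalize("MeetingCustomers", 2))  # -> 'BusinessMeeting'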
If those conditions hold, the released data is computed by generalizing the exact current activity up to the second level of the Activity obfuscation ontology (i.e., up to the level of the grandchildren of the root node), or to a lower level if the available information is less specific than that stated by the preference. Since manually organizing context data in an obfuscation ontology could be impractical, a technique to automatically discover reasoning modules able to derive the data at the required specificity level is also presented.
Based on the consideration that the quality of context information (QoC, intended as its closeness to the physical reality it describes) is a strong indicator of privacy sensitivity, Sheikh et al. propose the use of QoC to enforce users' privacy preferences [39]. In that work, the actual quality of the disclosed context data is negotiated between service providers and users. When a service provider needs a data item regarding a user's context, it specifies the QoC that it needs for that data in order to provide the service. On the other hand, the user specifies the maximum QoC she is willing to disclose for that data in order to take advantage of the service. Service requirements and the user's privacy preferences are communicated to a middleware that is in charge of verifying whether they are incompatible (i.e., whether the service requires the data at a quality the user is not willing to provide). If this is not the case, obfuscation mechanisms are applied to the data in order to reach the quality level required by the service provider. QoC is specified on the basis of five indicators, i.e., precision, freshness, spatial and temporal resolution, and probability of correctness. Each context data item is associated with five numerical values that express the quality of the data with respect to each of the five indicators. Given a particular context situation, a user can specify her privacy preferences for a context data item by defining, for each of the five indicators, the maximum quality level that she is willing to disclose in that situation. For instance, the user of a remote health monitoring service could state that vague context information be disclosed to the caregivers in a non-emergency context, while accurate data be provided in case of emergency.
One inherent weakness of obfuscation techniques for privacy in context-awareness is evident: if the service provider requires a context data item at a quality that the user is not willing to disclose, access to that service is not possible. In order to overcome this issue, anonymization techniques (presented in Section 5) have been proposed, which protect against the disclosure of the user's identity while possibly providing accurate context information.
5 Identity anonymization techniques
While obfuscation techniques aim at protecting the right-hand side of the sensitive association (SA) (see Figure 1), the goal of techniques for identity anonymization is to protect the left-hand side of the SA, in order to prevent an adversary from re-identifying the issuer of a request. In the area of database systems, the notion of k-anonymity has been introduced [40] to formally define when, upon release of a certain database view containing records about individuals, the individual corresponding to any specific sensitive set of data in the view can be considered indistinguishable among at least k individuals.
In order to enforce anonymity it is necessary to determine which attributes in a table play the role of quasi-identifiers (qi), i.e., data that, joined with external knowledge, may help the adversary restrict the set of candidate individuals. Techniques for database anonymization adopt generalization of qi values and/or suppression of records in order to guarantee that the set of released records can be partitioned into groups of at least k records having the same values for the qi attributes (called qi-groups). Since each individual is assumed to be the respondent of a single record, this implies that there are at least k candidate respondents for each released record. The idea of k-anonymity has also been applied to define a privacy metric in location based services, as a specific kind of context-aware service [8]. In this case, the information being released is the information in the service request. In particular, the information about the user's location may be used by an adversary to re-identify the issuer of the request if the adversary has access to external information about users' locations. Attacks and defense techniques in this context have been investigated in several papers, among which [8, 9]. Moreover, a formal framework for the categorization of defense techniques with respect to the adversary's knowledge assumptions has been proposed in [3]. According to that categorization, when the adversary performs his attack using information contained in a single request, the attack is said to be single-issuer; otherwise, when the adversary may compare information included in requests by multiple users, the attack is said to be multiple-issuers. Moreover, cases in which the adversary can acquire information only during a single time granule are called static (or snapshot), while contexts in which the adversary may observe multiple requests issued by the same users in different time granules are called dynamic (or historical). A possible technique to enforce anonymity in LBS is to generalize the precise location data in a request to an area including a set (called the anonymity set [41]) of other potential issuers. An important difference between the anonymity set in service requests and the qi-group in databases is that while the qi-group includes only identities actually associated with a record in the table, the anonymity set also includes users that did not issue any request but that are potential issuers with respect to the adversary's external knowledge.
With respect to identity anonymization in generic context-aware systems, it is evident that many other kinds of context data besides location may be considered qi. Hence, a large amount of context data must be generalized in order to enforce anonymity. As a consequence, the granularity of the generalized context data released to the service provider could be too coarse to provide the service at an acceptable quality level. In order to limit the information loss due to the generalization of context data, four different personalized anonymization models are proposed in [42]. These models allow a user to constrain the maximum level of location and profile generalization while still guaranteeing the desired level of anonymity. For instance, a user could decide to constrain the maximum level of location generalization to an area of 1 km2, while imposing no constraints on the level of generalization of her profile.
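To make the spatial case concrete, the following toy sketch grows a cloaking box around the issuer until it covers at least k potential issuers; the data are made up, and real generalization algorithms are considerably more sophisticated:

def cloak(issuer, users, k, step=0.5, max_half=10.0):
    """Grow a square box centered on the issuer until >= k users fall inside."""
    (x, y), half = users[issuer], step
    while half <= max_half:
        inside = [u for u, (ux, uy) in users.items()
                  if abs(ux - x) <= half and abs(uy - y) <= half]
        if len(inside) >= k:
            return (x - half, y - half, x + half, y + half), len(inside)
        half += step
    return None  # the desired anonymity level is not reachable

users = {"u1": (0.0, 0.0), "u2": (0.4, 0.2), "u3": (1.1, -0.8),
         "u4": (2.5, 2.5), "u5": (-0.9, 1.0)}
print(cloak("u1", users, k=4))  # box covering u1, u2, u3, u5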
As outlined in the introduction, sensing technologies deployed in pervasive environments can be exploited by adversaries to constantly monitor users' behavior, thus exposing the user to novel kinds of privacy attacks, like the one presented in [43]. That work shows that, even when k-anonymity is enforced, in particular cases the attacker may recognize the actual issuer of a service request by monitoring the behavior of the potential issuers with respect to the service responses. For example, consider a pervasive system in a gym, suggesting exercises on the basis of gender, age, and physiological data retrieved from body-worn sensors. Even if users are anonymous within a set of k potential issuers, the attacker can easily recognize the issuer of a particular request if, within a reasonable lapse of time, she starts to use a machine the system suggested to her but did not suggest to any other potential issuer. The proposed solution relies on an intermediary entity that filters all the communications between users and service providers, calculates the privacy threats corresponding to the possible alternatives suggested by the service (e.g., the next exercise to perform), and automatically filters out unsafe alternatives.
A further issue to be considered is the defense against the well-known problem of homogeneity [44] identified in the field of databases. Homogeneity attacks can be performed if all the records belonging to a qi-group have the same value of sensitive information. In this case it is clear that the adversary may easily violate the users' privacy even though anonymity is formally enforced. The same problem may arise in context-aware services if an adversary recognizes that all the users in an anonymity set actually issued a request with the same value of private information. To our knowledge, a first effort to defend against such attacks in context-aware systems has been presented in [45]. That proposal aims at protecting against multiple-issuers historical attacks by applying a bounded generalization of both context data and service parameters.
6 Towards a comprehensive framework for privacy protection in context-aware systems
Based on the weaknesses that emerged from the analysis of the proposed techniques, in this section we advocate the use of a combined approach to address the comprehensive issue of privacy in context-awareness; we present existing proposals, and we illustrate the logical design of a framework intended to solve most of the identified problems.
On the need for a combined approach The analysis of the state of the art reported in the previous sections has shown that each of the proposed approaches, even if effective in a particular scenario and under particular assumptions, fails to provide a solution to the general problem. In particular:
◦ cryptographic techniques for private information retrieval presented up to the time of writing are unfeasible for complex context-aware services, due to problems of bandwidth and computational resource consumption;
◦ protecting the privacy of the communication between the context source and the context data consumer (e.g., the service provider) is useless when the context data consumer itself is untrusted;
◦ access control techniques (possibly coupled with obfuscation) are ineffective when the access to a service is sensitive information by itself, since they do not protect from the disclosure of the user's identity.
Moreover, they do not prevent a malicious subject from adopting reasoning techniques in order to derive new sensitive information from data it is authorized to access;
◦ techniques for identity anonymity rely on exact knowledge of the external information available to an adversary. However, especially in pervasive and mobile computing scenarios, such knowledge is very hard to obtain, and adopting worst-case assumptions about the external information leads to a significant degradation of the quality of the released context data.
These observations call for the combination of different approaches in order to protect against the different kinds of attacks that can be mounted against the privacy of users taking advantage of context-aware services.
Proposed techniques Proposals to combine different approaches in a common framework have recently been presented. In [46], an architecture for privacy-conscious context aggregation and reasoning is illustrated. The proposed solution adopts client-side reasoning modules to abstract raw context data into significant descriptions of the user's situation (e.g., current activity and stereotype) that can be useful for adaptation. The release of private context information is controlled by context-aware access control policies, and the access to context information by service providers is mediated by a trusted intermediary infrastructure in charge of enforcing anonymity. Moreover, cryptographic techniques are used to protect communications inside the user's trusted domain. Papadopoulou et al. present in [47] a practical solution to enforce anonymity. In that work, no assumptions about the external knowledge available to an adversary are made; hence, the proposed technique does not formally guarantee a given anonymity level. For this reason, the anonymization technique is coupled with access control and obfuscation mechanisms in order to protect privacy in case an adversary is able to discover the user's identity. The technique is applied using the virtual identity metaphor. A virtual identity is essentially the subset of context data that a user is willing to share with a third party in a given situation; in addition, since anonymity is not formally guaranteed, part of the shared context data can be obfuscated on the basis of privacy policies in order to hide some sensitive details. For instance, a person could decide to share her preferences regarding shopping items and leisure activities, as well as her obfuscated location, when she is on vacation (using a tourist virtual identity), while hiding that information when she is traveling for work (using a worker virtual identity).
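A toy sketch of the virtual identity idea (the profiles and attribute names are hypothetical; [47] lets users define these through preferences rather than code):

# Each virtual identity whitelists the context data it exposes and may
# mark some of it for obfuscation before release.
identities = {
    "tourist": {"share": ["shopping-prefs", "leisure-prefs", "location"],
                "obfuscate": ["location"]},
    "worker":  {"share": ["work-calendar"], "obfuscate": []},
}

def release(situation, context_data):
    vid = identities["tourist" if situation == "vacation" else "worker"]
    out = {}
    for attr in vid["share"]:
        if attr in context_data:
            value = context_data[attr]
            out[attr] = "coarse(%s)" % value if attr in vid["obfuscate"] else value
    return out

data = {"shopping-prefs": "books", "leisure-prefs": "sailing",
        "location": "45.476,9.223", "work-calendar": "busy"}
print(release("vacation", data))   # prefs plus an obfuscated location
print(release("business", data))   # only the worker identity's data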
With respect to the problem introduced by multiple requests issued by the same user, specific techniques are presented to prevent an adversary from linking different virtual identities to the same (anonymous) user.
While the above mentioned works try to protect the privacy of users accessing a remote service, the AnonySense system [48] is aimed at supporting privacy in opportunistic sensing applications, i.e., applications that leverage opportunistic networks formed by mobile devices to acquire aggregated context data in a particular region. To reach this goal, the geographic area is logically partitioned into tiles large enough to probabilistically gain k-anonymity, i.e., regions visited with high probability by more than k persons during a given time granule. Measurements of context data are reported by mobile nodes specifying the tile they refer to and the time interval during which they were acquired. Moreover, in order to provide a second layer of privacy protection, obfuscation is applied to the sensed data by fusing the values provided by at least l nodes (l ≤ k) before communicating the aggregated data to the application. Cryptographic techniques are used to enforce anonymous authentication of the users of the system.
Towards a comprehensive framework We now illustrate how existing techniques can be extended and combined in a logical multilayer framework, which is graphically depicted in Figure 2. This framework is partially derived from the preliminary architecture described in [46]. However, the model presented here is intended to provide a more comprehensive privacy solution, addressing problems regarding sensor and profile data aggregation and reasoning (including obfuscation), context-aware access control and secret authorization, anonymous authentication, identity anonymity, and anonymous/encrypted communication. Clearly, the actual techniques to be applied for protecting privacy depend on the current context (the users' situation, the available services, network and environmental conditions). However, we believe that this framework is flexible enough to provide effective privacy protection in most pervasive and mobile computing scenarios.
Fig. 2. The envisioned framework
The framework is composed of the following layers:
◦ Sensors layer: This layer includes body-worn and environmental sensors that communicate context data to the upper layers through encrypted channels using energy-efficient cryptographic protocols (e.g., those based on elliptic curves [49], like in Sun SPOT sensors [50]). We assume that this layer is within the trusted domain of the user (i.e., sensors do not deliberately provide false information).
◦ User device layer: This layer is in charge of managing the user's profile information (i.e., context data that are almost static, like personal information, interests, and preferences) and privacy policies. Upon update of this information by the user, the new information is communicated to the upper layer. Moreover, this layer is in charge of fusing the context data provided by body-worn sensors and of communicating them in an aggregated form to the upper layer on a per-request basis (e.g., when those data are required by a service for performing adaptation). This layer is deployed on the user's device, which is assumed to be trusted (traditional security issues are not addressed here); communications with the upper layer are performed through encrypted channels.
◦ Context provider layer: This layer is in charge of fusing the sensor data provided by the lower layers, including data provided by sensors that are not directly within the communication range of the user device. Moreover, according to the user's policies, it performs context reasoning and obfuscation for privacy and adaptation purposes, as described in [46]. It communicates the user's credentials, privacy policies, and context data to the upper layer on a per-request basis through encrypted channels. This layer belongs to the user's trusted domain; depending on the device capabilities, it can be deployed on the user's device itself or on another trusted machine.
◦ Context-aware privacy module layer: This layer is in charge of anonymously authenticating the user on the upper layer and of enforcing her context-aware access control policies, possibly after a phase of secret negotiation with the third party. Moreover, depending on the user's policies, it can anonymize the user's identity on the basis of (either precise or statistical) trusted information received from the upper layer (e.g., spatio-temporal information about users received from a trusted location server). Protocols for anonymous/encrypted communication are adopted to provide credentials, context data, and service parameters to the upper layer. This layer belongs to the user's trusted domain. Depending on device capabilities and on the characteristics of the actual algorithms it adopts (e.g., to enforce anonymity), this layer can be implemented on the user's device, on another trusted machine, or on the infrastructure of a trusted entity (e.g., the network operator).
◦ Services layer: This layer is composed of context-aware service providers and other infrastructural services (e.g., location servers). Typically, this layer is assumed not to belong to the user's trusted domain, even if particular services can be trusted by the user (e.g., a network operator's location server).
7 Conclusions
Through a classification into four main categories of techniques, we have described the state of the art of privacy preservation for georeferenced context-aware services. While previous work has also proposed the combination of techniques from two or more categories, we claim that a deeper integration is needed, and we propose the architecture of a comprehensive framework towards this goal. Clearly, there is still a long way to go in order to refine the architecture, work out the details of its components, implement and integrate the actual techniques, and test the framework on real applications. Moreover, there are still several other aspects, not considered in our paper, that deserve investigation. For example, well-known techniques for context reasoning may have to be taken into account, since released context data may determine the disclosure of other context data, possibly leading to privacy leaks that were previously unidentified. Furthermore, computationally expensive techniques (e.g., those making use of ontological reasoning or complex cryptographic algorithms) pose serious scalability issues that may limit their applicability in real-world scenarios. Finally, since access to the context data of real users is generally unavailable for privacy reasons, sophisticated simulation environments are needed to evaluate the actual effectiveness of privacy preservation mechanisms in realistic situations.
Acknowledgments This work was partially supported by the National Science Foundation (NSF) under grants N. CNS-0716567 and N. IIS-0430402, and by the Italian MIUR under grant InterLink II04C0EC1D.
References
1. Palen, L., Dourish, P.: Unpacking "privacy" for a networked world. In: Proceedings of the 2003 Conference on Human Factors in Computing Systems (CHI 2003), ACM (2003) 129–136
2. Lederer, S., Hong, J.I., Dey, A.K., Landay, J.A.: Personal privacy through understanding and action: five pitfalls for designers. Personal and Ubiquitous Computing 8(6) (2004) 440–454
3. Bettini, C., Mascetti, S., Wang, X.S.: Privacy Protection through Anonymity in Location-based Services. Handbook of Database Security: Applications and Trends (2008) 509–530
4. Atallah, M.J., Frikken, K.B.: Privacy-Preserving Location-Dependent Query Processing. In: ICPS '04: Proceedings of the IEEE/ACS International Conference on Pervasive Services, IEEE Computer Society (2004) 9–17
5. Ghinita, G., Kalnis, P., Khoshgozaran, A., Shahabi, C., Tan, K.L.: Private queries in location based services: anonymizers are not necessary. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2008), ACM (2008) 121–132
6. Ardagna, C.A., Cremonini, M., Damiani, E., De Capitani di Vimercati, S., Samarati, P.: Location Privacy Protection Through Obfuscation-Based Techniques. In: Proceedings of the 21st Annual IFIP WG 11.3 Working Conference on Data and Applications Security (DBSec'07). Volume 4602 of Lecture Notes in Computer Science., Springer (2007) 47–60
7. Yiu, M.L., Jensen, C.S., Huang, X., Lu, H.: SpaceTwist: Managing the Trade-Offs Among Location Privacy, Query Performance, and Query Accuracy in Mobile Services. In: Proceedings of the 24th International Conference on Data Engineering (ICDE 2008), IEEE Computer Society (2008) 366–375
8. Gruteser, M., Grunwald, D.: Anonymous Usage of Location-Based Services Through Spatial and Temporal Cloaking. In: Proc. of the 1st International Conference on Mobile Systems, Applications and Services (MobiSys), USENIX Association (2003) 31–42
9. Gedik, B., Liu, L.: Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms. IEEE Transactions on Mobile Computing 7(1) (2008) 1–18
10. Aggarwal, C.C.: On k-Anonymity and the Curse of Dimensionality. In: Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), ACM (2005) 901–909
11. Neuman, B., Ts'o, T.: Kerberos: an authentication service for computer networks. IEEE Communications Magazine 32(9) (1994) 33–38
12. Freedman, M.J., Morris, R.: Tarzan: a peer-to-peer anonymizing network layer. In: CCS '02: Proceedings of the 9th ACM Conference on Computer and Communications Security, ACM (2002) 193–206
13. Reiter, M.K., Rubin, A.D.: Anonymous web transactions with crowds. Commun. ACM 42(2) (1999) 32–48
14. Dingledine, R., Mathewson, N., Syverson, P.: Tor: the second-generation onion router. In: SSYM'04: Proceedings of the 13th USENIX Security Symposium, USENIX Association (2004) 21–21
15. Goldschlag, D., Reed, M., Syverson, P.: Onion routing. Commun. ACM 42(2) (1999) 39–41
16. Al-Muhtadi, J., Campbell, R., Kapadia, A., Mickunas, M.D., Yi, S.: Routing Through the Mist: Privacy Preserving Communication in Ubiquitous Computing Environments. In: Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'02), IEEE Computer Society (2002) 74
17. Atallah, M.J., Du, W.: Secure multi-party computational geometry. In: WADS '01: Proceedings of the 7th International Workshop on Algorithms and Data Structures, Springer-Verlag (2001) 165–179
18. Samarati, P., De Capitani di Vimercati, S.: Access Control: Policies, Models, and Mechanisms. In: Foundations of Security Analysis and Design, Tutorial Lectures. Volume 2171 of Lecture Notes in Computer Science., Springer (2001) 137–196
19. Kumar, A., Karnik, N.M., Chafle, G.: Context sensitivity in role-based access control. Operating Systems Review 36(3) (2002) 53–66
20. Covington, M.J., Fogla, P., Zhan, Z., Ahamad, M.: A Context-Aware Security Architecture for Emerging Applications. In: Proceedings of the 18th Annual Computer Security Applications Conference (ACSAC 2002), IEEE Computer Society (2002) 249–260
21. Toninelli, A., Montanari, R., Kagal, L., Lassila, O.: Proteus: A Semantic Context-Aware Adaptive Policy Model. In: Proceedings of the 8th IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY 2007), IEEE Computer Society (2007) 129–140
22. Sandhu, R., Samarati, P.: Access Control: Principles and Practice. IEEE Communications 32(9) (1994) 40–48
23. Sandhu, R.S., Coyne, E.J., Feinstein, H.L., Youman, C.E.: Role-Based Access Control Models. IEEE Computer 29(2) (1996) 38–47
24. Hengartner, U., Steenkiste, P.: Avoiding Privacy Violations Caused by Context-Sensitive Services. Pervasive and Mobile Computing 2(3) (2006) 427–452
25. Brands, S.A.: Rethinking Public Key Infrastructures and Digital Certificates: Building in Privacy. MIT Press (2000)
26. Wang, C.D., Feng, L.C., Wang, Q.: Zero-Knowledge-Based User Authentication Technique in Context-aware System. In: Proceedings of the International Conference on Multimedia and Ubiquitous Engineering (MUE '07), IEEE Computer Society (2007) 874–879
27. Hengartner, U., Steenkiste, P.: Access control to people location information. ACM Trans. Inf. Syst. Secur. 8(4) (2005) 424–456
28. Hull, R., Kumar, B., Lieuwen, D., Patel-Schneider, P., Sahuguet, A., Varadarajan, S., Vyas, A.: Enabling Context-Aware and Privacy-Conscious User Data Sharing. In: Proceedings of the 2004 IEEE International Conference on Mobile Data Management (MDM'04), IEEE Computer Society (2004) 187–198
29. Atluri, V., Shin, H.: Efficient Security Policy Enforcement in a Location Based Service Environment. In: Proceedings of Data and Applications Security XXI, 21st Annual IFIP WG 11.3 Working Conference on Data and Applications Security. Volume 4602 of Lecture Notes in Computer Science., Springer (2007) 61–76
30. Atluri, V., Chun, S.A.: A geotemporal role-based authorisation system. International Journal of Information and Computer Security 1(1–2) (2007) 143–168
31. Corradi, A., Montanari, R., Tibaldi, D.: Context-Based Access Control Management in Ubiquitous Environments. In: Proceedings of the 3rd IEEE International Symposium on Network Computing and Applications (NCA 2004), IEEE Computer Society (2004) 253–260
32. Sacramento, V., Endler, M., Nascimento, F.N.: A Privacy Service for Context-aware Mobile Computing. In: Proceedings of the First International Conference on Security and Privacy for Emerging Areas in Communications Networks (SECURECOMM '05), IEEE Computer Society (2005) 182–193
33. Zhang, Q., Qi, Y., Zhao, J., Hou, D., Zhao, T., Liu, L.: A Study on Context-aware Privacy Protection for Personal Information.
In: Proceedings of the 16th IEEE International Conference on Computer Communications and Networks (ICCCN 2007), IEEE Computer Society (2007) 1351–1358
34. Bakken, D.E., Parameswaran, R., Blough, D.M., Franz, A.A., Palmer, T.J.: Data Obfuscation: Anonymity and Desensitization of Usable Data Sets. IEEE Security & Privacy 2(6) (2004) 34–41
35. Xiao, X., Tao, Y.: Personalized privacy preservation. In: SIGMOD '06: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, ACM Press (2006) 229–240
36. Duckham, M., Kulik, L.: A Formal Model of Obfuscation and Negotiation for Location Privacy. In: Proceedings of the Third International Conference on Pervasive Computing (PERVASIVE 2005). Volume 3468 of Lecture Notes in Computer Science., Springer (2005) 152–170
37. Gandon, F.L., Sadeh, N.M.: Semantic web technologies to reconcile privacy and context awareness. J. Web Sem. 1(3) (2004) 241–260
38. Wishart, R., Henricksen, K., Indulska, J.: Context Privacy and Obfuscation Supported by Dynamic Context Source Discovery and Processing in a Context Management System. In: Proceedings of the 4th International Conference on Ubiquitous Intelligence and Computing (UIC 2007). Volume 4611 of Lecture Notes in Computer Science., Springer (2007) 929–940
39. Sheikh, K., Wegdam, M., van Sinderen, M.: Quality-of-Context and its use for Protecting Privacy in Context Aware Systems. Journal of Software 3(3) (2008) 83–93
40. Samarati, P.: Protecting Respondents' Identities in Microdata Release. IEEE Trans. on Knowledge and Data Engineering 13(6) (2001) 1010–1027
41. Pfitzmann, A., Köhntopp, M.: Anonymity, unobservability, and pseudonymity - a proposal for terminology. In: Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability. Volume 2009 of LNCS., Springer (2000) 1–9
42. Shin, H., Atluri, V., Vaidya, J.: A Profile Anonymization Model for Privacy in a Personalized Location Based Service Environment. In: Proceedings of the 9th International Conference on Mobile Data Management (MDM'08) (2008) 73–80
43. Riboni, D., Pareschi, L., Bettini, C.: Shadow Attacks to Users' Anonymity in Pervasive Computing Environments. Journal of Pervasive and Mobile Computing (to appear) DOI: 10.1016/j.pmcj.2008.04.008
44. Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-Diversity: Privacy Beyond k-Anonymity. In: Proceedings of ICDE 2006, IEEE Computer Society (2006)
45. Riboni, D., Pareschi, L., Bettini, C., Jajodia, S.: Preserving Privacy in LBS against Attacks based on Concurrent Requests. Technical Report TR 24-07, University of Milan (2007)
46. Pareschi, L., Riboni, D., Agostini, A., Bettini, C.: Composition and Generalization of Context Data for Privacy Preservation. In: Sixth Annual IEEE International Conference on Pervasive Computing and Communications (PerCom 2008), Proceedings of the Workshops, IEEE Computer Society (2008) 429–433
47. Papadopoulou, E., McBurney, S., Taylor, N., Williams, M.H., Dolinar, K., Neubauer, M.: Using User Preferences to Enhance Privacy in Pervasive Systems. In: Proceedings of the Third International Conference on Systems (ICONS 2008), IEEE Computer Society (2008) 271–276
48. Kapadia, A., Triandopoulos, N., Cornelius, C., Peebles, D., Kotz, D.: AnonySense: Opportunistic and Privacy-Preserving Context Collection. In: Proceedings of the 6th International Conference on Pervasive Computing (Pervasive 2008).
Volume 5013 of Lecture Notes in Computer Science, Springer (2008) 280–297
49. Miller, V.S.: Use of Elliptic Curves in Cryptography. In: Proceedings of Advances in Cryptology (CRYPTO '85). Volume 218 of Lecture Notes in Computer Science, Springer (1986) 417–426
50. Simon, D., Cifuentes, C., Cleal, D., Daniels, J., White, D.: Java on the bare metal of wireless sensor devices: the Squawk Java virtual machine. In: Proceedings of the 2nd International Conference on Virtual Execution Environments (VEE 2006), ACM (2006) 78–88

Pattern-Preserving k-Anonymization of Sequences and its Application to Mobility Data Mining

Ruggero G. Pensa1, Anna Monreale2, Fabio Pinelli1, and Dino Pedreschi2
1 ISTI - CNR, Area della Ricerca di Pisa, Via Giuseppe Moruzzi, 1 - 56124 Pisa, Italy
2 Computer Science Dep., University of Pisa, Largo Pontecorvo, 3 - 56127 Pisa, Italy
{firstname.lastname}@isti.cnr.it

Abstract. Sequential pattern mining is a major research field in knowledge discovery and data mining. Thanks to the increasing availability of transaction data, it is now possible to provide new and improved services based on users' and customers' behavior. However, this puts the citizen's privacy at risk. Thus, it is important to develop new privacy-preserving data mining techniques that do not alter the analysis results significantly. In this paper we propose a new approach for anonymizing sequential data by hiding infrequent, and thus potentially sensitive, subsequences. Our approach guarantees that the disclosed data are k-anonymous and preserve the quality of extracted patterns. An application to a real-world moving object database is presented, which shows the effectiveness of our approach also in complex contexts.

1 Introduction

In the last decade, many KDD techniques have been developed that provide new means for improving personalized services through the discovery of patterns and models which represent typical or unexpected customer and user behavior. The exponential growth of available personal data, as well as the refinement of data mining techniques, leads to new and intriguing possibilities. On the other hand, the collection and the disclosure of personal, often sensitive, information increase the risk of violating citizens' privacy. For this reason, many recent research works have focused on privacy-preserving data mining [5, 24, 16, 18], proposing novel techniques that make it possible to extract knowledge while trying to protect the privacy of the users and customers (or respondents) represented in the data3. This may involve techniques that return anonymized data mining results, or that provide anonymized datasets to the companies/research institutions in charge of their analysis.
3 In statistics, the problem has been extensively studied in the field of statistical disclosure control.
A major and rising field in data mining research concerns the analysis of sequence databases. Users' actions as well as customer transactions are often stored together with their timestamps, making the temporal sequentiality of the events a powerful source of information. For instance, web logs record the full activity of each website visitor during each browser session. Moreover, the spread of mobile devices, such as mobile phones, GPS devices and RFIDs, has become a great source of spatio-temporal data.
Companies and public institutions can now study the sequential behavior of their customers/citizens to improve their offers and services. Many advanced techniques have been investigated to extract patterns and models from databases of sequences [4, 27, 23], as well as from databases of moving objects (trajectories) [13]. For both legal and ethical reasons, the data owners (or custodians) should not compromise the privacy of their customers and users, and therefore should reveal as little as possible of their personal sensitive information. Hiding personal identifiers, such as personal IDs, or quasi-identifiers (i.e., attributes that can be linked to external information to re-identify the individual to whom the information refers) may not be sufficient in the case of sequential data. If a short sequence of actions is easily referable to a few persons, an attacker may gain access to the whole action sequences involving these persons. For instance, if a malicious data user has access to the daylight city traffic data, and he knows that John Smith often goes from the commercial zone A to the general hospital B, and the sequence A ⇒ B appears only a few times in the dataset, he can easily identify the entire sequence of locations crossed by John Smith during the day, and guess his daily behavior. Existing k-anonymity techniques do not take into consideration the intrinsic sensitivity of sequential data. Some other approaches have been proposed that require sensitive sequences to be pre-defined [2, 1]. Other approaches use collaborative data mining techniques [17], or propose to mine models instead of the data [15], but they do not ensure that sensitive sequences cannot be extracted.
In this paper, we propose a new technique that provides an anonymized dataset of sequences, while preserving sequential pattern mining results. We use a method which combines k-anonymity (the disclosed dataset is such that any sequence is indistinguishable from at least k − 1 other sequences) and sequence hiding approaches. Our approach consists in a reformulation of the anonymization problem as the problem of hiding k-infrequent sequences, i.e., transforming the original sequence database in such a way that the sequences with support less than k in the original dataset can no longer be mined. Under the hypothesis that an attacker knows part of the sequence belonging to a person, and that s/he also knows that this person is present in the database, s/he has a probability of 1/k of reconstructing the entire sequence. Our approach is formally defined in the general setting of sequences of items, or events. To illustrate its effectiveness and practicality in a realistic and complex domain, we put our anonymization technique to work in the scenario of moving object data analysis, and applied it to a large-scale, real-life dataset of GPS trajectories of vehicles with on-board GPS receivers, tracked in the city of Milan, Italy. The results of our experiments, where we compare the sets of sequential patterns obtained before and after the application of our anonymization technique, show that we can substantially preserve such frequent sequential patterns, while guaranteeing that the disclosed data are k-anonymous.
The rest of the paper is organized as follows. Section 2 briefly discusses the relevant related works on privacy-preserving data mining. Section 3 introduces and explains our Pattern-Preserving k-Anonymization (P2kA) framework.
The algorithmic details are given in Section 4, together with explanations on a toy example consisting of a small set of sequences. The experimental results of our application to a moving object dataset are presented and discussed in Section 5. Finally, Section 6 concludes.

2 Related works

Many recent research works have focused on techniques for privacy-preserving data mining [5] and for privacy-preserving data publishing. Important techniques include perturbation, condensation, and data hiding with conceptual reconstruction. The first step before data publishing is to remove the personally identifying information. In [24] (and much earlier in statistics by T. Dalenius [9]), it has been shown that removing personally identifying information is not enough to protect privacy. In this work, Samarati and Sweeney propose a classification of the attributes into quasi-identifiers (i.e., attributes that can be linked to external information to re-identify the individual to whom the information refers, a concept that was already present in [10]) and sensitive attributes. Moreover, they propose k-anonymity, which generalizes the values of the quasi-identifier attributes in each record so that each record is indistinguishable from at least k − 1 other records with respect to the quasi-identifier.
Recently, privacy-preserving data mining has been studied in conjunction with spatio-temporal data and trajectory mining [12, 8]. In the work presented in [3], the authors study the problem of anonymity-preserving data publishing in moving objects databases. They propose the notion of (k, δ)-anonymity for moving objects databases. In particular, this is a novel concept of k-anonymity based on co-localization that exploits the inherent uncertainty of the moving objects' whereabouts. The k-anonymity notion is also used in [22], where the authors address privacy issues regarding the identification of individuals in static trajectory datasets. They provide privacy protection by: (1) first enforcing k-anonymity, meaning that every piece of released information refers to at least k users/trajectories, and (2) then randomly reconstructing a representation of the original dataset from the anonymization. Although it has been shown that the k-anonymity framework presents some flaws and limitations [20], and that finding an optimal k-anonymization is NP-hard [6], the k-anonymity model is still practically relevant, and in recent years a large research effort has been devoted to developing algorithms for k-anonymity [16, 18].
Existing work on the anonymity of spatio-temporal moving points has been mainly developed in the context of location based services (LBS) [21, 26, 14, 7]. The works in [21, 26] use perturbation and obfuscation techniques to de-identify a given request or a location. In [14], anonymity is enforced on sensitive locations other than user location points or trajectories. In [7], the anonymization process forces points referring to the same set of users to be anonymized together. However, this work considers the anonymization of a single request rather than the anonymization of a whole trajectory. In order to preserve privacy for moving object data, the authors of [1] propose a hiding technique. In particular, they address the problem of hiding sensitive trajectory patterns from a database of moving objects. A similar technique is used in [2], where Abul et al.
first address the problem of hiding patterns that are simple sequences of symbols, and then extend the proposed framework to the case of sequential patterns according to the classical definition [5]. A first work attacking the problem of limiting the disclosure of sensitive rules, by reducing their significance while leaving the significance of other, non-sensitive rules unaltered or minimally affected, is [6]. One of the most important contributions of this paper is the proof that finding an optimal sanitization of a dataset is NP-hard; a heuristic using greedy search is thus proposed. In [11] the objective is to hide individual sensitive rules instead of all the rules produced by some sensitive itemsets. The work in [25] proposes two distortion-based heuristic techniques for selectively hiding sensitive rules. An interesting work is presented in [15], where Jacquemont et al. propose a costless solution to privacy preservation for problems that may be stated as flow control problems, which is the case for frequent path discovery in web sites and frequent route discovery in towns. They propose to model this flow of data in the form of a weighted automaton, for which they provide a probabilistic solution to discover frequent patterns (potentially with gaps) under constraints, without any information about the original data.
Essentially, in our work we present a new anonymization technique for preserving privacy and, at the same time, preserving the frequent sequential patterns (FSP) obtained by mining the anonymized data. The basic frequent sequential pattern problem, originally introduced in [4], is defined over a database of sequences D, where each element of each sequence is a time-stamped set of items, i.e., an itemset. Time-stamps determine the order of the elements in the sequence. The FSP problem then consists in finding all the sequences that are frequent in the database, i.e., that appear as a subsequence of a large percentage of the sequences of the database. Since its first definition, many algorithms for mining sequential patterns have been proposed, from the earliest in [4] to the more recent PrefixSpan [23] and SPADE [27].

3 Problem Definition

Let $L = \{l_1, l_2, \ldots, l_n\}$ denote a set of items (e.g., spatial locations or regions). A sequence $S = s_1 s_2 \ldots s_m$ (with $s_i \in L$) is an ordered list of items, and an item can occur multiple times in a sequence. A sequence $T = t_1 t_2 \ldots t_w$ is a subsequence of S ($T \preceq S$) if there exist integers $1 \le i_1 < \ldots < i_w \le m$ such that $t_j = s_{i_j}$ for all $1 \le j \le w$. A sequence database D is a set of sequences $D = \{S_1, S_2, \ldots, S_N\}$. The support of a sequence T in a database D is the number of sequences in the database containing T, i.e.:

$$supp_D(T) = |\{S \in D : T \preceq S\}|$$

Given a support threshold σ, a sequence T is called a σ-frequent sequential pattern in a sequence database D if $supp_D(T) \ge \sigma$. The collection of all σ-frequent (sequential) patterns in D is denoted by $S(D, \sigma)$. The set of all subsequences supported by D is denoted by $S(D)$.
Our goal is to provide an anonymized version of D that preserves as much as possible the collection of frequent patterns. We use a method which combines k-anonymity and sequence hiding approaches.
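To make the basic notions above concrete, the following minimal Python sketch (ours, not part of the original paper) implements the subsequence test $T \preceq S$ and the support computation $supp_D(T)$; the function names and the toy database are illustrative.

```python
def is_subsequence(t, s):
    # T is a subsequence of S if the items of T occur in S in the same
    # order (not necessarily contiguously).
    it = iter(s)
    return all(item in it for item in t)

def support(t, db):
    # supp_D(T): number of sequences in db containing t as a subsequence.
    return sum(1 for s in db if is_subsequence(t, s))

# Toy sequence database (strings act as sequences of single-letter items).
D = ["ABCDEF", "ABCDEF", "ABCDEF", "ADEF", "ADEF", "ADEF",
     "BKS", "BK", "BK", "DEJF"]
print(is_subsequence("ACD", "ABCDECDF"))  # True
print(support("ADEF", D))                 # 6, so "ADEF" is 2-frequent
print(support("BKS", D))                  # 1, a 2-infrequent subsequence
```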
In other words, we reformulate the anonymization problem, in the case of sequential data, as the problem of hiding k-infrequent sequences, i.e., transforming the original sequence database in such a way that the sequences with support less than k in the original dataset can no longer be mined. The disclosed dataset is such that any sequence is indistinguishable from at least k − 1 other sequences. This goal is achieved by hiding all the subsequences which are not supported by at least k sequences in the database. Let $D'$ denote the disclosed dataset. Given a positive integer k, the disclosed dataset $D'$ is such that

$$\sum_{T \in S(D')} \delta[supp_D(T) < k] \cdot supp_{D'}(T) = 0$$

where δ[condition] is the indicator function (which is equal to 1 if condition is true, and 0 otherwise). In this paper we consider that any infrequent subsequence of items can potentially lead to the identification of the user (respondent). Thus, we do not need to specify any sensitive subsequences beforehand, as in [2, 1]. Moreover, we want to preserve frequent pattern mining results, in order to let the analysts investigate frequent and interesting/unexpected behavior. The optimal pattern-preserving k-anonymization problem can be formulated as follows:

Definition 1 (optimal P2kA problem). Given a sequence database D and a positive integer k, find a database $D'$ such that
1. $D'$ is k-anonymous, i.e.:
$$\sum_{T \in S(D')} \delta[supp_D(T) < k] \cdot supp_{D'}(T) = 0$$
2. the collection of all k-frequent patterns in D is preserved, i.e.:
$$S(D', k) = S(D, k) \quad \text{and} \quad \forall T \in S(D', k):\ supp_{D'}(T) = supp_D(T)$$

In this paper we present an algorithm which ensures that (i) $D'$ is k-anonymous and (ii) $S(D', k)$ and $S(D, k)$ are "similar". In particular, the second condition of Definition 1 becomes:

$$S(D', k) \subseteq S(D, k) \quad \text{and} \quad \forall T \in S(D', k):\ supp_{D'}(T) \simeq supp_D(T)$$

In the experimental section (see Section 5) we will express this similarity in terms of two measures which quantify how much the pattern support changes, and how many frequent patterns we miss. As a preliminary step towards an "optimal" algorithm, we will show that our algorithm provides good results in terms of pattern similarity (see Section 5), and guarantees that the disclosed dataset is k-anonymous.

4 The BF-P2kA algorithm

In this section we present our BF-P2kA (Brute Force Pattern-Preserving k-Anonymization) algorithm (Algorithm 1), which anonymizes a dataset of sequences D.

Algorithm 1: BF-P2kA(D, k)
  Input: A sequence database D, a minimum support threshold k
  Output: A k-anonymous sequence database D′
  PT = PrefixTreeConstruction(D);
  PT′ = PTAnonymization(PT, k);
  D′ = SequenceGeneration(PT′);
  return D′

Algorithm 2: PTAnonymization(PT, k)
  Input: A prefix tree PT, a minimum support threshold k
  Output: A k-anonymous prefix tree PT′
  Lcut = ∅;
  foreach n in Root(PT).children do
    Lcut = Lcut ∪ TreePruning(n, PT, k);
  end
  PT′ = TreeReconstruction(PT, Lcut);
  return PT′

Our approach consists of three steps. During the first step, the sequences in the input dataset D are used to build a prefix tree PT. The second step, given a minimum support threshold k, anonymizes the prefix tree. This means that sequences whose support is less than k are pruned from the prefix tree; then, part of these infrequent sequences is re-appended to the prefix tree. The third and last step post-processes the anonymized prefix tree, as obtained in the previous step, to generate the anonymized dataset of sequences D′.
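Condition 1 of Definition 1 can be verified directly on toy data: every subsequence supported by the disclosed dataset must be k-frequent in the original one. The sketch below (again ours, and exponential in the sequence length, so suitable only for tiny examples) performs this brute-force check; D_anon anticipates the anonymized toy dataset derived in Section 4.1.

```python
from itertools import combinations

def is_subsequence(t, s):  # as in the previous sketch
    it = iter(s)
    return all(item in it for item in t)

def all_subsequences(s):
    # All distinct non-empty subsequences of s (exponential: toy data only).
    return {tuple(s[i] for i in idx)
            for k in range(1, len(s) + 1)
            for idx in combinations(range(len(s)), k)}

def is_k_anonymous(D, D_anon, k):
    # Definition 1, condition 1: every subsequence supported by D_anon
    # must have support >= k in the original dataset D.
    supported = set().union(*(all_subsequences(s) for s in D_anon))
    return all(sum(is_subsequence(t, s) for s in D) >= k for t in supported)

D      = ["ABCDEF"] * 3 + ["ADEF"] * 3 + ["BKS", "BK", "BK", "DEJF"]
D_anon = ["ABCDEF"] * 3 + ["ADEF"] * 4 + ["BK"] * 3
print(is_k_anonymous(D, D_anon, 2))   # True
print(is_k_anonymous(D, D, 2))        # False: e.g. BKS has support 1 in D
```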
Step I: Prefix Tree Construction The first step of the BF-P2kA algorithm (Algorithm 1) is the construction of a prefix tree PT, given a list of sequences D. The created prefix tree is a more compact structure than a list of sequences. It is defined as a triple PT = (N, E, Root(PT)), where N is a finite set of labeled nodes, E is a set of edges, and Root(PT) ∈ N is a fictitious node representing the root of the tree. Each node of the tree (except the root) has exactly one parent and can be reached through a path, which is a sequence of edges starting with the root node. An example of a path for the node d (denoted P(d, PT)) is the following: P(d, PT) = (Root(PT), a), (a, b), (b, c), (c, d). Each node v ∈ N, except Root(PT), has entries of the form ⟨id, item, support, children⟩ where:
– id is the identifier of the node v;
– item represents an item of a sequence;
– support is the support of the sequence represented by the path from Root(PT) to v;
– children is the list of child nodes of v.

Algorithm 3: PrefixTreeConstruction(D)
  Input: A sequence database D
  Output: A prefix tree PT
  foreach T in D do
    LP = LongestPrefixSearch(Root(PT), T);
    Append T to LP;
    foreach v in LP do
      v.support = v.support + supp_D(T);
    end
    foreach v in T \ LP do
      v.support = supp_D(T);
    end
  end
  return PT

For each sequence of items T, the PrefixTreeConstruction algorithm (see Algorithm 3) searches in PT for the path which corresponds to the longest prefix of the sequence T. Next, it appends, to the last node of the longest prefix found, a branch which represents the remaining elements of T, updating the involved node attributes accordingly. In particular, it updates the support of each node belonging to the common prefix by adding the support of the sequence T in D, and sets the support of the remaining nodes to $supp_D(T)$.

Step II: Prefix Tree Anonymization The main phase of our approach is the second one. This phase is described by the PTAnonymization algorithm (Algorithm 2). Before describing this algorithm, we introduce some notions which are needed to better explain our method.

Definition 2 (minimum prefix). Let $S = s_1 s_2 \ldots s_n$ and $T = t_1 t_2 \ldots t_k$ be two sequences such that T is a subsequence of S, and let $s_p$ be the first item of S such that $T \preceq s_1 s_2 \ldots s_p$. The sequence $S' = s_1 \ldots s_p$ is the minimum prefix of S containing the subsequence T.

Algorithm 4: TreePruning(n, PT, k)
  Input: A node n, a prefix tree PT, a minimum support threshold k
  Output: A list of infrequent sequences Lcut
  Lcut = ∅;
  if n.support < k then
    Lcut = the set of all sequences in PathTree(PT, n);
    foreach j ∈ P(n, PT) do
      j.support = j.support − n.support;
    end
    PT = PT \ the subtree induced by n;
  else
    foreach j ∈ n.children do
      Lcut = Lcut ∪ TreePruning(j, PT, k);
    end
  end
  return Lcut

Example 1. Let us consider the sequences S = ABCDECDF and T = ACD. The sequence S′ = ABCD is the minimum prefix of S containing the subsequence T.

Definition 3 (path tree). Let PT be a prefix tree and let n be a node in PT. The path tree of n in PT (denoted by PathTree(PT, n)) is the subtree induced by the set of nodes belonging to P(n, PT) plus the subtree induced by n.

We now recall the well-known notions of Levenshtein distance [19] and Longest Common Subsequence, which are used in our algorithm.

Definition 4 (Levenshtein distance). Let S and T be two sequences.
The Levenshtein (edit) distance between S and T is given by the minimum number of operations needed to transform one sequence into the other, where an operation is an insertion, deletion, or substitution of a single element.

Definition 5 (longest common subsequence). Let T be a set of sequences. The Longest Common Subsequence (LCS) is the longest subsequence common to all the sequences in T.

The first step of Algorithm 2 is the pruning of the prefix tree with respect to the minimum support threshold given in input. This operation is performed by the TreePruning function (see Algorithm 4), which modifies the tree by pruning all the infrequent subtrees and updating the support of the path to the last frequent node. In particular, it visits the tree and, when the support of a given node n is less than the minimum support threshold k, it computes all the sequences represented by the paths which contain the node n and which start from the root and reach the leaves of the subtree rooted at n. Note that, by construction, each node of this subtree has support less than k. All the computed sequences and their supports are inserted into the list Lcut. Next, the subtree rooted at n is cut from the tree. Therefore, the TreePruning procedure returns a pruned prefix tree and the list Lcut.

Algorithm 5: TreeReconstruction(PT, Lcut)
  Input: A prefix tree PT, a list of infrequent sequences Lcut
  Output: An anonymized reconstructed prefix tree PT′
  foreach distinct S ∈ Lcut do
    cand = ClosestLCS(S, PT);
    L = the set of nodes in PT belonging to the first minimum prefix containing cand;
    if L is not empty then
      foreach node ∈ L do
        node.support = node.support + supp_Lcut(S);
      end
    end
  end
  return PT

After the pruning step, the algorithm redistributes the infrequent sequences in Lcut over the pruned tree, using the TreeReconstruction function (see Algorithm 5). In particular, for each infrequent sequence S in Lcut, it computes the LCS between S and every sequence represented by the tree. Suppose that T is the sequence such that the computed LCS is a subsequence of T. Then, the TreeReconstruction function selects the path of the tree that represents the minimum prefix of T containing the LCS, and increases the support of the related nodes by adding the support of S in Lcut. If there are multiple LCSs having the same length, the ClosestLCS function returns the LCS and the sequence in PT such that the Levenshtein distance between them is minimum. This choice increases the support of only a limited set of nodes not belonging to the LCS, thus reducing the noise.

Step III: Generation of anonymized sequences The PTAnonymization algorithm returns an anonymized prefix tree, i.e., a prefix tree in which only k-frequent subsequences are represented. The third step of our method generates the anonymized dataset D′. This phase is performed by the SequenceGeneration procedure, which visits the anonymized prefix tree and generates all the represented sequences. Of course, when generating a sequence, the SequenceGeneration procedure takes the support of that sequence into account.
We now show that (i) our approach guarantees that the disclosed dataset D′ is k-anonymous (i.e., patterns whose support is less than k in the original dataset D are not represented in D′) and (ii) the set of sequential patterns in D′ is a subset of those in D.
(a) A dataset of sequences          (b) Anonymized dataset of sequences
  s1:  A B C D E F                    s′1:  A B C D E F
  s2:  A B C D E F                    s′2:  A B C D E F
  s3:  A B C D E F                    s′3:  A B C D E F
  s4:  A D E F                        s′4:  A D E F
  s5:  A D E F                        s′5:  A D E F
  s6:  A D E F                        s′6:  A D E F
  s7:  B K S                          s′7:  A D E F
  s8:  B K                            s′8:  B K
  s9:  B K                            s′9:  B K
  s10: D E J F                        s′10: B K

Fig. 1. A toy example

Theorem 1. Let D be a dataset of sequences. Given a minimum support threshold k, the dataset D′ returned by Algorithm 1 satisfies the following conditions:
1. D′ is k-anonymous, i.e.:
$$\sum_{T \in S(D')} \delta[supp_D(T) < k] \cdot supp_{D'}(T) = 0$$
2. $S(D', k) \subseteq S(D, k)$,
where $S(D, k)$ and $S(D', k)$ are the collections of k-frequent patterns in D and D′, respectively.

Proof. (sketch)
1. By construction, the pruning step in Algorithm 2 prunes all the subtrees with support less than k; hence the prefix tree PT only contains k-frequent sequences. Moreover, the reconstruction step (see Algorithm 5) does not change the tree structure of PT: it only increases the support of existing sequences which are already k-frequent in D. In conclusion, at the end of the second step of Algorithm 1, the sequential patterns which are represented in PT′ are at least k-frequent in D.
2. At the end of the pruning step in Algorithm 2, all infrequent branches in PT are cut off. However, this could also imply that some k-frequent sequential patterns are pruned out, if they are only supported by multiple infrequent paths in the prefix tree PT. Hence, the prefix tree PT contains a subset of S(D, k). Moreover, as already stated, during the reconstruction step the tree structure of PT is unchanged, i.e., patterns represented in PT′ were still represented in PT after the pruning step. Therefore, the set of sequential patterns supported by D′ is a subset of those supported by D.

Even if our approach does not ensure that $S(D', k) = S(D, k)$, we will show in Section 5 that the difference between the two sets can be very small in practice.

(a) Prefix Tree Construction
Root
├── ⟨1,A,6⟩
│    ├── ⟨2,B,3⟩ → ⟨3,C,3⟩ → ⟨4,D,3⟩ → ⟨5,E,3⟩ → ⟨6,F,3⟩
│    └── ⟨7,D,3⟩ → ⟨8,E,3⟩ → ⟨9,F,3⟩
├── ⟨10,B,3⟩ → ⟨11,K,3⟩ → ⟨12,S,1⟩
└── ⟨13,D,1⟩ → ⟨14,E,1⟩ → ⟨15,J,1⟩ → ⟨16,F,1⟩

(b) Pruned Prefix Tree
Root
├── ⟨1,A,6⟩
│    ├── ⟨2,B,3⟩ → ⟨3,C,3⟩ → ⟨4,D,3⟩ → ⟨5,E,3⟩ → ⟨6,F,3⟩
│    └── ⟨7,D,3⟩ → ⟨8,E,3⟩ → ⟨9,F,3⟩
└── ⟨10,B,2⟩ → ⟨11,K,2⟩

(c) Anonymized Prefix Tree
Root
├── ⟨1,A,7⟩
│    ├── ⟨2,B,3⟩ → ⟨3,C,3⟩ → ⟨4,D,3⟩ → ⟨5,E,3⟩ → ⟨6,F,3⟩
│    └── ⟨7,D,4⟩ → ⟨8,E,4⟩ → ⟨9,F,4⟩
└── ⟨10,B,3⟩ → ⟨11,K,3⟩

Fig. 2. Prefix tree processing

4.1 A running example

We now present an example which shows how our approach works. We consider the dataset of sequences in Figure 1(a) and a minimum support threshold equal to 2. During the first phase of our method, the PrefixTreeConstruction algorithm builds the prefix tree depicted in Figure 2(a), which represents the sequences in a more compact way.
During the anonymization step, the prefix tree is modified by the TreePruning procedure with respect to the minimum support threshold. In particular, this procedure searches the tree for all nodes with support less than 2: ⟨12,S,1⟩ and ⟨13,D,1⟩. Next, it selects the paths that contain these nodes and which start from the root and reach each leaf belonging to the subtrees of these nodes. Then, it generates all the sequences represented by these paths and inserts them into the list Lcut: (BKS, 1) and (DEJF, 1).
Finally, the TreePruning procedure eliminates from the tree the subtrees induced by the infrequent nodes listed above, and updates the support of the remaining nodes. The prefix tree obtained after the pruning step is shown in Figure 2(b).
The infrequent sequences within Lcut are then redistributed in the following way:
1. (BKS, 1) increases the support of the nodes ⟨10,B,2⟩ and ⟨11,K,2⟩, and thus we obtain ⟨10,B,3⟩ and ⟨11,K,3⟩;
2. (DEJF, 1) increases the support of the tree nodes ⟨1,A,6⟩, ⟨7,D,3⟩, ⟨8,E,3⟩ and ⟨9,F,3⟩; therefore we obtain ⟨1,A,7⟩, ⟨7,D,4⟩, ⟨8,E,4⟩ and ⟨9,F,4⟩.
The prefix tree obtained after the anonymization step is shown in Figure 2(c). Finally, the SequenceGeneration procedure produces the anonymized sequence dataset shown in Figure 1(b).

5 Experiments & Results

In this section, we present an application to a moving objects dataset. Object trajectories are first transformed into sequences of crossed locations, and then processed with our anonymization approach. In the following, we discuss the results over multiple instances of the original data, for different anonymization degrees.

5.1 Data Preparation

In this section, we explain the procedure used to obtain the input datasets. We obtained a set of GPS trajectories of cars from the European project GeoPKDD4, covering a week of traffic in Milan. Essentially, each trajectory is a sequence of pairs of coordinates x and y with a relative timestamp. Obviously, running our algorithm over sequences of raw points is practically useless, because it is impossible to find a set of points that matches exactly enough times to be considered frequent for any value of k. Thus, to overcome this problem, we use the definition of Regions of Interest (RoI's) given in [13], where the authors discretize the working space through a regular grid with cells of small size. The density of each cell is then computed by considering each single trajectory and incrementing the density of all the cells that contain any of its points. Finally, a set of RoI's is extracted by means of a simple heuristic using a density threshold.
4 http://www.geopkdd.eu
As a result, a set of RoI's provides a coverage of the dense cells through different-sized, disjoint, rectangular regions with some form of local maximality. In particular, for each region the authors consider the average density of its cells, instead of its overall density (which is generally higher), and larger rectangles are preferred only if they add dense regions.

[Figure 3: (a) all the trajectories and the set of extracted RoI's; (b) a single trajectory and the RoI's it crosses.]
Fig. 3. Trajectories and regions.

Once the set of RoI's has been extracted, we preprocess all the input trajectories, translating each one from a sequence of points into a sequence of RoI's; a sketch of this translation step is given below. The order of visits is maintained by means of timestamps. An example of this simple translation procedure is shown in Fig. 3: on the left we can see all the trajectories and a set of extracted RoI's; on the right we show a trajectory and highlight which RoI's it crosses. This new dataset represents the input dataset for the anonymization algorithm.
The datasets used in our experiments are built using all the trajectories in the dataset described above with different density thresholds. These values have been chosen in order to obtain an adequate number of RoI's, since low density values correspond to a few big regions, and higher values produce a few small regions.
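As an illustration of the translation step described above, here is a small Python sketch (ours); the RoI representation as axis-aligned rectangles, the collapsing of consecutive repetitions of the same region, and all names and parameter values are our own simplifying assumptions, not taken from [13].

```python
def roi_of(x, y, rois):
    # Return the id of the (disjoint, rectangular) RoI containing (x, y),
    # or None if the point falls outside every region.
    for rid, (x1, y1, x2, y2) in rois.items():
        if x1 <= x <= x2 and y1 <= y <= y2:
            return rid
    return None

def trajectory_to_sequence(traj, rois):
    # Translate a trajectory, i.e. a list of (timestamp, x, y) points,
    # into the sequence of RoI's it crosses, in timestamp order.
    seq = []
    for t, x, y in sorted(traj):
        rid = roi_of(x, y, rois)
        if rid is not None and (not seq or seq[-1] != rid):
            seq.append(rid)
    return seq  # [] for trajectories that do not cross any region

rois = {"A": (0, 0, 10, 10), "B": (20, 0, 30, 10)}   # two toy regions
traj = [(0, 1, 1), (1, 5, 5), (2, 15, 5), (3, 25, 5)]
print(trajectory_to_sequence(traj, rois))            # ['A', 'B']
```

Trajectories mapped to the empty sequence are discarded, which is why the number of trajectories in Table 1 varies across density thresholds.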
In this way, we obtain different sets of RoI's, i.e., different sets of items in the input sequences. Table 1 summarizes the datasets used in our experiments. Notice that the number of trajectories differs among the datasets because we lose those trajectories that do not cross any region.

Dataset  Density threshold  N. of Regions  N. of Trajectories  Avg. Length
   1          0.01               113             82341            8.327
   2          0.035               31             28663            9.152
   3          0.037               21             24995            7.519
   4          0.038               16             23744            6.239
   5          0.039               10             10604            6.687
   6          0.04                 8              9213            6.863

Table 1. Input parameters

5.2 Results and discussion

Since our goal is to preserve local patterns (i.e., local subsequences) as much as possible, we compare the collections of patterns extracted before and after the anonymization process. To measure the similarity between two collections of patterns, we define two metrics:
– SIM1 (Frequent Pattern Support Similarity), defined as
$$SIM_1 = \frac{1}{|S(D', \sigma)|} \sum_{s \in S(D', \sigma)} \frac{\min\{freq(s, D'), freq(s, D)\}}{\max\{freq(s, D'), freq(s, D)\}}$$
– SIM2 (Frequent Pattern Collection Size Similarity), defined as
$$SIM_2 = \frac{\min\{|S(D', \sigma)|, |S(D, \sigma)|\}}{\max\{|S(D', \sigma)|, |S(D, \sigma)|\}}$$
Both measures take values between 0 and 1; when two collections of subsequences are identical, both measures are equal to 1.
Our experiments were conducted as follows: we first anonymized the six datasets using values of k between 10 and 1000. Then, for each value of k, we compared the collection of frequent patterns extracted from the original dataset with the collection extracted from the k-anonymized dataset. In all these experiments, we used PrefixSpan [23], and the minimum support threshold was set to k.
In Figure 4 we report the results of all the experiments. We were unable to compare results for k < 50 and k < 200 for the first two datasets, since the related pattern collections are too large to be tractable. As expected, some frequent patterns in D are missing in D′. This is more evident in the first two datasets (see Figures 4(a) and 4(b)), while for higher density thresholds (Figures 4(c) to 4(f)) the value of SIM2 is closer to the maximum. This is in part due to the fact that when the data are sparser, the anonymization algorithm tends to prune more sequences. Concerning the effective support similarity (SIM1), the results show that the higher the threshold k, the more similar the relative frequencies. Moreover, the SIM1 measure is quite high in general (the only exception is the 0.035 dataset).
It is interesting to notice that, for some datasets, it is possible to identify an "optimal" minimum value of k. For instance, if we look at the similarity measures for the last dataset (see Figure 4(f)), k = 300 preserves the number of frequent patterns as well as their support. For the first dataset (see Figure 4(a)), two good choices are k = 100 and k = 500. This may help the data publisher in deciding on a suitable value of k. A possible methodology would consist in finding the best tradeoff (w.r.t. the application) between the anonymization degree and the number of preserved patterns.
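The two similarity measures defined above can be computed directly from the mined pattern collections. A minimal sketch (ours, not the authors'), assuming each collection is given as a dictionary from pattern to frequency:

```python
def sim1(freq_D, freq_D1):
    # Frequent Pattern Support Similarity: average, over the patterns
    # frequent in the anonymized data D', of min/max of the frequencies.
    if not freq_D1:
        return 1.0
    ratios = (min(f, freq_D.get(p, 0)) / max(f, freq_D.get(p, 0))
              for p, f in freq_D1.items())
    return sum(ratios) / len(freq_D1)

def sim2(freq_D, freq_D1):
    # Frequent Pattern Collection Size Similarity.
    a, b = len(freq_D), len(freq_D1)
    return min(a, b) / max(a, b) if max(a, b) else 1.0

# Frequencies of some 2-frequent patterns of the toy example (subset shown).
fD  = {("A","D","E","F"): 6, ("B","K"): 3, ("A","B","C"): 3}
fD1 = {("A","D","E","F"): 7, ("B","K"): 3}
print(sim1(fD, fD1))   # (6/7 + 1) / 2, roughly 0.93
print(sim2(fD, fD1))   # 2/3, roughly 0.67
```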
[Figure 4: six plots, (a) 0.01 dataset, (b) 0.035 dataset, (c) 0.037 dataset, (d) 0.038 dataset, (e) 0.039 dataset, (f) 0.04 dataset, each showing SIM1 and SIM2 against k values from 0 to 1000.]
Fig. 4. Values of SIM1 and SIM2 for different location datasets

6 Conclusion and future work

In this paper, we introduced a new approach for anonymizing sequential datasets. Our approach provides k-anonymous data by generalizing the sequence hiding approach. Through an application to a real-life mobility dataset, we showed that the proposed technique preserves sequential pattern mining results both in terms of the number of extracted patterns and of their support.
Further research will investigate new approaches to preserve pattern mining results also in other hard contexts, such as sparse datasets or long sequences. One possible strategy might require the usage of a different and more compact data structure, instead of the prefix tree which is used here. Moreover, another research direction could address a relaxed privacy constraint: instead of guaranteeing the full satisfaction of k-anonymity, we could enable better pattern mining results by means of a less aggressive (and slightly more risky) pruning step. Concerning the application to mobility data, our approach does not yet consider the valuable information carried by temporal annotations, as well as the geographical proximity of locations/regions. A deep research effort will be undertaken to investigate the possible extension of our approach towards a comprehensive privacy-preserving spatio-temporal framework.

Acknowledgements The authors wish to thank the anonymous reviewers for their valuable comments, and Roberto Trasarti for his technical support. This work has been carried out within the European project GeoPKDD, which acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Sixth Framework Programme for Research of the European Commission, under FET-Open contract number 014915.

References

1. Osman Abul, Maurizio Atzori, Francesco Bonchi, and Fosca Giannotti. Hiding sensitive trajectory patterns. In ICDM Workshops, pages 693–698. IEEE Computer Society, 2007.
2. Osman Abul, Maurizio Atzori, Francesco Bonchi, and Fosca Giannotti. Hiding sequences. In ICDE Workshops, pages 147–156. IEEE Computer Society, 2007.
3. Osman Abul, Francesco Bonchi, and Mirco Nanni. Never walk alone: Uncertainty for anonymity in moving objects databases. In ICDE, pages 376–385. IEEE, 2008.
4. R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of ICDE, 1995.
5. R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD 2000), pages 439–450, 2000.
6. M. Atallah, E. Bertino, A. Elmagarmid, M. Ibrahim, and V. S. Verykios.
Disclosure limitation of sensitive rules. In Proceedings of the 1999 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX'99), pages 45–52, 1999.
7. Claudio Bettini, Xiaoyang Sean Wang, and Sushil Jajodia. Protecting privacy against location-based personal identification. In Proceedings of the VLDB Workshop SDM 2005, volume 3674 of LNCS, pages 185–199. Springer, 2005.
8. Francesco Bonchi, Yücel Saygin, Vassilios S. Verykios, Maurizio Atzori, Aris Gkoulalas-Divanis, Selim V. Kaya, and Erkay Savas. Privacy in Spatiotemporal Data Mining, pages 297–329. Springer Verlag, 2008.
9. Tore Dalenius. The invasion of privacy problem and statistics production – an overview. Statistik Tidskrift, 12:213–225, 1974.
10. Tore Dalenius. Finding a needle in a haystack – or identifying anonymous census records. Journal of Official Statistics, 2(3):329–336, 1986.
11. Elena Dasseni, Vassilios S. Verykios, Ahmed K. Elmagarmid, and Elisa Bertino. Hiding association rules by using confidence and support. In Ira S. Moskowitz, editor, Information Hiding, volume 2137 of LNCS, pages 369–383. Springer, 2001.
12. Fosca Giannotti and Dino Pedreschi, editors. Mobility, Data Mining and Privacy: Geographic Knowledge Discovery. Springer Verlag, 2008.
13. Fosca Giannotti, Mirco Nanni, Fabio Pinelli, and Dino Pedreschi. Trajectory pattern mining. In Pavel Berkhin, Rich Caruana, and Xindong Wu, editors, KDD, pages 330–339. ACM, 2007.
14. M. Gruteser and X. Liu. Protecting privacy in continuous location-tracking applications. IEEE Security & Privacy Magazine, 2(2):28–34, 2004.
15. Stéphanie Jacquemont, François Jacquenet, and Marc Sebban. Sequence mining without sequences: A new way for privacy preserving. In ICTAI, pages 347–354. IEEE Computer Society, 2006.
16. Roberto J. Bayardo Jr. and Rakesh Agrawal. Data privacy through optimal k-anonymization. In ICDE, pages 217–228. IEEE Computer Society, 2005.
17. Seung-Woo Kim, Sanghyun Park, Jung-Im Won, and Sang-Wook Kim. Privacy preserving data mining of sequential patterns for network traffic data. Inf. Sci., 178(3):694–713, 2008.
18. Kristen LeFevre, David J. DeWitt, and Raghu Ramakrishnan. Mondrian multidimensional k-anonymity. In Ling Liu, Andreas Reuter, Kyu-Young Whang, and Jianjun Zhang, editors, ICDE, page 25. IEEE Computer Society, 2006.
19. Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710, 1966.
20. Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam. L-diversity: Privacy beyond k-anonymity. TKDD, 1(1), 2007.
21. S. Menon, S. Sarkar, and S. Mukherjee. Maximizing accuracy of shared databases when concealing sensitive patterns. Information Systems Research, 16(3):256–270, 2005.
22. Mehmet Ercan Nergiz, Maurizio Atzori, and Yucel Saygin. Perturbation-driven anonymization of trajectories. Technical Report 2007-TR-017, ISTI-CNR, Pisa, Italy, 2007. 10 pages.
23. J. Pei et al. PrefixSpan: Mining sequential patterns by prefix-projected growth. In ICDE, pages 215–225, 2001.
24. P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report, SRI International, March 1998.
25. Yücel Saygin, Vassilios S. Verykios, and Chris Clifton. Using unknowns to prevent discovery of association rules. SIGMOD Record, 30(4):45–54, 2001.
26. V. S. Verykios, E. Bertino, I. N. Fovino, L. P.
Provenza, Y. Saygin, and Y. Theodoridis. State-of-the-art in privacy preserving data mining. ACM SIGMOD Record, 33(1):50–57, 2004.
27. M. J. Zaki. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 42(1/2):31–60, 2001.

On the Impact of User Movement Simulations in the Evaluation of LBS Privacy-Preserving Techniques

Sergio Mascetti1, Dario Freni1, Claudio Bettini1, X. Sean Wang2, and Sushil Jajodia3
1 DICo, Università di Milano
2 Dept. of CS, University of Vermont
3 CSIS, George Mason University

Abstract. The evaluation of privacy-preserving techniques for LBS is often based on simulations of mostly random user movements that only partially capture real deployment scenarios. We claim that benchmarks tailored to specific scenarios are needed, and we report preliminary results on how they may be generated through an agent-based context-aware simulator. We consider privacy-preserving algorithms based on spatial cloaking and compare the experimental results obtained on two benchmarks: the first based on mostly random movements, and the second obtained from the context-aware simulator. The specific deployment scenario is the provisioning of a friend-finder-like service on weekend nights in a big city. Our results show that, compared to the context-aware simulator, the random user movement simulator leads to significantly different results for a spatial-cloaking algorithm, under-protecting in some cases and over-protecting in others.

1 Introduction

Location-based services (LBS) are often cited as killer applications for the latest GPS-equipped 3G phones, which are slated to be massively distributed in 70 countries. While car navigation and the identification of nearest points of interest are already widely used services, even more interest is being generated by the so-called friend-finder services, a class of LBS that will change once more the way we interact. A friend-finder service reveals to a participating user the presence of other close-by participants belonging to a particular group (friends are only one example), possibly showing their position on a map. From a technical point of view, in contrast to services that find nearest points of interest, this service is characterized by a sequence of LBS requests instead of single ones, since a user may want to periodically check for close-by participants, while moving or even while staying in the same place.
Sociological studies have shown that a large number of users perceive the release of their precise location, as part of an LBS request, as a possible privacy threat [1]. Considering friend-finder services, it is easy to identify two types of privacy threats: a) the association of the identity of the user with the specific group of persons she is interested in may reveal her religious, sexual, or political orientation, and b) the association of the identity of the user with her precise location may reveal what kind of places she has been to, or that she has not been where she was supposed to be at that time.
As formally shown in [2], the likelihood of a privacy violation, and consequently the defense techniques to be applied, strongly depend on the knowledge that an adversary may have.
In the friend-finder service, the service provider (SP) may not be trusted or the communication channels may be insecure; in that case, the adversary's knowledge may include the precise identity and location information submitted with each request, and both of the above privacy threats would become real privacy breaches. The substitution of identities with pseudonyms does not entirely solve the problem if, for example, the adversary happens to know who is at the location at the time reported in the request (e.g., when the issuer of the request uses a fidelity card at a store). In some cases, the adversary may also be able to recognize sequences of requests as issued by the same anonymous user (e.g., by observing the same pseudonym or by spatio-temporal tracking) and use this information to re-identify the issuer.
Several defense techniques against both threats under different adversary models have been proposed, and may be applied to the friend-finder service; however, current proposals very rarely have formal assessments of the provided privacy preservation, and are generally supported by experimental results based either on real datasets of questionable significance for real LBS services (e.g., truck or school bus traces) or on data simulations based on mostly random user movements that hardly match the specific deployment scenario of an LBS service.
In order to understand whether the use of simulations based on mostly random user movements may be a real problem, or whether it is actually useful and safe to use these simulations, we considered a typical deployment scenario for a friend-finder service: a large number of young people using the service on a weekend night in a large city like Milan, Italy. We performed an in-depth study of the parameters characterizing this scenario, using different sources, including on-line surveys. We then used the Brinkhoff simulator [3], widely used in testing LBS privacy preservation, to generate, based on these parameters, a first dataset of user movements. A second dataset was created with a personalized version of the Siafu agent-based context-aware simulator [4], which is able to capture many more details of our scenario. Then, based on a common metric for privacy and quality of service evaluation, we ran a large number of tests on both datasets, considering different abilities of re-identification by the adversary, as well as different privacy-preserving techniques. Our results consistently show that (i) in some cases the evaluation on random movement simulations leads to the definition of overprotective techniques, and (ii) in other cases, the techniques that are shown to meet privacy requirements based on those simulations do not meet them when tested with more realistic context-aware simulations.
We focus our technical treatment on protecting the association of the user with the request she has issued (e.g., with the group of people she is interested in, as in threat (a) described above), even if we believe that our arguments can be easily extended to techniques aimed only at protecting the location.

Related work

We are not aware of related work in this area that specifically considers the relevance of realistic simulations in LBS. There are, however, several studies on user movements with impact on many different application areas, including epidemiology, transportation, computer networks, marketing, as well as LBS. A very interesting study supporting an argument against random movement simulations recently appeared [5].
In the following, we briefly report the main techniques currently proposed for LBS privacy preservation, identifying the ones similar to those tested in our experiments, and the ones using simulations to generate the datasets for experiments.
Privacy-preserving solutions based on cryptographic techniques that totally hide the location information in requests, even to the SP, have been recently proposed [6] for LBS based on 1-NN queries, and may probably be adapted to the service we consider. If proven to be correct, no simulation would be needed for these techniques, since no information would leak from any request and the above privacy threats would not apply. However, this adaptation is still to be investigated, and there are some general concerns with these approaches regarding efficiency and flexibility.
A popular alternative technique is spatial cloaking, consisting in the generalization of the spatial information transmitted to the SP as part of a service request. By receiving generalized locations, the SP can only return approximate results on the presence of close-by group members and their positions; while it may be possible to have a trusted entity in the middle filtering the communication and improving the precision, the related overhead costs should be taken into account in evaluating the trade-off between generalization and quality of service. While in this paper we consider techniques based on spatial cloaking, as in [7–9, 2, 10], other proposals have considered different techniques, including the generation of dummy requests, the use of incremental requests, or the substitution in the request of the position of the issuer with a region that does not include her (see among others [11–13]).
Most of the proposals for LBS privacy have only considered requests in isolation, while a few have also addressed the cases in which sequences of requests can be exploited by the adversary ([14, 15] among others), as in the friend-finder service. A related problem is the privacy-aware publication of trajectories [16, 17]; even if this has some aspects more similar to database publication than to service request privacy preservation, we believe that our results may be important for these studies as well.
Synthetic, mostly random, user movements obtained by the Brinkhoff simulator or other simulators have been used in most of the above cited papers, as well as in our own previous work.

Organization

The rest of the paper is organized as follows. In Section 2 we formally define how we evaluate the privacy of LBS requests, or equivalently, how we measure the risk of a privacy violation upon issuing a request. In Section 3 we explain how the two datasets were obtained from the generators based on the parameters characterizing the deployment scenario. In Section 4 we briefly explain the privacy preservation algorithms being used and we report our experimental results. Section 5 concludes the paper.

2 Privacy metric of generalized requests

As mentioned in the introduction, we are concerned with privacy protection via location generalization (also called spatial cloaking). In this section, we formalize the adversary model we consider in this paper, and give a metric to measure the privacy provided by a set of generalized requests against the adversary.

2.1 Requests, original requests, and generalized requests

We first formally define requests and generalized requests for LBS.
A request issued by a user without alteration is called an original request, and a generalized request is one that is sent to the service provider and has been altered from the original one for the purpose of privacy protection. Both kinds of requests are called requests and denoted r. A convention in this paper is to use r′ to denote a generalized request, to emphasize its generalized nature, and r to denote an original request, unless specified otherwise. Either the client software or a trusted medium transforms (or generalizes) an original request into a generalized one. In this paper, we are not concerned with how the generalization happens, but rather with the resulting generalized requests and their privacy properties. In the experimental section, we evaluate the performance of generalization algorithms based on the generalized requests they generate.
Each LBS request r, either original or generalized, is logically divided into three parts: IDdata, STdata, and SSdata, containing user identification data, location and time of the request, and other service parameters, respectively. In the sequel, the spatial and temporal components of STdata are denoted by Sdata and Tdata, respectively. In this paper, for the sake of simplicity, we consider space and time as discrete domains. However, our results can be easily extended to the case in which these two domains are continuous.
Each generalized request r′ must correspond to an original request r such that the difference between r and r′ is only in Sdata and, furthermore, r.Sdata ⊆ r′.Sdata, i.e., the spatial region of the generalized request must contain (or be equal to) the spatial region of the original request4. We use issuer(r) to denote the actual issuer of the (original or generalized) request r.
4 Here "region" can be a point.

2.2 Adversary model

The objective here is to provide an adversary model that captures a general class of adversary models. In a sense, our adversary model is an adversary "meta-model". This adversary meta-model concerns two aspects of knowledge that an adversary might have: (1) knowledge of users' whereabouts (i.e., their locations), and (2) correlation of (generalized) requests. These two aspects cover the (explicit or implicit) assumptions that have appeared in the relevant literature.
For users' locations, we assume that the adversary has the knowledge expressed by the following Ident function:

$$Ident_t : \textit{Areas} \longrightarrow \textit{User sets},$$

that is, given an area A, $Ident_t(A)$ is the set of users whom, through certain means, the adversary has identified to be located in area A at time t. In the following, when no confusion arises, we omit the time instant t. We further assume that this knowledge is correct, in the sense that the identified users are indeed in area A at that time.
For a given user i, if there exists an area A such that i ∈ Ident(A), then we say i is identified by the adversary. Furthermore, we say that i is identified in A. Note that there may be users who are also in A but whom the adversary does not identify. This may happen either because the adversary is not aware of the presence of these users in A, or because the adversary cannot identify them even if he is aware of their presence. We do not distinguish these two cases in our adversary model since, as we shall see later, the distinction does not make any perceptible difference in the ability of the adversary when the total population is large.
Clearly, in reality, there are many different sources of external information that can lead the adversary to estimate the location of users. Some may lead the adversary to know that a user is in a certain area, but not her exact location. For example, an adversary may know that Bob is in a pub (due to his use of a fidelity card at the pub), but may not know which room he is in. Some statistical analysis may be done to derive the probability that Bob is in a particular room, but this is beyond the scope of this paper.
The most conservative assumption regarding this capability of the adversary is that Ident(A) gives exactly all the users in A, for each area A. It can be seen that if the privacy of the user is guaranteed under this most conservative assumption, then privacy is also guaranteed against any less precise Ident function. However, this conservative assumption is unlikely to be true in reality, and some have observed that it degrades the quality of service unnecessarily. It will be interesting to see how much privacy and quality of service change with more realistic Ident functions. This is partly the goal of our paper.
As part of this adversary model regarding locations and users, we also assume another function:

$$Num_t : \textit{Areas} \longrightarrow [0, \infty),$$

that is, given an area A, $Num_t(A)$ gives an estimate of the number of users in the area at time t. This function can be derived from publicly available statistical information or through some kind of counting mechanism, such as tickets to a theater. Again, when no confusion arises, we do not indicate the time instant t.
The second part of the adversary model is his ability to correlate requests. We formalize this with the following function L:

$$L : \textit{Requests} \longrightarrow \textit{Request sets},$$

that is, given a (generalized) request r′, L(r′) gives a set of requests that the adversary has concluded, through certain means, were issued by the same user who issued the request r′. In other words, all the requests in L(r′) are linked to r′, although the adversary may still not know who the user is. Note that L(r) may only give an (often small) subset of all the requests issued by the issuer of r. On the other hand, we assume that the function L is correct, in the sense that each request in L(r) is indeed issued by the same user in reality. A set of requests is called a trace, denoted τ, if from the link function L we understand that all the requests are issued by the same user. The requests in τ are implicitly ordered along the time dimension.
As in the case of the Ident function, the most conservative assumption on correlation is that L(r) gives exactly all the (generalized) requests that are issued by the issuer of r. This is a very strong assumption that may severely decrease the quality of service when accompanied by the most conservative assumption about the Ident function. Again, a partial goal of this paper is to study the impact of a less conservative but more realistic assumption on L.
In [2], we proposed a formal framework to model LBS privacy attacks and defenses for the static case. The main idea is that the safety of a defense technique can be formally evaluated only if the context, i.e., the set of assumptions about the adversary's external knowledge, is explicitly stated. Following this methodology, in this paper a context $C_H$ is given by the three functions Ident, Num, and L, that is, $C_H = (Ident, Num, L)$.
In the next section, we formalize the attack that an adversary can perform on the generalized requests in a context CH.
A consequence of restricting ourselves to context CH is that, analogously to the related work in this area, we focus our attention on using only STdata as a quasi-identifier. Intuitively, a quasi-identifier in a request is a combination of values that can be used to provide more information on who the actual issuer of the request may be than would be available without these values. For example, if the Ident function is given, the STdata in a request is a quasi-identifier, as it may provide information on the actual issuer, as shown in the next subsection. In principle, any information contained in a request should be carefully analyzed to see whether it may serve as a quasi-identifier: the IDdata part is an obvious target, and some service-specific parameters may also be used to link the request to users. However, these aspects are outside the scope of this paper.
2.3 Privacy Evaluation
The general question for this subsection is: given a set of generalized requests and a context CH, how much privacy do the users who issued these requests have? We want to find the following function:
Att : Requests × Users −→ [0, 1].
Intuitively, given a (generalized) request r′ and a user i, Att(r′, i) gives the probability that the adversary can derive from CH that i is the issuer of r′ among all the users. In the remainder of this section we show how to specify the attack function for context CH. Once the attack function is specified, we can use the following formula to evaluate the privacy value of a request:
Privacy(r′) = 1 − Att_CH(r′, issuer(r′))    (1)
Intuitively, this value is the probability that the attacker will not associate the issuer of request r′ with r′.
In order to specify the Att function, we introduce the function Inside(i, r′), which indicates the probability of user i being located in r′.Sdata at the time of the request. Intuitively, Inside(i, r′) = 1 if user i is identified by the adversary as one of the users located in r′.Sdata at time r′.Tdata, i.e., i ∈ Ident_t(r′.Sdata) with t = r′.Tdata. On the contrary, Inside(i, r′) = 0 if i is recognized by the adversary as one of the users located outside r′.Sdata at time r′.Tdata, i.e., there exists an area A with A ∩ r′.Sdata = ∅ such that i ∈ Ident(A). Finally, if neither of the above cases holds, then the adversary does not know where i is, and there is still a probability that i is in r′.Sdata. Theoretically, this probability is the number of users in r′.Sdata that are not recognized by the adversary (i.e., Num(r′.Sdata) − |Ident(r′.Sdata)|) divided by the number of users who are not recognized by the adversary anywhere (i.e., |I| − |Ident(Ω)|, where I is the set of all users and Ω is the entire area of the application). Formally,
Inside(i, r′) =
  1, if i ∈ Ident(r′.Sdata);
  0, if there exists an area A such that A ∩ r′.Sdata = ∅ and i ∈ Ident(A);
  (Num(r′.Sdata) − |Ident(r′.Sdata)|) / (|I| − |Ident(Ω)|), otherwise.    (2)
We can now define the Att function in context CH. For the sake of presentation, let us first consider the attack in the snapshot context Csnap = (Ident, Num, Lsnap), where for each generalized request r′, Lsnap(r′) = {r′}. In this special case, the probability of a user i being the issuer of r′ is given by the probability of i being in r′.Sdata at the time of the request, normalized over all the users in I.
Formally, the attack can be defined as:
Att_Csnap(r′, i) = Inside(i, r′) / Σ_{i′∈I} Inside(i′, r′)    (3)
When the total population of users is large (relative to the number of users whose locations are known to the adversary), the "otherwise" case of Formula 2 is very small, albeit nonzero. Intuitively, if a user i falls into this case, the adversary cannot really distinguish this particular user from all the other users who also fall into it. For such a user i, we can simply give the value 1/|I| to Att_Csnap(r′, i); we could give 1/(|I| − Num(Ω)), but this makes little difference in practice. It is now easy to see that
Att_Csnap(r′, i) ≈
  1/Num(r′.Sdata), if Inside(i, r′) = 1;
  0, if Inside(i, r′) = 0;
  1/|I|, otherwise.    (4)
The above formula makes intuitive sense. Indeed, if i is recognized as inside r′.Sdata, then without any other information the adversary cannot distinguish him/her from any of the Num(r′.Sdata) people in the area who might be the issuer. If i is recognized outside, then clearly i cannot be the issuer, by our definition of (generalized) requests. If i is not recognized anywhere (meaning he/she can be anywhere), then the attacker cannot distinguish him/her from any of the other people who are not recognized; since we assume the total population is much greater than Num(Ω), the probability that i is the issuer is close to 1/|I|.
Example 1. Consider the situation shown in Figure 1(a), in which there is a request r′ such that, at time r′.Tdata, there are three users in r′.Sdata: one of them is identified as i1, while the other two are not identified. The adversary can also identify users i2 and i3 outside r′.Sdata at time r′.Tdata. Assume that the set I contains 100 users.
Fig. 1. Example of attack: (a) first request r′; (b) second request r″.
Clearly, i2 and i3 have zero probability of being the issuer, since they are identified outside r′.Sdata and the spatial region of any generalized request must contain the spatial region of the original request. On the contrary, the adversary is certain that i1 is located in r′.Sdata. By Equation 3, the attack associates i1 with r′ with likelihood 1/(Σ_{i′∈I} Inside(i′, r′)). By Formula 2, for each user i in I \ {i1, i2, i3}, Inside(i, r′) = 2/97. Therefore, Σ_{i′∈I} Inside(i′, r′) = 97 · (2/97) + 1 = 3. Consequently, the probability of i1 being the issuer of r′ is 1/3, and each user i ∈ I \ {i1, i2, i3} has a probability of being the issuer of (2/97)/3 = 2/291 ≈ 0.7%.
In the general case L(r′) ⊇ {r′}, we can evaluate, analogously to the snapshot case, the probability that a user is located in the generalized region of every request in the trace τ = L(r′). That is, we extend the Inside function to traces: given a trace τ and a user i, Inside(i, τ) is the probability that user i is located, for each request r′ in τ, within r′.Sdata at time r′.Tdata. The attack is then
Att_CH(r′, i) = Inside(i, L(r′)) / Σ_{i′∈I} Inside(i′, L(r′))    (5)
We now turn to how to compute Inside(i, τ). First consider the easy cases. If i ∈ Ident(r′.Sdata) at time r′.Tdata for all requests r′ ∈ τ, then Inside(i, τ) = 1. If i ∈ Ident(A) with A ∩ r′.Sdata = ∅ for some area A and at least one request r′ ∈ τ, then Inside(i, τ) = 0. The remaining cases are the difficult ones.
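Before treating the difficult cases, the snapshot attack of Formulas 2 and 3 can be sketched in a few lines; the function names and data structures below are ours, and the script simply reproduces the numbers of Example 1.

    # Sketch of the snapshot attack (Formulas 2 and 3); illustrative names only.

    def inside(i, ident_in, ident_out, num_region, n_users, n_identified):
        """Formula 2. ident_in: users identified inside r'.Sdata; ident_out: users
        identified in areas disjoint from r'.Sdata; num_region = Num(r'.Sdata);
        n_users = |I|; n_identified = |Ident(Omega)|."""
        if i in ident_in:
            return 1.0
        if i in ident_out:
            return 0.0
        return (num_region - len(ident_in)) / (n_users - n_identified)

    def att_snapshot(i, users, ident_in, ident_out, num_region):
        """Formula 3: Inside(i, r') normalized over all users."""
        n, m = len(users), len(ident_in) + len(ident_out)
        denom = sum(inside(j, ident_in, ident_out, num_region, n, m) for j in users)
        return inside(i, ident_in, ident_out, num_region, n, m) / denom

    # Example 1: |I| = 100; three users in r'.Sdata, of whom only i1 is identified;
    # i2 and i3 are identified outside.
    users = ["i%d" % j for j in range(1, 101)]
    print(att_snapshot("i1", users, {"i1"}, {"i2", "i3"}, num_region=3))  # 1/3
    print(att_snapshot("i4", users, {"i1"}, {"i2", "i3"}, num_region=3))  # ~0.0069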
To calculate Inside(i, τ), we need to consider the likelihood of someone moving from one location to another at the specified times. In this paper, we advocate the following as a reasonable approach. We assume that, for each pair of locations A and B and each pair of times t1 and t2, we know the probability of a user i being in B at time t2 conditioned on the user being in A at time t1. Formally, consider the two random variables X: "i is inside A at time t1" and Y: "i is inside B at time t2", where A and B are two areas and t1 and t2 are two different times. We assume that the adversary knows the value P(Y|X). We note that P(Y|X) in general is not the same as P(Y): how likely user i is to be in B can depend on how likely the same user is to be in A. Take two extreme examples. If A and B are very far apart and t1 and t2 are close to each other, then i cannot be in B at t2 if i is in A at t1, i.e., P(Y|X) ≈ 0. On the other hand, if A and B are two locations along a one-way road and the difference between t1 and t2 matches the time needed to move from A to B at a normal speed, then P(Y|X) ≈ 1. In practice, this value can be derived from historical observations and experience.
Now, assume τ consists of the requests r′1, ..., r′k. For each user i we form a Bayesian network with nodes X1, ..., Xk, where each Xj corresponds to the random variable "user i is in r′j.Sdata at time r′j.Tdata". In this network, for each node Xh that satisfies the condition (denoted c) i ∈ Ident_t(r′h.Sdata) with t = r′h.Tdata, we draw an arc towards each node that does not satisfy condition c. In addition, for each pair of nodes Xg and Xh such that neither satisfies condition c, we draw an arc from Xg to Xh if r′g.Tdata < r′h.Tdata. (The resulting network is acyclic.) As assumed above, we know the value P(Xh|Xg) for each arc from Xg to Xh. Denote by E the conjunction of the facts Xh = true for each r′h ∈ τ that satisfies condition c. What we want to find is P(X1, ..., Xk | E) = Inside(i, τ). This is a well-studied belief revision problem, and many computation and approximation methods exist. (Note that applying this method to the easier cases mentioned earlier yields the correct values.)
Example 2. Continuing from Example 1, assume a second request r″ (see Figure 1(b)) is issued after r′ and that r″ is linked with r′, so τ consists of these two requests. At time r″.Tdata, there are 4 users inside r″.Sdata, two of whom are identified as i1 and i2. No user is identified outside r″.Sdata. From the above discussion, it follows that Inside(i2, τ) = Inside(i3, τ) = 0, since i2 and i3 were identified outside the first generalized request r′. All the other users have a non-zero probability of being inside the generalized region of each request in the trace; in particular, Inside(i1, τ) = 1, since i1 is recognized in both requests. Consider a user i ∈ I \ {i1, i2, i3}, and let X_r′ and X_r″ denote the assertions "i is in r′.Sdata at time r′.Tdata" and "i is in r″.Sdata at time r″.Tdata". In this case, the Bayesian network for i has the two nodes X_r′ and X_r″, and there is an arc from X_r′ to X_r″ since r″ is issued after r′. Now let us assume P(X_r″ | X_r′) = 0.75, i.e., there is a 75% likelihood that someone in r′.Sdata will move to r″.Sdata at the specified times. We then compute Inside(i, τ) = P(X_r′, X_r″) = P(X_r′) P(X_r″ | X_r′) = (2/97) · 0.75.
Now the sum of all the Inside(j, τ) values is 1 + 0 + 0 + 97 · (2/97) · 0.75 = 2.5. The attack values under these assumptions are then as follows: Att_CH(r″, i1) = 1/2.5 = 40%, Att_CH(r″, i2) = Att_CH(r″, i3) = 0, and Att_CH(r″, i) = (2/97) · 0.75/2.5 ≈ 0.6% for each of the other 97 users i.
To make the situation more interesting, let us remove the fact that i2 was recognized outside at time r′.Tdata, and figure out the value of Inside(i2, τ). In this case, let us assume P(X_r′ | X_r″) = 0.75, namely that people who are in r″.Sdata have a 75% likelihood of coming from r′.Sdata. Under the evidence E that i2 is in r″.Sdata, we then know Inside(i2, τ) = P(X_r′, X_r″ | E) = 0.75. The sum of Inside values becomes 1 + 0.75 + 0 + 97 · (2/97) · 0.75 = 3.25. Hence, Att_CH(r″, i1) ≈ 31%, Att_CH(r″, i2) ≈ 23%, Att_CH(r″, i3) = 0, and Att_CH(r″, i) = (2/97) · 0.75/3.25 ≈ 0.48% for each of the other 97 users i. This is an interesting exercise, as it reveals that if we add i2 as possibly being in r′.Sdata (with 75% probability), then the likelihood that i1 is the issuer decreases, which is intuitively correct.
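The historical attack can also be sketched directly; the following toy script (our own names and structure, not the authors' code) reproduces the first set of numbers of Example 2 by applying the chain rule on the two-node network.

    # Sketch of the historical attack on a two-request trace (Formula 5),
    # reproducing Example 2; names and structure are illustrative.
    N_UNRECOGNIZED = 97     # |I| - |Ident(Omega)| from Example 1
    P_UNREC = 2.0 / 97.0    # "otherwise" value of Formula 2 for the first request
    P_MOVE = 0.75           # assumed P(X_r'' | X_r')

    def inside_trace(user):
        if user == "i1":             # identified inside both generalized regions
            return 1.0
        if user in ("i2", "i3"):     # identified outside r'.Sdata at time r'.Tdata
            return 0.0
        # chain rule on the two-node network: P(X_r', X_r'') = P(X_r') * P(X_r''|X_r')
        return P_UNREC * P_MOVE

    users = ["i1", "i2", "i3"] + ["u%d" % j for j in range(N_UNRECOGNIZED)]
    total = sum(inside_trace(u) for u in users)    # 1 + 97*(2/97)*0.75 = 2.5
    att = {u: inside_trace(u) / total for u in users}
    print(att["i1"], att["i2"], att["u0"])         # 0.4, 0.0, ~0.006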
It is worth noting that the definition of attack in context CH is a proper extension of the attack that can be defined in the conservative context in which the adversary knows the location and identity of each user at each time instant. The historical attack in this context was first proposed in [14]. The idea is that the only users that have a non-zero probability of being the issuer of a trace of requests are those whose spatio-temporal location is contained in the generalized region of every request in the trace. It can easily be seen that, if each user can be identified at each time instant, then the Inside() function returns either 0 or 1, and hence the attack we specified for context CH assigns zero probability to each user located outside the generalized region of any request in the trace.
3 The MilanoByNight simulation
In order to evaluate privacy-preserving techniques applied to LBS, a dataset of users' movements is needed. In our experiments, we want to focus on the privacy threats that arise when using a friend-finder service, as described in Section 1. We assume that this kind of service is primarily used by people during entertainment hours, especially at night. Therefore, the ideal dataset for our experiments should represent the movements of people on a typical Friday or Saturday night in a big city, when users tend to move to entertainment places. To our knowledge, no such dataset is currently publicly available, especially considering that we want large-scale, individual, and precise location data (i.e., with the same accuracy as current consumer GPS technology). In this section we describe how we generated this user movement dataset.
3.1 Relevant Parameters
For our experiments we artificially generate movements for 100,000 users on the road network of Milan (100,000 is an estimate of the number of people participating in the service we consider). The total area of the map is 324 km², and the resulting average density is 308 users/km². Very detailed digital vector maps of the city have been generously provided by the municipality of Milan. The simulation includes a total of 30,000 home buildings and 1,000 entertainment places; the first value is strictly related to the considered number of users, while the second is based on real data from public sources, which also provide the geographical distribution of the places. Our simulation starts at 7 pm and ends at 1 am. During these hours, each user moves from home to an entertainment place, spends some time there, and possibly moves to another entertainment place or goes back home. All the choices made by users are modeled with probability distributions. For this specific data generation, the most important parameters of the simulation are:
– Source and destination. The locations essential to define movements; they may be homes or entertainment places. Some places in some districts are more popular than others.
– StartingTime. The time at which a user leaves her home to go to the first entertainment place.
– Permanence. How long will a user stay at one entertainment place?
– NumPlaces. How many entertainment places will a user visit in one night?
In order to have a realistic model of these distributions, we prepared a survey to collect real user data. Data collection is still ongoing, but the current parameters are based on interviews with more than 300 people in our target category.
3.2 Weaknesses of mostly random movement simulations
Many papers in the field of privacy preservation in LBS use artificial data generated by moving-object simulators to evaluate their techniques. However, most of these simulators are unable to reproduce realistic user behavior. For example, objects generated by the Brinkhoff generator [3] cannot aggregate at certain places (e.g., entertainment places): once an object is instantiated, the generator chooses a random destination point on the map, and after reaching the destination the object disappears from the dataset. For the same reason, it is not possible to reproduce simple movement patterns (e.g., a user going out from her home to another place and then coming back home), nor to simulate a user remaining at a place for a certain time. Despite these strong limitations, we made our best effort to use the Brinkhoff simulator to generate a set of user movements with characteristics as close as possible to those described in Section 3.1; for example, in order to simulate entertainment places, some random points on the map, among those on the trajectories of users, were picked. The main purpose of this simulation is to understand whether testing privacy preservation over random movement simulations gives significantly different results with respect to more realistic simulations.
3.3 Generation of user movements with a context simulator
In order to obtain a dataset consistent with the parameters specified in Section 3.1, we need a more sophisticated simulator. For our experiments, we chose to customize the Siafu context simulator [4]. With a context simulator it is possible to design models for agents, places, and context; it is therefore possible to define particular places of aggregation and make users dynamically choose which place to reach and how long to stay there. In our simulation, homes are distributed almost uniformly on the map, with a minor concentration in the central zones of the city, while entertainment places are mostly concentrated in 5 zones of the city. The distributions for the StartingTime, Permanence and NumPlaces parameters introduced in Section 3.1 were modeled with the results of the survey.
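To make the agent model concrete, here is a minimal sketch of how one user's night could be generated from these parameters; the distributions below are placeholders rather than the survey-derived values, the helper names are ours, and travel times between places are ignored.

    import random

    # Placeholder distributions standing in for the survey-derived ones.
    def starting_time():   # minutes after 7 pm at which the user leaves home
        return random.randint(0, 180)

    def permanence():      # minutes spent at one entertainment place
        return random.choice([45, 90, 150, 210, 270])

    def num_places():      # entertainment places visited in one night
        return random.choice([1, 1, 2, 2, 3])

    def night_schedule(home, places, end_of_night=360):
        """Generate (minute, location) events for one user between 7 pm and 1 am."""
        t, schedule = starting_time(), []
        for _ in range(num_places()):
            if t >= end_of_night:
                break
            schedule.append((t, random.choice(places)))  # popularity weights omitted
            t += permanence()
        schedule.append((min(t, end_of_night), home))    # back home by 1 am
        return schedule

    print(night_schedule("home_42", ["pub_1", "cinema_3", "club_7"]))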
For example, the time of permanence at an entertainment place was modeled according to the following percentages derived from the survey: 9.17% of the users stay less than 1 hour, 34.20% stay between 1 and 2 hours, 32.92% stay between 2 and 3 hours, 16.04% stay between 3 and 4 hours, and 7.68% stay more than 4 hours. Following these parameters, in our dataset users spend 50.87% of the time at home, 7.28% of the time moving from one place to another, and 41.85% of the time at entertainment places. When a user moves from one place to another, she decides whether to go on foot or by car: in general, if an entertainment place is farther than 500 meters away, people tend to move by car, and this is reflected in the simulation. The average speed of movements by car is 20 km/h, while the average speed on foot is 3.6 km/h. With our parameters, 10.64% of the movements are made on foot, while all the others are made by car.
4 Experimental results
In this section we show the results of our experimental evaluation. We first define how we evaluate the quality of service in Section 4.1; we then describe the experimental setting in Section 4.2 and the generalization algorithms we used in Section 4.3. Finally, in Sections 4.4 and 4.5 we show the impact of the simulation parameters and of the user movements, respectively, on the evaluation of the generalization algorithms.
4.1 Evaluation of the Quality of Service
Different metrics can be defined to measure QoS for different kinds of services. For instance, for the friend-finder service we are considering, it would be possible to measure how many times the generalization leads the SP to return an incorrect result, i.e., the issuer is not notified of a close-by friend or, vice versa, the issuer is notified of a friend who is not close by. While this metric is useful for this specific application, we want to measure QoS independently of the specific kind of service. For this reason, in this paper we evaluate how QoS degrades in terms of the perimeter of the generalized region: if the generalized region is too large, the service becomes useless. For this purpose, we introduce a new parameter, called maxP, that bounds the maximum perimeter; we assume that no request with a perimeter larger than maxP is sent to the SP.
4.2 Experimental settings
In our experiments we used two datasets of user movements. The dataset AB (Agent-Based) was generated with the customized Siafu simulator as described in Section 3.3, while the dataset MRM (Mostly Random Movement) was created with the Brinkhoff simulator as described in Section 3.2. In both cases, we simulate LBS requests for the friend-finder service by choosing random users in the simulation; for each request we compute the generalization according to a given algorithm, and we evaluate QoS as explained in Section 4.1 and privacy according to Formula (1) of Section 2. The most important parameters characterizing the simulations are reported in Table 1, with the values in bold denoting defaults. The number of users indicates how many users are in the simulation; the simulations are designed so that this number remains almost constant at each time instant. Every two minutes, each user has a probability Preq of issuing a request. For technical reasons, the reported tests are based on a time frame of three hours out of the total six hours of the MilanoByNight scenario.
This implies that in the default case we consider a total of 45 requests (one every four minutes of the considered time frame). The parameter Pid−in indicates the probability that a user is identified when she is located at an entertainment place, while Pid−out is the probability that a user is identified in any other location (e.g., while moving from home to an entertainment place). While we also perform experiments in which the two probabilities are the same, our scenario suggests that a much higher value for Pid−in is more realistic (we consider it ten times higher than Pid−out). This is due to the fact that restaurants, pubs, movie theaters, and similar places are likely to have several ways of identifying people (fidelity or membership cards, WiFi hotspots, cameras, credit card payments, etc.), and in several cases more than one place is owned by the same company, which may have an interest in collecting data about its customers. Finally, Plink indicates the probability that two consecutive requests can be identified as issued by the same user (the limitation to consecutive requests is because, in our specific scenario, we assume that linking is performed mainly through spatio-temporal correlation). While we perform our tests over a full range of values, the specific default value reported in the table comes from a recent study on the ability to link positions based on spatio-temporal correlation [18].
Table 1. Parameter values
Parameter         Values
dataset           AB, MRM
number of users   10k, 20k, 30k, 40k, 50k, 60k, 70k, 80k, 90k, 100k
Preq              0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
Pid−in            0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
Pid−out           0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1
Plink             0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.87, 0.9, 1.0
The experimental results shown in this section are obtained by running the simulation for 100 issuers and then computing the average values.
4.3 The Generalization Algorithms Used in the Experiments
In our experiments we evaluate the privacy and QoS of requests generalized by two algorithms previously proposed in the literature. The first one, called Grid, was presented in [19]; it is used as a representative of several algorithms aimed at guaranteeing k-anonymity in the snapshot case, i.e., algorithms that do not take into account the linking ability of the adversary. Intuitively, this algorithm partitions all the users into blocks, each with cardinality at least k, and then computes the generalized region as the minimum bounding rectangle (MBR) covering the locations of the users in the same block as the issuer. The second algorithm, Greedy, was first proposed in [14], and a similar idea was also described in [15]. Greedy is intended to represent algorithms aimed at preserving privacy in the historical case, i.e., the general CH context, in which the attacker may actually obtain and recognize traces of requests from the same issuer. This algorithm computes the generalization of the first request r in a trace using an algorithm for the snapshot case; while doing this, the set A of users located in the generalized region is stored. The generalized region of each successive request r′ linked with r is then computed as the MBR of the locations of the users in A at the time of r′. In our implementation we use Grid as the snapshot algorithm to compute the generalization of the first request.
For the purpose of our tests, we modified the two algorithms above so that each generalized region always has a perimeter smaller than maxP. To achieve this, if the perimeter of the generalized region is larger than maxP, the region is iteratively shrunk, by excluding from the MBR the user farthest from the issuer, until its perimeter falls below maxP. In the Greedy algorithm, when a user is excluded from the generalized region, she is also excluded from the set A, and hence is not used in the generalization of the successive requests. Eventually, when the set A contains only the issuer, a snapshot generalization is executed again and A is reinitialized.
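The following is a minimal sketch of this bounded Greedy generalization; the helper names and data structures are ours, the snapshot step is abstracted into an input block of at least k users, and every user's position is assumed to be known at each request time.

    # Sketch of the Greedy generalization with the maxP bound described above;
    # a simplified stand-in for the algorithms of [14] and [19], not the authors' code.
    def mbr(points):
        xs, ys = [p[0] for p in points], [p[1] for p in points]
        return (min(xs), min(ys), max(xs), max(ys))

    def perimeter(box):
        x1, y1, x2, y2 = box
        return 2 * ((x2 - x1) + (y2 - y1))

    def shrink(issuer, block, pos, max_p):
        """Drop the user farthest from the issuer until the MBR perimeter <= maxP."""
        block = set(block)
        while len(block) > 1 and perimeter(mbr([pos[u] for u in block])) > max_p:
            ix, iy = pos[issuer]
            far = max((u for u in block if u != issuer),
                      key=lambda u: (pos[u][0] - ix) ** 2 + (pos[u][1] - iy) ** 2)
            block.remove(far)
        return block

    def greedy_generalize(issuer, snapshot_block, positions_per_request, max_p):
        """positions_per_request: one {user: (x, y)} map per linked request."""
        A = shrink(issuer, snapshot_block, positions_per_request[0], max_p)
        regions = [mbr([positions_per_request[0][u] for u in A])]
        for pos in positions_per_request[1:]:
            if A == {issuer}:               # anonymity set exhausted: restart
                A = set(snapshot_block)     # stand-in for re-running the snapshot step
            A = shrink(issuer, A, pos, max_p)
            regions.append(mbr([pos[u] for u in A]))
        return regions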
In addition to the input request r and the locations of all the users in the system, the considered algorithms require two additional parameters: the value k and the threshold maxP. In our tests, we used values of k between 10 and 60 (default 10) and values of maxP between 1000 and 4000 meters (default 1000 meters). In our experimental results we also evaluate the privacy threat when no privacy-preserving algorithm is applied; the label NoAlg identifies the results for this case in the figures.
4.4 Impact of Simulation Parameters on the Evaluation of the Generalization Algorithms
The objective of the first set of experiments is to show which parameters of the simulation most affect the evaluation of the generalization algorithms. In these tests we used the AB dataset only. Figure 2(a) shows that the average privacy obtained with Greedy and Grid is not significantly affected by the size of the total population. Indeed, both algorithms, independently of the total number of users, try to obtain generalized regions that cover the locations of k users, so the privacy of the requests is not affected. However, when the density is high the two algorithms can generalize to a small area, while when the density is low a larger area is necessary to cover the locations of k users (see Figure 2(b)). On the contrary, the privacy obtained when no generalization is performed is significantly affected by the total population: a higher density increases the probability of different users being in the same location, and hence increases privacy even when requests are not generalized.
Fig. 2. Performance evaluation for different values of the total population: (a) average privacy; (b) average perimeter.
A parameter that significantly affects the average privacy is the probability of identification of a user in a certain place. In Figure 3 we show the experimental results for different values of Pid−in where, in each test, Pid−out is set to Pid−in/10. As expected, considering a trace of requests, the higher the probability of identifying users in one or more of the regions from which the requests in the trace were issued, the lower the level of privacy.
Fig. 3. Average privacy for different values of Pid−in (Pid−out = Pid−in/10).
Figure 4(a) shows the impact of Plink on the average privacy. As expected, high values of Plink lead to low values of privacy. Our results show that the relation between Plink and privacy is not linear: privacy depends almost linearly on the average length of the traces identified by the adversary (Figure 4(b)), but the average length of the traces grows almost exponentially with the value of Plink (Figure 5).
To summarize the first set of experiments, our findings show that many parameters of the simulation significantly affect the evaluation of the generalization algorithms. This implies that, when a generalization algorithm is evaluated, it is necessary to carefully estimate realistic values for the simulation parameters, since an error in the estimation may lead to misleading results.
Fig. 4. Performance evaluation for different values of Plink: (a) average privacy as a function of Plink; (b) average privacy as a function of the average trace length.
Fig. 5. Average trace length as a function of Plink.
4.5 Impact of the User Movements on the Evaluation of the Generalization Algorithms
The objective of the second set of experiments is to answer an important question posed in this paper: what is the impact of the different simulated user movements on the evaluation of the generalization algorithms? We answer this question with a set of tests performed on the two datasets described above. The first set of tests, reported in Figure 6, compares the privacy achieved by the Greedy algorithm on the two datasets for different values of k and different values of QoS. The experiments on MRM were also repeated with larger values of the QoS threshold (maxP = 2000 and maxP = 4000), so three different versions of MRM appear in the figures. In order to focus on these parameters only, in these tests the probability of identification was set to the same value for every place (Pid−in = Pid−out = 0.1), and for the MRM dataset the issuer of the requests was chosen randomly only among the users that stay in the simulation for 3 hours, ignoring the ones staying for a much shorter time that are inevitably part of this dataset. This setting allowed us to compare the results on the two datasets using the same average length of traces identified by the adversary.
Figure 6(a) shows that the average privacy of the algorithm evaluated on the AB dataset is much higher than on the MRM dataset. This is mainly because in AB users tend to concentrate in a few locations (the entertainment places), which enhances privacy. This is also confirmed by a similar test performed without any generalization of locations: we obtained consistently higher values for the AB dataset (the average privacy is 0.67 in AB and 0.55 in MRM). In Figure 6(b) we show the QoS achieved by the algorithm on the two datasets with respect to the average privacy achieved. This result confirms that the level of privacy measured on the AB dataset using small values of k and maxP cannot be observed on the MRM dataset even with much higher values for these parameters.
From the experiments shown in Figure 6 we can conclude that, if the MRM dataset is used as a benchmark to estimate the values of k and maxP necessary to provide a desired average level of privacy, then the results will suggest over-protective values. As a consequence, the service may exhibit a much lower QoS than could be achieved with the same algorithm.
Fig. 6. Evaluation of the Greedy algorithm using the AB and MRM datasets (Pid−in = Pid−out = 0.1): (a) average privacy as a function of k; (b) average privacy as a function of the average perimeter.
The above results may still seem to support the safety of using MRM, since, according to what we have seen so far, a technique achieving a certain level of privacy may only do better in a real scenario. However, our second set of experiments shows that this is not the case. In Figure 7 we show the results obtained by varying the probability of identification. For this test, we considered two sets of issuers in the MRM dataset: one composed of users that stay in the simulation for 3 hours (MRM long traces in Figure 7), and one containing issuers randomly chosen from the entire set of users (MRM all traces in Figure 7), hence including users that stay in the simulation for a much shorter time. In Figures 7(a) and 7(b) we can observe that the execution on the MRM dataset leads to a higher measured privacy level than the one obtained on the AB dataset. In particular, the evaluation of the Grid algorithm on the MRM dataset (Figure 7(b)) would suggest that the algorithm provides high privacy protection; however, when the same algorithm is evaluated on the more realistic AB dataset, this conclusion turns out to be incorrect. In this case, the evaluation on the MRM dataset may lead to underestimating the privacy risk, and hence to deploying services based on generalization algorithms that do not provide the minimum required level of privacy.
Fig. 7. Average privacy using the AB and MRM datasets (Pid−out = Pid−in/10): (a) Greedy algorithm; (b) Grid algorithm.
5 Conclusions and open issues
In this paper we claim that the experimental evaluation of LBS privacy-preserving techniques should be based on user movement datasets obtained through simulations tailored to the specific deployment scenario of the target services. Our results support this thesis for the class of LBS known as friend-finder services, for techniques based on spatial cloaking, and for adversary models that allow the adversary to occasionally recognize people in certain locations. We believe that these results can be generalized to other LBS, techniques, and adversary models. For example, as future work, it would be interesting to also evaluate defense techniques that generalize the issuer's location to an area that does not necessarily contain the issuer's actual location.
Moreover, in our experiments we only considered the first of the two privacy threats presented in the introduction; we have some ideas on how to extend the experiments to also consider the second threat, location privacy. Finally, we believe that a significant effort should be devoted to the development of new flexible and efficient context-aware user movement simulators, as well as to the collection of real data, possibly in aggregated form, to properly tune the simulations. In our opinion this is a necessary step towards meaningful common benchmarks for the evaluation of LBS privacy-preserving techniques.
Acknowledgments. The authors would like to thank Stefano Varesi for his contribution in writing the code used for our simulations. This work was partially supported by the National Science Foundation (NSF) under grant CNS-0716567, and by the Italian MIUR under grant InterLink II04C0EC1D.
References
1. Barkhuus, L., Dey, A.: Location-based services for mobile telephony: a study of users' privacy concerns. In: Proc. of the 9th International Conference on Human-Computer Interaction, IOS Press (2003) 709–712
2. Bettini, C., Mascetti, S., Wang, X.S., Jajodia, S.: Anonymity in location-based services: towards a general framework. In: Proc. of the 8th International Conference on Mobile Data Management, IEEE Computer Society (2007)
3. Brinkhoff, T.: A framework for generating network-based moving objects. GeoInformatica 6(2) (2002) 153–180
4. Martin, M., Nurmi, P.: A generic large scale simulator for ubiquitous computing. In: 3rd Annual International Conference on Mobile and Ubiquitous Systems: Networking & Services, IEEE Computer Society (July 2006)
5. Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.L.: Understanding individual human mobility patterns. Nature 453 (June 2008) 779–782
6. Ghinita, G., Kalnis, P., Khoshgozaran, A., Shahabi, C., Tan, K.L.: Private queries in location based services: Anonymizers are not necessary. In: Proc. of SIGMOD, ACM Press (2008)
7. Gruteser, M., Grunwald, D.: Anonymous usage of location-based services through spatial and temporal cloaking. In: Proc. of the 1st International Conference on Mobile Systems, Applications and Services (MobiSys), The USENIX Association (2003)
8. Mokbel, M.F., Chow, C.Y., Aref, W.G.: The new Casper: query processing for location services without compromising privacy. In: Proc. of the 32nd International Conference on Very Large Data Bases, VLDB Endowment (2006) 763–774
9. Gedik, B., Liu, L.: Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE Transactions on Mobile Computing 7(1) (2008) 1–18
10. Kalnis, P., Ghinita, G., Mouratidis, K., Papadias, D.: Preventing location-based identity inference in anonymous spatial queries. IEEE Transactions on Knowledge and Data Engineering 19(12) (2007) 1719–1733
11. Kido, H., Yanagisawa, Y., Satoh, T.: Protection of location privacy using dummies for location-based services. In: Proc. of the 21st International Conference on Data Engineering Workshops, IEEE Computer Society (2005)
12. Yiu, M.L., Jensen, C.S., Huang, X., Lu, H.: SpaceTwist: Managing the trade-offs among location privacy, query performance, and query accuracy in mobile services. In: Proc. of the 24th International Conference on Data Engineering, IEEE Computer Society (2008)
13. Ardagna, C.A., Cremonini, M., Damiani, E., di Vimercati, S.D.C., Samarati, P.: Location privacy protection through obfuscation-based techniques. In: Proc. of the 21st Conference on Data and Applications Security. Volume 4602 of Lecture Notes in Computer Science, Springer (2007) 47–60
14. Bettini, C., Wang, X.S., Jajodia, S.: Protecting privacy against location-based personal identification. In: Proc. of the 2nd Workshop on Secure Data Management. Volume 3674 of LNCS, Springer (2005) 185–199
15. Xu, T., Cai, Y.: Location anonymity in continuous location-based services. In: Proc. of the ACM International Symposium on Advances in Geographic Information Systems, ACM Press (2007)
16. Abul, O., Bonchi, F., Nanni, M.: Never walk alone: Uncertainty for anonymity in moving objects databases. In: Proc. of the 24th International Conference on Data Engineering, IEEE Computer Society (2008)
17. Terrovitis, M., Mamoulis, N.: Privacy preservation in the publication of trajectories. In: Proc. of the 9th International Conference on Mobile Data Management, IEEE Computer Society (2008)
18. Vyahhi, N., Bakiras, S., Kalnis, P., Ghinita, G.: Tracking moving objects in anonymized trajectories. In: Proc. of the 19th International Conference on Database and Expert Systems Applications, Springer (2008, to appear)
19. Mascetti, S., Bettini, C., Freni, D., Wang, X.S.: Spatial generalization algorithms for LBS privacy preservation. Journal of Location Based Services 2(1) (2008)
A Multi-Path Approach for k-Anonymity in Mobile Hybrid Networks
C.A. Ardagna (1), A. Stavrou (2), S. Jajodia (2), P. Samarati (1), R. Martin (2)
(1) University of Milan, Italy; (2) George Mason University, USA
Abstract. The ubiquitous proliferation of mobile devices has given rise to novel user-centric applications and services. In current mobile systems, users gain access to remote service providers over mobile network operators, which are assumed to be trusted and not to improperly use or disclose users' information. In this paper, we remove this assumption, offering privacy protection of users' requests against the prying eyes of the network operators, which we consider to be honest but curious. Furthermore, to prevent abuse of the communication privacy we provide, we elevate traffic accountability to a primary design requirement. We build on prior work on network k-anonymity and multi-path communications to provide communication anonymity in a mobile environment. The resulting system protects users' privacy while maintaining data integrity and accountability. To verify the effectiveness of our approach and measure its overhead, we implemented a prototype of our system using WiFi-enabled devices. Our preliminary results indicate that the overall impact on the end-to-end latency is negligible, thus ensuring the applicability of our solution to protecting the privacy of real-time services, including video streaming and voice-activated services.
1 Introduction
Recent technology advancements in mobile and wireless devices have fostered the development of a new wave of on-line and mobile services. Due to their pervasive nature, these services are becoming increasingly popular and widespread. On the other hand, the accuracy, reliability, and performance of location-sensing technologies have raised concerns about the protection of users' privacy. Today, there are no mechanisms to prevent wireless communications from being broadcast to neighboring devices, thus disclosing private information about the location of users.
The worst-case scenario that analysts have foreseen as a consequence of an unrestricted and unregulated availability of mobile technologies recalls the "Big Brother" stereotype: a society where the secondary effect of mobile technologies (whose primary effect is to enable the development of innovative and valuable services) becomes a form of implicit total surveillance of individuals.
Fig. 1. Current privacy mechanisms (a) and our new vision of privacy (b)
Some recent examples give an idea of the extent of the problem. In September 2007, Capla Kesting Fine Art announced a plan to build a cell tower near Brooklyn, NY, able to capture, monitor, and rebroadcast wireless signals (in other terms, to eavesdrop on WiFi communications) to ensure public safety [28]. Moreover, the US Congress approved changes to the 1978 Foreign Intelligence Surveillance Act giving the NSA authorization to monitor domestic phone conversations and e-mails, including those stemming from the cellular network and the Internet. This legislation provides the legal grounds for the cell tower's construction, and for the monitoring of users' communications in the cellular network.
Current privacy protection systems focus on protecting users from untrusted service providers, while at the same time assuming mobile network operators to be trusted. In this paper, and to the best of our knowledge we are the first to do so, we assume mobile network operators to be honest but curious. Our approach builds on the concept of k-anonymity in the context of network communication but, unlike other approaches, aims at providing such anonymity against the mobile network operator rather than against the service provider. Figure 1 illustrates the difference between our approach and current solutions. Current solutions (see Figure 1(a)) use k-anonymity to protect the users during the communications with the service provider and consider the mobile network operator a fully trusted party; the mobile network operator, however, has access to precise location and traffic information for each user. In our approach (see Figure 1(b)), the mobile network operator is considered honest but curious, and a k-anonymity mechanism is used to protect users' privacy. The user can then decide whether the service provider is assumed trusted: in the figure, either 1-anonymity is preserved, if the service provider is assumed trusted, or k-anonymity, if the service provider is assumed untrusted. Also, our work differs from traditional research in anonymous communications [6–8, 19], because it can be applied in a mobile infrastructure and is geared towards k-anonymity, not complete sender anonymity. In addition, we treat user and traffic accountability as a fundamental requirement of our approach, making sure that each user is accountable for the services requested. A system that can enforce data accountability prevents unwanted traffic and provides economic incentives for the deployment of privacy-preserving services.
To achieve the aforementioned goals, we extend the concept of network k-anonymity to hybrid mobile networks. In such networks, users can simultaneously create WiFi point-to-point connections, join the cellular network, and access the Internet through their mobile phones.
Using a multi-path communication paradigm [23], a mobile user can achieve network k-anonymity by distributing, over the WiFi network, different packets of the same message to k neighboring mobile peers, which then forward the received packets through the cellular network. This scheme achieves k-anonymity because the mobile network operator is not able to associate the user's data flow with fewer than k peers. A separate accounting mechanism can verify that the packets are legitimate: for instance, one approach is to have the data flow encrypted with a symmetric key shared between the requester and the service provider. This assures accountability, data integrity, and confidentiality. In addition, it prevents the abuse of anonymity [4] while providing the economic incentives to deploy anonymizing schemes. Of course, there is a clear trade-off between anonymity and latency overhead: the farther we forward the packets, the better the anonymity, but the higher the latency overhead. To quantify this trade-off in practice, we have built a prototype of our system using WiFi-enabled cellphones.
The remainder of this paper is organized as follows. Section 2 illustrates the overall architecture. Section 3 discusses privacy requirements and challenges in the considered scenario and illustrates our solution. Section 4 discusses experimental results illustrating the impact of our solution on end-to-end communication. Section 5 discusses related work. Finally, Section 6 presents our conclusions.
2 Overall Architecture
Our reference model is a distributed and mobile infrastructure forming a hybrid network [8, 9, 22] that integrates wireless, cellular, and wired technologies. Our scenario is based on mobile parties communicating through wireless and cellular protocols to access services located either in the cellular network or in the Internet.
Fig. 2. Mobile Network Architecture
Figure 2 illustrates the overall architecture and the participating entities, which are as follows.
– Mobile Users. Human users carrying mobile devices that support both GSM/3G and WiFi communication protocols. They request services from providers available over the network.
– Cellular Network (and corresponding Mobile Network Operators). It is composed of multiple radio cells (also known as cell-phone towers), which provide network access and services to mobile users. The cellular network acts as a gateway between mobile users and service providers.
– Service Provider. The entity that provides on-line services to mobile users and collects their personal information before granting access to its services.
Mobile users establish ad-hoc (WiFi) point-to-point connections with other mobile peers in the network, resulting in several Mobile Ad-hoc Networks (MANETs), represented by the dashed rectangles in Figure 2. Mobile users also receive signals from the radio cells and can connect to the cellular networks, through which they access the service. Here, we assume mobile peers, like the provider, to be honest but curious: they may try to eavesdrop on a communication, but they do not attempt to drop or maliciously modify it. Figure 3 illustrates the communications between the different parties in the hybrid network.
Fig. 3. Hybrid Network Communications
3 A Multi-Path Communication for Network k-Anonymity
We describe our solution based on network k-anonymity by showing: i) how a k-anonymous request is generated and transmitted by a mobile user to the service provider through the cellular network, and ii) how the service provider crafts a reply that can be received and decoded only by the requester, who remains concealed among the other k−1 users. Before going into the details of the solution, we discuss our privacy goals and challenges.
3.1 Privacy Requirements and Challenges
In hybrid mobile networks, users' privacy is at risk from several threats. In the last few years, the definition of privacy solutions has been geared towards the privacy of the users, sacrificing the need for accountability. Thus, an important requirement, often neglected by mobile privacy solutions, is the necessity of mechanisms that make users accountable for their operations. Many anonymization techniques can in fact be abused, or lack economic incentives, due to the absence of user accountability [4]; service providers are often reluctant to adopt privacy solutions that completely hide the users and do not enable any form of accountability. Another challenge driving our work is the current implicit trust in mobile network operators. We believe that mobile network operators should be treated as untrusted parties with respect to confidentiality.
These challenges result in the definition of two-level privacy requirements, meaning that different kinds of privacy protection have to be guaranteed at: 1) the mobile network level (anonymous communication) and 2) the service level (location hiding).
– Anonymous communication. Each mobile user should communicate anonymously with the mobile network operator, possibly by masking her identity with the identities of other users joining the cellular network. At the same time, to preserve accountability, the requester's identity should be known to the service provider.
– Location hiding. Each mobile user interacting with a service provider should be able to hide her current location, unless otherwise required by the service provider for the service release (our solution remains compatible with previous work in the context of location privacy and anonymity). This follows the principle of minimum disclosure, which states that service providers must require the least set of information needed for service provision. Conversely, the location of the users must be known to the mobile network operator in order to provide connection to the network.
To conclude, an important requirement for any privacy solution is to provide a mechanism for expressing users' privacy preferences that strikes a balance between usability and expressiveness. In our work, users express their privacy preference simply as the number k of users that should join the anonymity set. This is the only effort required of the users to protect their privacy, while the application of the privacy solution is completely transparent to them.
3.2 Overview of the Approach
The concept of k-anonymity was originally defined in the context of databases [21]. Here, we introduce a solution based on the concept of network k-anonymity, first introduced in [24], which can be defined as follows.
Definition 1 (Network k-anonymity). Let U be a set of users and M be a message originated by a mobile user u ∈ U.
User u is said to have network k-anonymity, where k is the privacy preference of the user u, if the probability of identifying u as the message originator is less than or equal to 1/k.
We now describe the forward and reverse anonymous communications that compose our solution; the complete protocol is shown in Figure 4. Let u be the mobile user that submits the request and SP the service provider. SP and the cellular network are in a business relationship, and u is subscribed to the cellular network. Also, SP and u are assumed to be in a producer-consumer relationship and to share a common secret key s, generated through a Diffie-Hellman key exchange. Each message M between a user and a service provider is encrypted, protecting the confidentiality and integrity of the message through symmetric encryption (e.g., 3DES, AES). Es(M) denotes a message M encrypted with the symmetric key s. Also, a cryptographic message authentication code MACs(M) is calculated on the message M using s. SP is finally responsible for the filtering of the requests.
Anonymous Request. The anonymous request process is initiated by a mobile user u who wishes to access a service provided by service provider SP. No overhead is imposed on u for the management of the anonymization process; u only needs to specify her privacy preference k. First, MACs(M) is calculated; then M is split into k data flows, producing the set DS = {m1, m2, ..., mk} (for the sake of clarity, in the following we use the term "packet" for a data flow of any size). The resulting packets are distributed among the neighboring mobile peers (peers, for short) in the mobile ad-hoc network. Different algorithms, ranging from ones based on network state to ones based on peer reputation, can be implemented for distributing packets among peers; here, we use a simple approach that randomly forwards the packets to the peers in u's communication range. The distribution algorithm is illustrated in Figure 5(a) and works as follows. The requester u encrypts each packet in DS using the symmetric key s shared between u and SP, and then appends MACs(M) in plaintext to each encrypted packet, that is, Es(DS) = {[Es(m1)‖MACs(M)], [Es(m2)‖MACs(M)], ..., [Es(mk)‖MACs(M)]}. The presence of the MAC in every packet allows mobile peers to recognize packets belonging to the same message M. Requester u then randomly picks one of the encrypted packets [Es(mj)‖MACs(M)] ∈ Es(DS) to send to the SP herself, and randomly selects k−1 peers in her communication range. Each selected peer receives a packet [Es(mi)‖MACs(M)] ∈ Es(DS) and uses a decision forwarding function (dff), defined after the protocol listing, to manage it.
Protocol 1. Anonymous communication protocol
Initiator: requester u
Involved parties: mobile peers PEERS, mobile network operator MNO, service provider SP
Variables: original message M, response message Mr, secret key s shared between u and SP
INITIATOR (u)
u.1 Define message M and privacy preference k.
u.2 Generate MACs(M) and DS = {m1, m2, ..., mk}.
u.3 Encrypt the packets in DS and append MACs(M) to them: Es(DS) = {[Es(m1)‖MACs(M)], ..., [Es(mk)‖MACs(M)]}.
u.4 Select a random packet [Es(mj)‖MACs(M)] ∈ Es(DS).
u.5 Select a set of k−1 peers {p1, ..., pk−1} ∈ PEERS.
u.6 Send to each pi ∈ {p1, ..., pk−1} a packet [Es(mi)‖MACs(M)] ∈ Es(DS).
u.7 Send [Es(mj)‖MACs(M)] to the MNO.
u.8 Receive Es(Mr) from the MNO (step M.3) and decrypt it.
PEERS
P.1 Receive a packet [Es(mi)‖MACs(M)] ∈ Es(DS) (step u.6).
P.2 Apply the decision forwarding function (dff).
P.3 Send [Es(mi)‖MACs(M)] ∈ Es(DS) to the MNO or forward it to another peer.
P.4 Receive Es(Mr) from the MNO (step M.3) and delete it.
MNO
M.1 Receive packets (steps u.7 and P.3).
M.2 Forward the packets to the SP.
M.3 Receive Es(Mr) from the SP (step S.4) and forward it to u and PEERS.
SP
S.1 Receive packets from the MNO (step M.2).
S.2 Decrypt the packets and assemble M.
S.3 Generate and encrypt the response message Es(Mr).
S.4 Send Es(Mr) to u and PEERS through the MNO.
Fig. 4. Anonymous communication protocol
The function dff is defined as follows:
dff([Es(mi)‖MACs(M)]) = 1 if count(MACs(M)) = 1, and 0 otherwise,
where dff = 1 means that the peer under examination has already agreed to send a packet belonging to message M (i.e., carrying the same MACs(M)). If dff = 1, the peer forwards the received packet mi to some other peer. Otherwise, if dff = 0, the peer randomly chooses, with probability pf = 1/2, either to send the packet to the SP (white circles in Figure 5(a)) or to forward it to a peer in its communication range (black circles in Figure 5(a)).
Example 1. Figure 5(a) shows an example of the distribution algorithm. The requester u sets k = 5 and splits the message M into five parts {m1, ..., m5}. The packets are then encrypted with the symmetric key s shared between u and SP, and MACs(M) is attached to each of them (for clarity, MACs(M) is omitted in the figure). The requester u selects packet m3 to be sent directly to the SP and forwards the other k−1 packets to peers in her communication range. Specifically, packets m2 and m5 are forwarded to peers p1 and p3, which send them to the SP. Packet m1 instead takes the forwarded path p4 → p7, assuming p4 does not agree to send m1. Finally, packet m4 takes the forwarded path p6 → p7 → p9, because when the packet is received by p7, p7 notices that she has already accepted a packet with the same MACs(M) (i.e., m1) and therefore automatically forwards m4 to p9.
After packet distribution, each selected peer independently sends its packet to the SP through the mobile network operator. The mobile network operator thus sees packets coming from k different users. This scenario results in the following proposition.
Proposition 1. A user is k-anonymous to the mobile network operator if and only if at least k packets of the same message are sent to the mobile network operator by k different peers (including the requester).
The mobile network operator forwards the k received packets to the SP, hiding location information by default. The SP can then decrypt each packet, reconstruct the original message, and satisfy the user request. A summary of the overall anonymous request process is provided in Figure 5(a), and a code sketch of the request path is given below.
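The following toy simulation of the request side (splitting, MAC tagging, and dff-style forwarding) is only a sketch: the encryption step is stubbed out rather than implemented, HMAC-SHA256 stands in for the generic MAC, and the names and simplifications (e.g., collapsing multi-hop forwarding into re-drawing a peer) are ours, not the authors'.

    import hmac, hashlib, random

    def mac_s(s: bytes, message: bytes) -> bytes:
        # MACs(M): HMAC-SHA256 as a concrete stand-in for the paper's generic MAC
        return hmac.new(s, message, hashlib.sha256).digest()

    def E_s(s: bytes, packet: bytes) -> bytes:
        return packet  # placeholder for symmetric encryption (e.g., AES) with key s

    def split_message(message: bytes, k: int):
        """Split M into k data flows m1..mk ('packets' of any size)."""
        step = max(1, len(message) // k)
        return [message[i * step:(i + 1) * step] if i < k - 1 else message[i * step:]
                for i in range(k)]

    def anonymous_request(s: bytes, message: bytes, k: int, peers):
        tag = mac_s(s, message)
        packets = [E_s(s, m) + tag for m in split_message(message, k)]
        sent = [("u", packets.pop(random.randrange(k)))]  # u sends one packet itself
        accepted = set()  # peers that already accepted a packet of this message
        for pkt in packets:                               # distribute the other k-1
            p = random.choice(peers)
            # dff: a peer that already accepted a packet of M must forward it onward;
            # a fresh peer forwards with probability 1/2 instead of sending to the MNO
            while p in accepted or random.random() < 0.5:
                p = random.choice(peers)  # simplified stand-in for hop-by-hop forwarding
            accepted.add(p)
            sent.append((p, pkt))
        return sent  # k packets reach the MNO from k distinct senders

    paths = anonymous_request(b"shared-key-s", b"service request payload M", 5,
                              ["p%d" % i for i in range(1, 10)])
    print(len(paths), len({sender for sender, _ in paths}))  # 5 5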
Anonymous Response. After the conclusion of the anonymous request process, the SP retrieves the original message M and starts the service provisioning, which results in the release of an anonymous response to the requester u. The communication involves the mobile network operator, which manages peer mobility and routes the response to the user u, and must preserve the preference k of the requester.

The anonymous response process works as follows. First of all, as shown in Figure 5(b), the service provider encrypts the response message M_r with the secret key s shared with u. Then the SP transmits the encrypted message E_s(M_r) to the k peers involved in the anonymization process. SP relies on the cellular network to manage the message delivery and the mobility of the peers. Although all peers receive the message, the requester u is the only peer with the secret key s, and thus she is the only one able to decrypt the message and benefit from the service. (To further strengthen our protocol, the service provider could potentially generate k-1 decoy messages in addition to M_r. This can be done by adding a nonce to the original message M_r before encrypting it with the secret key s. The cellular network then sees k different response messages and is not able to associate the response with the request.)

Fig. 5. Example of anonymous request (a) and anonymous response (b)

Example 2. Figure 5(b) shows an example of an anonymous response. The encrypted message E_s(M_r) is transmitted to all peers used in Example 1, that is, {u, p_1, p_3, p_7, p_9}. As soon as the message is received by u, it is decrypted. The other peers delete message E_s(M_r), since they are not able to open it.
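The decoy variant mentioned in the parenthetical above can be sketched as follows. The 16-byte nonce length and the helper names are assumptions of this sketch, and the deterministic toy cipher (the same one as in the previous sketch) only illustrates the message flow: truly unlinkable decoys additionally require a randomized cipher, e.g., AES in a mode with fresh IVs.

    import hashlib
    import os
    import random

    def encrypt(s: bytes, m: bytes) -> bytes:
        """Toy XOR cipher from the previous sketch; placeholder only."""
        ks = hashlib.sha256(s).digest()
        return bytes(b ^ ks[i % len(ks)] for i, b in enumerate(m))

    def anonymous_response(s: bytes, Mr: bytes, k: int) -> list:
        """SP side: encrypt the real reply plus k-1 nonce-padded decoys
        under s. The MNO delivers all k ciphertexts to {u} and the peers;
        only u holds s, so only u can decrypt and use the response."""
        msgs = [encrypt(s, Mr)]
        msgs += [encrypt(s, Mr + os.urandom(16)) for _ in range(k - 1)]
        random.shuffle(msgs)  # MNO cannot tell real reply from decoys
        return msgs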
Recalling the requirements and challenges in Section 3.1, our solution provides both anonymous communication and location hiding. In terms of anonymous communication, we employ a message splitting and multi-path solution that provides k-anonymity against mobile network operators. Considering location hiding, the location information of the users is hidden from the service providers by the cellular network. Finally, our solution provides requester accountability, since the requester's identity is released to the service provider.

It is important to note that our solution does not require changes to existing network protocols. All the packets are in fact routed regularly through the hybrid network using TCP and reconstructed at the destination service provider. Only some small changes are required for specific applications on top of the existing layers, for instance the message splitting done by the requester u and the packet checks done by the peers on the mobile ad-hoc network.

4 Some Notes on Performance

As a first step, we were interested in quantifying the impact of our approach on end-to-end communications. Although this aspect is less significant for database and informational services, it is highly critical for real-time streaming services, including video and live operators. Hence, we implemented a prototype of our approach using WiFi-enabled devices and measured the latency overhead when forwarding packets to one-hop and two-hop neighbors using WiFi. We describe the testing scenario in Section 4.1 and discuss the performance analysis in Section 4.2.

4.1 Testing Scenario Implementation

We deployed a small-scale testbed using standard IEEE 802.11 communications. We generated the two scenarios depicted in Figure 6. The first scenario (Figure 6(a)) considers baseline latency measurements over one hop between a wireless client and the target system. Here, a device is associated directly with a Wireless Access Point (WAP); we varied the distance from the client to the WAP. For all practical purposes, the WAP was acting as the one-hop neighbor that forwards the packets to the cellular network. The second scenario (Figure 6(b)) considers latency measurements in a two-hop scenario. Here, a device is configured as an ad-hoc server on Wireless Adapter #1 (WA1), with Windows' Internet Connection Sharing (ICS) enabled on Wireless Adapter #2 (WA2) for WA1's traffic. The wireless client is then connected through the ad-hoc server and the WAP to the target system. As in the one-hop scenario, no modification is needed at the WAP. To better simulate a real-world scenario, the ad-hoc server was placed at various locations and distances from the WAP. However, as expected and confirmed by our results, the closer the two systems are to each other, the less latency is observed. Additionally, any implementation with more than one ad-hoc network should utilize orthogonal channels while broadcasting in the same spectrum, to minimize interference.

The measurements for both the infrastructure and ad-hoc connections were taken at approximately the same points. This mitigates variables that might affect WiFi connectivity, such as the amount of interference from other access points, the construction of the building, and obstructions, and seeks to vary only the distance / Signal-to-Noise Ratio (SNR).

Fig. 6. Network architecture for the one-hop (a) and two-hop (b) scenarios

4.2 Performance Analysis and Discussion

Initially, we measured the Round Trip Time (RTT) of each packet. In addition, we employed NetStumbler [17] to measure the Signal-to-Noise Ratio (SNR), and Wireshark [25] to verify: i) the packets sent to and received by each device (i.e., the wireless client, ad-hoc server, WAP, and target systems), and ii) that the client communication remained anonymous, as all packets seemed to originate from the last hop (in this case the WAP).

Table 1 shows our preliminary results. Owing to the wireless transmission, we observe a wide variation in latency, mainly due to interference and physical obstacles. Table 1 also gives the average RTT values from which all graphs are derived (each value is calculated over more than 25 collected measurements).

Table 1. Maximum, minimum and average RTT values, and packet loss percentages: (a) one-hop WiFi connection, and (b) two-hop WiFi connection

(a) One-hop WiFi connection
SNR | RTT min (ms) | RTT max (ms) | RTT avg (ms) | Packet loss (%)
 14 |      -       |      -       |      -       |      100
 19 |      3       |     52       |      7       |        0
 25 |      1       |     28       |      4       |        0
 28 |      1       |    188       |     63       |        0
 33 |      1       |     47       |      3       |        0
 48 |      1       |     33       |      4       |        0
 55 |      1       |     97       |     23       |        0
 64 |      1       |      8       |      3       |        0

(b) Two-hop WiFi connection
SNR | RTT min (ms) | RTT max (ms) | RTT avg (ms) | Packet loss (%)
 14 |      -       |      -       |      -       |      100
 23 |      -       |      -       |      -       |      100
 31 |      1       |      9       |      3       |        4
 33 |      1       |    245       |     63       |        0
 35 |      1       |     13       |      1       |        0
 47 |      1       |     44       |      5       |        0
 51 |      1       |     55       |      6       |        3
 66 |      3       |    104       |     19       |        0

The results in Table 1 indicate that there is no significant latency overhead when the SNR is within acceptable bounds. However, the ad-hoc connection becomes much less reliable in weak areas. Figure 7 depicts scattergraphs of the data sets, with the anomalous peaks representing inconsistencies due to physical obstacles. Figure 7 confirms that peaks in latency due to physical obstacles occur at the same location for all WiFi connections. This is not a measurement inconsistency but rather a verification of the jittery nature of wireless communications, in which physical obstacles affect the transmission even when the distance or the SNR reported by the device remains constant.
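The statistics in Table 1 can be thought of as the output of an aggregation like the following. The paper does not give its aggregation code, so this is only our reading of the table columns; the sample values in the usage line are hypothetical.

    def summarize(samples):
        """Collapse raw per-ping results into one Table 1 row.
        'samples' holds RTTs in ms, with None marking a lost packet."""
        received = [r for r in samples if r is not None]
        loss = 100.0 * (len(samples) - len(received)) / len(samples)
        if not received:
            return None, None, None, loss  # e.g. the SNR 14 rows
        avg = sum(received) / len(received)
        return min(received), max(received), avg, loss

    # Hypothetical sample values: returns (min, max, avg, loss%).
    print(summarize([3, 52, 7, 5, None, 4]))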
Moreover, we believe that the RTT measurements are more instantaneous than the SNR reported by the device, which is averaged over a period of time. This explains the discrepancy of having a good SNR but degraded RTT measurements.

In conclusion, ad-hoc WiFi connections do not seem to suffer much of a performance hit from adding an intermediary node, since almost all of our measurements stayed below 5 ms of round-trip time (or 2.5 ms single trip). This allows us to safely claim that our system is both deployable and practical even for latency-sensitive applications such as video or voice streaming. However, we must acknowledge that the signal seems unreliable at degraded SNR, so we might consider using another node with better connectivity. Nodes acting as ad-hoc servers with lower power and bandwidth, such as cellphones instead of laptops, would incur a performance loss, which may present itself in the form of packet loss and intermittent connectivity, as was observed in the ad-hoc connection as the SNR worsened. While waiting for a ping response, the client node was seen to hang for long periods before announcing an error. This is a QoS issue: for example, a page would attempt to load for some time before displaying an error, or a call would hang for some time before finally disconnecting. By using a MultiNet-like technology [23], we could switch between connections, further anonymizing the packet stream.

Fig. 7. Comparison of SNR vs. RTT (ms) in the one-hop scenario (in red) and in the two-hop scenario (in green). Notice that there is a shift in the graph to the right for two hops, indicating an increase in SNR from the extra hop. Also, there are two symmetric peaks indicating a loss of packets due to physical obstacles.

In general, there are many open questions regarding the performance of the entire system, especially under an adversarial model where some of the peers are uncooperative or even malicious.

5 Related Work

While mobile networks and their management have been considered in several works in the area of mobile applications, approaches aimed at protecting the privacy of users have gained great relevance only in the last few years. Recent research in the context of mobile networks has approached the privacy problem from different perspectives and has been inspired by works on fully anonymous communications [6, 7, 19].

Anonymous Communications. Chaum introduces the concept of a "mix" to provide source anonymity [6]. A mix collects a number of messages from different sources, shuffles them, and forwards them to the next destination in a random order. This solution makes the tracking of a message difficult for attackers. In mix-based solutions, the path is statically determined by the message sender. Onion routing is a solution built on the notion of a mix network [8]. In onion routing, the connection initiator creates an onion and the path of the connection through the network. Each router (named onion router) in the path knows its successor and can remove a layer of encryption from the onion with its private key. At the end, the data reaches the final destination in plaintext.
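The layering idea can be made concrete with a small sketch: the initiator wraps the payload once per router, and each router peels exactly one layer, learning only the next hop. The fixed-size header and the XOR toy cipher (standing in for the routers' public-key decryption) are assumptions of this sketch, not part of any deployed onion-routing system.

    import hashlib

    def layer_enc(key: bytes, data: bytes) -> bytes:
        """Toy cipher standing in for a router's decryption capability;
        real onion routing uses hybrid public-key encryption."""
        ks = hashlib.sha256(key).digest()
        return bytes(b ^ ks[i % len(ks)] for i, b in enumerate(data))

    layer_dec = layer_enc  # the XOR toy cipher is its own inverse

    def build_onion(path, keys, payload: bytes) -> bytes:
        """Initiator wraps payload once per router, innermost layer
        first, prefixing each layer with a 16-byte next-hop header."""
        onion = payload
        hops = list(zip(path, path[1:] + ["EXIT"]))
        for router, nxt in reversed(hops):
            header = nxt.encode().ljust(16, b"\0")
            onion = layer_enc(keys[router], header + onion)
        return onion

    def peel(router, keys, onion: bytes):
        """Router removes its layer, learning only the next hop."""
        plain = layer_dec(keys[router], onion)
        return plain[:16].rstrip(b"\0").decode(), plain[16:]

    keys = {r: r.encode() for r in ("R1", "R2", "R3")}  # demo keys
    onion = build_onion(["R1", "R2", "R3"], keys, b"hello")
    hop, onion = peel("R1", keys, onion)   # hop == "R2"
    hop, onion = peel("R2", keys, onion)   # hop == "R3"
    hop, data = peel("R3", keys, onion)    # hop == "EXIT", data == b"hello"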
For instance, Tor [8] is an onion-routing-based solution that provides route anonymity by preventing adversaries from following packets from the source to the destination and vice versa. Crowds [19] is an anonymizing solution designed for Web communications in which the routing path and its length are dynamically generated; the path is determined randomly by the machines involved in the communication.

An important characteristic shared by the above solutions, and one that makes them not applicable in a mobile scenario like ours, is that they use the path generated by the sender for both the request and the response. This assumption does not hold in mobile networks, where users move quickly over time and the path used for the request is therefore unlikely to still be available for the response. Also, onion routing solutions differ from our approach because each onion proxy is required to know the network topology and the public certificates of the routing nodes in order to create meaningful routes. Finally, Crowds focuses on protecting the sender's anonymity against the service providers and cannot protect anonymity against a global eavesdropper. Our approach, instead, exploits the hybrid nature of the devices to create a local network that is impervious to global eavesdroppers operating in the cellular network (e.g., mobile network operators). Since the WiFi network is ad-hoc and of limited range, it is very difficult to have a global eavesdropper that would cover both the WiFi and cellular communications.

Anonymous Mobile Ad-Hoc Routing Protocols. Another line of research has focused on preserving the privacy of wireless traffic by studying and providing privacy-enhanced and anonymous routing protocols. Originally, the proposed mobile ad-hoc routing protocols, such as AODV [18] and DSR [14], were not designed to provide or guarantee privacy and route anonymity; rather, they were aimed at increasing network performance, efficiency, security, and reliability. As a consequence, in such protocols there are many ways to compromise privacy, for instance by abusing the protocol state, since both source and destination together with hop count are stored on each node. Subsequent work focused on routing protocols for mobile ad-hoc networks and attempted to protect anonymity and privacy. It did so by keeping the identities of the senders and recipients of messages secret from intermediate nodes.

A number of anonymous routing protocols have been proposed [5, 15, 26, 27, 29, 30]. Among them, MASK [30] proposes an anonymous on-demand routing protocol, which provides both MAC-layer and network-layer communications without the need to release the real identities of the participating nodes. ANODR [15] provides route anonymity, by preventing adversaries from following packets to their source or destination, and location privacy, by preventing the adversary from discovering the real identities of local transmitters. Discount-ANODR [27] limits the overhead introduced by ANODR in providing source anonymity and routing privacy; it provides a lightweight protocol based on symmetric-key encryption and onion routing. Capkun et al. [20] provide a scheme for secure and privacy-preserving communication in hybrid ad-hoc networks. Their scheme provides the users with a means to communicate in a secure environment and preserve their anonymity and location privacy.
Although our solution has similar goals and considers privacy issues in hybrid mobile networks, it is not aimed at providing a new routing protocol. Our k-anonymity solution, using a multi-path communication paradigm, provides privacy of the requester with respect to the neighbors sharing the medium, the mobile network operators, and the service providers. Also, our solution does not heavily rely on key encryption, dynamic keys, or pseudonyms; rather, it exploits the possibility of breaking a single data stream into several different packets, and of using neighbor mobile peers, which act on behalf of the request originator, to distribute these packets.

Location k-Anonymity. More recently, another line of research has focused on protecting the location privacy and anonymity of users that interact with Location-Based Services (LBSs) [1, 2]. The main goal of most of the current solutions [16] is to protect users' identities associated with or inferred from location information. In this case, the best possible location measurement can be provided to other entities, but the users' identity must be kept hidden. In particular, these solutions are based on the notion of k-anonymity in data [21], which is aimed at making an individual (i.e., her identity or personal information) not identifiable by releasing a geographical area containing at least k-1 users other than the requester. In this way, the request cannot be associated with fewer than k respondents and the identity of the users is not released to the LBSs. Bettini et al. [3] propose a framework for evaluating the risk of disseminating sensitive location-based information, and introduce a technique aimed at supporting k-anonymity. Gruteser and Grunwald [12] propose a middleware architecture and an adaptive algorithm to adjust the resolution of location information, in the spatial or temporal dimension, to comply with a specific k-anonymity requirement. Gedik and Liu [10] describe a k-anonymity model and define a message perturbation engine responsible for providing location anonymization of users' requests through identity removal and spatio-temporal obfuscation of location information. Ghinita et al. propose PRIVÈ [11], a decentralized architecture for preserving query anonymization, which is based on the definition of k-anonymous areas obtained by exploiting the Hilbert space-filling curve. Hashem and Kulik [13] provide a decentralized approach to location privacy in a wireless ad-hoc network, where each peer is responsible for generating its cloaked area by communicating with other peers, thus providing anonymity.

Existing works on location k-anonymity have the following main disadvantages: i) they either rely on a centralized middleware for providing anonymity functionalities (centralized approach) or leave the burden of computing the k-anonymous area to the users (decentralized approach); ii) they assume trusted mobile network operators; iii) they do not support accountability. In our approach, we protect the privacy of users acting in a hybrid network that includes cellular networks. Here, we assume untrusted mobile network operators, which could track users' activities [28], and we provide location k-anonymity at the network level rather than at the application level.
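For concreteness, the generalization step underlying these location k-anonymity schemes can be sketched as follows: grow a region around the requester until it contains at least k-1 other users, then release the region instead of the exact position. The square shape, the doubling growth step, and the centering on the requester (which real algorithms such as [12] avoid, since a centered region leaks the requester's position) are simplifications of this sketch.

    def cloak(user, others, k):
        """Return a square (x_min, y_min, x_max, y_max) around 'user'
        containing at least k-1 of the 'others', so the LBS request
        cannot be associated with fewer than k potential requesters."""
        x, y = user
        half = 0.001  # initial half-width, e.g. in degrees
        while True:
            inside = [(px, py) for px, py in others
                      if abs(px - x) <= half and abs(py - y) <= half]
            if len(inside) >= k - 1:
                return (x - half, y - half, x + half, y + half)
            half *= 2  # enlarge until k-anonymity holds

    # Hypothetical coordinates for three nearby users, k = 3.
    area = cloak((45.464, 9.190),
                 [(45.465, 9.191), (45.463, 9.188), (45.470, 9.195)],
                 k=3)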
6 Conclusions and Future Work

We presented a novel privacy-preserving scheme based on network k-anonymity and multi-path communication that aims at balancing privacy and accountability without assuming any trusted entity between the user and the service provider. Furthermore, we put forward the idea that a reliable privacy solution should protect users against threats stemming from honest-but-curious mobile network operators. Our vision is then to re-cast privacy for hybrid networks and provide a privacy-assurance mechanism based on network k-anonymity that: i) protects users' privacy against honest-but-curious mobile network operators; ii) conceals or obfuscates the users' location from service providers; and iii) enforces user and service accountability. Note that our solution can be integrated with obfuscation techniques, such as the one in [2], to protect the location privacy of the users interacting with LBSs.

Many interesting research directions warrant further investigation, among them: the enhancement of the decision forwarding algorithms to guarantee reliability and efficiency; the consideration of a comprehensive threat model including malicious and uncooperative peers; the complete implementation and extensive testing of our prototype; and the consideration of economic incentives for neighbor peers to participate in our anonymizing network.

Acknowledgments

This work was supported in part by the EU, within the 7th Framework Programme (FP7/2007-2013) under grant agreement no. 216483 "PrimeLife", and by the Italian MIUR, within PRIN 2006, under project no. 2006099978 "Basi di dati crittografate".

References

1. C.A. Ardagna, M. Cremonini, E. Damiani, S. De Capitani di Vimercati, and P. Samarati. Supporting location-based conditions in access control policies. In Proc. of the ACM Symposium on Information, Computer and Communications Security (ASIACCS'06), Taipei, Taiwan, March 2006.
2. C.A. Ardagna, M. Cremonini, E. Damiani, S. De Capitani di Vimercati, and P. Samarati. Location privacy protection through obfuscation-based techniques. In Proc. of the 21st Annual IFIP WG 11.3 Working Conference on Data and Applications Security, Redondo Beach, CA, USA, July 2007.
3. C. Bettini, X.S. Wang, and S. Jajodia. Protecting privacy against location-based personal identification. In Proc. of the 2nd VLDB Workshop on Secure Data Management, Trondheim, Norway, 2005.
4. N. Borisov, G. Danezis, P. Mittal, and P. Tabriz. Denial of service or denial of security? In Proc. of the 14th ACM Conference on Computer and Communications Security (CCS 2007), pages 92-102, New York, NY, USA, 2007. ACM.
5. A. Boukerche, K. El-Khatib, L. Xu, and L. Korba. SDAR: A secure distributed anonymous routing protocol for wireless and mobile ad hoc networks. In Proc. of the 29th Annual IEEE International Conference on Local Computer Networks (LCN 2004), pages 618-624, Tampa, FL, USA, October 2004.
6. D. Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms. Communications of the ACM, 24(2):84-88, 1981.
7. D. Chaum. The dining cryptographers problem: Unconditional sender and recipient untraceability. Journal of Cryptology, 1(1):65-75, 1988.
8. R. Dingledine, N. Mathewson, and P. Syverson. Tor: The second-generation onion router. In Proc. of the 13th USENIX Security Symposium, pages 303-319, August 2004.
9. T. Fujiwara and T. Watanabe.
An ad hoc networking scheme in hybrid networks for emergency communications. Ad Hoc Networks, 3(5):607-620, 2005.
10. B. Gedik and L. Liu. Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE Transactions on Mobile Computing, 7(1):1-18, January 2008.
11. G. Ghinita, P. Kalnis, and S. Skiadopoulos. PRIVÈ: Anonymous location-based queries in distributed mobile systems. In Proc. of the International World Wide Web Conference (WWW 2007), Banff, Canada, May 2007.
12. M. Gruteser and D. Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In Proc. of the 1st International Conference on Mobile Systems, Applications, and Services (MobiSys 2003), San Francisco, CA, USA, May 2003.
13. T. Hashem and L. Kulik. Safeguarding location privacy in wireless ad-hoc networks. In Proc. of the 9th International Conference on Ubiquitous Computing (UbiComp 2007), Innsbruck, Austria, September 2007.
14. D.B. Johnson and D.A. Maltz. Dynamic source routing in ad hoc wireless networks, volume 353. Kluwer Academic Publishers, 1996.
15. J. Kong and X. Hong. ANODR: Anonymous on demand routing with untraceable routes for mobile ad-hoc networks. In Proc. of the 4th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MOBIHOC 2003), pages 291-302, Annapolis, MD, USA, June 2003.
16. S. Mascetti and C. Bettini. A comparison of spatial generalization algorithms for LBS privacy preservation. In Proc. of the 1st International Workshop on Privacy-Aware Location-based Mobile Services (PALMS 2007). IEEE Computer Society, 2007.
17. NetStumbler.com. http://www.netstumbler.com/.
18. C.E. Perkins and E.M. Royer. Ad-hoc on demand distance vector routing. In Proc. of the 2nd IEEE Workshop on Mobile Computing Systems and Applications (WMCSA 99), New Orleans, LA, USA, February 1999.
19. M.K. Reiter and A.D. Rubin. Crowds: Anonymity for web transactions. ACM Transactions on Information and System Security, 1(1):66-92, 1998.
20. S. Capkun, J.-P. Hubaux, and M. Jakobsson. Secure and privacy-preserving communication in hybrid ad hoc networks. January 2004.
21. P. Samarati. Protecting respondents' identities in microdata release. IEEE Transactions on Knowledge and Data Engineering, 13(6):1010-1027, 2001.
22. Sphinx - A hybrid network model for next generation wireless systems. http://www.ece.gatech.edu/research/GNAN/work/sphinx/sphinx.html.
23. A. Stavrou and A.D. Keromytis. Countering DoS attacks with stateless multipath overlays. In Proc. of the 12th ACM Conference on Computer and Communications Security (CCS 2005), pages 249-259, New York, NY, USA, 2005. ACM.
24. L. von Ahn, A. Bortz, and N.J. Hopper. k-anonymous message transmission. In Proc. of the 10th ACM Conference on Computer and Communication Security (CCS 2003), pages 122-130, Washington, DC, USA, October 2003.
25. Wireshark. http://www.wireshark.org/.
26. X. Wu and B. Bhargava. AO2P: Ad hoc on-demand position-based private routing protocol. IEEE Transactions on Mobile Computing, 4(4), July/August 2005.
27. L. Yang, M. Jakobsson, and S. Wetzel. Discount anonymous on demand routing for mobile ad hoc networks. In Proc. of the Second International Conference on Security and Privacy in Communication Networks (SECURECOMM 2006), Baltimore, MD, USA, August-September 2006.
28. C. Zander. 'CIA Cell Tower' monitors local Internet users' wireless transmissions, September 2007.
http://www.send2press.com/newswire/2007-09-0911-003.shtml.
29. Y. Zhang, W. Liu, and W. Lou. Anonymous communication in mobile ad hoc networks. In Proc. of the 24th Annual Joint Conference of the IEEE Communication Society (INFOCOM 2005), Miami, FL, USA, March 2005.
30. Y. Zhang, W. Liu, W. Lou, and Y. Fang. MASK: Anonymous on-demand routing in mobile ad hoc networks. IEEE Transactions on Wireless Communications, 5(9), September 2006.

User Privacy in Transport Systems Based on RFID E-Tickets

Ahmad-Reza Sadeghi (1), Ivan Visconti (2), and Christian Wachsmann (1)
(1) Ruhr-University Bochum, Horst-Görtz Institute for IT-Security (HGI), Germany
{ahmad.sadeghi,christian.wachsmann}@trust.rub.de
(2) Dipartimento di Informatica ed Applicazioni, University of Salerno, Italy
visconti@dia.unisa.it

Abstract. Recently, operators of public transportation in many countries started to roll out electronic tickets (e-tickets). E-tickets offer several advantages to transit enterprises as well as to their customers: they hamper forgery by cryptographic means, whereas customers benefit from fast and convenient verification of tickets or replacement of lost ones. Existing (proprietary) e-ticket systems deployed in practice are mainly based on RFID technologies, where RFID tags prove authorization by releasing spatio-temporal data that discloses customer-related data, in particular their location. Moreover, the available literature on privacy-preserving RFID-based protocols lacks practicability for real-world scenarios. In this paper, we discuss appropriate security and privacy requirements for e-tickets and point out the shortcomings of existing proposals. We then propose solutions for practical privacy-preserving e-tickets based on known cryptographic techniques and RFID technology.

Key words: Location Privacy, E-Tickets, RFID

1 Introduction

Electronic tickets (e-tickets) gain increasing popularity among operators of public transit networks. However, besides offering many advantages, e-tickets also introduce several risks, in particular concerning the privacy of their users.

Benefits of e-tickets. Transit enterprises benefit from e-tickets in various ways: First, e-tickets help to decrease maintenance costs. Second, the number of fare dodgers is expected to decrease if tickets can be verified efficiently. Moreover, cryptographic means help to mitigate the problem of ticket forgery. From the user perspective, e-tickets allow for faster and more convenient verification. Moreover, an e-ticket system can automatically select the lowest fare, which saves the customer's time and money. Finally, revocation of e-tickets enables transit enterprises to replace lost tickets, which is not possible for conventional paper-based ticket systems.

Threats. Besides their advantages, e-tickets also introduce several risks, in particular regarding the privacy of users. Since authentication of transit tickets typically involves spatio-temporal data, users are at risk of losing their privacy if this information is leaked to unauthorized parties. This means that e-tickets should ensure that no information on users (confidentiality) or their movements (location privacy) is revealed to entities that are not trusted by the users. There are existing implementations of e-tickets that allow the creation of movement profiles and, in some cases, even disclose personal information of users (cf. Section 3).
Moreover, since e-tickets contain digital data, they may be easily copied (cloning). Additionally, the corresponding protocols to issue or verify e-tickets may be subject to different attacks (e.g., man-in-the-middle or replay).

Current situation. Currently, there is a vast number of proprietary solutions for e-tickets. Since the corresponding specifications are usually not publicly accessible, there is no publicly known solution in practice that explicitly considers the privacy of users. We stress that user privacy preservation has not been claimed among the features of such systems.

The preferred technology to implement electronic transit tickets is Radio Frequency IDentification (RFID), which enables fully automated wireless identification of objects. A typical RFID system consists of transponders and transceivers. The main component of an RFID system is the transponder, which consists of an integrated circuit connected to an antenna. Typically, transponders are integrated into plastic cards or stickers that can be attached to the object to be identified and thus are often called tags. Since transceivers are mainly used to read data from tags, they are called readers. RFID tags can be used to realize e-tickets that are issued and verified by readers. Thus, in the rest of this paper, "e-ticket" refers to tickets based on RFID.

Related work. There is a large body of literature on different approaches to realize privacy-preserving mechanisms for RFID (e.g., [17,16,2,31,14,20,22,9,18,25]). However, as pointed out in Section 3, most of these solutions are not applicable to e-tickets, since each of them lacks some important security or functional requirement, such as usability, security, or privacy.

In [15], the authors motivate research on privacy in the context of e-tickets and provide a rough description of how anonymous credential [6] and e-cash [5] systems may be used to implement an anonymous payment system for public transit. However, they assume that the devices realizing tickets can perform computationally demanding protocols (i.e., use public-key cryptography and intensive interaction), which is not a reasonable assumption for currently available cheap RF tokens. Since RFID tags are devices with very limited capabilities, one has to provide an acceptable level of privacy while still preserving usability.

Summing up, an e-ticket system is an authentication scheme that involves spatio-temporal information, and the design and secure implementation of a privacy-preserving, usable system based on RFID is currently an interesting open problem.

Our contribution. In this paper we study the levels of privacy that could be achieved in an e-ticket-powered system. We point out the weaknesses of known solutions and explore how known cryptographic tools can be applied to realize anonymization of e-tickets with currently available RFID technology, with the goal of obtaining a usable system that discloses no information about the user or his location to entities that are not trusted by the user.

Structure of the paper. In Section 2, we demonstrate the problems related to e-tickets by introducing the setting of electronic transit tickets, and define appropriate security requirements. In Section 3, we analyze several proposals from the literature on how to realize anonymity for RF tokens with limited capabilities and discuss their applicability to e-tickets.
Section 4 describes how recent cryptographic tools can be applied in order to achieve the desired requirements. Finally, we conclude with Section 5 by describing some open problems and motivating further research.

2 Scenario of Electronic Transit Tickets

To introduce the problems related to e-tickets for public transportation, we first give a short overview of the general application scenario and point out potential weaknesses.

2.1 General Application Scenario

Fig. 1. General scenario for e-tickets: issuer I issues token T to user U (1. Request, 2. Issue); U proves possession of T to verifier V_in at the entrance to the transit network at station X (3. Prove, 4. Enter), to inspectors V during travel (5. Prove, 6. Travel), and to verifier V_out at the exit of the transit network at station Y (7. Prove, 8. Leave).

An e-ticket system as shown in Fig. 1 is a token-based authentication scheme in which tickets are represented as tokens (e.g., RFID tags). It consists of at least one token-issuing entity (issuer), a set of users, tokens, and verifiers who verify whether tokens are valid.

Typically, a user U must buy a token from token issuer I. To do so, U selects his desired ticket and pays for it. Issuer I then checks whether U is eligible to obtain a token (e.g., whether U paid for the ticket), and, if applicable, issues a token T and passes it to U. From now on, U is able to use token T to prove that he is authorized to use the transit network. This means that every user who is in possession of a token that has been issued by a genuine issuer is considered to be an authorized user.

Now assume that, as shown in Fig. 1, user U wants to travel from a place X to some location Y. Before U is allowed to enter the transit system at X, he must first prove to a verifier V_in at the entrance of the transit network that he is authorized to access it. If V_in can successfully verify the user's token, U is allowed to enter. Otherwise, access will be denied. During his trip, U may encounter arbitrary inspections where he must prove that he is authorized to use the transit network. Thus, a verifier V may check the user's token T. If verification of T is successful, U is allowed to continue his trip. Otherwise, U must leave the transit network and may be punished for using it without authorization. After arriving at Y, the user's token T can be checked one last time. Again, if T cannot be verified successfully, U may be punished.

Note that authentication is typically bound to some limitations, for instance geographical or temporal usage restrictions. Additionally, a token may be bound to the identity of its owner (i.e., the entity that bought the ticket).

2.2 Potential Attacks

Obviously, the main goal of a ticket system is to prevent ineligible users from using the transit system. Thus, the most prominent attack is to violate this goal. However, there are some other, more subtle attacks, which we consider in the following.

Impersonation. The most obvious attack against e-ticket systems is motivated by unauthorized entities. The adversary must obtain or simulate a token that is accepted by an honest verifier. To achieve this, the adversary may perform various attacks, including man-in-the-middle or replay attacks against the underlying authentication protocols, or he may attempt to create forged tokens or to copy tokens of honest users.

Tracing. A more subtle attack aims at obtaining information on users and their movements within the transit network.
For instance, the transit enterprise may be interested in information on the behavior of its customers. When using conventional authentication protocols, a token can be easily identified during verification. This enables verifiers to trace tokens within the transit network. Moreover, if a user uses an identifying payment method (e.g., a credit card) to buy a token, the issuer can link the token to the identity of its owner. Since the issuer and the verifiers are typically under the control of the same entity (e.g., the transit enterprise), this results in a complete loss of the user's privacy. However, in this case user information is managed by the transit enterprise, which is a known entity. Thus it can be legally bound to use the collected user data honestly and can be monitored by means of inspections (similar observations hold for credit card companies).

The concrete threat, instead, comes from unknown adversaries. Tokens typically are wireless devices, and thus all their communication can be eavesdropped on or manipulated by an adversary. Moreover, the adversary may interact with tokens unnoticed. As a consequence, the user's token may also be traced by entities other than the verifiers or the token issuer. In summary, a primary goal is that the e-ticket system prevents disclosure of information on users or their movements to entities not trusted by the users.

Denial-of-service attacks. Another type of adversary may want to harm (e.g., to blackmail) the transit enterprise by preventing honest users from accessing the transit network. As already mentioned, tokens are wireless devices that can be attacked unnoticed. This means that an adversary may try to exploit deficiencies of the protocols such that a ticket is no longer accepted by an honest verifier.

Depending on the underlying business model, protocols for e-tickets must be carefully crafted to prevent some or all of these attacks. In Section 2.3, we introduce different reasonable trust and adversary models, and set up a complete list of requirements for e-ticket systems in Section 2.4.

2.3 Trust and Adversary Model

In an ideal setting, no entity needs to be trusted. However, in practice, the transit enterprise must at least trust issuer I to only create tokens for eligible users. Moreover, the transit enterprise must trust each verifier V to only accept tokens that have been issued by issuer I. These are reasonable assumptions since, in practice, the token-issuing entity and the verifiers are typically physically controlled by the transit enterprise.

Ideally, users should be anonymous to every entity, including issuer I and all verifiers V. However, due to technical constraints this is not always feasible in practice. Thus, a reasonable trust model for a practical solution is that users must at least trust issuer I and, depending on the implementation, also all verifiers V. However, a trust model which only requires issuer I to be trusted is preferable.

To summarize, issuer I must trust all verifiers V. Moreover, all verifiers V must trust token issuer I. For users, there are three possible trust models:

TM 1: User U must trust token issuer I and all verifiers V.
TM 2: User U must only trust token issuer I.
TM 3: User U need not trust anyone.

TM 1 means that the e-ticket system must preserve privacy against all entities outside the system.
This is the trust model primarily used for the solution presented in Section 4. Considering TM 2, the e-ticket system must additionally protect the user's privacy against the verifiers. The solution presented in Section 4 can achieve this by assuming each verifier V to be connected to a remote server or to be equipped with a security module that is controlled by issuer I. However, these hardware assumptions may be difficult to achieve in practice. To realize TM 3, the e-ticket scheme must provide full anonymity. As discussed in Section 1, this seems to be possible only with high computational and communication resources, which is inappropriate for low-cost RFID devices.

It is also assumed that all communication that takes place during the process of creating a ticket cannot be eavesdropped on or manipulated by an adversary. This is reasonable in practice, since a user U may either use out-of-band communication or a secure channel to communicate with issuer I. However, following the traditional adversarial models, an adversary can eavesdrop on all communication of a token T. Moreover, an adversary may perform active attacks on the corresponding protocols, which means that he can interact with all parties on the protocol level. Additionally, an adversary can corrupt tokens and verifiers (though this can only happen for a limited number of tokens and verifiers). The adversary is not allowed to corrupt the token issuer.

2.4 Requirement Analysis

Authentication. As mentioned in Section 2.1, the most important security goal for transit enterprises is authentication. Thus no unauthorized user (i.e., one who is not in possession of a valid token) should be able to convince an honest verifier that he is authorized to access the transit system. Another major requirement for any token-based authentication scheme is resilience to remote tampering with tokens, which would otherwise allow denial-of-service attacks. We summarize the security goals concerning authentication as follows:

Authentication: Only valid tokens are accepted by honest verifiers.
Unforgeability: Emulation and copying of valid tokens should be infeasible.
Availability: Unauthorized altering of token data must be infeasible.

Privacy. Since e-tickets enable efficient detection and identification of a huge number of tickets, a detailed dossier about user profiles (e.g., personal data or movements) can be created. The problem is aggravated if tickets can be associated with the identity of their corresponding users, since this results in a complete loss of user privacy. Thus, the security objectives concerning privacy are:

Confidentiality: Unauthorized access to user-related data should be infeasible.
Anonymity: Unauthorized identification of tokens should be infeasible.
Location Privacy: Unauthorized tracing of tokens should be infeasible.

A stronger notion of location privacy considers the traceability of tokens in case the internal state (i.e., the secrets) of a token has been disclosed. To distinguish traceability in past and future protocol runs, the authors of [18] consider the notions of forward and backward traceability.

Backward traceability: Accessing the current state of a token should not allow tracing the token in previous protocol runs.
Forward traceability: Accessing the state of a token should not allow tracing the token in future protocol runs.
In addition to these security and privacy requirements, it is important to consider functional requirements for a practical solution.

Functional requirements. The costs per e-ticket should be minimal. Therefore, in case each ticket is implemented as a physical token (e.g., as an RFID tag), the computational and storage requirements of the token should be as low as possible. Additionally, verification of tickets must be fast. For instance, it should be possible to verify an e-ticket while a user is walking by, or briefly holding his ticket near a verifying device (e.g., while entering a bus). Therefore, protocols for e-tickets must be designed carefully to minimize the amount of computation and communication that must be performed. Moreover, an e-ticket system must be able to handle a huge number of tokens. Therefore, the functional requirements for e-tickets are:

Efficiency: Verification of tokens must be fast.
Scalability: The system should be able to handle a large number of tokens.

Depending on the underlying business case and the technological constraints, a practical realization may not fulfill all of these requirements.

3 Analysis of Existing Solutions

Most e-ticket systems are proprietary solutions whose specifications are not publicly available. This section exemplifies the most common approach to implementing e-ticket authentication in practice with the Calypso e-ticket system [1,26], about which at least some information is public. Moreover, to the best of our knowledge, there is no solution for e-tickets in practice that explicitly considers the privacy of users.

Calypso e-ticket standard. Calypso is an e-ticket standard based on RFID tokens that is widely used in Europe and in North and South America [1]. The roles in the Calypso system correspond to the model presented in Fig. 2. However, Calypso does not consider privacy of users and thus does not fulfill any of the privacy requirements of Section 2.4 w.r.t. any of the trust models presented in Section 2.3. In fact, all transactions involving a Calypso e-ticket provide no confidentiality at all [26]. Moreover, Calypso tokens store personal data of their owner ("holder information") that can be queried by every verifier. Thus the Calypso e-ticket system leaks user-related information and allows the creation of movement profiles by everyone who is in possession of a standard RFID reader. However, all messages of a Calypso token are authenticated by a symmetric-key-based authentication mechanism. Thus, Calypso seems to fulfill all of the authentication requirements of Section 2.4.

Calypso implements a common approach to authenticating a low-cost RFID token based on a simple challenge-response protocol. Each token has a symmetric authentication key K_T that can be computed as a function of the serial number S_T of the token and a global master secret. All verifiers are equipped with a tamper-resistant security module (secure application module, SAM) that knows and protects this master secret and can be used as a black box to compute K_T from S_T. To authenticate a token, a verifier sends a random challenge N_V to the token, which then computes H_T ← f(K_T, N_V), where f is some one-way function. Finally, the token returns (S_T, H_T) to the verifier, who uses its SAM to derive K_T and then verifies H_T. If verification is successful, the token has been authenticated.
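A sketch of this style of challenge-response is given below, with HMAC-SHA256 standing in for the one-way function f and for the key derivation inside the SAM; the actual Calypso primitives are not public, so both choices are assumptions of this sketch.

    import hashlib
    import hmac
    import os

    MASTER = os.urandom(32)  # held only inside the verifiers' SAMs

    def derive_key(serial: bytes) -> bytes:
        """K_T = f(master secret, S_T); the KDF choice is assumed."""
        return hmac.new(MASTER, serial, hashlib.sha256).digest()

    class Token:
        def __init__(self, serial: bytes):
            self.serial = serial
            self.key = derive_key(serial)  # personalized at issue time

        def respond(self, challenge: bytes):
            h_T = hmac.new(self.key, challenge, hashlib.sha256).digest()
            return self.serial, h_T  # S_T is sent in the clear: linkable

    def verify(serial: bytes, h_T: bytes, challenge: bytes) -> bool:
        k_T = derive_key(serial)  # done as a black box inside the SAM
        expected = hmac.new(k_T, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, h_T)

    token = Token(b"\x00\x01\x02\x03")
    n_V = os.urandom(16)            # verifier's random challenge N_V
    s_T, h_T = token.respond(n_V)
    assert verify(s_T, h_T, n_V)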
Obviously, this approach cannot provide privacy, since all transactions of a token can be linked by its serial number S_T, which is transmitted in the clear in every protocol run. All subsequent transactions to update or to read data from a Calypso token are authenticated this way but are not encrypted.

Other e-ticket systems. There are many other proprietary solutions for e-tickets in practice. Most of them are based on widely used RFID transponders. Prominent examples are FeliCa [11] and MiFare [24]. FeliCa [11] is provided by Sony and is a contactless smartcard that is used mainly in the Asia-Pacific area for different purposes, including e-tickets for public transportation.

MiFare is a family of contactless smartcards produced by Philips/NXP Semiconductors. These transponders are widely used for different purposes, including e-tickets for public transportation. There have been several publications on attacks against MiFare Classic transponders [21,23], which use a proprietary encryption algorithm that has been completely broken [7]. However, other MiFare products are claimed not to be affected.

The attacks on MiFare Classic transponders demonstrate a major problem of proprietary security solutions: Manufacturers of low-cost hardware try to find a compromise between the speed and the security of their products. Thus, they often implement proprietary lightweight crypto algorithms whose specifications are not public, and which are therefore typically not sufficiently evaluated. As the attack against MiFare Classic shows, these algorithms can often be reverse-engineered, which allows cryptanalysis or efficient key search by running the algorithms on more powerful hardware. In the case of MiFare Classic, both approaches made it possible to break the security goals of these tags at a point in time when they were already widely used in practice.

3.1 Protocols for Anonymous Authentication

In an ideal e-ticket system, verifiers should learn nothing from the verification except that a token is genuine and valid. It is possible to realize this by using privacy-preserving techniques like anonymous credential systems [15]. An anonymous credential system is a cryptographic tool that enables zero-knowledge proofs of knowledge of certified data [19]. However, using anonymous credentials implies high computational (public-key cryptography) and typically also high communication (many rounds of interaction) requirements for all devices involved. This clearly contradicts the functional requirements described in Section 2.4. Thus, these techniques are not applicable unless the e-ticket system can fall back upon appropriate mobile computing devices already possessed by the users.

However, using mobile computing devices, like mobile phones, has several disadvantages. For instance, in case a user's phone runs out of power (which is likely to happen often), he will no longer be able to prove authorization. Moreover, these devices can also be compromised by Trojans, which brings up new challenges. Furthermore, many users do not yet own an NFC-compatible mobile phone with sufficient computing power to run computationally demanding protocols like anonymous credential systems or e-cash as proposed in [15]. (Near Field Communication (NFC) [10] is an RFID standard for contactless smartcards that is also supported by some currently available mobile phones.)

3.2 Privacy-Preserving Protocols for RFID

There is a large body of literature on different approaches to implementing privacy-preserving mechanisms for low-cost RFID transponders. For instance, [16] gives a comprehensive overview of different approaches.
The author classifies RFID transponders as basic tags and symmetric-key tags. "Basic tags" refers to tokens that have no computational and no cryptographic capabilities. "Symmetric-key tags" refers to tags that are capable of performing at least some symmetric cryptographic functions (e.g., random number generation, hashing, or encryption). Using the classification of [16], we discuss the applicability of different proposed solutions to e-tickets.

Basic tags. As basic tags cannot perform any cryptographic operations, they are unsuitable for authentication purposes. Tags that only provide wirelessly readable memory can only replay the data stored in their memory and thus are subject to replay and cloning attacks. This means that all data stored on such a tag can be read and used to create identical copies or to simulate the original tag to an honest reader. Another problem related to cloning is swapping: an adversary can copy the data stored on tag A to another tag B and vice versa, thus exchanging the identities of these tags. Therefore, basic tags cannot fulfill the requirement of unforgeability.

Moreover, many solutions to enhance the privacy of basic tags require tags to provide rewritable memory (e.g., [17,13,2]). The basic idea of these schemes is to frequently update the information stored on the tags such that an adversary cannot link them. However, due to the lack of secure access control mechanisms, it is impossible to prevent unauthorized writes to such tags. A simple denial-of-service attack is to write some garbage data to a tag. Thus, an honest verifier will no longer accept the tag until it is reinitialized with correct data. This violates the availability requirement.

Therefore, tags that provide no cryptographic functionality cannot be used in applications that require reliable authentication. Thus, it is inevitable to use tags that are capable of performing at least some cryptographic functions if authentication is of concern.

Symmetric-key tags. A general problem of implementing privacy-preserving authentication based on symmetric keys is how to inform the other party which key must be used. Clearly, a tag cannot disclose its identity before the reader has been authenticated, since this would violate its location privacy. Therefore, the reader does not know which authentication key it should use, and thus cannot authenticate to the tag.

The basic idea to circumvent this problem has been introduced by [31] as Randomized Access Control: Let f_K(m) be a keyed one-way function on message m using key K. To authenticate to a reader, a tag first computes h_T ← f_{K_T}(R), where K_T is a tag-specific key and R is a random value chosen by the tag. On receipt of (h_T, R), the reader forwards this tuple to a trusted server that computes h_i ← f_{K_i}(R) for all keys K_i ∈ K, where K denotes the set of keys of all authorized tags. The server accepts if it finds a K_i ∈ K such that h_i = h_T. Finally, the server sends its decision whether to accept or reject the tag to the reader. Since R is randomly chosen each time the tag is queried, the tag always emits a different tuple (h_T, R), which cannot be linked to the tuples sent in previous protocol runs. Moreover, the reader does not learn the identity (i.e., the key K_T) of the tag, since it only receives the decision from the server.
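The following sketch captures Randomized Access Control, again with HMAC-SHA256 as an assumed instantiation of the keyed one-way function f; the linear key search on the server side makes the cost issue discussed next directly visible.

    import hashlib
    import hmac
    import os

    def f(key: bytes, m: bytes) -> bytes:
        """Keyed one-way function f_K(m); HMAC-SHA256 is assumed here."""
        return hmac.new(key, m, hashlib.sha256).digest()

    class Tag:
        def __init__(self, key: bytes):
            self.key = key  # tag-specific key K_T

        def respond(self):
            R = os.urandom(16)            # fresh randomness per query
            return f(self.key, R), R      # (h_T, R): unlinkable pair

    def server_identify(h_T: bytes, R: bytes, all_keys) -> bool:
        """Trusted server tries every authorized key K_i: O(n) work per
        query, which is exactly the scalability drawback noted below."""
        return any(hmac.compare_digest(f(K, R), h_T) for K in all_keys)

    keys = [os.urandom(32) for _ in range(1000)]  # authorized tag keys
    tag = Tag(keys[42])
    h_T, R = tag.respond()
    assert server_identify(h_T, R, keys)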
An obvious drawback of this solution is that the computational cost for the server to verify a tag is linear in the number of authorized tags. Therefore, this basic approach does not fulfill the efficiency and scalability requirements. Another disadvantage of this solution is that readers must have an online connection to the server, which, depending on the use case, may not be practical. Moreover, the tag must trust the server to respect its privacy, since the server can identify the tag once it has found the right key. Furthermore, this solution provides no security against replay attacks and thus violates the unforgeability requirement. There is much subsequent work (including [20,9,18,25]) that follows and optimizes this approach by introducing new setup assumptions or by lowering the security or privacy requirements.

Other approaches rely on updating the identity of a tag each time it has been authenticated [14,27]. These approaches allow authentication of a tag in constant time. However, they require the verifiers to have permanent access to a trusted database that verifies tags for them and manages all updates of the tag identities. As discussed above, this may be inappropriate for e-ticket systems. Section 4 provides a simple solution that allows anonymous authentication of tags with constant computational costs for the readers, without the need for a permanent online connection.

4 Solution for Practical Privacy-Preserving E-Tickets

RFID tags that are capable of performing public-key operations are unsuitable for practical implementations of e-tickets because of their relatively high price and low performance. Thus, RFID tokens that are limited to symmetric-key cryptography (i.e., random number generation and hashing) are the most practical choice for e-tickets. However, as discussed in Section 3.2, the use of symmetric-key cryptography seems to have the drawback that at least the token issuer must be trusted not to disclose personal information or movement profiles of users. Thus, our solution is based on the trust and adversary model for e-tickets that we discuss in the following.

Trust and Adversary Model. Following Section 2.3, for e-tickets based on RFID tokens that are limited to symmetric cryptography, either trust model TM 1 or trust model TM 2 must be chosen. This means that a user U must at least trust token issuer I. Whether user U must additionally trust all verifiers V depends on the corresponding setup assumptions. If verifiers are considered to be untrusted, all operations that disclose user-related information must be removed from the verifiers. For instance, these computations may be carried out on a local tamper-resistant security module, as is done by many implementations in practice (cf. Section 3); tamper-resistance means that the device will delete all its secrets when it detects any kind of physical tampering. Another simple approach, used by various anonymous symmetric-key-based authentication protocols, is to employ a remote trusted server (cf. Section 3.2).

Model for Anonymous E-Ticket Systems. As discussed in Section 2.2, to provide privacy of users it is necessary to prevent tracing of tokens. This means that all entities that are not trusted by the user of a token should not be able to decide whether the user's token has been used in a protocol run (unlinkability). Therefore, it is necessary to employ some mechanism that hides the identity of a token each time it is queried.
This can either be some special hardware (e.g., as proposed by [2]) or a cryptographic primitive that inherently provides anonymity of users (e.g., anonymous credentials, as proposed in [15]). In the following, we refer to this mechanism as an anonymizer.

Analogous to Section 2.1, an anonymous e-ticket system consists of at least one token issuer, a set of users, tokens, verifiers, and anonymizers. The token issuer creates tokens for users. These tokens can be used by users to prove to verifiers that they are authorized to use the transit system. Additionally, anonymizers ensure anonymity of tokens. We say that tokens are anonymized. Fig. 2 illustrates the model for anonymous e-ticket systems.

Fig. 2. Model for anonymous e-ticket systems: issuer I issues token T to user U (1. Request, 2. Issue); anonymizer A anonymizes T (3. Anonymize); U proves possession of T to verifier V (4. Prove).

Description of the Solution. In the following, we focus on solutions for privacy-preserving authentication based on RFID tokens that are at most capable of performing symmetric cryptography. The players are as shown in Fig. 2. The anonymizer is either a dedicated hardware device or software running on a mobile computing device (e.g., the mobile phone) of the user. Note that a separate anonymizer device may suffer from the same problems discussed in Section 3.1. However, if a user's anonymizer runs out of power, the user will indeed lose some privacy until his anonymizer is operable again, but he can still prove authorization using his RFID token. Moreover, since anonymizers can also be available in public places and their capabilities can be embedded in the verifiers (when this does not significantly affect the performance of the system), the user's privacy is not completely lost.

Since our solution relies on symmetric-key-based authentication, the token must store an authentication secret. To achieve the security requirement of unforgeability, it should be impossible to determine this secret by attacking the protocols involving the token, as well as by physically attacking the token. One solution to counter physical attacks is to employ physical protection mechanisms that hamper reading out the memory of the tag. However, this would increase the price of the tags such that it would be improvident to use them. Another solution to prevent cloning can be implemented by means of a recent physical cryptographic primitive: Physically Unclonable Functions (PUFs) [28,29].

4.1 Building Blocks

Secure key storage with PUFs. A Physically Unclonable Function (PUF) is an inherently unclonable function embedded into a physical object [29]. The unclonability of the PUF comes from random and uncontrollable manufacturing processes during the creation of the corresponding object. A PUF maps challenges to responses. A challenge is a stimulus that, when applied to the PUF, makes it return a response that is specific to the PUF and the stimulus. Since the response of a PUF relies on physical properties of the corresponding physical object, which is subject to noise (e.g., temperature, pressure, etc.), the PUF will always return slightly different responses to the same stimulus. A PUF can be embedded into a microchip, e.g., by exploiting statistical variations of the delays of gates and wires within the chip.
Therefore, the authors of [28] propose to use a PUF as secure key storage. The adversary model for PUFs assumes that an attacker knows how the PUF is challenged and how responses are measured. Moreover, the attacker is allowed to know the exact challenges used for deriving the secret stored in the PUF. The requirements on a chip that incorporates a PUF to securely store a secret are as follows [29]:

1. The PUF must be inseparably bound to the chip such that any attempt to separate them results in significant damage to the PUF and the chip.
2. Any measurements on the chip must not reveal detailed information on the structure of its PUF.
3. The PUF, the sensors for measuring responses, the processing unit, and the volatile memory of the chip must be opaque.
4. Even if details on the structure of the PUF are known, it must be infeasible to create a physical copy or to set up a mathematical model of the PUF that predicts challenge-response pairs with non-negligible probability (unclonability).
5. Tampering with the chip or the PUF must significantly change the challenge-response behavior of the PUF (tamper-evidence).
6. The chip must contain tamper-proof read-only memory that stores public data (e.g., algorithms) whose integrity is important.

The first requirement prevents an adversary from accessing the output of (and thus the secret stored in) the PUF. The second prevents an adversary from collecting data that may help to create a clone or to set up a mathematical model of the PUF that can be used to obtain the secret. The third prevents attacks (e.g., side-channel attacks) that try to disclose the internal state of the PUF, the processing unit, or the volatile memory of the chip, which may temporarily contain parts of the secret. The fourth prevents cloning of the PUF. The fifth thwarts invasive inspections: any attempt to physically access or manipulate the chip or the PUF must destroy both of them. The last requirement prevents an attacker from injecting malicious code that may force the chip to disclose its secret.

Storing a secret in a PUF. To use a PUF as secure key storage, a key K is generated and stored as follows: a trusted party (e.g., issuer I) first generates a key K ∈_R {0,1}^l for an appropriate security parameter l. Then it chooses a random challenge z to challenge the PUF. On receipt of the response r, the trusted party computes helper data w such that key K can later be recovered by evaluating PUF(z, w), and stores (z, w, K) in a database. The tuple (z, w) is stored in the (unprotected) memory of the chip. The helper data w serves two different purposes [29]: first, it helps to remove the effects of noise on measurements of the responses of the PUF; second, since the responses of a PUF are typically not uniformly distributed, w should guarantee that the secret K is uniform.

Reconstructing a secret from a PUF. To reconstruct the secret K, the chip reads (z, w) from its memory, challenges its PUF with (z, w), and obtains K.

Efficient implementation. According to [28], a PUF can be integrated into a chip with fewer than 1000 extra gates. Moreover, [8] presents an implementation of a PUF for RFIDs.
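To make the enrollment and reconstruction steps concrete, the following toy sketch models a PUF as a keyed hash plus bit-flip noise and derives the helper data via the code-offset construction with a 4x repetition code. All names and parameters are illustrative, not from the paper; real designs use proper fuzzy extractors with privacy amplification (cf. [29]).

    import hashlib, os, random

    DEVICE_SECRET = b"\x13\x37"          # stands in for the unclonable structure

    def puf(z, noise=0.01):
        # Toy PUF: 16-bit response = keyed hash of the challenge, plus bit flips.
        clean = int.from_bytes(hashlib.sha256(DEVICE_SECRET + z).digest()[:2], "big")
        flips = sum(1 << i for i in range(16) if random.random() < noise)
        return clean ^ flips

    def rep_encode(k):                   # 4 key bits -> 16-bit repetition codeword
        return sum(0xF << (4 * i) for i in range(4) if (k >> i) & 1)

    def rep_decode(c):                   # majority decision within each 4-bit group
        return sum(1 << i for i in range(4)
                   if bin((c >> (4 * i)) & 0xF).count("1") >= 2)

    # Enrollment by the trusted party: helper data w masks the codeword of K.
    z = os.urandom(8)                    # random challenge
    r_ref = puf(z, noise=0.0)            # reference response (averaged in practice)
    K = random.getrandbits(4)            # the secret to be stored
    w = r_ref ^ rep_encode(K)            # stored, with z, in unprotected memory

    # Reconstruction by the chip: PUF(z, w) recovers K despite response noise
    # (a real scheme would additionally compress the result to a uniform key).
    print("recovered:", rep_decode(puf(z) ^ w) == K)

The repetition code absorbs the noise (one flipped bit per group is tolerated), while the final compression step, omitted here, would handle the uniformity requirement.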
Symmetric-key-based authentication. In order to authenticate tokens, standard authentication mechanisms based on symmetric-key cryptography that are secure against impersonation under passive (imp-pa) and active (imp-aa) attacks [4, p. 10] can be used. Since low-cost RFID tags are not capable of running multiple sessions, concurrent attacks (imp-ca) need not be considered. To provide confidentiality and location privacy, the authentication scheme must not disclose user-related information (e.g., user data or movement profiles).

As described in Section 3.2, the major problem in realizing anonymous authentication based on shared secrets is how to inform the other party which secret should be used without revealing one's own identity. This problem can be solved by employing rerandomizable public-key encryption [13,2].

Rerandomizable encryption. A rerandomizable encryption scheme is an encryption scheme for which there is a probabilistic function Rand(·) that maps a ciphertext c to a ciphertext c′ ≠ c such that the corresponding plaintext stays the same. The rerandomizable encryption scheme must be semantically secure [12] and should provide key privacy [3]. Semantic security means that, given two different chosen plaintexts m_0 ≠ m_1 and a ciphertext c_b = Enc_pk(m_b) for some fixed public key pk and b ∈_R {0,1}, it should be hard to decide whether c_b encrypts m_0 or m_1. Key privacy means that, given two different public keys pk_0 ≠ pk_1 and a ciphertext c_b = Enc_{pk_b}(m) for some fixed message m and b ∈_R {0,1}, it should be hard to decide whether c_b has been created using pk_0 or pk_1.

Use of rerandomizable encryption. Rerandomizable public-key encryption can be used to provide a symmetric authentication key to authorized communication partners (e.g., trusted verifiers) without disclosing the identity of the token to unauthorized entities. Moreover, the computations performed by the token remain in the more efficient symmetric-key setting. During creation of a token T, the token issuer encrypts the token authentication key K_T of token T with a public encryption key pk_V whose corresponding secret decryption key is known to all verifiers (or their security modules or the trusted server). The resulting ciphertext c_T = Enc_{pk_V}(K_T) is then stored in the memory of the token. Whenever token T engages in a protocol run with a verifier, it first sends its ciphertext c_T. If the recipient knows the correct decryption key, it can decrypt K_T and use it in a subsequent authentication protocol.

Since an honest verifier must check that a token has been created by a genuine issuer, a digital signature scheme is used to certify the token authentication key. However, this signature is static data and thus cannot be transmitted to the verifier as plaintext, since this would enable tracing of the token and thus violate location privacy. Therefore, the signature must be included in the rerandomizable ciphertext. Likewise, c_T itself is static and must be frequently rerandomized in order to provide location privacy. Therefore, anonymizers must read c_T, rerandomize it to c′_T ← Rand(c_T), and replace c_T with c′_T [2]. Since all known rerandomizable encryption schemes require public-key operations (which in turn imply modular exponentiations) to rerandomize a ciphertext, a symmetric-key token cannot rerandomize its ciphertext on its own. Thus, unlinkability relies on the availability of anonymizers that are not controlled by the token.
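As a concrete, deliberately toy-sized illustration of a rerandomizable scheme, textbook ElGamal over a small prime-order group supports a Rand(·) operation that refreshes a ciphertext without knowing the plaintext or the secret key. The schemes the solution actually builds on, universal re-encryption [13] and insubvertible encryption [2], refine this idea; the parameters below are far too small to be secure.

    import random

    p, q, g = 1019, 509, 4               # toy safe-prime group: p = 2q+1, ord(g) = q

    sk = random.randrange(1, q)          # decryption key known to the verifiers
    pk = pow(g, sk, p)                   # public key pk_V used by the issuer

    def enc(m, pk):                      # m must be an element of the subgroup
        r = random.randrange(1, q)
        return (pow(g, r, p), m * pow(pk, r, p) % p)

    def rand(c, pk):                     # Rand(.): same plaintext, fresh ciphertext
        a, b = c
        s = random.randrange(1, q)
        return (a * pow(g, s, p) % p, b * pow(pk, s, p) % p)

    def dec(c, sk):
        a, b = c
        return b * pow(a, p - 1 - sk, p) % p

    K_T = pow(g, 42, p)                  # token key encoded as a group element
    c_T = enc(K_T, pk)
    c_T2 = rand(c_T, pk)                 # what an anonymizer would do to c_T
    assert c_T2 != c_T and dec(c_T2, sk) == K_T

Note that plain ElGamal rerandomization requires knowing which public key the ciphertext was created under; universal re-encryption [13] removes even that requirement, which is what makes public anonymizers possible.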
Basically, there are four ways to realize anonymizers:

1. Integrated anonymizers: The anonymizer may be integrated into the token. This would enable gapless location privacy while improving practicability. However, all known rerandomizable public-key encryption schemes require public-key operations (e.g., exponentiations) in order to rerandomize a ciphertext. Thus, this approach is not applicable to symmetric-key tags.
2. Public anonymizers: Anonymizers may be public, which means that they can be constructed and run by everyone. However, public anonymizers as proposed in [2] enable adversaries to set up malicious anonymizers that can perform denial-of-service attacks. Therefore, to fulfill the availability requirement, anonymizers must be trusted by the users. In return, a trusted anonymizer must authenticate to a token before it is allowed to anonymize it. In practice there may be a variety of public anonymizing service providers, among which the user chooses the one he trusts. Authentication of anonymizers can be realized in the same way as described above for verifiers: each token may be initialized with an additional rerandomizable ciphertext c_A that encrypts a token-specific symmetric anonymizer authentication key K_A under a public key pk_A whose secret key sk_A is known to all anonymizers trusted to anonymize the specific token. Thus, only trusted anonymizers can decrypt c_A to obtain K_A and use it to authenticate to the token to be anonymized.
3. Anonymizers controlled by the transit enterprise: Anonymizers may be controlled by the (trusted) transit enterprise. For instance, anonymizers may be included in verifiers or mounted at the stations or in the vehicles of the transit enterprise.
4. User-controlled anonymizers: Each user may own an anonymizer that can only be used to rerandomize his own tags. In this case the user must provide the public key pk_A of his anonymizer A to the token issuer during the process of issuing an e-ticket.

To summarize, the user must trust the anonymizer to respect his privacy. However, this is a reasonable assumption, since the anonymizer is either under his control or managed by a trusted entity (e.g., the transit enterprise).

4.2 Protocol Descriptions

The issue protocol. A user U requests token issuer I to create a token with his desired usage conditions ρ_T (e.g., ticket type, expiration date, geographical usage restrictions, etc.) and provides the public key pk_A of his anonymizer A. Issuer I then creates the token authentication key K_T and the anonymizer authentication key K_A for token T. After that, issuer I derives the corresponding helper data (z_T, w_T) and (z_A, w_A) for the PUF of token T as described in Section 4.1. Then, issuer I creates a certificate σ_T = Sign_{sk_I}(K_T, ρ_T) and two rerandomizable ciphertexts c_T = Enc_{pk_V}(K_T, ρ_T, σ_T) and c_A = Enc_{pk_A}(K_A). Finally, issuer I writes the tuple (w_T, w_A, c_T, c_A) to the (unprotected) memory of token T and physically hands token T to user U. A sketch of this message flow is given below.
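The following structural sketch traces the issue protocol; signatures and ciphertexts are plain records rather than real cryptograms, and all helper names (new_key, puf_enroll, ...) are illustrative rather than taken from the paper.

    import os

    def new_key(nbytes=16):
        return os.urandom(nbytes)

    def puf_enroll(key):
        # Stand-in for Section 4.1: the issuer would challenge the token's PUF
        # with a fresh z and derive w so that PUF(z, w) yields key; here both
        # values are opaque placeholders (z also stays in the issuer's database).
        return os.urandom(8), os.urandom(8)

    def sign(sk_I, msg):                 # placeholder for Sign_{sk_I}(.)
        return ("sig", sk_I, msg)

    def enc(pk, msg):                    # placeholder for the rerandomizable Enc
        return ("ctxt", pk, msg)

    def issue(sk_I, pk_V, pk_A, rho_T):
        K_T, K_A = new_key(), new_key()          # token / anonymizer auth keys
        z_T, w_T = puf_enroll(K_T)               # helper data for K_T
        z_A, w_A = puf_enroll(K_A)               # helper data for K_A
        sigma_T = sign(sk_I, (K_T, rho_T))       # issuer's certificate
        c_T = enc(pk_V, (K_T, rho_T, sigma_T))   # decryptable by verifiers
        c_A = enc(pk_A, K_A)                     # decryptable by the anonymizer
        return (w_T, w_A, c_T, c_A)              # written to the token's memory

    token_memory = issue("sk_I", "pk_V", "pk_A",
                         {"type": "monthly", "expires": "2008-12"})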
The anonymize protocol. In order to anonymize a token T, anonymizer A broadcasts an anonymization request. On receipt of this request, token T uses its random number generator to create a random challenge N_T, reads both ciphertexts c_T and c_A from its memory, and sends the tuple (N_T, c_T, c_A) to anonymizer A, which then uses the rerandomization function of the rerandomizable encryption scheme to rerandomize both ciphertexts (c_T, c_A) to (c′_T, c′_A). After that, anonymizer A uses its secret decryption key sk_A to decrypt the anonymizer authentication key K_A from ciphertext c_A and uses K_A to authenticate the message (K_A, c′_T, c′_A, N_T), which A then sends to token T. On receipt of this message, token T recovers its anonymizer authentication key K_A by reading the helper data w_A from its memory and challenging its PUF as described in Section 4.1. If token T can successfully verify the authenticity of the tuple (K_A, c′_T, c′_A, N_T) w.r.t. key K_A, token T updates both ciphertexts (c_T, c_A) stored in its memory to the ciphertexts (c′_T, c′_A) received from anonymizer A. If the authenticity of the response of anonymizer A cannot be verified, token T aborts. A sketch of this exchange is given after the prove protocol.

The prove protocol. To verify the authenticity of token T, a verifier V first broadcasts a verification request. On receipt of this request, token T reads the ciphertext c_T from its memory and sends it to verifier V, who then uses its secret decryption key sk_V to decrypt (K_T, ρ_T, σ_T) from c_T. Then, verifier V uses the public verification key pk_I of token issuer I to verify σ_T. If verification of σ_T fails or the usage conditions ρ_T associated with token T are violated, verifier V rejects. Otherwise, it continues by using K_T to engage in a symmetric-key-based authentication protocol with token T. Token T can recover its token authentication key K_T by reading the helper data w_T from its memory and challenging its PUF as described in Section 4.1. Verifier V accepts token T as authentic if token T successfully completes the authentication protocol w.r.t. key K_T. If authentication fails, verifier V rejects token T.
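The sketch below traces the anonymize exchange, with HMAC standing in for the symmetric authentication of the anonymizer's reply and simple placeholders for Rand(·) and the anonymizer's decryption; the dictionary-based wiring is purely illustrative.

    import hmac, hashlib, os

    def mac(key, *parts):
        h = hmac.new(key, digestmod=hashlib.sha256)
        for part in parts:
            h.update(repr(part).encode())
        return h.digest()

    def anonymize(token, rerand, dec_skA):
        N_T = os.urandom(8)                       # token's fresh challenge
        c_T, c_A = token["c_T"], token["c_A"]     # token sends (N_T, c_T, c_A)
        # --- anonymizer side ---
        c_T2, c_A2 = rerand(c_T), rerand(c_A)     # rerandomize both ciphertexts
        K_A = dec_skA(c_A)                        # recover the shared MAC key
        tag = mac(K_A, c_T2, c_A2, N_T)           # authenticate the reply
        # --- token side ---
        K_A_tok = token["recover_K_A"]()          # via the PUF, cf. Section 4.1
        if not hmac.compare_digest(tag, mac(K_A_tok, c_T2, c_A2, N_T)):
            raise RuntimeError("anonymizer not authentic: abort")
        token["c_T"], token["c_A"] = c_T2, c_A2   # accept the memory update

    # Toy wiring: "ciphertexts" are tuples and rerandomization appends fresh salt.
    K_A = os.urandom(16)
    token = {"c_T": ("ctxt", "pk_V", "payload"),
             "c_A": ("ctxt", "pk_A", K_A),
             "recover_K_A": lambda: K_A}
    anonymize(token, lambda c: c + (os.urandom(2),), lambda c: c[2])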
4.3 Analysis of the Framework

This section informally analyzes which of the requirements of Section 2.4 are fulfilled by the solution presented in this section. We will provide formal proofs in an extended version of the security and privacy model of [30] in a follow-up paper.

Authentication. The solution fulfills all of the authentication requirements of Section 2.4:

Authentication: Honest verifiers will only accept tokens whose token authentication key has been certified by a genuine token issuer.

Unforgeability: The properties of the PUF, the underlying authentication protocol, and the semantic security of the rerandomizable encryption scheme ensure that a valid token cannot be cloned or simulated by an adversary, since its secrets cannot be extracted. Moreover, the security of the digital signature scheme guarantees that an adversary cannot create valid tokens on his own, since he cannot forge signatures.

Availability: The token only updates its internal memory with data that has been authenticated by an authorized (i.e., trusted) anonymizer. Thus, an adversary cannot tamper with the data stored on the token.

Privacy. The solution fulfills the following privacy requirements of Section 2.4 w.r.t. trust model TM 2 (or trust model TM 3 if verifiers are equipped with security modules or are connected to a remote trusted server), as described in Section 4:

Confidentiality: Since the rerandomizable encryption scheme is required to be semantically secure, no information on the secrets of the token is revealed by the corresponding ciphertexts. Moreover, the underlying authentication scheme is required not to disclose any user-related information. Thus, an adversary cannot obtain any information on the token or the user.

Location Privacy: Tokens can be traced between two rerandomizations. However, if an adversary misses only one rerandomization, he cannot trace a token any more, because of the semantic security of the rerandomizable encryption scheme and the properties of the authentication scheme. Our framework currently does not provide backward and forward untraceability, and we leave this as an interesting open problem.

Functional requirements. The solution fulfills all of the functional requirements of Section 2.4:

Efficiency: Verification of a token requires running a symmetric-key-based authentication protocol between the token and the verifier. Moreover, a single public-key decryption and a single signature verification must be performed by the verifier. This computational effort is comparable to that of existing schemes currently used in practice (e.g., [1,26]), since the additional operations that must be performed by the verifier are negligible given the computing power of currently available RFID readers.

Scalability: The verification cost does not depend on the number of tokens.

5 Conclusion, Open Problems, and Future Work

Summary of contribution. We analyzed the viability of current proposals for privacy-preserving e-tickets and examined the applicability of privacy-enhancing RFID-based protocols. We showed that existing approaches are not suited to the application scenario of e-tickets and presented a solution based on existing cryptographic tools and current RFID technology.

Open research problems. As discussed in Section 3.2, all currently known privacy-preserving authentication schemes for tokens that are limited to symmetric cryptography seem to require the token issuer to be trusted. Therefore, it would be interesting to find a scheme that is based on symmetric cryptography but provides properties similar to those of anonymous credential systems. Currently, our approach does not provide forward and backward security. Forward- and backward-secure anonymous symmetric-key-based authentication schemes require frequent updates of the secrets of the tokens [18]. However, since secrets are protected by PUFs, it is not trivial to update them for both the token and the verifier in a way that preserves forward and backward untraceability.

Acknowledgments. This work has been supported in part by the European Commission through the Network of Excellence ECRYPT II. The work of the third author has also been supported in part by the European Commission through the FP7 Information Communication Technologies programme, under Contract FET-215270 FRONTS (Foundations of Adaptive Networked Societies of Tiny Artefacts).

References

1. Calypso Networks Association. Web site of Calypso Networks Association. http://www.calypsonet-asso.org/, May 2007.
2. Giuseppe Ateniese, Jan Camenisch, and Breno de Medeiros. Untraceable RFID tags via insubvertible encryption. In Proceedings of the 12th ACM Conference on Computer and Communications Security, November 7–11, 2005, Alexandria, VA, USA, pages 92–101. ACM Press, 2005.
3. Mihir Bellare, Alexandra Boldyreva, Anand Desai, and David Pointcheval. Key-privacy in public-key encryption. In 7th International Conference on the Theory and Application of Cryptology and Information Security, Gold Coast, Australia, December 9–13, 2001, Proceedings, volume 2248 of LNCS, pages 566–582. Springer Verlag, 2001.
4. Mihir Bellare, Chanathip Namprempre, and Gregory Neven. Security proofs for identity-based identification and signature schemes. Cryptology ePrint Archive: Report 2004/252, September 2004. Available at http://eprint.iacr.org/2004/252.
5. Jan Camenisch, Susan Hohenberger, and Anna Lysyanskaya. Compact e-cash. In 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Aarhus, Denmark, May 22–26, 2005, Proceedings, volume 3494 of LNCS, pages 302–321. Springer Verlag, 2005.
6. Jan Camenisch and Anna Lysyanskaya. Signature schemes and anonymous credentials from bilinear maps. In 24th Annual International Cryptology Conference, Santa Barbara, California, USA, August 15–19, 2004, Proceedings, volume 3152 of LNCS, pages 56–72. Springer Verlag, 2004.
7. Nicolas T. Courtois, Karsten Nohl, and Sean O'Neil. Algebraic attacks on the Crypto-1 stream cipher in MiFare Classic and Oyster cards. Cryptology ePrint Archive, Report 2008/166, 2008. Available at http://eprint.iacr.org/2008/166/.
8. Srinivas Devadas, Edward Suh, Sid Paral, Richard Sowell, Tom Ziola, and Vivek Khandelwal. Design and implementation of PUF-based unclonable RFID ICs for anti-counterfeiting and security applications. In IEEE International Conference on RFID 2008, Las Vegas, NV, USA, 16–17 April, 2008, pages 58–64. IEEE Computer Society, 2008.
9. Tassos Dimitriou. A lightweight RFID protocol to protect against traceability and cloning attacks. In Proceedings of the First International Conference on Security and Privacy for Emerging Areas in Communications Networks (SecureComm), September 5–9, 2005, pages 59–66. IEEE Computer Society, 2005.
10. Near Field Communication Forum. Web site of the Near Field Communication (NFC) Forum. http://www.nfc-forum.org/, April 2008.
11. Sony Global. Web site of Sony FeliCa. http://www.sony.net/Products/felica/, June 2008.
12. Shafi Goldwasser and Silvio Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28:270–299, 1984.
13. Philippe Golle, Markus Jakobsson, Ari Juels, and Paul Syverson. Universal re-encryption for mixnets. In The Cryptographers' Track at the RSA Conference 2004, San Francisco, CA, USA, February 23–27, 2004, Proceedings, volume 2964 of LNCS, pages 163–178. Springer Verlag, 2004.
14. Dirk Henrici and Paul Müller. Hash-based enhancement of location privacy for radio-frequency identification devices using varying identifiers. In Proceedings of the Second IEEE Annual Conference on Pervasive Computing and Communications Workshops, March 14–17, 2004, pages 149–153. IEEE Computer Society, 2004.
15. Thomas S. Heydt-Benjamin, Hee-Jin Chae, Benessa Defend, and Kevin Fu. Privacy for public transportation. In 6th International Workshop, PET 2006, Cambridge, UK, June 28–30, 2006, Revised Selected Papers, volume 4258 of LNCS, pages 1–19. Springer Verlag, 2006.
16. Ari Juels. RFID security and privacy: A research survey. Journal of Selected Areas in Communication (J-SAC), 24(2):381–395, February 2006.
17. Ari Juels and Ravikanth Pappu. Squealing euros: Privacy protection in RFID-enabled banknotes. In 7th International Conference, FC 2003, Guadeloupe, French West Indies, January 2003, Revised Papers, volume 2742 of LNCS, pages 103–121. Springer Verlag, 2003.
18. Chae Hoon Lim and Taekyoung Kwon. Strong and robust RFID authentication enabling perfect ownership transfer. In 8th International Conference, ICICS 2006, Raleigh, NC, USA, December 4–7, 2006, Proceedings, volume 4307 of LNCS, pages 1–20. Springer Verlag, 2006.
19. Anna Lysyanskaya. Signature Schemes and Applications to Cryptographic Protocol Design. PhD thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Cambridge, Massachusetts, USA, September 2002.
20. David Molnar and David Wagner. Privacy and security in library RFID: Issues, practices, and architectures. In Proceedings of the 11th ACM Conference on Computer and Communications Security, Washington, DC, USA, October 25–29, 2004, pages 210–219. ACM Press, 2004.
21. Karsten Nohl and Henryk Plötz. MiFare – little security despite obscurity. http://events.ccc.de/congress/2007/Fahrplan/events/2378.en.html, 2007.
22. Miyako Ohkubo, Koutarou Suzuki, and Shingo Kinoshita. Efficient hash-chain based RFID privacy protection scheme. In International Conference on Ubiquitous Computing (UbiComp), Workshop Privacy: Current Status and Future Directions, Nottingham, UK, September 2004.
23. Ronny Wichers Schreur, Peter van Rossum, Flavio Garcia, Wouter Teepe, Jaap-Henk Hoepman, Bart Jacobs, Gerhard de Koning Gans, Roel Verdult, Ruben Muijrers, Ravindra Kali, and Vinesh Kali. Security flaw in MiFare Classic. http://www.sos.cs.ru.nl/applications/rfid/pressrelease.en.html, March 2008.
24. NXP Semiconductors. Web site of MIFARE. http://mifare.net/, May 2007.
25. Boyeon Song and Chris J. Mitchell. RFID authentication protocol for low-cost tags. In Proceedings of the First ACM Conference on Wireless Network Security, Alexandria, Virginia, USA, March 31 – April 2, 2008, pages 140–147. ACM Press, 2008.
26. Spirtech. CALYPSO functional specification: Card application, version 1.3. http://calypso.spirtech.net/, October 2005.
27. Gene Tsudik. YA-TRAP: Yet Another Trivial RFID Authentication Protocol. In Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, March 13–17, 2006, pages 640–643. IEEE Computer Society, 2006.
28. Pim Tuyls and Lejla Batina. RFID-tags for anti-counterfeiting. In The Cryptographers' Track at the RSA Conference 2006, San Jose, CA, USA, February 13–17, 2006, Proceedings, volume 3860 of LNCS, pages 115–131. Springer Verlag, 2006.
29. Pim Tuyls, Boris Škorić, and Tom Kevenaar, editors. Security with Noisy Data – On Private Biometrics, Secure Key Storage, and Anti-Counterfeiting. Springer Verlag, 2007.
30. Serge Vaudenay. On privacy models for RFID. In 13th International Conference on the Theory and Application of Cryptology and Information Security, Kuching, Malaysia, December 2–6, 2007, Proceedings, volume 4833 of LNCS, pages 68–87. Springer Verlag, 2007.
31. Stephen A. Weis, Sanjay E. Sarma, Ronald L. Rivest, and Daniel W. Engels. Security and privacy aspects of low-cost radio frequency identification systems. In First International Conference on Security in Pervasive Computing, Boppard, Germany, March 12–14, 2003, Revised Papers, volume 2802 of LNCS, pages 50–59. Springer Verlag, 2003.
The CURUPIRA-2 Block Cipher for Constrained Platforms: Specification and Benchmarking

Marcos Simplício Jr.1, Paulo S. L. M. Barreto1*, Tereza C. M. B. Carvalho1, Cintia B. Margi1, and Mats Näslund2

1 Laboratory of Computer Architecture and Networks, PCS-EP, University of São Paulo, Brazil {mjunior,pbarreto,carvalho,cbmargi}@larc.usp.br
2 Ericsson Research, Stockholm, Sweden mats.naslund@ericsson.com

* Supported by the Brazilian National Council for Scientific and Technological Development (CNPq) under grant 312005/2006-7.
† This work was supported by the Research and Development Centre, Ericsson Telecomunicações S.A., Brazil.

Abstract. Privacy is a key concern in Location Based Applications (LBAs), especially due to their intensive use of resource-constrained devices, on which general-purpose ciphers are difficult to deploy. In this paper, we address this issue by specifying a new, faster key-schedule algorithm for the Curupira block cipher. This special-purpose cipher follows the Wide Trail Strategy (like the AES) and is tailored for resource-constrained platforms, such as sensors and mobile devices. Furthermore, we present our benchmark results for both Curupira-1 (which adopts the original key-schedule specification) and Curupira-2 (which adopts the new one) on appropriate testbeds.

Keywords: location based services, symmetric cryptography, involutional block ciphers, sensor networks, constrained platforms, cipher benchmarking.

1 Introduction

The continuous and widespread development of context-aware applications based on Wireless Sensor Networks (WSNs) shows their potential to allow a high level of integration between computers and the physical environment. Location-Based Applications (LBAs) play an important role in this scenario, automatically adapting their behavior to the available geo-spatial location information and to the nearby sensors and devices. This way, these systems are able to provide both novel and more effective services. However, due to the sensitive nature of the data involved (both location information and other data collected by the network nodes), the security of communication in these applications is an important concern.

Battery-powered sensor nodes and other limited platforms normally employed in LBAs impose several constraints on the cryptographic algorithms that can be effectively deployed. For example, commercial motes usually have a memory size of 8–12 KB for code and 512–4096 bytes of RAM, as well as 4–16 MHz processors [21,17]. Moreover, messages exchanged in these applications are frequently small, a typical packet being 24 bytes in length [22,30]. In this context, complex all-purpose algorithms not only take longer to run but also consume more energy, which motivates the search for more efficient alternatives.

To date, many architectures have been proposed to provide security in WSNs. One of the most popular is TinySec [19], which offers link-layer security to TinyOS [18], the de facto standard operating system for sensor networks. As its default block cipher, TinySec chose Skipjack [33] due to its superior performance.
Meanwhile, RC5 [34] was not considered as an alternative, since it is believed to be encumbered with patents and, even though it can run faster than Skipjack when the round keys are pre-computed, this incurs extra RAM requirements [15]. However, as Skipjack uses relatively small (80-bit) keys and 31 out of its 32 rounds can be successfully cryptanalyzed [9], it offers a very low margin of security. These observations raise security concerns regarding TinySec, as well as other architectures based on the same cipher, such as TinyKeyMan [24], MiniSec [25] and SenSec [23].

The literature includes numerous analyses of modern cryptographic algorithms aiming to identify those well suited to WSNs, both in terms of security and of performance. One of the most extensive is presented in [21], which concludes that Skipjack is the most energy-efficient of all surveyed ciphers, while the Advanced Encryption Standard (AES) [32] and MISTY1 [27] are considered reasonable alternatives in scenarios with higher security requirements. However, MISTY1 is considered to be encumbered with patents, while AES has larger code, memory, and energy requirements, as well as a block size that is too big for sensor networks. One could also rely on cryptographic hardware [16], but this is not a solution available on many modern devices.

An alternative that addresses these issues is the adoption of Curupira [6], a special-purpose block cipher developed with constrained platforms in mind. The cipher follows the Wide Trail strategy [11], like the AES itself, which assures good security against cryptanalysis. Also, it presents an involutional structure (meaning that the encryption and decryption processes are identical except for the key-schedule) and is very flexible in terms of implementation. This way, the cipher is well adapted to resource-constrained scenarios such as those faced by LBAs.

In this paper, we propose a new, faster key-schedule algorithm for the Curupira algorithm and analyze its security. We also present a benchmark comparing Skipjack, AES and Curupira on relevant platforms. In order to distinguish between the two versions of the Curupira cipher, we write "Curupira-1" for the one adopting the original key-schedule and "Curupira-2" for the new specification. We write simply "Curupira" when the discussion applies to both.

This document is organized as follows. We introduce basic mathematical tools and notation in Section 2. An overview of Curupira-1, including its key-schedule algorithm, is given in Section 3. The second version of the key-schedule is presented in Section 4, which also discusses security, performance and implementation issues for the resulting Curupira-2 block cipher. Our benchmark results are presented in Section 5. We conclude in Section 6.

2 Mathematical preliminaries and notation

The finite field GF(2^n) will be represented as GF(2)[x]/p_n(x), where p_n(x) is a primitive pentanomial of degree n over GF(2), i.e. deg(p_n(x)) = n. This way, all multiplications over GF(2^8) are made modulo p_8(x) = x^8 + x^6 + x^3 + x^2 + 1. This choice of p_8(x) yields a simple form for the primitive cube root of unity, c(x) = x^85 mod p_8(x) = x^4 + x^3 + x^2.

An element u = u_7 x^7 + ... + u_i x^i + ... + u_0 of GF(2^8), where u_i ∈ GF(2) for all i = 0, ..., 7, will be denoted by the numerical value u_7·2^7 + ... + u_i·2^i + ... + u_0, written in hexadecimal notation. Thus, the polynomial c(x) is written 1C, while p_8(x) is written 14D. Multiplication by the polynomials x and c(x) is denoted xtimes and ctimes, respectively.
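Under this byte representation, xtimes and ctimes reduce to a handful of shifts and XORs. A minimal sketch, using only the modulus 14D and the identity c(x) = x^4 + x^3 + x^2 stated above:

    def xtimes(u):
        # Multiply u by x in GF(2^8): shift left, then reduce modulo 0x14D.
        v = u << 1
        if v & 0x100:
            v ^= 0x14D
        return v & 0xFF

    def ctimes(u):
        # Multiply u by c(x) = x^4 + x^3 + x^2 via repeated xtimes calls.
        t2 = xtimes(xtimes(u))       # u * x^2
        t3 = xtimes(t2)              # u * x^3
        t4 = xtimes(t3)              # u * x^4
        return t2 ^ t3 ^ t4

    assert ctimes(0x01) == 0x1C      # c(x) * 1 = c(x), written 1C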
The set of all 3×n matrices over GF(2^m) is denoted by M_n. Let D and E denote the following MDS matrices (see [26] for a definition), with the entries of D given in hexadecimal notation:

    D = | 3  2  2 |        E = | 1+c(x)   c(x)     c(x)   |
        | 4  5  4 |            |  c(x)   1+c(x)    c(x)   |
        | 6  6  7 |            |  c(x)    c(x)    1+c(x)  |

3 Overview of the CURUPIRA-1 structure

This section gives a brief description of the original Curupira-1 specification. For further details, we refer to [6]. Curupira is a block cipher specially tailored for constrained platforms. It operates on 96-bit data blocks (organized as matrices in M_4, mapped by columns instead of by rows) and accepts 96-, 144- or 192-bit keys, with a variable number of rounds. The cipher round structure is similar to the one in BKSQ [12], with the advantage of being involutional, resulting in a more compact cipher. Its round function structure is used by both Curupira-1 and Curupira-2 and is composed of the following self-inverse transforms (see Figure 1):

- Nonlinear Layer: all bytes in the block pass through a highly nonlinear S-box, identical to that used in the Anubis [7] and Khazad [8] block ciphers;
- Linear Diffusion Layer: the block is left-multiplied by the MDS matrix D (defined in Section 2), which results in intra-columnar diffusion;
- Permutation Layer: all bytes in the second and third rows of the block are permuted according to the rule b_{i,j} = a_{i, i⊕j}, 0 ≤ i < 3, 0 ≤ j < n;
- Key Addition Layer: the round key is XORed with the block.

Fig. 1. Curupira round structure.

These transforms only involve basic operations such as table lookups, XORs and byte shifts. Thus, they can be implemented on most platforms in a very efficient way. Nonetheless, when space is available, they can be further accelerated using pre-computed tables, operating over entire columns of the block instead of byte by byte.

3.1 The CURUPIRA-1 key-schedule

The Curupira-1 key-schedule algorithm is easily invertible and follows a structure closely related to the one dictated by the Wide Trail Strategy, which assures a high diffusion speed. It also has the advantage of being cyclic, which means that the original key is recovered after a certain number of rounds, avoiding the need to store any intermediate sub-keys during both encryption and decryption. In this first construction, a 48t-bit user key K, 2 ≤ t ≤ 4, is internally represented as a matrix K ∈ M_2t. To generate a sub-key K^(n+1) from its predecessor K^(n), the sub-keys pass through three different transformations (illustrated in Figure 2, and sketched in code below):

- Constant Addition: a set of nonlinear constants, incrementally taken from the S-box, is XORed with the bytes in the first row of the round key; for a 48t-bit key, round r and column j, the constant q_j^(r) is given by q_j^(0) = 0 and q_j^(r) = S[2t(r−1) + j], 0 ≤ j < 2t;
- Cyclic Shift: rotates the second row one position to the left and the third row one position to the right, keeping the first row unchanged;
- Linear Diffusion: the round key is left-multiplied by the matrix E (defined in Section 2).

Fig. 2. The original key-schedule specification (adopted by Curupira-1).
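A sketch of the three transforms on a key held as a 3 × 2t byte matrix follows; S denotes the cipher's S-box (assumed available as a 256-entry table, with the constant indices assumed to stay within it for the rounds used) and ctimes is the GF(2^8) routine sketched in Section 2. The diffusion step exploits the fact that E = I + c(x)·J, where J is the all-ones matrix, so each byte is simply XORed with c(x) times its column sum.

    def const_add(K, S, r, t):           # constant addition on the first row
        if r > 0:                        # q_j^(0) = 0, q_j^(r) = S[2t(r-1)+j]
            for j in range(2 * t):
                K[0][j] ^= S[2 * t * (r - 1) + j]
        return K

    def cyclic_shift(K):                 # row 1 left by one, row 2 right by one
        K[1] = K[1][1:] + K[1][:1]
        K[2] = K[2][-1:] + K[2][:-1]
        return K

    def diffusion(K, t):                 # left-multiply by E = I + c(x).J
        for j in range(2 * t):
            s = ctimes(K[0][j] ^ K[1][j] ^ K[2][j])   # c(x) times the column sum
            for i in range(3):
                K[i][j] ^= s
        return K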
Furthermore, the round keys κ^(r) effectively combined with the data blocks are chosen by a key selection algorithm, which applies the S-box to the first row of the sub-key K^(r) and truncates it to 96 bits (i.e., the size of the block). Thus, even though the key selection is not part of the key evolution, it adds nonlinearity to the key-schedule.

4 The CURUPIRA-2 key-schedule

The Curupira-1 key-schedule specification is quite conservative, aiming at a high security level against related-key attacks. It is therefore reasonable to adopt a simpler yet secure key-schedule algorithm in order to improve the overall cipher performance. That is the approach of the Curupira-2 key-schedule, described in the following sections.

4.1 Key representation

A 48t-bit user key K (2 ≤ t ≤ 4) is internally represented as an element K of the finite field GF(2^48t) = GF(2)[x]/p_48t(x), where p_48t(x) is a pentanomial over GF(2) chosen in such a way that x^8 is a primitive root of p(x). An element u(x) in this field can be seen as a byte vector, i.e. u = (U_{6t−1}, ..., U_0), where U_0 denotes the least significant byte. This way, the cryptographic key K is directly mapped to K, starting at its most significant byte, K_{6t−1}.

A pentanomial representation was chosen because primitive pentanomials are available for all values of t used by Curupira. In fact, the use of trinomials would result in better performance, but unfortunately they do not exist for fields of type GF(2^n) when n is a multiple of 8 [28] and, thus, they cannot be used for any of the cipher's key sizes. For reasons that will become clearer in Section 4.3, the pentanomials chosen for Curupira-2 are:

- p_96(x) = x^96 + x^16 + x^13 + x^11 + 1;
- p_144(x) = x^144 + x^56 + x^53 + x^51 + 1;
- p_192(x) = x^192 + x^48 + x^45 + x^43 + 1.

4.2 Schedule constants

The schedule constants are denoted q^(s), where the index (s) indicates the round in which they are applied. As in Curupira-1, the constants are directly taken from the S-box and, thus, no extra storage is needed. This time, however, they are interpreted as elements of GF(2^48t) in such a way that q^(0) = 0 and, for s > 0, q^(s) = (S[s−1], 0, ..., 0), i.e. a single S-box output is mapped to the most significant byte of q^(s). As shown in the next section, this is a strategic position that makes each constant affect exactly 3 bytes of the round key right after its addition.

4.3 The key evolution

The sub-keys are updated during cipher operation by means of two operations: a reversible transform ∂ : GF(2^48t) → GF(2^48t) and a self-inverse transform φ : GF(2^48t) → GF(2^48t). They are defined in such a way that ∂(u) ≡ u ⊗ x^8 and, for the polynomials u = (U_{6t−1}, ..., U_0) and v = (V_{6t−1}, ..., V_0) in GF(2^48t):

    v = φ(u)  ⟺  V_{11−i} = U_{11−i} ⊕ U_{12+i} for 0 ≤ i < 6t−12, and V_j = U_j otherwise.

Together with the schedule constants, these transforms compose the key evolution, defined in such a way that K^(0) ≡ K and K^(r+1) = φ(∂(K^(r) ⊕ q^(r))). A sketch of φ is given below.
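The following sketch implements φ on the byte-vector representation (most significant byte first), under the reading of the definition given above: since each of the 6t−12 modified low-order bytes only XORs in an untouched higher-order byte, applying φ twice returns the original key.

    def phi(U):
        # U = [U_{6t-1}, ..., U_0]; byte U_j sits at list index 6t-1-j.
        n = len(U)                    # n = 6t bytes; n = 12 makes phi a no-op
        V = list(U)
        for i in range(n - 12):       # 0 <= i < 6t - 12
            # V_{11-i} = U_{11-i} xor U_{12+i}; higher bytes stay untouched
            V[n - 1 - (11 - i)] ^= U[n - 1 - (12 + i)]
        return V

    K = list(range(18))               # a 144-bit key, t = 3
    assert phi(phi(K)) == K           # phi is self-inverse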
The φ transform is used to ensure greater diffusion in the schedule when keys larger than 96 bits are adopted, since it combines some of the least significant bytes of the key with the most significant ones. Without this operation, two 144-bit keys differing only in the byte K_11 would generate sub-keys whose 12 least significant bytes would be identical for the first 6 rounds; likewise, for 192-bit keys, this would hold for the first 12 sub-keys. As these least significant bytes are the ones actually selected in each round, the effect of the φ transform is essential to assure a higher diffusion speed for 144- and 192-bit keys.

The ∂ transform, in turn, is particularly interesting due to its performance, especially on resource-constrained platforms, as stated in the following theorem.

Theorem 1. Let p(x) = x^n + x^k3 + x^k2 + x^k1 + 1 be a primitive pentanomial of degree n = bw over GF(2) such that k3 > k2 > k1, k3 − k1 ≤ w, and either k3 mod w = 0 or k1 mod w = 0. Then multiplication by x^w in GF(2^n) = GF(2)[x]/p(x) can be implemented with no more than 5 XORs and 4 shifts on w-bit words. Moreover, if 2·2^w bytes of storage are available, the cost drops to no more than 2 XORs on w-bit words and 2 table lookups.

Proof. For u = Σ_{d=0}^{n−1} u_d x^d ∈ GF(2^n), let U_i = u_{wi+w−1} x^{w−1} + ... + u_{wi} for i = 0, ..., b−1, so that u = U_{b−1} x^{w(b−1)} + U_{b−2} x^{w(b−2)} + ... + U_0, which for brevity we write u = (U_{b−1}, ..., U_0). Then one can compute u ⊗ x^w as:

    (U_{b−1} x^{w(b−1)} + U_{b−2} x^{w(b−2)} + ... + U_0) ⊗ x^w
      = U_{b−1} x^n + U_{b−2} x^{w(b−1)} + ... + U_0 x^w
      = U_{b−2} x^{w(b−1)} + ... + U_0 x^w + U_{b−1} (x^k3 + x^k2 + x^k1 + 1)
      = (U_{b−2}, ..., U_0, U_{b−1}) ⊕ U_{b−1} (x^k3 + x^k2 + x^k1).

Assume that k3 = w(k+1) for some k; the case k1 mod w = 0 is handled analogously. Thus:

    u ⊗ x^w = (U_{b−2}, ..., U_0, U_{b−1}) ⊕ U_{b−1} (x^w + x^{w−k3+k2} + x^{w−k3+k1}) x^{wk}.

Since deg(U_{b−1}) ≤ w−1 and deg(x^w + x^{w−k3+k2} + x^{w−k3+k1}) = w, their product is a polynomial of degree not exceeding 2w−1, and hence it fits two w-bit words for any value of U_{b−1}. Besides, multiplication of this value by x^{wk} corresponds to simply displacing it k words to the left. We can define, with ≫ and ≪ denoting shifts on w-bit words:

    T1[U] ≡ U ⊕ (U ≫ (k3−k2)) ⊕ (U ≫ (k3−k1));
    T0[U] ≡ (U ≪ (w−k3+k2)) ⊕ (U ≪ (w−k3+k1)).

Thus, we can write

    u ⊗ x^w = (U_{b−2}, ..., U_k ⊕ T1[U_{b−1}], U_{k−1} ⊕ T0[U_{b−1}], ..., U_0, U_{b−1}).

The values T1 and T0 can either be computed on demand or pre-computed and stored in two 2^w-entry tables. One easily sees by direct inspection that the computational cost is as stated by the theorem.

Applying this theorem to the polynomials adopted by Curupira-2, we can evaluate the cost of the transform ∂ and of its inverse:

p_96(x) = x^96 + x^16 + x^13 + x^11 + 1:
    ∂:      (U_11, ..., U_0) ⊗ x^8   = (U_10, ..., U_1 ⊕ T1[U_11], U_0 ⊕ T0[U_11], U_11);
    ∂^{-1}: (U_11, ..., U_0) ⊗ x^{−8} = (U_0, U_11, ..., U_2 ⊕ T1[U_0], U_1 ⊕ T0[U_0]);

p_144(x) = x^144 + x^56 + x^53 + x^51 + 1:
    ∂:      (U_17, ..., U_0) ⊗ x^8   = (U_16, ..., U_6 ⊕ T1[U_17], U_5 ⊕ T0[U_17], ..., U_0, U_17);
    ∂^{-1}: (U_17, ..., U_0) ⊗ x^{−8} = (U_0, U_17, ..., U_7 ⊕ T1[U_0], U_6 ⊕ T0[U_0], ..., U_1);

p_192(x) = x^192 + x^48 + x^45 + x^43 + 1:
    ∂:      (U_23, ..., U_0) ⊗ x^8   = (U_22, ..., U_5 ⊕ T1[U_23], U_4 ⊕ T0[U_23], ..., U_0, U_23);
    ∂^{-1}: (U_23, ..., U_0) ⊗ x^{−8} = (U_0, U_23, ..., U_6 ⊕ T1[U_0], U_5 ⊕ T0[U_0], ..., U_1).

For all key sizes, we have T1[U] = U ⊕ (U ≫ 3) ⊕ (U ≫ 5) and T0[U] = (U ≪ 3) ⊕ (U ≪ 5). These equations show that the ∂ and ∂^{-1} transforms have the same cost; consequently, the forward key evolution and its inverse, u ↦ ∂^{-1}(φ(u)) ⊕ q^(r), also have the same cost.
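For 96-bit keys, the whole ∂ transform then amounts to a byte rotation plus two small computations (or table lookups) and two XORs. The sketch below directly transcribes the p_96 formulas above; the same T0 and T1 serve all three key sizes.

    def T1(u):                           # high-word correction
        return (u ^ (u >> 3) ^ (u >> 5)) & 0xFF

    def T0(u):                           # low-word correction
        return ((u << 3) ^ (u << 5)) & 0xFF

    def mul_x8_96(U):
        """(U11, ..., U0) * x^8 mod p96(x); U is 12 bytes, most significant first."""
        V = U[1:] + [U[0]]               # word rotation: (U10, ..., U0, U11)
        V[-3] ^= T1(U[0])                # U1 ^= T1[U11]
        V[-2] ^= T0(U[0])                # U0 ^= T0[U11]
        return V

    assert mul_x8_96([0] * 11 + [1]) == [0] * 10 + [1, 0]   # 1 * x^8 = x^8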
In contrast to the key-schedule of Curupira-1, this second schedule algorithm has one disadvantage: there is no simple way to reinitialize the key after a reduced number of rounds. However, in many applications its higher speed during both encryption and decryption is a much more interesting feature, compensating for the lack of cyclicity.

4.4 The key selection

The round keys κ^(r) ∈ M_4 effectively used in each round are calculated by means of a key selection function mapping GF(2^48t) to M_4, defined in such a way that

    κ^(r)_{i,j} = S[K^(r)_{i+3j}] if i = 0, and κ^(r)_{i,j} = K^(r)_{i+3j} otherwise,

for 0 ≤ i < 3 and 0 ≤ j < 4. This way, only the 12 least significant bytes of K^(r) are selected. Also, the S-box is applied to the bytes that will be combined with the first row of the block, adding nonlinearity to the key-schedule, while the bytes for the other rows are taken directly. The whole process involved in this second version of the key-schedule algorithm is depicted in Figure 3.

Fig. 3. The new key-schedule specification (adopted by Curupira-2).

4.5 Security analysis revisited

Since the round structure remains the same for both versions of our cipher, most of the Curupira-1 security analysis [6, section 4] also applies to Curupira-2: the adoption of the Wide Trail Strategy in combination with a highly nonlinear S-box thwarts the best-known kinds of attacks, such as linear, differential and integral cryptanalysis. As a consequence, no attack faster than exhaustive search was found for more than 7 rounds of the cipher. These results were also confirmed by third-party analysis [31]. The main difference between the two analyses concerns the existence of weak keys and the viability of related-key attacks.

Weak keys are keys that result in a block cipher mapping with detectable weaknesses, which normally occurs when the nonlinear operations depend on the actual key value. This is not the case for Curupira, where keys are applied using XOR and all nonlinearity is fixed in the S-box. Also, the nonlinear round constants considerably reduce the probability of fixed points in the key-schedule process, making the existence of weak keys very unlikely.

Related-key attacks exploit a known relationship between different unknown keys, leading to a predictable behavior of the sub-keys generated by the key schedule. Some of the most widespread techniques involve key differentials and key rotations (cf. [10]). Due to its slower diffusion, it is clearly easier to find relationships between subsequent sub-keys in Curupira-2 than in Curupira-1, making the former less resistant to related-key attacks. In fact, between any two rounds, a difference in a single internal byte (i.e. in a position that will only be shifted as a result of the multiplication by x^8) results in a difference in a single byte of the next sub-key, which could conceivably be exploited. In spite of this, some fundamental elements are introduced into the key-schedule proposed in this paper in order to prevent such attacks. First, the nonlinearity introduced by the key selection thwarts related-key attacks involving differentials. Second, the generation of sub-keys does not involve simple rotations, but rather a multiplication over GF(2^48t) after the addition of nonlinear constants. Third, the truncation of the sub-keys makes some advanced related-key attack variants, such as that described in [13, section 4], improbable.
Finally, the slow diffusion in the key schedule is counterbalanced by the fast diffusion of the round function, assuring that each byte of the key affects many block bytes after a few rounds. Together, these features make this kind of attack unlikely to work against the full cipher.

Furthermore, for key lengths larger than the length of one round key, the existence of sets of keys that produce identical values for at least one round key is inevitable. Thus, even though the φ transform adds diffusion power to the key-schedule and prevents the existence of trivial sets with this property, such sets should be easier to find than in Curupira-1. Even so, it remains unclear how such keys could be successfully used in a related-key attack.

As a last remark, the Curupira structure involves only simple operations such as XORs, shifts and table lookups. As long as the running times of these transforms are not data-dependent on the target platform, the cipher implementation can avoid many side-channel attacks (such as timing attacks [20]) in a straightforward way.

4.6 Implementation and Performance Issues

The Curupira-2 algorithm is very flexible in terms of implementation, offering many memory/performance trade-offs. Not only can it use the same techniques developed for the Curupira-1 round functions [6, section 5], which include the usage of a few tables with pre-computed results, but its key-schedule also allows some useful optimizations depending on the resources available.

The round keys can either be computed on demand or fully pre-calculated and stored in a table for ready access. In the first case, the cipher requires a reduced amount of RAM, since only one sub-key is stored at any given time. However, as the key-schedule of Curupira-2 is not cyclic, there is no easy way to compute the first round key from the last one. An easy way to overcome this problem is to use two arrays ka and kb to store the first and the last sub-keys, respectively: when one wants to encrypt, it suffices to copy ka into kb and reuse ka's memory space to create the encryption sub-keys; in the end, ka will hold the last key while kb will store the first one, assuring that both sub-keys are always available. Decryption is handled analogously. In this case, the last sub-key can be computed during cipher initialization. We note that this strategy is only possible because the key schedule is easily invertible; a sketch of the strategy is given below.
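A sketch of the two-array strategy follows; round_function, select and next_subkey are placeholders for the cipher round, the key selection of Section 4.4 and the key evolution of Section 4.3.

    def encrypt_block(block, ka, kb, rounds, round_function, select, next_subkey):
        kb[:] = ka                          # kb keeps a copy of the first sub-key
        for r in range(rounds):
            block = round_function(block, select(ka, r))
            ka[:] = next_subkey(ka, r)      # evolve ka in place
        # ka now holds the last sub-key (the entry point for decryption) and
        # kb the first one, so both stay available without storing the full
        # schedule; decryption runs the same loop with the inverse evolution.
        return block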
For 6t-byte keys (2 ≤ t ≤ 4), the round sub-keys can be calculated in either direction at the cost of one circular permutation, 2 + 6(t−2) XORs and one computation of T0 and T1. Also, T0 and T1 can either be implemented using two 256-byte tables or calculated on the fly, taking 1 XOR + 2 shifts and 2 XORs + 2 shifts, respectively. In fact, the circular permutation need not be effectively implemented: the same effect can be achieved if the index corresponding to the most significant byte of the key is stored and used as the first byte of the key in every calculation; this way, it suffices to update this index after each invocation of the ∂ and ∂^{-1} operations.

Reviewing the preliminary Curupira cost estimates [6, section 5.1], the cost of its round function is 3R−1 XORs, 2(R−1)/3 xtimes operations and R S-box lookups per byte. When the key-schedule and key selection are taken into account, we add at most 1/3 S-box lookup, 1/3 ctimes operation and 2 XORs per key byte and per round in Curupira-1, while this cost drops to at most 5/12 S-box lookups, 5/8 XORs and 1/12 computations of T0 and T1 per key byte and per round in Curupira-2. In comparison, Skipjack takes basically 48 XORs and 16 F-table lookups per encrypted byte. Thus, assuming that the costs of these basic operations are approximately the same, and not counting auxiliary operations not directly related to the ciphers' structures (such as counter increments and key index updates), Curupira-1 with 96-bit keys and 10 rounds costs about (45 + 27)/64 ≈ 112.5% of Skipjack's computation when the round keys are computed on demand. On the other hand, Curupira-2 with the same key size corresponds to (45 + 7.5)/64 ≈ 82% of Skipjack's computation under the same conditions. This result should be expected for similar implementations of both ciphers on byte-oriented platforms.

Furthermore, more powerful processors (32-bit servers, for example) can implement the cipher transforms more efficiently by operating over columns instead of bytes. Also, the multiplication by x^8 can easily be implemented using a single table that yields T0 and T1 at the same time, an approach similar to those adopted in some highly optimized versions of the AES [14].

5 Benchmark

In this section we present the results of our comparison between Curupira, Skipjack and AES in terms of processing time and memory usage. As discussed in Section 1, the motivation behind the choice of Skipjack resides in the results presented in [35] and [21], which show that, in spite of its low security level, the cipher is a very interesting choice for achieving high performance on constrained platforms, surpassing many other hardware-oriented ciphers like MISTY1 and KASUMI [1]. AES, on the other hand, provides higher security but is recommended for less constrained platforms, since it is a less memory-efficient cipher. Considering these remarks, we decided to develop a deeper comparative analysis of Skipjack and Curupira on both constrained and powerful platforms, while AES is taken into account only on powerful ones.

Three different platforms were chosen as testbeds:

- Microcontroller (8 bits): a RISC microcontroller PIC18F8490 [29] equipped with an 8 MHz processor, 768 bytes of RAM and a memory size of 16 KB for code. The reason behind this choice resides in its capacity, slightly superior to that of the ATmega8535 [2]. This latter device, with a 4 MHz processor, 8 KB of flash memory and 512 bytes of RAM, is the one used in the Smart Dust project [17] for sensor networks.
- Microcontroller simulator (8 bits): Avrora version 1.6.0 Beta [36], simulating a microcontroller from the ATmega128 [3] series. The goal of using this simulator is mainly to validate the results obtained with the PIC processor on a more powerful, yet still tiny, platform.
- Pentium 4 (32 bits): a notebook equipped with a Pentium 4 (3.2 GHz) and 1 GB of RAM. This platform was chosen to evaluate the proposed optimizations of the cipher when the resources of the target platform are abundant.

5.1 Implementation Characteristics

For the 8-bit versions of Curupira and Skipjack, the same C implementations were analyzed on both the PIC18F8490 and the Avrora simulator. Furthermore, they adopt similar interfaces on each of these platforms, in order to assure a fair comparison.
The implementations are also more speed-oriented than memory-oriented, since the energy consumed by processing is proportional to the number of operations performed by the algorithm, and energy is normally considered the most critical resource on constrained platforms, particularly in WSNs.

For the implementation running on Avrora, as recommended by its documentation, we adopted avr-objdump and avr-gcc (both GNU utilities) as compilation tools, while MPLAB IDE v7.60 and the MPLAB C18 compiler were used with the PIC microcontroller. The speed-optimized version of each cipher, resulting from the available compiler optimizations, is the one considered in this document. It is important to note that, even though both platforms include indirect addressing in their instruction sets, our tests showed that the compilers were not able to take full advantage of these instructions, resulting in less than optimal machine code when pointers and/or matrices were used. That is why we decided to evaluate two different programming techniques: one that uses pointers and matrices, and another that uses basic-type variables more intensively, avoiding indirect addressing. While the first approach normally results in more flexible code (where, for example, the size of the keys can be changed more easily), the second allows more optimized implementations with fixed-size keys (enabling loop unrolling with little loss of compactness).

The implementations running on the 8-bit platforms are detailed below:

CURUPIRA (8 bits): using the proposed optimizations for constrained platforms, we selected two versions of Curupira for each scheduling algorithm:
1. Curupira_c-1: complete version (meaning that it accepts all key sizes) of Curupira-1. It requires two 256-byte tables, one for the S-box and another for the ctimes operation, and uses many pointers and matrices.
2. Curupira_c-2: complete version of Curupira-2, using two 256-byte tables for the S-box and xtimes operations. Like Curupira_c-1, it is also based on indirect-addressing instructions.
3. Curupira_k96-1: Curupira-1 restricted to 96-bit keys. It uses the same tables as Curupira_c-1, but relies on basic types instead of indirect-addressing instructions.
4. Curupira_k96-2: Curupira-2 restricted to 96-bit keys, using the same tables as Curupira_c-2 but relying on basic-type variables.

Skipjack (8 bits): two versions were developed according to the specification:
1. Skipjack_c: relies on indirect instructions, just like Curupira_c-1 and Curupira_c-2, providing a useful source of comparison with these ciphers. It uses a single 256-byte F-table and calculates the round keys on demand.
2. Skipjack_k: adopts programming strategies similar to those of the Curupira_k96 versions, relying strongly on operations over basic-type variables instead of matrices and pointers. Like Skipjack_c, it also uses a 256-byte F-table and calculates the round keys on the fly.

For the 32-bit platform, we decided to take advantage of some highly optimized cipher implementations that are publicly available. The chosen algorithms, written in Java, were compiled and run on the NetBeans IDE 5.5, using JDK 1.6. The details of the implementations are given below:

AES (32 bits): we adopted the implementation by Barreto [5], which pre-computes the round keys and employs ten 256-word tables to greatly accelerate the cipher operation.
CURUPIRA (32 bits): the optimizations in this implementation are similar to those present in the AES one, especially regarding the pre-computation of the round keys and the intensive use of tables, in the same number as for the AES.

Skipjack (32 bits): the cipher tested is a Java adaptation of Barreto's implementation [4], originally developed in the C language. It operates over 16-bit words and stores some important key-dependent pre-computed values in a 10×256-word matrix; this last operation can be seen as a kind of "key-schedule", since it must be performed each time the cipher key is changed.

5.2 Results: 8-bit platforms

The memory usage of the ciphers, for both 8-bit platforms, is presented in Table 1. This table shows that all tested versions of Curupira take more space in memory than Skipjack, an expected result considering the higher complexity of its round function and key-schedule algorithms. Despite this difference, the tested ciphers are all compact enough to be easily deployed on most constrained platforms, taking less than 3 KB as a whole.

Table 1. Memory occupation (in bytes) of the tested ciphers on the 8-bit platforms

    Algorithm        ROM    Code (PIC18F8490)    Code (Avrora simulator)
    Curupira_c-1     822    1444                 1648
    Curupira_c-2     512    1238                 1718
    Curupira_k96-1   768    1372                 1936
    Curupira_k96-2   512    1532                 1846
    Skipjack_c       256    1012                  940
    Skipjack_k       256     972                 1352

Both Curupira and Skipjack do not need to pre-compute the round keys and, thus, require a reduced amount of RAM. We were not able, however, to directly measure the RAM usage with the tools available for the tested platforms. Nevertheless, due to its larger block and key sizes, we expect Curupira to take a somewhat higher amount of RAM than Skipjack: for example, when using two arrays to store the first and the last keys, Curupira-2 would take about twice ((96 + 2·96)/(64 + 80)) the amount of RAM needed by Skipjack. These numbers remain very small when compared to the storage of pre-computed keys, though.

For the PIC microcontroller, Figure 4 shows the number of CPU cycles per encrypted byte for the tested ciphers. The time measured corresponds to a single encryption of random blocks using different keys. Even though the Curupira_c implementations allow three different key sizes, only the 96-bit keys are depicted in this graph. Also, we explicitly distinguish the processing time required for the key scheduling from that of the encryption itself (as Skipjack reuses the original key cyclically, we considered this part of the encryption rather than a "key-schedule").

Fig. 4. Comparison between the cipher encryption speeds on the PIC18F8490.

The figure shows that Curupira_c-1 and Curupira_c-2 are respectively about 20% and about 45% faster than Skipjack_c, with the round keys computed on demand. Despite this very positive result, it should be considered carefully, since the measured number of cycles includes not only the operations directly involved in the encryption process but also a non-negligible number of auxiliary ones. The influence of these secondary operations is less pronounced for both Skipjack_k and Curupira_k96 and, as depicted in the right side of Figure 4, Curupira_k96-2 is still about 18% faster than Skipjack, while Curupira_k96-1 is about 12% slower. One can see that the costs of both Curupira_k96 versions in this figure approximately match those theoretically calculated in Section 4.6.
Fig. 5. Comparison between the cipher encryption speeds on the Avrora Simulator

The results on the Avrora Simulator, depicted in Figure 5, were slightly different from those on the PIC18F8490. The speed of Skipjackk was considerably improved by this platform change, running about 30% and 4% faster than Curupirak96-1 and Curupirak96-2, respectively. A further analysis of the assembly code shows that these results were caused by the influence of the compiler, which was able to apply different optimizations to each algorithm. In contrast, this unexpected behavior was not observed with the Curupirac and Skipjackc implementations, which sustained the relative performances presented on the PIC18F8490.

5.3 Results: 32-bit platform

The encryption speed of each cipher on the 32-bit platform, with different key sizes (and, thus, numbers of rounds), is depicted in Figure 6. It is important to point out that, as all round keys are computed at cipher initialization, there is no difference between the encryption speeds of Curupira-1 and Curupira-2. We obtained similar results for AES and Curupira with the same number of rounds (note the additional graph entry where, for the sake of comparison, both ciphers were tested with the same number of rounds for each key size). This is an expected result, since both ciphers adopt well-known optimizations of the Wide Trail Strategy family.

Fig. 6. Encryption performance vs. number of rounds - 32 bits

On the other hand, the modest result of Skipjack (about 3 times slower than the other ciphers) may seem surprising at first sight, since its performance is usually the main argument for its adoption on constrained platforms. However, this can be explained by its 16-bit-oriented operations, very attractive on constrained processors but less suited to taking full advantage of the wider words available on powerful platforms. Both AES and Curupira, in contrast, can easily be implemented to operate over 32- and 24-bit words (columns), respectively; a sketch of this column-oriented, table-driven style is given at the end of this subsection.

The processing time involved in the ciphers' key schedules was also measured. As depicted in Figure 7, while Curupira and AES presented similar speeds, Skipjack was about 10 times slower. As this operation has to be performed only once (at initialization), the impact on the cipher's overall operation is reduced in scenarios where keys are not changed frequently.

Fig. 7. Key-schedule performance vs. number of rounds - 32 bits
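The column-oriented, table-driven style mentioned above can be summarized by the C fragment below. It is a generic sketch of the technique for an AES-like 32-bit column (Curupira works analogously with 24-bit columns), not the code of any benchmarked implementation; the table contents here are dummies. The idea is that the tables fold the S-box and the diffusion layer together, so one output column costs only four lookups and a few XORs.

#include <stdint.h>
#include <stdio.h>

static uint32_t T0[256], T1[256], T2[256], T3[256];

/* Dummy table fill -- a real cipher derives these from its S-box and its
   diffusion matrix, one table per byte position within the column. */
static void init_tables(void) {
    for (uint32_t b = 0; b < 256; b++) {
        T0[b] = b << 24; T1[b] = b << 16; T2[b] = b << 8; T3[b] = b;
    }
}

/* One output column of one round: extract one byte from each input
   column, look it up, and XOR the contributions with the round-key word. */
static uint32_t round_column(uint32_t c0, uint32_t c1,
                             uint32_t c2, uint32_t c3, uint32_t rk) {
    return T0[(c0 >> 24) & 0xFF]
         ^ T1[(c1 >> 16) & 0xFF]
         ^ T2[(c2 >>  8) & 0xFF]
         ^ T3[ c3        & 0xFF]
         ^ rk;
}

int main(void) {
    init_tables();
    /* with the dummy tables the "round" just regathers one byte per input */
    printf("%08x\n", round_column(0x11000000, 0x00220000,
                                  0x00003300, 0x00000044, 0)); /* 11223344 */
    return 0;
}

On a 32-bit CPU each lookup and XOR handles a full word, which is why the table-driven AES and Curupira pull ahead of the 16-bit-oriented Skipjack in this setting.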
6 Conclusions

We have described a new and faster key-schedule proposal for the Curupira block cipher. As a drawback, when compared to the original specification, it has a lower level of security against related-key attacks. However, according to our security analysis, both versions of the full cipher are secure against cryptanalysis. We also presented a benchmark comparing Curupira, Skipjack and AES in terms of performance and memory occupation, both on constrained and powerful platforms. According to the results obtained, the proposed block cipher is fast and compact, especially when using the new key schedule presented in this paper. While Skipjack is considered a good candidate for constrained scenarios, such as LBAs dependent on sensors and low-power mobile devices, Curupira is a suitable alternative that increases the security level and potentially improves performance, with a reduced impact in terms of memory usage. Also, when more powerful platforms are available, the several optimizations allowed by Curupira can be deployed to obtain an even higher performance in the whole network. Altogether, these results show that the Curupira block cipher is a useful solution for providing data encryption at low cost, being recommended for resource-constrained devices and for applications based on such platforms, such as WSNs and LBAs.

6.1 Future and Ongoing Work

We are currently working on the deployment of Curupira on a real WSN in order to evaluate its impact (especially in terms of energy consumption) in some significant scenarios. We are also developing a new MAC algorithm named Marvin, designed to provide a low-cost authenticated-encryption scheme for WSNs, particularly when used in conjunction with Curupira.

6.2 Acknowledgments

We would like to thank Richard Gold for his useful comments and his review of this paper.

References

1. 3GPP. Specification of the 3GPP confidentiality and integrity algorithms, document 2: KASUMI specification. Technical report, 3GPP, 1999.
2. Atmel. AVR 8-Bit RISC processor - ATmega8535 (90LS8535), 2006.
3. Atmel. AVR 8-Bit RISC processor - ATmega128 and ATmega128L, 2007.
4. P. Barreto. The Skipjack block cipher - 32-bit implementation. http://planeta.terra.com.br/informatica/paulobarreto/skipjack32.zip, 1998.
5. P. Barreto. The AES block cipher (Rijndael) - 32-bit implementation. http://planeta.terra.com.br/informatica/paulobarreto/JEAX.zip, 2003.
6. P. Barreto and M. Simplicio. Curupira, a block cipher for constrained platforms. In Anais do 25º Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos - SBRC 2007, volume 1, pages 61–74. SBC, 2007.
7. P. S. L. M. Barreto and V. Rijmen. The Anubis block cipher. In First Open NESSIE Workshop, Leuven, Belgium, November 2000. NESSIE Consortium.
8. P. S. L. M. Barreto and V. Rijmen. The Khazad legacy-level block cipher. In First Open NESSIE Workshop, Leuven, Belgium, November 2000. NESSIE Consortium.
9. E. Biham, A. Biryukov, and A. Shamir. Cryptanalysis of Skipjack reduced to 31 rounds using impossible differentials. In Advances in Cryptology - Eurocrypt'99, volume 1592 of Lecture Notes in Computer Science, pages 55–64. Springer, 1999.
10. M. Ciet, G. Piret, and J. Quisquater. Related-key and slide attacks: Analysis, connections, and improvements - extended abstract. In 23rd Symposium on Information Theory in the Benelux, Louvain-la-Neuve, Belgium, pages 315–325, 2002.
11. J. Daemen. Cipher and hash function design strategies based on linear and differential cryptanalysis. Doctoral dissertation, Katholieke Universiteit Leuven, March 1995.
12. J. Daemen and V. Rijmen. The block cipher BKSQ. In Smart Card Research and Applications - CARDIS'98, volume 1820 of Lecture Notes in Computer Science, pages 236–245. Springer, 1998.
13. N. Ferguson, J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner, and D. Whiting. Improved cryptanalysis of Rijndael. In Fast Software Encryption - FSE'2000, volume 1978 of Lecture Notes in Computer Science, pages 213–230. Springer, 2000.
14. B. R. Gladman. AES second round implementation experience. http://fp.gladman.plus.com/cryptography_technology/aesr2/index.htm, 2000.
15. G. Guimaraes, E. Souto, D. Sadok, and J. Kelner. Evaluation of security mechanisms in wireless sensor networks. In ICW'05: Proceedings of the 2005 Systems Communications, pages 428–433. IEEE Computer Society, 2005.
16. M. Healy, T. Newe, and E. Lewis.
Efficiently securing data on a wireless sensor network. Journal of Physics, 76, 2007.
17. J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister. System architecture directions for networked sensors. In Architectural Support for Programming Languages and Operating Systems, pages 93–104, 2000.
19. C. Karlof, N. Sastry, and D. Wagner. TinySec: a link layer security architecture for wireless sensor networks. In 2nd International Conference on Embedded Networked Sensor Systems - SenSys'2004, pages 162–175, Baltimore, USA, 2004. ACM.
20. P. Kocher, J. Jaffe, and B. Jun. Introduction to differential power analysis and related attacks. Technical report, Cryptography Research Inc., 1998.
21. Y. W. Law, J. Doumen, and P. Hartel. Survey and benchmark of block ciphers for wireless sensor networks. ACM Transactions on Sensor Networks (TOSN), 2(1):65–93, 2006.
22. P. Levis and D. Culler. Maté: a tiny virtual machine for sensor networks. In ASPLOS-X: Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 85–95. ACM, 2002.
23. T. Li, H. Wu, X. Wang, and F. Bao. SenSec design. Technical report, InfoComm Security Department, February 2005.
24. D. Liu, P. Ning, and R. Li. Establishing pairwise keys in distributed sensor networks. In CCS'03: Proceedings of the 10th ACM Conference on Computer and Communications Security, pages 52–61. ACM, 2003.
25. M. Luk, G. Mezzour, A. Perrig, and V. Gligor. MiniSec: A secure sensor network communication architecture. In IPSN'07: Proceedings of the 6th International Conference on Information Processing in Sensor Networks, pages 479–488, New York, NY, USA, 2007. ACM.
26. F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes, volume 16. North-Holland Mathematical Library, 1977.
27. M. Matsui. New block encryption algorithm MISTY. In Fast Software Encryption - FSE'97, volume 1267 of Lecture Notes in Computer Science, pages 54–68. Springer, 1997.
28. A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of Applied Cryptography. CRC Press, Boca Raton, USA, 1999.
29. Microchip. PIC18F8490 Datasheet, 2006.
30. R. Muller, G. Alonso, and D. Kossmann. SwissQM: Next generation data processing in sensor networks. In CIDR, pages 1–9, 2007.
31. J. Nakahara. Analysis of the Curupira block cipher. In Anais do 8º Simpósio Brasileiro em Segurança da Informação e Sistemas Computacionais, 2008.
32. NIST. Federal Information Processing Standard (FIPS 197) - Advanced Encryption Standard (AES). National Institute of Standards and Technology, November 2001. http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf.
33. NSA. Skipjack and KEA Algorithm Specifications, version 2.0. National Security Agency, May 1998.
34. R. L. Rivest. The RC5 encryption algorithm. In Fast Software Encryption - FSE'94, volume 1008 of Lecture Notes in Computer Science, pages 86–96. Springer, 1995.
35. R. Roman, C. Alcaraz, and J. Lopez. A survey of cryptographic primitives and implementations for hardware-constrained sensor network nodes. Mobile Networks and Applications, 12(4):231–244, 2007.
36. B. Titzer, D. Lee, and J. Palsberg. Avrora: scalable simulation of sensor networks with precise timing.
Center for Embedded Network Sensing, Posters, Paper 93, 2004.

The name. According to a Brazilian legend, the Curupira is a spirit of nature and protector of the forests. It takes the form of a boy with red hair whose feet are turned backwards: when hunters think they are on the right trail to catch it, they are in fact walking in the wrong direction, getting confused and lost. This should also work against cryptanalysts :-).

Author Index

Ardagna, Claudio Agostino ..... 82
Barreto, Paulo ..... 123
Bettini, Claudio ..... 24, 61
Carvalho, Tereza ..... 123
Domingo-Ferrer, Josep ..... 6, 12
Freni, Dario ..... 61
Jajodia, Sushil ..... 61, 82
Margi, Cintia ..... 123
Martínez-Ballesté, Antoni ..... 12
Martin, Rhandi ..... 82
Mascetti, Sergio ..... 61
Monreale, Anna ..... 44
Näslund, Mats ..... 123
Pareschi, Linda ..... 24
Pedreschi, Dino ..... 44
Pensa, Ruggero G. ..... 44
Pinelli, Fabio ..... 44
Riboni, Daniele ..... 24
Sadeghi, Ahmad-Reza ..... 102
Samarati, Pierangela ..... 82
Simplício, Marcos ..... 123
Solanas, Agusti ..... 12
Stavrou, Angelos ..... 82
Visconti, Ivan ..... 102
Wachsmann, Christian ..... 102
Wang, Sean ..... 61
Wu, Qianhong ..... 6