How to Balance Privacy and Money through Pricing Mechanism
                   in Personal Data Market
                   Rachana Nget                                                 Yang Cao                           Masatoshi Yoshikawa
              Kyoto University                                            Emory University                             Kyoto University
                Kyoto, Japan                                            Atlanta, Georgia, USA                            Kyoto, Japan
     rachana.nget@db.soc.i.kyoto-u.ac.jp                                 ycao31@emory.edu                         yoshikawa@i.kyoto-u.ac.jp

ABSTRACT

In the big data era, personal data is perceived as a new oil or currency in the digital world. Both public and private sectors wish to use such data for studies and businesses. However, access to such data is restricted due to privacy issues. Seeing the commercial opportunities in the gaps between demand and supply, the notion of a personal data market has been introduced. While there are several challenges associated with rendering such a market operational, we focus on two main technical challenges: (1) How should personal data be fairly traded under an e-commerce-like platform? (2) How much should personal data be worth in trade?

In this paper, we propose a practical personal data trading framework that strikes a balance between money and privacy. To acquire insight into user preferences, we first conduct an online survey on human attitudes toward privacy and interest in personal data trading. Second, we identify five key principles of personal data trading that are central to designing a reasonable trading framework and pricing mechanism. Third, we propose a reasonable trading framework for personal data, which provides an overview of how data are traded. Fourth, we propose a balanced pricing mechanism that computes the query price and perturbed results for data buyers and the compensation for data owners (whose data are used) as a function of their privacy loss. Finally, we conduct an experiment on our balanced pricing mechanism, and the result shows that it performs significantly better than the baseline mechanism.

CCS CONCEPTS

• Security and privacy → Economics of security and privacy; Usability in security and privacy;

KEYWORDS

Query pricing; Personalized Differential Privacy; Personal data market

Copyright © 2017 by the paper's authors. Copying permitted for private and academic purposes.
In: J. Degenhardt, S. Kallumadi, M. de Rijke, L. Si, A. Trotman, Y. Xu (eds.): Proceedings of the SIGIR 2017 eCom workshop, August 2017, Tokyo, Japan, published at http://ceur-ws.org

1 INTRODUCTION

Personal data is now perceived as a new oil or currency in the digital world. A massive volume of personal data is constantly produced and collected every second (i.e., via smart devices, search engines, sensors, social network services, etc.). These personal data are extraordinarily valuable for the public and private sectors to improve their products or services. However, personal data reflect the unique value and identity of each individual; therefore, access to personal data is highly restricted. For this reason, some large Internet companies and social network services provide free services in exchange for their users' personal data. Demand for personal data for research and business purposes increases excessively while there is practically no safe and efficient supply of personal data. Seeing the commercial opportunities rooted in the gaps between demand and supply, the notion of a personal data market has been introduced. This notion has transformed perceptions of personal data from an undisclosed type to a commodity, as noted in [4] and [11]. To perceive personal data as a commodity, many scholars, such as [6], [12], [13], and [14], have asserted that monetary compensation should be given to the real data producers/owners for their privacy loss whenever their data are accessed. Thus, personal data could be traded in the form of e-commerce, where buying, selling, and financial transactions are done online. However, this type of commodity might be associated with private attributes, so it should not be classified as one of the three conventional types of e-commerce goods (i.e., physical goods, digital goods, and services, as noted in [9]). This privacy attribute introduces a number of challenges and requires a different trading approach for this commodity called personal data. How much money should data buyers pay, and how much money should data owners require for the privacy loss from information derived from their personal data? One possible way is to assign the price in correspondence to the amount of privacy loss, but how to quantify privacy loss and how much money should be compensated for a metric of privacy loss are the radical challenges in this market.

1.1 Personal Data Market

The personal data market is a sound platform for securing personal data trading. What is traded, as defined in [12], is a noisy version of statistical data. It is an aggregated query answer, derived from users' personal data, with some random noise included to guarantee the privacy of data owners. The injection of random noise is referred to as perturbation. The magnitude of perturbation directly impacts the query price and the amount of data owners' privacy loss. A higher query price typically yields a lower degree of perturbation (less noise injection).

In observing the published results of true statistical data, an adversary with some background knowledge (i.e., sex, birth date, zip code, etc.) on an individual in the dataset can perform linkage attacks to identify whether that person is included in the results. For instance, published anonymized medical encounter data were once matched with voter registration records (i.e., birth date, sex,


zip code, etc.) to identify the medical records of the governor of Massachusetts, as explained in [3]. Therefore, statistical results should be subjected to perturbation prior to publication to guarantee an absence of data linkages.

As is shown in Figure 1, three main participants are involved: data owners, data seekers/buyers, and the market maker. Data owners contribute their personal data and receive appropriate monetary compensation. Data buyers pay a certain amount of money to obtain their desired noisy statistical data. The market maker is a trusted mediator between the two key players, as no direct trading occurs between the two parties. The market maker is entrusted to compute a query answer, calculate the query price for buyers and the compensation for owners, and, most importantly, design a variety of payment schemes for owners to choose from.

Figure 1: How much is personal data worth?

The personal data market could be considered as the integration of Consumer-to-Business (C2B) and Business-to-Consumer (B2C) or Business-to-Business (B2B) e-commerce. On one side of the trading, the data owners as individuals provide their personal data to the market as is done in C2B e-commerce, though, at this point, no trading is done. On the other end of the framework, the market maker sells statistical information to data buyers as individuals or companies, which is similar to B2C and B2B trading. This is when the trading transactions are completed in this framework. The study of such a market framework could initiate a new perception of new forms of e-commerce.

The existence of a personal data market will make an abundance of personal data, including sensitive but useful data, safely available for various uses, giving rise to many sophisticated developments and innovations. For this reason, several start-up companies have developed online personal data trading sites and mobile applications following this market orientation. These sites are Personal¹ and Datacoup², which aim at creating personal data vaults. They buy the raw personal data from each data owner and compensate them accordingly. However, some data owners are not convinced to sell their raw data (without perturbation). For Datacoup, payment is fixed at approximately $8 for SNS and financial data (i.e., credit/debit card transactions). It is questionable whether $8 is reasonable compensation and how this price was decided. Another source of inefficiency is related to the absence of data buyers. This can create problems if buyers are not interested in such types of collected data. In addition, CitizenMe and digi.me recently launched personal data collection mobile applications that help data owners collect and store all of their personal data in their devices. Although the framework connects buyers to data owners, it might be inefficient and impractical for buyers to buy individual raw data one at a time. Moreover, as no pricing mechanism is offered, data owners and buyers must negotiate the prices on their own, which may not be efficient because not all data owners know or truthfully report the price of their data. This can result in an obstruction of trading operations. Based on lessons learned from such start-ups, we can conclude that what they are missing is a well-designed trading framework, which explains the principles of trading, and a pricing mechanism, which balances the money and privacy traded in the market.

¹ www.personal.com
² www.datacoup.com

To make this market operational, there are many challenges from all disciplines, but we narrow down the fundamental technical challenges to two factors:

• Trading framework for personal data: How should personal data be fairly traded? In other words, how should a reasonable trading framework be designed to respectively prevent circumvention from buyers on arbitrage pricing and from data owners on untruthful privacy valuation?
• Balanced pricing mechanism: How much should personal data be worth? How should a price that balances data owners' privacy loss and buyers' payment be computed? This balance is crucial in convincing data owners and data buyers to participate in the personal data market.

1.2 Contribution

To address the above challenges more precisely, we first conducted a survey on human attitudes toward privacy and interest in personal data trading (Section 2). Second, from our survey analysis and from previous studies, we identify five key principles of personal data trading (Section 3.1). Third, we propose a reasonable trading framework (Section 3.2) that provides an overview of how data are traded and of the transactions made before, during, and after trade occurs. Fourth, we propose a balanced pricing mechanism (Section 4) that computes the price of a noisy aggregated query answer and calculates the amount of compensation given to each data owner (whose data are used) based on his or her actual privacy loss. The main goal is to balance the benefits and expenses of both data owners and buyers. This issue has not been addressed in previous research. For instance, a theoretical pricing mechanism [12] was designed in favor of data buyers only. Their mechanism empowers the buyer to determine the privacy loss of data owners while assuming that data owners can accept an infinite privacy loss. Instead, our mechanism will empower both data owners and buyers to fully control their own benefits and expenses. Finally, we conduct an experiment on a survey dataset to simulate the results of our mechanism and prove its efficiency relative to a baseline pricing mechanism (Section 5).

2 SURVEY RESULT

To develop deeper insight into personal data trading and to collect data for our experiment, we conducted an online survey delivered through a crowdsourcing platform. In total, 486 respondents from 46 different states throughout the USA took part in the survey. The respondents were aged from 14 to older than 54 and had varying education backgrounds, occupations, and incomes. For our survey, respondents were required to answer 11 questions. Due to space limitations, we only discuss the more significant questions posed.

Analysis 1: For four types of personal data: Type 1 (commute type to school/work), Type 2 (yearly income), Type 3 (yearly expense


on medical care), Type 4 (bank service you're using), the following results were obtained.

(a) Can sell vs. cannot sell. (b) How much to sell.
Figure 2: Types of data to sell/not to sell.

More than 50% of the respondents said they cannot sell the data (see Figure 2a), and more than 50% of those who can sell said that they do not know how much to sell it for (see Figure 2b).

Most of the participants stated that they do not know how much their data are worth, highlighting one of the above-mentioned challenges related to the personal data market. Similarly, [1] noted that it is very difficult for data owners to articulate the exact valuation of their data.

Analysis 2: When asked to sell their anonymized personal data, 49% of respondents said it depends on the type of personal data and amount of money, 35% were not interested, and 16% were interested (see Figure 3a). However, if provided more privacy protection by both anonymizing and altering (perturbing) the real data, more than 50% of the respondents became interested in selling, meaning that more people are convinced to sell their data under such conditions (see Figure 3b).

(a) Interest in selling anonymized data. (b) Interest in selling both anonymized and altered data.
Figure 3: Interest in selling personal data.

Anonymization does not convince people to sell their personal data. Providing extra privacy protection via data alteration or perturbation on the anonymized data might make them feel more convinced and safer about selling their data.

Analysis 3: With regard to alteration/perturbation, the respondents were asked to select their preferred privacy level: {very low, low, high, very high}; in other words, how much they want to alter/perturb their real data. A very low level of alteration (low noise injection) denotes low privacy protection but more monetary compensation. As a result (see Figure 4a), alteration levels were found to vary across the four types of data. Similarly, the preferred payment schemes (see Figure 4b) varied throughout all the data types. A human-centric study [18] also showed that people value different categories of data differently according to their behaviors and intentional levels of self-disclosure; as a result, location data are valued more highly than communication, app, and media data.

(a) Alteration levels on data. (b) Payment schemes.
Figure 4: Preferences in privacy and money.

Privacy protection levels and desired payment schemes varied between the types of data considered and among the respondents. In practice, people harbor different attitudes toward privacy and money. Thus, it is crucial to allow a personalized privacy level and payment scheme for each individual.

Analysis 4: Among the four given criteria for deciding when selling personal data: usage (who and how buyers will use your data), sensitivity (sensitivity of data, i.e., salary, disease, etc.), risks (future risks/impacts), and money (to obtain as much money as possible), the participants valued, in descending order: who and how the data will be used, sensitivity, future risks/impacts, and money (see Figure 5).

Figure 5: Importance of criteria when selling personal data.

Money is considered the least important criterion, while who and how data will be used is considered the most important one when deciding to sell personal data. This implies that money cannot buy everything when the seller does not want to sell.

3 TRADING FRAMEWORK

All notations used in this study are summarized in Table 1.

3.1 Key Principles of the Trading Framework

To design a reasonable trading framework and a balanced pricing mechanism, it is important to determine the chief principles of the personal data trading framework. These key principles are derived from previous studies and from the four key analyses of our survey. The principles are categorized into five different groups: personalized differential privacy as a privacy protection, applicable query type, arbitrage-free pricing model, truthful privacy valuation, and unbiased result. To guarantee the data owner's privacy, personalized differential privacy injects some randomness into the result based on the preferred privacy level. It is also used as a metric to quantify the privacy loss of each data owner. With this personalized differential privacy guarantee, only certain linear aggregated query types are applicable in this trading framework. Regarding pricing, a pricing model should be arbitrage-free and must not allow any circumvention of the query price by savvy buyers. Similarly, such a framework should be designed to encourage data
Table 1: Summary of notations.

Notation    Description
ui, bj      Data owner i, data buyer j
xi          Data element of ui
ε̂i          Maximum tolerable privacy loss of ui
wi          Payment scheme of ui
εi          Actual privacy loss of ui in query computation
wi(εi)      Compensation to ui for losing εi
x           Dataset consisting of a data element of every ui
Q           Linear aggregated query requested by the buyer
Wmax        Maximum budget of the buyer
Wp, Wr      Query price, remaining budget of the buyer
Q(x)        True query answer
P(Q(x))     Perturbed query answer (with noise)
RMSE        Root mean squared error
χ           Market maker's profit
Wab         Available budget for query computation
RS          A representative sample of dataset x
h           Number of representative samples RS
Φ           Number of perturbation run times

owners' truthful privacy valuation by providing them with the right pricing scheme so that they will not benefit from any untruthful valuation. Finally, it is important to ensure the generation of an unbiased/less biased query result without increasing the query price, so a careful sample selection method is crucial.

A. Personalized Differential Privacy as a Privacy Protection

The pricing mechanism should be capable of preserving the data owner's privacy from any undesirable privacy leakage. To ensure privacy, differential privacy [3] plays an essential role in guaranteeing that an adversary could learn nothing about an individual while learning useful information about the whole population/dataset from observing the query result (despite some background knowledge about that individual). Given a privacy parameter ε, a private mechanism (e.g., the Laplace mechanism, the Exponential mechanism, etc.) satisfies ε-differential privacy if the same result is likely to occur, as a consequence of random noise addition, regardless of the presence or absence of any individual in the dataset. A smaller ε offers better privacy protection but less accuracy, resulting in a tradeoff between privacy and result accuracy. In our framework, we define ε as the quantification of a data owner's privacy loss, as ε and money are correlated.

Definition 3.1 (ε-Differential Privacy [3]). A random algorithm M : D → R satisfies ε-Differential Privacy (ε-DP) if, for neighboring datasets x, y ∈ D, where D is a whole dataset and datasets x and y differ by only one record, and for any set S ⊆ Range(M),

    Pr(M(x) ∈ S) ≤ exp(ε) ∗ Pr(M(y) ∈ S)    (1)

In differential privacy (DP), privacy protection is at the tuple level, which means that all users included in the dataset have the same privacy protection/loss value ε (one for all). However, in practice, individuals may have different privacy attitudes, as illustrated in our survey results, so allowing privacy personalization is considered critical, especially in the trading setting. We thus adopt the personalized differential privacy (PDP) theory of [8], which is derived from the above differential privacy. Each user can personalize his or her maximum tolerable privacy level/loss ε̂i, so any private mechanism that satisfies ε̂i-differential privacy must guarantee each user's privacy up to their ε̂i. Users may set ε̂i according to their privacy attitude, with the assumption that ε̂i is public and is not correlated with the sensitivity of the data. This theory thus allows users' privacy personalization while offering more utility to data buyers.

Definition 3.2 (Personalized Differential Privacy [8]). Regarding the maximum tolerable privacy loss ε̂ of each user and a universe of users U, a randomized mechanism M : D → R satisfies ε̂-Personalized Differential Privacy (or ε̂-PDP) if, for every pair of neighboring datasets x, y ∈ D, where x and y differ in the data of user i, and for any set S ⊆ Range(M),

    Pr(M(x) ∈ S) ≤ exp(ε̂i) ∗ Pr(M(y) ∈ S)    (2)

Both DP and PDP are theories, so a private mechanism is employed to realize them. [8] introduced two PDP private mechanisms: the sampling and exponential-like mechanisms. Given a privacy threshold, the sampling mechanism samples a subset drawn from the dataset and then runs one of the private mechanisms (e.g., the Laplace mechanism). The exponential-like mechanism, given a set of ε̂, computes a score (probability) for each potential element in the output domain. This score is inversely related to the number of changes made in a dataset x required for a potential value to become the true answer.

Definition 3.3 (Score Function [8]). Given a function f : D → R that outputs r ∈ Range(f) with probability proportional to that of the exponential mechanism of differential privacy [3], s(D, r) is a real-valued score function. The higher the score, the better r is relative to f(D). Assuming that D and D′ differ only in the value of a tuple, denoted as D ⊕ D′,

    s(D, r) = max_{f(D′)=r} −|D ⊕ D′|    (3)

In PDP, each record or data owner has their own privacy setting ε̂i, so it is important to distinguish between the different D′ that make a specific value become the output. To formalize this mechanism, [8] defined it as follows.

Definition 3.4 (PE Mechanism [8]). Given a function f : D → R, an arbitrary input dataset D ⊂ D, and a privacy specification ϕ, the mechanism PE_ϕ^f(D) outputs r ∈ R with probability

    Pr[PE_ϕ^f(D) = r] = exp((1/2) d_f(D, r, ϕ)) / Σ_{q ∈ R} exp((1/2) d_f(D, q, ϕ))    (4)

where d_f(D, r, ϕ) = max_{f(D′)=r} Σ_{i ∈ D ⊕ D′} (−ϕ_{ui}).

In our framework, ϕ refers to the set of maximum tolerable privacy losses ε̂i of all data owners in the dataset x. We apply this PE mechanism to guarantee that each data owner's privacy is protected despite data owners having different privacy requirements. The proof of this mechanism can be found in [8].
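To make the exponential-like mechanism of Definition 3.4 concrete, the following is a minimal sketch (our illustration, not code from [8]) for the special case of a counting query f(D) = Σ xi over binary data. The function name `pe_count` and the greedy computation of d_f are our own: for a count, moving the answer from the true count c to a candidate r requires flipping exactly |c − r| records, so the cheapest edit set takes the owners with the smallest ε̂i from the appropriate group.

```python
import math
import random

def pe_count(data, eps_hat, rng=random):
    """Exponential-like PDP mechanism (a sketch of Definition 3.4)
    for a counting query f(D) = sum(D) over binary data.

    data:    list of 0/1 values, one per data owner
    eps_hat: each owner's maximum tolerable privacy loss ε̂_i
             (the privacy specification ϕ)
    Returns r sampled with Pr[r] proportional to exp(d_f(D, r, ϕ)/2),
    where d_f is the negated minimum total privacy cost of editing
    the dataset until the count becomes r.
    """
    n, true_count = len(data), sum(data)
    # ε̂ of owners holding 1s and 0s, cheapest first: flipping the
    # cheapest owners minimizes the total privacy cost of reaching r.
    ones = sorted(e for v, e in zip(data, eps_hat) if v == 1)
    zeros = sorted(e for v, e in zip(data, eps_hat) if v == 0)
    weights = []
    for r in range(n + 1):
        if r >= true_count:                    # flip (r - c) zeros to 1
            cost = sum(zeros[: r - true_count])
        else:                                  # flip (c - r) ones to 0
            cost = sum(ones[: true_count - r])
        weights.append(math.exp(-cost / 2))    # d_f(D, r, ϕ) = -cost
    return rng.choices(range(n + 1), weights=weights)[0]
```

The true count has edit cost 0 and hence the largest weight, while outputs reachable only by editing strict (small-ε̂i) owners keep relatively high probability as well; this is what hides those owners' values despite heterogeneous privacy requirements.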


B. Applicable Query Type
With background knowledge, the adversary may engage in linkage attacks on the published query answer and may eventually identify an individual from this answer. Therefore, any query answered in this trading framework should guarantee that its result does not reveal whether or not an individual contributed to the answer. DP or PDP can prevent data linkage attacks on the published results of statistical/linear aggregated queries by introducing randomness. For these reasons, only statistical/linear aggregated queries should be allowed in the trading framework when privacy is guaranteed by DP or PDP. [12] also adopted this query type in their proposed theoretical framework.

   Definition 3.5 (Linear Query [12]). A linear query is a real-valued vector q = (q1, q2, ..., qn). The computation of this query q on a fixed-size data vector x is the vector product q.x = q1.x1 + ... + qn.xn.

C. Arbitrage-free Pricing Model
Arbitrage-freeness is a requisite property used to combat circumvention of the query price by a savvy data buyer. For instance, suppose a perturbed query answer with a larger ε1 = 1 costs $10 and one with a smaller ε2 = 0.1 costs $0.1. A savvy buyer who seeks a perturbed query answer with ε = 1 can instead buy the ε2 = 0.1 answer 10 times and average the results, obtaining the same accuracy as ε1 = 1, because ε grows with the number of computations n as ε = n ∗ ε2. This case is explained by the composition theorems in [3]. The buyer therefore never has to pay $10 when averaging several cheap queries costs him/her only $1. In [12], the arbitrage-free property is defined as follows:

   Definition 3.6 (Arbitrage-free [12]). A pricing function π (Q) is arbitrage-free if, for every multiset S = {Q1, ..., Qm} such that Q can be determined from S, denoted S → Q,

                           π (Q) ≤ Σ_{i=1}^{m} π (Qi)                  (5)

   An explanation and discussion of query determinacy (S → Q) can be found in [12].
   Arbitrage-free pricing function: [12] proved that a pricing function π (Q) can be made equal to the sum of all payments made to data owners if the framework is balanced. A framework is balanced if (1) the pricing function π and the payment function to data owners are arbitrage-free, and (2) the query price is cost-recovering, which means that the query price is not less than what is needed to compensate all data owners. In our framework, we simply adopt their arbitrage-free property by ensuring that the query price Wq is always greater than the compensation given to all data owners (whose data are accessed) for their actual privacy loss εi.
   For simplicity, a buyer shall not be able to request the same query more than once: each data owner has his or her own εˆi, so we must guarantee that their privacy loss never exceeds their specified εˆi. Alternatively, the market maker can predefine the sets of queries that buyers can ask, so that the relationships between all queries can be studied in advance to prevent arbitrage problems from emerging. However, this also limits the choice of queries buyers can request, so our framework allows buyers to ask any linear aggregated query, but only once per query.

D. Truthful Privacy Valuation
Untruthful privacy valuation is an undesirable property leading to unacceptably high query prices. Without carefully designed payment schemes, some savvy data owners will always attempt to select whichever scheme provides them more benefit, so they may intentionally report an unreasonably high privacy valuation. For instance, [12] applied a linear payment scheme (wi = ci ∗ ε) and allowed each data owner to define ci. With the same ε, most data owners will simply set very high ci values to maximize their benefits.
   To encourage truthful privacy valuation, all data owners shall be provided with a suitable payment scheme corresponding to their privacy/risk attitudes, so that untruthful valuations do not increase their benefits, as illustrated in [2].

                  Figure 6: Payment Schemes.

   Proposition 3.7 (Payment Scheme). A payment scheme is a non-decreasing function w : ε → R+ representing a promise between the market maker and a data owner on how much the data owner should be compensated for his or her actual privacy loss εi. Any non-decreasing function can serve as a payment scheme. For instance,
   • Type A: This logarithmic function is designed to favor conservative (low-risk, low-return) data owners whose εˆ is small.

                     w = log(30) ∗ ln(9000ε + 1) / 130                 (6)

   • Type B: This sublinear function is designed to favor liberal (high-risk, high-return) data owners whose εˆ is large.

                     w = 8ε / √(1100 + 500ε²)                          (7)

   For our framework, we designed two different types of payment schemes, as illustrated in Figure 6. The data owner shall select a payment scheme based on his or her privacy εˆ or risk orientation. Therefore, there is no reason for data owners to untruthfully report their privacy valuation εˆ, because doing so would not provide them with any benefits. The market maker designs the pricing schemes, and the design guidelines should mainly depend on the equilibrium theory of supply and demand. In the present study, we only consider two types of functions to provide different options for conservative and liberal data owners. We will develop more sophisticated schemes in our future work.
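The two schemes of Proposition 3.7 translate directly into code. A small sketch, assuming the log(30) in Eq. 6 denotes the base-10 logarithm (the base is not stated in the text):

```python
import math

def w_type_a(eps):
    """Type A (Eq. 6): logarithmic scheme favoring conservative owners.
    Assumes log(30) is the base-10 logarithm."""
    return math.log10(30) * math.log(9000 * eps + 1) / 130

def w_type_b(eps):
    """Type B (Eq. 7): sublinear scheme favoring liberal owners."""
    return 8 * eps / math.sqrt(1100 + 500 * eps ** 2)
```

For a conservative εˆ = 0.1, Type A pays about 0.077 versus 0.024 under Type B, while at εˆ = 0.9 Type B pays about 0.186 versus 0.102 under Type A; matching each owner to the scheme that favors his or her εˆ removes the incentive to misreport.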
E. Unbiased Result
Besides ensuring privacy protection and price optimization, producing an unbiased result is a crucial factor in trading. Buyers do not want to obtain a result that is biased or significantly different from the true result, so it is important to ensure the generation of an unbiased result.
   In our setting, we guarantee the generation of an unbiased/less biased result by randomly selecting data owners, so that both liberal and conservative data owners are equally likely to be selected. Under the PDP assumption, a data owner's εˆi value is not correlated with the sensitivity of the data, so random selection best guarantees a less biased result.
   Moreover, to optimize the query price, it is necessary to select a representative sample from the dataset, because paying every individual data owner in the dataset (as in [12]) leads to very high query prices for the same level of data utility. Thus, sampling a good representative subset is very useful. We apply a statistical sampling method to compute the number of data owners required for a representative sample of a given dataset. A similar concept is employed in [2].
   A personal data trading framework should adopt these five key principles to avoid certain issues and to obtain more optimal results. However, a similar study [12] did not consider all of these key principles. First, data owners cannot personalize their privacy levels, as they are assumed to accept infinite losses when more money is paid. Moreover, their mechanism cannot efficiently reduce query prices, because a query is computed on the entire dataset, and data owners can easily report untruthful privacy valuations to maximize their payment given a linear payment scheme.

3.2 Personal Data Trading Framework
To balance data owners' privacy loss and the data buyer's payment to guarantee a fair trade, we propose a personal data trading framework (see Figure 7) that involves three main participants: the market maker, the data owner, and the data buyer.

       Figure 7: Trading framework for personal data.

   Market maker is a mediator between the data buyer and data owners with several coordinating roles. First, the market maker serves as a trusted server that answers the data buyer's query by accessing the data elements of data owners. Second, the market maker computes and distributes payments to data owners whose data have been accessed while keeping a small cut of the price as a profit χ. Third, the market maker devises payment schemes for data owners to choose from. Our pricing mechanism is designed to assist the market maker with these tasks.
   Data owner sells his/her data element xi by selecting a maximum tolerable privacy loss εˆi and a payment scheme wi. In DP, ε is a real non-negative value that is difficult to set so as to obtain an exact level of utility. However, [7] conducted a study on an economic method of setting ε. Thus, a good user interface is assumed to help data owners understand and determine their εˆi.
   Data buyer purchases an aggregated query answer from the market maker by specifying a query Q and a maximum budget Wmax. Rather than asking the buyer to specify the variance of the query answer, as in [12], we design our mechanism to obtain the most optimal result with the least noise/error within the given budget Wmax, since data buyers are highly unlikely to know which variance value to specify to obtain their desired utility within a limited budget. Thus, designing a mechanism to tackle this issue helps both buyers and the market maker.
   Our framework works as follows. Data owner ui(xi, εˆi, wi), i ∈ [1, n], sells his/her data element xi by demanding that the actual privacy loss εi must not be greater than the specified εˆi, while the payment should correspond to the selected payment scheme wi. These data elements are stored by a trusted market maker. In the pre-trading stage, the data buyer issues a purchase request by specifying his Q and Wmax. Given the request, the market maker runs a simulation and generates a price menu (see Table 2) with an average privacy loss ε and a sample size corresponding to each price. This price menu provides an overview of the approximate level of utility the buyer may receive for each price.

              Table 2: Example of a price menu.
          Price ($)   Average privacy loss ε   Sample size
              5               0.039                384
             50               0.545                384
            100               0.619                698

   The buyer reviews the ε values and determines the amount of money he is willing to pay. Once the market maker is notified of the purchase decision, he runs the pricing mechanism (described in Section 4) to select a number of representative samples RS from the dataset x and then conducts the query computation, perturbing the answer to ensure the privacy guarantee for all data owners whose data were accessed. Next, the market maker distributes the payments to the data owners in the selected sample RS and returns the perturbed query answer P(Q(x)), the remaining budget Wr, the size of RS, and the root mean squared error RMSE of the query answer. Note that the transaction aborts when the market maker cannot meet these requirements simultaneously.

4 PRICING MECHANISM
The pricing mechanism directs the price and query computations for data buyers and the compensation computation for data owners whose data have been accessed. A specially designed pricing mechanism is required in this personal data market because information derived from personal data, unlike physical goods, does not have any tangible properties. Thus, it is difficult to set a price or calculate the traded value, as asserted in [16]. Similarly, [1] and [15] discussed why some conventional pricing models (i.e., cost-based and competition-based pricing models) are not able to price digitalized goods such as data and information. As noted in [17], the only feasible pricing model is the value-based pricing model, through which the price is set based on the value that
the buyer perceives. In our framework, the utility of the query result determines the price, and this utility is significantly associated with each data owner's level of privacy loss.

4.1 Baseline Pricing Mechanism
To simply compute the query price, compensation, and perturbed query result, the baseline pricing mechanism does not involve a sampling procedure. It utilizes the entire dataset x in its computations to ensure the generation of an unbiased result. In addition, the baseline pricing mechanism implements a simple personalized differentially private mechanism known as the Minimum mechanism [8], which satisfies εˆi-PDP by injecting random noise X drawn from a Laplace distribution with scale b, denoted X ∼ Lap(b), where b = 1/Min(εˆ1, εˆ2, ..., εˆn). The computational run-time of this mechanism is much shorter than that of the more sophisticated balanced pricing mechanism, yet it generates a higher query price for a result with more noise. This mechanism does not consider a sophisticated allocation of compensation and perturbation: it compensates all data owners ui ∈ x for the same privacy loss εˆmin and satisfies all ui ∈ x with the minimum privacy loss εˆmin, resulting in very low utility (more noise). For a better result, we propose a balanced pricing mechanism that addresses the weak performance of the baseline pricing mechanism.

4.2 Balanced Pricing Mechanism
In the balanced pricing mechanism, computations are conducted by three main algorithms: (1) sample h subsets of data owners, (2) compute the query price and compensation for all h subsets, and (3) perturb the query answer for all h subsets and then select an optimal subset.
   Algorithm 1 samples h subsets of data owners. It computes the size of a representative sample RS of a dataset x using the statistical method, given the dataset x, a confidence level score CLS, a distribution of the selection DT, and a margin of error MER. Then, the mechanism randomly selects distinct (non-duplicated) data owners for each of the h different subsets. Due to the randomized selection of data owners, increasing h improves the final sampling result, because the optimal subset RS is selected from among all h different subsets. The output of this algorithm is a set of samples (RS1, RS2, ..., RSh) used as input to Algorithm 2.

 Algorithm 1: Sample h subsets of data owners
  Input: x, DT, CLS, MER, and h
  Output: (RS1, RS2, ..., RSh)
1 SS ← (DT ∗ CLS²) / MER²;
2 |RS| ← (SS ∗ |x|) / (SS + |x| − 1);
3 while h > 0 do
4     RSh ← {ui | UndupRandomize(1, |x|), i ∈ [1, |RS|]};
5     h ← h − 1;
6 end

   Algorithm 2 computes the query price and compensation for all the h subsets. Given a data buyer's maximum budget Wmax, query Q, dataset x, number of samples h, number of perturbations

 Algorithm 2: Compute query price and compensation for all h subsets
   Input: x, (RS1, RS2, ..., RSh), Wmax, χ, h, and Φ
   Output: Wp, Wr, and (w1, w2, ..., wh)
 1 Wab ← Wmax − χ;
 2 while h > 0 do
 3     j ← |RSh|;
 4     Wp ← Σ_{i=0}^{j−1} w_{ui}(εˆ_{ui}), ui ∈ RSh;
 5     if Wp ≤ Wab then
 6         while j < |x| && Wp < Wab do
 7             Wr ← Wab − Wp;
 8             RSh ← {uk | UndupRandomize(1, |x|)};
 9             j ← j + 1;
10             if Wr > w_{uk ∈ x}(εˆ_{uk ∈ x}) then
11                 Wp ← Wp + w_{uk ∈ x}(εˆ_{uk ∈ x});
12                 ε_{uk ∈ x} ← εˆ_{uk ∈ x};
13             else
14                 Wp ← Wp + Wr;
15                 w_{uk ∈ x} ← Wr;
16                 ε_{uk ∈ x} ← (w_{uk ∈ x})⁻¹;
17             end
18         end
19         Wr ← Wab − Wp;
20     else
21         lsTemp ← RSh;
22         payment ← 0;
23         Wr ← 0;
24         Weq ← Wab / |lsTemp|;
25         do
26             lsUnderPaid ← ∅;
27             foreach ui ∈ lsTemp do
28                 if w_{ui}(εˆ_{ui}) ≤ Weq then
29                     ε_{ui} ← εˆ_{ui};
30                     payment ← payment + w_{ui}(εˆ_{ui});
31                 else
32                     w_{ui} ← Weq;
33                     ε_{ui} ← (w_{ui})⁻¹;
34                     lsUnderPaid ← lsUnderPaid ∪ {ui};
35                     payment ← payment + Weq;
36                 end
37             end
38             Wr ← Wab − payment;
39             Weq ← Weq + Wr / |lsUnderPaid|;
40             lsTemp ← lsUnderPaid;
41         while Wr > 0;
42         Wp ← Wab;
43     end
44     wh ← Wp / j;
45     h ← h − 1;
46 end
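The over-budget branch of Algorithm 2 (lines 20-42) is essentially a water-filling split of Wab. A minimal sketch under our own simplifications: the payment scheme w and an assumed inverse w_inv are passed in as functions, and per-owner bookkeeping replaces the lsTemp list:

```python
def allocate_budget(eps_hat, w, w_inv, budget):
    """Water-filling sketch of Algorithm 2's over-budget branch: start
    from an equal share Weq = budget/n; owners whose full payment
    w(eps_hat_i) fits under Weq are paid in full and keep eps_hat_i;
    the rest are capped at Weq, with their privacy loss scaled down via
    the (assumed) inverse scheme w_inv. Budget freed by fully paid
    owners is redistributed among the capped owners until no further
    owner becomes fully paid. Returns the payments and actual losses."""
    n = len(eps_hat)
    full = [w(e) for e in eps_hat]          # payment for full eps_hat_i
    weq = budget / n                        # initial equal share Weq
    capped = set(range(n))                  # owners not yet fully paid
    while True:
        newly_full = {i for i in capped if full[i] <= weq}
        if not newly_full:
            break
        capped -= newly_full
        if not capped:                      # budget covers everyone
            break
        spent = sum(full[i] for i in range(n) if i not in capped)
        weq = (budget - spent) / len(capped)
    pay = [full[i] if full[i] <= weq else weq for i in range(n)]
    eps = [eps_hat[i] if full[i] <= weq else w_inv(weq) for i in range(n)]
    return pay, eps
```

With an identity scheme and budget 1.2 for εˆ = (0.1, 0.5, 0.9), the first two owners are paid in full and the third is capped at 0.6, exhausting the budget exactly.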
Φ, the market maker's benefit χ, and the h subsets (RS1, RS2, ..., RSh) from Algorithm 1, the algorithm returns the query price Wp, the remaining budget Wr (if applicable), the compensation wi(εi) for each ui, and the average compensation wi for each subset, because Algorithm 3 uses this result to select an optimal subset from all h subsets. The algorithm first computes the available budget Wab by subtracting χ from the given Wmax. Next, it computes the total payment Wp required when paying for the maximum privacy loss εˆi of each ui ∈ RSh; here w_{ui ∈ RSh}(εˆ_{ui ∈ RSh}) denotes the payment to data owner ui in RSh for εˆi. When Wp is smaller than Wab, the algorithm pays each ui for εˆi while using Wr to include more data owners in RSh, paying for εˆi or for some εi < εˆi depending on Wr. This process repeats until Wr = 0 or |RSh| = |x|, as the utility is influenced by both the size of RS and the privacy loss εi of all ui. Otherwise, when Wp > Wab, the algorithm determines an equal payment Weq for each ui ∈ RSh and then verifies whether each ui should be paid exactly Weq, or less when w_{ui}(εˆ_{ui}) < Weq. The updated (RS1, RS2, ..., RSh) is output and used in Algorithm 3.

  Algorithm 3: Perturb the query answer for all h subsets and then select an optimal subset
   Input: x, h, Φ, (RS1, RS2, ..., RSh), and (w1, w2, ..., wh)
   Output: εmax, P(Q(RS)opt), wopt, and RMSEopt
 1 m ← h;
 2 while m > 0 do
 3     εm ← (1/|RSm|) Σ_{i=0}^{|RSm|−1} ε_{ui}, ui ∈ RSm;
 4     P(Q(RSm))k ← P E_Q(x, RSm) ∗ (|x| / |RSm|), k ∈ [0, Φ − 1];
 5     RMSEm ← √( (1/Φ) Σ_{k=0}^{Φ−1} (Q(x) − P(Q(RSm))k)² );
 6     m ← m − 1;
 7 end
 8 εmax ← Max(ε1, ε2, ..., εh);
 9 optIndex ← {index | εindex = εmax};
10 RMSEopt ← RMSEoptIndex;
11 P(Q(RS))opt ← P(Q(RS))optIndex;
12 wopt ← woptIndex;

   With the output of Algorithm 2, Algorithm 3 perturbs the query answer and selects an optimal subset from all h subsets. It computes the average privacy loss ε and the perturbed query result P(Q(RS)), scaled by the proportional difference between x and RSh through multiplying the result of P E by |x|/|RSh|, as well as the RMSE of each of (RS1, RS2, ..., RSh). It then selects the optimal RS with the maximum average privacy loss εmax, denoting a high probability that less random noise is included in the result. Finally, the algorithm finds the corresponding RMSEopt, P(Q(RS))opt, and wopt of the selected optimal RS.
   At the end, the data buyer receives the perturbed query answer P(Q(RS)) along with the remaining budget Wr (when applicable), the number of data owners in RS, and the root mean squared error RMSE of the query answer. Data owners are then compensated according to their actual privacy losses εi.

5 EXPERIMENT
Experimental setup: We divide the experiment into two components: (1) the simulation of our balanced pricing mechanism and (2) the comparison of our mechanism with the baseline pricing mechanism. We examine the query price Wp, the root mean squared error RMSE, the average privacy loss ε, and the average compensation w that each ui obtains from both mechanisms, and then determine which mechanism generates the smaller RMSE for the same Wp. Due to space constraints, we only show the experimental result of the following count query Q: "How many people commute by personal car in the USA?"
   Data preparation: From our survey, we obtained 486 records of personal data from 486 data owners. To generate more accurate experimental results, a larger dataset is preferable, so we duplicated our survey dataset 500 times to obtain a larger dataset x of 243,000 records. To conduct the experiment, each data record must have two important variables: the maximum tolerable privacy loss εˆ and a payment scheme w. For the sake of simplicity, we assume εˆ ∈ [0, 1] and two types of payment schemes (as described in Section 3.1). In preparing our data, we generate these two variables for each record/data owner according to the survey answers. When data owners have chosen very high or high alteration/perturbation, they are classified into the conservative group, and their εˆi values are set to 0.1 and 0.3, respectively. For low and very low perturbation, the εˆi values are set to 0.7 and 0.9, respectively, and such data owners are categorized into the liberal group. To optimize their benefits, we set the most advantageous payment scheme for them based on their εˆi values: for the conservative group with εˆi values of 0.1 and 0.3, we set payment scheme Type A, while Type B is set for the liberal group with εˆi values of 0.7 and 0.9. In turn, we obtain a usable and large dataset for our experiment.
   Experiment results: We first conduct a simulation of our mechanism (Figure 8) to explain the correlations between the query price and RMSE, between the query price and the average privacy loss ε, and between the query price and the average compensation value. Figure 8a shows that the RMSE value decreases as the query price increases. This pattern is reasonable in practice, because the higher the query price is, the lower the RMSE should be. Remarkably, the RMSE value declines dramatically as the query price rises from $5 to $50, but then decreases only gradually from $50 to $1000. We can attribute this phenomenon to the impact of the privacy parameter εi of each data owner ui and of the number |RS| of data owners responding to the query. When the query price is approximately $50 or less, it can only cover the compensation of RS, so for the same size |RS|, an increase in the query price (i.e., $5 to $50) also increases the ε value in RS. However, when the query price exceeds what is needed to pay for εˆi for all ui in RS, the remaining budget is used to include more data owners in RS, which can significantly or marginally decrease the overall RMSE while increasing the ε value, depending on the distribution of the data. When more conservative data owners are included in RS, this can affect the ε value, resulting in only a minor decrease in RMSE despite more money being spent. For this reason, the price menu plays a crucial role in providing an overview of the approximate degree of change in RMSE values corresponding to query prices. In turn, data buyers can decide whether it is worth spending more money for a minor decrease


(a) Query price and RMSE. (b) Query price and ε. (c) Query price and average compensation.

                Figure 8: Simulation on balanced pricing mechanism.

(a) Query price and RMSE of both mechanisms. (b) Query price and ε of both mechanisms. (c) Query price and average compensation of both mechanisms.

     Figure 9: Comparison between the balanced pricing mechanism and baseline pricing mechanism.

of RMSE within their available budgets. Figures 8b and 8c show a similar correlation pattern between the query price and ε and between the query price and w: the higher the query price, the higher the resulting ε and w values. The marginal decrease in ε despite the significant rise in query price from $100 to $1000 shown in Figure 8b can be attributed to the phenomenon illustrated in Figure 8a, whereby the RMSE value decreases only slightly over a significant price increase.

We next compare the results of our balanced pricing mechanism with those of the baseline pricing mechanism (Figure 9). The experimental results show that our balanced pricing mechanism considerably outperforms the baseline mechanism under almost all conditions. Figure 9a shows that our balanced mechanism produces a noticeably smaller RMSE for the same query price than the baseline mechanism. In particular, our balanced mechanism produces a significantly smaller RMSE even when the query price is set relatively low (i.e., $5), because instead of querying the entire dataset, it queries only a representative sample RS. This reduces the query price while still yielding a smaller RMSE. Because of the random noise drawn from the Laplace distribution, the RMSE of the baseline mechanism, rather than declining, rises between query prices of $50 and $100. Figures 9b and 9c show a similar pattern: the ε and w of our balanced pricing mechanism are significantly higher than those of the baseline mechanism.

6   DISCUSSION
The experimental results above show that our balanced pricing mechanism considerably outperforms the baseline pricing mechanism. This is attributable to two main factors. First, we apply an exponential-like PE mechanism (see Definition 3.4) to achieve Personalized Differential Privacy (PDP), which takes advantage of the individual privacy parameters ε̂ of data owners, especially those in the liberal group. In contrast, the baseline mechanism can achieve PDP only through a minimum mechanism that adds a large amount of random noise drawn from a Laplace distribution calibrated to the smallest ε̂ in the entire dataset. Second, our mechanism produces a considerably smaller RMSE for the same query price. In other words, for the same level of utility, we can indeed reduce the query price, as our mechanism queries only a small subset of the dataset while generating unbiased results through a random sampling and selection procedure. We thus compensate only the data owners in the queried subset, whereas the baseline mechanism must compensate all data owners in the dataset to obtain unbiased results. Therefore, our balanced pricing mechanism is more efficient than the baseline mechanism.

In the price menu, it is important that higher prices correspond to higher levels of approximate utility (denoted as ε). However, Figure 8b shows a slight decrease in ε from $100 to $1000. This phenomenon can be attributed to the number of samplings h applied in the mechanism: even with a larger budget, the mechanism cannot fully guarantee that ε will increase, owing to the random selection of data owners with various ε̂ values. Our naive solution is therefore to widen the price gaps in the price menu so that ε increases distinguishably with the query price. We will discuss this point further in future work.

It is also crucial to ensure that data owners can choose an appropriate maximum tolerable privacy loss ε̂_i that reflects their privacy attitude and risk orientation. How to set the value of ε (or ε̂ in our setting) remains an open question in the differential privacy community.


Although [7] proposed an economic method for choosing ε, this problem has not been widely discussed. As part of a solution, we provide the options ε̂ = {0.1, 0.3, 0.7, 0.9}, corresponding to {very high, high, low, very low} data perturbation levels. Very high perturbation (i.e., ε̂ = 0.1) means that more random noise is added to the result, so the data owners receive a very strong privacy guarantee. However, some data owners might not understand how the perturbation works, so we can provide an interactive interface allowing them to see the approximate change to their actual data for different values of ε̂. A similar interactive interface³ is used to explain perturbation via the Laplace mechanism; we can thus create an analogous interface for our exponential-like data perturbation mechanism to help data owners and buyers understand the meaning of ε̂.

7   RELATED WORK
In the field of pricing mechanism design, there are two crucial research focuses: auction-based pricing and query-based pricing. Auction-based pricing has attracted the attention of [5], [6], [13], and [14]. It allows data owners to report their data valuations and data buyers to place bids. From a practical point of view, however, it is very difficult for individuals to articulate their data valuations, as reported in [1]. Moreover, the price described in [13] is ultimately determined by the data buyer without considering data owners' privacy valuations or actual privacy losses. Query-based pricing, as defined in [10], on the other hand, automatically derives the prices of queries from given data valuations. The author of [10] also proposes a flexible arbitrage-free query-based pricing model that assigns prices to arbitrary queries based on the pre-defined prices of views. Despite this flexibility, the price is non-negotiable: the buyer can obtain a query answer only by paying the full price. Unfortunately, this model is not applicable to personal data trading, as it takes no account of privacy preservation. [12] extended and adapted the model by applying differential privacy for privacy preservation and for quantifying data owners' privacy losses, yet this method still presents a number of problems, as explained in Section 3.1.

8   CONCLUSION AND FUTURE WORK
We analyzed people's privacy attitudes and levels of interest in data trading, then identified five key principles for designing a reasonable personal data trading framework. For an operational market, we proposed a personal data trading framework based on these key principles. In addition, we proposed a balanced pricing mechanism that balances money with privacy to offer more utility to both data owners and data buyers without circumvention. Finally, we conducted various experiments to simulate our mechanism and to demonstrate its considerably higher efficiency in comparison with a baseline pricing mechanism. The results show that our study has identified and tackled some fundamental challenges facing the market, thus facilitating the emergence of a personal data market.

Having investigated the challenges of this market, we identify a number of interesting avenues for future work. To obtain an optimal query answer and price, it is crucial to carefully design a payment scheme using game theory. In the present study, we designed only two types of payment schemes, for liberal and conservative data owners; we will develop a more sophisticated design in future work. Moreover, in our study the market maker is assumed to be a trusted server storing and accessing data owners' data on their behalf, yet trust has become a difficult question to address from both technical and social standpoints. Thus, for future work, we can consider a trading framework and pricing mechanisms in which market makers are assumed to be untrustworthy.

ACKNOWLEDGMENTS
This work was supported through the JSPS Core-to-Core Program, A. Advanced Research Networks, and through JSPS KAKENHI Grant Numbers 16K12437 and 17H06099.

REFERENCES
[1] Alessandro Acquisti, Leslie K. John, and George Loewenstein. 2013. What is privacy worth? The Journal of Legal Studies 42, 2 (2013), 249–274.
[2] Christina Aperjis and Bernardo A. Huberman. 2012. A market for unbiased private data: Paying individuals according to their privacy attitudes. First Monday 17, 5 (2012), 1–17. https://doi.org/10.5210/fm.v17i5.4013
[3] Cynthia Dwork and Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science 9, 3–4 (2014), 211–407. https://doi.org/10.1561/0400000042
[4] Federico Ferretti. 2014. EU competition law, the consumer interest and data protection: The exchange of consumer information in the retail financial sector. Springer. 116 pages.
[5] Lisa K. Fleischer and Yu-Han Lyu. 2012. Approximately optimal auctions for selling privacy when costs are correlated with data. In Proceedings of the 13th ACM Conference on Electronic Commerce. ACM, 568–585.
[6] Arpita Ghosh and Aaron Roth. 2015. Selling privacy at auction. Games and Economic Behavior 91 (2015), 334–346. https://doi.org/10.1016/j.geb.2013.06.013
[7] Justin Hsu, Marco Gaboardi, Andreas Haeberlen, Sanjeev Khanna, Arjun Narayan, and Benjamin C. Pierce. 2014. Differential Privacy: An Economic Method for Choosing Epsilon. In IEEE Computer Security Foundations Symposium (CSF). 1–29. https://doi.org/10.1109/CSF.2014.35
[8] Zach Jorgensen, Ting Yu, and Graham Cormode. 2015. Conservative or liberal? Personalized differential privacy. In Proceedings of the International Conference on Data Engineering (ICDE). 1023–1034. https://doi.org/10.1109/ICDE.2015.7113353
[9] Jyoti Mandar Joshi and G. M. Dumbre. 2017. Basic Concept of E-Commerce. International Research Journal of Multidisciplinary Studies 3, 3 (2017).
[10] Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, and Dan Suciu. 2012. Query-based data pricing. In Proceedings of the 31st Symposium on Principles of Database Systems (PODS '12). 167–178. https://doi.org/10.1145/2213556.2213582
[11] Kenneth C. Laudon. 1996. Markets and Privacy. Commun. ACM 39, 9 (1996), 92–104.
[12] Chao Li, Daniel Yang Li, Gerome Miklau, and Dan Suciu. 2014. A Theory of Pricing Private Data. ACM Transactions on Database Systems 39, 4 (2014), 1–28. https://doi.org/10.1145/2448496.2448502
[13] Christopher Riederer, Vijay Erramilli, Augustin Chaintreau, Balachander Krishnamurthy, and Pablo Rodriguez. 2011. For sale: Your data. By: You. In Proceedings of the 10th ACM Workshop on Hot Topics in Networks (HotNets '11). 1–6. https://doi.org/10.1145/2070562.2070575
[14] Aaron Roth. 2012. Buying Private Data at Auction: The Sensitive Surveyor's Problem. ACM SIGecom Exchanges 11, 1 (2012), 3–8. https://doi.org/10.1145/2325713.2325714
[15] Tang Ruiming. 2014. On the Quality and Price of Data. Ph.D. Dissertation. National University of Singapore. http://www.scholarbank.nus.edu.sg/bitstream/handle/10635/107391/TANG-Ruiming-thesis.pdf?sequence=1
[16] Mario Sajko, Kornelije Rabuzin, and Miroslav Bača. 2006. How to calculate information value for effective security risk assessment. Journal of Information and Organizational Sciences 30, 2 (2006), 263–278.
[17] Carl Shapiro and Hal R. Varian. 1998. Versioning: The smart way to sell information. Harvard Business Review 76, 6 (1998), 107.
[18] Jacopo Staiano, Nuria Oliver, Bruno Lepri, Rodrigo de Oliveira, Michele Caraviello, and Nicu Sebe. 2014. Money walks: a human-centric study on the economics of personal mobile data. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 583–594.

³ http://content.research.neustar.biz/blog/differential-privacy/WhiteQuery.html