<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>How to Balance Privacy and Money through Pricing Mechanism in Personal Data Market</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rachana Nget</string-name>
          <email>rachana.nget@db.soc.i.kyoto-u.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yang Cao</string-name>
          <email>ycao31@emory.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Masatoshi Yoshikawa</string-name>
          <email>yoshikawa@i.kyoto-u.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Emory University</institution>
          ,
          <addr-line>Atlanta, Georgia</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kyoto University</institution>
          ,
          <addr-line>Kyoto</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>In the big data era, personal data has recently come to be perceived as the new oil or currency of the digital world. Both the public and private sectors wish to use such data for research and business. However, access to such data is restricted due to privacy concerns. Seeing the commercial opportunities in the gap between demand and supply, the notion of a personal data market has been introduced. While there are several challenges associated with rendering such a market operational, we focus on two main technical challenges: (1) How should personal data be fairly traded on an e-commerce-like platform? (2) How much should personal data be worth in trade? In this paper, we propose a practical personal data trading framework that strikes a balance between money and privacy. To gain insight into user preferences, we first conduct an online survey on human attitudes toward privacy and interest in personal data trading. Second, we identify five key principles of personal data trading that are central to designing a reasonable trading framework and pricing mechanism. Third, we propose a reasonable trading framework for personal data, which provides an overview of how data are traded. Fourth, we propose a balanced pricing mechanism that computes the query price and perturbed results for data buyers, and the compensation for data owners (whose data are used), as a function of their privacy loss. Finally, we conduct an experiment on our balanced pricing mechanism, and the results show that it performs significantly better than the baseline mechanism.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Security and privacy → Economics of security and privacy;
Usability in security and privacy;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        Personal data has recently come to be perceived as the new oil or
currency of the digital world. A massive volume of personal data is
produced and collected every second (e.g., via smart devices, search
engines, sensors, and social network services). These data are
extraordinarily valuable to the public and private sectors for
improving their products and services. However, personal data reflect
the unique value and identity of each individual; therefore, access
to personal data is highly restricted. For this reason, some large
Internet companies and social network services provide free services
in exchange for their users’ personal data. Demand for personal data
for research and business purposes is increasing sharply, while there
is practically no safe and efficient supply of personal data. Seeing
the commercial opportunities rooted in this gap between demand and
supply, the notion of a personal data market has been introduced.
This notion has transformed the perception of personal data from an
undisclosed asset into a commodity, as noted in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Viewing
personal data as a commodity, many scholars, such as [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], have asserted that monetary compensation
should be given to the actual data producers/owners for their privacy
loss whenever their data are accessed. Thus, personal data could be
traded in the form of e-commerce, where buying, selling, and
financial transactions are done online. However, because this
commodity may be associated with private attributes, it should not
be classified as one of the three conventional types of e-commerce
goods (i.e., physical goods, digital goods, and services, as noted
in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). These private attributes introduce a number of challenges
and require a different trading approach for this commodity. How
much money should data buyers pay, and how much money should data
owners require for the privacy loss caused by information derived
from their personal data? One possible approach is to set the price
according to the amount of privacy loss, but how to quantify privacy
loss, and how much money to pay per unit of privacy loss, are
fundamental challenges in this market.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Personal Data Market</title>
      <p>
        The personal data market is intended as a secure platform for
personal data trading. As defined in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], what is traded is a noisy
version of statistical data: an aggregated query answer, derived
from users’ personal data, with some random noise added to
guarantee the privacy of data owners. This injection of random noise
is referred to as perturbation. The magnitude of perturbation directly
affects both the query price and the amount of data owners’ privacy
loss: a higher query price typically buys a lower degree of perturbation
(less noise injection), which in turn means a larger privacy loss for
the data owners.
      </p>
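      <p>The following minimal Python sketch makes the link between perturbation, price, and privacy loss concrete. It assumes the Laplace mechanism on a count query (sensitivity 1), which is one standard way to inject noise under differential privacy. The function names and the random 0/1 data are hypothetical and are not taken from the cited market definition. The noise scale is 1/ε, so a larger ε (a pricier answer with more privacy loss) yields less noise.</p>
      <preformat>
```python
import numpy as np


def laplace_scale(epsilon: float, sensitivity: float = 1.0) -> float:
    """Noise scale b = sensitivity / epsilon of the Laplace mechanism."""
    if not epsilon > 0:  # also rejects NaN
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def noisy_count(values, epsilon: float, rng: np.random.Generator) -> float:
    """Perturb the count of 1s in `values`; a count query has sensitivity 1."""
    true_count = float(np.sum(values))
    return true_count + rng.laplace(loc=0.0, scale=laplace_scale(epsilon))


rng = np.random.default_rng(seed=0)
data = rng.integers(0, 2, size=1000)  # hypothetical 0/1 attribute of 1,000 owners
for eps in (0.1, 1.0):
    # Smaller epsilon: larger noise scale, i.e., a cheaper but less accurate answer.
    print(eps, laplace_scale(eps), round(noisy_count(data, eps, rng), 2))
```
      </preformat>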
      <p>
        By observing the published results of true statistical data, an
adversary with some background knowledge about an individual in the
dataset (e.g., sex, birth date, zip code) can perform linkage attacks
to identify whether that person is included in the results. For
instance, published anonymized medical encounter data were once
matched with voter registration records (e.g., birth date, sex, zip
code) to identify the medical records of the governor of
Massachusetts, as explained in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Therefore, statistical results should
be perturbed prior to publication to prevent such linkage attacks.
      </p>
      <p>As shown in Figure 1, three main participants are involved:
data owners, data seekers/buyers, and the market maker. Data owners
contribute their personal data and receive appropriate monetary
compensation. Data buyers pay a certain amount of money to obtain
the noisy statistical data they want. The market maker is a trusted
mediator between these two parties, as no direct trading occurs
between them. The market maker is entrusted to compute query
answers, calculate the query price for buyers and the compensation
for owners, and, most importantly, design a variety of payment
schemes for owners to choose from.</p>
      <p>The personal data market can be viewed as an integration of
Consumer-to-Business (C2B) e-commerce with Business-to-Consumer (B2C)
or Business-to-Business (B2B) e-commerce. On one side of the trade,
data owners, as individuals, provide their personal data to the
market as in C2B e-commerce, although no trading takes place at this
point. On the other side, the market maker sells statistical
information to data buyers, either individuals or companies, which is
similar to B2C and B2B trading; this is when the trading transactions
are completed in this framework. Studying such a market framework
could open new perspectives on emerging forms of e-commerce.</p>
      <p>The existence of a personal data market would make an abundance
of personal data, including sensitive but useful data, safely available
for various uses, giving rise to many sophisticated developments
and innovations. For this reason, several start-up companies have
developed online personal data trading sites and mobile applications
following this market orientation. Two such sites, Personal
(www.personal.com) and Datacoup (www.datacoup.com), aim to create
personal data vaults. They buy raw personal data from each data owner
and compensate the owner accordingly. However, some data owners are
not convinced to sell their raw (unperturbed) data. For Datacoup,
payment is fixed at approximately $8 for SNS and financial data (i.e.,
credit/debit card transactions). It is questionable whether $8 is
reasonable compensation, and how this price was decided. Another
source of inefficiency is the absence of data buyers, which creates
problems if buyers are not interested in the types of data collected.
In addition, CitizenMe and digi.me recently launched mobile
applications that help data owners collect and store all of their
personal data on their devices. Although these applications connect
buyers to data owners, it might be inefficient and impractical for
buyers to purchase individual raw data one owner at a time. Moreover,
as no pricing mechanism is offered, data owners and buyers must
negotiate prices on their own, which may not be efficient because not
all data owners know, or truthfully report, the price of their data.
This can obstruct trading operations. Based on lessons learned from
such start-ups, we conclude that what they are missing is a
well-designed trading framework that explains the principles of
trading, together with a pricing mechanism that balances the money
and privacy traded in the market.</p>
      <p>Making this market operational raises many challenges across
disciplines, but we narrow the fundamental technical challenges down
to two:
• Trading framework for personal data: How should personal data be
fairly traded? In other words, how should a reasonable trading
framework be designed to prevent buyers from circumventing prices
through arbitrage and data owners from untruthfully reporting their
privacy valuation?
• Balanced pricing mechanism: How much should personal data be
worth? How should a price that balances data owners’ privacy loss and
buyers’ payment be computed? This balance is crucial in convincing
data owners and data buyers to participate in the personal data
market.</p>
    </sec>
    <sec id="sec-4">
      <title>Contribution</title>
      <p>
        To address the above challenges more precisely, we first conducted
a survey on human attitudes toward privacy and interest in personal
data trading (Section 2). Second, from our survey analysis and
previous studies, we identify five key principles of personal data
trading (Section 3.1). Third, we propose a reasonable trading
framework (Section 3.2) that provides an overview of how data are
traded and of the transactions made before, during, and after trade
occurs. Fourth, we propose a balanced pricing mechanism (Section 4)
that computes the price of a noisy aggregated query answer and
calculates the amount of compensation given to each data owner (whose
data are used) based on his or her actual privacy loss. The main goal
is to balance the benefits and expenses of both data owners and
buyers, an issue that has not been addressed in previous research.
For instance, a theoretical pricing mechanism [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] has
been designed in favor of data buyers only. That mechanism empowers
buyers to determine the privacy loss of data owners while assuming
that data owners can accept an infinite privacy loss. Instead, our
mechanism empowers both data owners and buyers to fully control their
own benefits and expenses. Finally, we conduct an experiment on a
survey dataset to simulate the results of our mechanism and
demonstrate its efficiency relative to a baseline pricing mechanism
(Section 5).
      </p>
    </sec>
    <sec id="sec-5">
      <title>SURVEY RESULT</title>
      <p>To develop deeper insight into personal data trading and to collect
data for our experiment, we conducted an online survey delivered
through a crowdsourcing platform. In total, 486 respondents from
46 different states throughout the USA took part in the survey.
The respondents ranged in age from 14 to over 54 and had varying
educational backgrounds, occupations, and incomes. Respondents were
required to answer 11 questions; due to space limitations, we discuss
only the most significant ones.</p>
      <p>Analysis 1: For four types of personal data: Type 1 (commute
type to school/work), Type 2 (yearly income), Type 3 (yearly expense
on medical care), Type 4 (bank service you’re using), the following
results were obtained.</p>
      <p>[Figure 2: (a) Can sell vs. cannot sell; (b) How much to sell.]</p>
      <p>More than 50% of the respondents said they cannot sell the data
(see Figure 2a), and more than 50% of those who can sell said that
they do not know at what price to sell it (see Figure 2b).</p>
      <p>
        Most of the participants stated that they do not know how much
their data are worth, highlighting one of the above-mentioned
challenges of the personal data market. Similarly, [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] noted that
it is very difficult for data owners to articulate the exact valuation
of their data.
      </p>
      <p>Analysis 2: When asked whether they would sell their anonymized
personal data, 49% of respondents answered "It depends on the type of
personal data and the amount of money," 35% were "Not interested,"
and 16% were "Interested" (see Figure 3a). However, when more privacy
protection was provided by both anonymizing and altering (perturbing)
the real data, more than 50% of the respondents became interested in
selling, meaning that more people are convinced to sell their data
under such conditions (see Figure 3b).</p>
      <p>[Figure 3: (a) Interest in selling anonymized data; (b) Interest in selling both anonymized and altered data.]</p>
      <p>Anonymization alone does not convince people to sell their personal
data. Providing extra privacy protection via alteration or
perturbation of the anonymized data might make them feel safer and
more willing to sell their data.</p>
      <p>
        Analysis 3: With regard to alteration/perturbation, the
respondents were asked to select their preferred privacy level from
{very low, low, high, very high}, in other words, how much they want
their real data to be altered/perturbed. A very low level of
alteration (low noise injection) denotes low privacy protection but
more monetary compensation. The results (see Figure 4a) show that the
preferred alteration levels vary across the four types of data.
Similarly, the preferred payment schemes (see Figure 4b) vary across
all the data types. A human-centric study [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] also showed that people value
different categories of data differently according to their behaviors
and intentional levels of self-disclosure; in that study, location data
were valued more highly than communication, app, and media data.
      </p>
      <p>Privacy protection levels and desired payment schemes varied
across the data types considered and among the respondents. In
practice, people harbor different attitudes toward privacy and money.
Thus, it is crucial to allow a personalized privacy level and payment
scheme for each individual.</p>
      <p>Analysis 4: Participants were given four criteria for deciding
whether to sell personal data: usage (who will use your data and how),
sensitivity (sensitivity of the data, e.g., salary, disease), risks
(future risks/impacts), and money (obtaining as much money as
possible). In descending order of importance, they ranked usage,
sensitivity, future risks/impacts, and money (see Figure 5).</p>
      <p>Money is considered the least important criterion, while who will
use the data and how is considered the most important one when
deciding to sell personal data. This suggests that money alone cannot
persuade data owners who do not want to sell.</p>
    </sec>
    <sec id="sec-6">
      <title>TRADING FRAMEWORK</title>
      <p>All notations used in this study are summarized in Table 1.</p>
    </sec>
    <sec id="sec-7">
      <title>Key Principles of the Trading Framework</title>
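      <p>To design a reasonable trading framework and a balanced pricing
mechanism, it is important to determine the chief principles of the
personal data trading framework. These key principles are derived
from previous studies and from the four key analyses of our survey.
The principles fall into five groups: personalized differential
privacy as privacy protection, applicable query type, arbitrage-free
pricing model, truthful privacy valuation, and unbiased result. To
guarantee each data owner’s privacy, personalized differential privacy
injects randomness into the result according to the owner’s preferred
privacy level; it also serves as the metric that quantifies each data
owner’s privacy loss. Under this personalized differential privacy
guarantee, only certain linear aggregated query types are applicable
in this trading framework. Regarding pricing, the pricing model should
be arbitrage-free and must not allow savvy buyers to circumvent the
query price. Similarly, the framework should encourage data owners’
truthful privacy valuation by offering them suitable payment schemes,
so that they do not benefit from any untruthful valuation. Finally, it
is important to ensure an unbiased (or less biased) query result
without increasing the query price, so a careful sample selection
method is crucial.</p>
      <table-wrap>
        <label>Table 1</label>
        <caption>
          <p>Summary of notation</p>
        </caption>
        <table>
          <thead>
            <tr><th>Notation</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>u<sub>i</sub>, b<sub>j</sub></td><td>Data owner i, data buyer j</td></tr>
            <tr><td>x<sub>i</sub></td><td>Data element of data owner u<sub>i</sub></td></tr>
            <tr><td>εˆ<sub>i</sub></td><td>Maximum tolerable privacy loss of u<sub>i</sub></td></tr>
            <tr><td>w<sub>i</sub></td><td>Payment scheme selected by u<sub>i</sub></td></tr>
            <tr><td>ε<sub>i</sub></td><td>Actual privacy loss of u<sub>i</sub></td></tr>
            <tr><td>w<sub>i</sub>(ε<sub>i</sub>)</td><td>Compensation paid to u<sub>i</sub> for actual privacy loss ε<sub>i</sub></td></tr>
            <tr><td>x</td><td>Dataset (data vector) of all data elements</td></tr>
            <tr><td>Q</td><td>Query requested by a data buyer</td></tr>
          </tbody>
        </table>
      </table-wrap>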
    </sec>
    <sec id="sec-8">
      <title>A. Personalized Differential Privacy as a Privacy Protection</title>
    </sec>
    <sec id="sec-9">
      <p>
        The pricing mechanism should be capable of protecting data owners’
privacy against any undesirable privacy leakage. To ensure privacy,
differential privacy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] plays an essential role: it guarantees that an adversary
observing the query result learns little about any individual
(despite having some background knowledge about that individual)
while still learning useful information about the whole
population/dataset. Given a privacy parameter ε, a private mechanism
(e.g., the Laplace mechanism or the exponential mechanism) satisfies
ε-differential privacy if, owing to the random noise it adds, the
same result is almost equally likely to occur regardless of the
presence or absence of any individual in the dataset. A smaller ε
offers better privacy protection but lower accuracy, resulting in a
tradeoff between privacy and result accuracy. In our framework, we
use ε to quantify the privacy loss of each data owner, because ε and
money are correlated.
      </p>
      <p>
        Definition 3.1 (ε-Differential Privacy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). A randomized algorithm
M : D → R satisfies ε-differential privacy (ε-DP) if, for any neighboring
datasets x, y ∈ D, where D is the domain of datasets and x and y
differ by only one record, and for any set S ⊆ Range(M),
<disp-formula><label>(1)</label><tex-math>\Pr[\mathcal{M}(x) \in S] \leq \exp(\varepsilon) \cdot \Pr[\mathcal{M}(y) \in S]</tex-math></disp-formula>
      </p>
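      <p>Definition 3.1 can be checked numerically for a concrete mechanism. The sketch below is an illustration under our own assumptions: a Laplace mechanism with scale 1/ε on a sensitivity-1 count, and two neighboring datasets whose true counts differ by one. It evaluates the output density for both datasets on a grid and returns the largest ratio, which should not exceed exp(ε). For a continuous mechanism, a pointwise bound on the density ratio implies the set-based inequality of Definition 3.1. The function names are hypothetical.</p>
      <preformat>
```python
import math

import numpy as np


def laplace_pdf(z: np.ndarray, mu: float, scale: float) -> np.ndarray:
    """Density of Laplace(mu, scale) evaluated at the points z."""
    return np.exp(-np.abs(z - mu) / scale) / (2.0 * scale)


def max_density_ratio(epsilon: float, count_x: int, count_y: int) -> float:
    """Largest output-density ratio between neighboring datasets x and y.

    With sensitivity 1 and scale 1/epsilon, epsilon-DP requires this ratio
    to stay at or below exp(epsilon) for every possible output z.
    """
    scale = 1.0 / epsilon
    z = np.linspace(-50.0, 50.0, 20001)  # grid of possible noisy outputs
    ratio = laplace_pdf(z, count_x, scale) / laplace_pdf(z, count_y, scale)
    return float(ratio.max())


eps = 0.5
print(max_density_ratio(eps, count_x=10, count_y=11), math.exp(eps))
```
      </preformat>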
      <p>
        In differential privacy (DP), privacy protection is at the tuple
level, which means that all users included in the dataset have the
same privacy protection/loss value ε (one for all). However, in
practice, individuals may have different privacy attitudes, as
illustrated by our survey results, so allowing privacy
personalization is critical, especially in the trading setting. We
thus adopt the personalized differential privacy (PDP) theory of
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which is derived from differential privacy. Each user
can personalize his or her maximum tolerable privacy loss εˆi, so any
private mechanism that satisfies εˆ-personalized differential privacy
must guarantee each user’s privacy up to his or her εˆi. Users may set
εˆi according to their privacy attitudes, under the assumption that
εˆi is public and is not correlated with the sensitivity of the data.
This theory thus allows users to personalize their privacy while
offering more utility to data buyers.
      </p>
      <p>
        Definition 3.2 (Personalized Differential Privacy [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Given
the maximum tolerable privacy loss εˆ of each user and a universe
of users U, a randomized mechanism M : D → R satisfies
εˆ-personalized differential privacy (or εˆ-PDP) if, for every pair of
neighboring datasets x, y ∈ D, where x and y differ in the data of
user i, and for any set S ⊆ Range(M),
      </p>
      <p>
<disp-formula><label>(2)</label><tex-math>\Pr[\mathcal{M}(x) \in S] \leq \exp(\hat{\varepsilon}_i) \cdot \Pr[\mathcal{M}(y) \in S]</tex-math></disp-formula>
      </p>
      <p>
        Both DP and PDP are privacy definitions, so a private mechanism is
needed to realize them. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] introduced two PDP mechanisms: the sampling mechanism and the
exponential-like mechanism. Given a privacy threshold, the sampling
mechanism draws a subset of the dataset and then runs a standard
private mechanism (e.g., the Laplace mechanism) on it. The
exponential-like mechanism, given a set of εˆ values, computes a score
(probability) for each potential element in the output domain. This
score is inversely related to the number of changes that must be made
to a dataset x for a potential value to become the true answer.
      </p>
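      <p>The sampling mechanism described above can be sketched as follows. The inclusion probability used here, (exp(εˆi) − 1)/(exp(t) − 1) for owners whose εˆi is below the threshold t (and 1 otherwise), is our assumption of how the sampling step is calibrated and should be checked against the cited work. The function names, the 0/1 data, and the use of a Laplace count as the t-DP mechanism are likewise illustrative.</p>
      <preformat>
```python
import math

import numpy as np


def inclusion_probability(eps_hat: float, threshold: float) -> float:
    """Probability of keeping an owner's record in the sample (assumed form).

    Owners whose tolerance meets the threshold are always kept; stricter
    owners are kept with probability (e^eps_hat - 1) / (e^threshold - 1).
    """
    if eps_hat >= threshold:
        return 1.0
    return math.expm1(eps_hat) / math.expm1(threshold)


def sample_then_count(values, eps_hats, threshold: float, rng: np.random.Generator) -> float:
    """Sample owners independently, then answer a count with threshold-DP Laplace noise."""
    probs = np.array([inclusion_probability(e, threshold) for e in eps_hats])
    kept = probs > rng.random(len(probs))  # one Bernoulli(p_i) draw per owner
    sampled_count = float(np.sum(np.asarray(values)[kept]))
    return sampled_count + rng.laplace(0.0, 1.0 / threshold)


rng = np.random.default_rng(seed=7)
values = [1, 0, 1, 1, 0, 1]
eps_hats = [0.1, 0.5, 1.0, 1.0, 0.2, 2.0]  # hypothetical per-owner tolerances
print(sample_then_count(values, eps_hats, threshold=1.0, rng=rng))
```
      </preformat>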
      <p>
        Definition 3.3 (Score Function [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Given a function f : D → R
whose outputs r ∈ Range(f) are selected with a probability proportional
to that of the exponential mechanism of differential privacy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], s(D, r) is
a real-valued score function: the higher the score, the better r is
relative to f(D). Assuming that D and D′ differ only in the values
of tuples, and denoting by D ⊕ D′ the set of tuples in which they differ,
<disp-formula><tex-math>s(D, r) = \max_{D' : f(D') = r} \left( -|D \oplus D'| \right)</tex-math></disp-formula>
      </p>
      <p>
        In PDP, each record or data owner has his or her own privacy setting
εˆi, so it is important to distinguish between the different D′ that
make a specific value become the output. To formalize this mechanism,
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] defined it as follows.
      </p>
      <p>
        Definition 3.4 (PE Mechanism [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Given a function f : D → R,
an arbitrary input dataset D drawn from the data domain, and a privacy
specification ϕ, the mechanism PE<sup>ϕ</sup><sub>f</sub>(D) outputs r ∈ R with probability
<disp-formula><label>(3)</label><tex-math>\Pr[\mathrm{PE}^{\phi}_{f}(D) = r] = \frac{\exp\left(\frac{1}{2} d_f(D, r, \phi)\right)}{\sum_{q \in \mathcal{R}} \exp\left(\frac{1}{2} d_f(D, q, \phi)\right)}</tex-math></disp-formula>
where
<disp-formula><label>(4)</label><tex-math>d_f(D, r, \phi) = \max_{D' : f(D') = r} \sum_{i \in D \oplus D'} -\phi^{u}_{i}</tex-math></disp-formula>
      </p>
      <p>
        In our framework, ϕ refers to the set of maximum tolerable privacy
losses εˆi of all data owners in the dataset x. We apply this PE
mechanism to guarantee that each data owner’s privacy is protected
even though data owners have different privacy requirements. The
proof that this mechanism satisfies PDP can be found in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
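      <p>To make Definition 3.4 concrete, the following sketch computes the PE output distribution for a count query over 0/1 data elements, with each owner’s εˆi playing the role of ϕ. For a count, moving the true answer c to a candidate r requires flipping at least |r − c| records. Because every εˆi is non-negative, the maximum in d_f is attained by flipping the records whose owners have the smallest εˆi. The restriction to count queries and the function names are our illustrative assumptions; the mechanism in the cited work is defined for general f.</p>
      <preformat>
```python
import numpy as np


def pe_count_distribution(values, eps_hats) -> np.ndarray:
    """Output distribution of the PE mechanism for a count query on 0/1 data.

    Entry r is proportional to exp(d_f(D, r, phi) / 2) for r = 0..n, where
    d_f is minus the cheapest total tolerance of records flipped to reach r.
    """
    values = np.asarray(values)
    eps_hats = np.asarray(eps_hats, dtype=float)
    n = len(values)
    true_count = int(values.sum())
    # Flip the cheapest owners first: sorted tolerances of current 0s and 1s.
    zeros_cost = np.sort(eps_hats[values == 0])
    ones_cost = np.sort(eps_hats[values == 1])
    d = np.empty(n + 1)
    for r in range(n + 1):
        if r >= true_count:
            d[r] = -zeros_cost[: r - true_count].sum()  # raise count: flip 0s to 1s
        else:
            d[r] = -ones_cost[: true_count - r].sum()  # lower count: flip 1s to 0s
    weights = np.exp(0.5 * (d - d.max()))  # shift by max for numerical stability
    return weights / weights.sum()


def pe_count(values, eps_hats, rng: np.random.Generator) -> int:
    """Draw one noisy count from the PE output distribution."""
    probs = pe_count_distribution(values, eps_hats)
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(seed=3)
values = [1, 1, 0, 1, 0]
eps_hats = [0.2, 1.0, 0.5, 2.0, 0.1]  # hypothetical per-owner tolerances
print(pe_count_distribution(values, eps_hats).round(3), pe_count(values, eps_hats, rng))
```
      </preformat>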
    </sec>
    <sec id="sec-10">
      <title>B. Applicable Query Type</title>
      <p>
        With background knowledge, an adversary may launch linkage attacks
on a published query answer and eventually identify an individual
from it. Therefore, any query answered in this trading framework
should guarantee that its result does not reveal whether or not an
individual’s data are included. DP or PDP can prevent data linkage
attacks on the published results of statistical/linear aggregated
queries by introducing randomness. For these reasons, only
statistical/linear aggregated queries should be allowed in the
trading framework when privacy is guaranteed by DP or PDP. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] also adopted this query type in their proposed
theoretical framework.
      </p>
      <p>
        Definition 3.5 (Linear Query [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]). A linear query is a real-valued vector
q = (q1, q2, ..., qn). Its answer on a fixed-size data vector x is
the inner product q · x = q1x1 + ... + qnxn.
      </p>
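      <p>As a small illustration of Definition 3.5, a linear query is a weight vector whose answer is a dot product with the data vector. The weights, data, and function name below are hypothetical; in the framework, such an exact answer would be perturbed before release.</p>
      <preformat>
```python
import numpy as np


def answer_linear_query(q, x) -> float:
    """Exact answer q . x = q1*x1 + ... + qn*xn of a linear query."""
    q = np.asarray(q, dtype=float)
    x = np.asarray(x, dtype=float)
    if q.shape != x.shape:
        raise ValueError("query and data vector must have the same length")
    return float(q @ x)


x = [1, 0, 1, 1, 0]            # hypothetical 0/1 attribute of five owners
count_all = [1, 1, 1, 1, 1]    # counts owners whose attribute is 1
first_three = [1, 1, 1, 0, 0]  # restricts the count to owners 1 to 3
print(answer_linear_query(count_all, x), answer_linear_query(first_three, x))
```
      </preformat>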
    </sec>
    <sec id="sec-11">
      <title>C. Arbitrage-free Pricing Model</title>
      <p>
        Arbitrage-freeness is a requisite property for preventing savvy
data buyers from circumventing the query price. For instance, suppose
that a perturbed query answer with a larger ε1 = 1 costs $10 and one
with a smaller ε2 = 0.1 costs $0.1. A savvy buyer who seeks a
perturbed query answer with ε = 1 could instead buy the answer with
ε2 = 0.1 ten times and average the results: the total privacy loss
grows with the number of answers n as ε = n · ε2 = 1, the same as ε1.
This follows from the sequential composition theorem in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Under such a pricing function, the buyer would never pay $10
for an answer whose privacy level can be matched by several cheap
queries costing only $1 in total. In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], the arbitrage-free property is defined as
follows:
      </p>
      <p>
        Definition 3.6 (Arbitrage-free [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]). A pricing function π(Q) is
arbitrage-free if, for every multiset S = {Q1, ..., Qm} such that Q can be
determined from S (denoted S → Q),
<disp-formula><label>(5)</label><tex-math>\pi(Q) \leq \sum_{i=1}^{m} \pi(Q_i)</tex-math></disp-formula>
      </p>
      <p>
        An explanation and discussion of query determinacy (S → Q)
can be found in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
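      <p>The arbitrage example and Definition 3.6 can be checked with a little arithmetic. The sketch below assumes, as in the example above, that prices depend only on ε and that privacy losses add up under sequential composition. It flags a price list as open to arbitrage when n copies of a cheaper answer reach a target ε for less money than the target answer itself. This is only a necessary check in the spirit of Equation (5), not the full determinacy-based definition, and the function names are hypothetical.</p>
      <preformat>
```python
import math


def composed_epsilon(eps_each: float, n: int) -> float:
    """Total privacy loss of n answers under sequential composition."""
    return n * eps_each


def find_arbitrage(prices: dict[float, float]):
    """Return (target_eps, cheap_eps, n, cost) if n copies of a cheaper answer
    reach target_eps in total for less money than the target answer.
    Returns None when the price list admits no such combination."""
    for target_eps, target_price in prices.items():
        for cheap_eps, cheap_price in prices.items():
            if cheap_eps >= target_eps:
                continue
            n = math.ceil(round(target_eps / cheap_eps, 9))  # copies needed
            cost = n * cheap_price
            if target_price > cost:
                return target_eps, cheap_eps, n, cost
    return None


prices = {1.0: 10.0, 0.1: 0.1}  # the example prices from the text
print(composed_epsilon(0.1, 10))  # total epsilon of ten eps=0.1 answers
print(find_arbitrage(prices))     # ten $0.1 answers undercut the $10 answer
```
      </preformat>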
      <p>
        Arbitrage-free pricing function: [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] proved that a pricing function
π(Q) can be made equal to the sum of all payments made to data
owners if the framework is balanced. A framework is balanced if
(1) the pricing function π and the payment function to data owners
are arbitrage-free, and (2) the query price is cost-recovering,
meaning that the query price is no less than the amount needed to
compensate all data owners. In our framework, we adopt their
arbitrage-free property by ensuring that the query price Wq always
exceeds the total compensation given to all data owners (whose data
are accessed) for their actual privacy losses εi.
      </p>
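      <p>The cost-recovering requirement can be written as a one-line check per transaction. In the sketch below, the compensation amounts are hypothetical numbers standing in for wi(εi), and the helper names are ours. The strict inequality mirrors the requirement that the query price Wq exceed the total compensation.</p>
      <preformat>
```python
def total_compensation(compensations) -> float:
    """Sum of w_i(eps_i) over all data owners whose data were accessed."""
    return float(sum(compensations))


def is_cost_recovering(query_price: float, compensations) -> bool:
    """True when the query price W_q exceeds the total compensation owed."""
    return query_price > total_compensation(compensations)


owed = [0.08, 0.20, 0.05]  # hypothetical w_i(eps_i) values for three owners
print(is_cost_recovering(0.40, owed), is_cost_recovering(0.30, owed))
```
      </preformat>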
      <p>For simplicity, a buyer may not request the same query more than
once: each data owner has his or her own εˆi, and we must guarantee
that the owner’s cumulative privacy loss never exceeds it.
Alternatively, the market maker could predefine the set of queries
that buyers can ask, so that relationships among all queries can be
studied in advance to prevent arbitrage. However, this also limits
the choice of queries buyers can request, so our framework allows
buyers to ask any linear aggregated query, but only once per query.</p>
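      <p>One way a market maker might enforce the once-per-query rule is to keep a registry of the linear queries each buyer has already purchased. The sketch below is hypothetical (the class and method names are ours). It keys the registry per buyer, following the wording above, and treats two queries as identical only when their weight vectors match exactly, so it does not detect queries that can be derived from earlier ones.</p>
      <preformat>
```python
class QueryRegistry:
    """Remembers which linear queries each buyer has already purchased."""

    def __init__(self) -> None:
        self._seen: dict[str, set[tuple[float, ...]]] = {}

    def try_register(self, buyer_id: str, query) -> bool:
        """Record the query and return True, or return False for a repeat."""
        key = tuple(float(w) for w in query)  # hashable form of the weight vector
        answered = self._seen.setdefault(buyer_id, set())
        if key in answered:
            return False
        answered.add(key)
        return True


registry = QueryRegistry()
print(registry.try_register("b1", [1, 1, 0]))  # first request is accepted
print(registry.try_register("b1", [1, 1, 0]))  # identical request is refused
print(registry.try_register("b2", [1, 1, 0]))  # tracked separately per buyer
```
      </preformat>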
    </sec>
    <sec id="sec-12">
      <title>D. Truthful Privacy Valuation</title>
      <p>
        Untruthful privacy valuation is an undesirable behavior that leads
to unacceptably high query prices. Without carefully designed payment
schemes, savvy data owners will try to choose whatever option gives
them the most benefit, so they may intentionally report an
unreasonably high privacy valuation. For instance, [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] applied a linear payment scheme (wi =
ci ∗ ε) and allowed each data owner to define ci. For the same ε,
most data owners would then set very high ci values to maximize
their benefits.
      </p>
      <p>
        To encourage truthful privacy valuation, each data owner should be
offered a payment scheme suited to his or her privacy/risk attitude,
so that untruthful valuations do not increase the owner’s benefit, as
illustrated in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Proposition 3.7 (Payment Scheme). A payment scheme is a
non-decreasing function w : ε → R+ representing an agreement between
the market maker and a data owner on how much the data owner should
be compensated for his or her actual privacy loss εi. Any
non-decreasing function can serve as a payment scheme. For instance,
• Type A: This logarithmic function is designed to favor
conservative (low-risk, low-return) data owners whose εˆ is small:
<disp-formula><label>(6)</label><tex-math>w = \frac{\log(30)\,\ln(9000\varepsilon + 1)}{130}</tex-math></disp-formula>
• Type B: This sublinear function is designed to favor liberal
(high-risk, high-return) data owners whose εˆ is large:
<disp-formula><label>(7)</label><tex-math>w = \frac{8\varepsilon}{\sqrt{1100 + 500\varepsilon^{2}}}</tex-math></disp-formula></p>
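      <p>Equations (6) and (7) can be evaluated directly to compare the two schemes. The sketch below assumes that log(30) in Equation (6) is a base-10 logarithm, because the base is not stated; the function names are ours. Under that assumption, Type A pays more at small ε (e.g., ε = 0.1) and Type B pays more at larger ε (e.g., ε = 1), consistent with the conservative and liberal orientations described above.</p>
      <preformat>
```python
import math


def payment_type_a(eps: float) -> float:
    """Type A (logarithmic), Eq. (6); assumes log(30) means log10(30)."""
    return math.log10(30) * math.log(9000 * eps + 1) / 130


def payment_type_b(eps: float) -> float:
    """Type B (sublinear), Eq. (7): 8*eps / sqrt(1100 + 500*eps^2)."""
    return 8 * eps / math.sqrt(1100 + 500 * eps**2)


for eps in (0.1, 0.5, 1.0, 2.0):
    print(f"eps={eps}: A={payment_type_a(eps):.4f}  B={payment_type_b(eps):.4f}")
```
      </preformat>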
      <p>For our framework, we designed two different types of payment
schemes, as illustrated in Figure 6. Each data owner selects a
payment scheme based on his or her privacy level εˆ or risk
orientation. Therefore, data owners have no reason to report their
privacy valuation εˆ untruthfully, because doing so would not give
them any benefit. The market maker designs the payment schemes, and
their design should mainly follow the equilibrium theory of supply
and demand. In the present study, we consider only two types of
functions to provide different options for conservative and liberal
data owners; we will develop more sophisticated schemes in future
work.</p>
    </sec>
    <sec id="sec-13">
      <title>E. Unbiased Result</title>
      <p>Besides privacy protection and price optimization, an unbiased
result is a crucial factor in trading. Buyers do not want a result
that is biased or significantly different from the true result, so it
is important to ensure that the result is unbiased.</p>
      <p>In our setting, we obtain an unbiased (or less biased) result by
selecting data owners uniformly at random, so that liberal and
conservative data owners are equally likely to be selected. Under the
PDP assumption, a data owner’s εˆi value is not correlated with the
sensitivity of the data, so random selection is well suited to
obtaining a less biased result.</p>
      <p>
        Moreover, to optimize the query price, it is necessary to select
a representative sample from a dataset because paying each
individual data owner in the dataset (as in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]) leads to very high query prices for the same level of data
utility. Thus, sampling a good representative subset is very useful.
We apply a statistical sampling method to compute the number of data
owners required for a representative sample of a given dataset. A
similar concept is employed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
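      <p>Algorithm 1 in Section 4.2 computes this size as SS = DT · CLS² / MER², followed by the finite-population correction |RS| = SS · |x| / (SS + |x| − 1). This is the standard Cochran sample-size formula. The Python sketch below makes several assumptions: CLS is the z-score of the confidence level (1.96 for 95%), DT is p(1 − p) with the conservative choice p = 0.5, and the size is rounded up. These defaults and the helper names are ours, not values given by the authors.</p>
      <preformat>
```python
import math
import random


def representative_sample_size(n_owners: int, cls: float = 1.96,
                               dt: float = 0.25, mer: float = 0.05) -> int:
    """|RS| from SS = DT * CLS^2 / MER^2 with a finite-population correction."""
    ss = dt * cls ** 2 / mer ** 2                            # infinite-population size
    return math.ceil(ss * n_owners / (ss + n_owners - 1))    # corrected for |x|


def sample_h_subsets(n_owners: int, size: int, h: int, seed: int = 0) -> list[list[int]]:
    """Draw h subsets of owner indices; no owner appears twice within a subset."""
    rng = random.Random(seed)
    return [rng.sample(range(n_owners), size) for _ in range(h)]


size = representative_sample_size(243_000)   # |x| of the experimental dataset
subsets = sample_h_subsets(243_000, size, h=5)
print(size, [len(s) for s in subsets])
```
      </preformat>
      <p>For the 243,000-record experimental dataset, these defaults give |RS| = 384, so each of the h subsets contains only a small fraction of the data owners.</p>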
      <p>
        A personal data trading framework should adopt these five key
principles to avoid the issues above and to obtain better results.
However, a similar study by [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] did not consider all of these key principles. First, data
owners cannot personalize their privacy levels, as they are assumed to
accept unlimited privacy losses as long as more money is paid. Second,
the mechanism cannot efficiently reduce query prices, because each query
is computed on the entire dataset. Third, under a linear payment scheme,
data owners can easily misreport their privacy valuations to maximize
their payments.
      </p>
    </sec>
    <sec id="sec-14">
      <title>3.2 Personal Data Trading Framework</title>
      <p>To balance data owners' privacy losses against data buyers'
payments and thus guarantee a fair trade, we propose a personal data
trading framework (see Figure 7) that involves three main participants:
the market maker, data owners, and data buyers.</p>
      <p>The market maker is a mediator between data buyers and data
owners and has three coordinating roles. First, it serves as a trusted
server that answers a data buyer's query by accessing the data elements
of data owners. Second, it computes and distributes payments to the
data owners whose data have been accessed, while keeping a small cut χ
of the price as profit. Third, it devises the payment schemes from
which data owners choose. Our pricing mechanism is designed to assist
the market maker with these tasks.</p>
      <p>
        A data owner sells his or her data element xi by selecting a
maximum tolerable privacy loss ε̂i and a payment scheme wi. In DP, ε
is a non-negative real value, and it is difficult to choose a value that
yields an exact level of utility. However, [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] studied an economic method of setting ε. We therefore assume
that a good user interface helps data owners understand and choose
their ε̂i.
      </p>
      <p>
        A data buyer purchases an aggregated query answer from the
market maker by specifying a query Q and a maximum budget
Wmax. Rather than asking the buyer to specify the variance of the
query answer, as in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], we design our mechanism to return the result with the least
noise that the given budget Wmax can buy, because data buyers are
unlikely to know which variance yields their desired utility within a
limited budget. Addressing this issue in the mechanism therefore helps
both buyers and the market maker.
      </p>
      <p>Our framework works as follows. Each data owner ui(xi, ε̂i, wi),
i ∈ [1, n], sells his or her data element xi on the condition that the
actual privacy loss εi does not exceed the specified ε̂i and that the
payment follows the selected payment scheme wi. These data elements are
stored by the trusted market maker. In the pretrading stage, the data
buyer issues a purchase request specifying Q and Wmax. Upon this
request, the market maker runs a simulation and generates a price menu
(see Table 2) that lists, for each price, the average privacy loss ε̄
and the sample size. This price menu gives the buyer an overview of
the approximate level of utility he or she may receive for each
price.</p>
      <p>The buyer reviews ε̄ and decides how much money he or she is
willing to pay. Once notified of the purchase decision, the market
maker runs the pricing mechanism (described in Section 4) to select a
representative sample RS from the dataset x, and then computes the
query answer, perturbing it to guarantee privacy for all data owners
whose data were accessed. Next, the market maker distributes payments
to the data owners in the selected sample RS and returns the perturbed
query answer P(Q(x)), the remaining budget Wr, the size of RS, and the
root mean squared error RMSE of the query answer. Note that the
transaction aborts when the market maker cannot satisfy the
requirements of the buyer and the data owners simultaneously.</p>
    </sec>
    <sec id="sec-15">
      <title>4 PRICING MECHANISM</title>
      <p>
        The pricing mechanism directs the price and query computations for
data buyers and the compensation computation for data owners whose
data have been accessed. A specially designed pricing mechanism is
required in the personal data market because information derived
from personal data, unlike physical goods, has no tangible
properties, which makes it difficult to set a price or calculate the
traded value, as asserted in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Similarly, [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] discussed why some conventional pricing models (namely, the
cost-based and competition-based pricing models) cannot price
digitalized goods such as data and information. As
noted in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], the only feasible pricing model is the value-based
pricing model, in which the price is set based on the value that
the buyer perceives. In our framework, the utility of query results
determines the price, and this utility is closely tied to each data
owner's level of privacy loss.
      </p>
    </sec>
    <sec id="sec-16">
      <title>4.1 Baseline Pricing Mechanism</title>
      <p>
        For simplicity, the baseline pricing mechanism computes the query
price, compensation, and perturbed query result without any sampling
procedure: it uses the entire dataset x to guarantee an unbiased
result. It implements a simple personalized differentially private
mechanism, the Minimum mechanism [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which satisfies ε̂i-PDP by injecting random noise X drawn
from a Laplace distribution with scale b, denoted X ∼ Lap(b), where
b = 1/Min(ε̂1, ε̂2, ..., ε̂n). Its run-time is much shorter than that
of the more sophisticated balanced pricing mechanism, yet it charges a
higher query price for a noisier result. Because it does not allocate
compensation and perturbation individually, it compensates every data
owner ui ∈ x for the same privacy loss ε̂min and gives every ui ∈ x
only that minimum privacy loss ε̂min, resulting in very low utility
(more noise). To obtain better results, we propose a balanced pricing
mechanism that addresses these weaknesses of the baseline pricing
mechanism.
      </p>
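      <p>The baseline can be sketched in a few lines of Python. The sketch assumes a count query with sensitivity 1, so that b = 1/ε̂min. Because the standard library has no Laplace sampler, it draws the noise as the difference of two exponential variates. It also applies the hypothetical payment function w(ε) = 10√ε to every owner. The function names and the synthetic data are illustrative.</p>
      <preformat>
```python
import math
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def pay(eps: float) -> float:
    """Hypothetical payment scheme w(eps), identical for every owner here."""
    return 10 * math.sqrt(eps)


def baseline_minimum(values, eps_hats, rng):
    """Minimum mechanism on the whole dataset; every owner is paid for eps_min."""
    eps_min = min(eps_hats)
    noisy_count = sum(values) + laplace(1 / eps_min, rng)   # count query, sensitivity 1
    query_price = len(values) * pay(eps_min)                 # same compensation for all
    return noisy_count, query_price, eps_min


rng = random.Random(42)
values = [1, 0, 1, 1] * 250            # 1,000 synthetic owners, true count 750
eps_hats = [0.1, 0.3, 0.7, 0.9] * 250
runs = [baseline_minimum(values, eps_hats, rng) for _ in range(5000)]
rmse = math.sqrt(sum((count - 750) ** 2 for count, _, _ in runs) / len(runs))
print("price:", round(runs[0][1], 2), "eps_min:", runs[0][2], "RMSE:", round(rmse, 1))
```
      </preformat>
      <p>Every owner is paid for ε̂min = 0.1, even though three quarters of them would tolerate more. The RMSE should stay close to √2/ε̂min ≈ 14.1, the standard deviation of Lap(1/ε̂min).</p>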
    </sec>
    <sec id="sec-17">
      <title>4.2 Balanced Pricing Mechanism</title>
      <p>In the balanced pricing mechanism, the computation is carried out
by three main algorithms: (1) sampling h subsets of data owners, (2)
computing the query price and compensation for all h subsets, and (3)
perturbing the query answer for all h subsets and then selecting an
optimal subset.</p>
      <p>Algorithm 1 samples h subsets of data owners. Given a dataset x, a
confidence level score CLS, a distribution of the selection DT, and a
margin of error MER, it computes the size |RS| of a representative
sample of x using a statistical sampling method. The mechanism then
randomly selects distinct (non-duplicated) data owners for each of the
h subsets. Because data owners are selected at random, increasing h
improves the sampling result, since the optimal subset RS is chosen
from among all h subsets. The output of this algorithm is a set of
samples (RS1, RS2, ..., RSh) used as input to Algorithm 2.</p>
      <p>Algorithm 1: Sample h subsets of data owners</p>
      <preformat>Input: x, DT, CLS, MER, and h
Output: (RS1, RS2, ..., RSh)
SS ← (DT ∗ CLS²) / MER²;
|RS| ← (SS ∗ |x|) / (SS + |x| − 1);
while h &gt; 0 do
    RSh ← {ui | UndupRandomize(1, |x|), i ∈ [1, |RS|]};
    h ← h − 1;
end</preformat>
      <p>Algorithm 2 computes the query price and compensation for all h
subsets. Given a data buyer's maximum budget Wmax, query Q, dataset x,
number of samples h, number of perturbations Φ, the market maker's
benefit χ, and the h subsets (RS1, RS2, ..., RSh) from Algorithm 1, the
algorithm returns the query price Wp, the remaining budget Wr (if
applicable), the compensation wi(εi) for each ui, and the average
compensation w̄ of each subset, which Algorithm 3 uses to select an
optimal subset from all h subsets. The algorithm first computes the
available budget Wab by subtracting χ from Wmax. Next, it computes the
total payment Wp required to pay each ui ∈ RSh for his or her maximum
privacy loss ε̂i, where wui(ε̂ui) denotes the payment to data owner ui
in RSh for ε̂i. When Wp does not exceed Wab, the algorithm pays each ui
for ε̂i and uses the remaining budget Wr to add more data owners to
RSh, paying each new owner for ε̂i or, when Wr is insufficient, for some
εi &lt; ε̂i. This process repeats until Wr = 0 or |RSh| = |x|, because
the utility depends on both the size of RS and the privacy loss εi of
every ui. Otherwise, when Wp &gt; Wab, the algorithm determines an equal
payment Weq for each ui ∈ RSh and then checks whether each ui should be
paid exactly Weq or less, namely wui(ε̂ui) when wui(ε̂ui) ≤ Weq; the
budget left over by such owners is redistributed among the underpaid
owners. The updated (RS1, RS2, ..., RSh) are passed to Algorithm 3.</p>
      <p>Algorithm 2: Compute query price and compensation for all h subsets</p>
      <preformat>Input: x, (RS1, RS2, ..., RSh), Wmax, χ, h, and Φ
Output: Wp, Wr, and (w̄1, w̄2, ..., w̄h)
Wab ← Wmax − χ;
while h &gt; 0 do
    j ← |RSh|;
    Wp ← Σ_{ui ∈ RSh} wui(ε̂ui);
    if Wp ≤ Wab then
        while j &lt; |x| and Wp &lt; Wab do
            Wr ← Wab − Wp;
            uk ← UndupRandomize(1, |x|); RSh ← RSh ∪ {uk};
            j ← j + 1;
            if Wr &gt; wuk(ε̂uk) then
                Wp ← Wp + wuk(ε̂uk);
                εuk ← ε̂uk;
            else
                Wp ← Wp + Wr;
                wuk ← Wr;
                εuk ← wuk⁻¹(Wr);
            end
        end
        Wr ← Wab − Wp;
    else
        lsTemp ← RSh;
        payment ← 0;
        Wr ← 0;
        Weq ← Wab / |lsTemp|;
        do
            lsUnderPaid ← ∅;
            foreach ui ∈ lsTemp do
                if wui(ε̂ui) ≤ Weq then
                    εui ← ε̂ui;
                    payment ← payment + wui(ε̂ui);
                else
                    wui ← Weq;
                    εui ← wui⁻¹(Weq);
                    lsUnderPaid ← lsUnderPaid ∪ {ui};
                    payment ← payment + Weq;
                end
            end
            Wr ← Wab − payment;
            Weq ← Weq + Wr / |lsUnderPaid|;
            lsTemp ← lsUnderPaid;
        while Wr &gt; 0;
        Wp ← Wab;
    end
    h ← h − 1;
end</preformat>
      <p>With the output of Algorithm 2, Algorithm 3 perturbs the query
answer and selects an optimal subset from all h subsets. For each
subset RSm, it computes the average privacy loss ε̄m; the perturbed
query results P(Q(RSm))k, obtained by multiplying the output of the PE
mechanism by |x|/|RSm| to account for the proportional difference
between x and RSm; and the RMSE of these results. It then selects as
the optimal RS the subset with the maximum average privacy loss εmax,
since a higher average privacy loss indicates a high probability that
less random noise is included in the result. Finally, the algorithm
returns the corresponding RMSEopt, P(Q(RS))opt, and w̄opt of the
selected RS.</p>
      <p>Algorithm 3: Perturb the query answer and select an optimal subset</p>
      <preformat>foreach RSm, m ∈ [1, h] do
    ε̄m ← average of εi over ui ∈ RSm;
    P(Q(RSm))k ← {PE_Q(x, RSm) ∗ |x| / |RSm| | k ∈ [0, Φ − 1]};
    RMSEm ← sqrt( Σ_{k=0}^{Φ−1} (Q(x) − P(Q(RSm))k)² / Φ );
end
εmax ← Max(ε̄1, ε̄2, ..., ε̄h);
optIndex ← {index | ε̄index = εmax};
RMSEopt ← RMSEoptIndex;
P(Q(RS))opt ← P(Q(RS))optIndex;
w̄opt ← w̄optIndex;</preformat>
      <p>At the end, the data buyer receives the perturbed query answer
P(Q(RS)) along with the remaining budget Wr (when applicable), the
number of data owners in RS, and the root mean squared error RMSE of
the query answer. Data owners are then compensated according to their
actual privacy losses εi.</p>
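      <p>The Python sketch below mirrors the budget allocation of Algorithm 2 for a single subset RSh, under our reading of the listing. It assumes that each owner carries a payment function and its inverse, using the hypothetical schemes w(ε) = 10√ε and w(ε) = 20ε². In each pass of the redistribution loop it recomputes the equal share from the budget still unspent, which is our interpretation of the Weq update. The class Owner and the function allocate are illustrative names, not the authors' code.</p>
      <preformat>
```python
import math
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(eq=False)                   # compare owners by identity, not by value
class Owner:
    eps_hat: float                     # maximum tolerable privacy loss
    pay: Callable[[float], float]      # payment scheme w(eps)
    inv: Callable[[float], float]      # its inverse w^-1(W)
    eps: float = 0.0                   # privacy loss actually bought
    paid: float = 0.0                  # compensation received


def allocate(subset, pool, w_max, chi, rng):
    """Allocate the budget for one subset RS_h; return (W_p, W_r, RS_h, mean pay)."""
    w_ab = w_max - chi                                    # available budget
    full = sum(o.pay(o.eps_hat) for o in subset)          # cost of paying every eps_hat
    if w_ab >= full:
        for o in subset:                                  # everyone receives eps_hat
            o.eps, o.paid = o.eps_hat, o.pay(o.eps_hat)
        w_p = full
        chosen = {id(o) for o in subset}
        rest = [o for o in pool if id(o) not in chosen]
        rng.shuffle(rest)
        for o in rest:                                    # spend leftovers on new owners
            w_r = w_ab - w_p
            if not w_r > 0:
                break
            if w_r > o.pay(o.eps_hat):
                o.eps, o.paid = o.eps_hat, o.pay(o.eps_hat)
            else:                                         # last owner: partial loss
                o.eps, o.paid = o.inv(w_r), w_r
            w_p += o.paid
            subset.append(o)
    else:
        pending, budget = list(subset), w_ab
        while pending:                                    # redistribute equal shares
            share = budget / len(pending)
            capped = [o for o in pending if o.pay(o.eps_hat) > share]
            if len(capped) == len(pending):               # nobody fits: pay share each
                for o in capped:
                    o.eps, o.paid = o.inv(share), share
                break
            for o in pending:
                if not o.pay(o.eps_hat) > share:          # fully paid for eps_hat
                    o.eps, o.paid = o.eps_hat, o.pay(o.eps_hat)
                    budget -= o.paid
            pending = capped
        w_p = sum(o.paid for o in subset)
    return w_p, w_ab - w_p, subset, w_p / len(subset)


# Hypothetical payment schemes as (w, w^-1) pairs.
A = (lambda e: 10 * math.sqrt(e), lambda w: (w / 10) ** 2)
B = (lambda e: 20 * e ** 2, lambda w: math.sqrt(w / 20))


def make_pool():
    """100 synthetic owners: type A below eps_hat 0.5, type B above."""
    return [Owner(e, *(A if 0.5 > e else B)) for e in [0.1, 0.3, 0.7, 0.9] * 25]


rng = random.Random(3)
for w_max in (20.0, 52.0, 1000.0):
    pool = make_pool()
    w_p, w_r, rs, w_bar = allocate(rng.sample(pool, 10), pool, w_max, 2.0, rng)
    print(w_max, len(rs), round(w_p, 2), round(w_r, 2), round(w_bar, 2))
```
      </preformat>
      <p>A small budget caps every owner at an equal share, so each receives a privacy loss below his or her ε̂. A large budget pays every ε̂ in full and keeps adding owners until the pool is exhausted.</p>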
    </sec>
    <sec id="sec-18">
      <title>5 EXPERIMENT</title>
      <p>Experimental setup: We divide the experiment into two
components: (1) the simulation of our balanced pricing mechanism and
(2) the comparison of our mechanism with the baseline pricing
mechanism. We examine the query price Wp, the root mean squared error
RMSE, the average privacy loss ε̄, and the average compensation w̄ that
each ui obtains under both mechanisms, and then determine which
mechanism generates the smaller RMSE for the same Wp. Due to space
constraints, we show only the results of the following count query Q:
"How many people commute by personal car in the USA?"</p>
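      <p>For a count query like this one, the perturb-and-select step of Algorithm 3 can be sketched in Python as follows. The authors' exponential-like PE mechanism is defined elsewhere in the paper and is not reproduced here. As a clearly labeled stand-in, perturb applies the Minimum mechanism (Laplace noise with scale 1/min εi) to the subset count. The synthetic data, the three hard-coded privacy-loss profiles, and all function names are assumptions. The sketch computes the RMSE only for the selected subset, which yields the same outputs as computing it for every subset first.</p>
      <preformat>
```python
import math
import random


def perturb(values, eps, rng):
    """Stand-in for PE: subset count plus Laplace(1/min eps) noise."""
    b = 1 / min(eps)
    return sum(values) + rng.expovariate(1 / b) - rng.expovariate(1 / b)


def evaluate(x, rs, eps, phi, rng):
    """Phi scaled answers P(Q(RS_m))_k and their RMSE against the true Q(x)."""
    vals = [x[i] for i in rs]
    scale = len(x) / len(rs)                      # proportional difference |x|/|RS_m|
    answers = [perturb(vals, eps, rng) * scale for _ in range(phi)]
    truth = sum(x)
    return answers, math.sqrt(sum((truth - a) ** 2 for a in answers) / phi)


def select_optimal(x, subsets, eps_lists, phi, rng):
    """Choose the subset with the largest average privacy loss, as in Algorithm 3."""
    averages = [sum(e) / len(e) for e in eps_lists]
    opt = max(range(len(subsets)), key=averages.__getitem__)
    answers, rmse = evaluate(x, subsets[opt], eps_lists[opt], phi, rng)
    return opt, averages[opt], answers, rmse


rng = random.Random(11)
x = [1] * 4000 + [0] * 6000                      # synthetic: 4,000 of 10,000 answer "car"
subsets = [rng.sample(range(len(x)), 500) for _ in range(3)]
eps_lists = [[0.1] * 250 + [0.9] * 250, [0.3] * 500, [0.7] * 500]
opt, eps_bar, answers, rmse = select_optimal(x, subsets, eps_lists, phi=200, rng=rng)
print("optimal subset:", opt, "avg eps:", round(eps_bar, 2), "RMSE:", round(rmse, 1))
```
      </preformat>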
      <p>
        Data preparation: From our survey, we obtained 486 records
of personal data from 486 data owners. Because a larger dataset is
preferable for the experiment, we duplicated the survey dataset 500
times to obtain a dataset x of 243,000 records. Each data record must
have two variables: the maximum tolerable privacy loss ε̂ and a payment
scheme w. For simplicity, we assume ε̂ ∈ [0, 1] and two types of
payment schemes (as described in Section 3.1). We generate these two
variables for each record (data owner) from the survey answers. Data
owners who chose very high or high perturbation are classified as
conservative, and their ε̂i values are set to 0.1 and 0.3,
respectively. For low and very low perturbation, the ε̂i values are set
to 0.7 and 0.9, respectively, and these data owners are classified as
liberal. To maximize their benefits, we assign each data owner the
payment scheme that is most favorable for his or her ε̂i: type A for
the conservative group (ε̂i of 0.1 and 0.3) and type B for the liberal
group (ε̂i of 0.7 and 0.9). This yields a large, usable dataset for our
experiment.
      </p>
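      <p>The mapping from survey answers to (ε̂i, payment scheme) pairs, followed by the 500-fold duplication, can be sketched in Python as follows. The answer labels and the function name are hypothetical, and the demo uses a made-up list of answers rather than the actual survey data.</p>
      <preformat>
```python
from collections import Counter

# Survey answer -> (eps_hat, payment-scheme type), as described in the text.
PROFILE = {
    "very high": (0.1, "A"),   # conservative
    "high": (0.3, "A"),        # conservative
    "low": (0.7, "B"),         # liberal
    "very low": (0.9, "B"),    # liberal
}


def prepare(answers: list[str], copies: int = 500) -> list[tuple[float, str]]:
    """Map each perturbation preference to (eps_hat, scheme) and replicate it."""
    records = [PROFILE[a] for a in answers]
    return records * copies


demo = prepare(["very high", "low", "high", "very low", "low"])   # made-up answers
print(len(demo), Counter(scheme for _, scheme in demo))
```
      </preformat>
      <p>With 486 answers, the default of 500 copies yields the 243,000 records used in the experiment.</p>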
      <p>Experiment results: We first run a simulation of our mechanism
(Figure 8) to examine the correlations between the query price and
RMSE, between the query price and the average privacy loss ε̄, and
between the query price and the average compensation. Figure 8a shows
that the RMSE decreases as the query price increases, which is
reasonable in practice: the higher the query price, the lower the RMSE
should be. Notably, the RMSE declines dramatically as the query price
rises from $5 to $50, but only gradually and slightly from $50 to
$1000. We attribute this to the privacy parameter εi of each data owner
ui and to the number |RS| of data owners responding to the query. When
the query price is approximately $50 or less, it can only cover the
compensation of RS, so for a fixed |RS|, an increase in the query price
(i.e., from $5 to $50) increases the ε̄ value in RS. However, once the
query price exceeds what is needed to pay for ε̂i for all ui in RS, the
remaining budget is used to include more data owners in RS, which,
depending on the distribution of the data, can decrease the overall
RMSE significantly or only marginally while increasing ε̄. When more
conservative data owners are included in RS, they can pull down ε̄,
resulting in only a minor decrease in RMSE despite more money being
spent. For this reason, the price menu plays a crucial role in giving
an overview of the approximate change in RMSE for each query price, so
that data buyers can decide whether a minor decrease in RMSE is worth
the extra money within their available budgets. Figure 8b and Figure 8c
show similar correlations between the query price and ε̄ and between
the query price and w̄: the higher the query price, the higher the ε̄
and w̄ values. The marginal decrease in ε̄ despite a significant rise in
query price ($100 to $1000) shown in Figure 8b can be attributed to the
same phenomenon seen in Figure 8a, where the RMSE decreases only
slightly despite a significant price increase.</p>
      <p>Figure 8: (a) Query price and RMSE. (b) Query price and ε̄. (c) Query price and average compensation.</p>
      <p>We next compare the results of our balanced pricing mechanism
with those of the baseline pricing mechanism (Figure 9). The
experimental results show that our balanced pricing mechanism
considerably outperforms the baseline mechanism under almost all
conditions. Figure 9a shows that our balanced mechanism produced a
noticeably smaller RMSE than the baseline mechanism for the same query
price. In particular, it produced a significantly smaller RMSE even
when the query price was relatively low (e.g., $5), because instead of
querying the entire dataset, our balanced pricing mechanism queries
only a representative sample RS, which reduces the query price while
still yielding a smaller RMSE. Because of the random noise drawn from
the Laplace distribution, the RMSE of the baseline mechanism rises
rather than declines for query prices from $50 to $100. Figure 9b and
Figure 9c show a similar pattern: the ε̄ and w̄ of our balanced pricing
mechanism are significantly higher than those of the baseline
mechanism.</p>
    </sec>
    <sec id="sec-19">
      <title>6 DISCUSSION</title>
      <p>The experimental results above show that our balanced pricing
mechanism considerably outperforms the baseline pricing mechanism, for
two main reasons. First, we apply an exponential-like PE mechanism (see
Definition 3.4) to achieve Personalized Differential Privacy (PDP),
which takes advantage of the individual privacy parameter ε̂ of each
data owner, especially in the liberal group. In contrast, the baseline
mechanism can achieve PDP only through the Minimum mechanism, which
adds a large amount of random noise drawn from a Laplace distribution
calibrated to the smallest ε̂ in the entire dataset. Second, our
mechanism produces a considerably smaller RMSE for the same query
price. In other words, for the same level of utility, it reduces the
query price, because it queries only a small subset of the dataset
while still generating unbiased results through random sampling and
selection. We thus compensate only the data owners of the queried
subset, whereas the baseline mechanism must compensate all data owners
in the dataset because it queries the entire dataset to obtain
unbiased results. Therefore, our balanced pricing mechanism is more
efficient than the baseline mechanism.</p>
      <p>In the price menu, it is important that higher prices correspond to
higher levels of approximate utility (denoted by ε̄). However, Figure 8b
shows a slight decrease in ε̄ from $100 to $1000. This could be
attributed to the number of samplings h applied in the mechanism:
because data owners with various ε̂ values are selected at random, a
larger budget cannot fully guarantee a larger ε̄. Our naive solution is
therefore to widen the price gaps in the price menu so that ε̄
increases noticeably with the query price. We will discuss this point
further in future work.</p>
      <p>
        It is also crucial to ensure that data owners can choose an
appropriate maximum tolerable privacy loss ε̂i that reflects their
privacy attitude and risk orientation. How to set the value of ε (or
ε̂ in our setting) remains an open question in the differential
privacy community. Although [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed an economic method for choosing ε, this problem has
not been widely discussed. As a partial solution, we provide the
options ε̂ ∈ {0.1, 0.3, 0.7, 0.9}, corresponding to {very high, high,
low, very low} data perturbation levels. Very high perturbation
(i.e., ε̂ = 0.1) means that more random noise is added to the result,
so the data owners have a very strong privacy guarantee. However,
some data owners might not understand how the perturbation works, so
we can provide an interactive interface that lets them see the
approximate change to their actual data for different values of ε̂. A
similar interactive interface<sup>3</sup> is used to explain
perturbation via the Laplace mechanism. We can thus create a similar
interface for the exponential-like data perturbation mechanism to help
data owners and buyers understand the meaning of ε̂.
      </p>
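      <p>As a rough aid for such an interface, the Python sketch below tabulates how much Laplace noise each offered ε̂ implies for a count query, assuming sensitivity 1. For Lap(b) with b = 1/ε̂, the expected absolute noise is b and the standard deviation is √2·b. The sketch uses the Laplace mechanism mentioned above, not the exponential-like PE mechanism, and the function name is illustrative.</p>
      <preformat>
```python
import math

LEVELS = [("very high", 0.1), ("high", 0.3), ("low", 0.7), ("very low", 0.9)]


def laplace_noise_profile(eps: float, sensitivity: float = 1.0) -> tuple[float, float]:
    """Expected |noise| and standard deviation of Lap(sensitivity / eps)."""
    b = sensitivity / eps
    return b, math.sqrt(2) * b


for level, eps in LEVELS:
    mean_abs, std = laplace_noise_profile(eps)
    print(f"{level:>9} perturbation, eps_hat={eps}: E|noise| = {mean_abs:.1f}, std = {std:.1f}")
```
      </preformat>
      <p>Very high perturbation (ε̂ = 0.1) adds noise with a typical magnitude of about 10 to the count, whereas very low perturbation (ε̂ = 0.9) adds roughly 1.</p>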
    </sec>
    <sec id="sec-20">
      <title>7 RELATED WORK</title>
      <p>
        In the field of pricing mechanism design, research has focused on
two main approaches: auction-based pricing and query-based pricing.
Auction-based pricing has attracted the attention of [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
and [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Auction-based pricing allows data owners to report their
data valuations and data buyers to place bids. From a practical
point of view, however, it is very difficult for individuals to
articulate their data valuations, as reported in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Moreover, the price described in
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] is ultimately determined by the data buyer without
considering data owners' privacy valuations or actual privacy losses.
On the other hand, query-based pricing, as defined in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], automatically derives the prices of queries from given data
valuations. The authors of [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] also propose a flexible arbitrage-free
query-based pricing model that assigns prices to arbitrary queries
based on predefined prices of views. Despite this flexibility, the
price is non-negotiable: the buyer obtains a query answer only
when he or she is willing to pay the full price. Moreover, this model
is not applicable to personal data trading, as it does not address
privacy preservation. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] extended and adapted the
model by applying differential privacy for privacy preservation
and for the quantification of data owners' privacy losses, yet this
method still presents a number of problems, as explained in Section
3.1.
      </p>
    </sec>
    <sec id="sec-21">
      <title>8 CONCLUSION AND FUTURE WORK</title>
      <p>We analyzed people's privacy attitudes and levels of interest in
data trading, and then identified five key principles for designing a
reasonable personal data trading framework. For an operational market,
we proposed a reasonable personal data trading framework based on
these key principles. In addition, we proposed a balanced pricing
mechanism that balances money with privacy to offer more utility to
both data owners and data buyers without circumvention. Finally, we
conducted experiments to simulate our mechanism and to demonstrate that
it is considerably more efficient than a baseline pricing mechanism.
The results indicate that our study identifies and tackles some
fundamental challenges facing the personal data market, and thus helps
make such a market feasible.</p>
      <p>Having investigated the challenges of this market, we identify
several interesting avenues for future work. To obtain an optimal
query answer and price, it is crucial to carefully design the payment
scheme using game theory. In the present study, we designed only two
types of payment schemes, for liberal and conservative data owners;
we will develop a more sophisticated design in future work. Moreover,
our study assumes that the market maker is a trusted server that
stores and accesses data owners' data on their behalf, yet trust is a
difficult question to address from both technical and social
standpoints. In future work, we can therefore consider trading
frameworks and pricing mechanisms in which the market maker is assumed
to be untrustworthy.</p>
    </sec>
    <sec id="sec-22">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported through the A. Advanced Research
Networks JSPS Core-to-Core Program. The work was also supported
through JSPS KAKENHI Grant Numbers 16K12437 and 17H06099.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Acquisti</surname>
          </string-name>
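          ,
          <string-name>
            <given-names>Leslie K.</given-names>
            <surname>John</surname>
          </string-name>
          , and
          <string-name>
            <given-names>George</given-names>
            <surname>Loewenstein</surname>
          </string-name>
          .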
          <year>2013</year>
          .
          <article-title>What is privacy worth?</article-title>
          <source>The Journal of Legal Studies</source>
          <volume>42</volume>
          ,
          <issue>2</issue>
          (
          <year>2013</year>
          ),
          <fpage>249</fpage>
          -
          <lpage>274</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Christina</given-names>
            <surname>Aperjis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bernardo A.</given-names>
            <surname>Huberman</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>A market for unbiased private data: Paying individuals according to their privacy attitudes</article-title>
          .
          <source>First Monday</source>
          <volume>17</volume>
          ,
          <issue>5</issue>
          (
          <year>2012</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          . https://doi.org/10.5210/fm.v17i5.4013
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Cynthia</given-names>
            <surname>Dwork</surname>
          </string-name>
          and
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>The Algorithmic Foundations of Differential Privacy</article-title>
          .
          <source>Foundations and Trends in Theoretical Computer Science</source>
          <volume>9</volume>
          ,
          <issue>3-4</issue>
          (
          <year>2014</year>
          ),
          <fpage>211</fpage>
          -
          <lpage>407</lpage>
          . https://doi.org/10.1561/0400000042
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Federico</given-names>
            <surname>Ferretti</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>EU competition law, the consumer interest and data protection: The exchange of consumer information in the retail financial sector</article-title>
          . Springer. 116 pages.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Lisa K.</given-names>
            <surname>Fleischer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yu-Han</given-names>
            <surname>Lyu</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Approximately optimal auctions for selling privacy when costs are correlated with data</article-title>
          .
          <source>In Proceedings of the 13th ACM Conference on Electronic Commerce. ACM</source>
          ,
          <fpage>568</fpage>
          -
          <lpage>585</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Arpita</given-names>
            <surname>Ghosh</surname>
          </string-name>
          and
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Selling privacy at auction</article-title>
          .
          <source>Games and Economic Behavior</source>
          <volume>91</volume>
          (
          <year>2015</year>
          ),
          <fpage>334</fpage>
          -
          <lpage>346</lpage>
          . https://doi.org/10.1016/j.geb.2013.06.013
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Justin</given-names>
            <surname>Hsu</surname>
          </string-name>
          , Marco Gaboardi, Andreas Haeberlen, Sanjeev Khanna, Arjun Narayan, and Benjamin C Pierce.
          <year>2014</year>
          .
          <article-title>Differential Privacy: An Economic Method for Choosing Epsilon</article-title>
          .
          <source>IEEE Computer Security Foundations Symposium (CSF)</source>
          (
          <year>2014</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          . https://doi.org/10.1109/CSF.2014.35
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Zach</given-names>
            <surname>Jorgensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ting</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Graham</given-names>
            <surname>Cormode</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Conservative or liberal? Personalized differential privacy</article-title>
          .
          <source>Proceedings - International Conference on Data Engineering</source>
          <volume>2015-May</volume>
          (
          <year>2015</year>
          ),
          <fpage>1023</fpage>
          -
          <lpage>1034</lpage>
          . https://doi.org/10.1109/ICDE.2015.7113353
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jyoti</given-names>
            <surname>Mandar Joshi</surname>
          </string-name>
          and
          <string-name>
            <given-names>G M</given-names>
            <surname>Dumbre</surname>
          </string-name>
          .
          <year>2017</year>
          . Basic Concept of E-Commerce.
          <source>International Research Journal of Multidisciplinary Studies</source>
          <volume>3</volume>
          ,
          <issue>3</issue>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Paraschos</given-names>
            <surname>Koutris</surname>
          </string-name>
          , Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Suciu</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Query-based data pricing</article-title>
          .
          <source>Proceedings of the 31st Symposium on Principles of Database Systems - PODS '12</source>
          <volume>62</volume>
          ,
          <issue>5</issue>
          (
          <year>2012</year>
          ),
          <fpage>167</fpage>
          . https://doi.org/10.1145/2213556.2213582
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Kenneth C.</given-names>
            <surname>Laudon</surname>
          </string-name>
          .
          <year>1996</year>
          .
          <article-title>Markets and Privacy</article-title>
          .
          <source>Commun. ACM</source>
          <volume>39</volume>
          ,
          <issue>9</issue>
          (
          <year>1996</year>
          ),
          <fpage>92</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Chao</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Daniel Yang</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gerome</given-names>
            <surname>Miklau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Suciu</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>A Theory of Pricing Private Data</article-title>
          .
          <source>ACM Transactions on Database Systems</source>
          <volume>39</volume>
          ,
          <issue>4</issue>
          (
          <year>2014</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          . https://doi.org/10.1145/2448496.2448502
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
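          <string-name>
            <given-names>Christopher</given-names>
            <surname>Riederer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Vijay</given-names>
            <surname>Erramilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Augustin</given-names>
            <surname>Chaintreau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Balachander</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          , and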
          <string-name>
            <given-names>Pablo</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>For sale: Your data: By: You</article-title>
          .
          <source>Proceedings of the 10th ACM Workshop on Hot Topics in Networks - HotNets '11</source>
          (
          <year>2011</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . https://doi.org/10.1145/2070562.2070575
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Buying Private Data at Auction: The Sensitive Surveyor's Problem</article-title>
          .
          <source>ACM SIGecom Exchanges</source>
          <volume>11</volume>
          ,
          <issue>1</issue>
          (
          <year>2012</year>
          ),
          <fpage>3</fpage>
          -
          <lpage>8</lpage>
          . https://doi.org/10.1145/2325713.2325714
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Ruiming</given-names>
            <surname>Tang</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>On the Quality and Price of Data</article-title>
          .
          <source>Ph.D. Dissertation</source>
          . National University of Singapore. http://www.scholarbank.nus.edu.sg/bitstream/handle/10635/107391/TANG-Ruiming-thesis.pdf?sequence=1
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Mario</given-names>
            <surname>Sajko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kornelije</given-names>
            <surname>Rabuzin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Miroslav</given-names>
            <surname>Bača</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>How to calculate information value for effective security risk assessment</article-title>
          .
          <source>Journal of Information and Organizational Sciences</source>
          <volume>30</volume>
          ,
          <issue>2</issue>
          (
          <year>2006</year>
          ),
          <fpage>263</fpage>
          -
          <lpage>278</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Carl</given-names>
            <surname>Shapiro</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hal R.</given-names>
            <surname>Varian</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Versioning: the smart way to sell information</article-title>
          .
          <source>Harvard Business Review</source>
          <volume>107</volume>
          ,
          <issue>6</issue>
          (
          <year>1998</year>
          ),
          <fpage>107</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
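          <string-name>
            <given-names>Jacopo</given-names>
            <surname>Staiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nuria</given-names>
            <surname>Oliver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Lepri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rodrigo</given-names>
            <surname>de Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michele</given-names>
            <surname>Caraviello</surname>
          </string-name>
          , and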
          <string-name>
            <given-names>Nicu</given-names>
            <surname>Sebe</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Money walks: a human-centric study on the economics of personal mobile data</article-title>
          .
          <source>Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing</source>
          . ACM,
          <fpage>583</fpage>
          -
          <lpage>594</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>