<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Optimizing Hierarchical Queries for the Attribution Reporting API</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matthew Dawson</string-name>
          <email>mwdawson@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Badih Ghazi</string-name>
          <email>badihghazi@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pritish Kamath</string-name>
          <email>pritishk@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kapil Kumar</string-name>
          <email>kkapil@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ravi Kumar</string-name>
          <email>ravi.k53@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bo Luan</string-name>
          <email>luanbo@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasin Manurangsi</string-name>
          <email>pasin@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nishanth Mundru</string-name>
          <email>nmundru@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harikesh Nair</string-name>
          <email>hsnair@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adam Sealfon</string-name>
          <email>adamsealfon@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shengyu Zhu</string-name>
          <email>shengyuzhu@google.com</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>We study the task of performing hierarchical queries based on summary reports from the Attribution Reporting API for ad conversion measurement. We demonstrate that methods from optimization and differential privacy can help cope with the noise introduced by privacy guardrails in the API. In particular, we present algorithms for (i) denoising the API outputs and ensuring consistency across different levels of the tree, and (ii) optimizing the privacy budget across different levels of the tree. We provide an experimental evaluation of the proposed algorithms on public datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>ad conversion measurement</kwd>
        <kwd>hierarchical aggregation</kwd>
        <kwd>differential privacy</kwd>
        <kwd>attribution reporting API</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Over the last two decades, third-party cookies [1] have been widely used in online advertising, allowing an impression (e.g., an ad click or view) registered on the publisher site to be joined to a conversion on the advertiser, in order to compute aggregate conversion reports (e.g., the number of conversions attributed to a subset of impressions) or to train ad bidding models (e.g., [2, 3, 4, 5]). However, in recent years, privacy concerns have led several browsers to decide to deprecate third-party cookies, e.g., [6, 7, 8]. The Attribution Reporting API [9, 10] seeks to provide privacy-preserving ways for measuring ad conversions on the Chrome browser and the Android mobile operating system. This API relies on a variety of mechanisms for limiting the privacy leakage, including bounding the contributions to the output reports of the conversions attributed to each impression, as well as noise injection to satisfy differential privacy (for more details, see Section 2).</p>
      <p>[Figure 1: Example of Hierarchical Queries.]</p>
      <p>We study the conversion reporting task, where a query consists of counting the number of conversions attributed to impressions such that some features of the conversion and the impression are restricted to certain given values. In particular, we focus on the hierarchical queries setting, where the goal is to estimate the number of conversions at every node of a tree such as the one in Figure 1. Our contributions are algorithms for denoising the API outputs and enforcing consistency across the levels of the tree (Section 4), and for optimizing the privacy budget across the different levels of the tree (Section 5).</p>
      <p>We start by recalling in Section 2 the basics of differential privacy and summarizing the query model for summary reports in the Attribution Reporting API. In Section 3, we formally define the optimization problem that we seek to solve. In Section 6, we provide an experimental evaluation of our algorithms on two public datasets. We conclude with some future directions in Section 7.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Preliminaries</title>
      <sec id="sec-3-1">
        <title>2.1. Differential Privacy (DP)</title>
        <p>Let n be the number of rows in the dataset and let X be the (arbitrary) set representing the domain of values for each row. We distinguish two types of columns (a.k.a. attributes): known and unknown. We also assume knowledge of the set of possible values that each unknown attribute can take.</p>
        <p>Definition 1 (DP [13]). For ε ≥ 0, an algorithm A is ε-DP if for every pair D, D′ of datasets that differ on the unknown attributes of one row¹, and for every possible output o, it holds that Pr[A(D) = o] ≤ e^ε · Pr[A(D′) = o].</p>
        <p>Lemma 2 (Basic Composition). Let A be an algorithm that runs t algorithms A_1, . . . , A_t on the same dataset such that A_i is ε_i-DP with ε_i ≥ 0 for each i ∈ [t]. Then, A is (∑_{i=1}^{t} ε_i)-DP.</p>
        <p>Lemma 3 (Post-processing). Let ε &gt; 0, and let Y and Y′ be any two sets. If A : X^n → Y is an ε-DP algorithm, and B : Y → Y′ is any randomized mapping, then (B ∘ A) : X^n → Y′ is ε-DP.</p>
        <p>For an extensive overview of DP, we refer the reader
to the monograph [18]. A commonly used method in DP
is the discrete Laplace mechanism. To define it, we recall
the notion of ℓ1-sensitivity.</p>
        <p>Definition 4 (ℓ1-sensitivity). Let X be any set, and f : X^n → R^d be a d-dimensional function. Its ℓ1-sensitivity is defined as Δ_1 f := max_{D,D′} ‖f(D) − f(D′)‖_1, where D and D′ are two datasets that differ on the unknown attributes of a single row.</p>
        <p>Definition 5 (Discrete Laplace Mechanism). The discrete Laplace distribution centered at 0 and with parameter a &gt; 0, denoted by DLap(a), is the distribution whose probability mass function at integer k is ((e^a − 1)/(e^a + 1)) · e^{−a|k|}. The d-dimensional discrete Laplace mechanism with parameter a applied to a function f : X^n → Z^d, on input a dataset D ∈ X^n, returns f(D) + Z, where Z is a d-dimensional noise random variable whose coordinates are sampled i.i.d. from DLap(a).</p>
        <p>Lemma 6. For every ε &gt; 0, the d-dimensional discrete Laplace mechanism with parameter a ≤ ε/Δ_1 f is ε-DP.</p>
        <p>¹ We note that this instantiation of the DP definition is related to the label DP setting in machine learning [14, 15, 16, 17], where the features of an example are considered known and only its label is deemed unknown and sensitive. In our use case, there might be multiple unknown attributes, whose concatenation is treated in a conceptually similar way to the label in the label DP setting.</p>
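        <p>As an illustration, the discrete Laplace mechanism can be sketched in a few lines of Python (a hypothetical helper, not part of the API; it relies on the standard fact that the difference of two i.i.d. geometric random variables follows the discrete Laplace distribution):</p>
        <p>
```python
import numpy as np

def sample_dlap(a, size, rng):
    # DLap(a) has pmf ((e^a - 1)/(e^a + 1)) * e^(-a*|k|); the difference of two
    # i.i.d. geometric variables with success probability p = 1 - e^(-a),
    # each supported on {0, 1, 2, ...}, has exactly this distribution.
    p = 1.0 - np.exp(-a)
    g1 = rng.geometric(p, size) - 1  # numpy's geometric is supported on {1, 2, ...}
    g2 = rng.geometric(p, size) - 1
    return g1 - g2

def discrete_laplace_mechanism(f_of_dataset, epsilon, l1_sensitivity, rng):
    # Per Lemma 6, adding DLap(a) noise with a = epsilon / l1_sensitivity to
    # each coordinate of f yields an epsilon-DP release.
    a = epsilon / l1_sensitivity
    values = np.asarray(f_of_dataset)
    return values + sample_dlap(a, values.shape, rng)
```
        </p>
        <p>For instance, a summary report over contributions capped at 2^16 would add DLap(ε/2^16) noise to each requested key's aggregate value.</p>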
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Attribution Reporting API</title>
        <p>The aggregatable reports [10] are constructed as follows:
• Impression (a.k.a. source) registration: The API provides a mechanism for the ad-tech to register an impression (e.g., a click or view) on the publisher site or app. During registration, the ad-tech can specify impression-side aggregation keys (e.g., one corresponding to the campaign or geo location).
• Conversion (a.k.a. trigger) registration: The API also provides a mechanism for the ad-tech to register a conversion on the advertiser site or app. As it does so, the ad-tech can specify conversion-side key pieces along with aggregatable values corresponding to each setting of the impression-side aggregation keys. E.g., a conversion-side key piece could capture the conversion type or (a discretization of) the conversion timestamp. The combined aggregation key (which can be thought of as the concatenation of the impression-side aggregation key and the conversion-side key piece) is restricted to be at most 128 bits. The aggregatable value is required to be an integer between 1 and the L1 parameter of the API, which is set to 2^16 = 65,536.
• Attribution: The API supports last-touch attribution where the conversion is attributed to the last (unexpired) impression registered by the same ad-tech. (The API supports a broader family of single-touch attribution schemes allowing more flexible prioritization over impressions and conversions; we do not discuss this aspect any further as it is orthogonal to our algorithms.²)
• Histogram contributions generation: The API enforces that the sum of contributions across all aggregatable reports generated by different conversions attributed to the same impression is capped to at most the L1 = 2^16 parameter. An example construction of histogram contributions is in Figure 2. While in this example the conversion key piece depends only on conversion information (the conversion day), it can in general encode other conversion-side attributes as well.</p>
        <p>² The API moreover enforces a set of rate limits on registered impressions and attributed conversions; we omit discussing these since they are not essential to the focus of this paper. We refer the interested reader to the documentation [19].</p>
        <p>[Figure 2: Example construction of histogram contributions. An impression (campaign = 123, location = New York) is registered by the ad-tech on the publisher with aggregation_keys “Level 1: campaign” = “123”, “Level 2: campaign, location” = “123, New York”, and “Level 3: campaign, location, day” = “123, New York”. A conversion (day = Friday) is registered by the ad-tech on the advertiser with aggregatable_trigger_data supplying key_piece “” for source_key “Level 1: campaign”, key_piece “” for source_key “Level 2: campaign, location”, and key_piece “Friday” for source_key “Level 3: campaign, location, day”, along with aggregatable_values of 21845 for each level. The browser emits the histogram contributions (key: “123”, value: 21845), (key: “123, New York”, value: 21845), and (key: “123, New York, Friday”, value: 21845).]</p>
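        <p>The construction in Figure 2 can be sketched as follows (hypothetical field names and helper; the even split of the L1 = 2^16 budget across the three levels matches the value 21845 in the figure):</p>
        <p>
```python
def build_contributions(impression, conversion, l1_budget=2 ** 16):
    # Hypothetical sketch: each hierarchy level's aggregation key concatenates
    # impression-side attributes with a conversion-side key piece, and the L1
    # contribution budget is split evenly across the levels.
    keys = [
        impression["campaign"],
        impression["campaign"] + ", " + impression["location"],
        impression["campaign"] + ", " + impression["location"] + ", " + conversion["day"],
    ]
    value = l1_budget // len(keys)  # 65536 // 3 = 21845, as in Figure 2
    return [(key, value) for key in keys]
```
        </p>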
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Optimization Problem</title>
      <p>³ The * in the conversion-related field in Table 1 indicates that the click corresponding to that row did not get an attributed conversion.</p>
      <p>At query time, a set of histogram keys is requested by the ad-tech [21]. (The set of keys that are queried could be set to the Cartesian product of all known values of the impression-side features with the set of all possible values of the conversion-side features.) The aggregatable reports are combined in the aggregation service to produce a summary report by applying the discrete Laplace mechanism (Definition 5) with parameter ε/L1 to the requested aggregation keys. This report satisfies ε-DP where each row of the dataset corresponds to an impression and its attributed conversions if any (see Table 1 for an example³), and where the known columns in Definition 1 are the attributes that only depend on impression information (campaign and location in Table 1). In the hierarchical aggregation setting, each node of the tree corresponds to an aggregation key, and the level of the node determines which attributes the key conditions on.</p>
      <sec id="sec-4-0">
        <title>3.1. Hierarchical Query Estimation</title>
        <p>We formally define the hierarchical query estimation problem. Given a dataset D, consider a tree where each node corresponds to the subset of the rows of D obtained by conditioning on the values of some of the attributes. We consider the setting where each level of the tree introduces a conditioning on the value of a new attribute. For known attributes, the child nodes of a node correspond to the different values taken by that attribute in D within the rows. For unknown attributes, the child nodes correspond to all possible values for that attribute, whether or not they actually occur in the dataset. Given this tree, the problem is to privately release the approximate number of data rows corresponding to each node that have an attributed conversion.</p>
      </sec>
      <sec id="sec-4-1">
        <title>3.2. Error Measure and Consistency</title>
        <p>We consider the following error measure, which is defined in [22].</p>
        <p>Definition 7 (RMS Relative Error at Threshold). For a count n ≥ 0, and its randomized estimate n̂ ∈ R, the Root Mean Squared Relative Error at Threshold τ when estimating n by n̂ is defined as RMSRE_τ(n, n̂) := √( E[ ( |n̂ − n| / max(n, τ) )² ] ), where the expectation is over the randomness of n̂.</p>
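        <p>An empirical version of this error measure can be sketched as follows (hypothetical helper; it averages the squared relative error of repeated randomized estimates, with the denominator clipped below at τ):</p>
        <p>
```python
import math

def rmsre_tau(n_true, estimates, tau):
    # Empirical RMSRE_tau (Definition 7): root mean squared error of the
    # estimates relative to max(n_true, tau).
    denom = max(n_true, tau)
    total = sum((abs(e - n_true) / denom) ** 2 for e in estimates)
    return math.sqrt(total / len(estimates))
```
        </p>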
        <sec id="sec-4-1-1">
          <title>Algorithm 1 CombineEstimates</title>
          <p>Input: (x; var_x), (y; var_y): x, y are samples from random variables X, Y respectively, such that E X = E Y and Var(X) = var_x, Var(Y) = var_y.</p>
          <p>Output: (z; var_z): combined estimate and variance.
var_z ← (var_x · var_y) / (var_x + var_y)
z ← var_z · (x/var_x + y/var_y)
return (z; var_z)</p>
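          <p>In Python, Algorithm 1 amounts to the following inverse-variance weighting (a direct transcription; the variances are assumed strictly positive):</p>
          <p>
```python
def combine_estimates(x, var_x, y, var_y):
    # Minimum-variance convex combination of two independent unbiased
    # estimates of the same quantity (Algorithm 1): each input is weighted
    # inversely proportionally to its variance.
    var_z = (var_x * var_y) / (var_x + var_y)
    z = var_z * (x / var_x + y / var_y)
    return z, var_z
```
          </p>
          <p>E.g., combining estimates 10 and 20, each of unit variance, yields 15 with variance 0.5.</p>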
          <p>Suppose we have the count estimates (e.g., the number of conversions as in Figure 1) at every node in a tree. Each conversion contributes to the count for multiple nodes. For example, a conversion for ad campaign 123 that occurs on a Friday in New York contributes to the 6th leaf from the left, but also to each of its ancestor nodes. This imposes relationships among the counts at various nodes. If the only geo locations with conversions attributed to ad campaign 123 on Friday are New York and Chicago, then the total number of such conversions must equal the sum of the number of such conversions in each of the two locations. More generally, the count for any node must equal the sum of the counts for its children. An estimator with this property is called consistent.</p>
          <p>For a tree T with levels 0, . . . , ℓ, with L_j being the set of nodes at level j, and estimators n̂_v of the counts n_v at each node v, define the tree error RMSRE_τ(T) to be
√( (1/(ℓ+1)) · ∑_{j=0}^{ℓ} (1/|L_j|) · ∑_{v∈L_j} RMSRE_τ(n_v, n̂_v)² ).</p>
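          <p>Under an unbiased estimator, each per-node term equals the estimate's variance divided by max(n_v, τ)², so the tree error can be sketched as follows (hypothetical helper; levels is a list, one entry per tree level, of (true count, estimate variance) pairs):</p>
          <p>
```python
import math

def tree_rmsre(levels, tau):
    # Tree error RMSRE_tau(T): average the mean squared relative error within
    # each level, then average across levels and take the square root.
    total = 0.0
    for level in levels:
        total += sum(v / max(n, tau) ** 2 for (n, v) in level) / len(level)
    return math.sqrt(total / len(levels))
```
          </p>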
          <p>The goal of the hierarchical query estimation
problem is to privately estimate the counts of every node
with minimum possible tree error, where the estimates
should be consistent. To achieve this, we will employ
post-processing and privacy budgeting strategies.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Post-processing Algorithms</title>
      <p>Directly applying the discrete Laplace mechanism to add independent noise to each node does not result in a consistent estimate. Consistency can be achieved by estimating only the counts of leaves and inferring the count of each nonleaf node by adding the counts of its leaf descendants, but this can lead to large error for nodes higher up in the tree. Alternatively, one can achieve consistency by post-processing independent per-node estimates. Since DP is preserved under post-processing (Lemma 3), this comes at no cost to the privacy guarantee, and it can substantially improve accuracy.</p>
      <p>Since the count of any node must equal the sum of the counts of its children, we can obtain a second independent estimate of the count of any nonleaf node by summing the estimates of its children. We can combine these two estimates to obtain a single estimate of lower variance. Extending this observation, Hay et al. [11] and Cormode et al. [12] give efficient post-processing algorithms for regular (every non-leaf has the same number of children) and balanced (every leaf is at the same depth) trees, achieving consistency and also a substantial improvement in accuracy. In particular, estimating the counts of each node can be expressed as a linear least-squares regression problem, and these algorithms compute the least-squares solution, which is known to achieve optimal error variance among all unbiased linear estimators. Moreover, this special case of least-squares regression can be solved in linear time.</p>
      <p>We generalize this algorithm to arbitrary trees in Algorithm 2. This allows us to handle trees with different fanouts and noise at different levels, or even at different nodes in the same level, which is the case for the conversion reporting trees we study. The input to the algorithm consists of independent noisy counts x_v for all nodes v in T, with E x_v = n_v and variance Var(x_v) = var_v.</p>
      <p>The key idea is a simple method (Algorithm 1) to combine two independent and unbiased estimates of the same quantity to get an estimate with reduced variance, as follows: suppose X and Y are two independent random variables with E X = E Y = μ, with variances var_X and var_Y respectively; then the optimal convex combination Z of X and Y that minimizes the variance in the estimate of μ weights X and Y inversely proportionally to their variances, namely Z = (var_Y · X + var_X · Y)/(var_X + var_Y), which has variance Var(Z) = (var_X · var_Y)/(var_X + var_Y).</p>
      <p>Algorithm 2 comprises two linear-time passes. The first pass proceeds bottom-up from the leaves to the root. For each non-leaf node v it recursively computes x↑_v, which is an optimal linear unbiased estimate of n_v using only the noisy counts x_u for all u that are in the subtree rooted at v, excluding v. This is combined with the noisy count x_v to compute x⇑_v, which is an optimal linear unbiased estimate of n_v using only the noisy counts x_u for all u that are in the subtree rooted at v, including v. The second pass proceeds top-down from the root, incorporating at each node the information from outside its subtree.</p>
      <sec id="sec-5-1">
        <title>Algorithm 2 TreePostProcessing</title>
        <p>Params: Tree T with root r.</p>
        <p>Input: (x_v; var_v) for all v ∈ T: noisy counts and variances.
Output: (n̂_v; v̂ar_v) for all v ∈ T: post-processed estimates and variances.
# Bottom-up pass
for each leaf v ∈ T do
  (x⇑_v; var⇑_v) ← (x_v; var_v)
for each internal node v from largest to smallest depth do
  (x↑_v; var↑_v) ← (∑_{u∈child(v)} x⇑_u; ∑_{u∈child(v)} var⇑_u)
  (x⇑_v; var⇑_v) ← CombineEstimates((x_v; var_v), (x↑_v; var↑_v))
# Top-down pass
For root r: (n̂_r; v̂ar_r) ← (x⇑_r; var⇑_r) and (x⇓_r; var⇓_r) ← (n̂_r; v̂ar_r)
for each non-root node v from smallest to largest depth do
  u ← parent(v)
  (x↓_v; var↓_v) ← (x⇓_u − ∑_{w∈child(u)∖{v}} x⇑_w; var⇓_u + ∑_{w∈child(u)∖{v}} var⇑_w)  # equals (x⇓_u − x↑_u + x⇑_v; var⇓_u + var↑_u − var⇑_v)
  (n̂_v; v̂ar_v) ← CombineEstimates((x⇑_v; var⇑_v), (x↓_v; var↓_v))
  (x⇓_v; var⇓_v) ← CombineEstimates((x_v; var_v), (x↓_v; var↓_v))
return (n̂_v; v̂ar_v) for all v ∈ T</p>
        <p>For all v ∈ T, n̂_v is the best linear unbiased estimator (BLUE) of n_v given the noisy counts of all nodes.</p>
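        <p>A compact sketch of this two-pass scheme, following the classical consistent estimator of Hay et al. [11] rather than reproducing Algorithm 2 line by line (hypothetical tree representation as a child-list dictionary):</p>
        <p>
```python
def combine(a, va, b, vb):
    # Inverse-variance combination of two independent unbiased estimates.
    v = va * vb / (va + vb)
    return v * (a / va + b / vb), v

def consistent_tree_estimates(children, root, x, var):
    # Bottom-up: combine each node's noisy count with the sum of its
    # children's subtree estimates. Top-down: spread the residual between a
    # node's final estimate and its children's subtree estimates in
    # proportion to each child's variance, so children sum to their parent.
    up, var_up = {}, {}
    def bottom_up(v):
        kids = children.get(v, [])
        if not kids:
            up[v], var_up[v] = x[v], var[v]
            return
        for c in kids:
            bottom_up(c)
        s = sum(up[c] for c in kids)
        vs = sum(var_up[c] for c in kids)
        up[v], var_up[v] = combine(x[v], var[v], s, vs)

    est = {}
    def top_down(v):
        kids = children.get(v, [])
        if not kids:
            return
        s = sum(up[c] for c in kids)
        vs = sum(var_up[c] for c in kids)
        for c in kids:
            est[c] = up[c] + (var_up[c] / vs) * (est[v] - s)
            top_down(c)

    bottom_up(root)
    est[root] = up[root]
    top_down(root)
    return est
```
        </p>
        <p>On a root with noisy count 10 and two unit-variance leaves with noisy counts 4 and 4, this returns 28/3 for the root and 14/3 for each leaf, which matches the least-squares solution and sums consistently.</p>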
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Privacy Budgeting Over Tree Levels</title>
      <p>To allocate the privacy budget across the levels of the tree, the simplest approach is to divide it equally among the levels, or to put all of the budget on the lowest level. However, basic composition (Lemma 2) allows us to allocate the privacy budget arbitrarily among the nodes of the tree, and we can apply post-processing (Algorithm 2) to noisy initial estimates with unequal variances as well. This motivates the question of whether we can improve accuracy with an unequal privacy budget allocation, and if so, how.</p>
      <p>Given the true counts n_v, we use a greedy iterative approach for optimizing the privacy budget allocation. Let k be the number of phases (e.g., k = 20) corresponding to the granularity of the allocation. Initially allocate zero (or infinitesimal) privacy budget to each level, and divide the (remaining) privacy budget into k units of size ε/k. In each of the k phases, select the level that would result in the lowest RMSRE_τ(T) when using ε/k additional privacy budget, and increase the privacy budget of that level by ε/k. The error RMSRE_τ(T) can be computed directly using the variance v̂ar_v of each post-processed estimate n̂_v as returned by Algorithm 2. We present the details of this greedy iterative approach in Appendix B.</p>
      <p>Using the true counts to optimize the privacy budget allocation can leak sensitive information and violate the privacy guarantee, and is infeasible in the API. Instead of using the true counts to allocate the privacy budget, one can use alternatives such as simulated data, or historical data that is not subject to the privacy constraints (e.g., before third-party cookie deprecation), or noisy historical data that has already been protected with DP (e.g., the output of the API over data from a previous time period). We refer to this family of privacy budget optimization methods as prior-based. When no such alternatives are available, one could start out with a suboptimal privacy budgeting strategy (e.g., uniform privacy budgeting across levels) and improve the allocation over time.</p>
      <sec id="sec-7-1">
        <title>6. Experimental Evaluation</title>
        <p>The attributes used include product_country, device_type, product_age_group, and time_delay_for_conversion. The last attribute is discretized into 2-day (or 6-day) buckets so that there are at most 15 (respectively 5) possible values of the rounded time delay. The first 4 attributes are considered known (they only relate to the impression), whereas the last is considered unknown. For the discretization into 6-day buckets, we retain the unknown attribute and 3 known attributes instead of 4 (omitting product_age_group), resulting in a depth 4 tree (instead of depth 5).</p>
        <p>In Figure 4, we plot RMSRE_τ versus ε for various methods evaluated on the later 45 days of data, namely:
• Equal privacy budget split across levels without any post-processing.
• Equal privacy budget split across levels with post-processing (Algorithm 2).
• All privacy budget on leaves with post-processing.
• Prior-based optimized privacy budget split across levels, optimized for each partner_id, without any post-processing.
• Prior-based optimized privacy budget split across levels, optimized for each partner_id, with post-processing (Algorithm 2).</p>
        <p>For the prior-based privacy budget split optimization, the privacy budgeting was performed using a noisy prior computed on the first 45 days (privately estimated with ε = 1 and an equal privacy budget split over levels). For partner_id’s that appear only in the later 45 days, the prior is computed on all the partner_id’s that appear in the first 45 days of data. E.g., for ε = 4 and the depth 5 tree, RMSRE_10 ≈ 0.1, and for the depth 4 tree, RMSRE_5 ≈ 0.17.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>7. Conclusion and Future Directions</title>
      <p>In this work, we studied hierarchical querying of the Attribution Reporting API, and presented algorithms for consistency enforcement and privacy budgeting, demonstrating their performance on two public ad datasets. We next discuss some interesting future directions. We focused on the so-called OPC (“one per click”) setting where each impression gets at most a single attributed conversion. An important direction is to consider the extension to the more general MPC (“many per click”) setting where an impression can get multiple attributed conversions. It would also be interesting to extend our treatment of conversion counts to the task of estimating conversion values. While the error is small for values of ε around 16 in our evaluation, we note that this is specific to the datasets and the (restricted) functionality that we study (i.e., conversion counts with OPC). Achieving small errors on additional functionalities (e.g., MPC or conversion values) will likely require larger values of ε.</p>
      <p>Another natural direction is to extend the consistency enforcement algorithm to ensure monotonicity (i.e., that the output estimate for a node of the tree is at least the output estimate for any of its children), and non-negativity. Our privacy budgeting method optimizes for one privacy parameter per level of the tree; it would be good to explore the extent to which per-node privacy budgeting can yield higher accuracy. While we considered data-independent weights when defining the tree error in terms of the node errors, there could be situations where data-dependent weights are preferable, e.g., to avoid the tree error being dominated by the error in many nodes with no conversions, while being insensitive to the error of few nodes where most of the conversions occur. Another interesting direction is to study privacy budgeting under approximate DP [25]. Our work considered the hierarchical query model; a natural direction is to optimize the direct query model [26]. Finally, the Attribution Reporting API also offers event-level reports [27]. It would be interesting to see if these could also be used to further improve the accuracy for hierarchical queries.</p>
      <p>[Figure panels (c) and (d): RMSRE versus epsilon for τ = 5 and τ = 10, depth 3 tree.]</p>
      <p>[Appendix: analysis of Algorithm 2.] The proof proceeds by induction on the number of internal nodes. Let v* be an internal node all of whose children are leaves, and let T′ denote the tree with the children of v* removed.</p>
      <p>For all v ∈ T′, define
(x′_v; var′_v) = (x_v; var_v) if v ≠ v*, and (x′_v; var′_v) = (x⇑_v; var⇑_v) if v = v*.
Similar to above, let (x′↑_v; var′↑_v), (x′⇑_v; var′⇑_v), (x′↓_v; var′↓_v), (x′⇓_v; var′⇓_v), (n̂′_v; v̂ar′_v) be as defined in Algorithm 2 when run on the tree T′ with input (x′_v; var′_v)_{v∈T′}. From the inductive hypothesis, for all internal nodes v ∈ T′,
n̂′_v = ∑_{u∈child(v)} n̂′_u,
and every leaf u ∈ T′ satisfies the weighted root-to-leaf sum preservation property over its ancestors anc(u).</p>
      <p>When we run Algorithm 2 on T, after setting x⇑_{v*} and var⇑_{v*}, the rest of the bottom-up pass is exactly the same as that of the run on T′. Similarly, the top-down pass of T′ is the same as that of T, except that in T we also set the values of n̂_v and v̂ar_v for all v ∈ child(v*). Therefore,
n̂_v = n̂′_v if v ∉ child(v*), and n̂_v = (var↓_v · x_v + var_v · x↓_v)/(var↓_v + var_v) if v ∈ child(v*),   (4)
where
(x↓_v; var↓_v) = (n̂_{v*} − x↑_{v*} + x_v; v̂ar_{v*} + var↑_{v*} − var_v).</p>
      <p>Here we used the observation that for all leaves v of T′ (including v*), it holds that (x′⇑_v; var′⇑_v) = (x′_v; var′_v) and hence (x′⇓_v; var′⇓_v) = (n̂′_v; v̂ar′_v) = (n̂_v; v̂ar_v).</p>
      <p>(Consistency) For v ∉ child(v*), Equation (4) implies that the LHS and RHS of the desired consistency condition are exactly the same as those of Equation (2). So it only remains to show consistency for v = v*. First, we simplify Equation (4) by noting that var↓_v + var_v = var⇓_{v*} + var↑_{v*} for all v ∈ child(v*), and hence for v ∈ child(v*) we have</p>
      <p>n̂_v = x_v + (var_v / (var⇓_{v*} + var↑_{v*})) · (x⇓_{v*} − x↑_{v*}).
Summing over v ∈ child(v*),
∑_{v∈child(v*)} n̂_v = x↑_{v*} + (var↑_{v*} / (var⇓_{v*} + var↑_{v*})) · (x⇓_{v*} − x↑_{v*}) = (var↑_{v*} · x⇓_{v*} + var⇓_{v*} · x↑_{v*}) / (var⇓_{v*} + var↑_{v*}) = n̂_{v*},
where we use that x↑_{v*} = ∑_{v∈child(v*)} x_v. Thus, the consistency condition holds for every internal node v ∈ T, as desired.</p>
      <p>Then, the least-squares estimator ñ, defined as the minimizer of the corresponding least-squares objective, yields estimates that clearly satisfy the consistency and weighted root-to-leaf sum preservation properties of Lemma 9. Conversely, if n̂(x) satisfies consistency, then it must be determined by its values ñ_v = n̂_v at the leaves v, and if it satisfies the weighted root-to-leaf sum preservation property, then as shown above ñ must be the least-squares estimator.</p>
      <p>Thus, from Lemma 9, since the estimates n̂(x) returned by Algorithm 2 satisfy the two properties, each n̂_v is the BLUE for the corresponding true count n_v.</p>
      <p>Finally, to see that v̂ar_v is indeed the variance of the estimate n̂_v, we recursively show that var↑_v, var⇑_v, var↓_v, var⇓_v are the variances corresponding to x↑_v, x⇑_v, x↓_v, x⇓_v respectively. The key property to verify is that each such quantity involves a linear combination of estimates which depend on noisy counts of disjoint parts of the tree and hence are independent. Thus, we can repeatedly use the property that for independently drawn X and Y, it holds that Var(aX + bY) = a² · Var(X) + b² · Var(Y).</p>
      <p>Algorithm 3 presents the greedy iterative approach we used for optimizing the privacy budget allocation using the true counts n_v, as alluded to in Section 5. The high-level idea is as follows. We choose a parameter k to be the number of phases (e.g., k = 20) corresponding to the granularity of the allocation.</p>
      <p>Initially, allocate an infinitesimal privacy budget to each level; this is done so that we can use Algorithm 2 to compute v̂ar_v for each node of the tree; we need this infinitesimal allocation because Algorithm 2 as written does not allow var_v to be ∞ for any v.⁴</p>
      <p>We divide the (remaining) privacy budget ε̂ into k units of size ε̂/k. In each of the k phases, we consider adding ε̂/k budget to all nodes at level j, choosing a j that results in the lowest RMSRE_τ(T), which can be computed using Algorithm 2; for any budgeting sequence (ε_0, . . . , ε_ℓ), we set var_v to be the variance of DLap(ε_j/L1) for all nodes v in level j. Observe that RMSRE_τ(T) can be computed directly using the variance v̂ar_v of each post-processed estimate n̂_v as returned by Algorithm 2, since
RMSRE_τ(n_v, n̂_v) = √( v̂ar_v / max(n_v, τ)² ).</p>
      <p>Algorithm 3 Privacy budgeting via greedy iterations</p>
      <p>Params: Tree T, with levels 0, 1, . . . , ℓ.</p>
      <p>Input: Total privacy budget ε, number of phases k.
Output: Budget split ε_0, ε_1, . . . , ε_ℓ such that ∑_j ε_j = ε.
Choose an infinitesimal δ, e.g., δ ← 10⁻⁵.
for j = 0, . . . , ℓ do
  ε_j ← δ · ε / (ℓ + 1)
ε̂ ← (1 − δ) · ε (remaining privacy budget)
for i = 1, . . . , k do
  for j = 0, . . . , ℓ do
    e_j ← RMSRE_τ(T) using privacy budget split of (ε_0, . . . , ε_j + ε̂/k, . . . , ε_ℓ) (computed using (v̂ar_v)_{v∈T} from Algorithm 2)
  j* ← argmin_j e_j
  ε_{j*} ← ε_{j*} + ε̂/k
return (ε_0, . . . , ε_ℓ)</p>
      <p>⁴ While it is possible to modify Algorithm 2 to support var_v = ∞ for certain subsets of the nodes, we avoid doing so to retain clarity.</p>
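      <p>The greedy loop can be sketched as follows (hypothetical helper; the caller supplies a tree_error function mapping a per-level budget split to RMSRE_τ(T), e.g., computed via Algorithm 2's variances):</p>
      <p>
```python
import math

def dlap_variance(a):
    # Variance of DLap(a): 2*exp(-a) / (1 - exp(-a))**2.
    q = math.exp(-a)
    return 2.0 * q / (1.0 - q) ** 2

def greedy_budget_split(num_levels, epsilon, num_phases, tree_error):
    # Algorithm 3 sketch: start from an infinitesimal per-level allocation,
    # then repeatedly give one unit of the remaining budget to whichever
    # level yields the lowest resulting tree error.
    delta = 1e-5
    budgets = [delta * epsilon / (num_levels + 1)] * (num_levels + 1)
    unit = (1.0 - delta) * epsilon / num_phases
    for _ in range(num_phases):
        def error_if_added(j):
            trial = list(budgets)
            trial[j] += unit
            return tree_error(trial)
        best = min(range(num_levels + 1), key=error_if_added)
        budgets[best] += unit
    return budgets
```
      </p>
      <p>With a symmetric error function the split converges toward an equal allocation, while heterogeneous levels (e.g., very different fanouts or node counts) pull budget toward the levels that reduce the tree error the most.</p>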
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>