<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Optimizing Hierarchical Queries for the Attribution Reporting API</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matthew Dawson</string-name>
          <email>mwdawson@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Badih Ghazi</string-name>
          <email>badihghazi@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pritish Kamath</string-name>
          <email>pritishk@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kapil Kumar</string-name>
          <email>kkapil@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ravi Kumar</string-name>
          <email>ravi.k53@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bo Luan</string-name>
          <email>luanbo@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasin Manurangsi</string-name>
          <email>pasin@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nishanth Mundru</string-name>
          <email>nmundru@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harikesh Nair</string-name>
          <email>hsnair@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adam Sealfon</string-name>
          <email>adamsealfon@google.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shengyu Zhu</string-name>
          <email>shengyuzhu@google.com</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>We study the task of performing hierarchical queries based on summary reports from the Attribution Reporting API for ad conversion measurement. We demonstrate that methods from optimization and differential privacy can help cope with the noise introduced by privacy guardrails in the API. In particular, we present algorithms for (i) denoising the API outputs and ensuring consistency across different levels of the tree, and (ii) optimizing the privacy budget across different levels of the tree. We provide an experimental evaluation of the proposed algorithms on public datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>ad conversion measurement</kwd>
        <kwd>hierarchical aggregation</kwd>
        <kwd>differential privacy</kwd>
        <kwd>attribution reporting API</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Over the last two decades, third-party cookies [1] have been widely used in online advertising, allowing an impression (e.g., an ad click or view) registered on the publisher site to be joined to a conversion on the advertiser, in order to compute aggregate conversion reports (e.g., the number of conversions attributed to a subset of impressions) or to train ad bidding models (e.g., [2, 3, 4, 5]). However, in recent years, privacy concerns have led several browsers to decide to deprecate third-party cookies, e.g., [6, 7, 8]. The Attribution Reporting API [9, 10] seeks to provide privacy-preserving ways for measuring ad conversions on the Chrome browser and the Android mobile operating system. This API relies on a variety of mechanisms for limiting the privacy leakage, including bounding the contributions to the output reports of the conversions attributed to each impression, as well as noise injection to satisfy differential privacy (for more details, see Section 2).</p>
      <p>[Figure 1: Example of Hierarchical Queries.]</p>
      <p>We study the conversion reporting task, where a query consists of counting the number of conversions attributed to impressions such that some features of the conversion and the impression are restricted to certain given values. In particular, we focus on the hierarchical queries setting, where the goal is to estimate the number of conversions at every node of a tree such as the one in Figure 1. Our contributions are algorithms for denoising the API outputs and enforcing consistency across the levels of the tree (Section 4), and for optimizing the privacy budget across the different levels of the tree (Section 5).</p>
      <p>We start by recalling in Section 2 the basics of differential privacy and summarizing the query model for summary reports in the Attribution Reporting API. In Section 3, we formally define the optimization problem that we seek to solve. In Section 6, we provide an experimental evaluation of our algorithms on two public datasets. We conclude with some future directions in Section 7.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Preliminaries</title>
      <sec id="sec-3-1">
        <title>2.1. Differential Privacy (DP)</title>
        <p>Let n be the number of rows in the dataset and let X be the (arbitrary) set representing the domain of values for each row. We distinguish two types of columns (a.k.a. attributes): known and unknown. We also assume knowledge of the set of possible values that each unknown attribute can take.</p>
        <p>Definition 1 (DP [13]). For ε ≥ 0, an algorithm A is ε-DP if for every pair D, D′ of datasets that differ on the unknown attributes of one row¹, and for every possible output o, it holds that Pr[A(D) = o] ≤ e^ε · Pr[A(D′) = o].</p>
        <p>Lemma 2 (Basic Composition). Let A be an algorithm that runs t algorithms A_1, . . . , A_t on the same dataset such that A_i is ε_i-DP with ε_i ≥ 0 for each i ∈ [t]. Then, A is (∑_{i=1}^{t} ε_i)-DP.</p>
        <p>Lemma 3 (Post-processing). Let ε &gt; 0, and let Y and Y′ be any two sets. If A : X^n → Y is an ε-DP algorithm, and B : Y → Y′ is any randomized mapping, then (B ∘ A) : X^n → Y′ is ε-DP.</p>
        <p>For an extensive overview of DP, we refer the reader
to the monograph [18]. A commonly used method in DP
is the discrete Laplace mechanism. To define it, we recall
the notion of ℓ1-sensitivity.</p>
        <p>Definition 4 (ℓ1-sensitivity). Let X be any set, and f : X^n → R^d be a d-dimensional function. Its ℓ1-sensitivity is defined as Δ_1 f := max_{D,D′} ‖f(D) − f(D′)‖_1, where D and D′ are two datasets that differ on the unknown attributes of a single row.</p>
        <p>Definition 5 (Discrete Laplace Mechanism). The discrete Laplace distribution centered at 0 and with parameter a &gt; 0, denoted by DLap(a), is the distribution whose probability mass function at integer k is ((e^a − 1)/(e^a + 1)) · e^{−a|k|}. The d-dimensional discrete Laplace mechanism with parameter a applied to a function f : X^n → Z^d, on input a dataset D ∈ X^n, returns f(D) + Z, where Z is a d-dimensional noise random variable whose coordinates are sampled i.i.d. from DLap(a).</p>
        <p>Lemma 6. For every ε &gt; 0, the d-dimensional discrete Laplace mechanism with parameter a ≤ ε/Δ_1 f is ε-DP.</p>
        <p>¹ We note that this instantiation of the DP definition is related to the label DP setting in machine learning [14, 15, 16, 17], where the features of an example are considered known and only its label is deemed unknown and sensitive. In our use case, there might be multiple unknown attributes, whose concatenation is treated in a conceptually similar way to the label in the label DP setting.</p>
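        <p>As an illustration, the discrete Laplace mechanism can be sketched in a few lines of Python (a hypothetical helper, not part of the API; it relies on the standard fact that the difference of two i.i.d. geometric random variables follows the discrete Laplace distribution):</p>
        <p>
```python
import numpy as np

def sample_dlap(a, size, rng):
    # DLap(a) has pmf ((e^a - 1)/(e^a + 1)) * e^(-a*|k|); the difference of two
    # i.i.d. geometric variables with success probability p = 1 - e^(-a),
    # each supported on {0, 1, 2, ...}, has exactly this distribution.
    p = 1.0 - np.exp(-a)
    g1 = rng.geometric(p, size) - 1  # numpy's geometric is supported on {1, 2, ...}
    g2 = rng.geometric(p, size) - 1
    return g1 - g2

def discrete_laplace_mechanism(f_of_dataset, epsilon, l1_sensitivity, rng):
    # Per Lemma 6, adding DLap(a) noise with a = epsilon / l1_sensitivity to
    # each coordinate of f yields an epsilon-DP release.
    a = epsilon / l1_sensitivity
    values = np.asarray(f_of_dataset)
    return values + sample_dlap(a, values.shape, rng)
```
        </p>
        <p>For instance, a summary report over contributions capped at 2^16 would add DLap(ε/2^16) noise to each requested key's aggregate value.</p>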
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Attribution Reporting API</title>
        <p>The aggregatable reports [10] are constructed as follows:
• Impression (a.k.a. source) registration: The API provides a mechanism for the ad-tech to register an impression (e.g., a click or view) on the publisher site or app. During registration, the ad-tech can specify impression-side aggregation keys (e.g., one corresponding to the campaign or geo location).
• Conversion (a.k.a. trigger) registration: The API also provides a mechanism for the ad-tech to register a conversion on the advertiser site or app. As it does so, the ad-tech can specify conversion-side key pieces along with aggregatable values corresponding to each setting of the impression-side aggregation keys. E.g., a conversion-side key piece could capture the conversion type or (a discretization of) the conversion timestamp. The combined aggregation key (which can be thought of as the concatenation of the impression-side aggregation key and the conversion-side key piece) is restricted to be at most 128 bits. The aggregatable value is required to be an integer between 1 and the L1 parameter of the API, which is set to 2^16 = 65,536.
• Attribution: The API supports last-touch attribution where the conversion is attributed to the last (unexpired) impression registered by the same ad-tech. (The API supports a broader family of single-touch attribution schemes allowing more flexible prioritization over impressions and conversions; we do not discuss this aspect any further as it is orthogonal to our algorithms.²)
• Histogram contributions generation: The API enforces that the sum of contributions across all aggregatable reports generated by different conversions attributed to the same impression is capped to at most the L1 = 2^16 parameter. An example construction of histogram contributions is in Figure 2. While in this example the conversion key piece depends only on conversion information (the conversion day), it can in general encode other conversion-side attributes as well.</p>
        <p>² The API moreover enforces a set of rate limits on registered impressions and attributed conversions; we omit discussing these since they are not essential to the focus of this paper. We refer the interested reader to the documentation [19].</p>
        <p>[Figure 2: Example construction of histogram contributions. An impression (campaign = 123, location = New York) is registered by the ad-tech on the publisher with aggregation_keys “Level 1: campaign” = “123”, “Level 2: campaign, location” = “123, New York”, and “Level 3: campaign, location, day” = “123, New York”. A conversion (day = Friday) is registered by the ad-tech on the advertiser with aggregatable_trigger_data supplying key_piece “” for source_key “Level 1: campaign”, key_piece “” for source_key “Level 2: campaign, location”, and key_piece “Friday” for source_key “Level 3: campaign, location, day”, along with aggregatable_values of 21845 for each level. The browser emits the histogram contributions (key: “123”, value: 21845), (key: “123, New York”, value: 21845), and (key: “123, New York, Friday”, value: 21845).]</p>
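        <p>The construction in Figure 2 can be sketched as follows (hypothetical field names and helper; the even split of the L1 = 2^16 budget across the three levels matches the value 21845 in the figure):</p>
        <p>
```python
def build_contributions(impression, conversion, l1_budget=2 ** 16):
    # Hypothetical sketch: each hierarchy level's aggregation key concatenates
    # impression-side attributes with a conversion-side key piece, and the L1
    # contribution budget is split evenly across the levels.
    keys = [
        impression["campaign"],
        impression["campaign"] + ", " + impression["location"],
        impression["campaign"] + ", " + impression["location"] + ", " + conversion["day"],
    ]
    value = l1_budget // len(keys)  # 65536 // 3 = 21845, as in Figure 2
    return [(key, value) for key in keys]
```
        </p>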
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Optimization Problem</title>
      <p>³ The * in the conversion-related field in Table 1 indicates that the click corresponding to that row did not get an attributed conversion.</p>
      <p>At query time, a set of histogram keys is requested by the ad-tech [21]. (The set of keys that are queried could be set to the Cartesian product of all known values of the impression-side features with the set of all possible values of the conversion-side features.) The aggregatable reports are combined in the aggregation service to produce a summary report by applying the discrete Laplace mechanism (Definition 5) with parameter ε/L1 to the requested aggregation keys. This report satisfies ε-DP where each row of the dataset corresponds to an impression and its attributed conversions if any (see Table 1 for an example³), and where the known columns in Definition 1 are the attributes that only depend on impression information (campaign and location in Table 1). In the hierarchical aggregation setting, each node of the tree corresponds to an aggregation key, and the level of the node determines which attributes the key conditions on.</p>
      <sec id="sec-4-0">
        <title>3.1. Hierarchical Query Estimation</title>
        <p>We formally define the hierarchical query estimation problem. Given a dataset D, consider a tree where each node corresponds to the subset of the rows of D obtained by conditioning on the values of some of the attributes. We consider the setting where each level of the tree introduces a conditioning on the value of a new attribute. For known attributes, the child nodes of a node correspond to the different values taken by that attribute in D within the rows. For unknown attributes, the child nodes correspond to all possible values for that attribute, whether or not they actually occur in the dataset. Given this tree, the problem is to privately release the approximate number of data rows corresponding to each node that have an attributed conversion.</p>
      </sec>
      <sec id="sec-4-1">
        <title>3.2. Error Measure and Consistency</title>
        <p>We consider the following error measure, which is defined in [22].</p>
        <p>Definition 7 (RMS Relative Error at Threshold). For a count n ≥ 0, and its randomized estimate n̂ ∈ R, the Root Mean Squared Relative Error at Threshold τ when estimating n by n̂ is defined as RMSRE_τ(n, n̂) := √( E[ ( |n̂ − n| / max(n, τ) )² ] ), where the expectation is over the randomness of n̂.</p>
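        <p>An empirical version of this error measure can be sketched as follows (hypothetical helper; it averages the squared relative error of repeated randomized estimates, with the denominator clipped below at τ):</p>
        <p>
```python
import math

def rmsre_tau(n_true, estimates, tau):
    # Empirical RMSRE_tau (Definition 7): root mean squared error of the
    # estimates relative to max(n_true, tau).
    denom = max(n_true, tau)
    total = sum((abs(e - n_true) / denom) ** 2 for e in estimates)
    return math.sqrt(total / len(estimates))
```
        </p>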
        <sec id="sec-4-1-1">
          <title>Algorithm 1 CombineEstimates</title>
          <p>Input: (x; var_x), (y; var_y): x, y are samples from random variables X, Y respectively, such that E X = E Y and Var(X) = var_x, Var(Y) = var_y.</p>
          <p>Output: (z; var_z): combined estimate and variance.
var_z ← (var_x · var_y) / (var_x + var_y)
z ← var_z · (x/var_x + y/var_y)
return (z; var_z)</p>
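          <p>In Python, Algorithm 1 amounts to the following inverse-variance weighting (a direct transcription; the variances are assumed strictly positive):</p>
          <p>
```python
def combine_estimates(x, var_x, y, var_y):
    # Minimum-variance convex combination of two independent unbiased
    # estimates of the same quantity (Algorithm 1): each input is weighted
    # inversely proportionally to its variance.
    var_z = (var_x * var_y) / (var_x + var_y)
    z = var_z * (x / var_x + y / var_y)
    return z, var_z
```
          </p>
          <p>E.g., combining estimates 10 and 20, each of unit variance, yields 15 with variance 0.5.</p>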
          <p>Suppose we have the count estimates (e.g., the number of conversions as in Figure 1) at every node in a tree. Each conversion contributes to the count for multiple nodes. For example, a conversion for ad campaign 123 that occurs on a Friday in New York contributes to the 6th leaf from the left, but also to each of its ancestor nodes. This imposes relationships among the counts at various nodes. If the only geo locations with conversions attributed to ad campaign 123 on Friday are New York and Chicago, then the total number of such conversions must equal the sum of the number of such conversions in each of the two locations. More generally, the count for any node must equal the sum of the counts for its children. An estimator with this property is called consistent.</p>
          <p>For a tree T with levels 0, . . . , ℓ, with L_j being the set of nodes at level j, and estimators n̂_v of the counts n_v at each node v, define the tree error RMSRE_τ(T) to be
√( (1/(ℓ+1)) · ∑_{j=0}^{ℓ} (1/|L_j|) · ∑_{v∈L_j} RMSRE_τ(n_v, n̂_v)² ).</p>
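          <p>Under an unbiased estimator, each per-node term equals the estimate's variance divided by max(n_v, τ)², so the tree error can be sketched as follows (hypothetical helper; levels is a list, one entry per tree level, of (true count, estimate variance) pairs):</p>
          <p>
```python
import math

def tree_rmsre(levels, tau):
    # Tree error RMSRE_tau(T): average the mean squared relative error within
    # each level, then average across levels and take the square root.
    total = 0.0
    for level in levels:
        total += sum(v / max(n, tau) ** 2 for (n, v) in level) / len(level)
    return math.sqrt(total / len(levels))
```
          </p>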
          <p>The goal of the hierarchical query estimation
problem is to privately estimate the counts of every node
with minimum possible tree error, where the estimates
should be consistent. To achieve this, we will employ
post-processing and privacy budgeting strategies.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Post-processing Algorithms</title>
      <p>Directly applying the discrete Laplace mechanism to add independent noise to each node does not result in a consistent estimate. Consistency can be achieved by estimating only the counts of leaves and inferring the count of each nonleaf node by adding the counts of its leaf descendants, but this can lead to large error for nodes higher up in the tree. Alternatively, one can achieve consistency by post-processing independent per-node estimates. Since DP is preserved under post-processing (Lemma 3), this comes at no cost to the privacy guarantee, and it can substantially improve accuracy.</p>
      <p>Since the count of any node must equal the sum of the counts of its children, we can obtain a second independent estimate of the count of any nonleaf node by summing the estimates of its children. We can combine these two estimates to obtain a single estimate of lower variance. Extending this observation, Hay et al. [11] and Cormode et al. [12] give efficient post-processing algorithms for regular (every non-leaf has the same number of children) and balanced (every leaf is at the same depth) trees, achieving consistency and also a substantial improvement in accuracy. In particular, estimating the counts of each node can be expressed as a linear least-squares regression problem, and these algorithms compute the least-squares solution, which is known to achieve optimal error variance among all unbiased linear estimators. Moreover, this special case of least-squares regression can be solved in linear time.</p>
      <p>We generalize this algorithm to arbitrary trees in Algorithm 2. This allows us to handle trees with different fanouts and noise at different levels, or even at different nodes in the same level, which is the case for the conversion reporting trees we study. The input to the algorithm consists of independent noisy counts x_v for all nodes v in T, with E x_v = n_v and variance Var(x_v) = var_v.</p>
      <p>The key idea is a simple method (Algorithm 1) to combine two independent and unbiased estimates of the same quantity to get an estimate with reduced variance, as follows: suppose X and Y are two independent random variables with E X = E Y = μ, with variances var_X and var_Y respectively; then the optimal convex combination Z of X and Y that minimizes the variance in the estimate of μ weights X and Y inversely proportionally to their variances, namely Z = (var_Y · X + var_X · Y)/(var_X + var_Y), which has variance Var(Z) = (var_X · var_Y)/(var_X + var_Y).</p>
      <p>Algorithm 2 comprises two linear-time passes. The first pass proceeds bottom-up from the leaves to the root. For each non-leaf node v it recursively computes x↑_v, which is an optimal linear unbiased estimate of n_v using only the noisy counts x_u for all u that are in the subtree rooted at v, excluding v. This is combined with the noisy count x_v to compute x⇑_v, which is an optimal linear unbiased estimate of n_v using only the noisy counts x_u for all u that are in the subtree rooted at v, including v. The second pass proceeds top-down from the root, incorporating at each node the information from outside its subtree.</p>
      <sec id="sec-5-1">
        <title>Algorithm 2 TreePostProcessing</title>
        <p>Params: Tree T with root r.</p>
        <p>Input: (x_v; var_v) for all v ∈ T: noisy counts and variances.
Output: (n̂_v; v̂ar_v) for all v ∈ T: post-processed estimates and variances.
# Bottom-up pass
for each leaf v ∈ T do
  (x⇑_v; var⇑_v) ← (x_v; var_v)
for each internal node v from largest to smallest depth do
  (x↑_v; var↑_v) ← (∑_{u∈child(v)} x⇑_u; ∑_{u∈child(v)} var⇑_u)
  (x⇑_v; var⇑_v) ← CombineEstimates((x_v; var_v), (x↑_v; var↑_v))
# Top-down pass
For root r: (n̂_r; v̂ar_r) ← (x⇑_r; var⇑_r) and (x⇓_r; var⇓_r) ← (n̂_r; v̂ar_r)
for each non-root node v from smallest to largest depth do
  u ← parent(v)
  (x↓_v; var↓_v) ← (x⇓_u − ∑_{w∈child(u)∖{v}} x⇑_w; var⇓_u + ∑_{w∈child(u)∖{v}} var⇑_w)  # equals (x⇓_u − x↑_u + x⇑_v; var⇓_u + var↑_u − var⇑_v)
  (n̂_v; v̂ar_v) ← CombineEstimates((x⇑_v; var⇑_v), (x↓_v; var↓_v))
  (x⇓_v; var⇓_v) ← CombineEstimates((x_v; var_v), (x↓_v; var↓_v))
return (n̂_v; v̂ar_v) for all v ∈ T</p>
        <p>For all v ∈ T, n̂_v is the best linear unbiased estimator (BLUE) of n_v given the noisy counts of all nodes.</p>
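        <p>A compact sketch of this two-pass scheme, following the classical consistent estimator of Hay et al. [11] rather than reproducing Algorithm 2 line by line (hypothetical tree representation as a child-list dictionary):</p>
        <p>
```python
def combine(a, va, b, vb):
    # Inverse-variance combination of two independent unbiased estimates.
    v = va * vb / (va + vb)
    return v * (a / va + b / vb), v

def consistent_tree_estimates(children, root, x, var):
    # Bottom-up: combine each node's noisy count with the sum of its
    # children's subtree estimates. Top-down: spread the residual between a
    # node's final estimate and its children's subtree estimates in
    # proportion to each child's variance, so children sum to their parent.
    up, var_up = {}, {}
    def bottom_up(v):
        kids = children.get(v, [])
        if not kids:
            up[v], var_up[v] = x[v], var[v]
            return
        for c in kids:
            bottom_up(c)
        s = sum(up[c] for c in kids)
        vs = sum(var_up[c] for c in kids)
        up[v], var_up[v] = combine(x[v], var[v], s, vs)

    est = {}
    def top_down(v):
        kids = children.get(v, [])
        if not kids:
            return
        s = sum(up[c] for c in kids)
        vs = sum(var_up[c] for c in kids)
        for c in kids:
            est[c] = up[c] + (var_up[c] / vs) * (est[v] - s)
            top_down(c)

    bottom_up(root)
    est[root] = up[root]
    top_down(root)
    return est
```
        </p>
        <p>On a root with noisy count 10 and two unit-variance leaves with noisy counts 4 and 4, this returns 28/3 for the root and 14/3 for each leaf, which matches the least-squares solution and sums consistently.</p>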
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Privacy Budgeting Over Tree Levels</title>
      <p>To allocate the privacy budget across the levels of the tree, the simplest approach is to divide it equally among the levels, or to put all of the budget on the lowest level. However, basic composition (Lemma 2) allows us to allocate the privacy budget arbitrarily among the nodes of the tree, and we can apply post-processing (Algorithm 2) to noisy initial estimates with unequal variances as well. This motivates the question of whether we can improve accuracy with an unequal privacy budget allocation, and if so, how.</p>
      <p>Given the true counts n_v, we use a greedy iterative approach for optimizing the privacy budget allocation. Let k be the number of phases (e.g., k = 20) corresponding to the granularity of the allocation. Initially allocate zero (or infinitesimal) privacy budget to each level, and divide the (remaining) privacy budget into k units of size ε/k. In each of the k phases, select the level that would result in the lowest RMSRE_τ(T) when using ε/k additional privacy budget, and increase the privacy budget of that level by ε/k. The error RMSRE_τ(T) can be computed directly using the variance v̂ar_v of each post-processed estimate n̂_v as returned by Algorithm 2. We present the details of this greedy iterative approach in Appendix B.</p>
      <p>Using the true counts to optimize the privacy budget allocation can leak sensitive information and violate the privacy guarantee, and is infeasible in the API. Instead of using the true counts to allocate the privacy budget, one can use alternatives such as simulated data, or historical data that is not subject to the privacy constraints (e.g., before third-party cookie deprecation), or noisy historical data that has already been protected with DP (e.g., the output of the API over data from a previous time period). We refer to this family of privacy budget optimization methods as prior-based. When no such alternatives are available, one could start out with a suboptimal privacy budgeting strategy (e.g., uniform privacy budgeting across levels) and improve the allocation over time.</p>
      <sec id="sec-7-1">
        <title>6. Experimental Evaluation</title>
        <p>The attributes used include product_country, device_type, product_age_group, and time_delay_for_conversion. The last attribute is discretized into 2-day (or 6-day) buckets so that there are at most 15 (respectively 5) possible values of the rounded time delay. The first 4 attributes are considered known (they only relate to the impression), whereas the last is considered unknown. For the discretization into 6-day buckets, we retain the unknown attribute and 3 known attributes instead of 4 (omitting product_age_group), resulting in a depth 4 tree (instead of depth 5).</p>
        <p>In Figure 4, we plot RMSRE_τ versus ε for various methods evaluated on the later 45 days of data, namely:
• Equal privacy budget split across levels without any post-processing.
• Equal privacy budget split across levels with post-processing (Algorithm 2).
• All privacy budget on leaves with post-processing.
• Prior-based optimized privacy budget split across levels, optimized for each partner_id, without any post-processing.
• Prior-based optimized privacy budget split across levels, optimized for each partner_id, with post-processing (Algorithm 2).</p>
        <p>For the prior-based privacy budget split optimization, the privacy budgeting was performed using a noisy prior computed on the first 45 days (privately estimated with ε = 1 and an equal privacy budget split over levels). For partner_id’s that appear only in the later 45 days, the prior is computed on all the partner_id’s that appear in the first 45 days of data. E.g., for ε = 4 and the depth 5 tree, RMSRE_10 ≈ 0.1, and for the depth 4 tree, RMSRE_5 ≈ 0.17.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>7. Conclusion and Future Directions</title>
      <p>In this work, we studied hierarchical querying of the Attribution Reporting API, and presented algorithms for consistency enforcement and privacy budgeting, demonstrating their performance on two public ad datasets. We next discuss some interesting future directions. We focused on the so-called OPC (“one per click”) setting where each impression gets at most a single attributed conversion. An important direction is to consider the extension to the more general MPC (“many per click”) setting where an impression can get multiple attributed conversions. It would also be interesting to extend our treatment of conversion counts to the task of estimating conversion values. While the error is small for values of ε around 16 in our evaluation, we note that this is specific to the datasets and the (restricted) functionality that we study (i.e., conversion counts with OPC). Achieving small errors on additional functionalities (e.g., MPC or conversion values) will likely require larger values of ε.</p>
      <p>Another natural direction is to extend the consistency enforcement algorithm to ensure monotonicity (i.e., that the output estimate for a node of the tree is at least the output estimate for any of its children), and non-negativity. Our privacy budgeting method optimizes for one privacy parameter per level of the tree; it would be good to explore the extent to which per-node privacy budgeting can yield higher accuracy. While we considered data-independent weights when defining the tree error in terms of the node errors, there could be situations where data-dependent weights are preferable, e.g., to avoid the tree error being dominated by the error in many nodes with no conversions, while being insensitive to the error of few nodes where most of the conversions occur. Another interesting direction is to study privacy budgeting under approximate DP [25]. Our work considered the hierarchical query model; a natural direction is to optimize the direct query model [26]. Finally, the Attribution Reporting API also offers event-level reports [27]. It would be interesting to see if these could also be used to further improve the accuracy for hierarchical queries.</p>
      <p>[Figure panels (c) and (d): RMSRE versus epsilon for τ = 5 and τ = 10, depth 3 tree.]</p>
      <p>[Appendix: analysis of Algorithm 2.] The proof proceeds by induction on the number of internal nodes. Let v* be an internal node all of whose children are leaves, and let T′ denote the tree with the children of v* removed.</p>
      <p>For all v ∈ T′, define
(x′_v; var′_v) = (x_v; var_v) if v ≠ v*, and (x′_v; var′_v) = (x⇑_v; var⇑_v) if v = v*.
Similar to above, let (x′↑_v; var′↑_v), (x′⇑_v; var′⇑_v), (x′↓_v; var′↓_v), (x′⇓_v; var′⇓_v), (n̂′_v; v̂ar′_v) be as defined in Algorithm 2 when run on the tree T′ with input (x′_v; var′_v)_{v∈T′}. From the inductive hypothesis, for all internal nodes v ∈ T′,
n̂′_v = ∑_{u∈child(v)} n̂′_u,
and every leaf u ∈ T′ satisfies the weighted root-to-leaf sum preservation property over its ancestors anc(u).</p>
      <p>When we run Algorithm 2 on T, after setting x⇑_{v*} and var⇑_{v*}, the rest of the bottom-up pass is exactly the same as that of the run on T′. Similarly, the top-down pass of T′ is the same as that of T, except that in T we also set the values of n̂_v and v̂ar_v for all v ∈ child(v*). Therefore,
n̂_v = n̂′_v if v ∉ child(v*), and n̂_v = (var↓_v · x_v + var_v · x↓_v)/(var↓_v + var_v) if v ∈ child(v*),   (4)
where
(x↓_v; var↓_v) = (n̂_{v*} − x↑_{v*} + x_v; v̂ar_{v*} + var↑_{v*} − var_v).</p>
      <p>Here we used the observation that for all leaves v of T′ (including v*), it holds that (x′⇑_v; var′⇑_v) = (x′_v; var′_v) and hence (x′⇓_v; var′⇓_v) = (n̂′_v; v̂ar′_v) = (n̂_v; v̂ar_v).</p>
      <p>(Consistency) For v ∉ child(v*), Equation (4) implies that the LHS and RHS of the desired consistency condition are exactly the same as those of Equation (2). So it only remains to show consistency for v = v*. First, we simplify Equation (4) by noting that var↓_v + var_v = var⇓_{v*} + var↑_{v*} for all v ∈ child(v*), and hence for v ∈ child(v*) we have</p>
      <p>n̂_v = x_v + (var_v / (var⇓_{v*} + var↑_{v*})) · (x⇓_{v*} − x↑_{v*}).
Summing over v ∈ child(v*),
∑_{v∈child(v*)} n̂_v = x↑_{v*} + (var↑_{v*} / (var⇓_{v*} + var↑_{v*})) · (x⇓_{v*} − x↑_{v*}) = (var↑_{v*} · x⇓_{v*} + var⇓_{v*} · x↑_{v*}) / (var⇓_{v*} + var↑_{v*}) = n̂_{v*},
where we use that x↑_{v*} = ∑_{v∈child(v*)} x_v. Thus, the consistency condition holds for every internal node v ∈ T, as desired.</p>
      <p>Then, the least-squares estimator ñ, defined as the minimizer of the corresponding least-squares objective, yields estimates that clearly satisfy the consistency and weighted root-to-leaf sum preservation properties of Lemma 9. Conversely, if n̂(x) satisfies consistency, then it must be determined by its values ñ_v = n̂_v at the leaves v, and if it satisfies the weighted root-to-leaf sum preservation property, then as shown above ñ must be the least-squares estimator.</p>
      <p>Thus, from Lemma 9, since the estimates n̂(x) returned by Algorithm 2 satisfy the two properties, each n̂_v is the BLUE for the corresponding true count n_v.</p>
      <p>Finally, to see that v̂ar_v is indeed the variance of the estimate n̂_v, we recursively show that var↑_v, var⇑_v, var↓_v, var⇓_v are the variances corresponding to x↑_v, x⇑_v, x↓_v, x⇓_v respectively. The key property to verify is that each such quantity involves a linear combination of estimates which depend on noisy counts of disjoint parts of the tree and hence are independent. Thus, we can repeatedly use the property that for independently drawn X and Y, it holds that Var(aX + bY) = a² · Var(X) + b² · Var(Y).</p>
      <p>Algorithm 3 presents the greedy iterative approach we used for optimizing the privacy budget allocation using the true counts n_v, as alluded to in Section 5. The high-level idea is as follows. We choose a parameter k to be the number of phases (e.g., k = 20) corresponding to the granularity of the allocation.</p>
      <p>Initially, allocate an infinitesimal privacy budget to each level; this is done so that we can use Algorithm 2 to compute v̂ar_v for each node of the tree; we need this infinitesimal allocation because Algorithm 2 as written does not allow var_v to be ∞ for any v.⁴</p>
      <p>We divide the (remaining) privacy budget ε̂ into k units of size ε̂/k. In each of the k phases, we consider adding ε̂/k budget to all nodes at level j, choosing a j that results in the lowest RMSRE_τ(T), which can be computed using Algorithm 2; for any budgeting sequence (ε_0, . . . , ε_ℓ), we set var_v to be the variance of DLap(ε_j/L1) for all nodes v in level j. Observe that RMSRE_τ(T) can be computed directly using the variance v̂ar_v of each post-processed estimate n̂_v as returned by Algorithm 2, since
RMSRE_τ(n_v, n̂_v) = √( v̂ar_v / max(n_v, τ)² ).</p>
      <p>Algorithm 3 Privacy budgeting via greedy iterations</p>
      <p>Params: Tree T, with levels 0, 1, . . . , ℓ.</p>
      <p>Input: Total privacy budget ε, number of phases k.
Output: Budget split ε_0, ε_1, . . . , ε_ℓ such that ∑_j ε_j = ε.
Choose an infinitesimal δ, e.g., δ ← 10⁻⁵.
for j = 0, . . . , ℓ do
  ε_j ← δ · ε / (ℓ + 1)
ε̂ ← (1 − δ) · ε (remaining privacy budget)
for i = 1, . . . , k do
  for j = 0, . . . , ℓ do
    e_j ← RMSRE_τ(T) using privacy budget split of (ε_0, . . . , ε_j + ε̂/k, . . . , ε_ℓ) (computed using (v̂ar_v)_{v∈T} from Algorithm 2)
  j* ← argmin_j e_j
  ε_{j*} ← ε_{j*} + ε̂/k
return (ε_0, . . . , ε_ℓ)</p>
      <p>⁴ While it is possible to modify Algorithm 2 to support var_v = ∞ for certain subsets of the nodes, we avoid doing so to retain clarity.</p>
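      <p>The greedy loop can be sketched as follows (hypothetical helper; the caller supplies a tree_error function mapping a per-level budget split to RMSRE_τ(T), e.g., computed via Algorithm 2's variances):</p>
      <p>
```python
import math

def dlap_variance(a):
    # Variance of DLap(a): 2*exp(-a) / (1 - exp(-a))**2.
    q = math.exp(-a)
    return 2.0 * q / (1.0 - q) ** 2

def greedy_budget_split(num_levels, epsilon, num_phases, tree_error):
    # Algorithm 3 sketch: start from an infinitesimal per-level allocation,
    # then repeatedly give one unit of the remaining budget to whichever
    # level yields the lowest resulting tree error.
    delta = 1e-5
    budgets = [delta * epsilon / (num_levels + 1)] * (num_levels + 1)
    unit = (1.0 - delta) * epsilon / num_phases
    for _ in range(num_phases):
        def error_if_added(j):
            trial = list(budgets)
            trial[j] += unit
            return tree_error(trial)
        best = min(range(num_levels + 1), key=error_if_added)
        budgets[best] += unit
    return budgets
```
      </p>
      <p>With a symmetric error function the split converges toward an equal allocation, while heterogeneous levels (e.g., very different fanouts or node counts) pull budget toward the levels that reduce the tree error the most.</p>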
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>