=Paper=
{{Paper
|id=Vol-3837/paper5
|storemode=property
|title=SERP Interference Network and Its Applications in Search Advertising
|pdfUrl=https://ceur-ws.org/Vol-3837/paper_12_ceur_paper.pdf
|volume=Vol-3837
|authors=Purak Jain,Sandeep Appala
|dblpUrl=https://dblp.org/rec/conf/adkdd/JainA24
}}
==SERP Interference Network and Its Applications in Search Advertising==
SERP Interference Network and Its Applications in Search
Advertising
Purak Jain1,* , Sandeep Appala1
1
Amazon, Seattle, USA
Abstract
Search Engine marketing teams in the e-commerce industry manage global search engine traffic to their websites with the aim to
optimize long-term profitability by delivering the best possible customer experience on Search Engine Results Pages (SERPs). In order to
do so, they need to run continuous and rapid Search Marketing A/B tests to continuously evolve and improve their products. However,
unlike typical e-commerce A/B tests that can randomize based on customer identification, their tests face the challenge of anonymized
users on search engines. On the other hand, simply randomizing on products violates Stable Unit Treatment Value Assumption for most
treatments of interest. In this work, we propose leveraging censored observational data to construct bipartite (Search Query to Product
Ad or Text Ad) SERP interference networks. Using a novel weighting function, we create weighted projections to form unipartite graphs
which can then be use to create clusters to randomized on. We demonstrate this experimental design’s application in evaluating a
new bidding algorithm for Paid Search. Additionally, we provide a blueprint of a novel system architecture utilizing SageMaker which
enables polyglot programming to implement each component of the experimental framework.
Keywords
sponsored search, experiment design, A/B testing
1. Introduction interference" assumption holds. We observe that for most
treatments of interest (e.g. new bidding algorithms for paid
Search Engine marketing teams in the e-commerce industry search programs or improved title headlines for free search
manage global search engine traffic with the aim of optimiz- snippets), the SERP page leads to interference between treat-
ing long-term profitability by delivering the best possible ment and control units causing the Stable Unit Treatment
customer experience on the most important web pages on Value Assumption (SUTVA) to fail, and consequently in-
the internet - Search Engine Results Pages (SERPs). Figure duces bias in the standard estimators used to evaluate the
1 shows the prominent parts of SERP. Search Engines con- value generated by the treatment. A standard answer [4, 5]
tinue to evolve their customer experience and features due to this problem is to replace the “product-split” experiment
to social, technological and economic forces, including pri- design with a “time-split” (or “switchback”) design, where
vacy concerns, and further monetization of their properties the entire market switches repeatedly between treatment
(SERPs). In anticipation of opportunities and risks that come and control. In practice, such designs turn out to be equally
with a shifting landscape advertisers continuously innovate time consuming as geo-based splits since we need to account
with new bidding algorithms, improved paid and free search for long lengths of adjustment period between switches due
creatives, landing pages etc. Randomized experiments, or to the presence of an intermediary i.e. search engine that ap-
A/B tests, are the standard approach for evaluating causal plies the treatment and takes its own time which advertisers
effects of new features [1]. However, Search Marketing cannot control.
experiments are unlike conventional A/B tests in industry
that can randomize on customers as advertisers don’t iden-
tify their customers when they are on a search engine i.e.
the ad publisher. Instead, advertisers may run A/B tests
randomized by geographic locations [2] using search en-
gine’s geo-targeting capabilities but due to ad publisher’s
API limitations they are unable to do so without having to
clone entire advertisement campaigns. The cloning of entire
accounts is operationally expensive and time consuming
restricting the velocity at which they can run such trials.
The next obvious choice for unit of randomization is usu-
ally products or search queries. However for any A/B test,
splits of the unit of randomization should satisfy the assump-
tions of the Neyman - Rubin causal framework [3], that is,
no interference, unconfoundness, overlap and no hidden
treatment variations. For example, if we simply randomly
select products into control and treatment groups, it should
hold the unconfoundness and the overlap assumption given
a large sample size but we still need to check if the "no
ACM SIGKDD Conference on Knowledge Discovery and Data Mining,
AdKDD Workshop 2024, August 2024, Barcelona, Spain
*
Corresponding author.
$ purakjn@amazon.com (P. Jain); appalar@amazon.com (S. Appala)
https://www.amazon.science/author/purak-jain (P. Jain) Figure 1: Components of interest on a Search Engine Results
0009-0009-9758-9336 (P. Jain) Page.
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR
ceur-ws.org
Workshop ISSN 1613-0073
Proceedings
Another approach to dealing with spillovers or interfer- 2. Setting and Motivation
ence is given by clustered experiments [6, 7] or in social
network settings, by network bucketing testing [8] where Shopping Ads is one of the ad formats supported on SERPs.
nodes that are relatively clustered together are given the To place ads within the shopping ad carousel advertisers
same assignment of treatment or control [9]. Our work is in- need to participate in an auction competing with other ad-
spired by similar clustered experiments methods that have vertisers. The format of the auction is considered close to
been applied to estimate and reduce bias in marketplace second price Vickrey–Clarke–Groves (VCG) [15, 16, 17], al-
experiments [10]. Specifically, one such example involves though the exact ad publisher implementation is a blackbox
experimentation in internet ad auctions, where each auction for us. As such to maximize long term profitability, it is im-
consists of a keyword along with a set of advertisers who portant to constantly develop, test and launch new bidding
submit competing bids in order for their ads to be displayed algorithms responsible for valuating products worldwide.
when the keyword is queried by a user. There is cross-unit Let’s say, to test out a new bidding algorithm we simply
interference because the same advertiser or keyword may split on products. The Stable Unit Treatment Value As-
appear in multiple auctions. Basse et al. [11] and Ostrovsky sumption (SUTVA) presumes that the valuation assigned to
and Schwarz [12] make the observation that the auction one product by the new algorithm does not influence the
type used for one keyword does not meaningfully affect profitability of other products. However, in the context of
how advertisers bid for other keywords. They then consider shopping advertisements, this assumption may be violated
experiments that group auctions into clusters by their key- due to potential between-product interference. This inter-
words and randomize auction formats across these keyword ference occurs when both a product with a treatment bid
clusters, rather than across advertisers, as a means to avoid from the new model and another with a control bid from
problems with interference. More broadly, in our context of the current model participate in the same auction triggered
Search Marketing, this idea of cluster-level randomization by a search query, deemed relevant by the ad publisher for
corresponds to identifying product or search query clusters both products. See figure 2 for an example. Such scenarios
that are relatively isolated from each other and randomizing clearly breach SUTVA, challenging the validity of our eval-
the interventions across product-clusters rather than across uation method. If one product happens to be assigned to
products. Our primary contribution lies in leveraging obser- treatment group and the other one to control, then the dif-
vational data to build bipartite (Search Query - Product) and ference in financial performance between the two products
tripartite (Search Query - Paid Search Product - Free Search will be resulted from the combined effect of treatment and
URL) SERP interference networks. We introduce an inno- between-product spillover effect, thus making the treatment
vative weight function to generate weighted projections, effect indistinguishable from the product spillover effect.
transforming these networks into unipartite graphs. These
graphs facilitate the clustering of products that co-appear
on SERPs through Paid Search Shopping Ads, Text Ads, or
Free Product Listings. The resultant clusters can then be
randomized during A/B tests to generate insights.
Note that more recently, Johari et al. [13] and Bajari et al.
[14] have proposed newer experiment designs where both
search query and product units are randomized simultane-
ously. While having a similar flavor, neither framework
applies easily to our problem of interest. To begin with, we
cannot control "search-query" assignment as that is deter-
mined by the search engine i.e. the ad publisher. Johari et al.
[13] use a choice model to capture spillovers, which captures
a different kind of market than the one we consider, where
interference is mediated by a matching algorithm. Bajari
et al. [14] imposes a local interaction assumption, which
does not hold in our setting. However, when the graph is
a bi-partite graph it holds some similarity which we plan
to explore in future work for measuring the magnitude of
spillovers.
The rest of this paper proceeds as follows. In Section 2, Figure 2: Note the two Shopping Ad advertisements shown in
the same carousel. If these two products have bids from different
we use the two-population search query - product case as a
bidders (i.e. control and treatment) then Stable Unit Treatment
motivation to build SERP interference network to test out
Value Assumption is violated.
new bidding algorithms. In Section 3, we describe in greater
detail our experiment design. In Section 4, we further share
details on testing a new bidding model using this experimen-
tation design. Section 5 provides an overview of a system
architecture blueprint for deploying such experimentation 3. Product-Cluster Randomized
frameworks. Finally, we discuss our findings and future Control Trial Design
extensions in Section 6.
To address the challenge of interference in experimental
designs, we propose a preemptive modeling strategy that
incorporates interference networks during the design phase.
This approach allows us to shift the unit of randomization
from individual products to clusters of products, as illus-
Table 1
Mock row in a Search Engine’s Query Report shared with the advertiser.
Search Query Impressions Product/Keyword Clicks Metric Day Ad Campaign
chopper axe 6 B00EOVRX06 1 2023-09-01 12345
trated in Figure 3. Importantly, traditional constrained ran- presence of an edge in the original bipartite graph,
domization methods [18], such as segmenting by product otherwise 0.
categories, prove ineffective. This is because search engines
can associate broad upper funnel search queries (e.g., "Harry Using the above approach to take a weighted one-mode
Potter") with a diverse range of products across multiple cat- projection of the bipartite graph leads to an increase in the
egories (e.g., a book, toy, or blanket related to Harry Potter). number of edges, since
(︀ if
)︀ a search query links to 𝑛 products,
By leveraging interference networks, our method ensures we need to consider 𝑛2 pairs of edges.
more robust and accurate experimental outcomes.
Modeling Network Interference The notion of inter-
ference in the network [19] we construct has to be aligned
with the notion of interference we are trying to estimate.
Since the relevance of products to a user search query is
determined by the search engine and ranking algorithm,
an advertiser cannot use its internal datasets that provide
product to keyword mapping e.g. e-commerce website’s
own search to product results. Instead, we use daily reports
provided by the ad publisher itself. These reports have in-
formation on which actual user search query on the search
engine was mapped to which shopping ads product by the
ad publisher. A sample mock row from such a report is
shown in Table 1. We use these search query reports to
construct an undirected bipartite search query - product
graph using the number of impressions as edge weight.
Unipartite Projection To apply one mode projection of
the bipartite graph onto the product nodes in order to model
the between-network interference, we needed a scoring
function to attribute weights to the resulting graph edges.
Since, we start with large number of products ( 200M+),
we could not directly use the edge weighting functions pro-
posed by Stram et al. [20] due to the computational complex-
ity. Instead we propose the following edge weight function
that requires significantly less computation:
𝑛
∑︁ 1
𝑊uni (𝑎, 𝑏) =
𝑖=1
log 𝑒 𝑠𝑞𝑖 )
(𝑓
min(𝑊bi (𝑠𝑞𝑖 , 𝑎), 𝑊bi (𝑠𝑞𝑖 , 𝑏))
𝐼[𝑠𝑞𝑖 , 𝑎, 𝑏] (1)
max(𝑊bi (𝑠𝑞𝑖 , 𝑎), 𝑊bi (𝑠𝑞𝑖 , 𝑏))
where:
Figure 3: Birds-eye view of our experiment design. First, we
1. 𝑊uni (𝑎, 𝑏) is the edge weight in the unipartite graph fetch the search query reports from the ad publisher and create a
(one-mode projection) between product a and b. search-query, product bipartite graph. Then we take a weighted
2. 𝑊bi (𝑠𝑞𝑖 , 𝑎) is edge weight between search query 𝑖 one mode projection and finally use a clustering algorithm to
and 𝑎 in the original bi-partite graph. cluster products. We then do a stratified random split of product
3. 𝑓𝑠𝑞𝑖 is the number of distinct products that a par- clusters.
ticular search query drives impressions to. Since
the distribution is right-skewed i.e. few upper fun- Graph Partitioning Methodology Constructing a prod-
nel queries drive impressions to only a few dis- uct graph following the above approach then allows us to use
tinct products, weighing down by the log of fre- network dismantling algorithms [21] as oppose to naive con-
quency of search query helped us to weigh down nected components approach to creating product clusters.
edge weight contributions between two products Historically, the community identification problem is a well
from very generic queries. studied problem in computer science literature [22, 23, 24].
4. 𝐼[𝑠𝑞𝑖 , 𝑎, 𝑏] is 1 if search query 𝑠𝑞𝑖 trigger an impres- However, a lot of the proposed methods wouldn’t scale
sion for both product 𝑎 and 𝑏 as represented by the up since we are dealing with graphs that are as large as
200M+ nodes and 400M+ edges. Thus, anything that runs new shopping ad bidder experiment, we ended up having
in 𝑂(|𝑉 |2 ) or 𝑂(|𝐸|2 ) is not practical. Moreover during 10,000 clusters and 36% edge weight across clusters. Note,
the graph partitioning phase, we needed to find a balance the above measure (𝐿) overstates the spillover effects as
between two objectives: they consider spillover between clusters that may end up
being in the same group (C or T).
1. Maximize the number of product clusters as they
translate to randomization units. More clusters Magnitude of Spillover: The search query - product
equal more power for our test. bipartite graph we construct usually has a clustering coef-
2. Minimize the between clusters edge weights. ficient [20] of around ∼ 0.6 for most marketplaces which
indicates tightly knit groups in the network suggesting high
spillover. However, to empirically provide a lower bound
Since, the more clusters we create, the less isolated they
on the magnitude of bias due to interference we need to
are, these two objectives are conflicting. For example, if
conduct a meta-experiment that randomizes over two ex-
we want to have zero connections across clusters, then the
periment designs: one Bernoulli randomized, one cluster
obvious solution is to have one cluster only. This, of course,
randomized. We can then check for a statistically significant
would not lend itself to an A/B test. To balance the above
difference between the total average treatment effect esti-
two objectives, first we look at the percentage edge weight
mates obtained with the two designs [27]. In the absence of
across 𝑘 clusters (𝐶1 , ..., 𝐶𝑘 ) i.e. leakage as 𝐿:
business approval to run such a meta-experiment, the next
best directional data point we have is from our previous
attempt to run simple product-split A/B test. The impact
∑︀𝑘 ∑︀
/ 𝑖 𝑊uni (𝑗, 𝑘)
𝐿=
𝑖=1
∑︀𝑗∈𝐶𝑖 ,𝑘∈𝐶 measured from that experiment had been largely overstated
𝑊 uni (𝑗, 𝑘)
𝑗,𝑘
(∼ 44% lift) when compared to the actual lift (∼ 24% lift)
observed.
Secondly, we use a clustering algorithm that is designed
for balanced clustering, that is, all clusters should have
roughly equal size. We evaluated naive connected compo-
nents, power iteration clustering (PIC) [25], and METIS [26]. 4. Application
Connected components minimizes leakage, but suffers from
extreme imbalance. PIC improves cluster imbalance, but In this section, we discuss the use-case motivated in Section
suffers from high leakage. We choose to use METIS parti- 2 to show an application of the product-cluster random-
tioning algorithm which is an extremely efficient and fast ized control trial design. We had developed a new machine
implementation of graph partitioning algorithm for undi- learning based product valuation model for our shopping
rected weighted graph. METIS adopts an objective function ads program to improve over the current in production
to minimize the number of weighted edges whose vertices heuristic bidding algorithm and we wanted to run an online
belong to different partitions. The METIS graph partitioning experiment to understand the impact on the long term profit.
consists of three phases: (i) In the graph coarsening phase, We used the methodology described in Section 3 to create
a series of successively smaller graphs is derived from the product clusters based on search query reports from the ad
input graph. This process continues until the size of the publisher for the past one year.
graph has been reduced to just a few hundred vertices, (ii) In
the initial partitioning phase, a partitioning of the coarsest Constrained Randomization Once we had the clusters,
and hence, smallest, graph is computed and finally (iii) in we created strata of clusters with similar characteristics
the un-coarsening phase, the partitioning of the smallest instead of randomizing them in a simple bernoulli fashion.
graph is projected to the successively larger graphs by as- We measure the net impressions, clicks, cost and profit of
signing the pairs of vertices that were collapsed together to each of the product cluster and stratify clusters on those
the same partition. After each projection step, the partition- axis.
ing is refined using heuristics to iteratively move vertices
between partitions as long as such moves improve the qual- Experiment Setup The goal of this experiment was ver-
ity of the partitioning. The advantages of this methodology ifying the null hypothesis that the new bidding strategy is
are threefold: better than the current bidding strategy in term of bidding
efficiency i.e. increase of net long term profitability while
1. It runs in 𝑂(|𝐸|) time, which is extremely efficient maintaining the total ad spend. We matched the spend be-
for large graphs. tween control and treatment groups to control for elasticity
2. It is the only algorithm that allows precise control as well as to comply with spend constraints at account level.
of both the number partitions and the balances of Finally, we run a simulation based power analysis for cluster
the overall split. randomized designs using difference-in-differences (DID)
3. It is the only algorithm that is specifically trying to estimation [28].
minimize the edgecut (defined as weighted sum of
edges that straddle between different clusters). Measurement To measure the impact of the proposed
valuation method, a DID analysis for cluster randomized de-
signs is performed for two weeks of periods where spends
Optimal number of clusters: We want to be able to
are closely matched. The results from the DID analysis
identify as many nearly independent clusters as possible
showed a lift in click-through-rate for the treatment group
with leakage controlled within the tolerance. We plot leak-
which was consistent with the lift observed post roll out of
age against various choice of 𝑘 (number of partitions), and
the new bidding model. Note that since model errors can be
identify a 𝑘 that is as large as possible where the leakage is
correlated within cluster, failure to control for within-cluster
as small as possible (i.e. identifying the elbow point). For the
error correlation can lead to misleading small standard er- 6. Conclusion and Future Work
ror and consequently low p-values. Although we do not
control for within-cluster error correlation in the model,
post-estimation we obtain cluster-robust standard errors as In this paper, we present a cluster-based randomized con-
proposed by White [29]. trol test design which enables search marketing e-commerce
teams to do fast online experiment launch while minimizing
interference between experimental groups. Our key idea
is to use observational data to construct bipartite (Search
5. System Architecture Query - Product) SERP interference networks and use a
novel weight function to take weighted projections to form
unipartite graphs which can be use to create clusters of prod-
Our product-cluster randomized control methodology as
ucts appearing together on SERP (via Paid Search shopping
detailed in Section 3 asks for a highly scalable and flexible
ads, text ads or Free Search listings), and then using those
infrastructure with very different compute requirements
clusters to randomize on. Online A/B testing results for the
and library support for each step. To address these chal-
treatment group are consistent with the lift observed post
lenges we propose the "Search Marketing Lab" using AWS
roll out of a new bidding model thereby showing that the
SageMaker [30] pipelines which allows to define a series of
A/B test design gives a good estimate of the actual lift. In our
interconnected processing steps where each step (i) can be
previous attempts to run simple product-split A/B test the
provided its own docker image that has our code in preferred
impact measured from experiments had been largely over-
language and (ii) can have its own compute environment.
stated because of spillover effects. Lastly, we present a novel
This allows for polyglot programming. Here, we briefly
simplified system architecture using SageMaker which al-
focus on the split generation component. In particular, we
lows scientist to do polyglot programming using compute
break the approach into 3 modules:
and language suitable for each scientific module.
One downside of inferring interference network from
search query report data is that such observational data is
censored, that is, we only have data when we win the auc-
tion. In future, we are investigating using SERP page data
from platforms like seoClarity to get better visibility into
SERP interference networks and allow us to incorporate not
Figure 4: Birds-eye view of Product-cluster split generation sys- just shopping ad products but also Text Ads keyword and
tem. Free Search URLs to build comprehensive ad units spanning
across all Search channels - Text Ads, Shopping Ads and
1. Graph Generation which requires parsing >1 year Free Search. More recently, this also includes large language
of daily search query report data and Keyword data models powered results like shown in Appendix. We can
to build a graph edge list - thus requiring spark’s than use these ad units to design cross-channel substitution
distributed compute. We use a cluster of memory- experiments. We are also working on investigating further
optimized instances for this step. into the stability of these clusters over time and that they
2. Graph Partitioning In this step we use the METIS can be updated in real time as more data flow in from search
graph partitioning algorithm. METIS is written en- engines. Finally, we are exploring recent proposed exper-
tirely in ANSI C (no distributed implementation) but iment designs by Bajari et al. [14] to measure the actual
there is a python wrapper for the METIS library [26] magnitude of spillovers.
that we use. We use a single compute-optimized
instance for this step.
3. Power Analysis In this module we obtain cluster- Acknowledgments
robust standard errors after fitting a linear mixed
effect model, using R’s cluster.vcov [31] implemen-
tation to return a multi-way cluster-robust variance- We extend our heartfelt gratitude to Doug Wong and Mike
covariance matrix and perform inference for esti- James for their invaluable support and funding, which made
mated coefficients using R’s coeftest. this research possible. We also wish to thank Kingshuk
RoyChoudhury and Han Wu for their insightful discussions
and constructive feedback. Their expertise and thoughtful
SageMaker Pipeline executions can be scheduled using engagement have greatly contributed to the development
Amazon EventBridge passing run-time parameters. This and refinement of our ideas.
allows to define a single pipeline with multiple executions
(e.g. one per marketplace) based on input parameters. The
serves as a blueprint for a large scale production system
combining multiple languages (R, Python on Spark) utilizing References
each to their respective strengths (R for statistical analysis
modules, python on Spark for ETL) triggering SageMaker [1] R. Kohavi, A. Deng, B. Frasca, T. Walker, Y. Xu,
processing jobs orchestrated via SageMaker Pipelines. N. Pohlmann, Online controlled experiments at large
scale, in: Proceedings of the 19th ACM SIGKDD In-
ternational Conference on Knowledge Discovery and
Data Mining, 2013, pp. 1168–1176.
[2] J. Vaver, J. Koehler, Measuring ad effectiveness using
geo experiments, Google Inc (2011).
[3] D. Rubin, Causal inference using potential outcomes, 321–330.
J. Amer. Statist. Assoc. 100 (2005) 322–331. URL: https: [23] M. Girvan, M. E. Newman, Community structure in
//doi.org/10.1198/016214504000001880. doi:10.1198/ social and biological networks, Proceedings of the
016214504000001880. National Academy of Sciences 99 (2002) 7821–7826.
[4] T. D. Cook, D. L. DeMets, Introduction to Statistical [24] M. E. Newman, Fast algorithm for detecting commu-
Methods for Clinical Trials, Chapman and Hall/CRC, nity structure in networks, Physical Review E 69 (2004)
2007. 066133.
[5] I. Bojinov, D. Simchi-Levi, J. Zhao, Design and analysis [25] F. Lin, W. W. Cohen, Power iteration clustering (2010).
of switchback experiments, Management Science 69 [26] G. Karypis, V. Kumar, Metis: A software package for
(2023) 3759–3777. partitioning unstructured graphs, partitioning meshes,
[6] C. Roberts, S. A. Roberts, Design and analysis of clini- and computing fill-reducing orderings of sparse ma-
cal trials with clustering effects due to treatment, Clin- trices (1997).
ical Trials 2 (2005) 152–162. [27] M. Saveski, J. Pouget-Abadie, G. Saint-Jacques,
[7] P. M. Aronow, C. Samii, Estimating average causal W. Duan, S. Ghosh, Y. Xu, E. M. Airoldi, Detecting
effects under general interference, with application to network effects: Randomizing over randomized exper-
a social network experiment (2017). iments, in: Proceedings of the 23rd ACM SIGKDD
[8] L. Backstrom, J. Kleinberg, Network bucket testing, International Conference on Knowledge Discovery
in: Proceedings of the 20th International Conference and Data Mining, 2017, pp. 1027–1035.
on World Wide Web, 2011, pp. 615–624. [28] J. D. Angrist, J.-S. Pischke, Mostly Harmless Economet-
[9] J. Ugander, B. Karrer, L. Backstrom, J. Kleinberg, Graph rics: An Empiricist’s Companion, Princeton University
cluster randomization: Network exposure to multiple Press, 2009.
universes, in: Proceedings of the 19th ACM SIGKDD [29] H. White, Asymptotic Theory for Econometricians,
International Conference on Knowledge Discovery Academic Press, 2014.
and Data Mining, 2013, pp. 329–337. [30] A. V. Joshi, A. V. Joshi, Amazon’s machine learning
[10] D. Holtz, R. Lobel, I. Liskovich, S. Aral, Reducing toolkit: Sagemaker, Machine Learning and Artificial
interference bias in online marketplace pricing experi- Intelligence (2020) 233–243.
ments, arXiv preprint arXiv:2004.12489 (2020). [31] M. Arellano, Computing robust standard errors for
[11] G. W. Basse, H. A. Soufiani, D. Lambert, Randomiza- within-groups estimators, Oxford Bulletin of Eco-
tion and the pernicious effects of limited budgets on nomics & Statistics 49 (1987).
auction experiments, in: Artificial Intelligence and
Statistics, PMLR, 2016, pp. 1412–1420.
[12] M. Ostrovsky, M. Schwarz, Reserve prices in internet
advertising auctions: A field experiment, in: Pro- A. Components of Interest on
ceedings of the 12th ACM Conference on Electronic Recent LLM Powered SERPs
Commerce, 2011, pp. 59–60.
[13] R. Johari, H. Li, I. Liskovich, G. Y. Weintraub, Experi-
mental design in two-sided platforms: An analysis of
bias, Management Science 68 (2022) 7069–7089.
[14] P. Bajari, B. Burdick, G. W. Imbens, L. Masoero, J. Mc-
Queen, T. Richardson, I. M. Rosen, Multiple randomiza-
tion designs, arXiv preprint arXiv:2112.13495 (2021).
[15] W. Vickrey, Counterspeculation, auctions, and com-
petitive sealed tenders, The Journal of Finance 16
(1961) 8–37.
[16] E. H. Clarke, Multipart pricing of public goods, Public
Choice (1971) 17–33.
[17] T. Groves, Incentives in teams, Econometrica: Journal
of the Econometric Society (1973) 617–631.
[18] L. H. Moulton, Covariate-based constrained random-
ization of group-randomized trials, Clinical Trials 1
(2004) 297–305.
[19] M. G. Hudgens, M. E. Halloran, Toward causal in-
ference with interference, Journal of the American
Statistical Association 103 (2008) 832–842.
[20] R. Stram, P. Reuss, K.-D. Althoff, Weighted one mode
projection of a bipartite graph as a local similarity mea-
sure, in: Case-Based Reasoning Research and Devel-
opment: 25th International Conference, ICCBR 2017,
Trondheim, Norway, June 26-28, 2017, Proceedings 25,
Springer, 2017, pp. 375–389.
[21] A. Braunstein, L. Dall’Asta, G. Semerjian, L. Zdeborová, Figure 5: Evolving Large Language Model powered SERPs with
Network dismantling, Proceedings of the National components of interest highlighted in rectangle boxes.
Academy of Sciences 113 (2016) 12368–12373.
[22] M. E. Newman, Detecting community structure in
networks, The European Physical Journal B 38 (2004)