=Paper= {{Paper |id=Vol-3837/paper5 |storemode=property |title=SERP Interference Network and Its Applications in Search Advertising |pdfUrl=https://ceur-ws.org/Vol-3837/paper_12_ceur_paper.pdf |volume=Vol-3837 |authors=Purak Jain,Sandeep Appala |dblpUrl=https://dblp.org/rec/conf/adkdd/JainA24 }} ==SERP Interference Network and Its Applications in Search Advertising== https://ceur-ws.org/Vol-3837/paper_12_ceur_paper.pdf
                         SERP Interference Network and Its Applications in Search
                         Advertising
                         Purak Jain1,* , Sandeep Appala1
                         1
                             Amazon, Seattle, USA


                                           Abstract
                                           Search Engine marketing teams in the e-commerce industry manage global search engine traffic to their websites with the aim to
                                           optimize long-term profitability by delivering the best possible customer experience on Search Engine Results Pages (SERPs). In order to
                                           do so, they need to run continuous and rapid Search Marketing A/B tests to continuously evolve and improve their products. However,
                                           unlike typical e-commerce A/B tests that can randomize based on customer identification, their tests face the challenge of anonymized
                                           users on search engines. On the other hand, simply randomizing on products violates Stable Unit Treatment Value Assumption for most
                                           treatments of interest. In this work, we propose leveraging censored observational data to construct bipartite (Search Query to Product
                                           Ad or Text Ad) SERP interference networks. Using a novel weighting function, we create weighted projections to form unipartite graphs
                                           which can then be use to create clusters to randomized on. We demonstrate this experimental design’s application in evaluating a
                                           new bidding algorithm for Paid Search. Additionally, we provide a blueprint of a novel system architecture utilizing SageMaker which
                                           enables polyglot programming to implement each component of the experimental framework.

                                           Keywords
                                           sponsored search, experiment design, A/B testing



                         1. Introduction                                                                                              interference" assumption holds. We observe that for most
                                                                                                                                      treatments of interest (e.g. new bidding algorithms for paid
                         Search Engine marketing teams in the e-commerce industry                                                     search programs or improved title headlines for free search
                         manage global search engine traffic with the aim of optimiz-                                                 snippets), the SERP page leads to interference between treat-
                         ing long-term profitability by delivering the best possible                                                  ment and control units causing the Stable Unit Treatment
                         customer experience on the most important web pages on                                                       Value Assumption (SUTVA) to fail, and consequently in-
                         the internet - Search Engine Results Pages (SERPs). Figure                                                   duces bias in the standard estimators used to evaluate the
                         1 shows the prominent parts of SERP. Search Engines con-                                                     value generated by the treatment. A standard answer [4, 5]
                         tinue to evolve their customer experience and features due                                                   to this problem is to replace the “product-split” experiment
                         to social, technological and economic forces, including pri-                                                 design with a “time-split” (or “switchback”) design, where
                         vacy concerns, and further monetization of their properties                                                  the entire market switches repeatedly between treatment
                         (SERPs). In anticipation of opportunities and risks that come                                                and control. In practice, such designs turn out to be equally
                         with a shifting landscape advertisers continuously innovate                                                  time consuming as geo-based splits since we need to account
                         with new bidding algorithms, improved paid and free search                                                   for long lengths of adjustment period between switches due
                         creatives, landing pages etc. Randomized experiments, or                                                     to the presence of an intermediary i.e. search engine that ap-
                         A/B tests, are the standard approach for evaluating causal                                                   plies the treatment and takes its own time which advertisers
                         effects of new features [1]. However, Search Marketing                                                       cannot control.
                         experiments are unlike conventional A/B tests in industry
                         that can randomize on customers as advertisers don’t iden-
                         tify their customers when they are on a search engine i.e.
                         the ad publisher. Instead, advertisers may run A/B tests
                         randomized by geographic locations [2] using search en-
                         gine’s geo-targeting capabilities but due to ad publisher’s
                         API limitations they are unable to do so without having to
                         clone entire advertisement campaigns. The cloning of entire
                         accounts is operationally expensive and time consuming
                         restricting the velocity at which they can run such trials.
                            The next obvious choice for unit of randomization is usu-
                         ally products or search queries. However for any A/B test,
                         splits of the unit of randomization should satisfy the assump-
                         tions of the Neyman - Rubin causal framework [3], that is,
                         no interference, unconfoundness, overlap and no hidden
                         treatment variations. For example, if we simply randomly
                         select products into control and treatment groups, it should
                         hold the unconfoundness and the overlap assumption given
                         a large sample size but we still need to check if the "no

                         ACM SIGKDD Conference on Knowledge Discovery and Data Mining,
                         AdKDD Workshop 2024, August 2024, Barcelona, Spain
                         *
                           Corresponding author.
                         $ purakjn@amazon.com (P. Jain); appalar@amazon.com (S. Appala)
                         € https://www.amazon.science/author/purak-jain (P. Jain)                                                     Figure 1: Components of interest on a Search Engine Results
                          0009-0009-9758-9336 (P. Jain)                                                                              Page.
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License
                                       Attribution 4.0 International (CC BY 4.0).



CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
   Another approach to dealing with spillovers or interfer-        2. Setting and Motivation
ence is given by clustered experiments [6, 7] or in social
network settings, by network bucketing testing [8] where           Shopping Ads is one of the ad formats supported on SERPs.
nodes that are relatively clustered together are given the         To place ads within the shopping ad carousel advertisers
same assignment of treatment or control [9]. Our work is in-       need to participate in an auction competing with other ad-
spired by similar clustered experiments methods that have          vertisers. The format of the auction is considered close to
been applied to estimate and reduce bias in marketplace            second price Vickrey–Clarke–Groves (VCG) [15, 16, 17], al-
experiments [10]. Specifically, one such example involves          though the exact ad publisher implementation is a blackbox
experimentation in internet ad auctions, where each auction        for us. As such to maximize long term profitability, it is im-
consists of a keyword along with a set of advertisers who          portant to constantly develop, test and launch new bidding
submit competing bids in order for their ads to be displayed       algorithms responsible for valuating products worldwide.
when the keyword is queried by a user. There is cross-unit         Let’s say, to test out a new bidding algorithm we simply
interference because the same advertiser or keyword may            split on products. The Stable Unit Treatment Value As-
appear in multiple auctions. Basse et al. [11] and Ostrovsky       sumption (SUTVA) presumes that the valuation assigned to
and Schwarz [12] make the observation that the auction             one product by the new algorithm does not influence the
type used for one keyword does not meaningfully affect             profitability of other products. However, in the context of
how advertisers bid for other keywords. They then consider         shopping advertisements, this assumption may be violated
experiments that group auctions into clusters by their key-        due to potential between-product interference. This inter-
words and randomize auction formats across these keyword           ference occurs when both a product with a treatment bid
clusters, rather than across advertisers, as a means to avoid      from the new model and another with a control bid from
problems with interference. More broadly, in our context of        the current model participate in the same auction triggered
Search Marketing, this idea of cluster-level randomization         by a search query, deemed relevant by the ad publisher for
corresponds to identifying product or search query clusters        both products. See figure 2 for an example. Such scenarios
that are relatively isolated from each other and randomizing       clearly breach SUTVA, challenging the validity of our eval-
the interventions across product-clusters rather than across       uation method. If one product happens to be assigned to
products. Our primary contribution lies in leveraging obser-       treatment group and the other one to control, then the dif-
vational data to build bipartite (Search Query - Product) and      ference in financial performance between the two products
tripartite (Search Query - Paid Search Product - Free Search       will be resulted from the combined effect of treatment and
URL) SERP interference networks. We introduce an inno-             between-product spillover effect, thus making the treatment
vative weight function to generate weighted projections,           effect indistinguishable from the product spillover effect.
transforming these networks into unipartite graphs. These
graphs facilitate the clustering of products that co-appear
on SERPs through Paid Search Shopping Ads, Text Ads, or
Free Product Listings. The resultant clusters can then be
randomized during A/B tests to generate insights.
   Note that more recently, Johari et al. [13] and Bajari et al.
[14] have proposed newer experiment designs where both
search query and product units are randomized simultane-
ously. While having a similar flavor, neither framework
applies easily to our problem of interest. To begin with, we
cannot control "search-query" assignment as that is deter-
mined by the search engine i.e. the ad publisher. Johari et al.
[13] use a choice model to capture spillovers, which captures
a different kind of market than the one we consider, where
interference is mediated by a matching algorithm. Bajari
et al. [14] imposes a local interaction assumption, which
does not hold in our setting. However, when the graph is
a bi-partite graph it holds some similarity which we plan
to explore in future work for measuring the magnitude of
spillovers.
   The rest of this paper proceeds as follows. In Section 2,       Figure 2: Note the two Shopping Ad advertisements shown in
                                                                   the same carousel. If these two products have bids from different
we use the two-population search query - product case as a
                                                                   bidders (i.e. control and treatment) then Stable Unit Treatment
motivation to build SERP interference network to test out
                                                                   Value Assumption is violated.
new bidding algorithms. In Section 3, we describe in greater
detail our experiment design. In Section 4, we further share
details on testing a new bidding model using this experimen-
tation design. Section 5 provides an overview of a system
architecture blueprint for deploying such experimentation          3. Product-Cluster Randomized
frameworks. Finally, we discuss our findings and future               Control Trial Design
extensions in Section 6.
                                                                   To address the challenge of interference in experimental
                                                                   designs, we propose a preemptive modeling strategy that
                                                                   incorporates interference networks during the design phase.
                                                                   This approach allows us to shift the unit of randomization
                                                                   from individual products to clusters of products, as illus-
     Table 1
     Mock row in a Search Engine’s Query Report shared with the advertiser.

                    Search Query      Impressions     Product/Keyword    Clicks     Metric Day    Ad Campaign
                       chopper axe         6            B00EOVRX06         1        2023-09-01        12345



trated in Figure 3. Importantly, traditional constrained ran-                  presence of an edge in the original bipartite graph,
domization methods [18], such as segmenting by product                         otherwise 0.
categories, prove ineffective. This is because search engines
can associate broad upper funnel search queries (e.g., "Harry          Using the above approach to take a weighted one-mode
Potter") with a diverse range of products across multiple cat-       projection of the bipartite graph leads to an increase in the
egories (e.g., a book, toy, or blanket related to Harry Potter).     number of edges, since
                                                                                          (︀ if
                                                                                             )︀ a search query links to 𝑛 products,
By leveraging interference networks, our method ensures              we need to consider 𝑛2 pairs of edges.
more robust and accurate experimental outcomes.

   Modeling Network Interference The notion of inter-
ference in the network [19] we construct has to be aligned
with the notion of interference we are trying to estimate.
Since the relevance of products to a user search query is
determined by the search engine and ranking algorithm,
an advertiser cannot use its internal datasets that provide
product to keyword mapping e.g. e-commerce website’s
own search to product results. Instead, we use daily reports
provided by the ad publisher itself. These reports have in-
formation on which actual user search query on the search
engine was mapped to which shopping ads product by the
ad publisher. A sample mock row from such a report is
shown in Table 1. We use these search query reports to
construct an undirected bipartite search query - product
graph using the number of impressions as edge weight.

   Unipartite Projection To apply one mode projection of
the bipartite graph onto the product nodes in order to model
the between-network interference, we needed a scoring
function to attribute weights to the resulting graph edges.
Since, we start with large number of products ( 200M+),
we could not directly use the edge weighting functions pro-
posed by Stram et al. [20] due to the computational complex-
ity. Instead we propose the following edge weight function
that requires significantly less computation:


                   𝑛
                  ∑︁       1
  𝑊uni (𝑎, 𝑏) =
                  𝑖=1
                      log 𝑒 𝑠𝑞𝑖 )
                           (𝑓
            min(𝑊bi (𝑠𝑞𝑖 , 𝑎), 𝑊bi (𝑠𝑞𝑖 , 𝑏))
                                              𝐼[𝑠𝑞𝑖 , 𝑎, 𝑏]   (1)
            max(𝑊bi (𝑠𝑞𝑖 , 𝑎), 𝑊bi (𝑠𝑞𝑖 , 𝑏))

  where:
                                                                     Figure 3: Birds-eye view of our experiment design. First, we
    1. 𝑊uni (𝑎, 𝑏) is the edge weight in the unipartite graph        fetch the search query reports from the ad publisher and create a
       (one-mode projection) between product a and b.                search-query, product bipartite graph. Then we take a weighted
    2. 𝑊bi (𝑠𝑞𝑖 , 𝑎) is edge weight between search query 𝑖           one mode projection and finally use a clustering algorithm to
       and 𝑎 in the original bi-partite graph.                       cluster products. We then do a stratified random split of product
    3. 𝑓𝑠𝑞𝑖 is the number of distinct products that a par-           clusters.
       ticular search query drives impressions to. Since
       the distribution is right-skewed i.e. few upper fun-             Graph Partitioning Methodology Constructing a prod-
       nel queries drive impressions to only a few dis-              uct graph following the above approach then allows us to use
       tinct products, weighing down by the log of fre-              network dismantling algorithms [21] as oppose to naive con-
       quency of search query helped us to weigh down                nected components approach to creating product clusters.
       edge weight contributions between two products                Historically, the community identification problem is a well
       from very generic queries.                                    studied problem in computer science literature [22, 23, 24].
    4. 𝐼[𝑠𝑞𝑖 , 𝑎, 𝑏] is 1 if search query 𝑠𝑞𝑖 trigger an impres-     However, a lot of the proposed methods wouldn’t scale
       sion for both product 𝑎 and 𝑏 as represented by the           up since we are dealing with graphs that are as large as
200M+ nodes and 400M+ edges. Thus, anything that runs              new shopping ad bidder experiment, we ended up having
in 𝑂(|𝑉 |2 ) or 𝑂(|𝐸|2 ) is not practical. Moreover during         10,000 clusters and 36% edge weight across clusters. Note,
the graph partitioning phase, we needed to find a balance          the above measure (𝐿) overstates the spillover effects as
between two objectives:                                            they consider spillover between clusters that may end up
                                                                   being in the same group (C or T).
    1. Maximize the number of product clusters as they
       translate to randomization units. More clusters                Magnitude of Spillover: The search query - product
       equal more power for our test.                              bipartite graph we construct usually has a clustering coef-
    2. Minimize the between clusters edge weights.                 ficient [20] of around ∼ 0.6 for most marketplaces which
                                                                   indicates tightly knit groups in the network suggesting high
                                                                   spillover. However, to empirically provide a lower bound
  Since, the more clusters we create, the less isolated they
                                                                   on the magnitude of bias due to interference we need to
are, these two objectives are conflicting. For example, if
                                                                   conduct a meta-experiment that randomizes over two ex-
we want to have zero connections across clusters, then the
                                                                   periment designs: one Bernoulli randomized, one cluster
obvious solution is to have one cluster only. This, of course,
                                                                   randomized. We can then check for a statistically significant
would not lend itself to an A/B test. To balance the above
                                                                   difference between the total average treatment effect esti-
two objectives, first we look at the percentage edge weight
                                                                   mates obtained with the two designs [27]. In the absence of
across 𝑘 clusters (𝐶1 , ..., 𝐶𝑘 ) i.e. leakage as 𝐿:
                                                                   business approval to run such a meta-experiment, the next
                                                                   best directional data point we have is from our previous
                                                                   attempt to run simple product-split A/B test. The impact
                  ∑︀𝑘      ∑︀
                                     / 𝑖 𝑊uni (𝑗, 𝑘)
            𝐿=
                     𝑖=1
                           ∑︀𝑗∈𝐶𝑖 ,𝑘∈𝐶                             measured from that experiment had been largely overstated
                                  𝑊 uni (𝑗, 𝑘)
                              𝑗,𝑘
                                                                   (∼ 44% lift) when compared to the actual lift (∼ 24% lift)
                                                                   observed.
   Secondly, we use a clustering algorithm that is designed
for balanced clustering, that is, all clusters should have
roughly equal size. We evaluated naive connected compo-
nents, power iteration clustering (PIC) [25], and METIS [26].      4. Application
Connected components minimizes leakage, but suffers from
extreme imbalance. PIC improves cluster imbalance, but             In this section, we discuss the use-case motivated in Section
suffers from high leakage. We choose to use METIS parti-           2 to show an application of the product-cluster random-
tioning algorithm which is an extremely efficient and fast         ized control trial design. We had developed a new machine
implementation of graph partitioning algorithm for undi-           learning based product valuation model for our shopping
rected weighted graph. METIS adopts an objective function          ads program to improve over the current in production
to minimize the number of weighted edges whose vertices            heuristic bidding algorithm and we wanted to run an online
belong to different partitions. The METIS graph partitioning       experiment to understand the impact on the long term profit.
consists of three phases: (i) In the graph coarsening phase,       We used the methodology described in Section 3 to create
a series of successively smaller graphs is derived from the        product clusters based on search query reports from the ad
input graph. This process continues until the size of the          publisher for the past one year.
graph has been reduced to just a few hundred vertices, (ii) In
the initial partitioning phase, a partitioning of the coarsest       Constrained Randomization Once we had the clusters,
and hence, smallest, graph is computed and finally (iii) in        we created strata of clusters with similar characteristics
the un-coarsening phase, the partitioning of the smallest          instead of randomizing them in a simple bernoulli fashion.
graph is projected to the successively larger graphs by as-        We measure the net impressions, clicks, cost and profit of
signing the pairs of vertices that were collapsed together to      each of the product cluster and stratify clusters on those
the same partition. After each projection step, the partition-     axis.
ing is refined using heuristics to iteratively move vertices
between partitions as long as such moves improve the qual-            Experiment Setup The goal of this experiment was ver-
ity of the partitioning. The advantages of this methodology        ifying the null hypothesis that the new bidding strategy is
are threefold:                                                     better than the current bidding strategy in term of bidding
                                                                   efficiency i.e. increase of net long term profitability while
    1. It runs in 𝑂(|𝐸|) time, which is extremely efficient        maintaining the total ad spend. We matched the spend be-
       for large graphs.                                           tween control and treatment groups to control for elasticity
    2. It is the only algorithm that allows precise control        as well as to comply with spend constraints at account level.
       of both the number partitions and the balances of           Finally, we run a simulation based power analysis for cluster
       the overall split.                                          randomized designs using difference-in-differences (DID)
    3. It is the only algorithm that is specifically trying to     estimation [28].
       minimize the edgecut (defined as weighted sum of
       edges that straddle between different clusters).               Measurement To measure the impact of the proposed
                                                                   valuation method, a DID analysis for cluster randomized de-
                                                                   signs is performed for two weeks of periods where spends
   Optimal number of clusters: We want to be able to
                                                                   are closely matched. The results from the DID analysis
identify as many nearly independent clusters as possible
                                                                   showed a lift in click-through-rate for the treatment group
with leakage controlled within the tolerance. We plot leak-
                                                                   which was consistent with the lift observed post roll out of
age against various choice of 𝑘 (number of partitions), and
                                                                   the new bidding model. Note that since model errors can be
identify a 𝑘 that is as large as possible where the leakage is
                                                                   correlated within cluster, failure to control for within-cluster
as small as possible (i.e. identifying the elbow point). For the
error correlation can lead to misleading small standard er-         6. Conclusion and Future Work
ror and consequently low p-values. Although we do not
control for within-cluster error correlation in the model,
post-estimation we obtain cluster-robust standard errors as         In this paper, we present a cluster-based randomized con-
proposed by White [29].                                             trol test design which enables search marketing e-commerce
                                                                    teams to do fast online experiment launch while minimizing
                                                                    interference between experimental groups. Our key idea
                                                                    is to use observational data to construct bipartite (Search
5. System Architecture                                              Query - Product) SERP interference networks and use a
                                                                    novel weight function to take weighted projections to form
                                                                    unipartite graphs which can be use to create clusters of prod-
Our product-cluster randomized control methodology as
                                                                    ucts appearing together on SERP (via Paid Search shopping
detailed in Section 3 asks for a highly scalable and flexible
                                                                    ads, text ads or Free Search listings), and then using those
infrastructure with very different compute requirements
                                                                    clusters to randomize on. Online A/B testing results for the
and library support for each step. To address these chal-
                                                                    treatment group are consistent with the lift observed post
lenges we propose the "Search Marketing Lab" using AWS
                                                                    roll out of a new bidding model thereby showing that the
SageMaker [30] pipelines which allows to define a series of
                                                                    A/B test design gives a good estimate of the actual lift. In our
interconnected processing steps where each step (i) can be
                                                                    previous attempts to run simple product-split A/B test the
provided its own docker image that has our code in preferred
                                                                    impact measured from experiments had been largely over-
language and (ii) can have its own compute environment.
                                                                    stated because of spillover effects. Lastly, we present a novel
This allows for polyglot programming. Here, we briefly
                                                                    simplified system architecture using SageMaker which al-
focus on the split generation component. In particular, we
                                                                    lows scientist to do polyglot programming using compute
break the approach into 3 modules:
                                                                    and language suitable for each scientific module.

                                                                       One downside of inferring interference network from
                                                                    search query report data is that such observational data is
                                                                    censored, that is, we only have data when we win the auc-
                                                                    tion. In future, we are investigating using SERP page data
                                                                    from platforms like seoClarity to get better visibility into
                                                                    SERP interference networks and allow us to incorporate not
Figure 4: Birds-eye view of Product-cluster split generation sys-   just shopping ad products but also Text Ads keyword and
tem.                                                                Free Search URLs to build comprehensive ad units spanning
                                                                    across all Search channels - Text Ads, Shopping Ads and
    1. Graph Generation which requires parsing >1 year              Free Search. More recently, this also includes large language
       of daily search query report data and Keyword data           models powered results like shown in Appendix. We can
       to build a graph edge list - thus requiring spark’s          than use these ad units to design cross-channel substitution
       distributed compute. We use a cluster of memory-             experiments. We are also working on investigating further
       optimized instances for this step.                           into the stability of these clusters over time and that they
    2. Graph Partitioning In this step we use the METIS             can be updated in real time as more data flow in from search
       graph partitioning algorithm. METIS is written en-           engines. Finally, we are exploring recent proposed exper-
       tirely in ANSI C (no distributed implementation) but         iment designs by Bajari et al. [14] to measure the actual
       there is a python wrapper for the METIS library [26]         magnitude of spillovers.
       that we use. We use a single compute-optimized
       instance for this step.
    3. Power Analysis In this module we obtain cluster-             Acknowledgments
       robust standard errors after fitting a linear mixed
       effect model, using R’s cluster.vcov [31] implemen-
       tation to return a multi-way cluster-robust variance-        We extend our heartfelt gratitude to Doug Wong and Mike
       covariance matrix and perform inference for esti-            James for their invaluable support and funding, which made
       mated coefficients using R’s coeftest.                       this research possible. We also wish to thank Kingshuk
                                                                    RoyChoudhury and Han Wu for their insightful discussions
                                                                    and constructive feedback. Their expertise and thoughtful
   SageMaker Pipeline executions can be scheduled using             engagement have greatly contributed to the development
Amazon EventBridge passing run-time parameters. This                and refinement of our ideas.
allows to define a single pipeline with multiple executions
(e.g. one per marketplace) based on input parameters. The
serves as a blueprint for a large scale production system
combining multiple languages (R, Python on Spark) utilizing         References
each to their respective strengths (R for statistical analysis
modules, python on Spark for ETL) triggering SageMaker               [1] R. Kohavi, A. Deng, B. Frasca, T. Walker, Y. Xu,
processing jobs orchestrated via SageMaker Pipelines.                    N. Pohlmann, Online controlled experiments at large
                                                                         scale, in: Proceedings of the 19th ACM SIGKDD In-
                                                                         ternational Conference on Knowledge Discovery and
                                                                         Data Mining, 2013, pp. 1168–1176.
                                                                     [2] J. Vaver, J. Koehler, Measuring ad effectiveness using
                                                                         geo experiments, Google Inc (2011).
 [3] D. Rubin, Causal inference using potential outcomes,              321–330.
     J. Amer. Statist. Assoc. 100 (2005) 322–331. URL: https:     [23] M. Girvan, M. E. Newman, Community structure in
     //doi.org/10.1198/016214504000001880. doi:10.1198/                social and biological networks, Proceedings of the
     016214504000001880.                                               National Academy of Sciences 99 (2002) 7821–7826.
 [4] T. D. Cook, D. L. DeMets, Introduction to Statistical        [24] M. E. Newman, Fast algorithm for detecting commu-
     Methods for Clinical Trials, Chapman and Hall/CRC,                nity structure in networks, Physical Review E 69 (2004)
     2007.                                                             066133.
 [5] I. Bojinov, D. Simchi-Levi, J. Zhao, Design and analysis     [25] F. Lin, W. W. Cohen, Power iteration clustering (2010).
     of switchback experiments, Management Science 69             [26] G. Karypis, V. Kumar, Metis: A software package for
     (2023) 3759–3777.                                                 partitioning unstructured graphs, partitioning meshes,
 [6] C. Roberts, S. A. Roberts, Design and analysis of clini-          and computing fill-reducing orderings of sparse ma-
     cal trials with clustering effects due to treatment, Clin-        trices (1997).
     ical Trials 2 (2005) 152–162.                                [27] M. Saveski, J. Pouget-Abadie, G. Saint-Jacques,
 [7] P. M. Aronow, C. Samii, Estimating average causal                 W. Duan, S. Ghosh, Y. Xu, E. M. Airoldi, Detecting
     effects under general interference, with application to           network effects: Randomizing over randomized exper-
     a social network experiment (2017).                               iments, in: Proceedings of the 23rd ACM SIGKDD
 [8] L. Backstrom, J. Kleinberg, Network bucket testing,               International Conference on Knowledge Discovery
     in: Proceedings of the 20th International Conference              and Data Mining, 2017, pp. 1027–1035.
     on World Wide Web, 2011, pp. 615–624.                        [28] J. D. Angrist, J.-S. Pischke, Mostly Harmless Economet-
 [9] J. Ugander, B. Karrer, L. Backstrom, J. Kleinberg, Graph          rics: An Empiricist’s Companion, Princeton University
     cluster randomization: Network exposure to multiple               Press, 2009.
     universes, in: Proceedings of the 19th ACM SIGKDD            [29] H. White, Asymptotic Theory for Econometricians,
     International Conference on Knowledge Discovery                   Academic Press, 2014.
     and Data Mining, 2013, pp. 329–337.                          [30] A. V. Joshi, A. V. Joshi, Amazon’s machine learning
[10] D. Holtz, R. Lobel, I. Liskovich, S. Aral, Reducing               toolkit: Sagemaker, Machine Learning and Artificial
     interference bias in online marketplace pricing experi-           Intelligence (2020) 233–243.
     ments, arXiv preprint arXiv:2004.12489 (2020).               [31] M. Arellano, Computing robust standard errors for
[11] G. W. Basse, H. A. Soufiani, D. Lambert, Randomiza-               within-groups estimators, Oxford Bulletin of Eco-
     tion and the pernicious effects of limited budgets on             nomics & Statistics 49 (1987).
     auction experiments, in: Artificial Intelligence and
     Statistics, PMLR, 2016, pp. 1412–1420.
[12] M. Ostrovsky, M. Schwarz, Reserve prices in internet
     advertising auctions: A field experiment, in: Pro-           A. Components of Interest on
     ceedings of the 12th ACM Conference on Electronic               Recent LLM Powered SERPs
     Commerce, 2011, pp. 59–60.
[13] R. Johari, H. Li, I. Liskovich, G. Y. Weintraub, Experi-
     mental design in two-sided platforms: An analysis of
     bias, Management Science 68 (2022) 7069–7089.
[14] P. Bajari, B. Burdick, G. W. Imbens, L. Masoero, J. Mc-
     Queen, T. Richardson, I. M. Rosen, Multiple randomiza-
     tion designs, arXiv preprint arXiv:2112.13495 (2021).
[15] W. Vickrey, Counterspeculation, auctions, and com-
     petitive sealed tenders, The Journal of Finance 16
     (1961) 8–37.
[16] E. H. Clarke, Multipart pricing of public goods, Public
     Choice (1971) 17–33.
[17] T. Groves, Incentives in teams, Econometrica: Journal
     of the Econometric Society (1973) 617–631.
[18] L. H. Moulton, Covariate-based constrained random-
     ization of group-randomized trials, Clinical Trials 1
     (2004) 297–305.
[19] M. G. Hudgens, M. E. Halloran, Toward causal in-
     ference with interference, Journal of the American
     Statistical Association 103 (2008) 832–842.
[20] R. Stram, P. Reuss, K.-D. Althoff, Weighted one mode
     projection of a bipartite graph as a local similarity mea-
     sure, in: Case-Based Reasoning Research and Devel-
     opment: 25th International Conference, ICCBR 2017,
     Trondheim, Norway, June 26-28, 2017, Proceedings 25,
     Springer, 2017, pp. 375–389.
[21] A. Braunstein, L. Dall’Asta, G. Semerjian, L. Zdeborová,     Figure 5: Evolving Large Language Model powered SERPs with
     Network dismantling, Proceedings of the National             components of interest highlighted in rectangle boxes.
     Academy of Sciences 113 (2016) 12368–12373.
[22] M. E. Newman, Detecting community structure in
     networks, The European Physical Journal B 38 (2004)