SERP Interference Network and Its Applications in Search Advertising Purak Jain1,* , Sandeep Appala1 1 Amazon, Seattle, USA Abstract Search Engine marketing teams in the e-commerce industry manage global search engine traffic to their websites with the aim to optimize long-term profitability by delivering the best possible customer experience on Search Engine Results Pages (SERPs). In order to do so, they need to run continuous and rapid Search Marketing A/B tests to continuously evolve and improve their products. However, unlike typical e-commerce A/B tests that can randomize based on customer identification, their tests face the challenge of anonymized users on search engines. On the other hand, simply randomizing on products violates Stable Unit Treatment Value Assumption for most treatments of interest. In this work, we propose leveraging censored observational data to construct bipartite (Search Query to Product Ad or Text Ad) SERP interference networks. Using a novel weighting function, we create weighted projections to form unipartite graphs which can then be use to create clusters to randomized on. We demonstrate this experimental design’s application in evaluating a new bidding algorithm for Paid Search. Additionally, we provide a blueprint of a novel system architecture utilizing SageMaker which enables polyglot programming to implement each component of the experimental framework. Keywords sponsored search, experiment design, A/B testing 1. Introduction interference" assumption holds. We observe that for most treatments of interest (e.g. new bidding algorithms for paid Search Engine marketing teams in the e-commerce industry search programs or improved title headlines for free search manage global search engine traffic with the aim of optimiz- snippets), the SERP page leads to interference between treat- ing long-term profitability by delivering the best possible ment and control units causing the Stable Unit Treatment customer experience on the most important web pages on Value Assumption (SUTVA) to fail, and consequently in- the internet - Search Engine Results Pages (SERPs). Figure duces bias in the standard estimators used to evaluate the 1 shows the prominent parts of SERP. Search Engines con- value generated by the treatment. A standard answer [4, 5] tinue to evolve their customer experience and features due to this problem is to replace the “product-split” experiment to social, technological and economic forces, including pri- design with a “time-split” (or “switchback”) design, where vacy concerns, and further monetization of their properties the entire market switches repeatedly between treatment (SERPs). In anticipation of opportunities and risks that come and control. In practice, such designs turn out to be equally with a shifting landscape advertisers continuously innovate time consuming as geo-based splits since we need to account with new bidding algorithms, improved paid and free search for long lengths of adjustment period between switches due creatives, landing pages etc. Randomized experiments, or to the presence of an intermediary i.e. search engine that ap- A/B tests, are the standard approach for evaluating causal plies the treatment and takes its own time which advertisers effects of new features [1]. However, Search Marketing cannot control. experiments are unlike conventional A/B tests in industry that can randomize on customers as advertisers don’t iden- tify their customers when they are on a search engine i.e. the ad publisher. Instead, advertisers may run A/B tests randomized by geographic locations [2] using search en- gine’s geo-targeting capabilities but due to ad publisher’s API limitations they are unable to do so without having to clone entire advertisement campaigns. The cloning of entire accounts is operationally expensive and time consuming restricting the velocity at which they can run such trials. The next obvious choice for unit of randomization is usu- ally products or search queries. However for any A/B test, splits of the unit of randomization should satisfy the assump- tions of the Neyman - Rubin causal framework [3], that is, no interference, unconfoundness, overlap and no hidden treatment variations. For example, if we simply randomly select products into control and treatment groups, it should hold the unconfoundness and the overlap assumption given a large sample size but we still need to check if the "no ACM SIGKDD Conference on Knowledge Discovery and Data Mining, AdKDD Workshop 2024, August 2024, Barcelona, Spain * Corresponding author. $ purakjn@amazon.com (P. Jain); appalar@amazon.com (S. Appala) € https://www.amazon.science/author/purak-jain (P. Jain) Figure 1: Components of interest on a Search Engine Results  0009-0009-9758-9336 (P. Jain) Page. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings Another approach to dealing with spillovers or interfer- 2. Setting and Motivation ence is given by clustered experiments [6, 7] or in social network settings, by network bucketing testing [8] where Shopping Ads is one of the ad formats supported on SERPs. nodes that are relatively clustered together are given the To place ads within the shopping ad carousel advertisers same assignment of treatment or control [9]. Our work is in- need to participate in an auction competing with other ad- spired by similar clustered experiments methods that have vertisers. The format of the auction is considered close to been applied to estimate and reduce bias in marketplace second price Vickrey–Clarke–Groves (VCG) [15, 16, 17], al- experiments [10]. Specifically, one such example involves though the exact ad publisher implementation is a blackbox experimentation in internet ad auctions, where each auction for us. As such to maximize long term profitability, it is im- consists of a keyword along with a set of advertisers who portant to constantly develop, test and launch new bidding submit competing bids in order for their ads to be displayed algorithms responsible for valuating products worldwide. when the keyword is queried by a user. There is cross-unit Let’s say, to test out a new bidding algorithm we simply interference because the same advertiser or keyword may split on products. The Stable Unit Treatment Value As- appear in multiple auctions. Basse et al. [11] and Ostrovsky sumption (SUTVA) presumes that the valuation assigned to and Schwarz [12] make the observation that the auction one product by the new algorithm does not influence the type used for one keyword does not meaningfully affect profitability of other products. However, in the context of how advertisers bid for other keywords. They then consider shopping advertisements, this assumption may be violated experiments that group auctions into clusters by their key- due to potential between-product interference. This inter- words and randomize auction formats across these keyword ference occurs when both a product with a treatment bid clusters, rather than across advertisers, as a means to avoid from the new model and another with a control bid from problems with interference. More broadly, in our context of the current model participate in the same auction triggered Search Marketing, this idea of cluster-level randomization by a search query, deemed relevant by the ad publisher for corresponds to identifying product or search query clusters both products. See figure 2 for an example. Such scenarios that are relatively isolated from each other and randomizing clearly breach SUTVA, challenging the validity of our eval- the interventions across product-clusters rather than across uation method. If one product happens to be assigned to products. Our primary contribution lies in leveraging obser- treatment group and the other one to control, then the dif- vational data to build bipartite (Search Query - Product) and ference in financial performance between the two products tripartite (Search Query - Paid Search Product - Free Search will be resulted from the combined effect of treatment and URL) SERP interference networks. We introduce an inno- between-product spillover effect, thus making the treatment vative weight function to generate weighted projections, effect indistinguishable from the product spillover effect. transforming these networks into unipartite graphs. These graphs facilitate the clustering of products that co-appear on SERPs through Paid Search Shopping Ads, Text Ads, or Free Product Listings. The resultant clusters can then be randomized during A/B tests to generate insights. Note that more recently, Johari et al. [13] and Bajari et al. [14] have proposed newer experiment designs where both search query and product units are randomized simultane- ously. While having a similar flavor, neither framework applies easily to our problem of interest. To begin with, we cannot control "search-query" assignment as that is deter- mined by the search engine i.e. the ad publisher. Johari et al. [13] use a choice model to capture spillovers, which captures a different kind of market than the one we consider, where interference is mediated by a matching algorithm. Bajari et al. [14] imposes a local interaction assumption, which does not hold in our setting. However, when the graph is a bi-partite graph it holds some similarity which we plan to explore in future work for measuring the magnitude of spillovers. The rest of this paper proceeds as follows. In Section 2, Figure 2: Note the two Shopping Ad advertisements shown in the same carousel. If these two products have bids from different we use the two-population search query - product case as a bidders (i.e. control and treatment) then Stable Unit Treatment motivation to build SERP interference network to test out Value Assumption is violated. new bidding algorithms. In Section 3, we describe in greater detail our experiment design. In Section 4, we further share details on testing a new bidding model using this experimen- tation design. Section 5 provides an overview of a system architecture blueprint for deploying such experimentation 3. Product-Cluster Randomized frameworks. Finally, we discuss our findings and future Control Trial Design extensions in Section 6. To address the challenge of interference in experimental designs, we propose a preemptive modeling strategy that incorporates interference networks during the design phase. This approach allows us to shift the unit of randomization from individual products to clusters of products, as illus- Table 1 Mock row in a Search Engine’s Query Report shared with the advertiser. Search Query Impressions Product/Keyword Clicks Metric Day Ad Campaign chopper axe 6 B00EOVRX06 1 2023-09-01 12345 trated in Figure 3. Importantly, traditional constrained ran- presence of an edge in the original bipartite graph, domization methods [18], such as segmenting by product otherwise 0. categories, prove ineffective. This is because search engines can associate broad upper funnel search queries (e.g., "Harry Using the above approach to take a weighted one-mode Potter") with a diverse range of products across multiple cat- projection of the bipartite graph leads to an increase in the egories (e.g., a book, toy, or blanket related to Harry Potter). number of edges, since (︀ if )︀ a search query links to 𝑛 products, By leveraging interference networks, our method ensures we need to consider 𝑛2 pairs of edges. more robust and accurate experimental outcomes. Modeling Network Interference The notion of inter- ference in the network [19] we construct has to be aligned with the notion of interference we are trying to estimate. Since the relevance of products to a user search query is determined by the search engine and ranking algorithm, an advertiser cannot use its internal datasets that provide product to keyword mapping e.g. e-commerce website’s own search to product results. Instead, we use daily reports provided by the ad publisher itself. These reports have in- formation on which actual user search query on the search engine was mapped to which shopping ads product by the ad publisher. A sample mock row from such a report is shown in Table 1. We use these search query reports to construct an undirected bipartite search query - product graph using the number of impressions as edge weight. Unipartite Projection To apply one mode projection of the bipartite graph onto the product nodes in order to model the between-network interference, we needed a scoring function to attribute weights to the resulting graph edges. Since, we start with large number of products ( 200M+), we could not directly use the edge weighting functions pro- posed by Stram et al. [20] due to the computational complex- ity. Instead we propose the following edge weight function that requires significantly less computation: 𝑛 ∑︁ 1 𝑊uni (𝑎, 𝑏) = 𝑖=1 log 𝑒 𝑠𝑞𝑖 ) (𝑓 min(𝑊bi (𝑠𝑞𝑖 , 𝑎), 𝑊bi (𝑠𝑞𝑖 , 𝑏)) 𝐼[𝑠𝑞𝑖 , 𝑎, 𝑏] (1) max(𝑊bi (𝑠𝑞𝑖 , 𝑎), 𝑊bi (𝑠𝑞𝑖 , 𝑏)) where: Figure 3: Birds-eye view of our experiment design. First, we 1. 𝑊uni (𝑎, 𝑏) is the edge weight in the unipartite graph fetch the search query reports from the ad publisher and create a (one-mode projection) between product a and b. search-query, product bipartite graph. Then we take a weighted 2. 𝑊bi (𝑠𝑞𝑖 , 𝑎) is edge weight between search query 𝑖 one mode projection and finally use a clustering algorithm to and 𝑎 in the original bi-partite graph. cluster products. We then do a stratified random split of product 3. 𝑓𝑠𝑞𝑖 is the number of distinct products that a par- clusters. ticular search query drives impressions to. Since the distribution is right-skewed i.e. few upper fun- Graph Partitioning Methodology Constructing a prod- nel queries drive impressions to only a few dis- uct graph following the above approach then allows us to use tinct products, weighing down by the log of fre- network dismantling algorithms [21] as oppose to naive con- quency of search query helped us to weigh down nected components approach to creating product clusters. edge weight contributions between two products Historically, the community identification problem is a well from very generic queries. studied problem in computer science literature [22, 23, 24]. 4. 𝐼[𝑠𝑞𝑖 , 𝑎, 𝑏] is 1 if search query 𝑠𝑞𝑖 trigger an impres- However, a lot of the proposed methods wouldn’t scale sion for both product 𝑎 and 𝑏 as represented by the up since we are dealing with graphs that are as large as 200M+ nodes and 400M+ edges. Thus, anything that runs new shopping ad bidder experiment, we ended up having in 𝑂(|𝑉 |2 ) or 𝑂(|𝐸|2 ) is not practical. Moreover during 10,000 clusters and 36% edge weight across clusters. Note, the graph partitioning phase, we needed to find a balance the above measure (𝐿) overstates the spillover effects as between two objectives: they consider spillover between clusters that may end up being in the same group (C or T). 1. Maximize the number of product clusters as they translate to randomization units. More clusters Magnitude of Spillover: The search query - product equal more power for our test. bipartite graph we construct usually has a clustering coef- 2. Minimize the between clusters edge weights. ficient [20] of around ∼ 0.6 for most marketplaces which indicates tightly knit groups in the network suggesting high spillover. However, to empirically provide a lower bound Since, the more clusters we create, the less isolated they on the magnitude of bias due to interference we need to are, these two objectives are conflicting. For example, if conduct a meta-experiment that randomizes over two ex- we want to have zero connections across clusters, then the periment designs: one Bernoulli randomized, one cluster obvious solution is to have one cluster only. This, of course, randomized. We can then check for a statistically significant would not lend itself to an A/B test. To balance the above difference between the total average treatment effect esti- two objectives, first we look at the percentage edge weight mates obtained with the two designs [27]. In the absence of across 𝑘 clusters (𝐶1 , ..., 𝐶𝑘 ) i.e. leakage as 𝐿: business approval to run such a meta-experiment, the next best directional data point we have is from our previous attempt to run simple product-split A/B test. The impact ∑︀𝑘 ∑︀ / 𝑖 𝑊uni (𝑗, 𝑘) 𝐿= 𝑖=1 ∑︀𝑗∈𝐶𝑖 ,𝑘∈𝐶 measured from that experiment had been largely overstated 𝑊 uni (𝑗, 𝑘) 𝑗,𝑘 (∼ 44% lift) when compared to the actual lift (∼ 24% lift) observed. Secondly, we use a clustering algorithm that is designed for balanced clustering, that is, all clusters should have roughly equal size. We evaluated naive connected compo- nents, power iteration clustering (PIC) [25], and METIS [26]. 4. Application Connected components minimizes leakage, but suffers from extreme imbalance. PIC improves cluster imbalance, but In this section, we discuss the use-case motivated in Section suffers from high leakage. We choose to use METIS parti- 2 to show an application of the product-cluster random- tioning algorithm which is an extremely efficient and fast ized control trial design. We had developed a new machine implementation of graph partitioning algorithm for undi- learning based product valuation model for our shopping rected weighted graph. METIS adopts an objective function ads program to improve over the current in production to minimize the number of weighted edges whose vertices heuristic bidding algorithm and we wanted to run an online belong to different partitions. The METIS graph partitioning experiment to understand the impact on the long term profit. consists of three phases: (i) In the graph coarsening phase, We used the methodology described in Section 3 to create a series of successively smaller graphs is derived from the product clusters based on search query reports from the ad input graph. This process continues until the size of the publisher for the past one year. graph has been reduced to just a few hundred vertices, (ii) In the initial partitioning phase, a partitioning of the coarsest Constrained Randomization Once we had the clusters, and hence, smallest, graph is computed and finally (iii) in we created strata of clusters with similar characteristics the un-coarsening phase, the partitioning of the smallest instead of randomizing them in a simple bernoulli fashion. graph is projected to the successively larger graphs by as- We measure the net impressions, clicks, cost and profit of signing the pairs of vertices that were collapsed together to each of the product cluster and stratify clusters on those the same partition. After each projection step, the partition- axis. ing is refined using heuristics to iteratively move vertices between partitions as long as such moves improve the qual- Experiment Setup The goal of this experiment was ver- ity of the partitioning. The advantages of this methodology ifying the null hypothesis that the new bidding strategy is are threefold: better than the current bidding strategy in term of bidding efficiency i.e. increase of net long term profitability while 1. It runs in 𝑂(|𝐸|) time, which is extremely efficient maintaining the total ad spend. We matched the spend be- for large graphs. tween control and treatment groups to control for elasticity 2. It is the only algorithm that allows precise control as well as to comply with spend constraints at account level. of both the number partitions and the balances of Finally, we run a simulation based power analysis for cluster the overall split. randomized designs using difference-in-differences (DID) 3. It is the only algorithm that is specifically trying to estimation [28]. minimize the edgecut (defined as weighted sum of edges that straddle between different clusters). Measurement To measure the impact of the proposed valuation method, a DID analysis for cluster randomized de- signs is performed for two weeks of periods where spends Optimal number of clusters: We want to be able to are closely matched. The results from the DID analysis identify as many nearly independent clusters as possible showed a lift in click-through-rate for the treatment group with leakage controlled within the tolerance. We plot leak- which was consistent with the lift observed post roll out of age against various choice of 𝑘 (number of partitions), and the new bidding model. Note that since model errors can be identify a 𝑘 that is as large as possible where the leakage is correlated within cluster, failure to control for within-cluster as small as possible (i.e. identifying the elbow point). For the error correlation can lead to misleading small standard er- 6. Conclusion and Future Work ror and consequently low p-values. Although we do not control for within-cluster error correlation in the model, post-estimation we obtain cluster-robust standard errors as In this paper, we present a cluster-based randomized con- proposed by White [29]. trol test design which enables search marketing e-commerce teams to do fast online experiment launch while minimizing interference between experimental groups. Our key idea is to use observational data to construct bipartite (Search 5. System Architecture Query - Product) SERP interference networks and use a novel weight function to take weighted projections to form unipartite graphs which can be use to create clusters of prod- Our product-cluster randomized control methodology as ucts appearing together on SERP (via Paid Search shopping detailed in Section 3 asks for a highly scalable and flexible ads, text ads or Free Search listings), and then using those infrastructure with very different compute requirements clusters to randomize on. Online A/B testing results for the and library support for each step. To address these chal- treatment group are consistent with the lift observed post lenges we propose the "Search Marketing Lab" using AWS roll out of a new bidding model thereby showing that the SageMaker [30] pipelines which allows to define a series of A/B test design gives a good estimate of the actual lift. In our interconnected processing steps where each step (i) can be previous attempts to run simple product-split A/B test the provided its own docker image that has our code in preferred impact measured from experiments had been largely over- language and (ii) can have its own compute environment. stated because of spillover effects. Lastly, we present a novel This allows for polyglot programming. Here, we briefly simplified system architecture using SageMaker which al- focus on the split generation component. In particular, we lows scientist to do polyglot programming using compute break the approach into 3 modules: and language suitable for each scientific module. One downside of inferring interference network from search query report data is that such observational data is censored, that is, we only have data when we win the auc- tion. In future, we are investigating using SERP page data from platforms like seoClarity to get better visibility into SERP interference networks and allow us to incorporate not Figure 4: Birds-eye view of Product-cluster split generation sys- just shopping ad products but also Text Ads keyword and tem. Free Search URLs to build comprehensive ad units spanning across all Search channels - Text Ads, Shopping Ads and 1. Graph Generation which requires parsing >1 year Free Search. More recently, this also includes large language of daily search query report data and Keyword data models powered results like shown in Appendix. We can to build a graph edge list - thus requiring spark’s than use these ad units to design cross-channel substitution distributed compute. We use a cluster of memory- experiments. We are also working on investigating further optimized instances for this step. into the stability of these clusters over time and that they 2. Graph Partitioning In this step we use the METIS can be updated in real time as more data flow in from search graph partitioning algorithm. METIS is written en- engines. Finally, we are exploring recent proposed exper- tirely in ANSI C (no distributed implementation) but iment designs by Bajari et al. [14] to measure the actual there is a python wrapper for the METIS library [26] magnitude of spillovers. that we use. We use a single compute-optimized instance for this step. 3. Power Analysis In this module we obtain cluster- Acknowledgments robust standard errors after fitting a linear mixed effect model, using R’s cluster.vcov [31] implemen- tation to return a multi-way cluster-robust variance- We extend our heartfelt gratitude to Doug Wong and Mike covariance matrix and perform inference for esti- James for their invaluable support and funding, which made mated coefficients using R’s coeftest. this research possible. We also wish to thank Kingshuk RoyChoudhury and Han Wu for their insightful discussions and constructive feedback. Their expertise and thoughtful SageMaker Pipeline executions can be scheduled using engagement have greatly contributed to the development Amazon EventBridge passing run-time parameters. This and refinement of our ideas. allows to define a single pipeline with multiple executions (e.g. one per marketplace) based on input parameters. The serves as a blueprint for a large scale production system combining multiple languages (R, Python on Spark) utilizing References each to their respective strengths (R for statistical analysis modules, python on Spark for ETL) triggering SageMaker [1] R. Kohavi, A. Deng, B. Frasca, T. Walker, Y. Xu, processing jobs orchestrated via SageMaker Pipelines. N. Pohlmann, Online controlled experiments at large scale, in: Proceedings of the 19th ACM SIGKDD In- ternational Conference on Knowledge Discovery and Data Mining, 2013, pp. 1168–1176. [2] J. Vaver, J. Koehler, Measuring ad effectiveness using geo experiments, Google Inc (2011). [3] D. Rubin, Causal inference using potential outcomes, 321–330. J. Amer. Statist. Assoc. 100 (2005) 322–331. URL: https: [23] M. Girvan, M. E. Newman, Community structure in //doi.org/10.1198/016214504000001880. doi:10.1198/ social and biological networks, Proceedings of the 016214504000001880. National Academy of Sciences 99 (2002) 7821–7826. [4] T. D. Cook, D. L. DeMets, Introduction to Statistical [24] M. E. Newman, Fast algorithm for detecting commu- Methods for Clinical Trials, Chapman and Hall/CRC, nity structure in networks, Physical Review E 69 (2004) 2007. 066133. [5] I. Bojinov, D. Simchi-Levi, J. Zhao, Design and analysis [25] F. Lin, W. W. Cohen, Power iteration clustering (2010). of switchback experiments, Management Science 69 [26] G. Karypis, V. Kumar, Metis: A software package for (2023) 3759–3777. partitioning unstructured graphs, partitioning meshes, [6] C. Roberts, S. A. Roberts, Design and analysis of clini- and computing fill-reducing orderings of sparse ma- cal trials with clustering effects due to treatment, Clin- trices (1997). ical Trials 2 (2005) 152–162. [27] M. Saveski, J. Pouget-Abadie, G. Saint-Jacques, [7] P. M. Aronow, C. Samii, Estimating average causal W. Duan, S. Ghosh, Y. Xu, E. M. Airoldi, Detecting effects under general interference, with application to network effects: Randomizing over randomized exper- a social network experiment (2017). iments, in: Proceedings of the 23rd ACM SIGKDD [8] L. Backstrom, J. Kleinberg, Network bucket testing, International Conference on Knowledge Discovery in: Proceedings of the 20th International Conference and Data Mining, 2017, pp. 1027–1035. on World Wide Web, 2011, pp. 615–624. [28] J. D. Angrist, J.-S. Pischke, Mostly Harmless Economet- [9] J. Ugander, B. Karrer, L. Backstrom, J. Kleinberg, Graph rics: An Empiricist’s Companion, Princeton University cluster randomization: Network exposure to multiple Press, 2009. universes, in: Proceedings of the 19th ACM SIGKDD [29] H. White, Asymptotic Theory for Econometricians, International Conference on Knowledge Discovery Academic Press, 2014. and Data Mining, 2013, pp. 329–337. [30] A. V. Joshi, A. V. Joshi, Amazon’s machine learning [10] D. Holtz, R. Lobel, I. Liskovich, S. Aral, Reducing toolkit: Sagemaker, Machine Learning and Artificial interference bias in online marketplace pricing experi- Intelligence (2020) 233–243. ments, arXiv preprint arXiv:2004.12489 (2020). [31] M. Arellano, Computing robust standard errors for [11] G. W. Basse, H. A. Soufiani, D. Lambert, Randomiza- within-groups estimators, Oxford Bulletin of Eco- tion and the pernicious effects of limited budgets on nomics & Statistics 49 (1987). auction experiments, in: Artificial Intelligence and Statistics, PMLR, 2016, pp. 1412–1420. [12] M. Ostrovsky, M. Schwarz, Reserve prices in internet advertising auctions: A field experiment, in: Pro- A. Components of Interest on ceedings of the 12th ACM Conference on Electronic Recent LLM Powered SERPs Commerce, 2011, pp. 59–60. [13] R. Johari, H. Li, I. Liskovich, G. Y. Weintraub, Experi- mental design in two-sided platforms: An analysis of bias, Management Science 68 (2022) 7069–7089. [14] P. Bajari, B. Burdick, G. W. Imbens, L. Masoero, J. Mc- Queen, T. Richardson, I. M. Rosen, Multiple randomiza- tion designs, arXiv preprint arXiv:2112.13495 (2021). [15] W. Vickrey, Counterspeculation, auctions, and com- petitive sealed tenders, The Journal of Finance 16 (1961) 8–37. [16] E. H. Clarke, Multipart pricing of public goods, Public Choice (1971) 17–33. [17] T. Groves, Incentives in teams, Econometrica: Journal of the Econometric Society (1973) 617–631. [18] L. H. Moulton, Covariate-based constrained random- ization of group-randomized trials, Clinical Trials 1 (2004) 297–305. [19] M. G. Hudgens, M. E. Halloran, Toward causal in- ference with interference, Journal of the American Statistical Association 103 (2008) 832–842. [20] R. Stram, P. Reuss, K.-D. Althoff, Weighted one mode projection of a bipartite graph as a local similarity mea- sure, in: Case-Based Reasoning Research and Devel- opment: 25th International Conference, ICCBR 2017, Trondheim, Norway, June 26-28, 2017, Proceedings 25, Springer, 2017, pp. 375–389. [21] A. Braunstein, L. Dall’Asta, G. Semerjian, L. Zdeborová, Figure 5: Evolving Large Language Model powered SERPs with Network dismantling, Proceedings of the National components of interest highlighted in rectangle boxes. Academy of Sciences 113 (2016) 12368–12373. [22] M. E. Newman, Detecting community structure in networks, The European Physical Journal B 38 (2004)