
1613-0073

Online Advertising Auctions: Robust Click-Through-Rate Prediction

Ryohei Emori

0 2

Shinya Suzumura

Nobuyuki Shimizu

Takahiro Hoshino

hoshino@econ.keio.ac.jp 0 2

Instrumental Variables, Omitted Variable Bias, Robustness, Cold-start Problem, Click-Through-Rate, Online Advertising Auction

0 Keio University, 2-15-45, Mita, Minato-ku, Tokyo, Japan
1 LY Corporation, Kioi Tower 1-3 Kioicho, Chiyoda-ku, Tokyo, Japan
2 Riken AIP center, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, Japan

2024

25 29

Predicting the click-through rate (CTR) in online ad auctions is essential for calculating bid amounts and forming rankings. However, predicting CTR from historical data faces several difficulties, one of which is the cold-start problem. Our research uses the instrumental variables (IVs) framework to address the cold-start problem and selection bias, validating robust CTR prediction in online advertising auctions. Although identifying IVs is notably challenging in most applications, their potential use is not limited to CTR prediction; they can also address practical issues and research questions in advertising auctions in general. We put forth bid amounts as IVs, discussing their validity as IVs and testing the robustness of predictions using IVs in both simulations and real data scenarios. Moreover, we enhance our methodology by integrating explicit interactions between bid amounts and other features, demonstrating that accounting for heterogeneity in IVs significantly improves prediction accuracy on actual data. Our proposal of IVs and the refined CTR prediction approach enriches the research fields of causal inference, robustness, and invariant prediction.

1. Introduction

Online advertising, an essential backbone of the digital economy, relies heavily on accurate prediction models to allocate ads effectively and enhance the user experience. Crucially, the accuracy of click-through rate (CTR) prediction plays a pivotal role in determining the success, in terms of welfare, of online advertising auctions, while potential biases that may skew results hover over it [ 1, 2 ].

In addition to the problem of bias that lurks in some online ad auctions and is often the subject of research, the cold-start problem arises when we must make predictions for new advertisements or infrequent users, leading to decreased predictive accuracy. Against the backdrop of problems arising from those various factors, causal methods of predicting user behavior that capture invariant user behavior have risen as a subject of high research interest [ 3, 4, 5 ]. Among them, prior research [ 3 ] has highlighted that one of those causal methods, the instrumental variables (IVs) method, has the potential to contribute to solving the cold-start problem. [ 6 ] provided a methodology for IVs using neural networks, but specific IVs always need to be identified in a specific research domain. [ 7 ] uses the user’s search query as an instrumental variable; their use of IVs is limited to search advertising and may not satisfy one of the conditions for IVs, the exclusion restriction.

In this paper, we identify bid amounts as IVs in online ad auction settings and demonstrate that click prediction using the IVs method is robust both in overall prediction and under the cold-start problem.

Although IVs are generally considered difficult to identify, they have the potential to: 1) maximize the use of data, including impressions of ads with low historical win rates; 2) not require random impressions of ads; 3) avoid assumptions that often lead to erroneous predictions due to the unrealistic absence of unobserved confounding factors between treatment and outcome relationships [ 8 ]; and 4) potentially infer the causal effect of impressions on conversion as well as clicks.

AdKDD’24 30th ACM SIGKDD Conference on Knowledge Discovery and
∗Corresponding author.
CEUR ceur-ws.org

Furthermore, we demonstrate that the explicit use of first-stage heterogeneity in the IVs method can be strongly recommended in online ad auctions [ 9, 10 ]. First-stage heterogeneity in the IVs method has been relatively overlooked compared to heterogeneity in the second stage, namely, user response. However, we find that increasing the association between IVs and impression probability yields robust predictions for both the overall prediction and the cold-start problem.

The contributions of the paper have three main points:

1. We identify and propose valid IVs tailored to online advertising auctions. The IVs suit broad advertising auction contexts, including display and search advertising. Furthermore, the IVs method is expected to have further applications, such as causal inference of medium- and long-term effects of ad impressions on conversions, not limited to causal effects on user click behavior in online ad auctions.

2. There have been few empirical examples in which the IVs method has been demonstrated to be capable of making invariant behavioral predictions. We identify valid IVs for further application in the setting of online ad auctions, a setting in which the research field has been broadening, and demonstrate the robustness of the IVs method’s prediction accuracy for the overall forecast and the cold-start scenario in our experiments.

3. Notably, our research advances the concept of utilizing first-stage heterogeneity in the IVs method in the context of prediction. By considering heterogeneity in the strength of IVs concerning impression probability, our method shows significantly more robust prediction performance in overall prediction and the cold-start scenario.

2. Identification of Instrumental Variables in Ad Auctions

2.1. Ad Auctions and Biases

Before we explain why bid amounts are IVs, we describe the setting in ad auctions, because it is essential to examine the actual flow of data generation to ascertain the IVs.

The notations used to describe the auction mechanism are as follows: the total number of auctions is $N$, the number of auctioneers participating in auction $i \in \{1, \cdots, N\}$ is $J_i$, and the auctioneer’s advertisement is $j \in \{1, \cdots, J_i\}$. Let $b_{ij}$ be the bid amount that the auctioneer spends on ad $j$, $\mathrm{pCTR}_{ij}$ be the predictive click-through rate, and $j^*$ be the ad that wins an impression to the user in auction $i$. Also, $Y_{ij}$ is the outcome that is 1 if ad $j$ is clicked and 0 if not, and $X_{ij}$ is a vector of variables used to target ads and users for ad $j$. To simplify complex effects such as position bias, we assume a setting where only one ad wins an impression. Therefore, let $D_{ij}$ be a binary dummy that is 1 when $j = j^*$ and 0 otherwise. Also, let $Y_{ij^*}$ be the outcome that is 1 if the ad $j^*$ is clicked and 0 otherwise. Here, $\mathrm{pCTR}_{ij}$ is as follows: $\mathrm{pCTR}_{ij} = P(Y_{ij} = 1 \mid D_{ij} = 1, X_{ij})$, that is, the probability that ad $j$ will be clicked given a winning impression, the targeting variables, and other variables.

In ad auctions, there can be various methods for determining auction scores. Here, for instance, the auction score is calculated as follows: $s_{ij} = b_{ij} \times \mathrm{pCTR}_{ij}$. This determination scheme, which takes into account the bid amount and predictive CTR in the auction score, has been studied under the name "weighted GSP" [ 11, 12 ]. When the bid amount is a manual bid by the auctioneer, it is generated from the distribution of bid amounts conditional on the target variables of the ad set by the auctioneer. Alternatively, when the bid amount is an automated bid by the platform, the bid amount is generated by, for example, the predictive conversion rate (pCVR) and target CPA. In this case, $b_{ij}$ is a function of $X_{ij}$. That is, bid amounts are generated from some distribution conditioned on the target variables of the ad set by the auctioneer or other variables used by the platform. Thus, $b_{ij} \sim G(X_{ij})$, where $G(\cdot)$ is the generating distribution of bid amounts.

As summarized by [ 2 ], bias in the recommendation system is a looping process. Figure 1 depicts the loop of several interdependent biases in the ad auction setting. In particular, the auction score will be biased if the platform’s prediction of the pCTR is a biased estimator. The same is true for pCVR and the adjustment term. The assignment of impressions by the biased auction score is as follows: $j^* = \arg\max_{j \in \{1, \cdots, J_i\}} s^{\mathrm{biased}}_{ij}$.
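The score rule and the winner assignment described above can be illustrated with a short sketch. It assumes the weighted GSP score is simply bid times pCTR with no adjustment term, and the function name `select_winner` and all numbers are hypothetical. It shows how a biased pCTR for one ad can change which ad receives the impression.

```python
import numpy as np

def select_winner(bids, pctr):
    """Return (winner index, scores) for one auction under score = bid * pCTR."""
    bids = np.asarray(bids, dtype=float)
    pctr = np.asarray(pctr, dtype=float)
    scores = bids * pctr               # auction score of each participating ad
    winner = int(np.argmax(scores))    # the ad with the highest score wins the impression
    return winner, scores

# Hypothetical auction with three ads.
bids = [1.0, 2.5, 4.0]
pctr_true = [0.10, 0.05, 0.02]
pctr_biased = [0.10, 0.05, 0.04]       # the platform overestimates the third ad

winner_true, scores_true = select_winner(bids, pctr_true)
winner_biased, scores_biased = select_winner(bids, pctr_biased)
print(winner_true, winner_biased)
```

With these illustrative numbers the impression moves from the second ad to the third once the pCTR of the third ad is overestimated, which is the mechanism by which a biased predictor feeds back into the logged data.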

2.2. Causal View of Online Ad Auctions

… response, and $U_{ij}$ is unobserved heterogeneity of click behavior that correlates with some or all of $X_{ij}$, consisting of user and ad features, but cannot be observed, known as the omitted variable. $g^*(\cdot)$ is a function that returns a predictive probability when $D_{ij} = 1$.

Treatments are determined in the auction system together with predicted values such as pCTR and pCVR, which are conditioned on the user and ad features involved in ad auctions, and the advertiser’s bid amount. At this point, pCTR and pCVR are not conditioned on the omitted variables, which generates a bias in the estimates of the predictive outcome. Since the bid amount is determined from the predictions with this bias and an auction is formed, there is a strong suspicion that the impressions are endogenous variables, that is, variables correlated with the error term, amplified through the auction with the omitted variable bias. We consider the assumption that no omitted variables exist as a type of inductive bias, a convenient assumption for the pCTR model.

Unconfoundedness, i.e., a situation where no omitted variables exist, is a somewhat severe assumption for real-world data. Therefore, IVs methods that do not require the assumption of unconfoundedness can be compelling and valuable.

2.3. Validating Bid Amounts as IVs

There are three conditions that valid IVs satisfy. The first is the relevance of the IVs to a treatment variable. The second is an exclusion restriction, where the IVs do not directly affect the outcome but rather affect the outcome through the treatment variable. The third is the independence of the IVs from the omitted variables that affect the treatment and the outcome. Notating the IVs vector in ad $j$ as $Z_{ij}$ and combining these conditions, we can write them as follows:

Relevance: $Z_{ij} \not\perp D_{ij} \mid X_{ij}$,

Exclusion restriction: $Z_{ij} \perp Y_{ij} \mid D_{ij}, X_{ij}, U_{ij}$,

Independence: $Z_{ij} \perp U_{ij} \mid X_{ij}$.

We argue that bid amounts are valid IVs in ad auctions. Concerning impressions, the relevance is explicitly acknowledged by the fact that the main item in the auction score is the bid amount. Concerning the exclusion restriction, the bid amount only influences impressions through the auction score. Therefore, the bid amount does not directly influence the user’s click behavior. Conditional on the variables used by advertisers and platforms to set bid amounts, bid amounts are valid instruments.
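To make the role of these conditions concrete, the following toy simulation contrasts a naive regression with a two-stage least squares (2SLS) estimate. It is only a linear sketch with a continuous treatment, not the neural IVs method used in this paper, and every coefficient and variable name is a hypothetical choice: `z` plays the role of the bid (driven by the feature `x` but independent of the omitted variable `u`), `d` the treatment, and `y` the outcome.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

x = rng.normal(size=n)                    # observed feature
u = rng.normal(size=n)                    # omitted variable (never given to the models)
z = 0.5 * x + rng.normal(size=n)          # instrument: depends on x, independent of u
d = 1.0 * z + 0.5 * x + u + rng.normal(size=n)              # treatment, confounded by u
y = 2.0 * d + 1.0 * x + 2.0 * u + rng.normal(size=n)        # true effect of d is 2.0

def ols(design, target):
    """Least-squares coefficients of target regressed on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

ones = np.ones(n)

# Naive estimate: regress y on d and x while ignoring u (omitted variable bias).
naive_effect = ols(np.column_stack([ones, d, x]), y)[1]

# First stage: predict the treatment from the instrument and the feature.
first_stage = np.column_stack([ones, z, x])
d_hat = first_stage @ ols(first_stage, d)

# Second stage: regress y on the predicted treatment and the feature.
iv_effect = ols(np.column_stack([ones, d_hat, x]), y)[1]

print(f"naive: {naive_effect:.3f}, 2SLS: {iv_effect:.3f}, truth: 2.000")
```

In this setup the naive coefficient is pushed away from 2.0 by the omitted variable, while the 2SLS estimate stays close to it because `z` only reaches `y` through `d`.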

2.4. Reasons Other Variables are Not Valid

Here, we introduce why other variables, such as bid times used for targeting, do not meet the conditions of an instrumental variable in ad auctions.

Relevance : Take targeting variables as an example. From the perspective of relevance, advertisers determine bid amounts based on targeting users, which should relate to the probability of assignment. Bid amounts influence the auction score directly, ensuring stronger relevance than targeting variables, which have only an "indirect" relevance to the auction score.

Conditional Independence : The more crucial condition, however, is that targeting variables do not satisfy the independence from the unobserved factors affecting the user’s probability of clicking. For instance, consider bid times as one of the targeting variables. The time when a user requests an advertisement, that is, the user’s visitation process, and the probability of clicking the ad can be related. Users visiting at 10 AM may have a higher or lower probability of clicking an ad, and even if conditioned on other targeting variables, the presence of unobserved factors makes it impossible to guarantee the independence of bid times from the click probability. On the other hand, the probability that a user will click is considered independent of the bid amount, conditioned on the targeting variables, since the user cannot know how much was paid for the specific advertisement at the time of the click.

Exclusion Restriction : From the perspective of the exclusion restriction, targeting variables affect the probability of a user’s click, and there is no guarantee that their influence on the click probability is exerted solely through the assignment of impressions.

3. Click Prediction with First-stage IVs Heterogeneity

In the methodology section, we propose several variants of the IVs method to examine the following questions:

• Q.1 Do prediction methods using simple neural networks with IVs perform well in the online ad auction setting?
• Q.2 Is IVs heterogeneity strongly present in online ad auction settings, and is explicitly addressing it effective in prediction?
• Q.3 Heterogeneity in treatment effects is widely known, but how much improvement does it bring relative to accounting for heterogeneity in IVs?

To introduce models that respond to those questions, the methodology section is organized as follows. For Q.1, we first introduce the basic structure of the nonparametric IVs method and highlight its heterogeneous relevance to the probability of winning impressions in ad auctions. Next, for Q.2, we present a method based on an attention network that explicitly considers interactions between IVs and other features. Finally, for Q.3, we explicitly incorporate heterogeneity in click probabilities by employing an interaction structure similar to that used for the heterogeneity of instrumental variables. Figure 3 summarizes our proposed final IVs method.

For simplicity in subscripting the training data, $i$ corresponds to the record number in this section.

…tion of multiple IVs, and we assume that $Y_i$ depends on $Z_i$ only through $f(X_i, Z_i)$, which we call the first stage. $g^*$ is a function that returns a predictive probability of the event $Y_i = 1$, which is called the second stage. In the ad auctions, $f(X_i, Z_i)$ is the predicted impression probability, henceforth pIMP, which can be trained in one step together with pCTR in a multi-task learning framework. Using neural networks, a layer structure can be used that follows the simplified manner of IVs, which we henceforth refer to as the IV-BS approach.

Although there can be several approaches to incorporating interactions between features and IVs, we use an attention network, because it suffices for validating the idea of bid amount heterogeneity.

3.2. Leveraging First-Stage IVs by Interactions

Given a dataset, let the input feature matrix be represented as $H$ after passing through an input layer where all units are fully connected, including units from the IVs $Z$ and the features $X$.

Let $B$ denote the batch size and $K$ represent the number of units in the input layer, leading to $H$ having dimensions of $B \times K$. The instrumental variable, represented as matrix $Z$, has dimensions $B \times 1$. To align with the shape of $H$, matrix $Z_{\mathrm{iv}}$ is formed by performing a tiling operation on $Z$. Specifically, each row of $Z$ is replicated on the basis of the number of columns in $H$. Furthermore, the weight matrix for IVs interaction is denoted as $W_{\mathrm{iv}}$ and has dimensions $K \times K$. Using these matrices, the attention score $A_{\mathrm{iv}}$ is calculated as:

$A_{\mathrm{iv}} = \mathrm{softmax}(W_{\mathrm{iv}}(Z_{\mathrm{iv}} \odot H) + b_{\mathrm{iv}})$.

Here, we use the swish function as an activation function in the weight matrix $W_{\mathrm{iv}}$ so as to represent the non-linear strength in the heterogeneity of bid amounts. We feed the element-wise products as interactions into the fully connected layer with the softmax function as the activation function to generate the attention score $A_{\mathrm{iv}}$. Then, we obtain the representation $g_{\mathrm{iv}}$ by the element-wise product of the input layer and the generated attention scores $A_{\mathrm{iv}}$:

$g_{\mathrm{iv}} = A_{\mathrm{iv}} \odot H$.

We combine the representation $g_{\mathrm{iv}}$ obtained by the attention layer and the features input in a fully connected neural network to form the hidden layer.
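The attention computation of this subsection can be sketched in NumPy as follows. This is a minimal forward pass under stated assumptions rather than the released implementation: `h` stands for the input-layer output of shape (batch size, units), `z` for the column of bid amounts, the softmax is taken over the units of each record, and the swish activation is applied to the input layer that produces `h`. All array values and the names `iv_attention`, `w_iv`, and `b_iv` are hypothetical.

```python
import numpy as np

def swish(a):
    """Swish activation: a * sigmoid(a)."""
    return a / (1.0 + np.exp(-a))

def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def iv_attention(h, z, w_iv, b_iv):
    """Attention over input units driven by bid-feature interactions.

    h: (B, K) input-layer output, z: (B, 1) bid amounts,
    w_iv: (K, K) interaction weights, b_iv: (K,) bias.
    """
    z_iv = np.tile(z, (1, h.shape[1]))                    # tile bids to shape (B, K)
    interactions = z_iv * h                               # element-wise product
    a_iv = softmax(interactions @ w_iv + b_iv, axis=1)    # attention scores per record
    g_iv = a_iv * h                                       # re-weighted representation
    return a_iv, g_iv

rng = np.random.default_rng(0)
B, K = 4, 8
h = swish(rng.normal(size=(B, K)))          # stand-in for the fully connected input layer
z = rng.uniform(0.1, 5.0, size=(B, 1))      # hypothetical bid amounts
w_iv = rng.normal(scale=0.1, size=(K, K))
b_iv = np.zeros(K)

a_iv, g_iv = iv_attention(h, z, w_iv, b_iv)
print(a_iv.shape, g_iv.shape)
```

Because the scores depend on the product of the bid and each unit, two records with the same features but different bids receive different weightings, which is how the first-stage heterogeneity enters the network.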

3.3. Second-stage Heterogeneity

In the second stage, namely on the pCTR side, it is evident that heterogeneity exists when conditioning on user and advertisement features regarding the effect of impressions. Similarly to how we took the element-wise product of bid amounts and feature units in the input layer in the first stage, we symmetrically use the same structure in the second stage. The input layer consists of fully connected units from pIMP and features. The structure of the entire network, including both stages, is summarized in Figure 3.

3.4. Loss Function for Multi-task Learning

In the multi-task learning framework for pIMP and pCTR, we adjust the loss function for pCTR by applying sample weights through an indicator function, $\mathbb{1}\{D_i = 1\}$:

$\mathcal{L}_{\mathrm{pCTR}} = \mathrm{LogLoss}(Y_i, \mathrm{pCTR}_i) \times \mathbb{1}\{D_i = 1\}$.

This function ensures that the pCTR loss is only computed for data points with impressions, that is, when $D_i = 1$, preventing instances without impressions from affecting the pCTR loss calculation. This approach allows us to concentrate on the performance of the model in predicting CTR.
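A small NumPy sketch of the masked loss is given below. It assumes that the two task losses are added with equal weight and that the pCTR loss is averaged over records with an impression; both choices, as well as the function names, are illustrative assumptions rather than details stated in the text.

```python
import numpy as np

def log_loss_per_sample(y, p, eps=1e-7):
    """Binary cross-entropy for each record."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def multitask_loss(d, y, p_imp, p_ctr):
    """Return (total, pIMP loss, pCTR loss) with the pCTR loss masked by d == 1."""
    loss_imp = log_loss_per_sample(d, p_imp).mean()         # every record contributes
    mask = (d == 1).astype(float)                           # indicator of an impression
    ctr_terms = log_loss_per_sample(y, p_ctr) * mask        # zero out non-impressions
    loss_ctr = ctr_terms.sum() / max(mask.sum(), 1.0)       # average over impressions
    return loss_imp + loss_ctr, loss_imp, loss_ctr

# Hypothetical mini-batch: records 0 and 2 were shown, record 0 was clicked.
d = np.array([1, 0, 1, 0])
y = np.array([1, 0, 0, 0])
p_imp = np.array([0.7, 0.2, 0.6, 0.3])
p_ctr_a = np.array([0.8, 0.50, 0.2, 0.50])
p_ctr_b = np.array([0.8, 0.99, 0.2, 0.01])   # differs only where d == 0

_, _, ctr_a = multitask_loss(d, y, p_imp, p_ctr_a)
_, _, ctr_b = multitask_loss(d, y, p_imp, p_ctr_b)
print(ctr_a, ctr_b)
```

The two pCTR losses coincide because the predictions differ only on records without impressions, which the indicator removes from the calculation.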

4. Experiments

The experimental section is divided into two parts: simulation and evaluation in scenarios approximating the cold-start problem with real data sets. The code for replication is available at the following link: https://github.com/ryoheiemori/NPIV-pCTR. Please note that the repository excludes sections related to private data.

The notation is consistent with that used in Section 3.

4.1. Simulated Datasets

Algorithm 1 Simulating auction data and validating baselines

1. Initializing parameters:
Set parameters $(\alpha, \beta, \gamma)$
$n := 0$
while $n < 5{,}000$ do
    Generate $X$ and $U$
    $D \sim \mathrm{Bernoulli}(p)$, where $p = \mathrm{Logistic}(X'\alpha + U)$
    if $D = 1$ then
        $Y \sim \mathrm{Bernoulli}(q)$, where $q = \mathrm{Logistic}(X'\beta + U)$
        $n := n + 1$
    end if
end while
Train pCTR: $P(Y = 1 \mid D = 1) := \hat{g}(X)$
2. Generating historical auction data:
for each auction in 5,000 do
    $J = 20$
    Generate $X$ and $U$
    $b \sim \mathrm{Beta}(\mu, 2)$ by [ 14 ], where $\mu := \mathrm{Logistic}(X'\gamma)$
    …

… a specific distribution: Uniform[−5, 5] for $k \in \{1, \cdots, 10\}$, … from a normal distribution with a mean of 0.1 and variance of … conditional independence between the treatment and …

4.2.2. Test data

In the test data, the prediction baselines are evaluated using the day after the 7 days of training data. The test dataset consists of all independently displayed records conditional on ads’ targeting variables.

To evaluate the model’s performance in cold-start scenarios, the test data was divided based on previous ad impressions. Specifically, the data was split into 20 subsets at every 5% quantile, with each subset containing data points below the respective quantile. To ensure sufficient sample size, the test data included 2,000,000 records. Predicting clicks with more past impressions is generally easier, even with a simple baseline.
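The cumulative quantile split described above could be implemented along the following lines. The array `prev_impressions` (the number of past impressions of the ad in each test record) and the function name are hypothetical, and the use of `<=` at each threshold is an assumption about how ties at the quantile are handled.

```python
import numpy as np

def cold_start_subsets(prev_impressions, n_subsets=20):
    """Return [(quantile level, record indices)] for cumulative quantile subsets."""
    prev_impressions = np.asarray(prev_impressions)
    levels = np.arange(1, n_subsets + 1) / n_subsets        # 0.05, 0.10, ..., 1.00
    thresholds = np.quantile(prev_impressions, levels)      # one threshold per level
    return [
        (float(level), np.flatnonzero(prev_impressions <= threshold))
        for level, threshold in zip(levels, thresholds)
    ]

# Hypothetical test set: 100 records with 1..100 previous impressions.
prev_impressions = np.arange(1, 101)
subsets = cold_start_subsets(prev_impressions)
sizes = [len(indices) for _, indices in subsets]
print(sizes)
```

Each subset contains the previous one, so metrics computed on the first subsets isolate the ads closest to the cold-start regime, while the last subset is the full test set.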

4.3. Evaluation Score

We used log loss, known as a standard evaluation metric for pCTR, and the area under the curve (AUC) scores. AUC is a proper metric for evaluating rankings, as it assesses the ability to predict the correct position in auction rankings. For the simulation data, we employ the actual scores and relative scores to compare improvements. For our real dataset, we present relative evaluation scores due to confidentiality. The relative scores are defined as follows:

$\text{Relative LogLoss} = \frac{\text{Naive LogLoss} - \text{Compared LogLoss}}{\text{Naive LogLoss}} \times 100,$

$\text{Relative AUC} = \left( \frac{\text{Compared AUC} - 0.5}{\text{Naive AUC} - 0.5} - 1 \right) \times 100.$

To evaluate our proposed methods with instrumental variables, we took a naive benchmark and comparative baselines.

1. Naive: The Naive has three hidden layers between the input layer of features and their passage to the sigmoid function, building a pCTR model. Each of these hidden layers consists of 256 units. The first layer uses the swish activation function, while the second and third layers use the ReLU activation function.

2. IV-BS: The baseline is described in Section 3.1. Its pCTR model has the same network structure as Naive, including pIMP in the input layer.

3. IV-FS: The baseline is described in Section 3.2. On the pCTR side, it has the same network structure as IV-BS.

4. IV-SSFS: The baseline on the pCTR side is described in Section 3.3, while its network has the same structure as IV-FS on the pIMP side.

5. UBIPS: It applies the unbiased inverse propensity weighting estimator [ 15 ]. Its network structure is consistent with IV-BS.
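The relative scores of Section 4.3 used to compare these baselines reduce to two one-line functions. The sketch below follows the definitions of relative log loss (percentage reduction against Naive) and relative AUC (percentage gain in AUC above 0.5 against Naive); the function names and the example values are hypothetical.

```python
def relative_log_loss(naive_log_loss, compared_log_loss):
    """Percentage reduction in log loss relative to the Naive benchmark."""
    return (naive_log_loss - compared_log_loss) / naive_log_loss * 100.0

def relative_auc(naive_auc, compared_auc):
    """Percentage gain in AUC above the 0.5 chance level relative to Naive."""
    return ((compared_auc - 0.5) / (naive_auc - 0.5) - 1.0) * 100.0

# Hypothetical scores for a compared model against the Naive benchmark.
rel_ll = relative_log_loss(naive_log_loss=0.50, compared_log_loss=0.40)
rel_auc = relative_auc(naive_auc=0.70, compared_auc=0.75)
print(rel_ll, rel_auc)
```

Positive values indicate an improvement over Naive in both cases, and a model identical to Naive scores zero on both.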

4.5. Comparing Each Baseline

[Figure: Log Loss for Naive, IV-BS, and UBIPS]

… performance even with omitted variables. IV-BS remains stable and robust, especially on the left side where the test data’s value is high. Notably, omitted variable bias cannot be ignored even in the Weighted GSP impression assignment algorithm, and in this regard, IV-BS demonstrates superior performance.

An evaluation of our proposed methods on the real dataset is shown in Figure 5. It is expected that Naive performs relatively well since the training data includes many ads with numerous impressions. However, our proposed methods, IV-BS, IV-FS, and IV-SSFS, show significant improvement in relative AUC, particularly for ads with few previous impressions. The improvement of UBIPS over Naive, unlike in the simulation experiment, is likely attributable to the confounder being associated with the variable observed in the actual data.

Improvement for ads with few impressions matches that for ads with many, likely due to the infrequent inclusion of rare ads in training data, causing popularity bias. Notably, the increasing improvement of IVs methods for the 0 − 20 quantile of previous impressions demonstrates their robustness in predicting rare ads.

5. Conclusion

This paper argues that the bid amount is a valid instrumental variable under the assumption of conditional independence, and tests its validity by applying it to predictive CTR. Our experiment on a real dataset showed that explicitly accounting for heterogeneity in the strength of IVs allows for efficient and robust predictions. For greater extensibility, incorporating complex interactions between IVs and other features with more developed approaches such as graph neural networks is recommended. Additionally, addressing other looping biases and validating prediction methods in repeated auctions would be valuable.

[1] Marotta, Wu, Zhang, A. Acquisti, The welfare impact of targeted advertising technologies, Information Systems Research 33 (2022) 131-151. doi:10.1287/isre.2021.1024.

[2] Chen, Dong, Wang, Feng, Wang, He, Bias and debias in recommender system: A survey and future directions, ACM Transactions on Information Systems 41 (2023) 1-39.

[3] Bühlmann, Invariance, causality and robustness, Statistical Science 35 (2020) 404-426.

[4] He, Wang, Cui, Zou, Zhang, Cui, Jiang, Causpref: Causal preference learning for out-of-distribution recommendation, in: Proceedings of the ACM Web Conference 2022, 2022, pp. 410-421.

[5] Feder, G. Horowitz, Wald, Reichart, N. Rosenfeld, In the eye of the beholder: Robust prediction with causal user modeling, Advances in Neural Information Processing Systems 35 (2022) 14419-14433.

[6] Hartford, Lewis, Leyton-Brown, M. Taddy, Deep IV: A flexible approach for counterfactual prediction, in: International Conference on Machine Learning, PMLR, 2017, pp. 1414-1423.

[7] Si, Han, Zhang, J. Xu, Yin, Song, J.-R. Wen, A model-agnostic causal learning framework for recommendation using search data, in: Proceedings of the ACM Web Conference 2022, WWW '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 224-233. URL: https://doi.org/10.1145/3485447.3511951. doi:10.1145/3485447.3511951.

[8] G. W. Imbens, Instrumental variables: An econometrician's perspective, Statistical Science 29 (2014) 323-358. URL: http://www.jstor.org/stable/43288511.

[9] Belloni, Chen, Chernozhukov, Hansen, Sparse models and methods for optimal instruments with an application to eminent domain, Econometrica 80 (2012) 2369-2429.

[10] Abadie, Gu, Shen, Instrumental variable estimation with first-stage heterogeneity, Journal of Econometrics (2023) 105425.

[11] D. R. Thompson, Leyton-Brown, Revenue optimization in the generalized second-price auction, in: Proceedings of the Fourteenth ACM Conference on Electronic Commerce, 2013, pp. 837-852.

[12] Sun, Zhou, Deng, Optimal reserve prices in weighted GSP auctions, Electronic Commerce Research and Applications 13 (2014) 178-187. URL: https://www.sciencedirect.com/science/article/pii/S1567422314000106. doi:https://doi.org/10.1016/j.elerap.2014.02.003.

[13] Frolich, Nonparametric IV estimation of local average treatment effects with covariates, Journal of Econometrics 139 (2007) 35-75.

[14] Ferrari, Cribari-Neto, Beta regression for modelling rates and proportions, Journal of Applied Statistics 31 (2004) 799-815.

[15] Saito, Yaginuma, Nishino, Sakata, Nakata, Unbiased recommender learning from missing-not-at-random implicit feedback, in: Proceedings of the 13th International Conference on Web Search and Data Mining, WSDM '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 501-509. URL: https://doi.org/10.1145/3336191.3371783. doi:10.1145/3336191.3371783.