<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Mitigating Targeting Bias in Content Recommendation with Causal Bandits</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>YAN ZHAO</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>MITCHELL GOODMAN</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>SAMEER KANASE</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>SHENGHE XU</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>YANNICK KIMMEL</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>BRENT PAYNE</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>SAAD KHAN</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>PATRICIA GRAO</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Amazon</institution>
          <addr-line>Additional Key Words and Phrases: Personalization, Recommender system, Content optimization, Content ranking, Selection bias, Causal bandit, Contextual bandit, Uplift, View-through attribution, Fairness, Counterfactual learning</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>23</lpage>
      <abstract>
        <p>Recommendation systems play a central role in improving the customer experience on the Amazon retail website. Commonly, Learning-to-Rank (LTR) methods are employed to rank content; however, these methods are subject to bias inherent in the observational data they use for training. This paper studies a domain-specific self-selection bias, called Content Targeting Bias, introduced when content is generated for specific targeted customers. When content specifically targets classes of customers who are more or less likely to take the actions associated with traditional recommendation algorithms (clicks, purchases), the resulting observations reflect a biased relationship between the content and the feedback. These observations do not account for the counterfactual condition, or what would have happened if the customer had not received a recommendation. In many cases, customers have a high propensity to generate rewards independent of the recommendations shown on the website. In this work we incorporate causal uplift modeling with contextual bandits in order to use the heterogeneous treatment effect as an adjusted objective for top-k content selection. We demonstrate the performance and impact of the framework through both offline model evaluations and multiple live A/B experiments. CCS Concepts: • Computing methodologies → Sequential decision making; Batch learning; Learning from implicit feedback; Causal reasoning and diagnostics; Learning to rank; Supervised learning by regression; • Applied computing → Online shopping; • Information systems → Content ranking; Personalization; Top-k retrieval in databases; Recommender systems; • Mathematics of computing → Bayesian computation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>In many e-commerce applications, customers rely on recommender systems to help sort through large corpora
of content in order to discover the small fraction of content that they would be interested in. Amazon’s content
optimization/ranking system is designed as a self-service tool which enables teams across the company to build content
recommendation strategies once and run them anywhere. Such a content optimization system is challenging mostly for
the following two reasons: (1) continuous learning: surfacing the right content to the right user at the right time requires a
ranking system to continually adapt to users’ shifting interests along with newly introduced content; (2) content bias
reduction: learning the unbiased incremental value of each piece of content given the context is typically unachievable
given the limits of partial observations.</p>
      <p>
        To continuously learn new content and adapt to changing customer behaviors, the exploration/exploitation trade-off,
in the context of Reinforcement Learning, is currently an active area of research. Numerous competitive techniques have
emerged in the literature and shown promising results. These include epsilon-greedy ([
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]), adding random
noise to parameters [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], bootstrap sampling [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and Thompson Sampling [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Content bias is introduced by the process of making recommendations online, which influences the way users interact
with the system and how the data collected from users is fed back into the system. This leads to several types of bias,
such as popularity bias [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], human decision bias [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], position bias [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], or selection bias [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Traditional learning-to-rank
approaches must contend with these biases, and most approaches focus on position [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] or selection bias [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].
      </p>
      <p>In this paper, we identify a new type of bias, called Content Targeting Bias, introduced in recommendation systems
at industry scale by content targeting criteria. Here, content targeting criteria, defined by content recommendation
strategy owners, target only certain populations of customers or types of page contexts. Content owners then
participate in ranking competitions only when their targeting criteria are met. Content targeting criteria can be as simple as
"all customers and contexts" or can be specific to a small portion of customers who have taken particular actions over the
past month(s). For example, Content Targeting Bias is introduced when content owners target only signed-in customers,
who are known to spend more on average than the population as a whole, regardless of the recommendations provided.
This targeting bias can also arise when content owners target recommendations to some negative-profit
item pages. Not accounting for such biases in ranking means we end up over- or under-estimating a content's incremental
performance, producing unfair rankings and degraded customer experiences. For practical reasons it is nearly impossible for a
ranking system to be aware of all the targeting criteria that each content owner uses, along with the
detailed context and customer information at the level of each content owner. Thus, there needs to be a feature-agnostic
way to mitigate Content Targeting Bias and thereby improve ranking.</p>
      <p>
        On top of this, we further propose a quantitative measurement of Content Targeting Bias within recommendation
systems, and propose solutions for reducing such bias. Our solution to reduce Content Targeting Bias is inspired
by work on causal bandits[
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. In this work, we incorporate uplift modeling [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ][
        <xref ref-type="bibr" rid="ref20">20</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], using meta-learner approaches
(e.g. X-learner, R-learner)[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], into contextual bandits[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This approach is designed to consider the
heterogeneous treatment effect between content that is eligible to show but not actually observed (not treated) vs.
observed (treated), with the goal of giving content an equal opportunity to be shown regardless of targeting criteria,
thus maximizing customer experience. To the best of our knowledge, our work is the first to identify content targeting
as a unique bias and to incorporate uplift modeling into bandit approaches, including the Bayesian Linear Regression Model
(BLIR [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]), to reduce such bias in the content ranking problem. During experimentation in an Amazon commercial
system[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], this work achieved significant online improvements across multiple pages of the Amazon e-commerce
website.
      </p>
      <p>This paper is organized as follows: Section 2 describes the problem definitions. Section 3 describes the proposed
solution. Section 4 covers offline model evaluation and online live experiment results with learnings. Finally, Section 5
details conclusions and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2 PROBLEM DESCRIPTION</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Formalizing problems</title>
      <p>We define a widget group as a region of experience on Amazon’s e-commerce website which can be populated with
recommended content (a.k.a. widgets) provided by different teams. Note that the number of content pieces rendered in a widget
group is much smaller than the number of all possible candidate content pieces generated. Eligibility for rendering a widget
w is typically determined by a combination of the widget’s targeting criteria and the ranking system’s valuation.</p>
      <p>
        The metric for measuring reward r is the MOI, short for ‘metric of interest’. In our setting, the MOI takes
into account the short-term as well as long-term impact on the customer’s shopping experience, and helps us to fairly
balance the multiple, differing objectives of various stakeholders. Many Learning-to-Rank systems in the literature
([
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref11">11</xref>
        ][
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]) optimize for Click-through Rate (CTR), while we are more interested in the site-wide MOI. Towards this end,
we have adopted attribution modeling using view-through attribution (VTA)[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ][
        <xref ref-type="bibr" rid="ref31">31</xref>
        ], which credits widgets for all
rewards following an impression within an attribution window (e.g. 100 minutes). For example, if a customer views
some content recommendation c for longer than 1 second (defined as an impression) and then makes a purchase in the
subsequent 100 minutes, the reward r generated by this purchase, along with all other high-value actions taken in these
100 minutes, will be attributed back to the impressed content c.
      </p>
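As a concrete illustration of VTA, the following minimal Python sketch (function and data names are ours; only the 1-second impression rule and the 100-minute example window come from the text) credits every reward inside the attribution window to each qualifying impression:

```python
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(minutes=100)  # example window from the text
MIN_VIEW_SECONDS = 1.0                       # dwell time that counts as an impression

def attribute_rewards(impressions, rewards):
    """View-through attribution: credit every reward to each content piece
    impressed within the preceding attribution window.

    impressions: list of (content_id, timestamp, view_seconds)
    rewards:     list of (timestamp, value)
    Returns {content_id: total attributed reward}.
    """
    credit = {}
    for content_id, imp_ts, view_seconds in impressions:
        if view_seconds < MIN_VIEW_SECONDS:
            continue  # too short to count as an impression
        for reward_ts, value in rewards:
            if imp_ts <= reward_ts <= imp_ts + ATTRIBUTION_WINDOW:
                credit[content_id] = credit.get(content_id, 0.0) + value
    return credit

t0 = datetime(2022, 1, 1, 12, 0)
imps = [("widget_a", t0, 2.5), ("widget_b", t0, 0.4)]
rews = [(t0 + timedelta(minutes=30), 25.0)]
print(attribute_rewards(imps, rews))  # {'widget_a': 25.0}: widget_b's view was too short
```

Note how the same purchase would be credited to every impressed widget in the window, which is exactly the loosened content-reward connection discussed next.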
      <p>The drawback of this methodology is that it loosens the connection between the content and the associated reward,
which makes the ranking system more vulnerable to Content Targeting Bias. Specifically, by treating feedback as a
“response” without considering counterfactual cases (i.e. how the customer would have behaved had they not received
a recommendation), we end up with a biased estimate. Cases where customers have a high probability of generating
down-session rewards independent of the recommendations shown can lead the system to over-estimate the value of the
widgets shown. This misspecification of value due to Content Targeting Bias disrupts the fairness guarantees provided by
the ranking system and ultimately leads to a suboptimal customer experience.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Formalizing Content Targeting Bias and ranking fairness</title>
      <p>
        To formalize Content Targeting Bias, we adopt the recently introduced idea of opportunity bias[
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], a formulation designed
to evaluate whether different types of content receive clicks (or other engagement metrics) proportional to their true
targeted population sizes (i.e. do contents with different targeting criteria receive similar true positive rates?). This
method assumes that the content recommended by content owners is all of relatively good quality. We believe this
formalization of Content Targeting Bias is directly aligned with user satisfaction and the economic gains of content owners.
      </p>
      <p>
        To quantify the impact of Content Targeting Bias on a recommendation system, we first need to calculate the true
positive rate for each content. Using show rate as an example, suppose content i has been exposed to customers s_i
times in total; the true positive rate for i is TPR_i = s_i / g_i, where g_i is the total number of times content i is generated based
on content owners’ targeting criteria. Then, we can use the Gini Coefficient[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] to measure the inequality in true
positive rates relative to content generation
      </p>
      <p>G = (Σ_i (2i − n − 1) · TPR_i) / (n · Σ_i TPR_i)  (1)
where contents are indexed from 1 to n by targeted audience size, non-descending. We use −1 ≤ G ≤ 1 to quantify
the Content Targeting Bias in a recommendation system: a G close to 0 indicates low bias; G &gt; 0 indicates
that the true positive rate is positively correlated with content targeted audience size; and G &lt; 0 indicates that the true
positive rate is negatively correlated with audience size.</p>
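The Gini-style measure in Equation (1) can be sketched in Python as follows (the function name and sample numbers are illustrative):

```python
def targeting_bias_gini(tpr, audience_size):
    """Gini-style coefficient of Eq. (1): contents sorted by targeted
    audience size (non-descending), inequality measured over their TPRs."""
    order = sorted(range(len(tpr)), key=lambda i: audience_size[i])
    x = [tpr[i] for i in order]
    n = len(x)
    return sum((2 * (i + 1) - n - 1) * xi for i, xi in enumerate(x)) / (n * sum(x))

# Equal TPRs regardless of audience size -> no targeting bias.
print(targeting_bias_gini([0.2, 0.2, 0.2], [10, 100, 1000]))  # 0.0
# TPR grows with audience size -> G > 0 (bias toward broadly targeted content).
print(targeting_bias_gini([0.1, 0.2, 0.6], [10, 100, 1000]) > 0)  # True
```

A negative G would arise in the mirror case, where narrowly targeted content captures disproportionately high true positive rates.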
    </sec>
    <sec id="sec-5">
      <title>3 METHODOLOGY</title>
      <p>
        To address the problems above, we propose a framework employing uplift techniques with contextual bandits on top of
VTA, to de-bias observations for the ranking system. In more detail, we add contextual features to an uplift
model in order to estimate the Conditional Average Treatment Effect (CATE [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) between exposure vs. non-exposure of
a recommendation to customers, using either Randomized Controlled Trial (RCT) or observational data. Under this
framework, we also propose a modeling architecture incorporating Bayesian Linear Regression (BLIR) with Thompson
Sampling to achieve an online exploration-exploitation trade-off.
      </p>
    </sec>
    <sec id="sec-6">
      <title>3.1 Assumptions and definitions</title>
      <p>We divide the causal impact of showing widgets to customers on reward Y into two parts: (1) request-level
incrementality, the incremental value of showing the top k widgets to a customer within a request; this removes confounding
factors which impact the overall down-session reward Y independent of the recommendations received. (2) widget-level
attribution: out of the top k widgets, each widget’s contribution to request-level incrementality. Ideally, we want
a single causal model that solves request-level incrementality and widget-level attribution at the same time,
but for the scope of this paper we focus on the first problem, which is more related to the Content Targeting Bias issue, as
removing customer/context intrinsic value from the observed reward Y.</p>
      <p>
        Let W be a dummy variable indicating treatment status, with W = 1 if a customer receives recommended contents
(treatment) for a given request and W = 0 otherwise. The observed reward is defined as Y ≡ W · Y(1) + (1 − W) · Y(0),
where Y(1) and Y(0) are the potential outcomes when people receive the treatment or not. Under the unconfoundedness
assumption[
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], Y(0), Y(1) ⊥ W, we can approximate Y using Y(1) if treatment is imposed, or Y(0) otherwise. In
practice, instead of the simple unconfoundedness assumption, we make a conditional unconfoundedness assumption
due to the non-random treatment assignment, which is also known as “strongly ignorable treatment assignment”[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ],
Y(0), Y(1) ⊥ W | X. In other words, given all the covariates X, the treatment assignment is independent of the
potential outcomes. Given X = x, the CATE τ(x) is then defined as τ(x) ≡ E[Y(1) − Y(0) | X = x].
      </p>
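A minimal numeric sketch of the CATE definition, using a naive difference of treated and control means within a covariate cell (valid only under the conditional unconfoundedness assumption above; all names and numbers are hypothetical):

```python
def cate_estimate(data, x_value):
    """Naive estimate of tau(x) = E[Y(1) - Y(0) | X = x]: under conditional
    unconfoundedness, the difference of treated and control means in the cell.

    data: list of (x, w, y) tuples with treatment indicator w in {0, 1}.
    """
    treated = [y for x, w, y in data if x == x_value and w == 1]
    control = [y for x, w, y in data if x == x_value and w == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Signed-in customers spend heavily with or without treatment, so the
# incremental effect is small even though raw treated rewards look large.
obs = [("signed_in", 1, 105.0), ("signed_in", 1, 115.0),
       ("signed_in", 0, 100.0), ("signed_in", 0, 100.0)]
print(cate_estimate(obs, "signed_in"))  # 10.0
```

Here a ranker trained on raw rewards would score this segment near 110, while the causal uplift is only 10.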
      <p>In this work, the treatment group is defined as request-level exposure of the top k widgets, while the control group is defined
as non-exposure of the entire widget group. This definition accounts for exogenous factors in the top-k ranking
problem, where widgets in top ranks may have an impact on lower-ranked widgets’ exposures.</p>
    </sec>
    <sec id="sec-7">
      <title>3.2 Features</title>
      <p>Another key point related to the unconfoundedness assumption is the set of covariates X which can support it. In the
real world, finding all confounders can be intractable; however, in this work we simplify the problem by limiting it to two
confounding factors which impact user propensities: (1) content targeting, where content is only generated for a subgroup
of customers; (2) model targeting, where the model returns content non-uniformly given different page contexts. The
conditional unconfoundedness assumption then holds as long as we can capture the page context and a customer’s
intrinsic values in X. In the Content Targeting Bias problem, a customer’s intrinsic values can be related only to the
candidate widgets generated for a given request. Thus, with feature representations of candidate widgets as well as
other page context and customer information, we can appropriately counteract the confounding problem.</p>
    </sec>
    <sec id="sec-8">
      <title>3.3 Two-model framework estimating pseudo-effect</title>
      <p>
        We further propose a two-model framework, composed of a baseline (control) model and an uplift model using Linear
Regression, to reduce Content Targeting Bias. Note that we use a linear model in this paper, but the approach can be applied
to any type of Machine Learning model.
3.3.1 Model 1 - Baseline Model. The baseline model fits the observations in the control group, estimating the
expected down-session rewards of various contents that were eligible to show but not actually observed by customers,
defined at the request level as ŷ_0 = βᵀX + ε, where β is the vector of feature weights, which we fit during training to best explain the
data, ε is a noise term capturing unobserved variables, and X represents the features used in the baseline model as explained
in the sections above.
3.3.2 Model 2 - Uplift Model. For each request i with an individual observation (widget) in the treatment group, which
is returned and actually observed by customers, we define
D_i(1) = Y_i − ŷ_0(X_i)  (2)
where Y_i is the observed down-session reward for a customer at request i, and D_i(1) is the imputed treatment effect for
request i in the treated group, based on the baseline outcome estimator. This “pseudo-effect” is an adjusted objective with
Content Targeting Bias reduced, and it is then used as the outcome in a secondary machine learning model to obtain the
response functions with treatment effects estimated. To achieve the online exploration-exploitation trade-off, we use the
Bayesian Linear Regression Model (BLIR) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] approach, D̂_w(1) ∼ N(θᵀx_w, σ²), where D̂_w(1) is the estimated score for widget w, θ is the vector of
coefficients, and x_w represents the features used in the uplift model. Note that x is different from X in that X contains all
candidate widget information and is only used offline, while x only contains features for the focal widget w. This
uplift model estimates “pseudo-treatment-effects” for the observations in the treatment group, and can help reduce
Content Targeting Bias, since we remove the counterfactual effect using the baseline (control) model. Finally, we do
point-wise ranking online by estimating a score for every candidate widget, sorting, and returning the top-K to customers.
The exploration-exploitation trade-off is achieved by sampling model parameters from their posterior distributions
through Thompson Sampling.
      </p>
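The two-model framework can be sketched end-to-end as follows, assuming NumPy is available; the ridge baseline, the Gaussian prior, and all synthetic data are illustrative stand-ins, not the production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_baseline(X0, y0, reg=1.0):
    """Model 1 (offline): ridge-regularized linear fit of control rewards,
    standing in for the paper's baseline regression y ~ beta^T X."""
    d = X0.shape[1]
    return np.linalg.solve(X0.T @ X0 + reg * np.eye(d), X0.T @ y0)

def fit_blir(X1, D, noise_var=1.0, prior_var=10.0):
    """Model 2: Bayesian linear regression on pseudo-effects D, returning the
    posterior N(mu, Sigma) over the coefficients theta."""
    d = X1.shape[1]
    Sigma = np.linalg.inv(X1.T @ X1 / noise_var + np.eye(d) / prior_var)
    mu = Sigma @ X1.T @ D / noise_var
    return mu, Sigma

def rank_top_k(mu, Sigma, candidates, k):
    """Thompson sampling: draw one coefficient vector from the posterior,
    score all candidate widgets, and return the top-k indices."""
    theta = rng.multivariate_normal(mu, Sigma)
    return np.argsort(candidates @ theta)[::-1][:k]

# Synthetic control (non-exposed) traffic for the baseline model.
X0 = rng.normal(size=(200, 3))
y0 = X0 @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)
beta = fit_baseline(X0, y0)

# Treated traffic: the true uplift lives entirely on feature 2.
X1 = rng.normal(size=(200, 3))
y1 = X1 @ np.array([1.0, 0.5, 0.0]) + 2.0 * X1[:, 2]
D = y1 - X1 @ beta                    # pseudo-effect, Eq. (2)
mu, Sigma = fit_blir(X1, D)

top = rank_top_k(mu, Sigma, np.eye(3), k=1)  # three unit-feature "widgets"
print(top[0])  # the widget loading on feature 2 wins
```

Subtracting the baseline prediction strips the intrinsic customer/context value shared by both potential outcomes, so the BLIR posterior concentrates on the genuinely incremental feature.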
      <p>In addition to bias reduction with causal effect estimation, another benefit of this two-model framework is that the
baseline model is only used offline, generating uplift objectives for the second model. This simplification allows us
to include as many features as we can in X, without concern for latency or other requirements of online systems, such
as representations of all candidate widgets’ features, while keeping the online feature set x relatively simple yet achieving a
similar effect on bias reduction.</p>
    </sec>
    <sec id="sec-9">
      <title>3.4 Log-tricks on objectives</title>
      <p>One trick we performed is to transform the reward Y and the uplift estimations into log-scale. Transforming rewards into
log-scale is a widely used trick for taming outliers, so we first introduced it on top of the baseline model estimations.
Here, instead of using Y, we used log1p(Y) = sign(Y) ∗ log(1 + |Y|) to achieve symmetry and valid values at
zero (we will still denote it Y in the following). Due to Jensen’s inequality, transforming the log-scaled baseline
estimation log_ŷ_0 back is biased; thus, we directly perform treatment effect estimation in log-scale by
log_D_i(1) = log_Y_i − log_ŷ_0(X_i)  (3)
where exp(log_ŷ_0(X)) can be treated as the geometric mean of the baseline values given covariates X,
while the treatment effect log_D_i(1) becomes multiplicative, like Y_i / ŷ_0. Although this definition differs from the
additive uplift in the section above, the sign still has meaning, in that positive values indicate positive incrementality
whereas negative values indicate negative incrementality. In addition, defining this relationship as multiplicative also
lends an intuitive semantic meaning. For example, say customers who are not signed in spend $10 on average while
signed-in customers spend $100. When representing uplift for some content c, instead of an additive uplift of $10 (thus
$20 for signed-out customers and $110 for signed-in customers), a multiplicative lift ratio of 1.1 makes more sense in an
e-commerce context (thus $11 for signed-out customers and $110 for signed-in customers). Results from A/B
testing show that log-scaled (multiplicative) uplift performs better than additive uplift.</p>
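A small sketch of the symmetric log1p transform and the resulting multiplicative uplift (function names are ours):

```python
import math

def symlog1p(y):
    """Symmetric log1p: sign(y) * log(1 + |y|), valid at zero and
    symmetric for negative rewards."""
    return math.copysign(math.log1p(abs(y)), y)

def lift_ratio(y, baseline):
    """Multiplicative uplift implied by Eq. (3): exp(log Y - log y0_hat)."""
    return math.exp(symlog1p(y) - symlog1p(baseline))

# A reward of $110 against a $100 baseline and $11 against $10 both map to
# (nearly) the same lift ratio, unlike a flat additive $10 uplift.
print(round(lift_ratio(110.0, 100.0), 3), round(lift_ratio(11.0, 10.0), 3))
print(symlog1p(-3.0))  # -log(4): symmetric around zero
```

The two ratios agree only approximately because of the +1 inside log1p; the approximation tightens as rewards grow relative to 1.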
    </sec>
    <sec id="sec-10">
      <title>3.5 Data collection</title>
      <p>Randomly hiding data, as in an RCT, could provide a more unbiased estimate of CATE; however, it is costly to
proactively hide content from customers. In the RCT setup, we randomly punt (do not display) the entire widget group
a small percentage p of the time, and train the baseline model using only this punted traffic. In this way, using RCT, we
are able to remove the confounder effect resulting from customer selection bias, e.g. customers’ propensities to scroll
down the page and browse content. Observational data, in contrast to RCT, can provide sufficient data with minimal
cost, but at the expense of increased data bias. In practice, when the top K widgets are returned, they are not
always all shown in the viewport to customers. Our system is able to capture this client-side impression behavior, and
our proposed work is able to train the baseline and uplift models using this observational data. Results show limited bias
using observational data compared to RCT.</p>
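One possible way to hold out the small punted percentage p described above is a deterministic hash-based split (the 2% rate and all names here are assumptions, not the production configuration):

```python
import hashlib

PUNT_RATE = 0.02  # assumed small holdout percentage p

def is_punted(request_id, punt_rate=PUNT_RATE):
    """Deterministic hash-based holdout: punt (hide) the entire widget group
    for a small share of requests; only this traffic trains the baseline model."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10_000
    return bucket < punt_rate * 10_000

punted = sum(is_punted(f"req-{i}") for i in range(100_000))
print(0.015 < punted / 100_000 < 0.025)  # True: close to the configured rate
```

Hashing the request identifier keeps the assignment reproducible for logging and joins, while remaining effectively random with respect to customer covariates.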
    </sec>
    <sec id="sec-11">
      <title>4 MODEL EVALUATION AND EXPERIMENTS</title>
    </sec>
    <sec id="sec-12">
      <title>4.1 Ranking fairness estimation</title>
      <p>In offline model evaluation of ranking fairness, we compared the G scores defined in Section 2.2 when ranking content
using the production model (non-uplift linear bandits regressed directly on VTA) vs. the two-model uplift bandit approach.
Also, in order to measure fairness along multiple dimensions, we defined the TPR in two ways</p>
      <p>TPR of show rate = Σ content exposures / Σ content generated;  TPR of performance = Σ observed reward / Σ content exposures  (4)</p>
      <p>From the resulting G scores, the uplift approach reduces bias in terms of both content show rate and content average performance,
especially the latter, which indicates that our approach improves fairness based more on content performance than on show-rate
coverage, which better aligns with business goals.
Through analysis of real online data, we identified several widgets targeting customers with high intrinsic rewards,
compared their observed scores with predicted uplift values, and validated that uplift is able to reduce those biases. In
Table 2, widgets C and D target high-value customers only, while widgets A, B, E, and F are common
widgets targeting all customers. The scores are observed or estimated rewards as defined in Section 2.1, and are
represented as tuples of (average across all customers, average across high-value customers only).
When evaluating with the proposed framework, we intentionally exclude all customer-related features, so the model does not
depend on customer profiles. We can see that although widget C has the highest average observed score across all customers,
its uplift prediction is not as high as widget B’s after reducing Content Targeting Bias (0.15 vs. 0.24); this aligns
with our observations of these widgets through online experimentation. A similar pattern can be found for widget D.</p>
    </sec>
    <sec id="sec-13">
      <title>4.3 RCT and observational data</title>
      <p>We also evaluated the baseline model in the two-model uplift approach, trained with either RCT or observational data. Through
offline analysis, we see that the observational uplift approach is able to achieve results similar to RCT, with a gap
in the estimates of counterfactuals (5% ∼ 10%). This gap can be interpreted through customers’ selection biases: when customers
intentionally do not view content (observational data), they might be attracted by other content on the page or already
have a clear shopping mission. This in turn leads to higher estimates in observational baseline models vs. RCT. Estimating
this gap is important, since it can be used to adjust observational modeling and improve model interpretability while
avoiding the high cost imposed by RCT, e.g. showing sub-optimal results to a certain percentage of the population.</p>
    </sec>
    <sec id="sec-14">
      <title>4.4 Online experiments</title>
      <p>
        Content Targeting Bias may appear in different forms across different pages. For example, on the homepage, Content
Targeting Bias is mostly introduced by differing targeting criteria on customer populations, while on product detail
pages, biases are mostly introduced by targeting criteria on context information. To gain a thorough understanding
of this proposed work, we completed five online randomized A/B experiments[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
The following treatments were performed: (1) observational uplift: two-model uplift with observational data; (2) RCT uplift:
two-model uplift with RCT data. In Experiment 2, we ran on slots located at the top of product detail pages, with the
treatment group as observational uplift. In Experiment 3, we ran on cart pages, using uplift with observational data.
In Experiment 4, we ran on the desktop product detail page, where the following treatments were performed: (1) observational
uplift; (2) observational uplift with the log-scale trick. Experiment 5 ran on the mobile app product detail page, using
observational uplift with the log-scale trick as the treatment group.
4.4.2 Online experiment results. We observed consistent improvements across experiments. Table 3 shows detailed
MOI results with confidence intervals. Through these experiments, we demonstrated: (1) improvements using heterogeneous
treatment effect estimation on top of a bandit approach, on different pages including the homepage, detail pages, and cart pages,
across Amazon e-commerce websites. Out of these, experiments 1, 3, 4, and 5 achieved statistically significant improvements
with p-values less than 0.05, with experiment 5 having a p-value close to 0.000. (2) The proposed method using observational
data achieved significant improvements (p-value 0.02), while RCT only outperformed the production model with low
confidence (p-value 0.38). This demonstrates that observational uplift modeling can achieve results similar to
using RCT, by successfully minimizing potential bias in training examples. Conversely, RCT depends on randomly hiding
content, which is guaranteed to be suboptimal some percentage of the time, hurting overall RCT
performance. (3) The proposed uplift model with the log trick outperforms additive uplift, which can be observed directly in
experiment 4, where uplift with the log trick achieved significant improvements (p-value 0.024) while the additive uplift
improvement was not significant (p-value 0.17). This further supports our hypothesis that the log trick better manages
outliers and that multiplicative uplift offers more reasonable semantic meaning.
      </p>
    </sec>
    <sec id="sec-15">
      <title>5 CONCLUSION</title>
      <p>In this paper, we studied a new type of bias in Learning-to-Rank systems, called Content Targeting Bias. We defined
this bias, proposed a quantitative measurement for it, and further proposed an online ranking approach using BLIR that
incorporates contextual features into uplift modeling to reduce the bias in top-K content selection. Through this work, we
introduced log-tricks for treatment effect estimation between exposure vs. non-exposure of a recommendation, and
compared baseline models trained using both RCT and observational data. This work demonstrates significant bias
reduction as well as significant MOI improvements both offline and online. In future work, building on the current
framework, we will improve uplift estimation by applying propensity-weighting-based meta-learner approaches, e.g.
double ML (R-learner), to further reduce display biases in content rankings.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Jason</given-names>
            <surname>Abrevaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yu-Chin</given-names>
            <surname>Hsu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Robert P</given-names>
            <surname>Lieli</surname>
          </string-name>
          .
          <article-title>Estimating conditional average treatment effects</article-title>
          .
          <source>Journal of Business &amp; Economic Statistics</source>
          ,
          <volume>33</volume>
          (
          <issue>4</issue>
          ):
          <fpage>485</fpage>
          -
          <lpage>505</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Aman</given-names>
            <surname>Agarwal</surname>
          </string-name>
          , Kenta Takatsu, Ivan Zaitsev, and
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <article-title>A general framework for counterfactual learning-to-rank</article-title>
          .
          <source>In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>5</fpage>
          -
          <lpage>14</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Charles</given-names>
            <surname>Blundell</surname>
          </string-name>
          , Julien Cornebise, Koray Kavukcuoglu, and
          <string-name>
            <given-names>Daan</given-names>
            <surname>Wierstra</surname>
          </string-name>
          .
          <article-title>Weight uncertainty in neural network</article-title>
          .
          <source>In International Conference on Machine Learning</source>
          , pages
          <fpage>1613</fpage>
          -
          <lpage>1622</lpage>
          . PMLR,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Malcolm C.</given-names>
            <surname>Brown</surname>
          </string-name>
          .
          <article-title>Using gini-style indices to evaluate the spatial patterns of health practitioners: theoretical considerations and an application based on alberta data</article-title>
          .
          <source>Social science &amp; medicine</source>
          ,
          <volume>38</volume>
          (
          <issue>9</issue>
          ):
          <fpage>1243</fpage>
          -
          <lpage>1256</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Òscar</given-names>
            <surname>Celma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Pedro</given-names>
            <surname>Cano</surname>
          </string-name>
          .
          <article-title>From hits to niches? or how popular artists can bias music recommendation and discovery</article-title>
          .
          <source>In Proceedings of the 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Olivier</given-names>
            <surname>Chapelle</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lihong</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>An empirical evaluation of thompson sampling</article-title>
          .
          <source>Advances in neural information processing systems</source>
          ,
          <volume>24</volume>
          :
          <fpage>2249</fpage>
          -
          <lpage>2257</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Li</given-names>
            <surname>Chen</surname>
          </string-name>
          , Marco De Gemmis, Alexander Felfernig, Pasquale Lops, Francesco Ricci, and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          .
          <article-title>Human decision making and recommender systems</article-title>
          .
          <source>ACM Transactions on Interactive Intelligent Systems (TiiS)</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Yarin</given-names>
            <surname>Gal</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zoubin</given-names>
            <surname>Ghahramani</surname>
          </string-name>
          .
          <article-title>Dropout as a bayesian approximation: Representing model uncertainty in deep learning</article-title>
          .
          <source>In international conference on machine learning</source>
          , pages
          <fpage>1050</fpage>
          -
          <lpage>1059</lpage>
          . PMLR,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Graton</given-names>
            <surname>Gathright</surname>
          </string-name>
          , Roopesh Ranjan, Vasudev Rahul, Marshall Yan, and Fan Zhang.
          <article-title>Cross-channel attribution of consumer marketing</article-title>
          .
          <source>In Amazon Machine Learning Conference</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Thore</given-names>
            <surname>Graepel</surname>
          </string-name>
          , Joaquin Quinonero Candela, Thomas Borchert, and
          <string-name>
            <given-names>Ralf</given-names>
            <surname>Herbrich</surname>
          </string-name>
          .
          <article-title>Web-scale bayesian click-through rate prediction for sponsored search advertising in microsoft's bing search engine</article-title>
          . In ICML,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Huifeng</given-names>
            <surname>Guo</surname>
          </string-name>
          , Ruiming Tang, Yunming Ye,
          <string-name>
            <given-names>Zhenguo</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiuqiang</given-names>
            <surname>He</surname>
          </string-name>
          .
          <article-title>Deepfm: a factorization-machine based neural network for ctr prediction</article-title>
          .
          <source>arXiv preprint arXiv:1703.04247</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Somit</given-names>
            <surname>Gupta</surname>
          </string-name>
          , Ronny Kohavi, Diane Tang, Ya Xu, Reid Andersen, Eytan Bakshy, Niall Cardin, Sumita Chandran, Nanyu Chen,
          <string-name>
            <given-names>Dominic</given-names>
            <surname>Coey</surname>
          </string-name>
          , et al.
          <article-title>Top challenges from the first practical online controlled experiments summit</article-title>
          .
          <source>ACM SIGKDD Explorations Newsletter</source>
          ,
          <volume>21</volume>
          (
          <issue>1</issue>
          ):
          <fpage>20</fpage>
          -
          <lpage>35</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>James J</given-names>
            <surname>Heckman</surname>
          </string-name>
          .
          <article-title>Sample selection bias as a specification error with an application to the estimation of labor supply functions</article-title>
          . Princeton University Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          , Adith Swaminathan, and
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Schnabel</surname>
          </string-name>
          .
          <article-title>Unbiased learning-to-rank with biased feedback</article-title>
          .
          <source>In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining</source>
          , pages
          <fpage>781</fpage>
          -
          <lpage>789</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Sameer</given-names>
            <surname>Kanase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yan</given-names>
            <surname>Zhao</surname>
          </string-name>
          , Shenghe Xu, Mitchell Goodman, Manohar Mandalapu, Benjamyn Ward, Chan Jeon, Shreya Kamath, Ben Cohen, Vlad Suslikov, Yujia Liu, Hengjia Zhang, Yannick Kimmel, Saad Khan, Brent Payne, and
          <string-name>
            <given-names>Patricia</given-names>
            <surname>Grao</surname>
          </string-name>
          .
          <article-title>An application of causal bandit to content optimization</article-title>
          .
          <source>In Proceedings of the 5th Workshop on Online Recommender Systems and User Modeling (ORSUM 2022), in conjunction with the 16th ACM Conference on Recommender Systems (RecSys 2022)</source>
          , Seattle, WA, USA,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Kireyev</surname>
          </string-name>
          , Koen Pauwels, and
          <string-name>
            <given-names>Sunil</given-names>
            <surname>Gupta</surname>
          </string-name>
          .
          <article-title>Do display ads influence search? attribution and dynamics in online advertising</article-title>
          .
          <source>International Journal of Research in Marketing</source>
          ,
          <volume>33</volume>
          (
          <issue>3</issue>
          ):
          <fpage>475</fpage>
          -
          <lpage>490</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Sören R</given-names>
            <surname>Künzel</surname>
          </string-name>
          , Jasjeet S Sekhon, Peter J Bickel, and
          <string-name>
            <given-names>Bin</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>Metalearners for estimating heterogeneous treatment effects using machine learning</article-title>
          .
          <source>Proceedings of the national academy of sciences</source>
          ,
          <volume>116</volume>
          (
          <issue>10</issue>
          ):
          <fpage>4156</fpage>
          -
          <lpage>4165</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Lihong</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wei</given-names>
            <surname>Chu</surname>
          </string-name>
          , John Langford, and
          <string-name>
            <given-names>Robert E</given-names>
            <surname>Schapire</surname>
          </string-name>
          .
          <article-title>A contextual-bandit approach to personalized news article recommendation</article-title>
          .
          <source>In Proceedings of the 19th international conference on World wide web</source>
          , pages
          <fpage>661</fpage>
          -
          <lpage>670</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Osband</surname>
          </string-name>
          , Charles Blundell, Alexander Pritzel, and Benjamin Van Roy.
          <article-title>Deep exploration via bootstrapped dqn</article-title>
          .
          <source>Advances in neural information processing systems</source>
          ,
          <volume>29</volume>
          :
          <fpage>4026</fpage>
          -
          <lpage>4034</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Roopesh</given-names>
            <surname>Ranjan</surname>
          </string-name>
          , Narayanan Sadagopan, and
          <string-name>
            <given-names>Guido</given-names>
            <surname>Imbens</surname>
          </string-name>
          .
          <article-title>A propensity matching approach to multi touch attribution</article-title>
          .
          <source>In Amazon Machine Learning Conference</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Thomas S</given-names>
            <surname>Richardson</surname>
          </string-name>
          , Yu Liu, James McQueen, and
          <string-name>
            <given-names>Doug</given-names>
            <surname>Hains</surname>
          </string-name>
          .
          <article-title>A bayesian model for online activity sample sizes</article-title>
          .
          <source>In International Conference on Artificial Intelligence and Statistics</source>
          , pages
          <fpage>1775</fpage>
          -
          <lpage>1785</lpage>
          . PMLR,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Paul R</given-names>
            <surname>Rosenbaum</surname>
          </string-name>
          and
          <string-name>
            <given-names>Donald B</given-names>
            <surname>Rubin</surname>
          </string-name>
          .
          <article-title>The central role of the propensity score in observational studies for causal effects</article-title>
          .
          <source>Biometrika</source>
          ,
          <volume>70</volume>
          (
          <issue>1</issue>
          ):
          <fpage>41</fpage>
          -
          <lpage>55</lpage>
          ,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Donald B</given-names>
            <surname>Rubin</surname>
          </string-name>
          .
          <article-title>Estimating causal effects of treatments in randomized and nonrandomized studies</article-title>
          .
          <source>Journal of educational Psychology</source>
          ,
          <volume>66</volume>
          (
          <issue>5</issue>
          ):
          <fpage>688</fpage>
          ,
          <year>1974</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Neela</given-names>
            <surname>Sawant</surname>
          </string-name>
          , Chitti Babu Namballa, Narayanan Sadagopan, and
          <string-name>
            <given-names>Houssam</given-names>
            <surname>Nassif</surname>
          </string-name>
          .
          <article-title>Multi-armed bandit framework for causal effect optimization</article-title>
          .
          <source>In Amazon Machine Learning Conference</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Neela</given-names>
            <surname>Sawant</surname>
          </string-name>
          , Chitti Babu Namballa, Narayanan Sadagopan, and
          <string-name>
            <given-names>Houssam</given-names>
            <surname>Nassif</surname>
          </string-name>
          .
          <article-title>Contextual multi-armed bandits for causal marketing</article-title>
          .
          <source>arXiv preprint arXiv:1810.01859</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Bradly C</given-names>
            <surname>Stadie</surname>
          </string-name>
          , Sergey Levine, and
          <string-name>
            <given-names>Pieter</given-names>
            <surname>Abbeel</surname>
          </string-name>
          .
          <article-title>Incentivizing exploration in reinforcement learning with deep predictive models</article-title>
          .
          <source>arXiv preprint arXiv:1507.00814</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Bo</given-names>
            <surname>Tan</surname>
          </string-name>
          , Pramod Muralidharan, Naveen Nair, Wenduo Wang, Shaurya Gupta, Jimmy Issac, Vignesh Kannappan, Prakash Bulusu, and
          <string-name>
            <given-names>Phil</given-names>
            <surname>Leslie</surname>
          </string-name>
          .
          <article-title>Attribution of prime member signups to prime benefits</article-title>
          .
          <source>In Amazon Machine Learning Conference</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Adam</given-names>
            <surname>Wagstaff</surname>
          </string-name>
          , Pierella Paci, and Eddy Van Doorslaer.
          <article-title>On the measurement of inequalities in health</article-title>
          .
          <source>Social science &amp; medicine</source>
          ,
          <volume>33</volume>
          (
          <issue>5</issue>
          ):
          <fpage>545</fpage>
          -
          <lpage>557</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Hao</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Naiyan</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dit-Yan</given-names>
            <surname>Yeung</surname>
          </string-name>
          .
          <article-title>Collaborative deep learning for recommender systems</article-title>
          .
          <source>In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining</source>
          , pages
          <fpage>1235</fpage>
          -
          <lpage>1244</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Xuanhui</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Bendersky</surname>
          </string-name>
          , Donald Metzler, and
          <string-name>
            <given-names>Marc</given-names>
            <surname>Najork</surname>
          </string-name>
          .
          <article-title>Learning to rank with selection bias in personal search</article-title>
          .
          <source>In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>115</fpage>
          -
          <lpage>124</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Shenghe</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yan</given-names>
            <surname>Zhao</surname>
          </string-name>
          , Sameer Kanase, Mitchell Goodman, Saad Khan, Brent Payne, and
          <string-name>
            <given-names>Patricia</given-names>
            <surname>Grao</surname>
          </string-name>
          .
          <article-title>Machine learning attribution: Inferring item-level impact from slate recommendation in e-commerce</article-title>
          .
          <source>In KDD 2022 Workshop on First Content Understanding and Generation for e-Commerce</source>
          ,
          <year>2022</year>
          . URL https://www.amazon.science/publications/machine-learning-attribution-inferring-item-level-impact-from-slate-recommendation-in-ecommerce.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Zhenyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          and
          <string-name>
            <given-names>Totte</given-names>
            <surname>Harinen</surname>
          </string-name>
          .
          <article-title>Uplift modeling for multiple treatments with cost optimization</article-title>
          .
          <source>In 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)</source>
          , pages
          <fpage>422</fpage>
          -
          <lpage>431</lpage>
          . IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Ziwei</given-names>
            <surname>Zhu</surname>
          </string-name>
          , Yun He,
          <string-name>
            <given-names>Xing</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>James</given-names>
            <surname>Caverlee</surname>
          </string-name>
          .
          <article-title>Popularity bias in dynamic recommendation</article-title>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>