<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SIGIR Workshop on eCommerce, July</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fan Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weijie Yuan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nahid Anwar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fangping Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Braden Huffman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lichun Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantin Shmakov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sunil Goda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Jung</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Niranjana Moleyar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaobo Peng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kuang-chih Lee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Musen Wen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Walmart Inc.</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>17</volume>
      <issue>2025</issue>
      <abstract>
        <p>Walmart's Artemis initiative, which establishes an in-house advertising (ad) platform, holds significant business implications aligned with Walmart's mission to enable customers to save money and live better. By introducing auction-based pricing mechanisms for ad impressions, suppliers seeking to connect customers with pertinent display and video advertisements can specify their desired price per impression across Walmart's owned-and-operated properties as well as third-party sites. A critical challenge in developing this system is to ensure that the ads displayed on our platform are relevant to users. In this paper, we present our first endeavor to build an in-house demand-side platform (DSP) ad ranking system within Walmart. We outline our iterative approach to enhancing ad relevance, thereby improving user engagement by catering to user intent and needs, while simultaneously assisting advertisers in achieving their advertising objectives and maximizing return on investment.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Demand-side platform (DSP) advertising has become increasingly popular in recent years due to its
ability to streamline the advertising process for both advertisers and publishers. DSPs enable advertisers to
purchase advertising inventory across various platforms, including display, video, mobile, and social
media, all through a single interface. Leveraging advanced targeting and optimization algorithms,
DSPs help advertisers reach their intended audiences with enhanced precision and efficiency,
while also generating valuable data-driven insights to inform and refine future advertising campaign
strategies.</p>
      <p>Prior to implementing an in-house demand-side platform system, Walmart relied on third-party
platforms, such as Google and TradeDesk, to connect advertisers with their target audiences on the
Walmart website. However, these off-the-shelf DSP solutions were neither optimized for Walmart’s
specific e-commerce requirements nor cost-effective. Consequently, Walmart initiated the development
of a customized DSP system designed to better address its distinct advertising needs through the
incorporation of tailored data logic. In this paper, we detail the experiments and advancements
conducted over the past two years in pursuit of creating this novel system. Although further progress
remains necessary, our aim is to offer valuable insights and perspectives derived from our experience in
developing this sophisticated, custom-built system.</p>
      <p>Specifically, we (1) highlight the key features and data sources incorporated into the system in section
3.1; (2) discuss the details of our core ranking model components in section 3.2; (3) illustrate the engineering
infrastructure and training pipeline we developed to build the model, as well as operational excellence,
in section 3.3; and (4) present a series of experiments we conducted and their results, in order to improve
system performance, with a particular focus on feature engineering, in sections 4 and 5. Through this paper,
we aim to provide insights into the challenges encountered while developing a custom DSP system
from scratch, as well as describe the innovative solutions implemented to address these challenges.
Furthermore, we outline a road-map for future work, identifying potential next steps aimed at further
enhancement and optimization of the proposed system.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        In advertising technology, DSP systems are used by advertisers to bid for ad space [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. In our context,
the platform is the Walmart website, and advertisers are companies involved in selling products ranging
from dog food to children’s toys. Specifically, if a user is on a page with a particular context, for example
searching for orange juice, it would be relevant to suggest associated items based on the user’s location
and past behavior. Figure 1 shows an example of current display ad locations on the Walmart website
and how they differ from sponsored ads and organic search results.
      </p>
      <p>Determining how to rank candidate ads in a way that ensures relevance to users is key to all stakeholders’
success in the ads ecosystem, as it directly influences both user engagement experiences and advertiser
business objectives. Additionally, showing relevant ads to users also helps boost the Walmart platform’s
long-term success. Displaying too many irrelevant ads can be a short-term win but a long-term loss, since
users may become dissatisfied and either leave or develop “ad blindness”. An ideal ranking methodology
should strive to maximize user engagement as well as the advertiser’s return on investment (ROI). Here, we
adopted a first-price auction and charged advertisers based on impressions served. Currently, the ranking
formula within Walmart’s in-house Artemis system is calculated by multiplying the engagement score
by the bid price, as highlighted by the dark green color in Figure 2. As a result, accurately predicting each
user’s engagement score plays a decisive role for the whole Artemis system. Our team’s responsibilities
span both engagement prediction and bid price generation. However, due to the scope limitation
of this paper, we will focus exclusively on our efforts related to engagement prediction.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Implementation</title>
      <p>In this section, we detail our implementation approach from multiple perspectives, including the data
and features used in modeling, the modeling framework and techniques employed, and operational excellence.</p>
      <sec id="sec-3-1">
        <title>3.1. Data and Features</title>
        <p>
          Given the highly imbalanced nature of engagement data, significant care was taken to ensure that our
model did not become biased towards the majority class [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. As is commonly observed in click-through
rate (CTR) prediction tasks, the number of click events is substantially smaller than that of non-click events.
To address this issue, we applied down-sampling to the negative class (non-click events) and assigned
greater weight to the minority class (click events) relative to the majority class during the model training
process. We are using simple random down-sampling at the current stage, but are also actively exploring
other options such as aggregated and cluster sampling.
        </p>
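As a rough sketch of the sampling-and-weighting recipe described above (illustrative only; the production pipeline runs in PySpark, and the actual sampling rates and class weights are tuned separately):

```python
import random

def downsample_and_weight(labels, neg_keep_rate, seed=0):
    """Illustrative sketch of the sampling scheme described in the text, not
    the production implementation: keep all click (positive) rows, randomly
    keep non-click (negative) rows at `neg_keep_rate`, and up-weight the
    minority click class relative to the surviving majority class."""
    rng = random.Random(seed)
    # Keep every positive; keep each negative with probability neg_keep_rate.
    keep = [y == 1 or rng.random() < neg_keep_rate for y in labels]
    kept = [y for y, k in zip(labels, keep) if k]
    n_pos, n_neg = kept.count(1), kept.count(0)
    # Weight for the minority class: the residual imbalance ratio
    # (analogous in spirit to XGBoost's scale_pos_weight parameter).
    pos_weight = n_neg / max(n_pos, 1)
    weights = [pos_weight if y == 1 else 1.0 for y in kept]
    return keep, weights
```

The returned mask selects the training rows, and the per-row weights are passed to the learner during fitting.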
        <p>We will now discuss the features utilized by our model. Figure 3 illustrates the set of features currently
employed in our modeling framework. These features generally fall into the following categories.</p>
        <p>
          The first category consists of supply-side features, such as adLocation, deviceType, platform, etc.
Another category encompasses request-side features such as zipcode, time of day, day of week,
etc. For example, we have three adLocations on the Walmart website: Skyline, Marquee, and Brandbox.
Adopting the adLocation feature helps us mitigate the potential position-bias issue. Because these
features need to be available at runtime, we pre-process them, as illustrated in the right part of Figure 3. The
precomputed feature data is then stored in two copies: one copy is saved to a Google Cloud Storage (GCS)
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] bucket for offline training data integration, and the other copy is ingested into an online Cassandra
database [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to facilitate rapid feature retrieval during runtime.
        </p>
        <p>Additionally, we incorporate user-level features such as gender and age. We are currently
focusing on adding more features of this category, as our analyses indicated that incorporating
user-specific attributes significantly enhances the targeting capability of our advertisements, allowing
us to reach more relevant users. Further details regarding our exploration and utilization of user features
are presented in the case study section.</p>
        <p>Finally, we introduced additional pairwise features, such as the duration of time spent by each user
in various top-level departments, and the frequency with which each user visited pages within level 1
product-type categories on Walmart’s website over the past 30 days. Leveraging these cross features
between users and advertised items enables us to uncover user preferences more effectively and generate
more personalized ad recommendations based on historical user behavior. Our subsequent analysis
of feature importance confirmed that incorporating these pairwise interaction features significantly
enhances prediction accuracy and ad targeting effectiveness.</p>
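These pairwise user-by-category aggregates can be sketched as follows (the event schema and field names are illustrative, not Walmart's actual data model):

```python
from collections import defaultdict

def build_cross_features(events, window_days=30):
    """Hypothetical sketch of the pairwise user x category features described
    above: aggregate each user's time spent per top-level department and
    visit counts per level-1 product type over a trailing window.

    `events` is an iterable of dicts like
    {"user": ..., "dept": ..., "ptype": ..., "dwell_sec": ..., "age_days": ...}.
    """
    time_spent = defaultdict(float)   # (user, dept)  -> total seconds
    visits = defaultdict(int)         # (user, ptype) -> visit count
    for e in events:
        if e["age_days"] <= window_days:
            time_spent[(e["user"], e["dept"])] += e["dwell_sec"]
            visits[(e["user"], e["ptype"])] += 1
    return dict(time_spent), dict(visits)
```

At serving time, the (user, department) and (user, product-type) keys of the candidate ad are looked up to form the cross features.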
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Framework</title>
        <p>
          The primary model architecture employed for our custom DSP is based on XGBoost [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. This solution
was selected due to its speed, reliability, and track record of success. Additionally, XGBoost is already
utilized by the Sponsored Products (SP) section of Walmart’s AdTech division, making it a natural
choice for our application. By leveraging XGBoost, we constructed a powerful and adaptable model
capable of delivering valuable insights for optimizing campaign performance.
        </p>
        <p>To briefly demonstrate the XGBoost algorithm, the objective function that we need
to minimize at iteration $t$ is the following:</p>
        <p>$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\big(y_i,\, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t)$ (1)</p>
        <p>
          Here $f_t$ is a function of CART learners, a sum of the current and previous additive trees, and $\Omega(f_t)$ is
the regularization term at iteration $t$. Essentially, at iteration $t$, we need to build a learner that achieves the
maximum possible reduction of loss. In practice, the steps to build the next learner are:
• Start with a single root node (containing all the training examples).
• Iterate over all features and their respective values, evaluating the loss reduction for each possible split.
• The gain for the best split must be positive and greater than the min_split_gain parameter;
otherwise, we stop growing the branch.
        </p>
        <p>
          Furthermore, due to our strategy of down-sampling negative data points, calibration techniques
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] were applied after XGBoost modeling to produce output scores that more accurately represent the
true likelihood of events (clicks or conversions, in our case) and are easier to interpret by subsequent
optimization systems. Among several widely adopted calibration approaches, we selected isotonic
regression, a non-parametric method [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Unlike common techniques such as Platt scaling, which
assumes a logistic relationship, isotonic regression does not impose any predefined functional form.
Additionally, isotonic regression preserves the monotonic relationship between predicted scores and
actual probabilities, which is an essential characteristic for ad-ranking models. In simple words,
calibration through isotonic regression does not alter the original ranking of impressions provided by
the XGBoost model.
        </p>
        <p>To demonstrate the algorithm of isotonic regression, let $(x_1, y_1), \ldots, (x_n, y_n)$ be a simplified
representation of a given set of observations, where $x_i$ is the raw probability from XGBoost and $y_i$ is the
ground truth label. Isotonic regression seeks a least-squares fit $\hat{y}_i$ for all $i$, subject to the constraint that
$\hat{y}_i \le \hat{y}_j$ whenever $x_i \le x_j$. Formally, the isotonic regression objective can be represented as follows:</p>
        <p>$\min \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$ (2), s.t. $\hat{y}_i \le \hat{y}_j$ for all $(i, j) \in E$, where $E = \{(i, j) : x_i \le x_j\}$</p>
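A minimal pool-adjacent-violators (PAV) sketch of this least-squares fit is shown below; in practice one would use a library implementation (e.g. scikit-learn's IsotonicRegression), and the inputs here are assumed to be labels already sorted by ascending XGBoost score:

```python
def isotonic_fit(labels_sorted_by_score):
    """Pool-adjacent-violators sketch of the isotonic least-squares fit:
    given binary labels ordered by ascending raw model score, return fitted
    values that are non-decreasing in the score."""
    # Each block holds [mean value, weight = number of pooled points].
    blocks = []
    for y in labels_sorted_by_score:
        blocks.append([float(y), 1])
        # Pool adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    fitted = []
    for mean, weight in blocks:
        fitted.extend([mean] * weight)
    return fitted
```

Because the fitted values are non-decreasing in the raw score, applying this calibration map cannot reorder impressions, which is the ranking-preservation property noted above.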
        <p>
          Models developed by data scientists have traditionally been integrated into engineering systems
through manual processes. Specifically, data scientists construct models using Jupyter notebooks hosted
on Google Cloud Platform’s (GCP) Dataproc environment. These models are rigorously evaluated and
validated using offline performance metrics, typically ROC-AUC and PR-AUC scores [
          <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
          ].
        </p>
        <p>Once a model achieves satisfactory performance, a corresponding MLEAP bundle is generated to
facilitate integration into production systems [13]. MLEAP is a widely adopted tool in the industry,
offering a range of benefits. Using MLEAP, data scientists can construct a comprehensive ‘pipeline’
consisting of multiple steps, allowing for complex transformations of the raw features, for example,
preprocessing the input features, executing custom transformations, running machine learning models
such as XGBoost, and converting the raw probability scores to produce well-calibrated predictions.
By providing a standardized, repeatable process for building and deploying models, MLEAP helps
streamline the integration of data science models into engineering systems.</p>
        <p>A crucial feature of MLEAP is its capability to serialize model pipelines into deployable bundles. Once
generated, these bundles can be utilized by engineering teams to instantiate Java or Python objects
capable of delivering real-time predictions during runtime. PySpark/MLEAP bundles are optimized for
computational efficiency, establishing them as a reliable and widely adopted industry standard. This
approach facilitates faster prediction serving and enhances the seamless integration of data science
models within engineering frameworks. An overview of our XGBoost-based PySpark/MLEAP
model-building process is presented in Figure 4.</p>
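The multi-step pipeline idea can be caricatured in a few lines of plain Python (a toy analogy only; real MLEAP bundles serialize fitted Spark pipeline stages, which this sketch does not attempt):

```python
class Pipeline:
    """Toy illustration of the pipeline concept MLEAP bundles capture: an
    ordered list of named steps, each a callable applied to the running
    record, e.g. preprocessing -> model scoring -> calibration."""
    def __init__(self, steps):
        self.steps = steps  # list of (name, callable) pairs

    def run(self, record):
        for _, step in self.steps:
            record = step(record)
        return record
```

A serialized bundle lets the serving side reconstruct exactly this chain of transformations and apply it to each incoming ad request.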
        <p>Our current MLEAP-XGBoost model is effective at using categorical contextual features, numeric
historical engagement features, as well as other low-dimensional data; however, we are currently
exploring high-dimensional representations of both users and advertisements. XGBoost has limitations
when working with such high-dimensional data. To address this, we have begun exploring different deep
learning models. We tested a simple multi-layer perceptron, as well as the Wide and Deep architecture and
the DeepFM architecture [14]. The initial results are on par with the current production model even
without embedding features. With intensive feature engineering and embedding features coming in the
near future, we are confident that the deep learning model will surpass XGBoost in performance.</p>
        <p>Currently, we support four types of engagement: impressions, views, clicks, and conversions.
Views, clicks, and conversions each require an individual model, each packaged as an MLEAP bundle, while
impressions always have an engagement score of 1. We pass each ad request to only one of the three
aforementioned models for engagement scoring, due to infrastructure cost-effectiveness considerations
as well as serving latency constraints.</p>
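The routing logic just described might look roughly like the following (all names are hypothetical, not the actual serving code):

```python
def engagement_score(ad_request, models):
    """Illustrative routing sketch: impressions always score 1.0, while
    views, clicks, and conversions are each scored by a dedicated model.
    Each ad request is sent to only one model, reflecting the cost and
    latency constraints described in the text."""
    etype = ad_request["engagement_type"]
    if etype == "impression":
        return 1.0
    if etype not in models:  # expected keys: "view", "click", "conversion"
        raise ValueError(f"no model for engagement type {etype!r}")
    return models[etype].predict(ad_request["features"])
```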
        <p>In order to utilize embedding features as well as provide three simultaneous engagement scores, we
are developing a multi-task deep learning ranker with an MMoE architecture [15, 16]. Unlike our
XGBoost models, we store our deep learning models in model.onnx files. We then serve the loaded deep
learning model in a Triton environment [17].</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Model Automation and Operational Excellence</title>
        <p>Although manually building model bundles in notebooks is feasible, this approach may lack scalability.
To enhance our processes and improve efficiency, we developed an Airflow Directed Acyclic Graph
(DAG) for model training, evaluation, and promotion [18]. This automation ensures consistent and
reliable execution of these tasks. Airflow extends traditional scheduling tools like crontab by offering
capabilities such as resource allocation, a user-friendly interface for monitoring logs and job execution
histories, and task scheduling.</p>
        <p>Models are trained on a daily basis using the most recent data and subsequently saved to a GCS bucket.
The engineering team retrieves these updated models and creates the corresponding Java objects used
for real-time predictions. The engineering system handles feature extraction from relevant ads and
passes these features to the model, generating probability outputs used for ad-ranking decisions.</p>
        <p>To enhance our Airflow job monitoring, we integrated our workflows with Hubot and implemented
customized logging. These enhancements enable direct transmission of specific model metrics to a
dedicated Slack channel, which has made monitoring and debugging much easier.</p>
        <p>Recently, collaboration with our machine learning infrastructure team has made the Airflow DAG
development process more efficient. The machine learning infrastructure team integrates most
of the features mentioned above into a single unified platform called the model automation
framework (MAF). MAF facilitates streamlined on-boarding of new models, training existing models with
updated configurations, triggering new runs to refresh model artifacts, inspecting model metrics, and
assessing feature coverage, significantly improving the efficiency of our model deployment and promotion
processes.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Case Study</title>
      <p>Our efforts to boost system performance have focused on two primary areas: model innovation and
feature innovation. For the purpose of this paper, we focus on our advancements related to feature
innovation in this section.</p>
      <sec id="sec-4-1">
        <title>4.1. Click Model User Feature</title>
        <p>Previously, our models incorporated ad-side engagement features, which measured how different users
interacted with a particular ad, for instance, metrics such as the total number of impressions or the
ad’s average click-through rate (CTR). These ad-centric features contributed significantly to model
performance and ranked prominently in feature importance analyses. Extending this approach, we
subsequently examined another critical dimension influencing ad engagement: user past behavior.</p>
        <p>Figure 5 illustrates the calculation methodology for these user features. Specifically, we aggregated
each user’s interactions with Walmart’s display ads over the previous 7-day and 30-day periods,
generating summary metrics such as total impressions, views, clicks, and conversions, along with derived
ratios including view rate, click-through rate (CTR), and conversion rate (CVR). These newly created
user-level features were subsequently combined with existing production features for offline evaluation.
In our offline simulation, we observed a relative lift of 3% in ROC-AUC performance, as shown in Table 2.
Motivated by these promising results, we proceeded to deploy these features in production.</p>
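The per-user aggregation described above can be sketched as follows (the event schema and feature names are illustrative; in particular, whether CVR is normalized by clicks or by impressions is a design choice, and this sketch divides by clicks):

```python
from collections import Counter

def user_engagement_features(events, user, window_days):
    """Sketch of the user-level aggregates in Figure 5: count a user's
    display-ad impressions, views, clicks, and conversions over a trailing
    window (7 or 30 days in the text) and derive view rate, CTR, and CVR."""
    c = Counter(e["type"] for e in events
                if e["user"] == user and e["age_days"] <= window_days)
    imps = c["impression"]
    return {
        "impressions": imps,
        "views": c["view"],
        "clicks": c["click"],
        "conversions": c["conversion"],
        "view_rate": c["view"] / imps if imps else 0.0,
        "ctr": c["click"] / imps if imps else 0.0,
        "cvr": c["conversion"] / c["click"] if c["click"] else 0.0,
    }
```

In production these aggregates would be precomputed in batch and looked up from the online feature store at serving time rather than recomputed per request.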
        <p>The primary challenge in deploying this feature to production lies in efficiently managing the high
volume of user-level features at scale for both offline and online environments. Walmart
eCommerce observes more than 30 million unique users daily; therefore, efficiently generating hundreds
of millions of offline training data records, along with handling online serving queries within tens of
milliseconds, necessitates a sophisticated system architecture and best engineering practices. Through
multiple iterations and explorations, we developed a scalable user-feature architecture by leveraging
existing infrastructure components capable of addressing these demanding requirements.</p>
        <p>First, data scientists provide prototype user-feature aggregation queries to the Feature Management
Service (FMS) team. FMS refines these queries into production-quality code and subsequently stores the
aggregated features in both offline and online feature stores. Offline features are integrated with current
production features to form updated offline training datasets. For online feature serving, we utilize
the User Profile Service (UPS), a newly developed dedicated system at Walmart designed explicitly for
handling high-throughput, low-latency queries. Further details regarding feature development and
serving are depicted in Figure 6.</p>
        <p>Our subsequent A/B tests confirmed significant improvements in terms of AUC after incorporating
these user engagement features. These results will be discussed further in the next section.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Conversion Model Prediction Spread Improvement with User Feature</title>
        <p>In the initial iterations of the conversion model, user-specific features were not incorporated, resulting in
challenges related to the concentration of predicted probability scores. Specifically, the model produced
nearly identical conversion probabilities for certain line items, despite variations across dimensions
such as zip code, time of day, and ad location.</p>
        <p>Upon deeper analysis of feature distributions and XGBoost feature importance, we identified the root
cause of this issue: the primary predictive engagement features, calculated at various granularities,
remained constant throughout each day due to their daily update schedule. Additionally, another
significant category of features—relevancy scores—often exhibited values at or near zero for these
problematic line items. This feature distribution consequently caused predicted scores to cluster around
a single point or within a small range.</p>
        <p>To resolve this issue, incorporating user-level features emerged as a natural solution, aligning closely
with the ranking system’s core objective: selecting ads tailored to user preferences while maximizing
platform profitability at the same time. By integrating user-specific features such as the amount of time
spent in various departments and user visit frequency within different product categories, the model
became more adept at identifying ads most likely to attract user interest and drive clicks or conversions.</p>
        <p>Following the successful integration and offline evaluation of these user-level time-spent and
visit-frequency features, as detailed previously in section 4.1, we observed a significant mitigation of
prediction score concentration, resulting in a more varied and discriminative prediction distribution.
Additionally, recent enhancements, including the integration of further user engagement and
brand-category features, have further improved model performance, aligning with the strategies discussed
earlier in section 4.1.</p>
        <p>It is important to note that enhancing the performance of the conversion model, particularly with
respect to evaluation metrics, is inherently more challenging. One contributing factor is that the existing
production conversion model incorporates a broader set of features compared to the click model, such
as ad conversion engagement metrics and user visit frequency. Given that the current conversion model
has already achieved a high AUC, achieving further incremental improvements becomes increasingly
difficult.</p>
        <p>Furthermore, producing prediction scores with greater variance provides significant benefits to
downstream optimization systems by facilitating clearer distinctions among candidate ads, thus enabling
more effective decision-making when selecting advertisements with the highest potential value.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimentation</title>
      <p>This section presents the outcomes of our online A/B experiments designed to evaluate the effectiveness
of newly introduced user-level features in both click and conversion models. We report performance
improvements across key metrics, including click-through rate (CTR), conversion rate (CVR), and return
on ad spend (ROAS), based on production-scale testing on Walmart’s internal platform.</p>
      <sec id="sec-5-1">
        <title>5.1. Result of Click Model User Feature</title>
        <p>The A/B test was conducted using Walmart’s internal testing platform, known as “EXPO”. The experiment
was conducted at the ad-request session level. During ad serving, each user was randomly assigned
to either a control or a variant group based on user ID hashing. Additionally, the Ads Serving team
recorded a session-specific SPA ID to facilitate traffic identification during experiment analysis.</p>
        <p>Our core success metric for this experiment was the empirical click-through rate (eCTR). We also
monitored various guardrail metrics such as cost-per-click (CPC), cost-per-thousand impressions (CPM),
total ad spend, and fill rate to ensure overall system health.</p>
        <p>The experiment initially began with traffic allocations of 1% each for the control and variant buckets,
subsequently increasing to 10%, 33.3%, and ultimately 50%. At each stage, statistically significant lifts in
eCTR were observed. A 10% relative eCTR lift was observed at full test traffic (50% control group vs. 50%
experiment group) at a 99% confidence level. Encouraged by these positive outcomes, the feature was
launched into production. Post-launch monitoring has consistently demonstrated an upward trend in
system-wide eCTR, achieving over a 5% relative lift.</p>
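An eCTR readout of this kind reduces to a standard two-proportion comparison, sketched below; the counts in the usage example are made up for illustration and are not the experiment's actual data:

```python
import math

def ectr_lift_and_z(clicks_c, imps_c, clicks_v, imps_v):
    """Sketch of the eCTR A/B comparison: relative lift of the variant over
    control, plus a pooled two-proportion z-score for significance testing."""
    p_c, p_v = clicks_c / imps_c, clicks_v / imps_v
    lift = (p_v - p_c) / p_c
    # Pooled click rate under the null hypothesis of no difference.
    p_pool = (clicks_c + clicks_v) / (imps_c + imps_v)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / imps_c + 1 / imps_v))
    return lift, (p_v - p_c) / se
```

For example, 1,000 clicks on 100,000 control impressions against 1,100 clicks on 100,000 variant impressions gives a 10% relative lift; the z-score then determines whether that lift clears the chosen confidence level.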
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Result of Conversion Model</title>
        <p>Conversion model A/B testing follows the same logic and procedure as described in section 5.1, with
the control variant representing a nearly random assignment of auction winners. Upon deployment,
the conversion optimization model demonstrated substantial performance improvements across key
metrics, including up to a 7-times increase in the number of conversions, a 6-times increase in sales
revenue, a 2-4 times improvement in empirical conversion rate (eCVR), and a 2-times enhancement in
return on ad spend (ROAS) for 13 of the 17 tested campaign lines.</p>
        <p>For the remaining 4 of the 17 under-performing, low-conversion campaigns, the model either suffers
from the score concentration issue or inherently low conversion propensity. The former issue was addressed
using the strategies outlined previously in section 4.2. For campaigns inherently resistant to conversion
optimization, we implemented a conversion threshold criterion, requiring lines to accumulate a sufficient
number of conversions before qualifying for conversion-based optimization. This threshold ensures that
conversion optimization is applied effectively. We acknowledge that campaigns with limited historical
conversions may not benefit from optimization regardless of model sophistication.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Future Direction</title>
      <p>The development of Walmart’s new in-house DSP model has provided a robust foundation upon which
several promising areas for future exploration have emerged, particularly centered around advanced
model architectures and further automation of pipelines.</p>
      <p>A key area of interest moving forward is the integration and rigorous evaluation of deep
learning-based ranking models (also known as Deep Rankers). Given the complex, sparse, and nonlinear nature
of user-ad interactions, deep learning architectures, such as DeepFM, Wide &amp; Deep, and xDeepFM
[14, 19, 20], could offer substantial promise. These models generally excel at capturing intricate patterns,
latent interactions, and higher-order feature relationships, potentially delivering significant predictive
accuracy improvements. Future work will involve deploying and thoroughly evaluating these deep
architectures via extensive online A/B testing to quantify their incremental benefits and assess their
computational feasibility in production environments.</p>
      <p>Additionally, we aim to expand our DSP framework to include video advertisements alongside
existing display ads. Video ads present unique challenges, such as modeling sequential viewer behavior,
attention span dynamics, and engagement patterns, which deep learning methods are particularly
well-suited to address, given their capability to handle complex, sequential, and contextual data signals.</p>
      <p>Moreover, we plan to enhance our automated modeling framework by implementing continuous
hyperparameter optimization through Airflow DAGs, improving logging and monitoring capabilities
to rapidly diagnose performance issues, and achieving faster model deployment cycles by leveraging
advanced serving frameworks like the PyTorch-based Deep Java Library (DJL) [21].</p>
      <p>Finally, our long-term vision includes investigating sophisticated optimization strategies, such as
real-time model personalization, online learning methods, and hybrid approaches combining traditional
machine learning with deep learning paradigms. These advancements will ensure Walmart’s DSP
continues evolving in alignment with the cutting-edge developments in advertising technology.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this paper, we presented Walmart’s pioneering effort to build its first in-house Demand-Side Platform
for ad ranking. We detailed the challenges encountered, the rationale behind critical implementation
choices, and key innovations throughout the system development, emphasizing our effective utilization
of XGBoost models integrated through robust engineering pipelines using MLeap serialization and
Airflow automation.</p>
      <p>Our initial experimental results have demonstrated both the feasibility and effectiveness of this
customized DSP solution, achieving measurable improvements in click-through rate and validating our
approach. Additionally, initial assessments suggest promising opportunities to explore deep learning
models, highlighting significant potential for future adoption of advanced deep ranking architectures to
further enhance model performance.</p>
      <p>Developing a DSP system from scratch is both challenging and rewarding. Through continuous
innovation and rigorous experimentation, we anticipate that our DSP will substantially elevate ad
relevance, improve customer engagement, and enhance advertising efficiency, ultimately aligning with
Walmart’s core mission of enabling customers to save money and live better. We believe the insights
and methodologies presented in this paper will benefit both practitioners and researchers, marking a
significant step forward in the advancement of advertising technology.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT (GPT-4) for grammar and spelling
checks and for paraphrasing. After using this tool, the author(s) reviewed and edited the content as
needed and take full responsibility for the publication’s content.
</p>
      <p>[13] Combust.ml Team, MLeap documentation, https://combust.github.io/mleap-docs/, 2025. Accessed: 2025-04-16.</p>
      <p>[14] H. Guo, R. Tang, Y. Ye, Z. Li, X. He, Deepfm: A factorization-machine based neural network for ctr prediction, 2017. URL: https://arxiv.org/abs/1703.04247. arXiv:1703.04247.</p>
      <p>[15] J. Ma, Z. Zhao, X. Yi, J. Chen, L. Hong, E. H. Chi, Modeling task relationships in multi-task learning with multi-gate mixture-of-experts, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD ’18, Association for Computing Machinery, New York, NY, USA, 2018, p. 1930–1939. URL: https://doi.org/10.1145/3219819.3220007. doi:10.1145/3219819.3220007.</p>
      <p>[16] F. Wang, H. Gu, D. Li, T. Lu, P. Zhang, N. Gu, Towards deeper, lighter and interpretable cross network for ctr prediction, in: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM ’23, Association for Computing Machinery, New York, NY, USA, 2023, p. 2523–2533. URL: https://doi.org/10.1145/3583780.3615089. doi:10.1145/3583780.3615089.</p>
      <p>[17] C. Savard, N. Manganelli, B. Holzman, L. Gray, A. Perloff, K. Pedro, K. Stenson, K. Ulmer, Optimizing high throughput inference on graph neural networks at shared computing facilities with the nvidia triton inference server, 2023. URL: https://arxiv.org/abs/2312.06838. arXiv:2312.06838.</p>
      <p>[18] Apache Software Foundation, Apache airflow documentation, https://airflow.apache.org/docs/, 2025. Accessed: 2025-04-16.</p>
      <p>[19] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, R. Anil, Z. Haque, L. Hong, V. Jain, X. Liu, H. Shah, Wide &amp; deep learning for recommender systems, 2016. URL: https://arxiv.org/abs/1606.07792. arXiv:1606.07792.</p>
      <p>[20] R. Wang, B. Fu, G. Fu, M. Wang, Deep &amp; cross network for ad click predictions, 2017. URL: https://arxiv.org/abs/1708.05123. arXiv:1708.05123.</p>
      <p>[21] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, Pytorch: An imperative style, high-performance deep learning library, 2019. URL: https://arxiv.org/abs/1912.01703. arXiv:1912.01703.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><given-names>P.</given-names> <surname>Grigas</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Lobos</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Wen</surname></string-name>, <string-name><given-names>K.-c.</given-names> <surname>Lee</surname></string-name>, <article-title>Profit maximization for online advertising demand-side platforms</article-title>, <year>2017</year>. URL: https://arxiv.org/abs/1706.01614. arXiv:1706.01614.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>D.</given-names> <surname>Moriwaki</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Hayakawa</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Matsui</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Saito</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Munemasa</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Shibata</surname></string-name>, <article-title>A real-world implementation of unbiased lift-based bidding system</article-title>, <source>in: 2021 IEEE International Conference on Big Data (Big Data)</source>, IEEE, <year>2021</year>, p. <fpage>1877</fpage>-<lpage>1888</lpage>. URL: http://dx.doi.org/10.1109/BigData52589.2021.9671800. doi:10.1109/bigdata52589.2021.9671800.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><given-names>H.</given-names> <surname>He</surname></string-name>, <string-name><given-names>E. A.</given-names> <surname>Garcia</surname></string-name>, <article-title>Learning from imbalanced data</article-title>, <source>IEEE Trans. on Knowl. and Data Eng.</source> <volume>21</volume> (<year>2009</year>) <fpage>1263</fpage>-<lpage>1284</lpage>. URL: https://doi.org/10.1109/TKDE.2008.239. doi:10.1109/TKDE.2008.239.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          , Introduction to Information Retrieval, Cambridge University Press,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Google Cloud, Cloud storage documentation, https://cloud.google.com/storage/docs, <year>2025</year>. Accessed: 2025-04-16.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><given-names>A.</given-names> <surname>Lakshman</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Malik</surname></string-name>, <article-title>Cassandra: a decentralized structured storage system</article-title>, <source>SIGOPS Oper. Syst. Rev.</source> <volume>44</volume> (<year>2010</year>) <fpage>35</fpage>-<lpage>40</lpage>. URL: https://doi.org/10.1145/1773912.1773922. doi:10.1145/1773912.1773922.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] <string-name><given-names>T.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Guestrin</surname></string-name>, <article-title>Xgboost: A scalable tree boosting system</article-title>, <source>in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16</source>, ACM, <year>2016</year>, p. <fpage>785</fpage>-<lpage>794</lpage>. URL: http://dx.doi.org/10.1145/2939672.2939785. doi:10.1145/2939672.2939785.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] <string-name><given-names>F. M.</given-names> <surname>Ojeda</surname></string-name>, <string-name><given-names>M. L.</given-names> <surname>Jansen</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Thiéry</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Blankenberg</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Weimar</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Schmid</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Ziegler</surname></string-name>, <article-title>Calibrating machine learning approaches for probability estimation: A comprehensive comparison</article-title>, <source>Statistics in Medicine</source> <volume>42</volume> (<year>2023</year>) <fpage>5451</fpage>-<lpage>5478</lpage>. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9921. doi:10.1002/sim.9921.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] <string-name><given-names>A.</given-names> <surname>Niculescu-Mizil</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Caruana</surname></string-name>, <article-title>Predicting good probabilities with supervised learning</article-title>, <source>in: Proceedings of the 22nd International Conference on Machine Learning, ICML '05</source>, Association for Computing Machinery, New York, NY, USA, <year>2005</year>, p. <fpage>625</fpage>-<lpage>632</lpage>. URL: https://doi.org/10.1145/1102351.1102430. doi:10.1145/1102351.1102430.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] <string-name><given-names>M. B. A.</given-names> <surname>McDermott</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>L. H.</given-names> <surname>Hansen</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Angelotti</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Gallifant</surname></string-name>, <article-title>A closer look at auroc and auprc under class imbalance</article-title>, <year>2025</year>. URL: https://arxiv.org/abs/2401.06091. arXiv:2401.06091.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] <string-name><given-names>J.</given-names> <surname>Davis</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Goadrich</surname></string-name>, <article-title>The relationship between precision-recall and roc curves</article-title>, <source>in: Proceedings of the 23rd International Conference on Machine Learning, ICML '06</source>, Association for Computing Machinery, New York, NY, USA, <year>2006</year>, p. <fpage>233</fpage>-<lpage>240</lpage>. URL: https://doi.org/10.1145/1143844.1143874. doi:10.1145/1143844.1143874.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] <string-name><given-names>A. P.</given-names> <surname>Bradley</surname></string-name>, <article-title>The use of the area under the roc curve in the evaluation of machine learning algorithms</article-title>, <source>Pattern Recognit.</source> <volume>30</volume> (<year>1997</year>) <fpage>1145</fpage>-<lpage>1159</lpage>. URL: https://api.semanticscholar.org/CorpusID:13806304.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>