Page-Wise Personalized Recommendations in an Industrial e-Commerce Setting

Liying Zheng¹, Yingji Pan¹ and Yuri M. Brovman¹
¹eBay, Inc.

Abstract
Providing personalized recommendations based on the dynamic sequential behaviors of users plays an important role in e-commerce platforms, since it can considerably improve a user's shopping experience. Previous works apply a unified model pipeline to build recommender systems, without considering the differentiated behavior patterns and intrinsic shopping tendencies on different pages of an e-commerce website. In this paper, we focus on building a personalized recommender system optimized for both the View Item Page and the Homepage by carefully designing strategies for data formulation and model structure. Our proposed model (PWPRec) consists of a causal transformer encoder together with a fusion module designed for different pages, built on the basis of the classical two-tower structure. This provides the capability to capture a balanced long-short interest or diverse multiple interests of a user during their shopping journey across multiple types of pages. We have conducted experiments on in-house datasets as well as public datasets to validate the effectiveness of our model, all showing significant improvements on Recall@k metrics compared to commonly applied sequential models of recent years. Additionally, we built a state-of-the-art deep learning based retrieval system utilizing real-time KNN search as well as near real-time (NRT) user embedding updates to reduce the recommendation delay to a few seconds. Our online A/B test results show a large advantage compared to the previous GRU-based sequential model in production, with a 38.5% increase in purchased items due to model improvements and a 107% increase in purchased items due to the engineering innovations.

Keywords
sequential recommendation, multi-interest, attention network, transformer encoder

1. Introduction

Recommender systems play a fundamental role in e-commerce marketplaces, offering personalized product recommendations based on a user's specific interests, which largely improves the user's shopping experience. In this work, we focus on "user context based" recommender systems that generate recommendations using a user's historical interactions as the main context. There are several different landing pages which display recommendations to the user on an e-commerce platform; in this work we focus on two of them: the View Item Page (VIP) and the Homepage (HP). On the VIP, users usually have a specific shopping mission when they navigate to a detailed item page, and thus tend to spend more time comparing similar products and trying to find the most appropriate one. Figure 1 depicts an example of a VIP with a user context based recommendations module driven by the user's recent views. On the HP, usually at the beginning of a shopping session, users tend to wander through the whole page without a specific shopping mission. They could be attracted by discounted or hot-sale products, or by diversified categories they have been consistently interested in; thus we plan to design a new module generating multiple item sets capturing a user's multiple interests.

Figure 1: Screenshot of the eBay View Item Page recommendation module with one item set of personalized items.
Incorporating the different user shopping behavior patterns on the VIP and HP mentioned above, we have developed a page-wise personalized recommendation model (PWPRec) in order to capture a user's different shopping goals and interests. Specifically, the main contributions of the paper are:

1. We present a page-wise deep learning model that considers multiple shopping contexts in an industrial setting.
2. We develop a novel model architecture by combining a causal transformer encoder with a long-short or multi-interest fusion module in order to generate user embedding(s).
3. We deploy our recommender system to a production industrial setting, building a state-of-the-art deep learning based retrieval system in the process.

The paper is organized as follows. Section 2 covers approaches in the literature related to our method. The main model architecture is discussed in Section 3. The datasets and sampling strategies used for our offline experiments are then discussed in Section 4. An overview of our production engineering architecture as well as A/B tests is presented in Section 5. We conclude our work in Section 6.

2. Related Works

Adding personalization to recommender systems is a well studied problem both in academia and in industrial applications. Recently, deep neural networks have been adopted for personalized recommendations, with the ability to build a more generalized model by capturing complex content-based features, which can also serve well in cold-start or volatile situations. To generate personalized recommendations, the sequential behaviors of users are effectively exploited by applying different sequential encoder networks. Many works apply Recurrent Neural Networks (RNN) for sequential recommendation and obtain promising results. Among those, Hidasi et al. [1] proposed a GRU-based network to model the sequential behaviors of users, adopting the last output as the user embedding (known as GRU4Rec); Hidasi and Karatzoglou [2] proposed a top-k gain ranking loss function used in RNNs for session-based recommendations; and Li et al. [3], also based on an RNN, proposed a way to balance a user's local interest and global interest (known as NARM). Besides RNNs, the well-known self-attention mechanism [4] for sequential modeling has also been commonly applied in recommendations: Kang and McAuley [5] proposed a self-attention based network to capture the sequential behaviors of users, where the encoded value of the last item in the sequence is regarded as the ultimate user vector (known as SASRec), and Sun et al. [6] adopted Bidirectional Encoder Representations from Transformers, training a bidirectional model to predict masked items in the sequence. There are also methods based on graph neural networks for sequential recommendation: Wu et al. [7] modeled session sequences as graph-structured data to take item transitions into account, and Xu et al. [8] proposed a graph contextualized self-attention model for session-based recommendation.

Most prior work generates a single embedding to represent a user. This is reasonable for recommendation pages or placements with specified target items, but on some occasions, such as Homepage recommendations, we may wish to provide users with a more diversified set of recommendations reflecting their multiple interests. How to capture the multiple interests of a user has been a popular topic in recent years. Weston et al. [9] introduced a highly scalable method for learning a non-linear latent factorization to model the multiple interests of a user. Li et al. [10] proposed a multi-interest extractor layer based on a capsule network with the dynamic routing mechanism. Cen et al. [11] explored a self-attentive method for multi-interest extraction, and utilized an aggregation module to balance accuracy and diversity.

In terms of engineering system architecture, there are several works which describe large scale embedding based retrieval systems. Pal et al. [12] describe an industrial embedding based retrieval system which uses the HNSW model [13] for the approximate nearest neighbor (ANN) component. There are several production systems that utilize a two-tower model for search and retrieval, including in the social media space [14] as well as in the e-commerce space [15, 16]. We will now discuss the details of our model architecture.
3. The PWPRec Model

In our application scenario, we find that the distribution of a user's recently viewed items differs across pages. On the VIP, users usually have a definite shopping purpose and are thus more likely to click on items related to their most recently viewed items. On the Homepage, however, users have a less focused shopping purpose and may click on items from different categories. We therefore build our model in consideration of different pages and placements, which better captures and understands the different behavioral intentions of users.

3.1. Page-Wise Sequential Behavior Analysis

Before introducing our detailed model, we first present an analysis of a user's shopping behavior as a function of time. In our sequential modeling approach, every training example is composed of a positive target clicked item, several negative items the user did not click on in the impression, and a series of the user's historical items.

We build up a histogram of the overlap between the category of the target item and the categories of historical items for all users in the dataset. Figure 2 demonstrates the difference between the VIP and HP distributions. The horizontal axis represents the number of hours between the target clicked item and a historical item, while the vertical axis represents the category overlap between the target and historical items. It can be seen from the graph that for the View Item Page (orange in Figure 2), about 80% of users are viewing the same category in the hour before the target click, and only 5% in the second hour before, indicating that users are focused on the category of their most recent items. For the Homepage, the curve is more gradual, with only 30% overlap in the first hour, indicating that on the Homepage users show interest in categories they interacted with over a longer period; the target item category may thus correlate with more diverse historical categories.

Figure 2: User historical category overlap histogram on different pages.
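To make the analysis above concrete, below is a minimal sketch of how such a category-overlap histogram can be computed, assuming a flat log of (target click, historical item) pairs with timestamps and leaf categories. The column names and the pandas-based approach are illustrative assumptions, not the actual eBay pipeline.

```python
# Illustrative sketch of the category-overlap analysis. Assumes a DataFrame with
# one row per (target click, historical item) pair; column names are hypothetical.
import pandas as pd

def category_overlap_histogram(df: pd.DataFrame, max_hours: int = 24) -> pd.Series:
    """Bucket each historical item by hours elapsed before the target click,
    then compute the fraction of pairs in each bucket whose categories match."""
    hours = ((df["target_ts"] - df["hist_ts"]).dt.total_seconds() // 3600).astype(int)
    same_cat = (df["target_category"] == df["hist_category"]).astype(float)
    return same_cat.groupby(hours.clip(upper=max_hours)).mean()

# vip_curve = category_overlap_histogram(vip_pairs)  # steep drop after hour 0
# hp_curve = category_overlap_histogram(hp_pairs)    # more gradual decay
```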
Based on the above analysis, we adopt different data formulation strategies and different model structures for different pages:

• For the View Item Page, considering that users usually have specific shopping missions and little interest in other categories, we organize training data in a "session-based" way with the most recent past behaviors over a shorter period. The ultimate output is a single item set reflecting the user's most recent interest.
• For the Homepage, users may show interest in a diverse set of categories that they interacted with, even several days before. In this case, we organize the training data in a "user-based" way, incorporating more days and more past behaviors. The ultimate output is multiple item sets reflecting the user's multiple interests through their long shopping journey.

3.2. Model Structure

Our proposed approach for personalized recommendations is based on a two-tower deep learning model structure that generates user embedding(s) and item embeddings at the same time. The overall architecture of PWPRec is shown in Figure 3. Following our previous work [17], we keep the same structure for the item tower and focus on optimizing the user tower. The original user tower adopted a recurrent neural network as the base encoder of the user's historical events and an average fusion strategy to generate the final user embedding. Here we optimize the user encoder network with two architectural modules: 1) a sequential encoder to better capture the ordered historical events, and 2) a fusion network to better adapt to pages with different historical item distributions. In the next sections, we delve into these modules in detail.

3.3. Causal Transformer Encoder

The transformer network and self-attention mechanism described in [4] are widely applied in NLP related tasks and have achieved state-of-the-art performance. Here we adopt the transformer and self-attention as our sequential encoder, with some modifications to capture the order information, which is of vital importance in recommendation scenarios.

3.3.1. Relative Positional Embedding

We first tried the fixed position embedding from the vanilla self-attention in [4], but it did not work well. This may be because a fixed embedding cannot capture relative positional information well, which is quite essential in an e-commerce setting. In our case, a learnable embedding over relative position values works the best. The relative position value is calculated as:

$$\mathrm{pos}(\mathrm{item}_i) = T - i \qquad (1)$$

where $T$ is the position of the target item and $i$ is the position of an item prior to the target item. The relative position is then encoded into an embedding $P_{emb}$, and the final input to the transformer encoder is calculated as:

$$IN_{vector} = ITEM_{emb} + P_{emb} \qquad (2)$$

where $ITEM_{emb}$ is the original item embedding, and the final input vector $IN_{vector}$ is the vector addition of the item embedding and the positional embedding $P_{emb}$.
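As a minimal PyTorch sketch of Equations (1)-(2), assuming the target sits one step after the last historical item (so relative positions count down from L to 1) and a fixed maximum sequence length; the class and parameter names are illustrative:

```python
import torch
import torch.nn as nn

class RelativePositionalEmbedding(nn.Module):
    """Learnable embedding over relative positions pos(item_i) = T - i (Eq. 1)."""
    def __init__(self, max_len: int, d_emb: int):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len + 1, d_emb)  # one vector per value of T - i

    def forward(self, item_emb: torch.Tensor) -> torch.Tensor:
        # item_emb: (batch, L, d_emb); assume the target is at position T = L + 1,
        # so historical items at positions 1..L get relative positions L..1.
        L = item_emb.size(1)
        rel_pos = torch.arange(L, 0, -1, device=item_emb.device)
        return item_emb + self.pos_emb(rel_pos)  # Eq. (2): IN = ITEM_emb + P_emb
```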
3.3.2. Causal Attention Mask

The vanilla transformer encoder attends to all positions in a sequence via the self-attention and multi-head mechanisms, with each head output formulated as:

$$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V), \qquad \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d}}\right)V \qquad (3)$$

where $Q$, $K$, $V$ are the packed matrices of queries, keys and values, $d$ is the dimension of the queries and keys, and $W_i^Q$, $W_i^K$, $W_i^V$ are the parameter matrices, as described in the self-attention mechanism [4].

For the sequential recommendation scenario, a causal mask [18] needs to be applied to guarantee that later-clicked items cannot be seen when predicting earlier items; otherwise this may lead to data leakage. We therefore apply a lower triangular attention mask matrix (shown in the left part of Figure 3) to guarantee causality between items, so that self-attention is formulated as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\mathrm{Mask}\left(\frac{QK^T}{\sqrt{d}}\right)\right)V, \qquad \mathrm{Mask} = \mathrm{Tril}(\mathrm{Ones}(M \in \mathbb{R}^{L \times L})) \qquad (4)$$

where $L$ is the sequence length, Ones denotes the all-ones matrix, Tril takes the lower triangular part of a matrix, and the mask operation fills future values with $-\infty$.

Figure 3: Two tower model architecture for user embedding(s) and item embedding. The causal transformer encoder is explained in the left part. The fusion module serving different pages will be explained in subsequent subsections, with long-short fusion generating a comprehensive user embedding or multi-interest fusion generating multiple user embeddings.

3.4. Fusion Module

In our industrial recommendation scenario, we design two fusion networks to handle the different recommendation targets: one generates a comprehensive single interest for the VIP, named Long-Short Fusion; the other generates multiple interests for the HP, named Multi-Interest Fusion.

3.4.1. Long-Short Fusion Strategy

To generate one single interest, we adopt a network architecture which combines a user's short-term and long-term interests. The short-term interest takes the last position output of the transformer encoder, indicating the most recent preferences, while the long-term interest takes the outputs of all positions into consideration, indicating global preferences. We apply an attention mechanism to calculate a weighted average of all the outputs to form the long-term interest, which can be expressed as:

$$U_{long} = \sum_{k=1}^{L} \mathrm{Attention}(i_k^{enc}, i_L^{enc}) \cdot i_k^{enc}, \qquad \mathrm{Attention}(i_k^{enc}, i_L^{enc}) = v^T \sigma(A_1 i_k^{enc} + A_2 i_L^{enc}) \qquad (5)$$

where the attention function is additive attention [19], $i_L^{enc}$ represents the last position item embedding, $i_k^{enc}$ represents the item embedding at position $k$, $A_1$ transforms $i_k^{enc}$ into a latent space, $A_2$ plays the same role for $i_L^{enc}$, and $\sigma$ is the sigmoid function.

After learning the long-term and short-term embeddings, the last important step is to integrate them appropriately. Here we choose a gated way to learn the contribution coefficients of the long-term and short-term embeddings, which is illustrated in Figure 4 and calculated as:

$$U_{emb} = (1 - gate) \cdot U_{long} + gate \cdot U_{short}, \qquad gate = \sigma(G_1 U_{short} + G_2 U_{long}) \qquad (6)$$

where $U_{short} = i_L^{enc}$, $U_{long}$ is defined in Equation (5), and $\sigma$ is the sigmoid activation function. In the gate equation, $G_1$ and $G_2$ transform $U_{short}$ and $U_{long}$ into latent spaces, respectively.

Figure 4: Long-Short Fusion Module.
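A compact PyTorch sketch of the additive causal mask in Equation (4) together with the gated long-short fusion of Equations (5)-(6); the module and parameter names are illustrative assumptions:

```python
import torch
import torch.nn as nn

def causal_mask(L: int) -> torch.Tensor:
    # Additive mask of Eq. (4): entries above the diagonal (future positions)
    # are set to -inf before the softmax, the rest to 0. Can be passed as the
    # attention mask of a standard transformer encoder layer.
    return torch.triu(torch.full((L, L), float("-inf")), diagonal=1)

class LongShortFusion(nn.Module):
    """Gated fusion of long- and short-term interest (Eqs. 5-6)."""
    def __init__(self, d: int):
        super().__init__()
        self.A1 = nn.Linear(d, d, bias=False)  # projects each encoded item i_k
        self.A2 = nn.Linear(d, d, bias=False)  # projects the last-position item i_L
        self.v = nn.Linear(d, 1, bias=False)   # additive-attention score vector v
        self.G1 = nn.Linear(d, d, bias=False)  # gate projection for U_short
        self.G2 = nn.Linear(d, d, bias=False)  # gate projection for U_long

    def forward(self, enc: torch.Tensor) -> torch.Tensor:
        # enc: (batch, L, d) outputs of the causal transformer encoder.
        u_short = enc[:, -1]  # last position output = short-term interest
        # Eq. (5): additive attention of every position against the last one.
        scores = self.v(torch.sigmoid(self.A1(enc) + self.A2(u_short).unsqueeze(1)))
        u_long = (scores * enc).sum(dim=1)  # weighted sum over all positions
        # Eq. (6): learned gate balancing long- and short-term interest.
        gate = torch.sigmoid(self.G1(u_short) + self.G2(u_long))
        return (1 - gate) * u_long + gate * u_short
```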
3.4.2. Multi-Interest Fusion Strategy

The multi-interest fusion module is utilized to capture multiple interests from a user's shopping journey. A multi-head self-attentive network is applied to transform the sequential item encodings of a user into multiple user representations. We follow the self-attentive method originated by Lin et al. [20], which was later applied in a recommendation system by Cen et al. [11] as the multi-interest extractor. In our work, we found that when this multi-interest fusion module is combined with the transformer sequential encoder, model performance is significantly improved.

The multi-interest fusion network is illustrated in Figure 5. Suppose we have a sequence of items $i_1, i_2, ..., i_L$; after the causal transformer encoder, the items are represented as $I = \{i_1^{enc}, i_2^{enc}, ..., i_L^{enc}\}$, with sequence length $L$. A multi-head self-attentive layer is adopted to calculate the attention weights $A$ over the input item sequence, with each head representing one interest. The multiple user embeddings $U$ for the current user are calculated as:

$$A = \mathrm{softmax}((\tanh(I W_1) W_2)^T), \qquad U = AI \qquad (7)$$

where $I \in \mathbb{R}^{L \times d_{emb}}$ is the matrix of sequential item embeddings, $W_1 \in \mathbb{R}^{d_{emb} \times d_{hidden}}$ is a trainable parameter matrix which transforms the encoded item vectors from dimension $d_{emb}$ to $d_{hidden}$ ($d_{hidden}$ is usually several times larger than $d_{emb}$ to increase model capacity), and $W_2 \in \mathbb{R}^{d_{hidden} \times K}$ is another trainable parameter matrix which maps $d_{hidden}$ to the number of embeddings $K$ (the number of user interests to be generated). The attention weight matrix is $A \in \mathbb{R}^{K \times L}$, and the final multiple user embeddings are $U \in \mathbb{R}^{K \times d_{emb}}$.

Figure 5: Multi-Interest Fusion Module.
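A minimal PyTorch sketch of the multi-interest extractor in Equation (7); the dimension names follow the text, while the class name and the absence of masking for padded positions are simplifying assumptions:

```python
import torch
import torch.nn as nn

class MultiInterestFusion(nn.Module):
    """Self-attentive multi-interest extraction, U = A I (Eq. 7)."""
    def __init__(self, d_emb: int, d_hidden: int, num_interests: int):
        super().__init__()
        self.W1 = nn.Linear(d_emb, d_hidden, bias=False)          # d_emb -> d_hidden
        self.W2 = nn.Linear(d_hidden, num_interests, bias=False)  # d_hidden -> K

    def forward(self, enc: torch.Tensor) -> torch.Tensor:
        # enc: (batch, L, d_emb), the encoded item sequence I.
        attn = self.W2(torch.tanh(self.W1(enc)))            # (batch, L, K)
        attn = torch.softmax(attn.transpose(1, 2), dim=-1)  # A: (batch, K, L)
        return attn @ enc                                   # U: (batch, K, d_emb)
```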
4. Offline Datasets & Experiments

In this section, we describe the datasets we utilized to train and validate our PWPRec model. We adopted different data formulation strategies for the View Item Page and the Homepage, and conducted experiments on both our eBay datasets and public datasets to validate the effectiveness of our model.

4.1. Dataset for View Item Page

For the View Item Page, users tend to click more on items related to their recently viewed items, so we organize the data in a session-based way. We choose two session-based datasets for our experiments: one collected from our eBay in-house data, and the other the public YooChoose dataset [21], which is commonly adopted in research papers.

• eBay (session-based) dataset: derived from our real world eBay production traffic, containing View Item Page events within a session. All items are enriched with the necessary metadata, such as titles, aspects and categories.
• YooChoose dataset: provided by YooChoose for the RecSys Challenge 2015, with each session encapsulating the click events that a user performed on a retailer's site. In this dataset, only the item id and category are available to generate an item embedding.

In order to better validate the effectiveness of sequential encoders, we filter out very short sessions with a sequence length of less than 4. The statistics of the two datasets are shown in Table 1.

Table 1: View Item Page Data Statistics.
Statistics                 eBay (session-based)   YooChoose
# of training sessions     18 million             1.9 million
# of validation sessions   2 million              470k
# of items                 72 million             53k
Average sequence length    15                     8

4.2. Dataset for Homepage

For the Homepage, we organize the data in a user-based way within a longer time window, and thus obtain much longer user sequence lengths. We choose two user-based datasets for our experiments: one collected from our eBay in-house data, and the other the public Taobao dataset [22].

• eBay (user-based) dataset: also derived from our real world eBay production traffic, with clicked items on the Homepage as the target label, and all items that a user viewed within the 30 days before the clicked item as the sequential historical events.
• Taobao dataset: contains the sequential behavior of users collected from Taobao, consisting of 1 million users' shopping behaviors within 10 days. We follow the same training/validation data splitting method as in [11].

The statistics of the above datasets are shown in Table 2. We also build a histogram of the overlap between the category of the target item and historical items for the Taobao dataset in Figure 6, which shows a gradual curve more similar to the eBay HP than the VIP. We therefore adopted the Taobao dataset to validate the effectiveness of the multiple interests model targeted at the Homepage.

Table 2: Homepage Data Statistics.
Statistics                 eBay (user-based)   Taobao
# of training users        40 million          0.8 million
# of validation users      2 million           97k
Average sequence length    102                 87

Figure 6: User historical category overlap histogram on the Taobao and eBay Homepage datasets.

4.3. Model Training & Validation

For model training and validation, different negative sampling strategies and loss calculations are adopted for the View Item Page and the Homepage.

4.3.1. View Item Page

View Item Page data samples are grouped by session. As session lengths are usually shorter than in the user-based formulation, we adopt global negative sampling to choose negative items from a larger candidate pool. In the training phase, each training sample has one positive item and 10 negative items, while in the validation phase we select 1000 negative items so that the evaluation generalizes better to the whole candidate item set. We use a cross entropy loss to train the model, whose target is to maximize the softmax probability of the positive item:

$$P(pos \mid U) = \frac{e^{\gamma(v_{pos}, v_u)}}{\sum_{i \in pos \cup neg} e^{\gamma(v_i, v_u)}}, \qquad Loss = -\log P(pos \mid U) \qquad (8)$$

where $v_u \in \mathbb{R}^d$ is the $d$-dimensional embedding of user $U$, $v_{pos} \in \mathbb{R}^d$ is the $d$-dimensional embedding of the positive item, $\gamma$ is the affinity function between user and item (we adopt the inner product as the affinity score), and $pos \cup neg$ is the union of the target positive item and the sampled negative items.
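Equation (8) amounts to a softmax cross entropy over one positive and the sampled negatives. A minimal PyTorch sketch, assuming the inner-product affinity described above; the tensor shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(user_emb: torch.Tensor,
                         pos_emb: torch.Tensor,
                         neg_emb: torch.Tensor) -> torch.Tensor:
    """Eq. (8): -log P(pos | U) over {positive} ∪ {sampled negatives}.
    user_emb: (batch, d); pos_emb: (batch, d); neg_emb: (batch, N, d)."""
    pos_logit = (user_emb * pos_emb).sum(-1, keepdim=True)               # (batch, 1)
    neg_logits = torch.bmm(neg_emb, user_emb.unsqueeze(-1)).squeeze(-1)  # (batch, N)
    logits = torch.cat([pos_logit, neg_logits], dim=-1)  # positive is class 0
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```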
4.3.2. Homepage

For the Homepage, data samples are grouped in the user-based way, with longer sequential behaviors within a 30 day time window, and batch negative sampling is adopted to select 1000 samples in both the training and validation phases. The loss calculation for the training and validation processes is handled differently, to accelerate the convergence of the multiple user embeddings model structure.

• Training phase. Since we have the positive item as the target label, we can use the positive item embedding to choose, from the multiple embeddings, the one final user embedding with which to calculate the training loss:

$$v_u = V_u[\arg\max(V_u v_{pos}^T)] \qquad (9)$$

where $v_u \in \mathbb{R}^d$ is the final user embedding used to calculate the loss in Equation (8), $V_u$ is the set of multiple embeddings generated for the user, and $v_{pos} \in \mathbb{R}^d$ is the positive item embedding.

• Validation phase. Unlike the training phase, label information such as the positive item cannot be used in the metrics calculation, as this would result in label leakage. We apply a simple trick to speed up the procedure: we select the user embedding having the maximum summed affinity score with the candidate item set as the final user embedding for loss and metrics calculation:

$$v_u = V_u\left[\arg\max\left(\sum_{i \in items} V_u v_i^T\right)\right] \qquad (10)$$

where $v_u \in \mathbb{R}^d$ is the final user embedding used to calculate model metrics, $V_u$ is the set of multiple embeddings generated for the user, and $v_i \in \mathbb{R}^d$ is the embedding of an item in the candidate item set for validation.
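The two selection rules can be sketched as follows; this is an illustrative reading of Equations (9)-(10), not the production implementation:

```python
import torch

def select_train(user_embs: torch.Tensor, pos_emb: torch.Tensor) -> torch.Tensor:
    """Eq. (9): pick the interest embedding closest to the positive item.
    user_embs: (batch, K, d); pos_emb: (batch, d)."""
    scores = torch.bmm(user_embs, pos_emb.unsqueeze(-1)).squeeze(-1)  # (batch, K)
    return user_embs[torch.arange(user_embs.size(0)), scores.argmax(dim=-1)]

def select_eval(user_embs: torch.Tensor, item_embs: torch.Tensor) -> torch.Tensor:
    """Eq. (10): pick the interest with the maximum summed affinity over the
    candidate set, since the positive label must not be used at validation.
    item_embs: (num_items, d), shared across the batch."""
    scores = user_embs @ item_embs.sum(dim=0)  # (batch, K): sum_i <u_k, v_i>
    return user_embs[torch.arange(user_embs.size(0)), scores.argmax(dim=-1)]
```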
4.4. Offline Experimental Results

The primary evaluation metric we use is Recall@k at $k = 1, 5, 10, 20$. For $P$ impressions, the metric is defined as:

$$Recall@k = \frac{1}{P} \sum_{i=1}^{P} \frac{\#\,\text{relevant items @ } k}{\#\,\text{total relevant items}} \qquad (11)$$

We now present the experimental results conducted on different pages with different datasets.

4.4.1. View Item Page Experiments

The experimental results can be found in Table 3. We select three other models for comparison: GRU4Rec [1], NARM [3] and SASRec [5]. Our baseline model, described in [17], is very similar to GRU4Rec but has the following enhancements: 1) it changes the loss function to cross entropy with inverse temperature, and 2) it adopts an attention-based weighted sum mechanism to generate the ultimate embedding. We call this baseline model GRU4Rec-Enhanced. For our proposed model PWPRec, we add the suffix (LS) to denote the Long-Short fusion strategy.

We see from Table 3 that on both of the datasets described in Section 4.1, our model PWPRec(LS) achieves the best performance, gaining a 10+% increase on Recall@1 compared to the baseline GRU4Rec-Enhanced. We notice that on the YooChoose dataset the recall values are lower, possibly because of the smaller training set as well as the lack of item features such as titles or aspects. However, our model shows a bigger advantage on this dataset even for recall at larger k, which implies that even in situations where fewer features are available, our model performs better and generalizes better.

Table 3: Experimental results for View Item Page.

eBay (session-based) dataset:
Model               Recall@1           Recall@5           Recall@10          Recall@20
GRU4Rec-Enhanced    0.4212             0.6341             0.7038             0.7643
NARM                0.4249 (+0.88%)    0.6378 (+0.58%)    0.7101 (+0.90%)    0.7743 (+1.31%)
SASRec              0.4745 (+12.65%)   0.6509 (+2.65%)    0.7084 (+0.65%)    0.7670 (+0.35%)
ours-PWPRec(LS)     0.4761 (+13.03%)   0.6611 (+4.26%)    0.7239 (+2.86%)    0.7777 (+1.75%)

YooChoose dataset:
Model               Recall@1           Recall@5           Recall@10          Recall@20
GRU4Rec-Enhanced    0.1222             0.1270             0.1372             0.2116
NARM                0.1186 (-2.95%)    0.1230 (-3.15%)    0.1331 (-2.99%)    0.2079 (-1.75%)
SASRec              0.1366 (+11.78%)   0.1429 (+12.52%)   0.1526 (+11.22%)   0.2281 (+7.80%)
ours-PWPRec(LS)     0.1407 (+15.14%)   0.1459 (+14.88%)   0.1568 (+14.29%)   0.2306 (+8.98%)

4.4.2. Homepage Experiments

The experimental results can be found in Table 4. For our proposed model PWPRec, we add the suffix (MI) to denote the Multi-Interest fusion strategy. We chose ComiRec, described in [11], as the baseline, and also include the well-known multi-interest model MIND [10] for comparison. We see from Table 4 that our model PWPRec(MI), with the transformer encoder and multi-interest fusion layer, outperforms the other two models by 20+% on the Recall@1 metric, with a strong 10+% increase for recall at larger k. Similar to the previous experiments, our model gains a more significant improvement on the public dataset, which lacks item feature information.

Table 4: Experimental results for Homepage.

eBay (user-based) dataset:
Model             Recall@1           Recall@5           Recall@10          Recall@20
ComiRec           0.4835             0.7139             0.7522             0.7916
MIND              0.3832 (-20.74%)   0.5863 (-17.87%)   0.6231 (-17.16%)   0.6870 (-13.21%)
ours-PWPRec(MI)   0.5898 (+21.99%)   0.8221 (+15.16%)   0.8512 (+13.16%)   0.8738 (+10.38%)

Taobao dataset:
Model             Recall@1           Recall@5           Recall@10          Recall@20
ComiRec           0.1098             0.1970             0.2461             0.3007
MIND              0.0862 (-21.49%)   0.1533 (-22.18%)   0.1926 (-21.74%)   0.2514 (-16.39%)
ours-PWPRec(MI)   0.1373 (+25.05%)   0.2419 (+22.79%)   0.2914 (+18.41%)   0.3427 (+13.97%)

5. Production Engineering Architecture

Details of the continuous improvements we have made to the engineering architecture of this system can be found in our eBay Tech Blog post [23]. Most of the modeling innovations described in Sections 3.2 and 4 were A/B tested against the baseline version of the system described in our previous work [17]. In our previous approach, most of the model calculations were performed offline with daily batch jobs that generated the user/item embeddings and performed a KNN search over the space of item embeddings for every user embedding. This has a clear disadvantage: the delay between the offline calculation of predictions (performed daily) and displaying the recommendations to the user could lead to stale, outdated recommendations and a degraded user experience. To overcome this issue and reduce the delay to a few seconds, we built a state-of-the-art deep learning based retrieval system utilizing real-time KNN search as well as near real-time (NRT) user embedding updates, shown in Figure 7.

To enable fast real-time KNN search over vector embeddings, we built an in-house KNN microservice based on the HNSW method [13]: a user embedding is sent as input, an ANN search is performed in the item embedding space, and item recommendations are returned. To generate a user embedding in real time, we capture user click activity on the site as Apache Kafka message events and process them with an Apache Flink application. The events are enriched with metadata and passed through a deep learning model prediction microservice to generate the actual embedding vector, which is subsequently stored in Couchbase. Putting all of these together, the full NRT flow for personalized recommendations is:

1. Step 1A - A user clicks on View Item Pages and these click events are collected using the Kafka messaging platform.
2. Step 1B - The Flink application aggregates the last several events and generates a user embedding by calling the model prediction microservice.
3. Step 1C - The user embedding is stored in Couchbase with {key: value} = {user id: user embedding vector}.
4. Step 2A - As the user lands on a View Item Page, the backend recommendations application gets the user embedding from Couchbase.
5. Step 2B - A request is made to the KNN microservice; personalized recommendations are returned and rendered back to the user.

Figure 7: Production engineering architecture featuring real-time KNN search as well as NRT user embedding updates.
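Our in-house KNN microservice is proprietary, but the retrieval step it performs can be illustrated with the open-source hnswlib implementation of HNSW [13]: build an ANN index over the item embeddings, then answer each request with a top-k query against the user embedding. All data and parameters below are placeholders.

```python
# Illustrative HNSW retrieval with the open-source hnswlib package, standing in
# for the in-house KNN microservice described above.
import hnswlib
import numpy as np

dim, num_items = 64, 100_000
item_embs = np.random.rand(num_items, dim).astype(np.float32)  # stand-in item tower output

# Offline: build the ANN index over the item embedding space. The inner-product
# space matches the affinity function used during training.
index = hnswlib.Index(space="ip", dim=dim)
index.init_index(max_elements=num_items, ef_construction=200, M=16)
index.add_items(item_embs, np.arange(num_items))
index.set_ef(100)  # query-time accuracy/latency trade-off

# Serving: the user embedding fetched from the KV store (Couchbase in our
# system) retrieves the top-k recommendation candidates.
user_emb = np.random.rand(dim).astype(np.float32)
item_ids, scores = index.knn_query(user_emb, k=20)
```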
As a result of this system architecture, the delay between generating personalized recommendations based on the user's session data and displaying them is reduced to a few seconds. This system is in production serving high volume traffic to a diverse set of users. Next we discuss our online A/B testing results, which support our offline model evaluations.

5.1. Online Evaluation

In order to understand how our models perform online, we deployed them to serve real world users and production traffic. We compare results for the View Item Page and the Homepage, as well as for the NRT architecture.

5.1.1. View Item Page A/B Test

We performed A/B testing on the View Item Page on the desktop platform, comparing our PWPRec(LS) model to our previous baseline model [17], named GRU4Rec-Enhanced. Our model outperformed the previous baseline with a 38.53% increase in purchases. This implies that our model with the transformer encoder better captures the sequential behavior of a user, and that the Long-Short fusion mechanism automatically balances the weights of long-term and short-term interests better than the previous weighted sum fusion.

5.1.2. NRT A/B Test

It was interesting to see the impact of the reduced delay between recommendation generation and serving on the operational metrics of the system as we deployed the NRT engineering architecture to production. Purchases improved by 107% compared to the previous offline system. This makes sense from a user experience perspective: as the user's shopping journey evolves in real time, the model embedding is updated in real time, the recommendation relevance improves, and the operational metrics follow.

5.1.3. Homepage Multi-Interest User Scrapes

We are in the process of serving our multi-interest model online for an A/B test. However, we want to share some multi-interest user recommendations from the production environment to demonstrate the performance of the model. Figure 8 depicts three distinct sets of recommended items based on the multiple interests of a user derived from their related browsing history. Based on the user's past viewed items shown on the first line, our model captures three interests for this user, accurately revealing the intrinsically diverse set of interests of a user throughout their shopping journey.

Figure 8: A user scrape from the production environment: (a) shows the user's historical interacted items, and (b)(c)(d) show the three interests captured for this user, with (b) representing the first interest (Jeans), (c) the second interest (Hot Tubs), and (d) the third interest (Rings).

6. Summary and Future Work

In this paper, we presented an approach for generating personalized recommendations by considering different user behavior patterns on different pages in an industrial e-commerce setting. Different strategies for data formulation and fusion layer adoption have been carefully designed to capture a user's sequential behavior on the View Item Page and the Homepage. The overall structure is based on a two-tower model aiming to learn embeddings of items and users in a shared vector space. To model the user's sequential behavior, we adopt a causal transformer encoder together with Long-Short fusion or Multi-Interest fusion, determined by the page setting, on the user tower side. This approach captures the user's long-short interests and multiple interests well. To verify the effectiveness of our model, we conducted experiments on our in-house datasets as well as commonly adopted public datasets; all experiments showed significant improvements over the baseline approaches. Furthermore, a personalized recommender system with the NRT engineering architecture has been launched to production and is now serving recommendations at scale to eBay buyers. This system reacts quickly to instant user interactions and generates large improvements in the buyer shopping experience.
Online A/B tests have also been conducted for our proposed model as well as the NRT architecture, showing increases in downstream business metrics, such as purchases.

We are actively working to enhance the performance and extend the application scenarios of our model and engineering system. One direction of future work is to incorporate richer user features (e.g. demographic features like buyer age and behavioral features like purchase quantity) as well as item features (e.g. item price, popularity). Another direction is to add a deep learning ranking model after the multiple recommendation sets have been retrieved, in order to further optimize for operational metrics like engagement or conversion. Last but not least, besides the View Item Page and Homepage we currently serve, we plan to extend our personalized recommender system to more scenarios, such as the infinite feed and checkout success placements, in order to give users more personalized and diverse choices with NRT experiences in their shopping journey.

7. Acknowledgements

We want to thank Sathish Veeraraghavan, Bing Zhou, Arman Uygur, Marshall Wu, Sriganesh Madhvanath, Santosh Shahane, Leonard Dahlmann, Menghan Wang, and Jeff Kahn for their generous support of the production system as well as their review comments during the manuscript preparation.

References

[1] B. Hidasi, A. Karatzoglou, L. Baltrunas, D. Tikk, Session-based recommendations with recurrent neural networks, arXiv:1511.06939 (2016).
[2] B. Hidasi, A. Karatzoglou, Recurrent neural networks with top-k gains for session-based recommendations, arXiv:1706.03847 (2017).
[3] J. Li, P. Ren, Z. Chen, Z. Ren, J. Ma, Neural attentive session-based recommendation, arXiv:1711.04725 (2017).
[4] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, arXiv:1706.03762 (2017).
[5] W.-C. Kang, J. McAuley, Self-attentive sequential recommendation, arXiv:1808.09781 (2018).
[6] F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, P. Jiang, BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer, arXiv:1904.06690 (2019).
[7] S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, T. Tan, Session-based recommendation with graph neural networks, arXiv:1811.00855 (2019).
[8] C. Xu, P. Zhao, Y. Liu, V. S. Sheng, J. Xu, F. Zhuang, J. Fang, X. Zhou, Graph contextualized self-attention network for session-based recommendation (2019). URL: https://www.ijcai.org/proceedings/2019/0547.pdf.
[9] J. Weston, R. J. Weiss, H. Yee, Nonlinear latent factorization by embedding multiple user interests (2013). URL: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41535.pdf.
[10] C. Li, Z. Liu, M. Wu, Y. Xu, P. Huang, H. Zhao, G. Kang, Q. Chen, W. Li, D. L. Lee, Multi-interest network with dynamic routing for recommendation at Tmall, arXiv:1904.08030 (2019).
[11] Y. Cen, J. Zhang, X. Zou, C. Zhou, H. Yang, J. Tang, Controllable multi-interest framework for recommendation, arXiv:2005.09347 (2020).
[12] A. Pal, C. Eksombatchai, Y. Zhou, B. Zhao, C. Rosenberg, J. Leskovec, PinnerSage: Multi-modal user embedding framework for recommendations at Pinterest, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 2311–2320.
[13] Y. A. Malkov, D. A. Yashunin, Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs, IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2018) 824–836.
[14] J.-T. Huang, A. Sharma, S. Sun, L. Xia, D. Zhang, P. Pronin, J. Padmanabhan, G. Ottaviano, L. Yang, Embedding-based retrieval in Facebook search, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 2553–2561.
[15] H. Zhang, S. Wang, K. Zhang, Z. Tang, Y. Jiang, Y. Xiao, W. Yan, W.-Y. Yang, Towards personalized and semantic retrieval: An end-to-end solution for e-commerce search via embedding learning, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 2407–2416.
[16] S. Li, F. Lv, T. Jin, G. Lin, K. Yang, X. Zeng, X.-M. Wu, Q. Ma, Embedding-based product retrieval in Taobao search, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 3181–3189.
[17] T. Wang, Y. M. Brovman, S. Madhvanath, Personalized embedding-based e-commerce recommendations at eBay, arXiv:2102.06156 (2021).
[18] W.-C. Kang, J. McAuley, Self-attentive sequential recommendation, arXiv:1808.09781 (2018).
[19] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, arXiv:1409.0473 (2014).
[20] Z. Lin, M. Feng, C. Nogueira dos Santos, M. Yu, B. Xiang, B. Zhou, Y. Bengio, A structured self-attentive sentence embedding, arXiv:1703.03130 (2017).
[21] YooChoose, RecSys Challenge 2015, 2015. URL: https://recsys.acm.org/recsys15/challenge/.
[22] Alimama, User behavior data from Taobao for recommendation, 2018. URL: https://tianchi.aliyun.com/dataset/dataDetail?dataId=649&userId=1&lang=en-us.
[23] Y. M. Brovman, Building a deep learning based retrieval system for personalized recommendations, 2022. URL: https://tech.ebayinc.com/engineering/building-a-deep-learning-based-retrieval-system-for-personalized-recommendations/.