
Entire Cost Enhanced Multi-Task Model for Online-to-Offline Conversion Rate Prediction

Yingyi Zhang

Xianneng Li

Yahe Yu

Jian Tang

Huanfang Deng

Junya Lu

Yeyin Zhang

Qiancheng Jiang

Yunsen Xian

Liqian Yu

Workshop on Deep Learning for Search and Recommendation, co-located with the ACM International Conference on Information and Knowledge Management (CIKM), October, Atlanta

xianneng@dlut.edu.cn (X. Li)

yaheyu@dlut.edu.cn (Y. Yu)

tangjian

@meituan.com (J. Tang)

denghuanfang@meituan.com (H. Deng)

lujunya@meituan.com (J. Lu)

zhangyeyin@meituan.com (Y. Zhang)

jiangqiancheng@meituan.com (Q. Jiang)

xianyunsen@meituan.com (Y. Xian)

yuliqian@meituan.com (L. Yu)

0 Dalian University of Technology, Dalian, 116024, China
1 Meituan, Beijing, 100102, China

Predicting users' conversion rate (CVR) is essential for ranking systems in industrial Online-to-Offline (O2O) applications. Numerous efforts have been made in CVR modeling to achieve state-of-the-art performance. However, existing methods mainly focus on the Business-to-Customer (B2C) scenario, so applying them to O2O has met with mixed success. This is revealed by several scenario-specific challenges. For example, O2O users in different locations generally encounter different candidate sets of surrounding stores, which makes users' behavioral regularity especially prominent. Besides, an O2O conversion incurs a two-stage cost, i.e., the online order cost and the offline transportation cost, so users' location sensitivity deserves more attention than in conventional scenarios. Motivated by these characteristics, we propose a novel CVR prediction method for the O2O scenario, named Entire Cost enhanced Multi-task Model (ECMM): i) users' historical behavior sequences across different locations are modeled to capture their behavioral regularity; ii) both the online order cost and the offline transportation cost are modeled to predict the users' aggregated preference for conversion. With two novel attention mechanisms, i.e., convert attention and sliding window attention, ECMM can be trained end-to-end to fit O2O characteristics. Extensive experiments have been carried out on Meituan, a real-world industrial O2O platform. Both offline experiments and rigorous online A/B tests at the billion-level data scale demonstrate the superiority of the proposed ECMM over highly optimized state-of-the-art baselines.

Keywords: Online-to-Offline, Multi-Task Learning, Conversion Rate Prediction

1. Introduction

[10, 11, 12, 13, 14, 15], and some methods solving domain-specific problems have been proposed [16, 17, 18, 19]. However, the intrinsic characteristics of O2O, i.e., online behavioral regularity and offline transportation regularity, are rarely considered.

One possible strategy to improve the learning of users' online behavioral regularity and offline transportation regularity is to consider user statistical features, i.e., the user's average online order cost and average offline distance. However, in O2O scenarios the spatiotemporal nature is inseparable, and this strategy loses time-series information when characterizing user preferences. Therefore, sequence representation techniques are also taken into account, as shown in Figure 1.

[Figure 1; recoverable label: offline stores that the user interacted with historically.]

$$p(z = 1 \mid y = 1, x) = \frac{p(z = 1 \mid x)}{p(y = 1 \mid x)}, \tag{1}$$

where $y$ and $z$ are the click and order labels, $x$ is $(u, s, c, h_c, h_o)$, $u$ is the user, $s$ denotes the store, and $c$ represents the current context, such as the current time, city, day of the week, and other information that is independent of the user and the store. $h_c$ and $h_o$ are the user's historical click sequence and order sequence, respectively. We note that the first three terms are the information widely used in conventional CVR modeling, while $h_c$ and $h_o$ are two newly considered ones that assist in modeling behavioral and transportation regularities. Moreover, the user's click and order sequences in ECMM are used from the online-offline cost perspective, i.e., online order cost and offline transportation cost, which is essentially different from conventional CTR prediction methods that model the user's multiple interests [20, 21, 22, 23, 24, 25]. As a novel CVR prediction method for the O2O scenario, the contributions of ECMM are threefold:

• ECMM extends the observation dimensions by learning users' online conversion preferences from historical behavior sequences. A new mechanism named convert attention is proposed to learn the user's behavioral regularity from the global and local perspectives of online order cost.

• To the best of our knowledge, ECMM is the first method for CVR modeling from the perspective of offline transportation cost. We propose a new mechanism named sliding window attention to dynamically learn users' preference for offline transportation.

• ECMM is validated on a real-world industrial O2O platform, where extensive experiments are carried out. Both offline experiments and rigorous online A/B tests at the billion-level data scale demonstrate the significant superiority of ECMM over state-of-the-art baselines.

2. Related Work

Inspired by the success of deep learning, CVR prediction models have evolved from traditional approaches to deep approaches. Traditional methods used logistic regression [26, 27] and GBDT [28] to model the CVR problem with feature interactions. However, nonlinear relationships among features are not considered in these models. Modern deep-learning-based methods transform the CVR problem into a multi-task problem [10, 11, 12]. ESMM [10] makes use of users' sequential actions, "impression → click → pay", to solve the sample selection bias and data sparsity problems over the entire space by simultaneously modeling the CTR and CTCVR tasks.
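To make the entire-space relation concrete, the following minimal sketch checks numerically that the CVR defined over clicked impressions equals CTCVR divided by CTR, and that multiplying CTR by CVR recovers CTCVR. The function names and the sample probabilities are illustrative assumptions; they are not taken from the ESMM [10] or ECMM implementations.

```python
def ctcvr_from(p_ctr: float, p_cvr: float) -> float:
    """pCTCVR = pCTR * pCVR along the path impression -> click -> order."""
    return p_ctr * p_cvr


def cvr_from(p_ctr: float, p_ctcvr: float) -> float:
    """pCVR = pCTCVR / pCTR; undefined when a click cannot occur."""
    if p_ctr <= 0.0:
        raise ValueError("p_ctr must be positive")
    if not 0.0 <= p_ctcvr <= p_ctr <= 1.0:
        raise ValueError("need 0 <= p_ctcvr <= p_ctr <= 1")
    return p_ctcvr / p_ctr


# Hypothetical values: 10% of impressions are clicked,
# 2% of impressions are both clicked and ordered.
p_cvr = cvr_from(p_ctr=0.10, p_ctcvr=0.02)
```

With these assumed numbers the post-click CVR is about 0.2, i.e., one in five clicks converts, even though only 2% of impressions do.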
ESM2 [11] extends users' sequential actions to a more general situation, "impression → click → D(O)Action → pay", and simultaneously models CVR with the CTR, CTAVR and CTCVR tasks. HM3 [12], from the "impression → click → D(O)Mi → D(O)Ma → pay" perspective, models CVR with the CTR, D-Mi, D-Ma and CTCVR tasks.

However, all these methods are based on B2C e-commerce platforms, so applying them to O2O platforms meets with mixed success. Users have unique sequential actions in O2O, which can be represented as "impression → click → online order → offline consumption". Such situations require a CVR model to consider not only users' online behavioral regularity but also their offline transportation regularity.

2.2. User Behavior Sequence Representation

In the past decade, user behavior sequence representation has received much attention and achieved remarkable effectiveness. Many well-designed recommender methods have been proposed and have brought huge commercial revenues for companies and advertisers. In these models, users' historical behaviors are transformed into low-dimensional vectors after embedding to represent users' interests and other characteristics. DIN [20] employs the attention mechanism to locally activate historical behaviors, which captures the diversity of user interests with respect to the given target item. DIEN [21] further proposes an auxiliary loss and an attention mechanism with GRU to capture the dynamic evolution of users' interests. DFN [29] jointly considers explicit/implicit and positive/negative feedback to learn unbiased user preferences. Moreover, inspired by the success of the self-attention architecture [30], the Transformer is introduced for session-based CTR prediction [31]. MIND [32] and DMIN [33] model multiple interests with multiple vectors, using a dynamic routing mechanism and a capsule network.

Although all these user behavior sequence representation methods have brought a huge boost to the business from the perspective of user interest, there are still opportunities for improvement in modeling user behavior sequences from other perspectives. Cost sensitivity [34] is an indispensable aspect of user modeling, and e-commerce users often have certain restrictions on payment costs, which makes it possible to further improve user behavior sequence modeling from the perspective of cost.

3. The Proposed Approach

In this section, we introduce the proposed ECMM model. As shown in Figure 2, it consists of three modules: the base module, which includes the online user, offline store and context features; the entire cost module, which contains both the user's click and order sequences to capture the user's historical cost preference; and the cost combination module, which combines the online-to-offline cost to predict CTR and CVR. With this network, the model can capture the user's online behavioral and offline transportation regularities, which are hidden in users' historical behavior sequences. The details of each module are described as follows.

[Figure 2: ECMM architecture; recoverable labels: share net, feature concat, CVR network, CTCVR.]

3.1. Motivation

As discussed in the previous section, users' online behavioral and offline transportation regularities are indispensable for O2O recommendation [9, 2, 20, 21]. However, how to define their relationship with users' behavior sequences, and how to embody both online and offline costs in a unified framework for CVR prediction, remains unexplored.

For one thing, we propose a novel CVR prediction method from the perspective of user historical behavior. We propose convert attention to extract the local and global preferences of users' online-to-offline behaviors from both depth and breadth perspectives. From a local view, an order placed by a user is affected by clicks; we design the local impact of a click on an order from the store perspective. From a global perspective, the user's overall order sequence is influenced by the click sequence in terms of id, price, and relative distance. For another, to model users' transportation cost, we capture the information in the distance sequence, which reflects users' preference for offline cost in the O2O scenario, to assist the model in learning users' conversion preference in the offline stage. Each store in a user's historical click and order sequences has a distance feature, which represents the offline transportation cost. We then use the sliding window attention method to calculate the user's dynamic preference for offline cost across different timestamps.

3.2. Base Module

The base module is used to aggregate the basic features. Following [10, 11, 12], the embedding and MLP (multi-layer perceptron) structures are used in the base module. The user, store, and contextual features ($u$, $s$, and $c$, each a real-valued vector) are the inputs of the base module, which are mapped into a $d$-dimensional space via embedding operations. An MLP is used to learn the aggregated vector of basic features, with ELU [35] as the activation function:

$$b = \mathrm{MLP}(\mathrm{Emb}(u, s, c)). \tag{2}$$
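A minimal NumPy sketch of a base module of this kind (embedding lookup, concatenation, one ELU layer) is given below. The class name `BaseModule`, the use of a single id field each for user, store and context, the random initialization and the single hidden layer are all assumptions made for illustration; only the embedding dimension of 10 and the MLP width of 1024 are taken from the experimental settings in Section 4.1.

```python
import numpy as np


def elu(x, alpha=1.0):
    """Exponential linear unit; exp is only evaluated on non-positive inputs."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


class BaseModule:
    """Embeds (user, store, context) ids and aggregates them with one ELU layer."""

    def __init__(self, vocab_sizes, d=10, hidden=1024, seed=0):
        rng = np.random.default_rng(seed)
        # One embedding table per field (assumed: one id field per entity).
        self.tables = [rng.normal(0.0, 0.01, size=(v, d)) for v in vocab_sizes]
        self.W = rng.normal(0.0, 0.01, size=(d * len(vocab_sizes), hidden))
        self.b = np.zeros(hidden)

    def __call__(self, ids):
        # ids: integer array of shape (batch, n_fields).
        emb = [table[ids[:, k]] for k, table in enumerate(self.tables)]
        x = np.concatenate(emb, axis=1)          # (batch, n_fields * d)
        return elu(x @ self.W + self.b)          # (batch, hidden)


module = BaseModule(vocab_sizes=[100, 50, 7], d=10, hidden=16)
out = module(np.array([[1, 2, 3], [4, 5, 6]]))   # two (user, store, context) triples
```

In a production model each entity would carry many features rather than one id, and the MLP would be trained jointly with the rest of the network; the sketch only shows the data flow of Eq. (2).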

3.3. Entire Cost Module

Different from B2C purchases, the O2O scenario generally considers only the stores surrounding a user's location. Limited candidates reduce the possibility of matching users' preferences. Thus, it is critical to accurately capture the user's behavioral regularity from historical behaviors. Meanwhile, O2O users need to consider two-stage costs for decision making, i.e., the online order cost and the offline transportation cost, both of which should be considered. The entire cost module is designed to solve the above problems and is the most important part of the ECMM model. It contains two parts: the online cost feature module and the offline cost feature module.

Online Cost Feature Module. Each store in the user's click or order sequence has side-information features of id, distance and price, which represent the cost the user accepts when deciding to click or order an offline store on the online platform. The embedding of the $i$-th store in the user's historical behavior is then

$$h_{c,i} = (id_{c,i}, dis_{c,i}, price_{c,i}), \quad h_{c,i} \in \mathbb{R}^{3d}, \tag{3}$$

$$h_{o,i} = (id_{o,i}, dis_{o,i}, price_{o,i}), \quad h_{o,i} \in \mathbb{R}^{3d}. \tag{4}$$

Thus, the user's historical click and order behavior sequences, i.e., $C$ and $O$, can be represented as follows:

$$C = (h_{c,1}, h_{c,2}, \ldots, h_{c,n}), \quad C \in \mathbb{R}^{n \times 3d}, \tag{5}$$

$$O = (h_{o,1}, h_{o,2}, \ldots, h_{o,n}), \quad O \in \mathbb{R}^{n \times 3d}, \tag{6}$$

where $n$ denotes the length of the user's click and order sequences.

After embedding, sparse attention is used to capture the user's historical preference under the contextual restriction. The sparse attention takes the embeddings of the user's current context feature and of the click and order sequences as input, and then obtains the most important click and order behaviors in the current context. The sparse attention [36] is defined as follows:

$$\mathrm{SparseAttn}(Q, K, V) = \mathrm{softmax}\left(\mathrm{top}_k\left(\frac{QK^{\top}}{\sqrt{d}}\right)\right)V, \tag{7}$$

where the $\mathrm{top}_k$ operation takes the top $k$ pieces of historical information most relevant to the current context.

Through the sparse attention, we can get the updated embeddings of the user's click and order sequences:

$$\hat{C} = \mathrm{SparseAttn}(Q, K_c, V_c), \quad \hat{C} \in \mathbb{R}^{k \times 3d}, \tag{8}$$

$$\hat{O} = \mathrm{SparseAttn}(Q, K_o, V_o), \quad \hat{O} \in \mathbb{R}^{k \times 3d}, \tag{9}$$

where $Q$ means converting the context features into the query vector, $\{K_c, V_c\}$ denotes converting the user's click sequence into key and value vectors, and $\{K_o, V_o\}$ likewise for the order sequence.

In order to better capture the impact of the user's click sequence on the order sequence from the retrieved click and order aggregation information, we propose a convert attention mechanism that captures these impacts from both local and global perspectives.

From a local perspective, the preference of the user's conversion to store $h_{o,j} \in \hat{O}$ can be characterized by the clicked stores $h_{c,i} \in \hat{C}$ related to where the order was placed:

$$\alpha_{ij} = (W_c \times h_{c,i}) \otimes (W_o \times h_{o,j}), \tag{10}$$

$$\hat{h}_{o,j} = \sum_{i=1}^{k} \frac{\exp(\alpha_{ij})}{\sum_{i'=1}^{k} \exp(\alpha_{i'j})} \times h_{c,i} + h_{o,j}, \quad \hat{h}_{o,j} \in \mathbb{R}^{3d}, \tag{11}$$

where $W_c, W_o \in \mathbb{R}^{3d \times 3d}$ are trainable parameters and $\alpha_{ij}$ represents the correlation between clicked store $i$ and order store $j$. $\hat{h}_{o,j}$ uses the aggregation of the clicked stores' information to obtain the local conversion preference and update the order store information. Here, we use the residual design to retain the original information of the order store.

From a global perspective, the user's preferences for different dimensions (i.e., the store's id, price, distance) of order stores are affected by the relevant information of the clicked stores. Hence, we separate the sub-matrices from the click and order sequences:

$$\hat{C} = (S_{c,id}, S_{c,dis}, S_{c,price}), \quad \hat{O} = (S_{o,id}, S_{o,dis}, S_{o,price}), \quad S_{c,p}, S_{o,q} \in \mathbb{R}^{k \times d}. \tag{12}$$

For each dimension, we calculate the impact of the user's click sequence on the user's order sequence from a global perspective:

$$\beta_{pq} = (W_p \times S_{c,p}) \otimes (W_q \times S_{o,q}), \tag{13}$$

$$\hat{S}_{o,q} = \sum_{p} \frac{\exp(\beta_{pq})}{\sum_{p'} \exp(\beta_{p'q})} S_{c,p} + S_{o,q}, \quad \hat{S}_{o,q} \in \mathbb{R}^{k \times d}, \tag{14}$$

where $p, q \in (id, dis, price)$, $W_p$ and $W_q$ are trainable parameters, and $\beta_{pq}$ represents the correlation between the click sequence in dimension $p$ and the order sequence in dimension $q$. $\hat{S}_{o,q}$ uses the aggregated click information to obtain the global conversion preference and update the order sequence. The residual design is also used in this part.

Finally, the aggregations of the order sequence and the click sequence can be obtained:

$$h_o = \mathrm{Pool}\big(\Vert_{j}(\hat{h}_{o,j}) + \Vert_{q}(\hat{S}_{o,q})\big), \quad h_o \in \mathbb{R}^{3d}, \tag{15}$$

$$h_c = \mathrm{Pool}(\hat{C}), \quad h_c \in \mathbb{R}^{3d}, \tag{16}$$

where $\Vert$ means concatenation of vectors and $\mathrm{Pool}$ aggregates a sequence into a single vector.

Offline Cost Feature Module. In the O2O scenario, offline transportation costs also play an important role in the conversion rate, as users need to go to offline stores. We first construct the user's historical behavior sequences to represent the user's historical click and order transportation costs, and take them as the input of the $L$-layer Transformer encoder:

$$T_c = \mathrm{Transformer}(D_c), \quad T_c \in \mathbb{R}^{n \times d}, \tag{17}$$

$$T_o = \mathrm{Transformer}(D_o), \quad T_o \in \mathbb{R}^{n \times d}, \tag{18}$$

where $D_c$ and $D_o$ are the distance sequences of the user's historical clicks and orders.

We propose a sliding window attention mechanism that uses fixed-length windows to characterize the user's preference for transportation cost in different periods, because this preference varies over time. Note that the mechanism generalizes not only to O2O platform users but also to other scenarios that need to capture users' dynamic preferences across different periods.

Each offline store has a distance feature $d_s \in \mathbb{R}^{d}$; for the current store, we match this feature with the user's historical distance sequence:

$$T_{a,i} = T_a[i : i + w], \quad T_{a,i} \in \mathbb{R}^{w \times d}, \quad a \in \{c, o\}, \tag{19}$$

$$A_{a,i} = \mathrm{softmax}\left(\frac{T_{a,i}\, d_s}{\sqrt{d}}\right), \quad A_{a,i} \in \mathbb{R}^{w \times 1}, \quad a \in \{c, o\}, \tag{20}$$

$$P_a = \sum_{i=1}^{m} A_{a,i} \odot T_{a,i}, \quad P_a \in \mathbb{R}^{w \times d}, \quad a \in \{c, o\}, \tag{21}$$

where $w \in \mathbb{N}$ denotes the window length, $m$ is the number of windows, $T_{a,i}$ denotes the subsequence in the $i$-th window, $P_a$ denotes the user's offline preference matrix along the window-length dimension, and $g_a = \mathrm{Pool}(P_a)$ denotes the user's offline preference vector.

3.4. Cost Combination Module

In this section, we embody the CTR and CVR prediction tasks into a multi-task framework. The input of this module is the concatenation of the outputs from the base module and the entire cost module. $p_{CTR}$ and $p_{CVR}$ are calculated by MLPs, namely the CTR network and the CVR network, respectively:

$$p_{CTR} = \sigma(\mathrm{MLP}_{CTR}([b, h_c, h_o, g_c, g_o])), \tag{22}$$

$$p_{CVR} = \sigma(\mathrm{MLP}_{CVR}([b, h_c, h_o, g_c, g_o])), \tag{23}$$

where $b$ is the output of the base module. To this end, we calculate the post-view click-through & conversion rate (CTCVR) by $p_{CTCVR} = p_{CTR} * p_{CVR}$. The loss function used here is lambda loss [37].

4. Experiments

In this section, we evaluate the model performance of the proposed ECMM. We describe the experimental settings and experimental results as follows.

4.1. Experimental Settings

Datasets. We selected 30 days of exposure logs from August to September, obtained from the online O2O business system, to train the CVR model. We have two test sets: one is a one-day dataset in September and the other is a three-day dataset in October. Since user behavior evolves with time, the closer a test period is to the training period, the closer its user behavior distribution is to that of the training data.

The longer the gap from the training period, the more the user behavior distribution changes. Therefore, the test sets in this experiment can effectively evaluate both the accuracy and the generalization of the model. The number of training samples is approximately 1.1 billion, while the two test sets contain 40 million and 100 million samples, respectively.

Metric. The goal of our ranking task is to provide a list that is more likely to facilitate users' conversion. The evaluation metric used in this paper is NDCG. We have two ranking strategies: sorting by CTR and sorting by CTCVR. So we have NDCG sorted by CTR to predict the real click rate and NDCG sorted by CTCVR to predict the real purchase rate. The calculation criteria are as follows:

$$NDCG = \frac{DCG}{IDCG} = \frac{\sum_{i=1}^{n} (2^{l_i} - 1)/\log_2(1 + i)}{\sum_{i=1}^{|L|} (2^{l_i} - 1)/\log_2(1 + i)}, \tag{24}$$

where $n$ represents the length of the list of stores ranked by the model, $l_i$ represents the label of the sample (click or order, depending on the model task), and $|L|$ represents the number of stores whose label is not zero.

Compared Methods. Our baseline is a highly optimized ESMM model that incorporates a large number of business features and handcrafted features. The total number of features is 473. The embedding dimension is 10. We use sequence features from users' 180-day history, with a sequence length of 50. The number of Transformer layers is 2. Because 80% of users' click sequences are shorter than 10 and their order sequences are shorter than 5, and considering the service performance, the $k$ of the sparse attention is set to 10. The dimension of the MLP used in the base module is 1024, and the dimensions of the four-layer MLP used by the CTR and CVR networks are 512, 256, 128, and 1, with the ELU activation function. All baselines take into account the statistical user features of online and offline costs for fair comparison. We conduct comparative experiments with three categories of methods:

(1) Baselines: a) ESMM [10]. An outstanding multi-task model for learning CTR and CVR in the industry. b) ESMM+DIN [20]. Based on ESMM, users' click sequence feature and the current store feature are introduced by the DIN method.

(2) Ablation: a) ECMM wo offline and convAttn. Based on ECMM, we only use the online convert cost, without convert attention. b) ECMM wo offline. Based on ECMM, we only use the online convert cost. c) ECMM wo online and slidWinAttn. Based on ECMM, we only use the offline convert cost, without sliding window attention. d) ECMM wo online. Based on ECMM, we only use the offline convert cost.

(3) ECMM variants: a) ECMM+dualInfo. Based on ECMM, the convert attention not only converts click sequence information to the order sequence but also converts order sequence information to the click sequence. b) ECMM+sepInput. Based on ECMM, we use the click feature as the input for the CTR network and the order feature as the input for the CVR network.

4.2. Offline Performance

The evaluation metrics used in this paper are CTR-NDCG and CTCVR-NDCG. Table 1 shows the experimental results of the compared methods on the two test sets, from which we have:

For the entire cost module, compared with ESMM, ECMM obtains a 0.35% gain on CTR-NDCG and a 0.38% gain on CTCVR-NDCG 1. All other ablation methods and variants also improve the model performance after modeling users' behavior sequences.

1 For large-scale datasets in industrial recommender systems, the improvement is considerable because of its hardness, and the testing results in Section 4.3 further verify the significant improvement of our proposal.

For the online cost feature, compared with ESMM, ESMM+DIN, which adds the click sequence, shows a certain increase in CTR- and CTCVR-NDCG. As shown in Figure 3, ECMM wo offline and convAttn, which further adds the order sequence, slightly decreases the CTR-NDCG but greatly improves the CTCVR-NDCG. ECMM wo offline indicates that the convert attention mechanism can learn users' order characteristics from click to order.
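The CTR-NDCG and CTCVR-NDCG values discussed in this section follow Eq. (24). A minimal sketch of that computation for a single ranked list is shown below; the function name `ndcg` and the example labels are assumptions for illustration, and the base-2 logarithm is the conventional choice rather than something the text states explicitly.

```python
import math


def ndcg(labels_in_ranked_order):
    """NDCG of one list; labels are given in the order produced by the model."""
    def dcg(labels):
        return sum((2 ** l - 1) / math.log2(1 + i)
                   for i, l in enumerate(labels, start=1))

    # The ideal ordering places all non-zero labels first.
    ideal = sorted(labels_in_ranked_order, reverse=True)
    idcg = dcg(ideal)
    return dcg(labels_in_ranked_order) / idcg if idcg > 0 else 0.0


perfect = ndcg([1, 0, 0])    # the only ordered store is ranked first
shifted = ndcg([0, 1])       # the ordered store is ranked second
```

For CTR-NDCG the labels would be click indicators with the list sorted by predicted CTR; for CTCVR-NDCG they would be order indicators with the list sorted by predicted CTCVR.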

These three methods (ESMM+DIN, ECMM wo offline and convAttn, and ECMM wo offline) show that it is effective to utilize historical features to improve CVR prediction. The convert attention brings 0.18% and 0.19% gains in CTR-NDCG and CTCVR-NDCG, respectively.
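The local part of the convert attention evaluated above (Eqs. 10 and 11 in Section 3.3) can be sketched as follows. This is an illustration under stated assumptions: the $\otimes$ operator is interpreted as an inner product between the projected click and order embeddings, the softmax runs over clicked stores for each order store, and the function name, shapes and identity projections in the example are hypothetical.

```python
import numpy as np


def local_convert_attention(C, O, W_c, W_o):
    """Update each order-store embedding with a softmax-weighted sum of clicked stores.

    C: (n_click, m) clicked-store embeddings, O: (n_order, m) order-store embeddings,
    W_c, W_o: (m, m) projection matrices.
    """
    # Correlation between every clicked store i and order store j
    # (assumption: inner product of the two projections).
    scores = (C @ W_c.T) @ (O @ W_o.T).T              # (n_click, n_order)
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)     # softmax over clicked stores
    # Residual connection keeps the original order-store information.
    return weights.T @ C + O                          # (n_order, m)


rng = np.random.default_rng(0)
v = np.arange(6.0)
C = np.tile(v, (4, 1))            # four identical clicked stores
O = rng.normal(size=(3, 6))       # three order stores
updated = local_convert_attention(C, O, np.eye(6), np.eye(6))
```

Because the attention weights for each order store sum to one, identical clicked stores contribute exactly their shared embedding, so `updated - O` equals `v` in every row; this makes the residual structure easy to verify.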

For the offline cost feature, the ECMM wo online and slidWinAttn model, which uses distance sequence features, brings stronger effects and improves both CTR- and CTCVR-NDCG. As shown in Figure 4, comparing ECMM wo online and slidWinAttn with ESMM shows that the offline transportation cost is indispensable for conversion rate prediction on an O2O platform. The ECMM wo online model, which introduces our proposed sliding window attention, brings greater gains by dynamically matching user preferences across different periods. The sliding window method brings 0.02% and 0.05% gains in CTR-NDCG and CTCVR-NDCG, respectively.
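A minimal sketch of the sliding window attention discussed above is given below. It is an interpretation, not the authors' implementation: windows are assumed to move with stride 1, the attention inside each window is a scaled dot-product softmax against the target store's distance embedding, window results are summed position by position, and the final pooling is assumed to be a sum. The function and variable names are hypothetical.

```python
import numpy as np


def sliding_window_attention(T, q, w):
    """Aggregate an encoded distance sequence with fixed-length windows.

    T: (n, d) encoded distance sequence, q: (d,) target distance embedding,
    w: window length with 1 <= w <= n.
    Returns the (w, d) preference matrix and a pooled (d,) preference vector.
    """
    n, d = T.shape
    if not 1 <= w <= n:
        raise ValueError("window length must satisfy 1 <= w <= n")
    P = np.zeros((w, d))
    for i in range(n - w + 1):                # stride-1 windows (assumption)
        window = T[i:i + w]                   # (w, d)
        scores = window @ q / np.sqrt(d)      # (w,)
        a = np.exp(scores - scores.max())
        a /= a.sum()                          # softmax inside the window
        P += a[:, None] * window              # weight each position of the window
    return P, P.sum(axis=0)                   # sum pooling (assumption)


T = np.ones((5, 4))
P, g = sliding_window_attention(T, q=np.zeros(4), w=3)
```

With a zero query the weights are uniform, so each of the three windows adds one third of its rows; `P` is then a matrix of ones and `g` is a vector of threes, which gives a quick sanity check of the accumulation.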

To explore whether the user's historical orders affect clicks, we further study the ECMM+dualInfo model, in which the order sequence also transmits information to the click sequence. The CTR-NDCG decreases by 0.05% and the CTCVR-NDCG decreases by 0.06%. We also separate the click and order features into the CTR network and the CVR network, respectively, to obtain the ECMM+sepInput model and verify the impact of features on different tasks, and find that separating the features reduces model performance.

To verify that our model generalizes rather than fitting users over a certain period, we further evaluate our method on the test set in October. The results are consistent with the assessment in September. The ECMM results show that considering users' online behavioral and offline transportation regularities helps predict users' current CTR and CTCVR.

4.3. Online Evaluations

An online A/B test was conducted in the recommender system for 7 days in January 2022. For the control group, 10% of users were randomly assigned and served by a recommender system running the highly optimized ESMM algorithm. For the experimental group, 10% of users were randomly selected to use the ECMM method. In the online experiment, we choose CTR and CTCVR as evaluation indicators, where CTCVR represents the purchase rate of each request. The result is shown in Figure 5. Our proposed ECMM method improves the CTR by 0.52% (p-value=0.00<0.05) compared with the baseline model and the CTCVR by 0.73% (p-value=0.02<0.05), which yields a 1.8% (p-value=0.02<0.05) increase in total revenue. Here, a 1.8% increase in total revenue against a 0.45% increase in CTCVR means that the model provides users with a higher-priced list. So far, the ECMM method has been applied to the main online traffic and has served hundreds of millions of users, bringing a significant increase in the total revenue of Meituan.

5. Conclusion

In this paper, inspired by users' sequential behaviors on O2O platforms, a novel model is proposed to predict the conversion rate. Further, we introduce convert attention and sliding window attention in the cost module to learn users' online behavioral regularity and offline transportation regularity. Offline experiments have demonstrated the effectiveness of our proposed method in learning users' conversion from the click sequence to the order sequence, and the accuracy of the ranking list is improved as evaluated by NDCG. Online experiments show that the ECMM method has a significant effect on improving the total revenue of the O2O platform. For now, the ECMM method has been applied to the main online traffic, bringing a significant increase in the total revenue of the enterprise.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (NSFC) under Grants 72071029, 71974031 and 72231010. This research was also supported by Meituan.

References

[1] Ding, Tang, T. Liu, Xu, Zhang, Shi, Jiang, Shen, Infer implicit contexts in real-time online-to-offline recommendation, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '19, Association for Computing Machinery, New York, NY, USA, 2019, p. 2336–2346. URL: https://doi.org/10.1145/3292500.3330716. doi:10.1145/3292500.3330716.

[2] Li, Shen, Bart, Local market characteristics and online-to-offline commerce: An empirical analysis of groupon, Management Science 64 (2018) 1860–1878.

[3] Kawanaka, Moriwaki, Uplift modeling for location-based online advertising, in: Proceedings of the 3rd ACM SIGSPATIAL International Work[...]cial Networks and Geoadvertising, LocalRec '19, Association for Computing Machinery, New York, NY, USA, 2019.

[4] M.-H. Park, J.-H. Hong, S.-B. Cho, Location-based [...] conference on ubiquitous intelligence and computing, Springer, 2007, pp. 1130–1139.

[5] Yang, T. Liu, Sun, E. Bertino, Exploring the interaction effects for temporal spatial behavior prediction, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM '19, Association for Computing Machinery, New York, NY, USA, 2019, p. 2013–2022. URL: https://doi.org/10.1145/3357384.3357963. doi:10.1145/3357384.3357963.

[6] J. Bao, Y. Zheng, M. F. Mokbel, Location-based and preference-aware recommendation using sparse geo-social networking data, in: Proceedings of the 20th International Conference on Advances in Geographic Information Systems, SIGSPATIAL '12, [...] NY, USA, 2012, p. 199–208. URL: https://doi.org/10.1145/2424321.2424348. doi:10.1145/2424321.[...]

[7] Huang, Hu, Tang, Chen, Qi, J. Cheng, [...] ctr prediction, in: Proceedings of the 44th Inter[...] Development in Information Retrieval, 2021, pp. [...]

[8] Ping, Gao, Liu, Du, Luo, Jin, Li, [...] in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 3472–3482.

[9] Fang, Gu, Luo, Xu, Contemporaneous and delayed sales impact of location-based mobile promotions, Information Systems Research 26 (2015) 552–564.

[10] Ma, L. Zhao, G. Huang, Z. Wang, Z. Hu, X. Zhu, K. Gai, Entire space multi-task model: An effective approach for estimating post-click conversion rate, in: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR '18, Association for Computing Machinery, New York, NY, USA, 2018, p. 1137–1140. URL: https://doi.org/10.1145/3209978.3210104. doi:10.1145/3209978.3210104.

[11] Wen, Zhang, Wang, Lv, Bao, Lin, K. Yang, Entire space multi-task modeling via post-click behavior decomposition for conversion rate prediction, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and De[...] Computing Machinery, New York, NY, USA, 2020, p. 2377–2386. URL: https://doi.org/10.1145/3397271.3401443.

[12] H. Wen, J. Zhang, F. Lv, W. Bao, T. Wang, Z. Chen, [...]diction, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, New York, NY, USA, 2021, p. 2187–2191. URL: https://doi.org/10.1145/3404835.3463053.

[13] Q. Lu, S. Pan, L. Wang, J. Pan, F. Wan, H. Yang, A practical framework of conversion rate prediction for online display advertising, in: Proceedings of the ADKDD'17, ADKDD'17, Association for Computing Machinery, New York, NY, USA, 2017. URL: https://doi.org/10.1145/3124749.3124750. doi:10.1145/3124749.3124750.

[14] Tong, Xu, Yan, Xu, Impact of different platform promotions on online sales and conversion [...] length, Decision Support Systems (2022) 113746.

[15] S. Guo, L. Zou, Y. Liu, W. Ye, S. Cheng, S. Wang, H. Chen, D. Yin, Y. Chang, Enhanced Doubly Robust Learning for Debiasing Post-Click Conversion Rate Estimation, Association for Computing Machinery, New York, NY, USA, 2021, p. 275–284. URL: https://doi.org/10.1145/3404835.3462917.

[16] X. Pan, M. Li, J. Zhang, K. Yu, L. Wang, H. Wen, C. Mao, B. Cao, Conversion rate prediction via meta learning in small-scale recommendation scenarios, arXiv preprint arXiv:2112.13753 (2021).

[17] H. Wang, Z. Li, X. Liu, D. Ding, Z. Hu, P. Zhang, C. Zhou, J. Bu, Fulfillment-time-aware personalized ranking for on-demand food recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 4184–4192.

[18] D. Xi, Z. Chen, P. Yan, Y. Zhang, Y. Zhu, F. Zhuang, Y. Chen, Modeling the sequential dependence among audience multi-step conversions with multi-task learning in targeted display advertising, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 3745–3755.

[19] F. Xiao, L. Li, W. Xu, J. Zhao, X. Yang, J. Lang, H. Wang, Dmbgn: Deep multi-behavior graph networks for voucher redemption rate prediction, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 3786–3794.

[20] G. Zhou, X. Zhu, C. Song, Y. Fan, H. Zhu, X. Ma, Y. Yan, J. Jin, H. Li, K. Gai, Deep interest network for click-through rate prediction, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '18, Association for Computing Machinery, New York, NY, USA, 2018, p. 1059–1068. URL: https://doi.org/10.1145/3219819.3219823. doi:10.1145/3219819.3219823.

[21] G. Zhou, N. Mou, Y. Fan, Q. Pi, W. Bian, C. Zhou, X. Zhu, K. Gai, Deep interest evolution network for click-through rate prediction, volume 33, 2019, pp. 5941–5948. URL: https://ojs.aaai.org/index.php/AAAI/article/view/4545. doi:10.1609/aaai.v33i01.33015941.

[22] C. Li, Z. Liu, M. Wu, Y. Xu, H. Zhao, P. Huang, G. Kang, Q. Chen, W. Li, D. L. Lee, Multi-interest network with dynamic routing for recommendation at tmall, in: Proceedings of the 28th ACM international conference on information and knowledge management, 2019, pp. 2615–2623.

[23] Q. Pi, W. Bian, G. Zhou, X. Zhu, K. Gai, Practice on long sequential user behavior modeling for click-through rate prediction, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 2671–2679.

[24] K. Ren, J. Qin, Y. Fang, W. Zhang, L. Zheng, W. Bian, G. Zhou, J. Xu, Y. Yu, X. Zhu, et al., Lifelong sequential modeling with personalized memorization for user response prediction, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 565–574.

[25] Q. Tan, J. Zhang, J. Yao, N. Liu, J. Zhou, H. Yang, X. Hu, Sparse-interest network for sequential recommendation, in: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021, pp. 598–606.

[26] K.-c. Lee, B. Orten, A. Dasdan, W. Li, Estimating conversion rate in display advertising from past performance data, in: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012, pp. 768–776.

[27] O. Chapelle, Modeling delayed feedback in display advertising, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 2014, pp. 1097–1105.

[28] Q. Lu, S. Pan, L. Wang, J. Pan, F. Wan, H. Yang, A practical framework of conversion rate prediction for online display advertising, in: Proceedings of the ADKDD'17, 2017, pp. 1–9.

[29] R. Xie, C. Ling, Y. Wang, R. Wang, F. Xia, L. Lin, Deep feedback network for recommendation, in: Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, 2021, pp. 2519–2525.

[30] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017).

[31] Y. Feng, F. Lv, W. Shen, M. Wang, F. Sun, Y. Zhu, K. Yang, Deep session interest network for click-through rate prediction, in: IJCAI, 2019.

[32] C. Li, Z. Liu, M. Wu, Y. Xu, P. Huang, H. Zhao, G. Kang, Q. Chen, W. Li, Lee, Multi-interest network with dynamic routing for recommendation at tmall, Proceedings of the 28th ACM International Conference on Information and Knowledge Management (2019).

[33] Z. Xiao, L. Yang, W. Jiang, Y. Wei, Y. Hu, H. Wang, Deep multi-interest network for click-through rate prediction, Proceedings of the 29th ACM International Conference on Information & Knowledge Management (2020).

[34] T. Natarajan, S. A. Balasubramanian, D. Kasilingam, Understanding the intention to use mobile shopping applications and its influence on price sensitivity, Journal of Retailing and Consumer Services 37 (2017) 8–22.

[35] D. Clevert, T. Unterthiner, S. Hochreiter, Fast and accurate deep network learning by exponential linear units (elus), in: Y. Bengio, Y. LeCun (Eds.), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016. URL: http://arxiv.org/abs/1511.07289.

[36] G. Zhao, J. Lin, Z. Zhang, X. Ren, X. Sun, Sparse transformer: Concentrated attention through explicit selection, 2020. URL: https://openreview.net/forum?id=Hye87grYDH.

[37] X. Wang, C. Li, N. Golbandi, M. Bendersky, M. Najork, The lambdaloss framework for ranking metric optimization, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 1313–1322.