PSAC: Context-based Purchase Prediction Framework via User's Sequential Actions

Wei-Cheng Chen, Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan (jimmyweicc@citi.sinica.edu.tw)
Chih-Yu Wang, Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan (cywang@citi.sinica.edu.tw)
Su-Chen Lin, Verizon Media, Taipei, Taiwan (suchenl@verizonmedia.com)
Alex Ou, Verizon Media, Taipei, Taiwan (oualex@verizonmedia.com)
Tzu-Chiang Liou, Verizon Media, Taipei, Taiwan (tcliou@verizonmedia.com)

ABSTRACT

Along with the daily operation of e-commerce web services, a significant quantity of data has been recorded. Research on user behavior based on such collected data has attracted intense attention, since it allows services to accurately match customers' needs and predict purchase actions. Traditionally, most studies utilize only the behavioral instances between users and products, i.e., browse or click history and session status. However, these features provide only fundamental knowledge about a given user rather than the rationale behind their actions. We find that the query should play an important role as well, as it is the main entry point for users arriving at an e-commerce website. Since users utilize queries to decide the direction of succeeding events, the semantic meanings of these queries demonstrate a particular link with the actions.

In this paper, we propose the Prediction framework that analyzes User's Sequential Actions via Context (PSAC), which exploits the connection between users' search keywords and behaviors to investigate their ultimate intention on an e-commerce website and to improve purchase prediction accuracy. We utilize the e-commerce dataset provided by Yahoo Taiwan, one of the largest web service providers in Taiwan. Based on our preliminary analysis, we design a session-based structure to deal with the environment-shifting (influenced by coexisting fashion) and experience-shifting (changed through the user's actions) issues that we observed in the dataset. In the simulation section, we apply two deep learning frameworks to perform the prediction task. Experimental results confirm that queries serve as a critical factor in perceiving a user's purchasing intention. Moreover, the proposed framework significantly improves the prediction accuracy compared with baseline methods.

CCS CONCEPTS

• Information systems → Electronic commerce; Recommender systems; Data analytics; • Applied computing → Online shopping;

KEYWORDS

user behavior analysis, e-commerce purchase prediction, deep learning

Copyright © 2019 by the paper's authors. Copying permitted for private and academic purposes. In: J. Degenhardt, S. Kallumadi, U. Porwal, A. Trotman (eds.): Proceedings of the SIGIR 2019 eCom workshop, July 2019, Paris, France, published at http://ceur-ws.org

1 INTRODUCTION

1.1 Background

Along with the evolution of the Internet, electronic commerce (e-commerce) has generated enormous interest in the past few years. Global retail e-commerce market sales recorded a compound annual growth rate (CAGR) of around 20.77% from 2014 to 2018 and are expected to sustain that pace until 2021 [1]. To cope with the increasing competition, companies and researchers race to develop strategies for matching customers' needs and optimizing companies' profits. Some examples are personalized advertisement [17], upselling to existing customers [10], exclusive offers, and account-based marketing [16]. However, these solutions often require the ability to perceive the customer's intention in order to gain an advantage over competing platforms.

Recently, researchers have explored the possibility of identifying users' intentions through the interactions between users and products, such as query terms and browsing/purchasing history. A query term, for instance, roughly illustrates the needs the user would like to satisfy. Once the intent of the user is identified, the system can redirect the user to proper product recommendations. For example, Adhikari et al. [2] employ Customer Interaction Networks (CIN) to investigate the relationships between queries and indicate that these relations can significantly improve search quality. On the other hand, Kumar et al. [11] utilize user interaction data from a mobile application, such as activity properties, dwell time, and query contents, to learn the quality of a search engine's feedback.

Besides queries, researchers have also studied the other types of behaviors that can be collected and analyzed on e-commerce websites, such as clickstreams, product quality, and user profiles. In particular, Huang et al. [6] examine mobile phone users' behavior across different e-commerce platforms. In their study, they claim that purchase decisions tend to happen promptly, and that spatiotemporal factors such as time, region, and platform influence shopping behaviors. Eventually, their simulation indicates that a user's e-commerce platform preference is predictable. Similarly, Zhou et al. [20] show that it is possible to improve prediction performance by utilizing micro behaviors such as comments, carting, and ordering. Ni et al. [14] aim at constructing a universal user representation via multiple e-commerce tasks. After modeling behavior and item features into sequences of user behaviors, they employ a deep Recurrent Neural Network (RNN) architecture and attention-based techniques for multi-task learning on ten days of data. Additionally, Lo et al. [12] utilize extended real-world data to build a cross-platform analysis of user behavior for more general purposes. They discover evidence that purchase intent gradually increases through time and soars significantly three days before the purchase. The remaining studies discuss topics including sparseness in the e-commerce context [8], exploration of user-item pairwise relationships [15], and online advertising [19], et cetera.

1.2 Rationale of Our Approach

From the previous literature, we see that research on purchase intention prediction varies mainly with the datasets' attributes. Nevertheless, approaches that employ queries to perceive purchase intention are still scarce. In this study, we collaborate with Yahoo Taiwan, which provides its e-commerce platform's data. In particular, we collect a dataset describing various behaviors of its users, e.g., actions (click, view, buy), connection properties (timestamp, time spent), and the most critical piece of information: the entered queries. Different from most existing research, this paper focuses on the construction of user representations via users' online actions to cope with e-commerce purchase prediction. Moreover, we utilize the relationship between users' textual inputs and their actions to realize their intention. Furthermore, we propose that user behavior is usually time-sensitive in two manners, namely environment-shifting and experience-shifting. These phenomena illustrate the evolution of the user's actions through the purchase activity, which occurs on actual e-commerce platforms.

According to the above observations, we establish the Prediction framework that analyzes User's Sequential Actions via Context-based information (PSAC), a prediction system built on a session-based representation which records the behavioral sequence of each user. Meanwhile, we apply corpus embedding techniques to encode the latent information of the entered queries and derive related attributes from the sequence of entered queries. Finally, the system utilizes two deep learning frameworks to analyze, store, and predict a user's intention under different scenarios. The simulation results indicate that the sequence of query contents is an essential feature in learning users' shopping intentions, and they point out the differences between the two frameworks.

Our main contributions in this paper include:

(1) We implement a Prediction framework that analyzes User's Sequential Actions via Context-based information, which predicts the user's purchase intention via query-based information and the corresponding actions.
(2) We investigate the behavior of real-world users in e-commerce activities, which provides further information for predicting the user's purchase intention.
(3) We confirm that query factors are critical in intent and action predictions, which can be utilized in future works and industry applications.

The remainder of the paper proceeds as follows. Section 2 consists of three parts: a description of the datasets, the investigation of the data, and the preprocessing procedures. Section 3 describes the implementation of the e-commerce purchase prediction system via the time-shifting scheme and the deep learning frameworks. Finally, Section 4 examines the simulation results, and Section 5 presents the conclusions.

2 DATASETS EXPLORATION

In this paper, we use a dataset provided by Yahoo Taiwan which contains a variety of behaviors of its e-commerce platform users. Most importantly, the data not only document users' activities on the platform but also save their search queries, the textual expressions used by the customers to find the products they want.

2.1 E-commerce Dataset Preparation

Our sample dataset consists of real-world transaction records from the e-commerce platform, spanning six months. Specifically, the dataset consists of three types of information — click, view, and buy — covering the online actions of over a million unique users, with 12M, 10M, and 1.2M records respectively. Both click and view records specify the query terms the user entered before such actions. For the later merging procedure, we label each record in the dataset with a session id s which specifies the connection it belongs to. The dataset contains the following information:

2.1.1 Click Records. Each click record represents the click action of a specific user after he/she enters an arbitrary search query. Explicitly, numerous pieces of information are stored, i.e., the user's id, the entered queries, the product's id, and the time the action occurs. Because the click behavior demonstrates a one-to-one connection between the entered query and a product, it can be seen as a strong indication that the user has a great interest in the product.

2.1.2 View Records. Similar to click records, view data record the same set of features as click data. Nevertheless, there is a slight difference in the product attributes. In each record, a view action stands for the operation of a user skimming through a shopping page. Compared with click actions, which cover only a single product, a view record saves the plural items on a single page after searching for arbitrary keywords.

2.1.3 Buy Records. Finally, the buy dataset describes each purchase occurrence during the targeted period. As for the data information, each buy record consists of all the previous attributes except for the entered query. We list each dataset's attributes in Table 1.

Table 1: Features in each dataset

Property    Symbol  Explanation                                  Click  View  Buy
user id     u       user ID of the current session               o      o     o
session id  s       session ID of the current session            o      o     o
start time  t_s     starting time of the current action          o      o     o
query       q       entered query or corresponding queries of    o      o     x
                    the current action
product id  p       product ID of the current action             o      o     o

2.2 Exploratory Data Analysis

To obtain a brief knowledge of users' behavior on the Yahoo Taiwan e-commerce platform, we conduct an initial analysis on three topics: (i) query usage, (ii) action behavior, and (iii) session statistics. Furthermore, we examine the analysis on a monthly basis.

2.2.1 Query Usage. First of all, we find that for each month in the dataset, the number of unique queries ranges from 210K to 260K. Around 70% of the queries appear only once in the dataset. Beckmann et al. [3] also report this phenomenon: users tend to define personal keywords to describe the products. Since extensive data like this may cause oversized-dimension problems during the embedding process, we define two terms to tackle the issue, i.e., the Count Ratio (CR_{m,C}) and the Space Ratio (SR_{m,C}) of each month m. We present the calculations in Eq. (1a) and (1b) and the bar charts in Figure 1(a) and 1(b).

  CR_{m,C} = |{ q_i ∈ Q_m : c_{i,m} ≥ C }| / n_m,                         (1a)

  SR_{m,C} = ( Σ_{i=1}^{n_m} σ_i · c_{i,m} ) / ( Σ_{i=1}^{n_m} c_{i,m} ),  (1b)

where n_m is the number of unique queries in month m, Q_m = { q_i | i = 1...n_m } is the set of unique queries in m, c_{i,m} is the number of occurrences of q_i in m, σ_i = 1 if c_{i,m} ≥ C and σ_i = 0 otherwise, and C is a predefined constant ranging from one to five.

Figure 1: Count ratio CR_{m,C} and Space ratio SR_{m,C} under month m and Count C. ((a) Count ratio (CR) of each month; (b) Space ratio (SR) of each month; bars for C ≥ 2 through C ≥ 5, months 1-6.)

In 1(a), each segment in the chart represents the number of distinct queries q_i that occur at least C times in month m. We can see that there is a similar distribution across the six months. Around 30% of the queries appear at least two times, 20% and 13% of the queries occur at least three and four times, and lastly, 10% of the queries appear five or more times. These figures not only confirm that most users use an independent query in their search actions but also demonstrate the diversity of e-commerce searching. Moreover, we define the Space Ratio in 1(b) to illustrate the queries' coverage ratio. For instance, SR_{m,5} computes the coverage ratio of the queries that appear at least five times in month m. We can see that across all months, although CR_{m,C} diminishes as we increase C, the corresponding SR_{m,C} retains 80% of the corpus space. The results indicate that users on e-commerce platforms tend to reuse queries during their search events. Furthermore, it is conceivable and economical to learn the configuration of the e-commerce query data via a compact dataset that preserves only the queries with multiple counts.

2.2.2 Action Behavior. Customers' actions exhibit various clues as well. Figure 2 describes the time difference between the start of an action (click/view) and the action of adding the product to the cart. Specifically, the difference is computed via the action's t_s and the corresponding buy's t_s (after matching on the same u and p).

Figure 2: Different behavior of Click and View with the following carting. ((a) click-to-buy difference in minutes; (b) click-to-buy difference in hours; (c) view-to-buy difference in minutes; (d) view-to-buy difference in hours.)

Furthermore, for both click and view actions, we separate the figures into minutes and hours sections. Additionally, the first bars within 2(b) and 2(d) include all the elements of 2(a) and 2(c). Comparing 2(a) with 2(c), we can see that around 35% of click actions have a gap of under 10 minutes with the following buy action. On the other hand, nearly 20% of view actions lead to a buy action within 10 minutes. Moreover, a significant difference between 2(b) and 2(d) is located at the last bars, which depict the portion of users carting the products after twenty-four hours. For view actions, there is a slight excess in carting after a day compared to click actions. These observations suggest two arguments: (i) a click action induces a buy action faster than a view action does; and (ii) a user who is still browsing the platform might require more time to make the purchase decision. These indications are useful for companies when designing different strategies for drawing customers' intent.

2.2.3 Session Statistics. Lastly, we investigate the number of queries used by users with different intentions. To begin with, we combine our click and view records via the session id s and mark each action as purchased or non-purchased according to the buy dataset. The details of the merging will be presented in Section 2.3. Afterward, we divide all the constructed sessions into two parts, purchased sessions and non-purchased sessions. Figure 3 describes the related information.

In the Web industry, a session defines the login event after a user connects to the hosting server. Specifically, we record all of the activities during a given session to illustrate the user's behavior, which is beneficial for learning the development of their intentions dynamically. According to our dataset, Yahoo Taiwan specifies its session as a fixed period after any user creates a connection to their server. If an individual user browses the pages for longer than the given period, they will create multiple sessions according to the actual time spent.
Figure 3: Length between different types of sessions. L stands for the number of actions in the given session. ((a) all sessions; (b) buy = True; (c) buy = False; bars for L = 1...5, 6-10, 11-20, 21-30, 31-50, and others, months 1-6.)

It is obvious from the difference between 3(b) and 3(c) that about 40% of the non-purchased users use only one query during their session. On the contrary, only 15-25% of the purchased users use one query through their session. Moreover, for the users who eventually purchase the product, around 50% of the customers generate a medium or large sequence of actions (L ≥ 6) within a single session. The distribution of session lengths demonstrates that customers who make the deal at the end have more thoughts and reactions to achieve their goal in the given operation time.

Like most financial issues, one of the primary problems in e-commerce purchase prediction is the existence of imbalanced data [4]. Typically, most platform users have no definite intention before they start searching for products. It is more probable that they were either entertained by other websites' commercials or kept pondering over the purchasing decision. As a result, a majority of users leave the website without making a purchase. For example, the size of the non-purchased records in our dataset is 22 to 30 times the size of the purchased records. Without balancing the training data, preliminary results are likely to be overrated. Therefore, we perform the random over-sampling approach on the training data, that is, we randomly multiply the true cases until the numbers of both cases are similar. As for the testing set, we do not balance the data since it needs to fit the circumstances of the real world.

2.3 Data Preprocessing

To utilize the e-commerce data, we illustrate the procedure for merging the separate datasets in this section. We first utilize the session id s to merge the action data into a unified user behavior dataset.

The reason why we construct the data on a session level is the rationale behind searching conventions. In the real world, a user's searching habits and related actions are usually regarded as time-sensitive, closely connected with prior and later instances. This phenomenon manifests in two aspects: (i) experience-shifting and (ii) environment-shifting. As for the first concept, it is well recognized that humans often develop their interests as time shifts. For example, when entering a shop, a customer might initially search for an item using a general description, such as shoes, bags, or computer. Typically, people take these actions while wandering and waiting; they either describe few details about the needed item in the first place or have not yet realized the subject they are searching for. Furthermore, they might want to compare the same kind of products manufactured by different companies. Such actions are likely to last for a while until a suitable commodity is found. Usually, customers tend to check their favorite brand or look for specific functionality and appearance. In addition, some buyers will calculate their affordable budget. For example, a user who starts the search with bags might end up checking Saint Laurent leather shoulder bag or backpack 13' notebook waterproof. Consumers who already have a preference are likely to finish the previous process in less time. Also, this behavior exists not only in offline business but also on online platforms. With the advantage of technology, customers have more resources and information from the Internet, which makes the experience cycle more transient. Moreover, this process is often accompanied by several clicking actions which help the user get more product details. As for the second concept, we will describe the rationale in Section 3.3.

To construct a session-level behavior sequence, we perform several preprocessing steps on the raw data. At first, we need to define two groups of features from our dataset: the first set describes the user's actions after establishing a connection. The other group of features indicates the textual contents of the users' queries, which requires Natural Language Processing (NLP) techniques to construct the latent representation.

2.3.1 User Behavior Sequence (UBS). To illustrate the history of movements after each user logs onto the server, we transform the session-based data into the User Behavior Sequence (UBS) to represent each session's properties. During the merging process, we first define the click and view records as primary actions and the buy records as supportive information. Then, we combine the primary actions having the same session id s and sort all of the records by the primary actions' start time t_s. As for the supportive information, we employ the time information to check whether each action leads to a purchase event. If a primary action's start time t_s precedes a buy record's start time, and the two share the same user id u and product id p, we mark the primary activity as purchased. Moreover, we ignore repeated primary actions, defined as two consecutive actions having a time gap of fewer than 5 seconds. In general, this phenomenon is more likely a consequence of a network error or double-clicking than of actual intent. For each logged user, we also sort his/her purchasing records by start time before the merging process.

Table 2: Advanced terms

Property             Symbol  Explanation
time gap             t_d     The difference between the current action (i)'s and the
                             previous action (j)'s starting times, t_s^i and t_s^j. For
                             the last action in each UBS, we assign a dummy value τ
                             as the difference.
action type          β_i     The type of the current action, either click or view.
query length         l_i1    The length of the entered query q_i.
query amount         l_i2    The number of terms after q_i has been tokenized.
word2vec             w_qi    A value representing the word2vec layer of the given
                             query q_i. Based on the computation approach, denoted
                             as w_avg or w_full.
word2vec similarity  w_s     A value representing the similarity of the word2vec
                             layers of the current action (i) and the previous action
                             (j), w_qi and w_qj. It is computed with the cosine
                             similarity. If the current action is the first action,
                             the value is set to one.

2.3.2 Context-aware Latent Representation. After constructing the structure of the UBS, we turn our attention to the query information in the dataset. To utilize the semantic features of each query, we perform several NLP techniques to construct the queries' latent representation and integrate those features with the UBS.

First, we construct the query corpus from all the sequences in the UBS. For each sequence, we concatenate each primary action's entered queries into a sequence and tokenize the sequence using white space. Besides, there is no stop-word replacement or term normalization (stemming and lemmatization) during the preprocessing, since the existence of redundant inputs is typical behavior in e-commerce. After these operations, we build up a corpus consisting of over 165M search sequences in a more structured form. Nevertheless, the corpus space is too large to utilize, which requires further processing.
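To make the query-level attributes of Table 2 concrete, the sketch below computes the query length l_i1, the query amount l_i2, and the cosine similarity w_s between averaged embeddings. The three-dimensional embedding table is fabricated purely for illustration; in the paper the vectors come from word2vec trained on the full query corpus.

```python
import math

# Fabricated 3-dimensional token embeddings; stand-ins for trained
# word2vec vectors (which are 100-dimensional in the paper).
EMB = {
    "leather": [0.9, 0.1, 0.0],
    "bag":     [0.8, 0.2, 0.1],
    "laptop":  [0.0, 0.9, 0.4],
}

def query_features(query):
    """l_i1: character length of the query; l_i2: number of tokens."""
    return len(query), len(query.split())

def avg_vector(query):
    """w_avg: average the embeddings of the query's known tokens."""
    vecs = [EMB[t] for t in query.split() if t in EMB]
    if not vecs:
        return [0.0] * 3
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(3)]

def cosine(a, b):
    """w_s: cosine similarity between two query representations."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

A repeated query yields w_s = 1, while unrelated queries (e.g., leather bag versus laptop) score low — the kind of signal that tracks how a user's intent drifts between consecutive actions.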
To generate the embedded representation, we train word2vec [13] to project the input corpus into a lower dimension and recognize the latent representation. Based on the conclusion in Section 2.2.1, we only take the words which occur at least five times in the corpus into consideration. Furthermore, we empirically set the target dimension to 100, which is significantly smaller than the original vector space.

After generating the representation of the current corpus, we extend the constituents of the UBS and compute several context-related attributes. For each occurrence of a search, we assign two query-related features, (i) the query length l_i1 and (ii) the query amount l_i2. As for the query contents, we concatenate the embedded representations of the linked q_i via two approaches. The first approach appends the average of the complete semantic embedding, while the second approach considers the representation thoroughly; they are denoted as w_avg and w_full respectively. Since we would like to investigate whether each action of the user gradually affects their final decision, we further define a parameter pos to state the query's position inside the linked session. For a later position, it is plausible that the user has a more precise thought about the product he/she wants and a more thoroughgoing intention about making the order or not. The definitions of the advanced terms and context-related attributes are demonstrated in Table 2.

Table 2 (continued)

Property           Symbol  Explanation
word2vec variance  w_v     A value representing the variance of the word2vec layers
                           of all the entered queries in an input sequence. If the
                           input sequence has only one action, the value is set to
                           zero.
position           pos     A value representing the current action's position within
                           the belonging session.

2.3.3 The Input Layer Settings. To understand the effects of each feature, we construct the input layer under the following contexts. For the baseline in the comparison, we only consider the UBS's standard features, i.e., the time gap t_d and the action type β_i. As for the remaining settings, we concatenate the baseline feature vector with the following combinations of advanced terms:

(1) UBS: each record consists of n actions and the time the user spent (t_d + β_i);
(2) UBS+: each record is the concatenation of UBS, l_i1, and l_i2;
(3) PSAC_a: each record is the concatenation of UBS+, w_avg, w_s, and w_v;
(4) PSAC_f: each record is the concatenation of UBS+, w_full, w_s, and w_v;
(5) PSAC+_f: each record consists of PSAC_f and pos.

Since the main purpose of this paper is to probe the effectiveness of the query-based dataset in perceiving the user's purchase intention, we ignore the context-related features in the baseline method. For the second and third combinations, we consider several query-related features, including the query length and amount, the latent representation, the similarity, and the variance. Moreover, we apply different computation approaches to incorporate the semantic values. Finally, the position attribute is added to the vector to examine the process by which the user reshapes their thoughts during the connection.

3 PURCHASING PREDICTION SYSTEMS VIA A TIME-SHIFTING STANDARD

In this section, we propose the Prediction framework that analyzes User's Sequential Actions via Context (PSAC). Firstly, we define several situations to apprehend human behavior and the concept of time-shifting mentioned in the previous sections, including (i) experience-shifting and (ii) environment-shifting. Specifically, we consider the evolution of query reshaping and a data process that takes contemporary fashion into account. For the experiments, we apply two deep learning models, (i) the Deep Neural Network (DNN) and (ii) Long Short-Term Memory Networks (LSTM), between which we observed opposite views.

Figure 4: Framework of PSAC. (Pipeline: raw data concatenation into the UBS; sequential segmentation into train/test segments; UBS and QLR construction (N_GRAM and SUB_GRAM); input and embedding layer; DL layer (DNN/LSTM) producing True/False predictions for the baseline and PSAC settings, with the buy labels.)

3.1 Framework overview

In short, we illustrate the framework of PSAC in Figure 4. We first combine the view and click instances with the session id s. The preliminary output is used to construct the UBS. Next, to deal with environment-shifting, we apply sequential segmentation and divide the data into training and testing sets to simulate the actual conditions. For each segment, we extract two parts of the data, namely, the query corpus and the action sequences. As for the queries, we apply context embedding techniques to project the corpus into a lower-dimensional representation. As for the actions, we label each action as purchased or non-purchased based on the buy dataset's start time t_s, user id u, and product id p. After merging each action in the UBS with its corresponding textual features, we compute several advanced terms, e.g., the similarity, variance, and position attributes. Finally, we deal with the imbalance in the e-commerce data and establish different scenarios to investigate the effect of the query-based features. In the experiment, we implement the proposed system under two deep learning frameworks (DNN and LSTM) for purchase prediction.

Figure 5: N_GRAM M and SUB_GRAM µ: a sketch of experience-shifting when M equals four. (A five-action UBS splits into two subsequences, action_1-4 and action_2-5; each subsequence yields the SUB_GRAM series with µ = 1...4 trailing actions, the final action being the prediction target.)
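The class-balancing step named in the overview — random over-sampling of the minority (purchased) class, applied to the training split only — can be sketched as follows; the helper name and the (features, label) pair format are assumptions made for illustration.

```python
import random

def oversample(pairs, seed=7):
    """Random over-sampling: duplicate minority-class examples until both
    classes have the same size. `pairs` is a list of (features, label)
    tuples with label True for purchased cases. The testing split is left
    untouched so that it keeps the real-world imbalance."""
    rng = random.Random(seed)
    pos = [p for p in pairs if p[1]]
    neg = [p for p in pairs if not p[1]]
    minority, majority = sorted((pos, neg), key=len)
    if not minority:               # degenerate case: only one class present
        return list(pairs)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced
```

With 22-30 non-purchased records per purchased record (Section 2.2.3), this step multiplies the true cases until the two classes are of similar size before training.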
Table 3: Number of subsequences

M   Training      Testing
1   32,848,118    14,077,765
2   26,045,458    11,162,339
3   21,767,795     9,329,055
4   18,653,252     7,994,251
5   16,241,526     6,960,654

3.2 Experience-shifting of UBS

Recall that, in the real world, customers shape their ideas through time and experience when searching on the Internet. With the session-based data structure, we use the following settings to account for this property. After each construction of the input layer, we define two parameters: N_GRAM (M) and SUB_GRAM (µ), where µ = 1, ..., M. With a given M, we first gather every UBS with length n where n ≥ M. Each UBS is then separated into n − M + 1 subsequences, and we denote the set of subsequences as UBS_{n,M}; each subsequence contains M actions. Furthermore, we regard each instance in UBS_{n,M} as an independent action series, and the final position in each series is the prediction target for purchase intention. We illustrate the construction of UBS_{5,4} in Figure 5: a UBS with five actions separates into two subsequences when M is four (actions 1–4 and actions 2–5). Secondly, both subsequences generate four action series, where each series consists of the last µ = 1, 2, 3, 4 continuous actions. Such a design allows us to understand whether more actions benefit the proposed system in learning the user's intention. The number of subsequences built from our data is reported in Table 3.

3.3 Environment-shifting of UBS

We then introduce the idea of environment-shifting, that is, the latest trend or craze toward certain products in the market. Practically, people are likely to focus on the hottest products or the latest fashion in the market. Furthermore, companies often amplify the current trend by releasing commercials or discounts whenever they want to attract potential consumers. To simulate this real-world condition, we utilize a rolling-based approach to train the model separately on each segment. Explicitly, we segment the data into several pairs of subsets for the simulation: the first subset covers x days of data to build the UBS for the training set, and the other covers the consecutive y days of data to construct the UBS for the testing set. We execute the training process separately until all segments are done and aggregate the results to obtain the whole picture. We demonstrate the process in Figure 6. Moreover, we balance the training set to avoid overfitting and retain the imbalance in the testing set to reflect reality. In sum, the advantages of sequential segmentation are 1) it is more reasonable for companies to use prior data as the training set for subsequent data, because we cannot foresee the behavior of future users in reality, and 2) it is more manageable, since each training run utilizes a smaller dataset compared with the original one. In this paper, we empirically set the periods x and y to 28 and 14 days. Eventually, we divide our dataset into twelve pairs of subsets, where each pair has one month of data as the training set and the following half month of data as the testing set.

[Figure 6: Sequential segmentation to capture contemporary trends — rolling windows in which each training set covers x days and the consecutive testing set covers the following y days.]

PSAC: Context-based Purchase Prediction Framework via User's Sequential Actions. SIGIR 2019 eCom, July 2019, Paris, France.

Table 4: Structure of DNN and LSTM models

Model   Layer structure        # untrained weights
DNN     1024-1024-256-256-2    1.3M
LSTM    256-256-256-2          1.3M

For the specific model configuration of DNN, we apply ReLU as the non-linearity on the input and hidden layers. Because our target problem is binary classification, we choose the softmax function as the output layer's activation. Furthermore, we train our experiments with Adam [9] using the following hyper-parameter settings: a learning rate α of 0.001 and exponential decay rates β1 of 0.9 and β2 of 0.999, and we implement categorical cross-entropy as the loss function.
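The approximate weight counts in Table 4 can be sanity-checked for the fully connected model. The helper below is a simplification of our own (it counts only dense weights and biases, ignoring the embedding layer and the LSTM gate parameterization), yet it reproduces the roughly 1.3M figure reported for the DNN structure:

```python
def count_dense_weights(layer_sizes):
    """Count trainable weights (including biases) of a fully
    connected stack, given the number of nodes in each layer."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# DNN structure from Table 4: 1024-1024-256-256-2
dnn_weights = count_dense_weights([1024, 1024, 256, 256, 2])
# about 1.38M, consistent with the ~1.3M reported in Table 4
```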
As for LSTM, we select RMSprop [18] as the optimizer, with a learning rate α of 0.001 and an exponential decay rate ρ of 0.9, and we use the same loss function as for DNN. We illustrate the architectures in Table 4. Finally, in the training process, we use 30% of the data as the validation set, and the mini-batch size is set to 500 across all datasets. The reported results were obtained after 50 epochs of training over the sample dataset. To avoid the simulation process becoming trapped in a local minimum, we implement a callback mechanism that interrupts the simulation if the loss is unchanged over the last ten epochs.

3.4 Deep Learning Frameworks

In this paper, we want to explore whether the usage of queries helps us distinguish potential buyers from other users. To achieve this goal, we adopt two types of deep learning structures, the Deep Neural Network (DNN) and Long Short-Term Memory networks (LSTM), to predict a user's purchasing intention.

3.4.1 Deep Neural Network, DNN. The DNN is the most common model in the deep learning field. By tuning parameters through gradient descent under different activation functions, it is possible to build up objective functions suited to the target problems.

3.4.2 Long Short-Term Memory Networks, LSTM. LSTM is an extension of the vanilla RNN proposed by Hochreiter and Schmidhuber [5] to deal with the vanishing and exploding gradient problems. With its capacity for learning long-term dependencies, LSTM is widely used and demonstrates decent performance in many recent studies. To solve the vanishing and exploding gradient problem, LSTM replaces the kernel of the RNN structure with the idea of a cell state, a representation of the input sequence's state that transmits information through the entire network. Moreover, LSTM can add or remove information from the cell state through a mechanism called gates.

3.4.3 Hyper-parameter Setting and Implementation Details. To make sure the two models train under similar conditions, we construct both models with structures having an approximately equal number of untrained nodes. For DNN, we build a five-layer network with one input layer of 1024 nodes, three hidden layers (1024, 256, and 256 nodes, respectively), and one output layer of 2 nodes, since our task is binary classification. On the other hand, we construct the LSTM model as a four-layer structure (one input layer of 256 nodes, two hidden layers of 256 nodes, and one output layer of 2 nodes). Moreover, the layers in both models are fully connected, and the input layer's shape matches the size of the embedding layer given in Section 2.3.3. Under the naive setting (M = 1 with UBS_base), both models deploy roughly 1.3M untrained weights during the training process.

4 SIMULATION RESULTS

We evaluate the performance of PSAC with simulations utilizing the Yahoo! Taiwan dataset. For each simulation, we select training and testing data sharing the same segment and extract the set of attributes under a given scenario. After constructing the input layer of each segment, we fit the model on the training data with 30% of its instances held out as the validation set. Once training finishes, we evaluate the model's ability to predict the testing data, as reported in the following sections.

4.1 Performance Indicators

According to [7], as data skewness increases, ROC becomes a misleading indicator. Since the imbalance in our data distribution is highly significant (22x-30x), we employ the F1 score as the performance evaluation principle:

    F1 = 2 × (Pre × Rec) / (Pre + Rec),    (2)

where Pre stands for the precision and Rec for the recall.

4.2 Numerical Results

In the experiment, we analyze the effect of query contents under different environmental settings. Tables 5 and 6 describe the numerical results of PSAC for purchase prediction under different scenarios and deep learning frameworks.
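For reference, the F1 measure of Eq. (2) in Section 4.1 can be computed from raw confusion counts as follows. This is a generic sketch of the standard definition, not code from the paper:

```python
def f1_score(tp, fp, fn):
    """F1 of Eq. (2): the harmonic mean of precision (Pre) and
    recall (Rec), from true positives, false positives, and
    false negatives."""
    pre = tp / (tp + fp)  # Pre = TP / (TP + FP)
    rec = tp / (tp + fn)  # Rec = TP / (TP + FN)
    return 2 * pre * rec / (pre + rec)

# e.g. 8 true positives, 2 false positives, 2 false negatives:
# precision = recall = 0.8, so F1 = 0.8
```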
Explicitly, the simulation results are computed by taking the average of all segments' outcomes. Moreover, the average improvement illustrates the enhancement of the current input setting over the baseline approach (UBS). Based on the results, we made the following observations.

Table 5: Prediction Performance (DNN, F1 score)

M   Set      µ=1     µ=2     µ=3     µ=4     µ=5     Improvement (avg.)
1   UBS      0.1154
1   UBS+     0.1181                                  +2.34%
1   PSACa    0.1175                                  +1.82%
1   PSACf    0.1420                                  +23.05%
1   PSAC+f   0.1429                                  +23.83%
2   UBS      0.1132  0.1109
2   UBS+     0.1139  0.1115                          +0.58%
2   PSACa    0.1133  0.1137                          +1.31%
2   PSACf    0.1346  0.1360                          +20.77%
2   PSAC+f   0.1346  0.1369                          +21.17%
3   UBS      0.1103  0.0937  0.0930
3   UBS+     0.1131  0.1093  0.1071                  +11.45%
3   PSACa    0.1108  0.1113  0.1105                  +12.68%
3   PSACf    0.1317  0.1317  0.1334                  +34.47%
3   PSAC+f   0.1310  0.1319  0.1344                  +34.68%
4   UBS      0.1112  0.1061  0.1048  0.1010
4   UBS+     0.1076  0.1072  0.1045  0.1028          -0.18%
4   PSACa    0.1069  0.1081  0.1070  0.1077          +1.69%
4   PSACf    0.1292  0.1296  0.1299  0.1319          +23.22%
4   PSAC+f   0.1283  0.1292  0.1305  0.1317          +23.02%
5   UBS      0.1075  0.1048  0.1024  0.1006  0.1006
5   UBS+     0.1067  0.1055  0.1031  0.1023  0.1012  +0.58%
5   PSACa    0.1053  0.1062  0.1050  0.1059  0.1063  +2.55%
5   PSACf    0.1271  0.1274  0.1278  0.1262  0.1282  +23.50%
5   PSAC+f   0.1263  0.1257  0.1271  0.1283  0.1290  +23.46%

Table 6: Prediction Performance (LSTM, F1 score)

M   Set      µ=1     µ=2     µ=3     µ=4     µ=5     Improvement (avg.)
1   UBS      0.1158
1   UBS+     0.1167                                  +0.78%
1   PSACa    0.1163                                  +0.43%
1   PSACf    0.1435                                  +23.92%
1   PSAC+f   0.1439                                  +24.27%
2   UBS      0.1122  0.1092
2   UBS+     0.1128  0.1103                          +0.77%
2   PSACa    0.1135  0.1119                          +1.82%
2   PSACf    0.1363  0.1322                          +21.27%
2   PSAC+f   0.1352  0.1345                          +21.83%
3   UBS      0.1091  0.1068  0.1014
3   UBS+     0.1106  0.1080  0.1004                  +0.50%
3   PSACa    0.1081  0.1082  0.1017                  +0.23%
3   PSACf    0.1329  0.1274  0.1253                  +21.56%
3   PSAC+f   0.1325  0.1303  0.1279                  +23.20%
4   UBS      0.1065  0.1066  0.0990  0.0892
4   UBS+     0.1080  0.1077  0.0978  0.0926          +1.26%
4   PSACa    0.1068  0.1045  0.0991  0.0981          +2.10%
4   PSACf    0.1302  0.1232  0.1201  0.1156          +22.18%
4   PSAC+f   0.1299  0.1262  0.1226  0.1180          +24.12%
5   UBS      0.1045  0.1053  0.0968  0.0876  0.0822
5   UBS+     0.1064  0.1045  0.0955  0.0904  0.0894  +2.33%
5   PSACa    0.1034  0.1023  0.0967  0.0962  0.0924  +3.64%
5   PSACf    0.1293  0.1199  0.1152  0.1108  0.1067  +22.58%
5   PSAC+f   0.1274  0.1218  0.1178  0.1137  0.1089  +24.31%

Table 7: Prediction Performance (all models, F1 score)

M   Set      Model   µ=1     µ=2     µ=3     µ=4     µ=5
1   PSAC+f   DNN     0.1429
1   PSAC+f   LSTM    0.1439
2   PSAC+f   DNN     0.1346  0.1369
2   PSAC+f   LSTM    0.1352  0.1345
3   PSAC+f   DNN     0.1310  0.1319  0.1344
3   PSAC+f   LSTM    0.1325  0.1303  0.1279
4   PSAC+f   DNN     0.1283  0.1292  0.1305  0.1317
4   PSAC+f   LSTM    0.1299  0.1262  0.1226  0.1180
5   PSAC+f   DNN     0.1263  0.1257  0.1271  0.1283  0.1290
5   PSAC+f   LSTM    0.1274  0.1218  0.1178  0.1137  0.1089

First of all, we discover a slight increase when we add basic query-related features (query length and query number; UBS+) to the baseline approach. As for PSAC, which utilizes context-related features (PSACa and PSACf), both methods outperform the baseline outputs; however, their results differ considerably. If we take averaged values as the input features (PSACa), the improvement over the baseline approach is under 10%. On the other hand, if we extend the input layers with the complete query information (PSACf), the performance consistently surpasses the baseline outcomes by around 20% to 30%. We conclude that a vast amount of information is lost during the regularization procedure and that query information plays a vital role in perceiving the user's intention. Next, the results demonstrate that the position attribute is another useful factor: we recognize a one to two percent increase after considering the position attribute on most of the session sets under different N_GRAM M.
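The "improvement (avg.)" columns in Tables 5 and 6 are consistent with averaging the relative F1 gain over the UBS baseline across the µ settings of each M. The sketch below is our reading of the tables, since the averaging is not spelled out in the text:

```python
def avg_improvement(scores, baseline):
    """Average relative F1 improvement (in percent) of a feature
    set over the UBS baseline, taken across the mu settings of
    one N_GRAM value M."""
    gains = [(s - b) / b for s, b in zip(scores, baseline)]
    return 100 * sum(gains) / len(gains)

# Table 5, M = 1: UBS+ (0.1181) vs. UBS (0.1154) -> about +2.34%
```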
If we consider only the baseline approach and its enhanced version (UBS, UBS+), the prediction performance declines as the length of the input sequence (µ) grows. After we take query-related features into account, the influence of µ on the two deep learning frameworks differs. For the DNN model, once we construct the input layer with query-related features (PSACa, PSACf, and PSAC+f), the model's performance gradually rises as µ increases. Since DNN weighs each action in the input sequence equally during training, it can follow the evolution of the user's query contents globally and capture the user's intention more appropriately in an attention-like structure. Additionally, this phenomenon occurs under all M settings. Contrary to DNN, the score of LSTM decreases when applying query-related features with larger µ in training. Since the rationale behind RNN-based models pushes them to focus more on the most recent steps, it is plausible that a longer sequence might impair the performance. For example, a user might determine his or her shopping list in the middle of the searching operations and continue to browse other commodities afterward. If the user does not pass that thought onto the next action, it is hard for LSTM to perceive the conversion. Finally, we provide the comparison of DNN and LSTM under PSAC+f in Table 7. We can see that LSTM outperforms DNN for shorter action sequences. However, as users begin to reshape their thoughts and start a new searching action, DNN is more capable of realizing the connection between each action and query.

5 CONCLUSIONS

With the growing usage of e-commerce platforms, the analysis of users' purchase behavior has attracted attention in the field. Through an integrated prediction framework, the numerical results in our study indicate that query-related features are essential in making purchase predictions on e-commerce platforms. To utilize query-related features correctly, we need to consider several attributes of the queries, such as the user's behavior in reshaping their ideas, the relationship between consecutive queries, and the approach to representing the queries' embedded components. These results also show that both deep learning frameworks have their advantages; however, DNN's performance is more robust in capturing the customer's purchasing intention as the searching sequence grows. Furthermore, the construction of a session-based data structure is useful for storing customers' behavior sequences, and we provide several mechanisms to explain the phenomenon of time-shifting in the real world. Such evidence should be of importance in e-commerce purchase prediction.

ACKNOWLEDGMENTS

This work was supported by the Ministry of Science and Technology under Grant MOST 105-2221-E-001-003-MY3 and by Academia Sinica under Grand Challenge Seed Project AS-GC-108-01.

REFERENCES

[1] [n.d.]. Global retail e-commerce market size 2014-2021. https://www.statista.com/statistics/379046/worldwide-retail-e-commerce-sales/
[2] Bijaya Adhikari, Parikshit Sondhi, Wenke Zhang, Mohit Sharma, and B. Aditya Prakash. 2018. Mining E-Commerce Query Relations using Customer Interaction Networks. In Proceedings of the 2018 World Wide Web Conference (WWW '18). ACM Press, 1805-1814. https://doi.org/10.1145/3178876.3186174
[3] J. L. Beckmann, A. Halverson, R. Krishnamurthy, and J. F. Naughton. 2006. Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format. In Proceedings of the 22nd International Conference on Data Engineering (ICDE '06). https://doi.org/10.1109/icde.2006.67
[4] Liliya Besaleva and Alfred C. Weaver. 2017. Classification of imbalanced data in E-commerce. In 2017 Intelligent Systems Conference (IntelliSys). IEEE, 744-750. https://doi.org/10.1109/IntelliSys.2017.8324212
[5] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (Nov 1997), 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
[6] Hong Huang, Bo Zhao, Hao Zhao, Zhou Zhuang, Zhenxuan Wang, Xiaoming Yao, Xinggang Wang, Hai Jin, and Xiaoming Fu. 2018. A Cross-Platform Consumer Behavior Analysis of Large-Scale Mobile Shopping Data. In Proceedings of the 2018 World Wide Web Conference (WWW '18). ACM Press, 1785-1794. https://doi.org/10.1145/3178876.3186169
[7] Laszlo A. Jeni, Jeffrey F. Cohn, and Fernando De La Torre. 2013. Facing Imbalanced Data - Recommendations for the Use of Performance Metrics. In 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction. https://doi.org/10.1109/acii.2013.47
[8] Ru Jia, Ru Li, Meiju Yu, and Shanshan Wang. 2017. E-commerce purchase prediction approach by user behavior data. In 2017 International Conference on Computer, Information and Telecommunication Systems (CITS). 1-5. https://doi.org/10.1109/CITS.2017.8035294
[9] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs] (Dec 2014). http://arxiv.org/abs/1412.6980
[10] Bernard Kubiak and Paweł Weichbroth. 2010. Cross- And Up-selling Techniques In E-Commerce Activities. Journal of Internet Banking and Commerce 15 (Dec 2010).
[11] Rohan Kumar, Mohit Kumar, Neil Shah, and Christos Faloutsos. 2018. Did We Get It Right? Predicting Query Performance in E-commerce Search. arXiv:1808.00239 [cs] (Aug 2018). http://arxiv.org/abs/1808.00239
[12] Caroline Lo, Dan Frankowski, and Jure Leskovec. 2016. Understanding Behaviors that Lead to Purchasing: A Case Study of Pinterest. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM Press, 531-540. https://doi.org/10.1145/2939672.2939729
[13] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs] (Jan 2013). http://arxiv.org/abs/1301.3781
[14] Yabo Ni, Dan Ou, Shichen Liu, Xiang Li, Wenwu Ou, Anxiang Zeng, and Luo Si. 2018. Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks. arXiv:1805.10727 [cs, stat] (May 2018). http://arxiv.org/abs/1805.10727
[15] Chanyoung Park, Donghyun Kim, Min-Chul Yang, Jung-Tae Lee, and Hwanjo Yu. 2017. Your Click Knows It: Predicting User Purchase through Improved User-Item Pairwise Relationship. arXiv:1706.06716 [cs] (Jun 2017). http://arxiv.org/abs/1706.06716
[16] Michael Rose. [n.d.]. What Is Account-Based Marketing? https://www.forbes.com/sites/forbesagencycouncil/2017/11/01/what-is-account-based-marketing/
[17] Timo Saari, Niklas Ravaja, Jari Laarni, Marko Turpeinen, and Kari Kallinen. 2004. Psychologically targeted persuasive advertising and product information in e-commerce. In Proceedings of the 6th International Conference on Electronic Commerce (ICEC '04). ACM Press, 245. https://doi.org/10.1145/1052220.1052252
[18] Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012).
[19] Shuangfei Zhai, Keng-hao Chang, Ruofei Zhang, and Zhongfei Mark Zhang. 2016. DeepIntent: Learning Attentions for Online Advertising with Recurrent Neural Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM Press, 1295-1304. https://doi.org/10.1145/2939672.2939759
[20] Meizi Zhou, Zhuoye Ding, Jiliang Tang, and Dawei Yin. 2018. Micro Behaviors: A New Perspective in E-commerce Recommender Systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM '18). ACM Press, 727-735. https://doi.org/10.1145/3159652.3159671