=Paper= {{Paper |id=Vol-2410/paper32.pdf |storemode=property |title=PSAC: Context-based Purchase Prediction Framework via User's Sequential Actions |pdfUrl=https://ceur-ws.org/Vol-2410/paper32.pdf |volume=Vol-2410 |authors=Wei-Cheng Chen,Chih-Yu Wang,Su-Chen Lin,Alex Ou,Tzu-Chiang Liou |dblpUrl=https://dblp.org/rec/conf/sigir/ChenWLOL19 }} ==PSAC: Context-based Purchase Prediction Framework via User's Sequential Actions== https://ceur-ws.org/Vol-2410/paper32.pdf
PSAC: Context-based Purchase Prediction Framework via User’s Sequential Actions

Wei-Cheng Chen
Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
jimmyweicc@citi.sinica.edu.tw

Chih-Yu Wang
Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
cywang@citi.sinica.edu.tw

Su-Chen Lin
Verizon Media, Taipei, Taiwan
suchenl@verizonmedia.com

Alex Ou
Verizon Media, Taipei, Taiwan
oualex@verizonmedia.com

Tzu-Chiang Liou
Verizon Media, Taipei, Taiwan
tcliou@verizonmedia.com
ABSTRACT

Along with the daily operation of e-commerce web services, a significant quantity of data has been recorded. Research on user behavior based on such collected data has attracted intense attention, as it enables services that accurately match customers’ needs and predict purchase actions. Traditionally, most studies utilize only the behavioral instances between users and products, i.e., browse or click history and session status. However, these features provide only fundamental knowledge about a given user rather than the rationale behind their actions. We find that queries should play an important role as well, since search is the main entry point for users arriving at an e-commerce website. Because users rely on queries to decide the direction of succeeding events, the semantic meanings of these queries demonstrate a particular link with the subsequent actions.

In this paper, we propose the Prediction framework that analyzes User’s Sequential Actions via Context (PSAC) to exploit the connection between users’ search keywords and behaviors, to investigate their ultimate intention on an e-commerce website, and to improve purchase prediction accuracy. We utilize the e-commerce dataset provided by Yahoo Taiwan, one of the largest web service providers in Taiwan. Based on our preliminary analysis, we design a session-based structure to deal with the environment-shifting (influenced by coexisting fashion) and experience-shifting (changed through the user’s actions) issues which we observed in the dataset. In the simulation section, we apply two deep learning frameworks to perform the prediction task. Experimental results confirm that queries are a critical factor in perceiving a user’s purchasing intention. Moreover, the proposed framework significantly improves prediction accuracy compared with baseline methods.

CCS CONCEPTS

• Information systems → Electronic commerce; Recommender systems; Data analytics; • Applied computing → Online shopping;

KEYWORDS

user behavior analysis, e-commerce purchase prediction, deep learning

Copyright © 2019 by the paper’s authors. Copying permitted for private and academic purposes.
In: J. Degenhardt, S. Kallumadi, U. Porwal, A. Trotman (eds.): Proceedings of the SIGIR 2019 eCom workshop, July 2019, Paris, France, published at http://ceur-ws.org

1 INTRODUCTION

1.1 Background

Along with the evolution of the Internet, electronic commerce (e-commerce) has generated enormous interest in the past few years. Global retail e-commerce sales recorded a compound annual growth rate (CAGR) of around 20.77% from 2014 to 2018 and are expected to sustain this pace until 2021 [1]. To cope with increasing competition, companies and researchers race to develop strategies for matching customers’ needs and optimizing companies’ profits. Some examples are personalized advertisement [17], upselling to existing customers [10], exclusive offers, and account-based marketing [16]. However, these solutions often require the ability to perceive the customer’s intention in order to gain an advantage over competing platforms.

Recently, researchers have explored the possibility of identifying user intention through the interactions between users and products, such as query terms and browsing/purchasing history. A query term, for instance, roughly illustrates the needs the user would like to satisfy. Once the intent of the user is identified, the system can redirect the user to proper product recommendations. For example, Adhikari et al. [2] employ Customer Interaction Networks (CIN) to investigate the relationships between queries and indicate that these relations can significantly improve search quality. On the other hand, Kumar et al. [11] utilize user interaction data from a mobile application, such as activity properties, dwell time, and query contents, to learn the quality of a search engine’s feedback.

Besides queries, researchers have also studied other types of behavior that can be collected and analyzed on e-commerce websites, such as clickstream, product quality, and user profiles. In particular, Huang
et al. [6] examine mobile phone users’ behavior across different e-commerce platforms. In their study, they claim that purchase decisions tend to happen promptly, and that spatiotemporal factors such as time, region, and platform influence shopping behavior. Eventually, their simulation indicates that a user’s e-commerce platform preference is predictable. Similarly, Zhou et al. [20] show that it is possible to improve prediction performance by utilizing micro behaviors such as comments, carting, and ordering. Ni et al. [14] aim at constructing a universal user representation across multiple e-commerce tasks. After modeling behaviors’ and items’ features into sequences of user behaviors, they employ a deep Recurrent Neural Network (RNN) architecture and attention-based techniques for multi-task learning on ten days of data. Additionally, Lo et al. [12] utilize extended real-world data to build a cross-platform analysis of user behavior for more general purposes. They discover evidence that purchase intent gradually increases through time and soars significantly three days before purchase. Remaining studies discuss topics including sparseness in the e-commerce context [8], exploration of user-item pairwise relationships [15], and online advertising [19], et cetera.

1.2 Rationale of Our Approach

From the previous literature, we see that research on purchase intention prediction varies mainly with the dataset’s attributes. Nevertheless, approaches that employ queries to perceive purchase intention are still scarce. In this study, we collaborate with Yahoo Taiwan, which contributes its e-commerce platform’s data. In particular, we collect a dataset describing various behaviors made by its users, e.g., actions (click, view, buy), connection properties (timestamp, time spent), and the most critical piece of information - entered queries. Different from most existing research, this paper focuses on the construction of user representations via users’ online actions to cope with e-commerce purchase prediction. Moreover, we utilize the relationship between users’ textual inputs and their actions to infer their intention. Furthermore, we propose that user behavior is usually time-sensitive in two manners, namely environment-shifting and experience-shifting. These phenomena illustrate the evolution of the user’s actions through the purchase activity, which occurs on actual e-commerce platforms.

According to the above observations, we establish the Prediction framework that analyzes User’s Sequential Actions via Context-based information (PSAC), a prediction system built on a session-based representation which records the behavioral sequence of each user. Meanwhile, we apply corpus embedding techniques to encode the latent information of entered queries and derive related attributes from the sequence of entered queries. Finally, the system utilizes two deep learning frameworks to analyze, store, and predict a user’s intention under different scenarios. The simulation results indicate that the sequence of query contents is an essential feature in learning users’ shopping intention, and point out the difference between the two frameworks. Our main contributions in this paper include:

   (1) We implement a Prediction framework that analyzes User’s Sequential Actions via Context-based information, which predicts users’ purchase intention via query-based information and their corresponding actions.
   (2) We investigate the behavior of real-world users in e-commerce activity, which provides further information for predicting users’ purchase intention.
   (3) We confirm that query factors are critical in intent and action prediction, which can be utilized in future works and industrial applications.

The remainder of the paper proceeds as follows. Section 2 consists of three parts: a description of the datasets, the investigation of the data, and the preprocessing procedures. Section 3 describes the implementation of the e-commerce purchase prediction system via the time-shifting scheme and the deep learning frameworks. Finally, Section 4 examines the simulation results, and Section 5 presents the conclusions.

2 DATASETS EXPLORATION

In this paper, we use a dataset provided by Yahoo Taiwan, which contains a variety of behaviors of its e-commerce platform users. Most importantly, the data not only document users’ activities on the platform but also save their search queries, the textual expressions used by customers to find the products they want.

2.1 E-commerce Dataset Preparation

Our sample dataset consists of real-world transaction records from the e-commerce platform spanning six months. Specifically, the dataset consists of three types of information - click, view, and buy - covering over a million unique users’ online actions, with 12M, 10M, and 1.2M records respectively. Both click and view records specify the query terms the user entered before such actions. For the subsequent merging procedure, we label each record in the dataset with a session id s which specifies the connection it belongs to. The dataset contains the following information:

2.1.1 Click Records. Each click record represents the click action of a specific user after he/she enters an arbitrary search query. Explicitly, several pieces of information are stored, i.e., the user’s id, the entered query, the product’s id, and the time the action occurs. Because the click behavior demonstrates a one-to-one connection between the entered query and a product, it can be seen as a strong indication that the user has great interest in the product.

2.1.2 View Records. Similar to click records, view data record the same set of features as click data. Nevertheless, there is a slight difference in the product attributes. A view action stands for the operation of a user skimming through a shopping page. Compared with click actions, which cover only a single product, a view record saves multiple items on a single page after a search for arbitrary keywords.

2.1.3 Buy Records. Finally, the buy dataset describes each purchase that happens during the target period. Each buy record consists of all the previous attributes except for the entered query. We list each dataset’s attributes in Table 1.
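The records described above can be sketched as a small data structure, using the symbols of Table 1 as field names. This is only an illustrative sketch, not the production schema: the `action` tag and the epoch-second timestamp type are assumptions added for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Record:
    """One behavioral record; field names follow the symbols of Table 1."""
    u: str               # user id
    s: str               # session id of the current connection
    ts: int              # start time of the action (assumed epoch seconds)
    p: str               # product id
    q: Optional[str]     # entered query; None for buy records, which lack it
    action: str          # "click", "view", or "buy" (assumed tag)

def group_by_session(records: List[Record]) -> Dict[str, List[Record]]:
    """Group records by session id s, ordered by start time, yielding
    each user's behavioral sequence within a session."""
    sessions: Dict[str, List[Record]] = {}
    for r in sorted(records, key=lambda r: r.ts):
        sessions.setdefault(r.s, []).append(r)
    return sessions
```

Grouping by the session id s in this way yields the per-session behavioral sequences that the merging procedure of Section 2.3 and the later analysis operate on.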
Table 1: Features in each dataset

  Property     Symbol   Explanation                                      Click   View   Buy
  user id      u        user ID of the current session                     o       o     o
  session id   s        session ID of the current session                  o       o     o
  start time   ts       starting time of the current action                o       o     o
  query        q        entered query or queries of the current action     o       o     x
  product id   p        product ID of the current action                   o       o     o

2.2 Exploratory Data Analysis

To obtain a brief picture of user behavior on the Yahoo Taiwan e-commerce platform, we conduct an initial analysis on three topics: (i) query usage, (ii) action behavior, and (iii) session statistics. Furthermore, we examine the analysis on a monthly basis.

2.2.1 Query Usage. First of all, we find that for each month in the dataset, the number of unique queries ranges from 210K to 260K. Around 70% of the queries appear only once in the dataset. Beckmann et al. [3] also report this phenomenon, observing that users tend to define personal keywords to describe products. Since extensive data like this can cause oversized-dimension problems during the embedding process, we define two terms to tackle the issue, i.e., the Count Ratio (CR_{m,C}) and the Space Ratio (SR_{m,C}) of each month m. We present the calculations in Eq. (1a) and (1b) and the bar charts in Figures 1(a) and 1(b).

    CR_{m,C} = (number of q_i in month m with c_{i,m} ≥ C) / n_m,                  (1a)

    SR_{m,C} = (Σ_{i=1}^{n_m} σ_i · c_{i,m}) / (Σ_{i=1}^{n_m} c_{i,m}),            (1b)

where n_m is the number of unique queries in m, Q_m = {q_i | i = 1...n_m} is the set of unique queries in m, c_{i,m} is the number of occurrences of q_i in m, σ_i = 1 if q_i appears in month m with c_{i,m} ≥ C and σ_i = 0 otherwise, and C is a predefined constant ranging from one to five.

[Figure 1: Count ratio CR_{m,C} and Space ratio SR_{m,C} under month m and Count C. Panels: (a) Count ratio (CR) of each month; (b) Space ratio (SR) of each month, each plotted for C ≥ 2 through C ≥ 5 across months 1-6.]

In Figure 1(a), each segment in the chart represents the fraction of distinct queries q_i occurring at least C times in month m. We can see a similar distribution across the six months. Around 30% of queries appear at least two times, 20% and 13% of queries occur at least three and four times respectively, and 10% of queries appear five or more times. These figures not only make it credible that most users use an independent query in their search actions but also demonstrate the diversity of e-commerce search. Moreover, we define the Space Ratio, shown in Figure 1(b), to illustrate the queries’ coverage ratio. For instance, SR_{m,5} computes the coverage ratio of the queries that appear at least five times in month m. We can see that across all months, although CR_{m,C} diminishes as we increase C, the corresponding SR_{m,C} retains around 80% of the corpus space. The results indicate that users on e-commerce platforms tend to reuse queries during their search events. Furthermore, it is conceivable and economical to learn the characteristics of e-commerce query data from a compact dataset that preserves only the queries with multiple occurrences.

2.2.2 Action Behavior. Customers’ actions exhibit various clues as well. Figure 2 describes the time difference between the start of an action (click/view) and the subsequent action of adding products to the cart. Specifically, the difference is computed from the action’s ts and the corresponding buy’s ts (after matching on the same u and p).

[Figure 2: Different behavior of Click and View with the following carting. Panels: (a) difference between click and buy in minutes; (b) difference between click and buy in hours; (c) difference between view and buy in minutes; (d) difference between view and buy in hours.]

Furthermore, for both click and view actions, we separate the figures into minutes and hours sections. Additionally, the first bars in 2(b) and 2(d) include all elements of 2(a) and 2(c). Comparing 2(a) with 2(c), we can see that around 35% of click actions have a gap of at most 10 minutes before the following buy action. On the other hand, nearly 20% of view actions lead to a buy action within 10 minutes. Moreover, a significant difference between 2(b) and 2(d) lies in the last bars, which depict the portion of users carting products after twenty-four hours. For view actions, there is a slight excess in carting after a day compared to click actions. These observations suggest two arguments: (i) a click action induces a buy action faster than a view action does; and (ii) a user who is still browsing the platform might require more time to make the purchase decision. These indications are useful for companies designing different strategies to draw out customers’ intent.

2.2.3 Session Statistics. Lastly, in this section we investigate the number of queries used by users with different intentions. To begin
with, we combine our click and view records via the session id s and mark each action as purchased or non-purchased according to the buy dataset. Details of the merging will be presented in Section 2.3. Afterward, we divide all constructed sessions into two parts: purchased sessions and non-purchased sessions. Figure 3 describes the related information.

[Figure 3: Length of different types of sessions; L stands for the number of actions in the given session. Panels: (a) all sessions; (b) sessions with buy=True; (c) sessions with buy=False; lengths are bucketed as L=1 through L=5, L=6~10, L=11~20, L=21~30, L=31~50, and others.]

It is obvious from the difference between 3(b) and 3(c) that about 40% of non-purchased users use only one query during their session. On the contrary, only 15-25% of purchased users use one query through their session. Moreover, among users who eventually purchase a product, around 50% of the customers generate a medium or large sequence of actions (L ≥ 6) within a single session. The distribution of session lengths demonstrates that customers who might make the deal at the end have more thoughts and reactions

the Web industry, a session defines the login event after a user connects to the hosting server. Specifically, we record all of the activities during a given session to illustrate the user’s behavior, which is beneficial for dynamically learning the development of their intentions. According to our dataset, Yahoo Taiwan specifies its session as a fixed period after any user creates a connection to its server. If an individual user browses the pages for longer than the given period, multiple sessions are created according to the actual time spent.

The reason why we construct the data at the session level lies in the rationale behind search conventions. In the real world, a user’s search habits and related actions are usually regarded as time-sensitive, closely connected with prior and later instances. This phenomenon manifests in two aspects: (i) experience-shifting and (ii) environment-shifting. As for the first concept, it is widely understood that people’s interests develop as time shifts. For example, when a customer enters a shop, he or she might initially search for an item using a general description, such as shoes, bags, or computer. Typically, people take these actions while wandering and waiting; they either describe few details about the item needed in the first place or have not yet realized what they are searching for. Furthermore, they might want to compare the same kind of product manufactured by different companies. Such actions are likely to last for a while until a suitable commodity is found. Usually, customers tend to check their favorite brand or look for specific functionality and appearance. In addition, some buyers will calculate their affordable budget. For example, a user who starts the search with bags might end up checking Saint Laurent leather shoulder bag or backpack 13' notebook waterproof. Consumers who already have a preference are likely to finish the previous process in less time. Also, this behavior exists not only in offline business but also on online platforms. With the advantage of technology, customers have more resources and information from the Internet, which makes the experience cycle more transient. Moreover, this process is often accompanied by several click actions which help the user get more product details. As for the second concept, we will describe the rationale in Section 3.3.
to achieve their goal in given operation time.                                                         To construct a session-level behavior sequence, we perform sev-
   Like most financial issues, one of the primary problem in e-                                     eral preprocessing steps on the raw data. At first, we need to define
commerce purchase prediction is the existence of imbalance data                                     two groups of features from our dataset: the first set describes user’s
[4]. Typically, most of the platform users have no absolute intention                               actions after establishing a connection. The other group of features
before they start searching for products. It is more probable that                                  indicates the textual contents of users’ queries, which requires Nat-
they were either entertained by other websites’ commercial or kept                                  ural Language Preprocessing (NLP) techniques to constructs the
pondering over the purchasing decision. As a result, a majority of                                  latent representation.
users leave the website without making a purchase. For example,
the size of non-purchased records in our dataset is 22 to 30 times
the size of purchased records. Without balancing the training data,                                 2.3.1 User Behavior Sequence (UBS). To illustrate history move-
preliminary results are likely to be overrated. Therefore, we perform                               ments after each user logs onto the server, we transform the session-
the random over-sampling approach on the training data, that is,                                    based data into the User Behavior Sequence (UBS) to represent each
randomly multiplying true cases until the number of both cases are                                  session’s properties. During the merging process, we first define the
similar. As for the testing set, we do not balance the data since it                                click and view records as primary actions and the buy records as
needs to fit the circumstance in the real world.                                                    supportive information. Then, we combine primary actions having
                                                                                                    the same session id s and sort all of the records by primary actions’
2.3     Data Preprocessing                                                                          start time ts . As for the supportive information, we employ the
                                                                                                    time information to check whether each action leads to a purchase
To utilize the e-commerce data, we illustrate the procedure to merge
                                                                                                    event. If any buy record’s start time ts is less than any primary
separate datasets in this section. We first utilize the session id s
                                                                                                    action’s start time, and they share the same user id u and product
to merge the action data into a unified user behavior dataset. In
PSAC: Context-based Purchase Prediction Framework via User’s Sequential Actions                       SIGIR 2019 eCom, July 2019, Paris, France


id p, we mark the subsequent primary activities as purchased. Moreover, we ignore repeated primary actions, defined as two consecutive actions with a time gap of fewer than 5 seconds. In general, this phenomenon is more likely a consequence of network errors or double-clicking than of actual intent. For each logged user, we also sort his/her purchasing records by start time before the merging process.

Table 2: Advanced terms

  time gap (t_d): The difference between the current action (i)'s and the previous action (j)'s starting times, t_s^i and t_s^j. For the last action in each UBS, we assign a dummy value τ as the difference.
  action type (β_i): The type of the current action, either click or view.
  query length (l_i^1): The length of the entered query q_i.
  query amount (l_i^2): The number of terms after q_i is tokenized.
  word2vec (w_qi): A value representing the word2vec layer of a given query q_i; depending on the computation approach, denoted w_avg or w_full.
  word2vec similarity (w_s): The similarity between the word2vec layers of the current action (i) and the previous action (j), w_qi and w_qj, computed with cosine similarity. If the current action is the first action, the value is set to one.
  word2vec variance (w_v): The variance of the word2vec layers of all queries entered in an input sequence. If the input sequence has only one action, the value is set to zero.
  position (pos): The current action's position within the belonging session.

2.3.2 Context-aware Latent Representation. After constructing the structure of UBS, we turn our attention to the query information in the dataset. To utilize the semantic features of each query, we apply several NLP techniques to construct the queries' latent representations and integrate these features with UBS.
   First, we construct the query corpus from all sequences in UBS. For each sequence, we concatenate each primary action's entered queries into a single sequence and tokenize it on white space. There is no stop-word removal or term normalization (stemming and lemmatization) during preprocessing, since redundant inputs are typical behavior in e-commerce. After these operations, we build up a corpus consisting of over 165M search sequences in a more structured form. Nevertheless, the corpus space is too large to utilize directly and requires further processing. To generate the embedded representation, we train word2vec [13] to project the input corpus into a lower dimension and recognize the latent representation. Based on the conclusion in Section 2.2.1, we only take words that occur at least five times in the corpus into consideration. Furthermore, we empirically set the target dimension to 100, which is significantly smaller than the original vector space.
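As a small illustration of the corpus-building step above, the whitespace tokenization and minimum-frequency filter (at least five occurrences) can be sketched as follows; the toy queries stand in for the paper's 165M search sequences, and training the actual 100-dimensional word2vec model (e.g., with a library such as gensim) would then run on the filtered corpus:

```python
# Sketch of query-corpus construction: tokenize on white space only (no
# stop-word removal, stemming, or lemmatization) and keep words occurring
# at least five times, as in Section 2.3.2. The queries are toy examples.
from collections import Counter

session_queries = [
    "bags", "leather bags", "leather shoulder bags",
    "bags sale", "shoulder bags", "waterproof bags",
]
tokenized = [q.split() for q in session_queries]

counts = Counter(word for query in tokenized for word in query)
vocab = {word for word, c in counts.items() if c >= 5}   # min count = 5

filtered = [[w for w in query if w in vocab] for query in tokenized]
# "bags" appears six times and survives; rarer words are dropped.
```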
   After generating the representation of the current corpus, we extend the constituents of UBS and compute several context-related attributes. For each occurrence of a search, we assign two query-related features: (i) the query length l_i^1 and (ii) the query amount l_i^2. As for the query contents, we concatenate the embedded representation of the linked query q_i via two approaches: the first appends the average of the complete semantic embedding, while the second considers the representation in full, denoted w_avg and w_full respectively. Since we want to investigate whether each of the user's actions gradually affects the final decision, we further define a parameter pos stating the query's position inside the linked session. For a later position, it is plausible that the user has a more precise idea of the desired product and a more thoroughgoing intention about making the order or not. The definitions of the advanced terms and context-related attributes are given in Table 2.

2.3.3 The Input Layer Settings. To understand the contribution of each feature, we construct the input layer under the following scenarios. For the baseline, we only consider UBS's standard features, i.e., the time gap t_d and the action type β_i. For the remaining settings, we concatenate the baseline feature vector with the following combinations of advanced terms:
   (1) UBS: each record consists of n actions and the time the user spent (t_d + β_i);
   (2) UBS+: each record is the concatenation of UBS, l_i^1, and l_i^2;
   (3) PSACa: each record is the concatenation of UBS+, w_avg, w_s, and w_v;
   (4) PSACf: each record is the concatenation of UBS+, w_full, w_s, and w_v;
   (5) PSAC+f: each record consists of PSACf and pos.
   Since the main purpose of this paper is to probe the effectiveness of the query-based dataset in perceiving users' purchase intention, we ignore context-related features in the baseline method. For the second and third combinations, we consider several query-related features, including the query length and amount, latent representation, similarity, and variance. Moreover, we apply different computation approaches to incorporate the semantic values. Finally, the position attribute is added to the vector to examine the process by which users reshape their thoughts during the connection.

3   PURCHASING PREDICTION SYSTEMS VIA TIME SHIFTING STANDARD

In this section, we propose the Prediction framework that analyzes User's Sequential Actions via Context (PSAC). First, we define several situations to apprehend human behavior and the concept of time-shifting mentioned in previous sections, namely (i) experience-shifting and (ii) environment-shifting. Specifically, we consider the evolution of query reshaping and a data process that takes contemporary fashion into account. For the experiments, we apply two deep learning models, (i) the Deep Neural Network (DNN) and (ii) Long Short-Term Memory Networks (LSTM), where we observed opposite views between the two frameworks.
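To make the scenario definitions of Section 2.3.3 concrete, a per-action feature vector for the PSACa setting might be assembled as follows; the function and field layout are our own illustration, not code from the paper:

```python
# Sketch of one action's PSACa input: baseline features (time gap t_d,
# action type β_i) + query length l^1 and amount l^2 + averaged embedding
# w_avg + cosine similarity w_s to the previous query. Layout is illustrative;
# the session-level variance w_v would be appended analogously.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def action_features(t_d, is_click, query, w_avg, w_prev=None):
    w_s = 1.0 if w_prev is None else cosine(w_avg, w_prev)  # first action -> 1
    l1 = len(query)             # query length
    l2 = len(query.split())     # query amount (token count)
    return [t_d, 1.0 if is_click else 0.0, l1, l2, w_s] + list(w_avg)

vec = action_features(3.2, True, "leather bags", w_avg=[0.1, 0.4])
# -> [3.2, 1.0, 12, 2, 1.0, 0.1, 0.4]
```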
Figure 4: Framework of PSAC. (Pipeline: raw view, click, and buy data → concatenation into UBS → sequential segmentation into training and testing splits → UBS and query latent representation (N_GRAM and SUB_GRAM) → input and embedding layers for the baseline, query, PSAC, and position settings → DNN/LSTM layers emitting a true/false purchase prediction.)
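A minimal sketch of the first stage in Figure 4 (merging click and view records into per-session sequences, as in Section 2.3.1) is given below; the record fields are illustrative, and the purchase labeling via buy timestamps is omitted:

```python
# Rough sketch of UBS construction: merge click and view records, sort each
# session by start time, and drop consecutive actions less than 5 seconds
# apart. Record fields (session, ts, action, product) are illustrative.
records = [
    {"session": 1, "ts": 0.0,  "action": "click", "product": 3},
    {"session": 1, "ts": 2.0,  "action": "click", "product": 3},  # <5 s gap
    {"session": 1, "ts": 10.0, "action": "view",  "product": 5},
    {"session": 1, "ts": 30.0, "action": "click", "product": 5},
]

def build_ubs(records, min_gap=5.0):
    ubs = {}
    for rec in sorted(records, key=lambda r: (r["session"], r["ts"])):
        seq = ubs.setdefault(rec["session"], [])
        if seq and rec["ts"] - seq[-1]["ts"] < min_gap:
            continue  # likely double-click or network retry, not real intent
        seq.append(rec)
    return ubs

ubs = build_ubs(records)  # session 1 keeps the actions at ts 0.0, 10.0, 30.0
```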


3.1     Framework overview

In short, we illustrate the framework of PSAC in Figure 4. We first combine the view and click instances by session id s. The preliminary output is used to construct the UBS. Next, to deal with environment-shifting, we apply sequential segmentation and divide the data into training and testing sets to simulate actual conditions. For each segment, we extract two parts of the data, namely the query corpus and the action sequences. For the queries, we apply context-embedding techniques to project the corpus into a lower-dimensional representation. For the actions, we label each action as purchased or non-purchased based on the buy dataset's start time t_s, user id u, and product id p. After merging each action in UBS with its corresponding textual features, we compute several advanced terms, e.g., the similarity, variance, and position attributes. Finally, we address the imbalance in the e-commerce data and establish different scenarios to investigate the effect of query-based features. In the experiments, we implement the proposed system under two deep learning frameworks (DNN and LSTM) for purchase prediction.

3.2     Experience-shifting of UBS

Recall that in the real world, customers shape their ideas over time and through experience when searching on the Internet. With the session-based data structure, we use the following settings to capture this property. For each construction of the input layer, we define two parameters: N_GRAM (M) and SUB_GRAM (µ), where µ = 1...M. Given M, we first gather each UBS of length n where n ≥ M. Each UBS is then separated into n − M + 1 subsequences, and we denote the set of subsequences as UBS_{n,M}. Each subsequence contains M actions. Furthermore, we regard each instance in UBS_{n,M} as an independent action series, and the final position in each series is the prediction target for purchase intention. We illustrate the construction of UBS_{5,4} in Figure 5: a UBS with five actions separates into two subsequences when M is four (action_{1-4} and action_{2-5}).
   Secondly, each subsequence generates four action series, where each series consists of µ = 1, 2, 3, 4 continuous actions. Such a design allows us to understand whether more actions benefit the proposed system in learning users' intentions. The number of subsequences built from our data is reported in Table 3.

Figure 5: N_GRAM M and SUB_GRAM µ: a sketch of the experience-shifting when M equals four. (A five-action UBS yields two four-action subsequences; SUB_GRAM µ takes the last µ actions of each.)

Table 3: Number of subsequences

  M    Training      Testing
  1    32,848,118    14,077,765
  2    26,045,458    11,162,339
  3    21,767,795     9,329,055
  4    18,653,252     7,994,251
  5    16,241,526     6,960,654

3.3     Environment-shifting of UBS

We then introduce the idea of environment-shifting, that is, the latest trend or craze toward certain products in the market. In practice, people are likely to focus on the hottest products or the latest fashion. Furthermore, companies often amplify the current trend by releasing commercials or discounts whenever they want to attract potential consumers. To simulate real conditions, we utilize a rolling-based approach to train the model separately on each segment. Explicitly, we segment the data into several pairs of subsets for the simulation: the first subset covers x days of data to build the UBS used as the training set, and the other takes the consecutive y days of data to construct the UBS used as the testing set. We execute the training process separately until all segments are done and aggregate the results to get the whole picture. We demonstrate the process in Figure 6.
   Moreover, we balance the training set to avoid overfitting and retain the imbalance in the testing set to fit reality. In sum, the advantages of sequential segmentation are 1) it is more reasonable for companies to use prior data as the training set of following
                                                                                       for companies to use prior data as the training set of following
data because we cannot foresee the behavior of future users in reality, and 2) it is more manageable, since each training run utilizes a smaller dataset compared with the original one. In this paper, we empirically set the periods x and y to 28 and 14 days. Eventually, we divide our dataset into twelve pairs of subsets; each pair has a month of data as the training set and the following half month of data as the testing set.

Figure 6: Sequential segmentation to capture contemporary trends. (Rolling windows: each training set spans x days and the subsequent y days form the testing set.)

3.4     Deep Learning Frameworks

In this paper, we want to explore whether the usage of queries helps us distinguish potential buyers from other users. To achieve this goal, we adopt two types of deep learning structures, the Deep Neural Network (DNN) and Long Short-Term Memory Networks (LSTM), to predict a user's purchasing intention.

3.4.1 Deep Neural Network, DNN. DNN is the most common model in the deep learning field. By tuning parameters through gradient descent and choosing different activation functions, it is possible to build up an optimal objective function for the targeted problem.

3.4.2 Long Short-Term Memory Networks, LSTM. LSTM is an extension of the vanilla RNN proposed by Hochreiter et al. [5] to deal with the vanishing and exploding gradient problems. With the capacity to learn long-term dependencies, LSTM is widely used and demonstrates decent performance in many recent studies. To solve the vanishing and exploding gradient problem, LSTM replaces the kernel of the RNN structure with the idea of a cell state, a representation of the input sequence's state that transmits information through the entire network. Moreover, LSTM can add or remove information from the cell state through a mechanism called gates.

3.4.3 Hyper-parameter Settings and Implementation Details. To make sure the two models train under similar conditions, we construct both models with structures having an approximately equal number of untrained weights. For DNN, we build a five-layer network with one input layer of 1024 nodes, three hidden layers (1024, 256, and 256 nodes, respectively), and one output layer of 2 nodes, as our task is a binary classification. On the other hand, we construct the LSTM model as a four-layer structure (one input layer of 256 nodes, two hidden layers of 256 nodes, and one output layer of 2 nodes). Moreover, the layers in both models are fully connected, and the input layer's shape matches the size of the embedding layer given in Section 2.3.3. Under the naive setting (M = 1 with UBS_base), both models deploy roughly 1.3M untrained weights during the training process.

Table 4: Structure of DNN and LSTM models

  Model    Layer structure         # untrained weights
  DNN      1024-1024-256-256-2     1.3M
  LSTM     256-256-256-2           1.3M

   For the specific model configuration of DNN, we apply ReLU as the nonlinearity on the input and hidden layers. Because our target problem is a binary classification, we choose the softmax function as the output layer's activation. Furthermore, we train our experiments with Adam [9] using the following hyper-parameter settings: a learning rate α of 0.001 and exponential decay rates β1 and β2 of 0.9 and 0.999, with categorical cross-entropy as the loss function. As for LSTM, we select RMSprop [18] as the optimizer, with a learning rate α of 0.001 and an exponential decay rate ρ of 0.9, and use the same loss function as DNN. We illustrate the architectures in Table 4.
   Finally, in the training process, we use 30% of the data as the validation set, and the mini-batch size is set to 500 across all datasets. The reported results were obtained after 50 epochs of training over the sample dataset. To avoid the simulation process getting trapped in a local minimum, we implement a callback mechanism that interrupts the simulation if the loss is unchanged over the last ten epochs.

4   SIMULATION RESULTS

We evaluate the performance of PSAC with simulations on the Yahoo! Taiwan dataset. For each simulation, we select training and testing data sharing the same segment and extract the set of attributes under a given scenario. After constructing the input layer of each segment, we fit the model on the training data with 30% of the instances as the validation set. Once training finishes, we evaluate the model's ability to predict the testing data in the following sections.

4.1     Performance Indicators

According to [7], as data skewness increases, ROC becomes a misleading indicator. Since the imbalance in our data distribution is highly significant (22x-30x), we employ the F1 score as the performance evaluation principle:

    F1 = 2 * (Pre * Rec) / (Pre + Rec),        (2)

where Pre stands for the precision and Rec for the recall.

4.2     Numerical Results

In the experiments, we analyze the effect of query contents under different environmental settings. Tables 5 and 6 describe the numerical results of PSAC for purchase prediction under the different scenarios and deep learning frameworks. Explicitly, the simulation results are computed by taking the average over all segments' outcomes. Moreover, the average improvement illustrates the enhancement of the current input setting over the baseline approach (UBS). Based on the results, we made the following observations.
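For reference, the F1 score of Equation (2) can be computed directly from prediction counts; the counts below are made up for illustration:

```python
# F1 = 2 * Pre * Rec / (Pre + Rec), with Pre = tp/(tp+fp) and Rec = tp/(tp+fn).
def f1_score(tp, fp, fn):
    pre = tp / (tp + fp)   # precision
    rec = tp / (tp + fn)   # recall
    return 2 * pre * rec / (pre + rec)

f1_score(40, 60, 360)  # precision 0.4, recall 0.1 -> F1 = 0.16
```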
Table 5: Prediction Performance (DNN, F1 score)

M | fset   | µ=1    | µ=2    | µ=3    | µ=4    | µ=5    | improvement (avg.)
1 | UBS    | 0.1154 |        |        |        |        |
1 | UBS+   | 0.1181 |        |        |        |        | +2.34%
1 | PSACa  | 0.1175 |        |        |        |        | +1.82%
1 | PSACf  | 0.1420 |        |        |        |        | +23.05%
1 | PSAC+f | 0.1429 |        |        |        |        | +23.83%
2 | UBS    | 0.1132 | 0.1109 |        |        |        |
2 | UBS+   | 0.1139 | 0.1115 |        |        |        | +0.58%
2 | PSACa  | 0.1133 | 0.1137 |        |        |        | +1.31%
2 | PSACf  | 0.1346 | 0.1360 |        |        |        | +20.77%
2 | PSAC+f | 0.1346 | 0.1369 |        |        |        | +21.17%
3 | UBS    | 0.1103 | 0.0937 | 0.0930 |        |        |
3 | UBS+   | 0.1131 | 0.1093 | 0.1071 |        |        | +11.45%
3 | PSACa  | 0.1108 | 0.1113 | 0.1105 |        |        | +12.68%
3 | PSACf  | 0.1317 | 0.1317 | 0.1334 |        |        | +34.47%
3 | PSAC+f | 0.1310 | 0.1319 | 0.1344 |        |        | +34.68%
4 | UBS    | 0.1112 | 0.1061 | 0.1048 | 0.1010 |        |
4 | UBS+   | 0.1076 | 0.1072 | 0.1045 | 0.1028 |        | -0.18%
4 | PSACa  | 0.1069 | 0.1081 | 0.1070 | 0.1077 |        | +1.69%
4 | PSACf  | 0.1292 | 0.1296 | 0.1299 | 0.1319 |        | +23.22%
4 | PSAC+f | 0.1283 | 0.1292 | 0.1305 | 0.1317 |        | +23.02%
5 | UBS    | 0.1075 | 0.1048 | 0.1024 | 0.1006 | 0.1006 |
5 | UBS+   | 0.1067 | 0.1055 | 0.1031 | 0.1023 | 0.1012 | +0.58%
5 | PSACa  | 0.1053 | 0.1062 | 0.1050 | 0.1059 | 0.1063 | +2.55%
5 | PSACf  | 0.1271 | 0.1274 | 0.1278 | 0.1262 | 0.1282 | +23.50%
5 | PSAC+f | 0.1263 | 0.1257 | 0.1271 | 0.1283 | 0.1290 | +23.46%

Table 6: Prediction Performance (LSTM, F1 score)

M | fset   | µ=1    | µ=2    | µ=3    | µ=4    | µ=5    | improvement (avg.)
1 | UBS    | 0.1158 |        |        |        |        |
1 | UBS+   | 0.1167 |        |        |        |        | +0.78%
1 | PSACa  | 0.1163 |        |        |        |        | +0.43%
1 | PSACf  | 0.1435 |        |        |        |        | +23.92%
1 | PSAC+f | 0.1439 |        |        |        |        | +24.27%
2 | UBS    | 0.1122 | 0.1092 |        |        |        |
2 | UBS+   | 0.1128 | 0.1103 |        |        |        | +0.77%
2 | PSACa  | 0.1135 | 0.1119 |        |        |        | +1.82%
2 | PSACf  | 0.1363 | 0.1322 |        |        |        | +21.27%
2 | PSAC+f | 0.1352 | 0.1345 |        |        |        | +21.83%
3 | UBS    | 0.1091 | 0.1068 | 0.1014 |        |        |
3 | UBS+   | 0.1106 | 0.1080 | 0.1004 |        |        | +0.50%
3 | PSACa  | 0.1081 | 0.1082 | 0.1017 |        |        | +0.23%
3 | PSACf  | 0.1329 | 0.1274 | 0.1253 |        |        | +21.56%
3 | PSAC+f | 0.1325 | 0.1303 | 0.1279 |        |        | +23.20%
4 | UBS    | 0.1065 | 0.1066 | 0.0990 | 0.0892 |        |
4 | UBS+   | 0.1080 | 0.1077 | 0.0978 | 0.0926 |        | +1.26%
4 | PSACa  | 0.1068 | 0.1045 | 0.0991 | 0.0981 |        | +2.10%
4 | PSACf  | 0.1302 | 0.1232 | 0.1201 | 0.1156 |        | +22.18%
4 | PSAC+f | 0.1299 | 0.1262 | 0.1226 | 0.1180 |        | +24.12%
5 | UBS    | 0.1045 | 0.1053 | 0.0968 | 0.0876 | 0.0822 |
5 | UBS+   | 0.1064 | 0.1045 | 0.0955 | 0.0904 | 0.0894 | +2.33%
5 | PSACa  | 0.1034 | 0.1023 | 0.0967 | 0.0962 | 0.0924 | +3.64%
5 | PSACf  | 0.1293 | 0.1199 | 0.1152 | 0.1108 | 0.1067 | +22.58%
5 | PSAC+f | 0.1274 | 0.1218 | 0.1178 | 0.1137 | 0.1089 | +24.31%
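For concreteness, the improvement column can be reproduced as the relative F1 gain over UBS at each µ, averaged across µ. The sketch below uses the DNN, M = 2 row of Table 5; this is our reading of the metric, not the authors' code:

```python
# Values from Table 5 (DNN), M = 2: UBS baseline vs. PSAC+f at µ = 1, 2.
ubs_f1    = [0.1132, 0.1109]
psac_f_f1 = [0.1346, 0.1369]

# Relative F1 gain over the baseline at each µ, then averaged across µ.
gains = [(p - u) / u for p, u in zip(psac_f_f1, ubs_f1)]
avg_improvement = sum(gains) / len(gains)

print(f"{avg_improvement:+.2%}")  # +21.17%, matching the table entry
```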

Table 7: Prediction Performance (all models, F1 score)

M | fset   | model | µ=1    | µ=2    | µ=3    | µ=4    | µ=5
1 | PSAC+f | DNN   | 0.1429 |        |        |        |
1 | PSAC+f | LSTM  | 0.1439 |        |        |        |
2 | PSAC+f | DNN   | 0.1346 | 0.1369 |        |        |
2 | PSAC+f | LSTM  | 0.1352 | 0.1345 |        |        |
3 | PSAC+f | DNN   | 0.1310 | 0.1319 | 0.1344 |        |
3 | PSAC+f | LSTM  | 0.1325 | 0.1303 | 0.1279 |        |
4 | PSAC+f | DNN   | 0.1283 | 0.1292 | 0.1305 | 0.1317 |
4 | PSAC+f | LSTM  | 0.1299 | 0.1262 | 0.1226 | 0.1180 |
5 | PSAC+f | DNN   | 0.1263 | 0.1257 | 0.1271 | 0.1283 | 0.1290
5 | PSAC+f | LSTM  | 0.1274 | 0.1218 | 0.1178 | 0.1137 | 0.1089

   First of all, we observe a slight increase when we add the basic query-related features (query length and query number; UBS+) to the baseline approach. As for the PSAC variants that utilize context-related features (PSACa and PSACf), both methods outperform the baseline; however, their results differ substantially. If we take the averaged values as input features (PSACa), the improvement over the baseline generally stays under 10%. On the other hand, if we extend the input layers with the complete query information (PSACf), the performance consistently surpasses the baseline by around 20% to 30%. We conclude that a vast amount of information is lost during the regularization procedure. Furthermore, we can see that query information plays a vital role in perceiving the user's intention. Next, the results demonstrate that the position attribute is another useful factor: we recognize a one-to-two-percent increase after considering the position attribute on most of the session sets with different N_GRAM M.
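The gap between PSACa and PSACf can be pictured as the difference between averaging per-query embeddings and keeping the full sequence in the input layer. The sketch below is purely illustrative; the dimensions and variable names are our assumptions, not the paper's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical session of 5 queries, each embedded into 16 dimensions.
query_embeddings = rng.normal(size=(5, 16))

# PSACa-style input: a single averaged vector; per-query detail and
# ordering are collapsed away.
averaged = query_embeddings.mean(axis=0)   # shape (16,)

# PSACf-style input: the complete query sequence, flattened so every
# query keeps its own slot in the input layer.
full = query_embeddings.reshape(-1)        # shape (80,)

print(averaged.shape, full.shape)
```

The averaged input is five times smaller here, which is exactly the information loss the results above attribute to the aggregation step.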
   If we consider only the baseline approach and its enhanced version (UBS, UBS+), the prediction performance declines as the length of the input sequence (µ) grows. After we take query-related features into account, the influence of µ on the two deep learning frameworks differs. For the DNN model, once we construct the input layer with query-related features (PSACa, PSACf, and PSAC+f), the model's performance gradually rises as µ increases. Since DNN treats each action in the input sequence equally during the training process, it can comprehend the evolution of the user's query contents globally and capture their intention more appropriately, in an attention-like manner. Additionally, this phenomenon occurs in all M settings. Contrary to DNN, the score of LSTM decreases when applying query-related features if we utilize a larger µ in training. Since the rationale behind RNN-based models pushes them to put more focus on closer steps, it is plausible that a more extended sequence might impair the performance. For example, a user might settle on his/her shopping list in the middle of the search operations and continue to browse other commodities afterward. If the user does not carry that intent into the subsequent actions, it is hard for LSTM to perceive the conversion. Finally, we compare DNN and LSTM under PSAC+f in Table 7. We can see that LSTM outperforms DNN for shorter action sequences. However, as users begin to reshape their thoughts and start a new search action, DNN is more capable of capturing the connection between each action and query.
5    CONCLUSIONS
With the growing usage of e-commerce platforms, the analysis of users' purchase behavior has attracted attention in the field. Through an integrated prediction framework, the numerical results in our study indicate that query-related features are essential for purchase prediction on e-commerce platforms. To utilize query-related features correctly, we need to consider several query attributes, such as the user's behavior in reshaping their ideas, the relationship between consecutive queries, and the approach to representing a query's embedded components. These results also show that both deep learning frameworks have their advantages; however, DNN's performance is more robust in capturing the customer's purchasing intention as the search sequence grows. Furthermore, the session-based data structure is useful for storing customers' behavior sequences, and we provide several mechanisms for explaining the time-shifting phenomenon in the real world. Such evidence should be of importance in e-commerce purchase prediction.
ACKNOWLEDGMENTS
This work was supported by the Ministry of Science and Technology
under Grant MOST 105-2221-E-001-003-MY3 and the Academia
Sinica under Grand Challenge Seed Project AS-GC-108-01.
