<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Com, Tokyo, Japan, August</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Predicting Shopping Behavior with Mixture of RNNs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arthur Toth</string-name>
          <email>arthur.toth@rakuten.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Di Fabbrizio</string-name>
          <email>difabbrizio@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Louis Tan</string-name>
          <email>ts-louis.tan@rakuten.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ankur Datta</string-name>
          <email>ankur.datta@rakuten.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Rakuten Institute of Technology</institution>
          ,
          <addr-line>Boston, Massachusetts 02110</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>5</volume>
      <abstract>
        <p>We compare two machine learning approaches for early prediction of shoppers' behaviors, leveraging features from clickstream data generated during live shopping sessions. Our baseline is a mixture of Markov models to predict three outcomes: purchase, abandoned shopping cart, and browsing-only. We then experiment with a mixture of Recurrent Neural Networks. When sequences are truncated to 75% of their length, a relatively small feature set predicts purchase with an F-measure of 0.80 and browsing-only with an F-measure of 0.98. We also investigate an entropy-based decision procedure.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Computing methodologies → Neural networks; • Applied
computing → Online shopping;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        Recent e-commerce forecast analysis estimates that more than 1.77
billion users will shop online by the end of 2017 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Although
this is impressive growth, conversion rates for online shoppers
are substantially lower than rates for traditional brick-and-mortar
stores.
      </p>
      <p>
        Consumers shopping on e-commerce web sites are influenced
by numerous factors and may decide to stop the current session,
leaving products in their shopping carts. Once a user has interacted
with a shopping cart, abandonment rates range between 25% and
88%, significantly reducing merchants’ selling opportunities [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
Several potential purchase inhibitors have been analyzed in the
literature.
      </p>
      <p>
        We treat predicting user behavior from clickstream data as sequence
classification, which is broadly surveyed by Xing et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], who
divide it into feature-based, sequence distance-based, and
model-based methods. Previous feature-based work on clickstream
classification includes the random forest used by Awalkar et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
the deep belief networks and stacked denoising auto-encoders by
Vieira [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and recurrent neural networks by Wu et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Previous distance-based work includes the large margin nearest neighbor
approach by Pai et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Previous model-based work by Bertsimas
et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] used a mixture of Markov chains.
      </p>
      <p>
        Our baseline approach is based on the latter work, whereas our
new approach uses a mixture of RNNs. Although Wu et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] used
RNNs, their approach is not applicable to our scenario, since its
bi-directional RNN uses entire clickstream sequences, whereas our goal is
to classify incomplete sequences. Also, their model is not a mixture.
      </p>
      <p>Finally, our approach differs from most others in its use of a
ternary classification scheme. We classify each clickstream as a
purchase, abandon, or browsing-only session instead of just
purchase or non-purchase.</p>
    </sec>
    <sec id="sec-3">
      <title>CLICKSTREAM DATA</title>
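      <p>We consider clickstream data collected over two weeks and
consisting of <inline-formula><tex-math>n_0 = 1{,}560{,}830</tex-math></inline-formula> sessions. A session <inline-formula><tex-math>S_i</tex-math></inline-formula>, <inline-formula><tex-math>i = 1, 2, \ldots, n_0</tex-math></inline-formula>, is a
chronological sequence of recorded page views, or “clicks.” Let <inline-formula><tex-math>V_{i,j}</tex-math></inline-formula>
be the <inline-formula><tex-math>j</tex-math></inline-formula>th page view of session <inline-formula><tex-math>i</tex-math></inline-formula>, so that <inline-formula><tex-math>S_i = (V_{i,1}, V_{i,2}, \ldots, V_{i,m_i})</tex-math></inline-formula>,
where <inline-formula><tex-math>m_i</tex-math></inline-formula> is defined as the length of session <inline-formula><tex-math>i</tex-math></inline-formula>. We exclude session
<inline-formula><tex-math>i</tex-math></inline-formula> from our experiments if <inline-formula><tex-math>m_i &lt; 4</tex-math></inline-formula>, after which <inline-formula><tex-math>n = 198{,}936</tex-math></inline-formula>
sessions remain. Sessions with purchases are truncated before the first
purchase confirmation. Table 1 summarizes the <inline-formula><tex-math>n</tex-math></inline-formula> sessions.</p>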
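      <p>Our experiments use both the page type and dwell time of <inline-formula><tex-math>V_{i,j}</tex-math></inline-formula>.
The page type of <inline-formula><tex-math>V_{i,j}</tex-math></inline-formula>, denoted as <inline-formula><tex-math>P_{i,j}</tex-math></inline-formula>, belongs to one of eight
categories, including search pages, product view pages, login pages, etc.
The dwell time, <inline-formula><tex-math>D_{i,j}</tex-math></inline-formula>, of <inline-formula><tex-math>V_{i,j}</tex-math></inline-formula> is the amount of time the user spends
viewing the page, and is not available until the <inline-formula><tex-math>(j + 1)</tex-math></inline-formula>th page view.
After the <inline-formula><tex-math>j</tex-math></inline-formula>th page view, the clickstream data gathered for session <inline-formula><tex-math>i</tex-math></inline-formula> is
given by <inline-formula><tex-math>S_{i|j} = ((P_{i,1}, D_{i,0}), (P_{i,2}, D_{i,1}), \ldots, (P_{i,j}, D_{i,j-1}))</tex-math></inline-formula>, where
<inline-formula><tex-math>D_{i,0}</tex-math></inline-formula> is undefined, i.e., <inline-formula><tex-math>D_{i,0} = \emptyset</tex-math></inline-formula>. To reduce sparsity, dwell times
were placed in 8 bins, evenly spaced by percentiles.</p>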
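      <p>As an illustration of the percentile binning, the following is a minimal sketch, assuming dwell times are available in seconds and that bin edges are estimated on training data only. The function names and the synthetic dwell times are hypothetical and are not taken from our implementation.</p>
      <preformat>
```python
import numpy as np

def fit_dwell_bins(train_dwell_seconds, n_bins=8):
    """Return the interior percentile cut points for n_bins equal-mass bins."""
    # Interior percentiles, e.g. 12.5, 25, ..., 87.5 for 8 bins.
    percentiles = np.linspace(0, 100, n_bins + 1)[1:-1]
    return np.percentile(train_dwell_seconds, percentiles)

def bin_dwell(dwell_seconds, edges):
    """Map each dwell time to a bin index in 0..len(edges)."""
    return np.searchsorted(edges, dwell_seconds, side="right")

# Hypothetical dwell times in seconds, for illustration only.
train = np.arange(1, 801, dtype=float)
edges = fit_dwell_bins(train)
bins = bin_dwell(train, edges)
counts = np.bincount(bins, minlength=8)
```
      </preformat>
      <p>With edges spaced by percentiles, each bin receives roughly the same number of training observations, which is what keeps the discretized dwell-time feature from becoming sparse. The undefined first dwell time would need its own symbol, which this sketch does not model.</p>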
    </sec>
    <sec id="sec-4">
      <title>MODELING APPROACHES</title>
      <p>Our goal is to classify customer behavior into final decision
categories. In particular, clickstream sequences receive one of the
following labels: PURCHASE, if the sequence leads to an item
purchase; ABANDON, if an item was left in the shopping cart, but
there was no purchase; and BROWSING-ONLY, when the shopping
cart was not used. The final two categories can be combined to
investigate PURCHASE vs. NON_PURCHASE behavior. In
preliminary studies, the ABANDON sequences were much more similar
to the PURCHASE sequences than to the BROWSING-ONLY
sequences, so having three categories helped account for some of
the confusability of the data. Our eventual goal of applying our
models in a live system adds a constraint: the classifier must work
for incomplete sequences, without using data from the “future”.</p>
    </sec>
    <sec id="sec-5">
      <title>Mixture of High-Order Markov Chains</title>
      <p>
        Bertsimas et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
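        ] modeled a similar problem with a mixture of
Markov chains, using maximum a posteriori estimation to predict
sequence labels. In this approach, the training data is partitioned by
class and a separate Markov chain model is trained for each class.
The resultant models can estimate class likelihoods from sequences.
Let <inline-formula><tex-math>C_i</tex-math></inline-formula> be a random variable representing the outcome of session <inline-formula><tex-math>S_i</tex-math></inline-formula>,
where <inline-formula><tex-math>C_i \in \Omega</tex-math></inline-formula>, and <inline-formula><tex-math>\Omega</tex-math></inline-formula> is the set of possible clickstream outcomes.
The models are used to estimate the likelihoods <inline-formula><tex-math>P(S_i \mid C_i = \omega)</tex-math></inline-formula>
for all <inline-formula><tex-math>\omega \in \Omega</tex-math></inline-formula>. Using Bayes’ theorem and the law of total probability,
the class posteriors for each of the three classes can be estimated
by the equation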
      </p>
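      <p>
        <disp-formula><tex-math>P(C_i = \omega \mid S_i) = \frac{P(S_i \mid C_i = \omega) \, P(C_i = \omega)}{\sum_{\omega' \in \Omega} P(S_i \mid C_i = \omega') \, P(C_i = \omega')}</tex-math></disp-formula>
with the prior, <inline-formula><tex-math>P(C_i = \omega)</tex-math></inline-formula>, estimated from counts.</p>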
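      <p>The posterior computation can be sketched as follows, assuming each class model returns a log-likelihood for the observed clicks. The function name and all numeric values are hypothetical; working in log space with a log-sum-exp normalizer is one common way to avoid underflow on long sequences, not necessarily the only one.</p>
      <preformat>
```python
import math

def class_posteriors(log_likelihoods, priors):
    """Combine per-class log P(S | C = w) with priors P(C = w) via Bayes' theorem."""
    joint = {w: log_likelihoods[w] + math.log(priors[w]) for w in priors}
    # Log-sum-exp normalizer: the denominator of the posterior, in log space.
    peak = max(joint.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in joint.values()))
    return {w: math.exp(v - log_norm) for w, v in joint.items()}

# Hypothetical priors and log-likelihoods, for illustration only.
priors = {"PURCHASE": 0.2, "ABANDON": 0.15, "BROWSING-ONLY": 0.65}
log_liks = {"PURCHASE": -12.0, "ABANDON": -12.5, "BROWSING-ONLY": -15.0}
post = class_posteriors(log_liks, priors)
predicted = max(post, key=post.get)
```
      </preformat>
      <p>The maximum a posteriori label is the class with the largest posterior; in this illustrative case the likelihood advantage of PURCHASE outweighs the larger BROWSING-ONLY prior.</p>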
      <p>This model fits the problem’s constraints, because likelihoods
can be produced from subsequences without using “future” clicks.
Although each chain is trained only on click data, separating the data
by class implicitly conditions each chain on its class.</p>
      <p>
        Taking inspiration from the Automatic Speech Recognition (ASR)
community and similarities to “Language Modeling”, we adapted
some of their more recent techniques to our problem. In preliminary
experiments, 5-grams performed better than shorter chains, so we
used them. Longer chains cause greater sparsity, so we addressed
this with Kneser-Ney smoothing, which performed best in a study
of language modeling techniques [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We used the MITLM toolkit
to train the Markov chains [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
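      <p>To make the per-class chain concrete, the following is a minimal sketch of a 5-gram Markov chain over page types. It uses add-one smoothing as a deliberately simpler stand-in for Kneser-Ney smoothing, and it does not use the MITLM toolkit; the class name, the START padding symbol, and the toy sessions are all hypothetical.</p>
      <preformat>
```python
import math
from collections import defaultdict

class NGramChain:
    """Order-n Markov chain over page types with add-one smoothing."""

    def __init__(self, order=5, vocab_size=8):
        self.context_len = order - 1
        self.vocab_size = vocab_size  # eight page-type categories
        self.ngram_counts = defaultdict(int)
        self.context_counts = defaultdict(int)

    def _contexts(self, pages):
        # Pad the start so that the first clicks also have a full context.
        padded = ["START"] * self.context_len + list(pages)
        for k, page in enumerate(pages):
            yield tuple(padded[k:k + self.context_len]), page

    def fit(self, sessions):
        for pages in sessions:
            for context, page in self._contexts(pages):
                self.ngram_counts[(context, page)] += 1
                self.context_counts[context] += 1
        return self

    def log_likelihood(self, pages):
        """Log-likelihood of a (possibly incomplete) sequence of page types."""
        total = 0.0
        for context, page in self._contexts(pages):
            num = self.ngram_counts.get((context, page), 0) + 1
            den = self.context_counts.get(context, 0) + self.vocab_size
            total += math.log(num / den)
        return total

# Toy training sessions, one chain per class.
purchase_chain = NGramChain().fit([["search", "product", "cart", "checkout"]] * 3)
browse_chain = NGramChain().fit([["search", "product", "search", "product"]] * 3)
prefix = ["search", "product", "cart"]
purchase_score = purchase_chain.log_likelihood(prefix)
browse_score = browse_chain.log_likelihood(prefix)
```
      </preformat>
      <p>Because each term conditions only on earlier clicks, the same function scores any prefix of a session, which is the property that lets the mixture operate on incomplete sequences.</p>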
    </sec>
    <sec id="sec-6">
      <title>Recurrent Neural Networks</title>
      <p>
        Taking further inspiration from the ASR community, we replaced
the Markov chains in our mixtures with RNNs. Earlier language
modeling work used feed-forward artificial neural networks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
but RNNs have performed better recently, both in direct likelihood
metrics and in overall ASR task metrics [
        <xref ref-type="bibr" rid="ref16 ref9">9, 16</xref>
        ]. Clickstream data
differs from ASR text, and our mixture model differs from the
typical ASR approach, so it was unclear whether RNNs would help
in our scenario.
      </p>
      <p>
        Earlier RNN-based language models used the “simple recurrent
neural network” architecture [
        <xref ref-type="bibr" rid="ref9">9</xref>
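        ]. The underlying idea, depicted in
Figure 1, is that an input sequence, represented by <inline-formula><tex-math>\{X_0, \ldots, X_{t-1}\}</tex-math></inline-formula>,
is connected to a recurrent layer, represented by <inline-formula><tex-math>\{A_0, \ldots, A_t\}</tex-math></inline-formula>, which
is also connected to itself at the next point in time. The recurrent
layer is also connected to an output layer. In our models, each RNN
tries to predict the next input, <inline-formula><tex-math>X_{n+1}</tex-math></inline-formula>, after each input, <inline-formula><tex-math>X_n</tex-math></inline-formula>.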
      </p>
      <p>
        “Simple” RNN-based language models for ASR were
outperformed by RNNs using the Long Short-Term Memory (LSTM)
configuration and “drop-out” [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. LSTM addressed the “vanishing
gradient” and “exploding gradient” problems. “Drop-out” addressed
overfitting by probabilistically ignoring the contributions of
non-recurrent nodes during training.
      </p>
      <p>
        In LSTM RNNs, some nodes function as “memory cells”. Some
connections retrieve information from them, and others cause them
to forget. The LSTM equations are [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]:
      </p>
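      <p>
        <disp-formula><tex-math>\mathrm{LSTM}: h_t^{l-1}, h_{t-1}^{l}, c_{t-1}^{l} \rightarrow h_t^{l}, c_t^{l}</tex-math></disp-formula>
        <disp-formula><tex-math>\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix} = \begin{pmatrix} \mathrm{sigm} \\ \mathrm{sigm} \\ \mathrm{sigm} \\ \tanh \end{pmatrix} T_{2n,4n} \begin{pmatrix} h_t^{l-1} \\ h_{t-1}^{l} \end{pmatrix}</tex-math></disp-formula>
        <disp-formula><tex-math>c_t^{l} = f \odot c_{t-1}^{l} + i \odot g</tex-math></disp-formula>
        <disp-formula><tex-math>h_t^{l} = o \odot \tanh(c_t^{l})</tex-math></disp-formula>
where <inline-formula><tex-math>h</tex-math></inline-formula> and <inline-formula><tex-math>c</tex-math></inline-formula> represent hidden states and memory cells, subscripts
refer to time, superscripts refer to layers, <inline-formula><tex-math>T_{n,m}</tex-math></inline-formula> is an affine transform
from <inline-formula><tex-math>\mathbb{R}^n</tex-math></inline-formula> to <inline-formula><tex-math>\mathbb{R}^m</tex-math></inline-formula>, <inline-formula><tex-math>\odot</tex-math></inline-formula> refers to element-wise multiplication, and sigm
and tanh are applied to each element.</p>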
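      <p>A single LSTM update can be written out directly from these definitions. The sketch below assumes the affine transform is stored as a weight matrix W and bias b acting on the concatenated inputs, with the four gate blocks ordered input, forget, output, candidate; the function names and this storage layout are illustrative assumptions, not a description of any particular library.</p>
      <preformat>
```python
import numpy as np

def sigm(x):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(h_below, h_prev, c_prev, W, b):
    """One LSTM update for one layer at one time step.

    h_below is the hidden state of the layer below at the current time,
    h_prev and c_prev are this layer's hidden state and memory cell at the
    previous time (each of length n). W (4n x 2n) and b (4n) parameterize
    the affine transform from 2n to 4n dimensions.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_below, h_prev]) + b
    i = sigm(z[0:n])              # input gate
    f = sigm(z[n:2 * n])          # forget gate
    o = sigm(z[2 * n:3 * n])      # output gate
    g = np.tanh(z[3 * n:4 * n])   # candidate values
    c = f * c_prev + i * g        # new memory cell
    h = o * np.tanh(c)            # new hidden state
    return h, c

# With all-zero parameters every gate equals 0.5 and the candidate is 0.
n = 4
h, c = lstm_step(np.zeros(n), np.zeros(n), np.ones(n),
                 np.zeros((4 * n, 2 * n)), np.zeros(4 * n))
```
      </preformat>
      <p>In the zero-parameter case the forget gate halves the previous memory cell, which gives a quick sanity check on the gate wiring.</p>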
      <p>
        Since LSTM RNNs with drop-out worked much better than
Markov chains for ASR, we replaced the Markov chains with them
in our clickstream mixture models. Additional work was
necessary, since our scenario did not exactly match language modeling.
During training, input tokens were still used to predict following
tokens. During testing, however, our goal was the sequence
probabilities. These were calculated from token probabilities present
in intermediate softmax layers in each LSTM model. Due to the
network architecture, “future” events were not used for this. We
used TensorFlow to implement our LSTM RNNs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
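      <p>The step from token probabilities to a sequence probability can be sketched as follows, assuming the per-step softmax outputs have already been extracted from a trained model as an array. The function name and the toy distributions are hypothetical, and the first token is left unscored here for simplicity (a start symbol would be one way to include it).</p>
      <preformat>
```python
import numpy as np

def sequence_log_prob(step_probs, tokens):
    """Sum the log-probabilities assigned to each observed next token.

    step_probs[k] is the softmax distribution emitted after reading
    tokens[0..k]; it scores tokens[k + 1]. Only past tokens are used.
    """
    total = 0.0
    for k in range(len(tokens) - 1):
        total += np.log(step_probs[k][tokens[k + 1]])
    return float(total)

# Toy example with three token types and a prefix of three page views.
tokens = [0, 2, 1]
step_probs = np.array([
    [0.1, 0.2, 0.7],   # distribution after reading token 0
    [0.3, 0.5, 0.2],   # distribution after reading tokens 0, 2
])
log_prob = sequence_log_prob(step_probs, tokens)
```
      </preformat>
      <p>Computing this quantity under each class-specific RNN yields the per-class likelihoods that feed the same Bayes posterior used for the Markov chain mixture.</p>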
    </sec>
    <sec id="sec-7">
      <title>5 EXPERIMENTS</title>
      <p>We experimented on the dataset described in Section 3 and
summarized in Table 1. It was partitioned into an 80% training/20% testing
split, using stratified sampling to retain class proportions.</p>
      <p>All RNNs were trained for 10 epochs, using batches of 20
sequences. We tested 16 combinations of the remaining parameters.
The number of recurrent layers was 1 or 2, the keep probability
was 1 or 0.5, and the hidden state size was 10, 20, 40, or 80. For a
particular mixture model, all the RNNs used the same parameter
values.</p>
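      <p>The 16 combinations follow from the Cartesian product of the three varied parameters (2 x 2 x 4). A short sketch of enumerating them is given below; the dictionary keys are hypothetical names chosen for illustration.</p>
      <preformat>
```python
from itertools import product

layer_counts = [1, 2]
keep_probabilities = [1.0, 0.5]
hidden_sizes = [10, 20, 40, 80]

# Fixed settings (10 epochs, batches of 20) are shared by every configuration.
configs = [
    {"num_layers": layers, "keep_prob": keep, "hidden_size": hidden,
     "epochs": 10, "batch_size": 20}
    for layers, keep, hidden in product(layer_counts, keep_probabilities, hidden_sizes)
]
```
      </preformat>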
    </sec>
    <sec id="sec-8">
      <title>5.1 Results</title>
      <p>For each model, we evaluated the prediction performance by
truncating the page view sequences at different lengths. Table 2 shows
the results for the mixture of Markov models, and for one of the
mixture of RNN trials. Although we tested 16 different RNN
parameter combinations, results were so similar that we are only reporting
on one of them.</p>
      <p>Table 2 reports precision, recall, and F1-measure for each specific
sequence outcome when considering 25%, 50%, 75%, and 100% of
the total length of the sequence. For instance, when splitting at
50%, the Markov chain model predicts a PURCHASE with a
precision of 0.42 and a recall of 0.11, resulting in an F1-measure
of 0.17. Under the same conditions, the RNN-based model reaches a
precision of 0.82 with a recall of 0.71 and an F1-measure of 0.76. We
also report the accuracy when randomly selecting a class based
on the prior distribution of the clickstream corpus. RNN mixture
components substantially outperform Markov chain components.
This is particularly evident from Figure 2, which shows the
F1-measure by sequence length for the mixtures of Markov chain models (MCMs, dotted
line) and of RNNs (solid line). The performance of both models increases
monotonically as they observe more data, from the 25% splits to
100%, but the mixture of RNNs has an immediate edge even with
the short 25% sequences. The MCMs achieve similar F1-measures
for the majority class (i.e., BROWSING-ONLY), but they are penalized
by the lack of data for the less represented sequences (i.e., 14.7% for
ABANDON and 20.7% for PURCHASE). RNNs instead generalize
better due to the memory component that can model long-distance
dependencies.</p>
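      <p>The F1-measures quoted for the 50% split can be checked from the reported precision and recall values, since the F1-measure is their harmonic mean. The helper name below is hypothetical.</p>
      <preformat>
```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# PURCHASE class at the 50% split, values as reported in the text.
markov_f1 = f1_measure(0.42, 0.11)
rnn_f1 = f1_measure(0.82, 0.71)
```
      </preformat>
      <p>Rounded to two decimals, these give 0.17 and 0.76, matching the reported figures; the low Markov chain score is driven almost entirely by its recall.</p>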
      <p>
        Similarly to Bertsimas et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], in Figure 3, we plot 100
log-probability trajectories, with lengths from 2 through 20 page views,
estimated by the MCM mixture along page view sequences for
each class. This plot demonstrates how probabilities evolve during
interactions and how confident each model is compared to the others.
      </p>
      <p>The legend in Figure 3 corresponds to the true final state labels
of the test examples: dashed red lines for ABANDON sequences,
dashed and dotted green lines for BROWSING-ONLY sequences,
and solid blue lines for PURCHASE sequences. Ideally, the model
would score the PURCHASE sequences (solid blue lines) high and
the other sequences low, and the earlier the distinction could be
made, the better. Looking at this figure, there does appear to be
some level of discrimination between the categories. In general,
the BROWSING-ONLY sequences seem more separable from the
PURCHASE sequences than the ABANDON sequences are, as expected.</p>
      <p>
        Although Table 2 can be used to compare with other work [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], it
depends on sequence lengths. A useful live system must predict final
actions before sequences are complete and needs a decision process
for accepting the predicted label. We experimented with taking
the highest-scoring label once the entropy of the class probabilities fell below a threshold.
Figure 4 shows the F1-measures at different thresholds, which are
proportions of the maximum possible entropy. Since the entropy
might not drop below the threshold, it is important to consider
how many sequences have predicted labels. When we calculated
F1-measures, sequences without predictions were counted as misses.
Figure 5 shows the number of sequences where predictions were
made based on entropy threshold. Again, the horizontal axis
represents the proportion of the maximum possible entropy value.
The vertical axis represents the number of sequences where a
decision can be made based on threshold crossing. As an example, a
threshold of 0.55 led to reasonable F1-measures while producing
predictions for 99% of the sequences before they were complete.
Choosing higher entropy thresholds allows decisions to be made
for more sequences, but performance can suffer since decisions can
be made while class probabilities are more uniform and confidence
is lower. Choosing lower entropy thresholds forces the class
probabilities to be more distinct, which leads to more confident decisions,
but performance starts to suffer when fewer sequences receive
decisions. In practice, the threshold would be tuned on held-out
data.
      </p>
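      <p>The entropy-based decision procedure can be sketched as follows, assuming a sequence of class posteriors is available after each page view. The function names, the 0-based indexing, and the toy trajectory are hypothetical; the threshold is expressed as a proportion of the maximum possible entropy, which for three classes is log 3.</p>
      <preformat>
```python
import math

def normalized_entropy(posterior):
    """Shannon entropy divided by its maximum, log(number of classes)."""
    entropy = -sum(p * math.log(p) for p in posterior.values() if p > 0)
    return entropy / math.log(len(posterior))

def first_confident_decision(posterior_trajectory, threshold=0.55):
    """Return (page_view_index, label) at the first threshold crossing, else None."""
    for index, posterior in enumerate(posterior_trajectory):
        if threshold > normalized_entropy(posterior):
            return index, max(posterior, key=posterior.get)
    return None  # no decision; counted as a miss when scoring

# Toy trajectory of posteriors after successive page views.
trajectory = [
    {"PURCHASE": 0.34, "ABANDON": 0.33, "BROWSING-ONLY": 0.33},
    {"PURCHASE": 0.50, "ABANDON": 0.30, "BROWSING-ONLY": 0.20},
    {"PURCHASE": 0.85, "ABANDON": 0.10, "BROWSING-ONLY": 0.05},
]
decision = first_confident_decision(trajectory)
```
      </preformat>
      <p>In this toy trajectory the normalized entropy first drops below 0.55 at the third page view, so the label is accepted there; a trajectory that never crosses the threshold yields no prediction.</p>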
    </sec>
    <sec id="sec-9">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>We presented two models for the real-time, early prediction of
shopping session outcomes on an e-commerce platform. We
demonstrated that LSTM RNNs generalize better and with less data than
high-order Markov chain models used in previous work. Our
approach, in addition to distinguishing browsing-only and cart-interaction
sessions, can also accurately discriminate between cart
abandonment and purchase sessions. Future work will focus on features,
single RNN architectures, and decision strategies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Martín</given-names>
            <surname>Abadi</surname>
          </string-name>
          , Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke,
          <string-name>
            <given-names>Yuan</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiaoqiang</given-names>
            <surname>Zheng</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <source>TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems</source>
          . (
          <year>2015</year>
          ). http://tensorflow.org/ Software available from tensorflow.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Awalkar</surname>
          </string-name>
          , Ibrahim Ahmed, and
          <string-name>
            <given-names>Tejas</given-names>
            <surname>Nevrekar</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Prediction of User's Purchases using Clickstream Data</article-title>
          .
          <source>International Journal of Engineering Science and Computing</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          , Réjean Ducharme, Pascal Vincent, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Janvin</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>A Neural Probabilistic Language Model</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          <volume>3</volume>
          (March
          <year>2003</year>
          ),
          <fpage>1137</fpage>
          -
          <lpage>1155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Dimitris</given-names>
            <surname>Bertsimas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Adam J.</given-names>
            <surname>Mersereau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Nitin R.</given-names>
            <surname>Patel</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Dynamic Classification of Online Customers</article-title>
          .
          <source>In Proceedings of the Third SIAM International Conference on Data Mining</source>
          , San Francisco, CA, USA, May 1-3
          ,
          <year>2003</year>
          .
          <fpage>107</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Robert</given-names>
            <surname>Florentin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Online shopping abandonment rate a new perspective: the role of choice conflicts as a factor of online shopping abandonment</article-title>
          .
          <source>Master's thesis</source>
          . University of Twente.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Joshua T.</given-names>
            <surname>Goodman</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>A Bit of Progress in Language Modeling</article-title>
          .
          <source>Comput. Speech Lang</source>
          .
          <volume>15</volume>
          ,
          <issue>4</issue>
          (Oct.
          <year>2001</year>
          ),
          <fpage>403</fpage>
          -
          <lpage>434</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Bo-June Paul</given-names>
            <surname>Hsu</surname>
          </string-name>
          and
          <string-name>
            <given-names>James R.</given-names>
            <surname>Glass</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Iterative language model estimation: efficient data structure &amp; algorithms</article-title>
          .
          <source>In INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association</source>
          , Brisbane, Australia, September 22-26,
          <year>2008</year>
          .
          <fpage>841</fpage>
          -
          <lpage>844</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Monika</given-names>
            <surname>Kukar-Kinney</surname>
          </string-name>
          and
          <string-name>
            <given-names>Angeline G.</given-names>
            <surname>Close</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>The determinants of consumers' online shopping cart abandonment</article-title>
          .
          <source>Journal of the Academy of Marketing Science</source>
          <volume>38</volume>
          ,
          <issue>2</issue>
          (
          <year>2010</year>
          ),
          <fpage>240</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Tomáš</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Martin Karafiát, Lukáš Burget, Jan Černocký, and
          <string-name>
            <given-names>Sanjeev</given-names>
            <surname>Khudanpur</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Recurrent neural network based language model</article-title>
          .
          <source>In The 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Deepak</given-names>
            <surname>Pai</surname>
          </string-name>
          , Abhijit Sharang, Meghanath Macha Yadagiri, and
          <string-name>
            <given-names>Shradha</given-names>
            <surname>Agrawal</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Modelling Visit Similarity Using Click-Stream Data: A Supervised Approach</article-title>
          . Springer International Publishing,
          <fpage>135</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] The Statistics Portal.
          <year>2014</year>
          .
          <article-title>Digital buyer penetration worldwide from 2014 to 2019</article-title>
          . http://www.statista.com/statistics/261676/digital-buyer-penetration-worldwide/. (
          <year>2014</year>
          ). Accessed: 2016-09-10.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Armando</given-names>
            <surname>Vieira</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Predicting online user behaviour using deep learning algorithms</article-title>
          .
          <source>The Computing Research Repository (CoRR)</source>
          abs/1511.06247 (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Zhenzhou</given-names>
            <surname>Wu</surname>
          </string-name>
          , Bao Hong Tan, Rubing Duan, Yong Liu, and Rick Siow Mong Goh.
          <year>2015</year>
          .
          <article-title>Neural Modeling of Buying Behaviour for E-Commerce from Clicking Patterns</article-title>
          .
          <source>In Proceedings of the 2015 International ACM Recommender Systems Challenge (RecSys '15 Challenge)</source>
          . ACM, New York, NY, USA, Article
          <volume>12</volume>
          , 4 pages.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Zhengzheng</given-names>
            <surname>Xing</surname>
          </string-name>
          , Jian Pei, and
          <string-name>
            <given-names>Eamonn</given-names>
            <surname>Keogh</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>A Brief Survey on Sequence Classification</article-title>
          .
          <source>SIGKDD Explor. Newsl</source>
          .
          <volume>12</volume>
          ,
          <issue>1</issue>
          (Nov.
          <year>2010</year>
          ),
          <fpage>40</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Yin</given-names>
            <surname>Xu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jin-Song</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Factors Influencing Cart Abandonment in the Online Shopping Process</article-title>
          .
          <source>Social Behavior and Personality: an international journal</source>
          <volume>43</volume>
          ,
          <issue>10</issue>
          (Nov.
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Wojciech</given-names>
            <surname>Zaremba</surname>
          </string-name>
          , Ilya Sutskever, and
          <string-name>
            <given-names>Oriol</given-names>
            <surname>Vinyals</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Recurrent Neural Network Regularization</article-title>
          .
          <source>The Computing Research Repository (CoRR)</source>
          abs/1409.2329 (
          <year>2014</year>
          ). http://arxiv.org/abs/1409.2329
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>