=Paper= {{Paper |id=Vol-1688/paper-08 |storemode=property |title=Modelling Session Activity with Neural Embedding |pdfUrl=https://ceur-ws.org/Vol-1688/paper-08.pdf |volume=Vol-1688 |authors=Oren Barkan,Yael Brumer,Noam Koenigstein |dblpUrl=https://dblp.org/rec/conf/recsys/BarkanBK16 }} ==Modelling Session Activity with Neural Embedding== https://ceur-ws.org/Vol-1688/paper-08.pdf
          Modelling Session Activity with Neural Embedding
             Oren Barkan                                       Yael Brumer                                      Noam Koenigstein
            Microsoft, Israel                                 Microsoft, Israel                                  Microsoft, Israel



ABSTRACT
Neural embedding techniques are being applied in a growing number of machine learning applications. In this work, we demonstrate a neural embedding technique for modelling users' session activity. Specifically, we consider a dataset collected from Microsoft's App Store consisting of user sessions that include sequential click actions and item purchases. Our goal is to learn a latent manifold that captures users' session activity and can be utilized for contextual recommendations in an online app store.

Keywords
Skip-Gram, Collaborative Filtering, Recommender Systems

Copyright held by the author(s)

1. INTRODUCTION
Neural embedding models have significantly advanced the state of the art in Natural Language Processing [1], [2]. In Recommender Systems research, neural networks have been applied to Collaborative Filtering (CF) [3] and to basket completion [4]. Specifically, [5] presented a neural model for embedding items in a latent manifold that encodes CF information. These early works were published very recently and indicate a growing interest in neural embedding techniques for recommendations.
   In this work, we take a different direction and utilize neural embedding techniques to model users' session activity. Specifically, we consider a dataset collected from Microsoft's App Store consisting of user sessions that include sequential click actions and item purchases. Our goal is to learn a latent manifold that captures users' session activity and can be utilized for contextual recommendations of apps in an online app store.
   Most prominent CF techniques, such as Matrix Factorization [6], do not take into account the sequential order of user actions prior to purchasing an item. Recently, there has been considerable interest in modelling user behavior in order to predict purchases. A recent competition, the "Tmall Recommendation Prize", requires predicting future user purchases on the Tmall website [7]. Whereas that line of work builds user profiles to predict purchases, we model session behavior regardless of the user profile. Another approach [8] uses an LSTM-BiRNN to learn from the sequence of clicks made within a session in order to predict all purchases associated with that session, whereas we predict the next purchased item given a click action that occurred within a pre-defined window before the purchase.
   The underlying assumption in this work is that users consider several items prior to their ultimate decision to purchase. Hence, we model a user's session activity as a sequence of click events on item detail pages and purchase events. For example, (C1, C2, C3, C4, C5, P5) denotes a user session consisting of 5 click events on 5 different items followed by a single purchase event. Note that an item purchase event is always preceded by a click event on the same item. By learning to predict these sequences, one can improve the overall user experience by recommending the items that the user is most likely to purchase.

2. DATASET
Our dataset was collected from Microsoft's App Store, consists of user sessions that include sequential click actions and item purchases, and is based on a sample of activity sessions from March to June 2016. Each action, whether a click or a purchase, is uniquely identified by its session id, timestamp and item id. From this sample, all sessions with fewer than two distinct clicked items, or without a purchase event, are removed. The effective dataset consists of 8,785,295 distinct sessions that contain 43,956,340 clicks and 18,838,796 purchases. On average, each session is associated with 5.003 clicks and 2.144 purchases, while the total number of distinct items is 22,139. In every session, the actions are ordered by their timestamps.

3. NEURAL ACTION EMBEDDING
Our model is inspired by Skip-Gram with Negative Sampling (SGNS), also known as Word2vec [2]. As explained above, we wish to model the user actions in a dataset $S = \{s_i\}_{i=1}^{K}$ of $K$ ordered user activity sequences, where the $i$'th sequence is $s_i = (a_{i,1}, a_{i,2}, \dots, a_{i,L_i})$ and $L_i$ is its length. The set of all possible actions is denoted by $A$ and, in our case, includes click and purchase events for the different items in the item catalog. We further define a function $t : A \to \{\text{click}, \text{purchase}\}$ that maps an action to its type (click or purchase).
   Our objective is to maximize the following term:

       \frac{1}{K} \sum_{i=1}^{K} \sum_{(u,v) \in D_i} \log p(a_{i,u} \mid a_{i,v})        (1)

where $D_i \subseteq \{(u,v) : 0 \le v < u \le L_i\}$ is a set that contains tuples of sequential actions. The probability $p(a_{i,u} \mid a_{i,v})$ is defined by

       p(a_{i,u} \mid a_{i,v}) = \sigma(x_{a_{i,v}}^{T} y_{a_{i,u}}) \prod_{j=1}^{N} \sigma(-x_{a_{i,v}}^{T} y_{b_j})

where $\sigma(z) = 1 / (1 + \exp(-z))$, and $x_a \in X (\subset \mathbb{R}^m)$ and $y_a \in Y (\subset \mathbb{R}^m)$ are the latent vectors corresponding to the target and context representations of action $a$. The dimension $m$ is chosen empirically through cross-validation. $N$ is a parameter that determines the number of negative examples drawn per positive example. A negative action $b_j$ is sampled from a distribution proportional to the frequency of the item associated with $b_j$; in this work, we use the unigram distribution raised to the 3/4 power.
   In order to mitigate the effect of popularity and to better model unpopular items, we subsample the sessions. Specifically, we discard each action $a$ with probability

       p(\text{discard} \mid a) = 1 - \sqrt{\rho / f(w(a))}

where $w(a)$ is the item associated with the action $a$, $f(w)$ is the frequency of the item $w$ and $\rho$ is a parameter that controls how aggressive the subsampling is. Finally, the latent vectors are estimated by applying stochastic gradient ascent with respect to the objective in Eq. (1).

4. EVALUATION
In this section, we describe our evaluation of the proposed model. Our prediction task is to predict the next purchased item given a click event. To this end, we split the dataset according to the session
order. The first 90% of the sessions are used as a training set and the remaining 10% are used as a test set. For each test session, we form a set of test (C, P) tuples, where each tuple corresponds to a purchase action that is preceded by a click action. A tuple (C, P) is considered only if C and P are separated by at most three other actions. For example, for a given test session (C1, P1, C2, C3, C4, P4), the tuple (C3, P4) can be formed because the distance between them is a single action, C4. On the other hand, the tuple (C1, P4) cannot be formed, since the distance between them is four actions. Furthermore, we exclude trivial tuples that consist of a click and a purchase of the same item. The resulting test set contains ~2M tuples.

4.1 Model Variants
We compare three variants of the proposed model. The variants differ in the tuples on which the model is trained. The set of tuples for each model is determined by the choice of $D_i$ in Eq. (1).
   The first model, dubbed 'CP', is comprised of tuples that are created in a similar manner to the test tuples. Specifically, for a given training session $s_i$, we set

       D_i = \{(u,v) : 0 \le v < u \le L_i \wedge 2 \le u - v \le c \wedge t(a_{i,u}) = \text{purchase} \wedge t(a_{i,v}) = \text{click}\}.

As a result, in this model, $X$ and $Y$ are the representations of the clicks and purchases, respectively.
   The second model, dubbed 'CC', is comprised of the sequential click events without the purchase events. The reasoning behind this model is the fact that each purchase event is preceded by a click on the same item that was purchased. Hence, by predicting the next item the user will click on, we are also predicting the next item that would be purchased. Therefore, in this model we set

       D_i = \{(u,v) : 0 \le v < u \le L_i \wedge 1 \le u - v \le c \wedge t(a_{i,u}) = \text{click} \wedge t(a_{i,v}) = \text{click}\}.

   The third model, dubbed 'PP', is comprised of the sequential purchase events (without clicks). Many Collaborative Filtering algorithms are designed to predict the next item a user will purchase, given the items he has already purchased. Hence, the 'PP' model was chosen as a baseline that follows an approach similar to that taken by many contemporary algorithms. In the 'PP' model we use

       D_i = \{(u,v) : 0 \le v < u \le L_i \wedge 1 \le u - v \le c \wedge t(a_{i,u}) = \text{purchase} \wedge t(a_{i,v}) = \text{purchase}\}.

4.2 Parameters
We used the following parameter configuration: the negative-to-positive ratio $N$ was set to 15, $\rho$ was set to 1e-3 and $c$ was set to 4. All three models were trained for 50 iterations. It is important to clarify that we experimented with different values of $c = 2..10$ and observed no significant change in the results.

4.3 Evaluation Metrics and Results
Our first evaluation is based on measuring the Percentile Rank (PR) of the hidden items (the purchased items). We report results in terms of Mean Percentile Rank as well as Median Percentile Rank.
   Table 1 summarizes the mean and median PR for the different models. The CP model clearly outperforms both the CC and PP models. We therefore conclude that both previous click events and previous purchases are relevant to this prediction task. Ignoring either of these signals undermines the model's ability to detect the hidden item.

             Table 1. Percentile Rank (PR) of the hidden (purchased) item

                              CP          CC          PP
             Mean PR          6.49%       11.84%      10.91%
             Median PR        0.72%       0.89%       0.89%

   A second observation is the fact that the Median PR values are much better than the Mean PR values, and that the performance difference between the models becomes smaller when the Median PR is considered. This suggests that the Mean PR values are highly affected by a small number of bad examples, but that in most cases the predictions are much better than the Mean PR values indicate. This behavior characterizes all three models, but is more dominant in the CC and CP models.
   Our second evaluation is based on the Precision@K metric: for each model, we measure the percentage of test examples in which the hidden item was ranked in the top K.
   Table 2 presents Precision@K values for different values of K. The results coincide with those of Table 1, where the CP model shows significantly better results across the board. Again, these results emphasize that, unlike most present-day models that consider only one type of event, there is significant added value in the CP approach, which models click and purchase events simultaneously.

             Table 2. Precision@K values for the different models

                              CP          CC          PP
             K=10             0.21        0.16        0.16
             K=25             0.30        0.25        0.26
             K=50             0.38        0.33        0.33
             K=100            0.46        0.42        0.42

5. CONCLUSION AND FUTURE WORK
In this work, we presented and evaluated several variants of neural embedding models for predicting purchases from user activity sessions. The evaluation shows that learning from click-purchase relations at different distances provides better results than learning from either click-click or purchase-purchase relations.
   In the future, we plan to investigate the contribution of additional hidden layers to the model presented in this paper, and to compare our model with sequential neural models such as LSTM [8] on the same prediction task.

6. REFERENCES
[1]  Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin, "A Neural Probabilistic Language Model," JMLR, vol. 3, pp. 1137–1155, Mar. 2003.
[2]  T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed Representations of Words and Phrases and their Compositionality," in NIPS 26, 2013, pp. 3111–3119.
[3]  F. Strub, J. Mary, and R. Gaudel, "Hybrid Collaborative Filtering with Neural Networks," CoRR, vol. abs/1603.00806, 2016.
[4]  S. Wan, Y. Lan, P. Wang, J. Guo, J. Xu, and X. Cheng, "Next Basket Recommendation with Neural Networks," in Poster Proceedings of RecSys 2015.
[5]  O. Barkan and N. Koenigstein, "Item2Vec: Neural Item Embedding for Collaborative Filtering," arXiv preprint arXiv:1603.04259, 2016.
[6]  Y. Koren, R. Bell, and C. Volinsky, "Matrix Factorization Techniques for Recommender Systems," IEEE Computer, vol. 42, no. 8, pp. 30–37, Aug. 2009.
[7]  Y. Zhang et al., "Large Scale Purchase Prediction with Historical User Actions on B2C Online Retail Platform," arXiv preprint arXiv:1408.6515, 2014.
[8]  Z. Wu et al., "Neural Modeling of Buying Behaviour for E-Commerce from Clicking Patterns," in Proceedings of the 2015 International ACM Recommender Systems Challenge, ACM, 2015, p. 12.
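
APPENDIX: ILLUSTRATIVE SKETCH
The training procedure of Section 3 can be made concrete with the following minimal Python sketch. This is not the authors' implementation: the function name, learning rate, vector initialization, and the choice to subsample by action (rather than item) frequency are our own assumptions. The sketch learns target (X) and context (Y) vectors by stochastic gradient ascent on pairs of nearby actions, drawing negative examples from the unigram distribution raised to the 3/4 power and subsampling frequent actions, as in Eq. (1).

```python
import math
import random

import numpy as np


def train_action_embeddings(sessions, dim=8, window=4, num_neg=15,
                            lr=0.05, epochs=50, rho=1e-3, seed=0):
    """SGNS-style embedding of session actions (illustrative sketch)."""
    rng = random.Random(seed)
    np_rng = np.random.default_rng(seed)

    # Vocabulary and unigram frequencies of the actions.
    counts = {}
    for s in sessions:
        for a in s:
            counts[a] = counts.get(a, 0) + 1
    vocab = sorted(counts)
    idx = {a: i for i, a in enumerate(vocab)}
    total = sum(counts.values())
    freq = np.array([counts[a] / total for a in vocab])

    # Negative-sampling distribution: unigram raised to the 3/4 power.
    neg_p = freq ** 0.75
    neg_p /= neg_p.sum()

    # Small random initialization of target (X) and context (Y) vectors.
    X = (np_rng.random((len(vocab), dim)) - 0.5) / dim
    Y = (np_rng.random((len(vocab), dim)) - 0.5) / dim

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        for s in sessions:
            # Subsampling: discard action a with prob. 1 - sqrt(rho / f(a)).
            kept = [a for a in s if rng.random() >=
                    max(0.0, 1.0 - math.sqrt(rho / freq[idx[a]]))]
            for u in range(1, len(kept)):
                for v in range(max(0, u - window), u):
                    tgt, ctx = idx[kept[v]], idx[kept[u]]
                    # One positive pair plus num_neg sampled negatives.
                    pairs = [(ctx, 1.0)] + [
                        (int(np_rng.choice(len(vocab), p=neg_p)), 0.0)
                        for _ in range(num_neg)]
                    # Gradient ascent on log sigma(+/- x_tgt . y_c).
                    grad_x = np.zeros(dim)
                    for c, label in pairs:
                        g = lr * (label - sigmoid(X[tgt] @ Y[c]))
                        grad_x += g * Y[c]
                        Y[c] += g * X[tgt]
                    X[tgt] += grad_x
    return vocab, X, Y
```

Given a trained model, candidate next purchases b for a click action a can be ranked by the score X[a] . Y[b], which is monotone in the modelled probability.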