=Paper=
{{Paper
|id=Vol-1688/paper-08
|storemode=property
|title=Modelling Session Activity with Neural Embedding
|pdfUrl=https://ceur-ws.org/Vol-1688/paper-08.pdf
|volume=Vol-1688
|authors=Oren Barkan,Yael Brumer,Noam Koenigstein
|dblpUrl=https://dblp.org/rec/conf/recsys/BarkanBK16
}}
==Modelling Session Activity with Neural Embedding==
Oren Barkan, Yael Brumer, Noam Koenigstein
Microsoft, Israel
ABSTRACT
Neural embedding techniques are being applied in a growing number of machine learning applications. In this work, we demonstrate a neural embedding technique to model users' session activity. Specifically, we consider a dataset collected from Microsoft's App Store consisting of user sessions that include sequential click actions and item purchases. Our goal is to learn a latent manifold that captures users' session activity and can be utilized for contextual recommendations in an online app store.

Keywords
Skip-Gram, Collaborative Filtering, Recommender Systems

Copyright held by the author(s)

1. INTRODUCTION
Neural embedding models have significantly advanced the state-of-the-art in the field of Natural Language Processing [1], [2]. In Recommender Systems research, neural networks have been applied to Collaborative Filtering (CF) [3] and basket completion [4]. Specifically, [5] presented a neural model for embedding items in a latent manifold that encodes CF information. These early works have been published very recently and indicate a growing interest in neural embedding techniques for recommendations.

In this work, we take a different direction and utilize neural embedding techniques to model users' session activity. Specifically, we consider a dataset collected from Microsoft's App Store consisting of user sessions that include sequential click actions and item purchases. Our goal is to learn a latent manifold that captures users' session activity and can be utilized for contextual recommendations of apps in an online app store.

Most prominent CF techniques, such as Matrix Factorization [6], do not take into account the sequential order of user actions prior to purchasing an item. Recently, there has been considerable interest in modeling user behavior to predict purchases. One of the latest competitions, the "Tmall Recommendation Prize", requires predicting future user purchases on the Tmall website [7]. While that work builds user profiles to predict purchases, we model session behavior regardless of the user profile. Another approach [8] uses an LSTM-BiRNN to learn the sequence of clicks made in the same session in order to predict all purchases associated with that session, while we try to predict the next purchased item given a click action that was made within a pre-defined window before the purchase.

The underlying assumption in this work is that users consider several items prior to their ultimate decision to purchase. Hence, we model users' session activity as a sequence of click events on item detail pages and purchase events. For example, (C1, C2, C3, C4, C5, P5) denotes a user session consisting of 5 click events on 5 different items followed by a single purchase event. Note that an item purchase event is always preceded by a click event on the same item. By learning to predict these sequences, one can improve the overall user experience by recommending the items that the user is most likely to purchase.

2. DATASET
Our dataset was collected from Microsoft's App Store. It consists of user sessions that include sequential click actions and item purchases, and is based on a sample of activity sessions from March to June 2016. Each action, whether it is a click or a purchase, is uniquely identified by its session id, timestamp and item id. From this sample, all sessions with less than two different clicked items, or without a purchase event, are removed. The effective dataset consists of 8,785,295 distinct sessions that contain 43,956,340 clicks and 18,838,796 purchases. On average, each session is associated with 5.003 clicks and 2.144 purchases, while the total number of distinct items is 22,139. In every session, the actions are ordered by their timestamps.

3. NEURAL ACTION EMBEDDING
Our model is inspired by Skip-Gram with Negative Sampling (SGNS), also known as Word2vec [2]. As explained above, we wish to model the user actions in a dataset $S$ of $K$ ordered user activity sequences, where each sequence $s \in S$ is $s = (a_{s,1}, a_{s,2}, \ldots, a_{s,L_s})$ and $L_s$ is its length. The set of all possible actions is denoted by $A$ and includes, in our case, click and purchase events of the different items from the items catalog. We further define a function $t : A \to \{\text{click}, \text{purchase}\}$ that maps an action to its type (click or purchase).

Our objective is to maximize the following term:

$$\frac{1}{K} \sum_{s \in S} \sum_{(i,j) \in T_s} \log p(a_{s,i} \mid a_{s,j}), \qquad (1)$$

where $T_s \subseteq \{(i,j) : 1 \le j < i \le L_s\}$ is a set that contains tuples of sequential actions. The probability $p(a_{s,i} \mid a_{s,j})$ is defined by

$$p(a_{s,i} \mid a_{s,j}) = \sigma\!\left(u_{a_{s,i}}^{\top} v_{a_{s,j}}\right) \prod_{k=1}^{N} \sigma\!\left(-u_{a_{s,i}}^{\top} v_{n_k}\right),$$

where $\sigma(x) = 1/(1 + \exp(-x))$, and $u_a \in U (\subset \mathbb{R}^m)$ and $v_a \in V (\subset \mathbb{R}^m)$ are latent vectors corresponding to the target and context representations of action $a$. The parameter $m$ is chosen empirically through cross-validation. $N$ is a parameter that determines the number of negative examples to be drawn per positive example. A negative action $n_k$ is sampled from a distribution that is proportional to the frequency of the item that is associated with $n_k$. In this work, we use the unigram distribution raised to the 3/4th power.

In order to mitigate the effect of popularity and produce better modeling for unpopular items, we subsample the sessions. Specifically, we discard each action $a$ with probability $p(\text{discard} \mid a) = 1 - \sqrt{\rho / f(w(a))}$, where $w(a)$ is the item that is associated with the action $a$, $f(w)$ is the frequency of the item $w$, and $\rho$ is a parameter that controls how aggressive the subsampling is. Finally, the latent vectors are estimated by applying stochastic gradient ascent with respect to the objective in Eq. (1).

4. EVALUATION
In this section, we describe our evaluation of the proposed model. Our prediction task is to predict the next purchased item given a click event. To this end, we split the dataset according to the session
order. The first 90% of the sessions are used as a training set and the remaining 10% are used as a test set. For each test session, we form a set of test (C, P) tuples, where each tuple corresponds to a purchase action that is preceded by a click action. A tuple (C, P) is considered only if C and P are at most three other actions apart. For example, for a given test session (C1, P1, C2, C3, C4, P4), the tuple (C3, P4) can be made because the distance between them is a single action, C4. On the other hand, the tuple (C1, P4) cannot be made, since the distance between them is four actions. Furthermore, we exclude trivial tuples that consist of a click and a purchase of the same item. The resulting test set contains ~2M tuples.

4.1 Model Variants
We compare three variants of the proposed model. The variants differ in the tuples that the model is trained with. The set of tuples for each model is determined by the choice of $T_s$ in Eq. (1).

The first model, dubbed 'CP', is comprised of tuples that are created in a similar manner to the test tuples. Specifically, for a given training session $s$, we set

$$T_s = \{(i,j) : i,j \in [1..L_s] \,\wedge\, 2 \le i - j \le c \,\wedge\, t(a_{s,i}) = \text{purchase} \,\wedge\, t(a_{s,j}) = \text{click}\}.$$

As a result, in this model, $U$ and $V$ are the representations of the purchases and clicks, respectively.

The second model is dubbed 'CC' and is comprised of the sequential click events without the purchase events. The reasoning behind this model is the fact that each purchase event is preceded by a click on the same item that was purchased. Hence, by predicting the next item the user will click upon, we are also predicting the next item that would be purchased. Therefore, in this model we set

$$T_s = \{(i,j) : i,j \in [1..L_s] \,\wedge\, 1 \le i - j \le c \,\wedge\, t(a_{s,i}) = \text{click} \,\wedge\, t(a_{s,j}) = \text{click}\}.$$

The third model, dubbed 'PP', is comprised of the sequential purchase events (without clicks). Many Collaborative Filtering algorithms are designed to predict the next item a user will purchase, given the items he has already purchased. Hence, the 'PP' model was chosen as a baseline that follows an approach similar to that taken by many contemporary algorithms. In the 'PP' model we use

$$T_s = \{(i,j) : i,j \in [1..L_s] \,\wedge\, 1 \le i - j \le c \,\wedge\, t(a_{s,i}) = \text{purchase} \,\wedge\, t(a_{s,j}) = \text{purchase}\}.$$

4.2 Parameters
We used the following parameter configuration: the negative-to-positive ratio $N$ was set to 15, $\rho$ was set to $10^{-3}$, and $c$ was set to 4. All three models were trained for 50 iterations. It is important to clarify that we experimented with different values of $c = 2..10$ and no significant change in the results was observed.

4.3 Evaluation Metrics and Results
Our first evaluation is based on measuring the Percentile Ranks (PR) of the hidden (purchased) items. We report results in terms of the Mean Percentile Rank as well as the Median Percentile Rank. Table 1 summarizes the mean and median PR for the different models. The CP model clearly outperforms both the CC and PP models. We therefore conclude that both previous click events and previous purchases are relevant to this prediction task. Ignoring either of these signals undermines the model's ability to detect the hidden item.

A second observation is the fact that the Median PR values are much better than the Mean PR values, and that the performance difference between the models becomes smaller when the Median PR is considered. This suggests that the Mean PR values are highly affected by a small number of bad examples, but that in most cases the predictions are much better than the Mean PR values indicate. This behavior characterizes all three models, but is more dominant in the CC and CP models.

Table 1. Percentile Rank (PR) of the hidden (purchased) item

              CP       CC       PP
  Mean PR     6.49%    11.84%   10.91%
  Median PR   0.72%    0.89%    0.89%

Our second evaluation is based on the Precision@K metric: for each model, we measure the percentage of test examples in which the hidden item was ranked in the top K. Table 2 presents Precision@K values for different values of K. The results coincide with those of Table 1, where the CP model shows significantly better results across the board. Again, these results emphasize that, unlike most present-day models that consider only one type of event, there is significant added value in the CP approach, which models click and purchase events simultaneously.

Table 2. Precision@K values for different models

           CP      CC      PP
  K=10     0.21    0.16    0.16
  K=25     0.30    0.25    0.26
  K=50     0.38    0.33    0.33
  K=100    0.46    0.42    0.42

5. CONCLUSION AND FUTURE WORK
In this work, we present and evaluate several variants of neural embedding models for predicting purchases from user activity sessions. The evaluation shows that learning from click-purchase relations at different scales provides better results than learning from either click-click or purchase-purchase relations.

In the future, we plan to investigate the contribution of additional hidden layers to the model presented in this paper, and to compare our model to sequential neural models such as LSTM [8] on the same prediction task.

6. REFERENCES
[1] Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin, "A Neural Probabilistic Language Model," JMLR, vol. 3, pp. 1137–1155, Mar. 2003.
[2] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed Representations of Words and Phrases and their Compositionality," in NIPS 26, 2013, pp. 3111–3119.
[3] F. Strub, J. Mary, and R. Gaudel, "Hybrid Collaborative Filtering with Neural Networks," CoRR, vol. abs/1603.00806, 2016.
[4] S. Wan, Y. Lan, P. Wang, J. Guo, J. Xu, and X. Cheng, "Next Basket Recommendation with Neural Networks," in Poster Proceedings of RecSys 2015.
[5] O. Barkan and N. Koenigstein, "Item2Vec: Neural Item Embedding for Collaborative Filtering," arXiv preprint arXiv:1603.04259, 2016.
[6] Y. Koren, R. Bell, and C. Volinsky, "Matrix Factorization Techniques for Recommender Systems," IEEE Computer, vol. 42, no. 8, pp. 30–37, Aug. 2009.
[7] Y. Zhang et al., "Large Scale Purchase Prediction with Historical User Actions on B2C Online Retail Platform," arXiv preprint arXiv:1408.6515, 2014.
[8] Z. Wu et al., "Neural Modeling of Buying Behaviour for E-Commerce from Clicking Patterns," in Proceedings of the 2015 International ACM Recommender Systems Challenge, ACM, 2015, p. 12.
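The SGNS-style training procedure of Section 3 can be sketched in plain numpy. This is a minimal illustration, not the authors' implementation: the class name, hyperparameter defaults, and the toy usage below are assumptions made for the example. Each training pair (target action, context action) from the sets $T_s$ gets one stochastic-gradient-ascent step on the objective of Eq. (1), with $N$ negative actions drawn from the (here externally supplied) unigram$^{3/4}$ distribution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ActionEmbedding:
    """Sketch of the SGNS-style action-embedding model (Section 3)."""

    def __init__(self, n_actions, m=16, N=5, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.U = rng.normal(0, 0.1, (n_actions, m))  # target vectors u_a
        self.V = rng.normal(0, 0.1, (n_actions, m))  # context vectors v_a
        self.N, self.lr, self.rng = N, lr, rng

    def fit(self, pairs, neg_probs, epochs=50):
        """pairs: (target, context) action-id tuples taken from the sets T_s.
        neg_probs: negative-sampling distribution over action ids
        (in the paper, the unigram distribution raised to the 3/4 power)."""
        ids = np.arange(len(neg_probs))
        for _ in range(epochs):
            for tgt, ctx in pairs:
                negs = self.rng.choice(ids, size=self.N, p=neg_probs)
                # Positive term: ascend on log sigmoid(u_tgt . v_ctx).
                g = 1.0 - sigmoid(self.U[tgt] @ self.V[ctx])
                du = g * self.V[ctx]
                self.V[ctx] += self.lr * g * self.U[tgt]
                # Negative terms: ascend on log sigmoid(-u_tgt . v_n).
                for n in negs:
                    gn = -sigmoid(self.U[tgt] @ self.V[n])
                    du += gn * self.V[n]
                    self.V[n] += self.lr * gn * self.U[tgt]
                self.U[tgt] += self.lr * du

    def score(self, tgt, ctx):
        """sigma(u_tgt . v_ctx), used to rank candidate next actions."""
        return sigmoid(self.U[tgt] @ self.V[ctx])
```

On toy data where purchase-action 1 always follows click-action 0 (e.g. `pairs = [(1, 0)] * 30`), the fitted model ranks action 1 above unrelated actions given context 0, mirroring how the CP variant scores candidate purchases for a given click.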