Modelling Session Activity with Neural Embedding

Oren Barkan (Microsoft, Israel), Yael Brumer (Microsoft, Israel), Noam Koenigstein (Microsoft)

Keywords: Skip-Gram, Collaborative Filtering, Recommender Systems

Neural embedding techniques are being applied in a growing number of machine learning applications. In this work, we demonstrate a neural embedding technique to model users' session activity. Specifically, we consider a dataset collected from Microsoft's App Store consisting of user sessions that include sequential click actions and item purchases. Our goal is to learn a latent manifold that captures users' session activity and can be utilized for contextual recommendations in an online app store.

INTRODUCTION

Neural embedding models have significantly advanced the state of the art in Natural Language Processing [1], [2]. In Recommender Systems research, neural networks have been applied to Collaborative Filtering (CF) [3] and basket completion [4]. Specifically, [5] presented a neural model for embedding items in a latent manifold that encodes CF information. These early works were published very recently and indicate a growing interest in neural embedding techniques for recommendations.

In this work, we take a different direction and utilize neural embedding techniques to model users' session activity. Specifically, we consider a dataset collected from Microsoft's App Store consisting of user sessions that include sequential click actions and item purchases. Our goal is to learn a latent manifold that captures users' session activity and can be utilized for contextual recommendations of apps in an online app store.

Most prominent CF techniques, such as Matrix Factorization [6], do not take into account the sequential order of user actions prior to purchasing an item. Recently, there has been considerable interest in modeling user behavior to predict purchases. One recent competition, the "Tmall Recommendation Prize", requires predicting future user purchases on the Tmall website [7]. While that work builds user profiles to predict purchases, we model session behavior regardless of the user profile. Another approach [8] uses an LSTM-BiRNN to learn from the sequence of clicks made in a session in order to predict all purchases associated with that session, whereas we predict the next purchased item given a click action made within a pre-defined window before the purchase.

The underlying assumption in this work is that users consider several items prior to their ultimate decision to purchase. Hence, we model users' session activity as a sequence of click events on item detail pages and purchase events. For example, (C1, C2, C3, C4, C5, P5) denotes a user session consisting of five click events on five different items followed by a single purchase event. Note that an item purchase event is always preceded by a click event on the same item. By learning to predict these sequences, one can improve the overall user experience by recommending the items that the user is most likely to purchase.

DATASET

Our dataset was collected from Microsoft's App Store. It consists of user sessions that include sequential click actions and item purchases, and is based on a sample of activity sessions from March to June 2016. Each action, whether a click or a purchase, is uniquely identified by its session id, timestamp and item id. From this sample, all sessions with fewer than two different clicked items, or without a purchase event, are removed. The effective dataset consists of 8,785,295 distinct sessions that contain 43,956,340 clicks and 18,838,796 purchases. On average, each session is associated with 5.003 clicks and 2.144 purchases, and the total number of distinct items is 22,139. In every session, the actions are ordered by their timestamps.
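The filtering rules above can be sketched in a few lines. This is a minimal illustration only: the event layout (session id, timestamp, action type, item id) and the function name build_sessions are assumptions made for the example and are not taken from the original data pipeline.

```python
from collections import defaultdict

def build_sessions(events):
    """Group raw events into time-ordered sessions and apply the two filters.

    `events` is an iterable of (session_id, timestamp, action_type, item_id)
    tuples, where action_type is "click" or "purchase" (assumed layout).
    """
    by_session = defaultdict(list)
    for session_id, timestamp, action_type, item_id in events:
        by_session[session_id].append((timestamp, action_type, item_id))

    sessions = {}
    for session_id, actions in by_session.items():
        actions.sort(key=lambda a: a[0])  # order actions by timestamp
        clicked = {item for _, kind, item in actions if kind == "click"}
        has_purchase = any(kind == "purchase" for _, kind, _ in actions)
        # Keep sessions with at least two distinct clicked items and a purchase.
        if len(clicked) >= 2 and has_purchase:
            sessions[session_id] = [(kind, item) for _, kind, item in actions]
    return sessions

# Hypothetical toy events: only session "s1" satisfies both conditions.
events = [
    ("s1", 3, "purchase", "B"), ("s1", 1, "click", "A"), ("s1", 2, "click", "B"),
    ("s2", 1, "click", "A"), ("s2", 2, "purchase", "A"),  # one clicked item only
    ("s3", 1, "click", "A"), ("s3", 2, "click", "C"),     # no purchase
]
print(build_sessions(events))
```

Each surviving session is reduced to an ordered list of (action type, item id) pairs, which is the form assumed by the other sketches in this document.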

NEURAL ACTION EMBEDDING

Our model is inspired by Skip-Gram with Negative Sampling (SGNS), also known as Word2vec [2]. As explained above, we wish to model the user actions in a dataset $D = \{s_i\}_{i=1}^{K}$ of $K$ ordered user activity sequences, where the $i$-th sequence is $s_i = (a_{i,0}, a_{i,1}, \dots, a_{i,n_i})$ and $n_i$ is the index of its last action. The set of all possible actions is denoted by $A$ and includes, in our case, click and purchase events of different items from the item catalog. We further define a function $t : A \to \{\text{click}, \text{purchase}\}$ that maps an action to its type (click or purchase).

Our objective is to maximize the following term:

$$\frac{1}{K} \sum_{i=1}^{K} \sum_{(j,k) \in I_i} \log p(a_{i,j} \mid a_{i,k}) \qquad (1)$$

where $I_i \subseteq \{(j,k) : 0 \le k < j \le n_i\}$ is a set that contains tuples of sequential actions. The probability $p(a_{i,j} \mid a_{i,k})$ is defined through negative sampling, as in SGNS [2].

In order to mitigate the effect of popularity and produce better modeling for unpopular items, we subsample the sessions. Specifically, we discard each action $a$ with probability $p(\text{discard} \mid a) = 1 - \sqrt{\rho / f(\iota(a))}$, where $\iota(a)$ is the item associated with the action $a$, $f(x)$ is the frequency of the item $x$, and $\rho$ is a parameter that controls how aggressive the subsampling is. Finally, the latent vectors are estimated by applying stochastic gradient ascent with respect to the objective in Eq. (1).
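For reference, the standard SGNS formulation [2] would give the conditional probability in Eq. (1) the form below. This is a sketch under the assumption that the model follows SGNS unchanged; the exact expression is not recoverable from the text. Here $\sigma$ is the logistic function, $u_a$ and $v_a$ are the target and context vectors of an action $a$, $N$ is the number of negative examples per positive example, and $a_l$ are the sampled negative actions.

```latex
p(a_{i,j} \mid a_{i,k})
  = \sigma\!\left(u_{a_{i,k}}^{\top} v_{a_{i,j}}\right)
    \prod_{l=1}^{N} \sigma\!\left(-u_{a_{i,k}}^{\top} v_{a_l}\right),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
```

Assigning the target vector to the conditioning (earlier) action matches the statement, made for the 'CP' model, that the target and context vector sets hold the click and purchase representations, respectively.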

EVALUATION

In this section, we describe our evaluation of the proposed model. Our prediction task is to predict the next purchased item given a click event. To this end, we split the dataset according to the session order: the first 90% of the sessions are used as a training set and the remaining 10% as a test set. For each test session, we form a set of test (C, P) tuples, where each tuple corresponds to a purchase action that is preceded by a click action. A tuple (C, P) is considered only if C and P are separated by at most three other actions. For example, for a given test session (C1, P1, C2, C3, C4, P4), the tuple (C3, P4) can be formed because only a single action, C4, lies between them. On the other hand, the tuple (C1, P4) cannot be formed since four actions lie between them. Furthermore, we exclude trivial tuples that consist of a click and a purchase of the same item. The resulting test set contains approximately 2M tuples.
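The tuple construction can be illustrated with the example session from the text. The sketch below assumes a session is a time-ordered list of (action type, item id) pairs; the function name make_test_tuples and this representation are hypothetical.

```python
def make_test_tuples(session, max_between=3):
    """Return (click_item, purchase_item) pairs for one test session.

    A pair is kept when the click precedes the purchase, at most
    `max_between` other actions lie between them, and the items differ.
    """
    pairs = []
    for j, (kind_j, item_j) in enumerate(session):
        if kind_j != "purchase":
            continue
        # Candidate clicks sit at most max_between + 1 positions before j.
        for k in range(max(0, j - max_between - 1), j):
            kind_k, item_k = session[k]
            if kind_k == "click" and item_k != item_j:
                pairs.append((item_k, item_j))
    return pairs

# The example session from the text: (C1, P1, C2, C3, C4, P4).
session = [("click", 1), ("purchase", 1), ("click", 2),
           ("click", 3), ("click", 4), ("purchase", 4)]
print(make_test_tuples(session))  # [(2, 4), (3, 4)]
```

The pairs (C2, P4) and (C3, P4) survive, while (C1, P4) is too far apart and (C1, P1) and (C4, P4) are trivial same-item pairs.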

Model Variants

We compare three variants of the proposed model. The variants differ in the tuples that the model is trained with. The set of tuples for each model is determined by the choice of $I_i$ in Eq. (1).

The first model, dubbed 'CP', is comprised of tuples that are created in a similar manner to the test tuples. Specifically, for a given training session $s_i$, we set

$$I_i = \{(j,k) : j,k \in \{0, \dots, n_i\} \wedge 2 \le j - k \le c \wedge t(a_{i,j}) = \text{purchase} \wedge t(a_{i,k}) = \text{click}\}.$$

As a result, in this model, $U$ and $V$ are the representations of the clicks and purchases, respectively.

The second model, dubbed 'CC', is comprised of the sequential click events without the purchase events. The reasoning behind this model is that each purchase event is preceded by a click on the same item that was purchased. Hence, by predicting the next item the user will click on, we are also predicting the next item that will be purchased. Therefore, in this model we set

$$I_i = \{(j,k) : j,k \in \{0, \dots, n_i\} \wedge 1 \le j - k \le c \wedge t(a_{i,j}) = \text{click} \wedge t(a_{i,k}) = \text{click}\}.$$

The third model, dubbed 'PP', is comprised of sequential purchase events (without clicks). Many Collaborative Filtering algorithms are designed to predict the next item a user will purchase, given the items the user has already purchased. Hence, the 'PP' model was chosen as a baseline that follows an approach similar to that of many contemporary algorithms. In the 'PP' model we use

$$I_i = \{(j,k) : j,k \in \{0, \dots, n_i\} \wedge 1 \le j - k \le c \wedge t(a_{i,j}) = \text{purchase} \wedge t(a_{i,k}) = \text{purchase}\}.$$
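The three tuple sets differ only in the action types they admit and in the minimum index gap, so they can be enumerated by one routine. The sketch below is an illustration under assumptions: sessions are lists of (action type, item id) pairs, indices refer to positions in the full session, and the names VARIANTS and training_pairs are hypothetical.

```python
# (type of earlier action, type of later action, minimum index gap)
VARIANTS = {
    "CP": ("click", "purchase", 2),
    "CC": ("click", "click", 1),
    "PP": ("purchase", "purchase", 1),
}

def training_pairs(session, variant, c=4):
    """Enumerate (earlier_action, later_action) pairs for one session.

    A pair at indices (k, j) is kept when min_gap <= j - k <= c and the
    two actions have the types required by the chosen variant.
    """
    earlier_type, later_type, min_gap = VARIANTS[variant]
    pairs = []
    for j, (kind_j, item_j) in enumerate(session):
        if kind_j != later_type:
            continue
        for k in range(max(0, j - c), j - min_gap + 1):
            kind_k, item_k = session[k]
            if kind_k == earlier_type:
                pairs.append(((kind_k, item_k), (kind_j, item_j)))
    return pairs

# Session (C1, C2, C3, C4, C5, P5) with window size c = 4.
session = [("click", i) for i in range(1, 6)] + [("purchase", 5)]
for name in VARIANTS:
    print(name, len(training_pairs(session, name)))
```

For this session the 'CP' variant yields the pairs (C2, P5), (C3, P5) and (C4, P5): the click C5 is excluded by the minimum gap of two, and C1 lies outside the window.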

Parameters

We used the following parameter configuration: the negative-to-positive ratio $N$ was set to 15, $\rho$ was set to 1e-3, and $c$ was set to 4. All three models were trained for 50 iterations. We also experimented with different values of $c = 2, \dots, 10$ and observed no significant change in the results.
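To make the role of the subsampling parameter concrete, the sketch below applies the discard rule from the model section with the value 1e-3 used here. It assumes that the frequency of an item is its relative frequency among all actions and that negative probabilities are clipped to zero; both choices, as well as the function names, are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def discard_probability(item_freq, rho=1e-3):
    """1 - sqrt(rho / f), clipped at 0 for items rarer than rho."""
    return max(0.0, 1.0 - math.sqrt(rho / item_freq))

def subsample(sessions, rho=1e-3, seed=0):
    """Randomly drop actions of popular items from every session."""
    rng = random.Random(seed)
    counts = Counter(item for s in sessions for _, item in s)
    total = sum(counts.values())
    kept = []
    for s in sessions:
        kept.append([(kind, item) for kind, item in s
                     if rng.random() >= discard_probability(counts[item] / total, rho)])
    return kept

# An item accounting for 4% of all actions is dropped about 84% of the time.
print(round(discard_probability(0.04), 4))  # 0.8419
# An item rarer than rho is never dropped.
print(discard_probability(0.0005))          # 0.0
```

Larger values of the parameter make the rule less aggressive, since fewer items exceed the frequency at which discarding starts.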

Evaluation Metrics and Results

Our first evaluation is based on measuring the Percentile Ranks (PR) of the hidden items (purchased items). We report results in terms of Mean Percentile Rank as well as Median Percentile Rank.
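The text does not spell out how the percentile rank is computed. One common reading, assumed here, is the percentage of candidate items that the model scores higher than the hidden item, so that 0% is a perfect prediction. The function name and the toy scores below are hypothetical.

```python
from statistics import mean, median

def percentile_rank(scores, hidden_index):
    """Percentage of other candidates scored strictly higher than the hidden item.

    Requires at least two candidates.
    """
    hidden_score = scores[hidden_index]
    better = sum(1 for s in scores if s > hidden_score)
    return 100.0 * better / (len(scores) - 1)

# Toy example: model scores over a four-item catalog for three test tuples.
test_cases = [
    ([0.9, 0.1, 0.5, 0.7], 0),  # hidden item ranked first
    ([0.9, 0.1, 0.5, 0.7], 3),  # one item ranked above it
    ([0.9, 0.1, 0.5, 0.7], 1),  # hidden item ranked last
]
ranks = [percentile_rank(scores, idx) for scores, idx in test_cases]
print(mean(ranks), median(ranks))
```

Because the mean is sensitive to a few badly ranked examples while the median is not, reporting both gives a fuller picture of the rank distribution.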

Table 1 summarizes the mean and median PR for the different models. The CP model clearly outperforms both the CC and PP models. We therefore conclude that both previous click events and previous purchases are relevant in this prediction task. Ignoring either of these signals undermines the model's ability to detect the hidden item.

A second observation is that the Median PR values are much better than the Mean PR values, and the performance difference between the models becomes smaller when the Median PR is considered. This suggests that the Mean PR values are highly affected by a small number of bad examples, but in most cases the predictions are much better than the Mean PR values.

Our second evaluation is based on the Precision@K metric: for each model, we measure the percentage of test examples in which the hidden item was ranked in the top K.
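The Precision@K metric as described above can be sketched as follows, assuming each test example comes with a list of item ids ranked by the model from best to worst. The function name and the toy rankings are hypothetical.

```python
def precision_at_k(ranked_lists, hidden_items, k):
    """Fraction of test examples whose hidden item is among the top-k predictions."""
    hits = sum(1 for ranked, hidden in zip(ranked_lists, hidden_items)
               if hidden in ranked[:k])
    return hits / len(hidden_items)

# Toy example with three test tuples over a three-item catalog.
ranked_lists = [[3, 1, 2], [2, 3, 1], [1, 2, 3]]
hidden_items = [1, 1, 3]
print(precision_at_k(ranked_lists, hidden_items, k=2))
print(precision_at_k(ranked_lists, hidden_items, k=3))
```

With k = 2 only the first example is a hit, and with k = 3 every hidden item is covered, so the metric is non-decreasing in K.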

Table 2 presents Precision@K values for different values of K. The results are consistent with those of Table 1: the CP model shows significantly better results across the board. Again, these results emphasize that, unlike most present-day models, which consider only one type of event, there is significant added value in the CP approach, which models click and purchase events simultaneously.

CONCLUSION AND FUTURE WORK

In this work, we present and evaluate several variants of neural embedding models for predicting purchases from user activity sessions. The evaluation shows that learning from click-purchase relations at different scales provides better results than learning from either click-click or purchase-purchase relations.

In the future, we plan to investigate the contribution of additional hidden layers to the model presented in this paper and to compare our model with sequential neural models such as LSTM [8] on the same prediction task.

In the probability used in Eq. (1), $u_a \in U (\subset \mathbb{R}^m)$ and $v_a \in V (\subset \mathbb{R}^m)$ are latent vectors corresponding to the target and context representations of action $a$. The parameter $m$ is chosen empirically through cross-validation. $N$ is a parameter that determines the number of negative examples to be drawn per positive example. A negative action $a$ is sampled from a distribution that is proportional to the frequency of the item that is associated with $a$. In this work, we use the unigram distribution raised to the 3/4 power.
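The negative sampling distribution described above can be sketched as follows. For simplicity the example samples items rather than actions, and the function names and toy counts are hypothetical; it illustrates the smoothed unigram distribution and is not the authors' implementation.

```python
import random

def negative_sampling_probabilities(item_counts, power=0.75):
    """Unigram distribution raised to the given power, then renormalized."""
    weights = {item: count ** power for item, count in item_counts.items()}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}

def sample_negatives(probabilities, n, rng):
    """Draw n negative items (with replacement) from the smoothed distribution."""
    items = list(probabilities)
    return rng.choices(items, weights=[probabilities[i] for i in items], k=n)

# Toy counts: item "A" is 16 times more frequent than item "B".
probs = negative_sampling_probabilities({"A": 16, "B": 1})
print(probs)  # "A" gets about 8/9 instead of the raw share 16/17
negatives = sample_negatives(probs, n=15, rng=random.Random(0))
print(len(negatives))
```

Raising the counts to the 3/4 power flattens the distribution, so rare items are drawn as negatives somewhat more often than their raw frequency would imply.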

Table 1. Percentile Rank (PR) of the hidden (purchased) item

             CP       CC       PP
Mean PR      6.49%    11.84%   10.91%
Median PR    0.72%    0.89%    0.89%

Table 2. Precision@K values for different models

           CP      CC      PP
K=10       0.21    0.16    0.16
K=25       0.30    0.25    0.26
K=50       0.38    0.33    0.33
K=100      0.46    0.42    0.42

The sensitivity of the Mean PR to a small number of bad examples characterizes all three models, but is more dominant in the CC and CP models.

REFERENCES

[1] Y. Bengio, R. Ducharme, P. Vincent, C. Janvin. A Neural Language Model. JMLR 3, Mar. 2003.
[2] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean. Distributed Representations of Words and Phrases and their Compositionality. NIPS 26, 2013.
[3] F. Strub, J. Mary, R. Gaudel. Hybrid Collaborative Filtering with Neural Networks. CoRR abs/1603.00806, 2016.
[4] S. Wan, Y. Lan, P. Wang, J. Guo, J. Xu, X. Cheng. Next Basket Recommendation with Neural Networks. Poster Proceedings of RecSys 2015.
[5] O. Barkan, N. Koenigstein. Item2Vec: Neural Item Embedding for Collaborative Filtering. arXiv preprint arXiv:1603.04259, 2016.
[6] Y. Koren, R. Bell, C. Volinsky. Matrix Factorization Techniques for Recommender Systems. IEEE Computer 42(8), Aug. 2009.
[7] Yuyu Zhang. Large Scale Purchase Prediction with Historical User Actions on B2C Online Retail Platform. arXiv preprint arXiv:1408.6515, 2014.
[8] Zhenzhou Wu. Neural Modeling of Buying Behaviour for E-Commerce from Clicking Patterns. Proceedings of the 2015 International ACM Recommender Systems Challenge, ACM, 2015, 12.