<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Modelling Session Activity with Neural Embedding</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Oren</forename><surname>Barkan</surname></persName>
							<affiliation><orgName type="institution">Microsoft</orgName><address><country>Israel</country></address></affiliation>
						</author>
						<author>
							<persName><forename type="first">Yael</forename><surname>Brumer</surname></persName>
							<affiliation><orgName type="institution">Microsoft</orgName><address><country>Israel</country></address></affiliation>
						</author>
						<author>
							<persName><forename type="first">Noam</forename><surname>Koenigstein</surname></persName>
							<affiliation><orgName type="institution">Microsoft</orgName><address><country>Israel</country></address></affiliation>
						</author>
						<title level="a" type="main">Modelling Session Activity with Neural Embedding</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3849D3F6C34A14944B4D681C6018D522</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T15:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Skip-Gram</term>
					<term>Collaborative Filtering</term>
					<term>Recommender Systems</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Neural embedding techniques are being applied in a growing number of machine learning applications. In this work, we demonstrate a neural embedding technique to model users' session activity. Specifically, we consider a dataset collected from Microsoft's App Store consisting of user sessions that include sequential click actions and item purchases. Our goal is to learn a latent manifold that captures users' session activity and can be utilized for contextual recommendations in an online app store.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>Neural embedding models have significantly advanced the state of the art in the field of Natural Language Processing <ref type="bibr" target="#b1">[1]</ref>, <ref type="bibr" target="#b2">[2]</ref>. In Recommender Systems research, neural networks have been applied to Collaborative Filtering (CF) <ref type="bibr" target="#b3">[3]</ref> and basket completion <ref type="bibr" target="#b4">[4]</ref>. Specifically, [5] presented a neural model for embedding items in a latent manifold that encodes CF information. These early works were published very recently and indicate a growing interest in neural embedding techniques for recommendations.</p><p>In this work, we take a different direction and utilize neural embedding techniques to model users' session activity. Specifically, we consider a dataset collected from Microsoft's App Store consisting of user sessions that include sequential click actions and item purchases. Our goal is to learn a latent manifold that captures users' session activity and can be utilized for contextual recommendations of apps in an online app store.</p><p>Most prominent CF techniques, such as Matrix Factorization <ref type="bibr">[6]</ref>, do not take into account the sequential order of user actions prior to purchasing an item. Recently, there has been considerable interest in modeling user behavior to predict purchases. One recent competition, the "Tmall Recommendation Prize", requires predicting future user purchases on the Tmall website <ref type="bibr" target="#b6">[7]</ref>. While that work builds user profiles to predict purchases, we model session behavior regardless of the user profile.
Another approach <ref type="bibr" target="#b7">[8]</ref> uses an LSTM-BiRNN to learn from the sequence of clicks made in a session in order to predict all purchases associated with that session, whereas we predict the next purchased item given a click action that was made within a pre-defined window before the purchase.</p><p>The underlying assumption in this work is that users consider several items prior to their ultimate decision to purchase. Hence, we model users' session activity as a sequence of click events on item detail pages and purchase events. For example, (C1, C2, C3, C4, C5, P5) denotes a user session consisting of five click events on five different items followed by a single purchase event. Note that an item purchase event is always preceded by a click event on the same item. By learning to predict these sequences, one can improve the overall user experience by recommending the items that the user is most likely to purchase.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">DATASET</head><p>Our dataset was collected from Microsoft's App Store and consists of user sessions that include sequential click actions and item purchases. It is based on a sample of activity sessions from March to June 2016. Each action, whether a click or a purchase, is uniquely identified by its session id, timestamp and item id. From this sample, all sessions with fewer than two distinct clicked items or without a purchase event were removed. The effective dataset consists of 8,785,295 distinct sessions that contain 43,956,340 clicks and 18,838,796 purchases. On average, each session is associated with 5.003 clicks and 2.144 purchases, and the total number of distinct items is 22,139. In every session, the actions are ordered by their timestamps.</p></div>
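<div xmlns="http://www.tei-c.org/ns/1.0"><p>The session filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout (the Action tuple and its kind field) and the function name filter_sessions are assumptions introduced here, since the text only states that an action is identified by session id, timestamp and item id.</p>

```python
from collections import namedtuple

# Hypothetical record layout (an assumption for illustration only).
Action = namedtuple("Action", ["session_id", "timestamp", "item_id", "kind"])


def filter_sessions(sessions):
    """Keep sessions with at least two distinct clicked items and a purchase."""
    kept = []
    for actions in sessions:
        clicked_items = {a.item_id for a in actions if a.kind == "click"}
        has_purchase = any(a.kind == "purchase" for a in actions)
        if len(clicked_items) >= 2 and has_purchase:
            # Actions within a session are ordered by timestamp.
            kept.append(sorted(actions, key=lambda a: a.timestamp))
    return kept


sessions = [
    # Two distinct clicked items and one purchase: kept.
    [Action(1, 20, "b", "click"), Action(1, 10, "a", "click"),
     Action(1, 30, "b", "purchase")],
    # Only one distinct clicked item: removed.
    [Action(2, 10, "a", "click"), Action(2, 20, "a", "purchase")],
    # No purchase event: removed.
    [Action(3, 10, "a", "click"), Action(3, 20, "b", "click")],
]
kept = filter_sessions(sessions)
print([s[0].session_id for s in kept])
```

<p>The reported per-session averages follow directly from the totals: 43,956,340 / 8,785,295 is approximately 5.003 clicks and 18,838,796 / 8,785,295 is approximately 2.144 purchases.</p></div>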
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">NEURAL ACTION EMBEDDING</head><p>Our model is inspired by Skip-Gram with Negative Sampling (SGNS), also known as Word2vec <ref type="bibr" target="#b2">[2]</ref>. As explained above, we wish to model the user actions in a dataset $D = \{s_i\}_{i=1}^{K}$ of $K$ ordered user activity sequences, where the $i$-th sequence is $s_i = (a_{i,0}, a_{i,1}, \ldots, a_{i,l_i})$ and $l_i$ is the index of its last action. The set of all possible actions is denoted by $A$ and includes, in our case, click and purchase events of different items from the item catalog. We further define a function $t : A \to \{\text{click}, \text{purchase}\}$ that maps an action to its type (click or purchase).</p><p>Our objective is to maximize the following term:</p><formula xml:id="formula_0">\frac{1}{K} \sum_{i=1}^{K} \sum_{(j,k) \in S_i} \log p(a_{i,j} \mid a_{i,k}) <label>(1)</label></formula><p>where $S_i \subseteq \{(j,k) : 0 \le k &lt; j \le l_i\}$ is a set that contains tuples of sequential actions. The probability $p(a_{i,j} \mid a_{i,k})$ is defined through negative sampling, as in SGNS <ref type="bibr" target="#b2">[2]</ref>.</p><p>In order to mitigate the effect of popularity and produce better modeling for unpopular items, we subsample the sessions. Specifically, we discard each action $a$ with probability $p(\text{discard} \mid a) = 1 - \sqrt{I / f(\mathrm{item}(a))}$, where $\mathrm{item}(a)$ is the item that is associated with the action $a$, $f(x)$ is the frequency of the item $x$, and $I$ is a parameter that controls how aggressive the subsampling is. Finally, the latent vectors are estimated by applying stochastic gradient ascent with respect to the objective in Eq. (1).</p></div>
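<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the two ingredients of this section, the sketch below computes one summand of Eq. (1) and the subsampling probability. It assumes the standard SGNS form from [2], in which the log-probability is the log-sigmoid of the positive pair's inner product plus the log-sigmoids of the negated inner products with the sampled negatives; the text does not spell this form out, so it should be read as an assumption. The function names, the vector arguments and the clipping of the discard probability at zero are likewise hypothetical choices made for this example.</p>

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sgns_log_prob(u_given, v_predicted, v_negatives):
    """One summand of Eq. (1) under the standard SGNS form (an assumption).

    u_given:      vector of the conditioning (earlier) action
    v_predicted:  vector of the predicted (later) action
    v_negatives:  vectors of the sampled negative actions
    """
    positive = log_sigmoid(dot(u_given, v_predicted))
    negative = sum(log_sigmoid(-dot(u_given, v_n)) for v_n in v_negatives)
    return positive + negative


def discard_probability(item_frequency, threshold):
    """1 - sqrt(threshold / frequency); clipped at 0 (clipping is assumed)."""
    return max(0.0, 1.0 - math.sqrt(threshold / item_frequency))


example = sgns_log_prob([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
print(example)
print(discard_probability(0.01, 1e-3))
```

<p>With a threshold of 1e-3, an item that accounts for 1% of all actions is discarded roughly 68% of the time, while an item rarer than the threshold is never discarded in this sketch.</p></div>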
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">EVALUATION</head><p>In this section, we describe our evaluation of the proposed model. Our prediction task is to predict the next purchased item given a click event. To this end, we split the dataset according to the session order. The first 90% of the sessions are used as a training set and the remaining 10% are used as a test set. For each test session, we form a set of test (C, P) tuples, where each tuple corresponds to a purchase action that is preceded by a click action. A tuple (C, P) is considered only if C and P are separated by at most three other actions. For example, for a given test session (C1, P1, C2, C3, C4, P4), the tuple (C3, P4) can be formed because only a single action, C4, lies between them. On the other hand, the tuple (C1, P4) cannot be formed since four actions lie between them. Furthermore, we exclude trivial tuples that consist of a click and a purchase of the same item. The resulting test set contains approximately 2M tuples.</p></div>
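<div xmlns="http://www.tei-c.org/ns/1.0"><p>The construction of test tuples can be sketched as below, using the example session from the text. The encoding of a session as a list of (kind, item) pairs and the name build_test_tuples are assumptions made for this illustration.</p>

```python
def build_test_tuples(session, max_between=3):
    """Return (clicked_item, purchased_item) pairs from one ordered session.

    session is a list of (kind, item_id) pairs with kind "C" or "P".
    A pair is kept when at most max_between other actions separate the
    click from the later purchase and the two items differ.
    """
    pairs = []
    for j, (kind_j, item_j) in enumerate(session):
        if kind_j != "P":
            continue
        # Earliest click position that is still close enough to position j.
        first_k = max(0, j - max_between - 1)
        for k in range(first_k, j):
            kind_k, item_k = session[k]
            if kind_k == "C" and item_k != item_j:
                pairs.append((item_k, item_j))
    return pairs


# The session (C1, P1, C2, C3, C4, P4) from the text.
session = [("C", 1), ("P", 1), ("C", 2), ("C", 3), ("C", 4), ("P", 4)]
print(build_test_tuples(session))  # [(2, 4), (3, 4)]
```

<p>Here (C1, P4) is rejected because four actions separate the two events, and (C1, P1) and (C4, P4) are rejected as trivial same-item tuples.</p></div>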
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Model Variants</head><p>We compare three variants of the proposed model. The variants differ in the tuples that the model is trained with. The set of tuples for each model is determined by the choice of $S_i$ in Eq. ( <ref type="formula" target="#formula_0">1</ref>).</p><p>The first model, dubbed 'CP', is trained on tuples that are created in a manner similar to the test tuples. Specifically, for a given training session $s_i$, we set</p><formula xml:id="formula_1">S_i = \{(j,k) : j,k \in \{0, \ldots, l_i\} \wedge 2 \le j-k \le C \wedge t(a_{i,j}) = \text{purchase} \wedge t(a_{i,k}) = \text{click}\}.</formula><p>As a result, in this model, $U$ and $V$ are the representations of the clicks and purchases, respectively.</p><p>The second model, dubbed 'CC', is trained on the sequential click events without the purchase events. The reasoning behind this model is that each purchase event is preceded by a click on the same item that was purchased. Hence, by predicting the next item the user will click on, we are also predicting the next item that will be purchased. Therefore, in this model we set</p><formula xml:id="formula_2">S_i = \{(j,k) : j,k \in \{0, \ldots, l_i\} \wedge 1 \le j-k \le C \wedge t(a_{i,j}) = \text{click} \wedge t(a_{i,k}) = \text{click}\}.</formula><p>The third model, dubbed 'PP', is trained on sequential purchase events (without clicks). Many Collaborative Filtering algorithms are designed to predict the next item a user will purchase, given the items the user has already purchased. Hence, the 'PP' model was chosen as a baseline that follows an approach similar to that of many contemporary algorithms. In the 'PP' model we use</p><formula xml:id="formula_3">S_i = \{(j,k) : j,k \in \{0, \ldots, l_i\} \wedge 1 \le j-k \le C \wedge t(a_{i,j}) = \text{purchase} \wedge t(a_{i,k}) = \text{purchase}\}.</formula></div>
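<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three training-tuple sets can be illustrated with the sketch below. It is one possible reading of the definitions above rather than the authors' code: the function name training_pairs and the session encoding are hypothetical, and for the 'CC' and 'PP' variants the sketch assumes that events of the other type are dropped before positions are compared, following the phrases "without the purchase events" and "without clicks".</p>

```python
def training_pairs(session, variant, max_dist=4):
    """Return (earlier_item, later_item) training pairs for one session.

    session is a list of (kind, item_id) pairs with kind "C" or "P".
    variant is "CP", "CC" or "PP"; max_dist plays the role of C.
    """
    if variant == "CP":
        seq, min_dist, first, second = session, 2, "C", "P"
    elif variant == "CC":
        # Assumption: purchases are removed before indexing.
        seq = [a for a in session if a[0] == "C"]
        min_dist, first, second = 1, "C", "C"
    elif variant == "PP":
        # Assumption: clicks are removed before indexing.
        seq = [a for a in session if a[0] == "P"]
        min_dist, first, second = 1, "P", "P"
    else:
        raise ValueError("unknown variant: " + variant)

    pairs = []
    for j in range(len(seq)):
        # Positions k with min_dist at most j - k at most max_dist.
        for k in range(max(0, j - max_dist), j - min_dist + 1):
            if seq[k][0] == first and seq[j][0] == second:
                pairs.append((seq[k][1], seq[j][1]))
    return pairs


session = [("C", 1), ("P", 1), ("C", 2), ("C", 3), ("C", 4), ("P", 4)]
print(training_pairs(session, "CP"))  # [(2, 4), (3, 4)]
print(training_pairs(session, "PP"))  # [(1, 4)]
```

<p>The lower bound of 2 in the 'CP' variant skips the click that immediately precedes a purchase, which by construction is a click on the purchased item itself.</p></div>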
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Parameters</head><p>We used the following parameter configuration: the negative-to-positive ratio N was set to 15, the subsampling parameter I was set to 1e-3, and the maximum distance C was set to 4. All three models were trained for 50 iterations. We also experimented with values of C between 2 and 10 and observed no significant change in the results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Evaluation Metrics and Results</head><p>Our first evaluation is based on measuring the Percentile Ranks (PR) of the hidden items (purchased items). We report results in terms of Mean Percentile Rank as well as Median Percentile Rank.</p><p>Table <ref type="table" target="#tab_0">1</ref> summarizes the mean and median PR for the different models. The CP model clearly outperforms both the CC and PP models. We therefore conclude that both previous click events and previous purchases are relevant in this prediction task. Ignoring either of these signals undermines the model's ability to detect the hidden item.</p><p>A second observation is that the Median PR values are much better than the Mean PR values, and the performance difference between the models becomes smaller when the Median PR is considered. This suggests that the Mean PR values are highly affected by a small number of bad examples, but in most cases the predictions are much better than the Mean PR values. This behavior characterizes all three models, but is more dominant in the CC and CP models.</p><p>Our second evaluation is based on the Precision@K metric: for each model, we measure the percentage of test examples in which the hidden item was ranked in the top K.</p><p>Table <ref type="table" target="#tab_1">2</ref> presents Precision@K values for different values of K. The results coincide with those of Table <ref type="table" target="#tab_0">1</ref>, where the CP model shows significantly better results across the board. Again, these results emphasize that, unlike most present-day models, which consider only one type of event, there is significant added value in the CP approach, which models click and purchase events simultaneously.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">CONCLUSION AND FUTURE WORK</head><p>In this work, we present and evaluate several variants of neural embedding models for predicting purchases from user activity sessions. The evaluation shows that learning from click-purchase relations in different scales provides better results than learning from either click-click or purchase-purchase relations.</p><p>In future work, we plan to investigate the contribution of additional hidden layers to the model presented in this paper and to compare our model with sequential neural models such as LSTM <ref type="bibr" target="#b7">[8]</ref> on the same prediction task.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>$u_a \in U (\subset \mathbb{R}^m)$ and $v_a \in V (\subset \mathbb{R}^m)$ are latent vectors corresponding to the target and context representations of action $a$. The parameter $m$ is chosen empirically through cross-validation. $N$ is a parameter that determines the number of negative examples to be drawn per positive example. A negative action $a$ is sampled from a distribution that is proportional to the frequency of the item that is associated with $a$. In this work, we use the unigram distribution raised to the 3/4 power.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1. Percentile Rank (PR) of the hidden (purchased) item</head><label>1</label><figDesc></figDesc><table><row><cell></cell><cell>CP</cell><cell>CC</cell><cell>PP</cell></row><row><cell>Mean PR</cell><cell>6.49%</cell><cell>11.84%</cell><cell>10.91%</cell></row><row><cell>Median PR</cell><cell>0.72%</cell><cell>0.89%</cell><cell>0.89%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2. Precision@K values for different models</head><label>2</label><figDesc></figDesc><table><row><cell></cell><cell>CP</cell><cell>CC</cell><cell>PP</cell></row><row><cell>K=10</cell><cell>0.21</cell><cell>0.16</cell><cell>0.16</cell></row><row><cell>K=25</cell><cell>0.30</cell><cell>0.25</cell><cell>0.26</cell></row><row><cell>K=50</cell><cell>0.38</cell><cell>0.33</cell><cell>0.33</cell></row><row><cell>K=100</cell><cell>0.46</cell><cell>0.42</cell><cell>0.42</cell></row></table></figure>
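<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two metrics of Section 4.3 can be sketched as follows. The paper does not give formal definitions, so this sketch assumes a common convention: the percentile rank of a hidden item is the percentage of catalog items that the model scores strictly higher, and Precision@K is the fraction of test examples whose hidden item has a 0-based rank below K. All names and the toy numbers are hypothetical.</p>

```python
from statistics import mean, median


def percentile_rank(scores, hidden_item):
    """Percentage of catalog items scored strictly above the hidden item."""
    better = sum(1 for s in scores.values() if s > scores[hidden_item])
    return 100.0 * better / len(scores)


def precision_at_k(ranks, k):
    """Fraction of test examples whose 0-based rank is among the top k."""
    return sum(1 for r in ranks if k > r) / len(ranks)


# Toy model scores for a four-item catalog; "b" is the hidden item.
scores = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.7}
print(percentile_rank(scores, "b"))  # 50.0

# Toy ranks of the hidden item over five test examples.
ranks = [0, 3, 12, 40, 7]
print(precision_at_k(ranks, 10))  # 0.6

# A single bad example inflates the mean but barely moves the median.
toy_prs = [0.5, 0.7, 0.9, 1.0, 60.0]
print(mean(toy_prs), median(toy_prs))
```

<p>The last lines illustrate, on made-up values, why a mean percentile rank can be much worse than the median when a few test examples are ranked very poorly.</p></div>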
		</body>
		<back>
			<div type="references">

				<listBibl>


<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A Neural Probabilistic Language Model</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ducharme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vincent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Janvin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">JMLR</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="1137" to="1155" />
			<date type="published" when="2003-03">Mar. 2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Distributed Representations of Words and Phrases and their Compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">NIPS</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page" from="3111" to="3119" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Hybrid Collaborative Filtering with Neural Networks</title>
		<author>
			<persName><forename type="first">F</forename><surname>Strub</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gaudel</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1603.00806</idno>
	</analytic>
	<monogr>
		<title level="j">CoRR</title>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Next Basket Recommendation with Neural Networks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Wan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Poster Proceedings of RecSys 2015</title>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4b">
	<monogr>
		<title level="m" type="main">Item2Vec: Neural Item Embedding for Collaborative Filtering</title>
		<author>
			<persName><forename type="first">O</forename><surname>Barkan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Koenigstein</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1603.04259</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Matrix Factorization Techniques for Recommender Systems</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Koren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Volinsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Computer</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="30" to="37" />
			<date type="published" when="2009-08">Aug. 2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Large Scale Purchase Prediction with Historical User Actions on B2C Online Retail Platform</title>
		<author>
			<persName><forename type="first">Yuyu</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1408.6515</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Neural Modeling of Buying Behaviour for E-Commerce from Clicking Patterns</title>
		<author>
			<persName><forename type="first">Zhenzhou</forename><surname>Wu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 International ACM Recommender Systems Challenge</title>
				<meeting>the 2015 International ACM Recommender Systems Challenge</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
