<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TRACE: Transformer-based user Representations from Attributed Clickstream Event sequences</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>William Black</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Manlove</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jack Pennington</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Marchini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ercument Ilhan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vilda Markeviciute</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Expedia Group</institution>
          ,
          <addr-line>407 St John St, London EC1V 4EX</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>For users navigating travel e-commerce websites, the process of researching products and making a purchase often results in intricate browsing patterns that span numerous sessions over an extended period of time. The resulting clickstream data chronicle these user journeys and present valuable opportunities to derive insights that can significantly enhance personalized recommendations. We introduce TRACE, a novel transformer-based approach tailored to generate rich user embeddings from live multi-session clickstreams for real-time recommendation applications. Prior works largely focus on single-session product sequences, whereas TRACE leverages site-wide page view sequences spanning multiple user sessions to model long-term engagement. Employing a multi-task learning framework, TRACE captures comprehensive user preferences and intents distilled into low-dimensional representations. We demonstrate TRACE's superior performance over vanilla transformer and LLM-style architectures through extensive experiments on a large-scale travel e-commerce dataset of real user journeys, where the challenges of long page-histories and sparse targets are particularly prevalent. Visualizations of the learned embeddings reveal meaningful clusters corresponding to latent user states and behaviors, highlighting TRACE's potential to enhance recommendation systems by capturing nuanced user interactions and preferences.</p>
      </abstract>
      <kwd-group>
        <kwd>transformers</kwd>
        <kwd>user embeddings</kwd>
        <kwd>clickstream data</kwd>
        <kwd>multi-task</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        On tourism e-commerce websites users often exhibit complex navigation patterns whilst they browse
travel and accommodation options before making a purchase. A typical user could land on the homepage,
search for a flight then bounce, only to return a few days later to browse hotels and then purchase a
package holiday. The resulting clickstream data captures these intricate journeys and offers valuable
insights into users’ behaviour and intentions. By harnessing this data and better understanding users’
latent psychological states and preferences, we can significantly enhance personalized experiences by
matching them with more relevant content [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1, 2, 3, 4, 5</xref>
        ] and adapting the experience to better suit their
context [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For instance, users earlier in their search can be presented with more exploratory content,
as compared to users nearer the end of the purchase funnel.
      </p>
      <p>However, achieving this level of personalization can be challenging as user journeys often span
multiple sessions over an extended period of time, and specific goals, such as completing a purchase,
occur infrequently within this window. This is a particularly pertinent challenge within the tourism
industry as users will often only make one booking a year, which can take weeks of searching and
planning before purchasing it months in advance.</p>
      <p>In this work, we present TRACE (Transformer-based Representations of Attributed Clickstream Event
sequences), a novel approach for generating rich user embeddings from live multi-session clickstream
data with sparse targets. TRACE employs a multi-task learning (MTL) framework, where a lightweight
transformer encoder is trained to predict multiple user engagement targets based on sequences of
attributed clickstream events. By jointly predicting a diverse set of user future engagement signals, the
model is encouraged to learn robust, versatile representations. We demonstrate its effectiveness using a
real-world travel e-commerce dataset.</p>
      <p>
        Numerous works have explored the use of statistical and machine learning techniques on clickstreams
to mine patterns [
        <xref ref-type="bibr" rid="ref6">6, 7, 8</xref>
        ] or cluster user behaviors [9, 10, 11, 12] for analytical insights or motivating
recommendations. Comparable works have also investigated neural and MTL approaches to user
modeling, but typically focus on product-level interactions or single session sequences [13, 14, 15, 16].
TRACE instead ingests live clickstream data and addresses more general sequences of site-wide page
views spanning multiple sessions in order to obtain rich user journey representations for real-time
downstream applications. PinnerFormer [17] notably uses a transformer, but relies on previously
learned embeddings and abundant pin-based interactions. TRACE learns directly and exclusively from
the sequence of attributed page views and employs an MTL approach to overcome sparse engagement
signals. Zhuang et al. [18] studied attributes at the sequence level, whereas TRACE is more granular
and addresses attributes at the event-level. Where Rahmani et al. [19] incorporated temporal signals in
sequential recommendations, TRACE instead adopts learnable positional encodings which capture both
event and session positions.
      </p>
      <p>Overall, the key distinction of TRACE is the use of a transformer-based MTL framework with
event-session position encoding to generate versatile user embeddings from enriched multi-session clickstream
sequences with event-level attributes, which has not been explored in depth by previous research nor
applied to travel e-commerce.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Problem Formulation</title>
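        <p>Each time a user visits a new page it is logged in a clickstream as a page view event <inline-formula><tex-math>$e \in \mathcal{E}$</tex-math></inline-formula>, characterized
by a small set of contextual features including the page name and timestamp <inline-formula><tex-math>$t$</tex-math></inline-formula>. These events collectively
form user sessions <inline-formula><tex-math>$S$</tex-math></inline-formula>, representing ordered sequences of the pages visited within defined time intervals.
Formally, a session <inline-formula><tex-math>$S = \{e_0, e_1, \ldots, e_n\}$</tex-math></inline-formula>, where <inline-formula><tex-math>$e_i$</tex-math></inline-formula> denotes the <inline-formula><tex-math>$i$</tex-math></inline-formula>th page the user visited in this session
and <inline-formula><tex-math>$t_i$</tex-math></inline-formula> its timestamp, subject to the condition that
<disp-formula><label>(1)</label><tex-math>$$t_i - t_{i-1} \le \tau, \quad \forall i \in [1, n].$$</tex-math></disp-formula></p>
        <p>Here <inline-formula><tex-math>$\tau$</tex-math></inline-formula> is a fixed constant, often on the order of a few hours. If the difference in
timestamps of two sequential page view events is greater than <inline-formula><tex-math>$\tau$</tex-math></inline-formula>, the latter is considered to be in a new
session.</p>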
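        <p>The session rule in Eq. (1) can be sketched as a single pass over a user's chronologically ordered timestamps. This is a minimal illustration rather than the production implementation: the function name, the use of seconds as the time unit, and the two-hour threshold are assumptions made for the example.</p>
        <preformat>
```python
from typing import List, Sequence


def split_into_sessions(timestamps: Sequence[float], tau: float) -> List[List[int]]:
    """Group chronologically ordered event indices into sessions.

    A new session starts whenever the gap to the previous event exceeds tau.
    """
    sessions: List[List[int]] = []
    for i, t in enumerate(timestamps):
        if i == 0 or t - timestamps[i - 1] > tau:
            sessions.append([])  # first event, or gap exceeds tau: open a new session
        sessions[-1].append(i)
    return sessions


# Hypothetical timestamps in seconds, with tau assumed to be two hours.
event_times = [0, 600, 1200, 90000, 90300, 200000]
sessions = split_into_sessions(event_times, tau=2 * 3600)
print(sessions)
```
        </preformat>
        <p>Each inner list holds the indices of the events belonging to one session, so the chronological list of sessions forms the journey for that user.</p>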
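        <p>Then for each user, we define their journey <inline-formula><tex-math>$J$</tex-math></inline-formula> as the chronological sequence of their sessions, where
<inline-formula><tex-math>$J = \{S_0, S_1, \ldots, S_m\}$</tex-math></inline-formula>, with <inline-formula><tex-math>$S_k$</tex-math></inline-formula> representing their <inline-formula><tex-math>$k$</tex-math></inline-formula>th session. In this way a journey <inline-formula><tex-math>$J$</tex-math></inline-formula> is the sequence of
pages a user has visited across multiple sessions. We use a corpus of user journeys <inline-formula><tex-math>$\mathcal{J} = \{J_0, J_1, \ldots\}$</tex-math></inline-formula>
captured on a large-scale travel e-commerce site over a few months, where <inline-formula><tex-math>$|\mathcal{J}| &gt; 50\,\mathrm{M}$</tex-math></inline-formula>, and the
vocabulary exceeds 1000 page names.</p>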
        <p>Our objective is to predict future engagement of users using their past navigation patterns on the
website. Formally, we want to learn a model <inline-formula><tex-math>$f : \mathcal{J} \to \mathbb{R}^d$</tex-math></inline-formula> for some positive integer <inline-formula><tex-math>$d$</tex-math></inline-formula>, which summarises
these journeys in rich low-dimensional representations that can then be used for downstream machine
learning applications, such as content personalisation and product recommendations. As such, the
model <inline-formula><tex-math>$f$</tex-math></inline-formula> must satisfy three main requirements:</p>
        <list list-type="order">
          <list-item><p>Effectively capture the intricate page navigation patterns in users’ journeys which span multiple
sessions.</p></list-item>
          <list-item><p>Meaningfully distill user journeys into embeddings that can predict engagement across diverse
tasks and contexts.</p></list-item>
          <list-item><p>Scale efficiently to accommodate high-traffic real-time production environments.</p></list-item>
        </list>
        <p>[Figure 1: Components of TRACE. Elements shown: input journey (sessions k-1, k, k+1); feature engineering (time features, event position); encoder (transformer encoder block, feed-forward network); journey embedding; target extraction (future engagement); five task-specific dense layers (Task 1 to Task 5); multi-task loss.]</p>
          <sec id="sec-2-1-3">
          <p>To generate our datasets, we split each journey at a random point and designate the pages before as
the input journey, and those after to be used for target generation.</p>
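          <p>As a rough sketch of this dataset generation step, a journey can be cut at a random index, with the prefix used as model input and the suffix used to derive labels. The page names and the single illustrative target below are hypothetical and are not the targets defined for TRACE.</p>
          <preformat>
```python
import random
from typing import List, Tuple


def split_journey(pages: List[str], rng: random.Random) -> Tuple[List[str], List[str]]:
    """Split a journey at a random point into (input pages, future pages)."""
    if 2 > len(pages):
        raise ValueError("need at least two page views to split a journey")
    # randint is inclusive at both ends, so both parts are non-empty.
    cut = rng.randint(1, len(pages) - 1)
    return pages[:cut], pages[cut:]


# Hypothetical page names, for illustration only.
journey = ["home", "search_results", "product_details", "booking_form", "confirmation"]
input_pages, future_pages = split_journey(journey, random.Random(0))

# One illustrative binary target: does a confirmation page appear after the cut?
purchased = int("confirmation" in future_pages)
```
          </preformat>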
          <p>In our proposed approach, TRACE, we train a multi-task transformer. This model takes as input
a sequence of pages in the form of a journey <inline-formula><tex-math>$J$</tex-math></inline-formula>, and predicts a cohort of future user engagement
targets. We extract the output of the final layer of the shared backbone of the model as the journey
embedding <inline-formula><tex-math>$z \in \mathbb{R}^d$</tex-math></inline-formula>. We hypothesise that if the embeddings are predictive across a cohort of diverse
user engagement tasks, they will capture a generalised understanding of a user’s diverse intents. Figure
1 illustrates the components of TRACE. We address each in more detail below.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Feature Engineering and Position Encoding</title>
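        <p>We first crop each input journey, taking up to the <inline-formula><tex-math>$N$</tex-math></inline-formula> most recent page view events, where <inline-formula><tex-math>$N$</tex-math></inline-formula> is chosen
so as to capture most users’ entire recent page view history.</p>
        <p>Each page view event <inline-formula><tex-math>$e_i$</tex-math></inline-formula> has a set of categorical attributes, such as a page name and the user’s device
type, each of which is passed through its own learnable embedding layer to produce a dense representation
in <inline-formula><tex-math>$\mathbb{R}^{32}$</tex-math></inline-formula>. We engineer two features from the event timestamp: the time interval between consecutive
events and the time elapsed until the most recent event, both log-transformed and standard scaled. Additionally,
we encode a session ID, where events in the <inline-formula><tex-math>$k$</tex-math></inline-formula>th most recent session are given value <inline-formula><tex-math>$k$</tex-math></inline-formula>. These time-based
features aim to capture planning phases and session gaps common in extended travel user journeys. All
features are standard scaled and concatenated such that each <inline-formula><tex-math>$e_i$</tex-math></inline-formula> is now represented by a vector in <inline-formula><tex-math>$\mathbb{R}^p$</tex-math></inline-formula>, where
<inline-formula><tex-math>$p$</tex-math></inline-formula> is approximately a few hundred.</p>
        <p>We also enumerate the event position, where the <inline-formula><tex-math>$i$</tex-math></inline-formula>th most recent event in the entire journey is given
value <inline-formula><tex-math>$i$</tex-math></inline-formula>. Then both the event and session position indexes are independently embedded in <inline-formula><tex-math>$\mathbb{R}^p$</tex-math></inline-formula> via
their own learnable layers, and added onto the final feature vector, acting as an event-session position
encoding. This was designed to allow the model to learn representations specific to session and position
combinations, enabling it to capture dynamics both within and across multiple sessions more effectively.</p>
        <p>For input journeys of length less than <inline-formula><tex-math>$N$</tex-math></inline-formula> we pad the <inline-formula><tex-math>$p$</tex-math></inline-formula> features with value 0. As such each journey <inline-formula><tex-math>$J$</tex-math></inline-formula>
can now be encoded as some matrix <inline-formula><tex-math>$X \in \mathbb{R}^{N \times p}$</tex-math></inline-formula>.</p>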
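        <p>The chronological features, recency indexes, cropping and padding described above can be sketched as follows. Several details are assumptions made for illustration only: the function name, recency indexes starting at 1 for the most recent session and event, padding rows being prepended rather than appended, and session IDs being consecutive integers in chronological order. Standard scaling is omitted because it depends on corpus-level statistics.</p>
        <preformat>
```python
import math
from typing import List, Sequence


def chronological_features(
    timestamps: Sequence[float], session_ids: Sequence[int], max_len: int
) -> List[List[float]]:
    """Build rows of [log gap, log time to latest, session recency, event recency].

    Assumes a non-empty journey whose session_ids are consecutive integers
    that increase chronologically.
    """
    ts = list(timestamps)[-max_len:]      # keep the max_len most recent events
    sids = list(session_ids)[-max_len:]
    latest_t, latest_sid = ts[-1], sids[-1]
    rows = []
    for i, (t, sid) in enumerate(zip(ts, sids)):
        gap = t - ts[i - 1] if i > 0 else 0.0
        rows.append([
            math.log1p(gap),               # interval since the previous event
            math.log1p(latest_t - t),      # time elapsed until the most recent event
            float(latest_sid - sid + 1),   # k for the k-th most recent session
            float(len(ts) - i),            # i for the i-th most recent event
        ])
    padding = [[0.0] * 4 for _ in range(max_len - len(rows))]
    return padding + rows                  # zero-pad short journeys to max_len rows


# Hypothetical journey: two events in one session, then one event a day later.
features = chronological_features([0, 600, 90000], [0, 0, 1], max_len=4)
```
        </preformat>
        <p>In the approach described above, the two recency indexes would be passed through learnable embedding layers rather than used as raw numbers; they are returned here as plain values to keep the sketch dependency-free.</p>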
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Model Architecture</title>
        <p>In TRACE, we use a transformer encoder architecture to process the input sequences of pages, and train
it in a multi-task regime across five different targets, representing a variety of future user engagement
signals.</p>
        <p>An encoded journey <inline-formula><tex-math>$X \in \mathbb{R}^{N \times p}$</tex-math></inline-formula> is passed through a single transformer encoder block, consisting of
a multi-head self-attention layer with 8 heads followed by a position-wise fully connected feed-forward
network (FFN) with an intermediate dimension of 128. We employ dropout and a residual connection
around each of the two sub-layers, followed by layer normalization. Global max pooling is applied to
the output of this encoder block, which is then passed through an FFN. For an input journey <inline-formula><tex-math>$J$</tex-math></inline-formula>
the output of this shared backbone is some <inline-formula><tex-math>$z \in \mathbb{R}^d$</tex-math></inline-formula>.</p>
        <p>This tensor <inline-formula><tex-math>$z$</tex-math></inline-formula> is then passed through five separate task-specific dense layers, each compressing
down to a scalar value, so that the final output of the model is a vector of five logits <inline-formula><tex-math>$\hat{\mathbf{y}} \in \mathbb{R}^5$</tex-math></inline-formula>. After training
we then remove the five task-specific heads, and take the output of the shared backbone <inline-formula><tex-math>$z$</tex-math></inline-formula> as the
journey embedding. We deliberately restrict the heads to be simple logistic regression layers. This
approach encourages the shared backbone to capture most of the nuance, ensuring the embeddings are
information-rich and generalizable, as opposed to relying too heavily on the task-specific layers.</p>
        <p>Throughout the architecture we use ReLU activations, except for the final shared dense layer where
sigmoid is used for its desirable bounding property. This ensures normalization of the output embedding,
with our experiments demonstrating no performance loss. We set dimension <inline-formula><tex-math>$d = 32$</tex-math></inline-formula> for the embedding,
which is well suited for downstream applications.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Multi Task Training Regime and Objective</title>
        <p>The motivation behind the MTL approach is that by jointly predicting a diverse set of user engagement
signals, the model is encouraged to learn comprehensive and generalizable representations that can be
effectively utilized across a variety of downstream applications, extending beyond just the tasks seen during
training. Furthermore, by mixing infrequent targets, such as purchases, with more common events
like product searches, the model learns from a stronger signal and, as our results show, performs better
on those sparse tasks. This is especially advantageous in the travel domain for events such as bookings,
as demonstrated in our experiments.</p>
        <p>The model is trained on five binary classification tasks which represent potential future actions of
a user: (PW2) make any purchase within two weeks; (BN5) bounce within the next five pages; and three
tasks relating to actions within the rest of the session: (SRP) make a search for a product; (PDP) view
a product details page; and (VUO) view an upcoming order. For more details on the metrics used in
training see Table 1. Each task head has its own class-weighted binary cross-entropy loss function. The
overall objective is expressed as a linear combination of these task-specific losses. For a journey <inline-formula><tex-math>$J$</tex-math></inline-formula> with
model prediction <inline-formula><tex-math>$\hat{\mathbf{y}}$</tex-math></inline-formula> and true labels <inline-formula><tex-math>$\mathbf{y}$</tex-math></inline-formula>, the loss is defined as:
<disp-formula><label>(2)</label><tex-math>$$\mathcal{L}(J, \mathbf{y}) = -\sum_{j=1}^{5} \left[ w_j \cdot y_j \cdot \log(\hat{y}_j) + (1 - y_j) \cdot \log(1 - \hat{y}_j) \right].$$</tex-math></disp-formula>
Class weights <inline-formula><tex-math>$w_j$</tex-math></inline-formula> are computed as the reciprocal of the proportion of positive samples for each task <inline-formula><tex-math>$j$</tex-math></inline-formula>,
in order to account for task-specific class imbalance. We weight tasks equally to encourage the model
to develop features which generalize across each task.</p>
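        <p>A minimal pure-Python sketch of the objective in Eq. (2) is given below, assuming the predictions are already probabilities in (0, 1) and that every task has at least one positive label. The function names and the clipping constant are illustrative choices, not part of the method as described.</p>
        <preformat>
```python
import math
from typing import List, Sequence


def class_weights(labels: Sequence[Sequence[int]]) -> List[float]:
    """Reciprocal of the positive rate for each task, from rows of binary labels."""
    n_samples = len(labels)
    n_tasks = len(labels[0])
    return [n_samples / sum(row[j] for row in labels) for j in range(n_tasks)]


def multitask_loss(
    y_true: Sequence[int],
    y_pred: Sequence[float],
    weights: Sequence[float],
    eps: float = 1e-7,
) -> float:
    """Sum of class-weighted binary cross-entropy terms over all tasks."""
    total = 0.0
    for y, p, w in zip(y_true, y_pred, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep the logarithms finite
        total += w * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total


# Toy example with two tasks and four journeys (hypothetical labels).
toy_labels = [[1, 1], [0, 1], [0, 0], [0, 0]]
weights = class_weights(toy_labels)
loss = multitask_loss([1, 0], [0.5, 0.5], weights)
```
        </preformat>
        <p>The weight multiplies only the positive term, so a rare positive label contributes proportionally more to the loss than a common one.</p>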
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Results</title>
      <sec id="sec-3-1">
        <title>3.1. Downstream Embedding Evaluation</title>
        <p>Supervised probing techniques have previously been developed to assess linguistic embeddings [20,
21, 22], although they are not directly suited to this scenario. We instead propose a downstream strategy
for evaluating the richness of information contained within a set of embeddings. After training, we
compute ground truth targets on an unseen test set of historical user journeys. These targets seek
to encapsulate users’ latent psychological states and future intentions. For this, we use the same five
tasks from the TRACE objective in Eq. (2), and introduce three more evaluation tasks that were not
previously seen. These are: (PWS) whether a user converts in the current session; (HOM) whether they return
to the homepage in the current session; and (RE7) whether they return to the site within seven days.
For more details see Table 1. This captures a broad scope of user outcomes, allowing us to characterize
how well the embeddings generalize.</p>
        <p>We pass the unseen test journeys through the model to obtain a corresponding set of embeddings. Next,
we train XGBoost models [23] on these test set embeddings. We fit one XGBoost model independently to
each unique evaluation task and optimize hyperparameters, such as max_depth and learning_rate, using
K-fold cross validation. The trained XGBoost models then undergo evaluation and we compute performance
metrics on the model predictions. These metrics serve as proxies for assessing the richness of embeddings
and exemplify downstream model performance across various use cases. Throughout this section, we
evaluate each upstream embedding model by using the same procedure on the same unseen test set.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Comparable Models vs TRACE</title>
        <p>We evaluate the quality of TRACE embeddings against several comparable approaches. We express our
comparisons as the mean uplift taken over all evaluation tasks. Results are shown in Table 2.</p>
        <p>Myopic Baseline. Our baseline predicts targets using explicit attributes from only the most recent
event. We report all results as percentage uplifts from this. TRACE significantly outperforms this,
highlighting the benefits of mining a user’s full navigation history.</p>
        <p>Single Task Cohort. To demonstrate the effectiveness of TRACE’s MTL approach, we trained a
dedicated single-task transformer for each of the evaluation tasks. These models each produce an
embedding. In Table 3, the TRACE score on a given task is compared to the corresponding dedicated
ST model embedding’s score. Overall results show that the TRACE embeddings outperform every
task-specific equivalent on the 5 tasks TRACE was trained on, and even wins on all but one of the
unseen targets, demonstrating the advantages of the MTL approach.</p>
        <p>Single Task Aggregated. Here we combine the task-specific models’ embeddings into a single
embedding of the same length by taking the mean along each dimension.</p>
        <p>Multi-Task LSTM. We note the demonstrated efficacy of LSTMs in related works [14, 18, 24, 25].
We train a comparable LSTM minimizing the same multi-task objective function shown in Eq. (2).</p>
        <p>Mini-GPT. We train a small GPT-style model [26] on the page name sequences, with a single
transformer block and causal masking in the attention layer for next event prediction. Embeddings are
computed from the mean of the transformer block outputs.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Ablation Experiments</title>
        <p>In Table 4 we list the results of our ablation studies.</p>
        <p>[Table 4: Ablation results. Variants: position encodings (Trigonometric); number of encoders (1†, 2, 3, 4); chronological features (Timestamp &amp; Session†, Timestamp, None). Reported values, in source order: +7.23, +6.64, +6.87, +6.84, +6.84, +6.62, +6.36; +13.58, +12.22, +12.62, +12.76, +12.76, +12.24, +12.0. †Final variant used in proposed TRACE model.]</p>
        <sec id="sec-3-3-1">
          <title>3.3.1. Position Encodings</title>
          <p>In Section 2.2, we discussed our approach to position encoding, which is designed to tackle event
sequences over multiple sessions. Static trigonometric position encodings are also widely popular
[13, 27]. We trained a variant including this additional encoding, but found better performance without
it.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Number of Encoders</title>
          <p>Here, we vary the number of transformer encoders, ℎ. Our results suggest that ℎ = 1 encoder is
sufficient for capturing the structure of the data, likely because our sequences are relatively short
and have a small vocabulary compared to typical NLP applications [27]. We measured the time taken
for the forward pass in each variant. The experiments were conducted on a system equipped with an
Nvidia T4 Tensor Core GPU (16 GiB VRAM) and an Intel Xeon processor (32 vCPUs, 128 GiB RAM, 2.5
GHz clock speed). We repeat the model call 10,000 times and measure the mean and standard deviation
for various encoder configurations. The results are as follows:</p>
          <list list-type="bullet">
            <list-item><p>ℎ = 1 encoder: 27.5 ms ± 0.1 ms</p></list-item>
            <list-item><p>ℎ = 2 encoders: 40.8 ms ± 0.1 ms</p></list-item>
            <list-item><p>ℎ = 3 encoders: 54.7 ms ± 0.6 ms</p></list-item>
            <list-item><p>ℎ = 4 encoders: 67.7 ms ± 0.4 ms</p></list-item>
          </list>
          <p>Our final model design used only a single encoder, ℎ = 1, which is sufficiently fast, taking only 27.5
milliseconds on average for the forward pass. This is well within our self-imposed upper limit of 100 ms
latency, which we find to be practical for real-time applications.</p>
        </sec>
        <sec id="sec-3-3-3">
          <title>3.3.3. Chronological Features</title>
          <p>To better understand the specific performance gains from chronological features, we train variants
which omit these. The "Timestamp" variant retains event timestamps but removes session ID, thereby
eliminating explicit information about session continuity. The "None" variant excludes both session
IDs and timestamps, retaining only the sequential order of events. Results demonstrate that including
timestamp features enhances performance, but the greatest improvement arises from incorporating
TRACE’s session encoding on top of this, as used in our final variant. This highlights the effectiveness of
TRACE in exploiting the multi-session structure of the sequences, and its significance for applications
in e-commerce recommendation systems.</p>
        </sec>
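        <p>The forward-pass timing procedure from the encoder-count ablation (repeating the model call and reporting the mean and standard deviation) can be approximated with a small harness like the one below. This is only a sketch: the function name is hypothetical and the workload is a stand-in for the model call. When timing a model on a GPU, asynchronous execution would also need to be accounted for, which this example does not do.</p>
        <preformat>
```python
import statistics
import time
from typing import Callable, Tuple


def time_calls(fn: Callable[[], object], repeats: int = 10_000) -> Tuple[float, float]:
    """Return the mean and standard deviation of per-call wall time in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(durations), statistics.stdev(durations)


# Stand-in workload; a real measurement would call the model's forward pass here.
mean_ms, std_ms = time_calls(lambda: sum(range(1000)), repeats=1000)
print(f"{mean_ms:.3f} ms ± {std_ms:.3f} ms")
```
        </preformat>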
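        <p>[Figure 2: t-SNE projection of TRACE embeddings, grouped by users’ next visited page. Legend: Trips Overview, Bounced, Product Details Page, Booking Form, Booking Confirmation, Search Results Page, Home Page.]</p>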
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Visualisation of Learned Embeddings</title>
        <p>In Fig. 2, we present a visualization of the 32-dimensional embeddings learned by TRACE, reduced to 2
dimensions using t-SNE [28]. This subset of observations was uniformly sampled with respect to users’
next visited page, ensuring equal representation from seven common pages. We note the emergence of
clusters corresponding to the next page visited by users, despite TRACE never being explicitly exposed
to this information during training. Qualitatively, the clusters appear to loosely align with how a user
traverses a website, going from homepage at the bottom progressing through to search and product
pages, before reaching checkout and order confirmation. This underscores TRACE’s ability to identify
and encode patterns in user journeys, showcasing the effectiveness of our approach for generating
information-rich embeddings.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this work, we have presented TRACE, a novel approach for generating user embeddings from
multi-session page view sequences through a multi-task learning (MTL) framework, which employs a
lightweight, encoder-only transformer to process real-time cross-session clickstream data. Our
experiments on a large-scale, real-world travel e-commerce dataset demonstrate the superior performance
of TRACE embeddings compared to traditional single-task and LSTM-based models, and highlight its
potential for enhancing tourism recommender systems. The learned embeddings exhibit strong results
on a diverse set of targets and demonstrate the ability to generalize well to unseen tasks, underscoring
their utility for applications like content personalization and user modeling. Visualizations reveal that
TRACE can effectively capture meaningful clusters corresponding to latent user intents and behaviors.</p>
      <p>To reinforce the performance of TRACE, we plan to publish results showing its strength on a public
e-commerce user-journey dataset produced by Coveo [29]. Although this dataset is neither multi-session
nor tourism-specific, its user journeys exhibit comparable navigation patterns, which will underscore
the robustness of the TRACE architecture. Additionally, we intend to integrate these embeddings into
our in-house recommendation systems and evaluate their effectiveness in online experiments.</p>
      <p>In the future, we plan to explore the integration of LLMs, as in [30, 31], and investigate hierarchical
models to further improve the model’s representational capacity.</p>
      <ref-list>
        <ref><mixed-citation>[7] S. D. Bernhard, C. K. Leung, V. J. Reimer, J. Westlake, Clickstream prediction using sequential stream mining techniques with Markov chains, in: Proceedings of the 20th International Database Engineering &amp; Applications Symposium, 2016, pp. 24–33.</mixed-citation></ref>
        <ref><mixed-citation>[8] Y. S. Kim, B.-J. Yum, Recommender system based on click stream data using association rule mining, Expert Systems with Applications 38 (2011) 13320–13327.</mixed-citation></ref>
        <ref><mixed-citation>[9] G. Wang, X. Zhang, S. Tang, H. Zheng, B. Y. Zhao, Unsupervised clickstream clustering for user behavior analysis, in: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016, pp. 225–236.</mixed-citation></ref>
        <ref><mixed-citation>[10] Q. Su, L. Chen, A method for discovering clusters of e-commerce interest patterns using clickstream data, Electronic Commerce Research and Applications 14 (2015) 1–13.</mixed-citation></ref>
        <ref><mixed-citation>[11] J. Wei, Z. Shen, N. Sundaresan, K.-L. Ma, Visual cluster exploration of web clickstream data, in: 2012 IEEE Conference on Visual Analytics Science and Technology (VAST), IEEE, 2012, pp. 3–12.</mixed-citation></ref>
        <ref><mixed-citation>[12] M. Zavali, E. Lacka, J. De Smedt, Shopping hard or hardly shopping: Revealing consumer segments using clickstream data, IEEE Transactions on Engineering Management 70 (2021) 1353–1364.</mixed-citation></ref>
        <ref><mixed-citation>[13] H. Bai, D. Liu, T. Hirtz, A. Boulenger, Expressive user embedding from churn and recommendation multi-task learning, in: Companion Proceedings of the ACM Web Conference 2023, 2023, pp. 37–40.</mixed-citation></ref>
        <ref><mixed-citation>[14] M. Alves Gomes, R. Meyes, P. Meisen, T. Meisen, Will this online shopping session succeed? Predicting customer’s purchase intention using embeddings, in: Proceedings of the 31st ACM International Conference on Information &amp; Knowledge Management, 2022, pp. 2873–2882.</mixed-citation></ref>
        <ref><mixed-citation>[15] B. Requena, G. Cassani, J. Tagliabue, C. Greco, L. Lacasa, Shopper intent prediction from clickstream e-commerce data with minimal browsing information, Scientific Reports 10 (2020) 16983.</mixed-citation></ref>
        <ref><mixed-citation>[16] C. H. Tan, A. Chan, M. Haldar, J. Tang, X. Liu, M. Abdool, H. Gao, L. He, S. Katariya, Optimizing Airbnb search journey with multi-task learning, in: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, ACM, 2023. URL: http://dx.doi.org/10.1145/3580305.3599881. doi:10.1145/3580305.3599881.</mixed-citation></ref>
        <ref><mixed-citation>[17] N. Pancha, A. Zhai, J. Leskovec, C. Rosenberg, PinnerFormer: Sequence modeling for user representation at Pinterest, in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 3702–3712.</mixed-citation></ref>
        <ref><mixed-citation>[18] Z. Zhuang, X. Kong, R. Elke, J. Zouaoui, A. Arora, Attributed sequence embedding, in: 2019 IEEE International Conference on Big Data (Big Data), IEEE, 2019, pp. 1723–1728.</mixed-citation></ref>
        <ref><mixed-citation>[19] M. Rahmani, J. Caverlee, F. Wang, Incorporating time in sequential recommendation models, in: Proceedings of the 17th ACM Conference on Recommender Systems, 2023, pp. 784–790.</mixed-citation></ref>
        <ref><mixed-citation>[20] I. Tenney, P. Xia, B. Chen, A. Wang, A. Poliak, R. T. McCoy, N. Kim, B. Van Durme, S. R. Bowman, D. Das, et al., What do you learn from context? Probing for sentence structure in contextualized word representations, in: 7th International Conference on Learning Representations, ICLR 2019, 2019.</mixed-citation></ref>
        <ref><mixed-citation>[21] J. Hewitt, C. D. Manning, A structural probe for finding syntax in word representations, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4129–4138.</mixed-citation></ref>
        <ref><mixed-citation>[22] J. Hewitt, P. Liang, Designing and interpreting probes with control tasks, 2019.</mixed-citation></ref>
        <ref><mixed-citation>[23] T. Chen, C. Guestrin, XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794.</mixed-citation></ref>
        <ref><mixed-citation>[24] D. Koehn, S. Lessmann, M. Schaal, Predicting online shopping behaviour from clickstream data using deep learning, Expert Systems with Applications 150 (2020) 113342.</mixed-citation></ref>
        <ref><mixed-citation>[25] C. O. Sakar, S. O. Polat, M. Katircioglu, Y. Kastro, Real-time prediction of online shoppers’ purchasing intention using multilayer perceptron and LSTM recurrent neural networks, Neural Computing and Applications 31 (2019) 6893–6908.</mixed-citation></ref>
        <ref><mixed-citation>[26] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al., Improving language understanding by generative pre-training (2018).</mixed-citation></ref>
        <ref><mixed-citation>[27] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).</mixed-citation></ref>
        <ref><mixed-citation>[28] L. Van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008).</mixed-citation></ref>
        <ref><mixed-citation>[29] J. Tagliabue, C. Greco, J.-F. Roy, B. Yu, P. J. Chia, F. Bianchi, G. Cassani, SIGIR 2021 e-commerce workshop data challenge, 2021. URL: https://arxiv.org/abs/2104.09423. arXiv:2104.09423.</mixed-citation></ref>
        <ref><mixed-citation>[30] K. Christakopoulou, A. Lalama, C. Adams, I. Qu, Y. Amir, S. Chucri, P. Vollucci, F. Soldo, D. Bseiso, S. Scodel, et al., Large language models for user interest journeys, arXiv preprint arXiv:2305.15498 (2023).</mixed-citation></ref>
        <ref><mixed-citation>[31] Z. Zhao, W. Fan, J. Li, Y. Liu, X. Mei, Y. Wang, Z. Wen, F. Wang, X. Zhao, J. Tang, et al., Recommender systems in the era of large language models (LLMs), IEEE Transactions on Knowledge and Data Engineering (2024).</mixed-citation></ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Recommending complementary products in e-commerce push notifications with a mixture model approach</article-title>
          ,
          <source>in: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>909</fpage>
          -
          <lpage>912</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>X.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Katzur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Learning to personalize recommendation based on customers' shopping intents</article-title>
          ,
          <year>2023</year>
          . arXiv:2305.05279.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Kangas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schwoerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Bernardi</surname>
          </string-name>
          ,
          <article-title>Recommender systems for personalized user experience: Lessons learned at booking.com</article-title>
          ,
          <source>in: Proceedings of the 15th ACM Conference on Recommender Systems</source>
          , RecSys '21,
          Association for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , pp.
          <fpage>583</fpage>
          -
          <lpage>586</lpage>
          . URL: https://doi.org/10.1145/3460231.3474611. doi:10.1145/3460231.3474611.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ilhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marchini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Markeviciute</surname>
          </string-name>
          ,
          <article-title>Adaptex: A self-service contextual bandit platform</article-title>
          ,
          <source>in: Proceedings of the 17th ACM Conference on Recommender Systems, RecSys '23</source>
          ,
          ACM
          ,
          <year>2023</year>
          . URL: http://dx.doi.org/10.1145/3604915.3608870. doi:10.1145/3604915.3608870.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grbovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <article-title>Real-time personalization using embeddings for search ranking at airbnb</article-title>
          ,
          <source>in: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery &amp; data mining</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Olmezogullari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Aktas</surname>
          </string-name>
          ,
          <article-title>Representation of click-stream datasequences for learning user navigational behavior by using embeddings</article-title>
          ,
          <source>in: 2020 IEEE International Conference on Big Data (Big Data)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>3173</fpage>
          -
          <lpage>3179</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>