<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Spatiotemporal-Enhanced Network for Click-Through Rate Prediction in Location-based Services</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shaochuan Lin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yicong Yu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiyu Ji</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Taotao Zhou</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hengxu He</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zisen Sang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jia Jia</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guodong Cao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ning Hu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alibaba Group</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Alibaba Group</institution>
          ,
          <addr-line>Hangzhou</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Alibaba Group</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In Location-Based Services (LBS), user behavior naturally has a strong dependence on spatiotemporal information, i.e., at different geographical locations and at different times, user click behavior changes significantly. Appropriate spatiotemporal enhancement modeling of user click behavior and large-scale sparse attributes is key to building an LBS model. Although most existing methods have been proved effective, they are difficult to apply to takeaway scenarios due to insufficient modeling of spatiotemporal information. In this paper, we address this challenge by explicitly modeling the timing and locations of interactions and proposing a Spatiotemporal-Enhanced Network, namely StEN. In particular, StEN applies a Spatiotemporal Profile Activation module to capture common spatiotemporal preference through attribute features. A Spatiotemporal Preference Activation is further applied to model in detail the personalized spatiotemporal preference embodied by behaviors. Moreover, a Spatiotemporal-aware Target Attention mechanism is adopted to generate different parameters for target attention at different locations and times, thereby improving the personalized spatiotemporal awareness of the model. Comprehensive experiments are conducted on three large-scale industrial datasets, and the results demonstrate the state-of-the-art performance of our methods. In addition, we have released an industrial dataset for the takeaway industry to make up for the lack of public datasets in this community.</p>
      </abstract>
      <kwd-group>
        <kwd>spatiotemporal systems</kwd>
        <kwd>click-through rate prediction</kwd>
        <kwd>location-based services</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Location-Based Services (LBS) are mobile services that provide the user with current location-relevant content on smartphones or other devices. Among them, the takeaway service is the most popular and convenient commercial service. Like other LBS, it also requires timely delivery, which results in a strong dependence on time and geographical location for users. In this way, recommending products suitable for the user's temporal and spatial demands in LBS is a pretty challenging problem.
      </p>
      <p>
        Recently, some methods[
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ] have been proved effective in e-commerce through the user's historical behavior, but it is not easy to adapt them to the LBS scenario. The main reason is that most of them do not pay attention to users' strong spatial and temporal demands. For instance, a user may prefer fast food in the work area on weekdays and choose fried chicken in his or her residential area on weekends. These changes in user behavioral interests are bonded with the changes of location and time. Although there are some initial efforts[
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] to integrate spatiotemporal information into sequential recommendation, most of them consider only partial spatiotemporal information, and efforts to fully and thoroughly model such integrated spatiotemporal patterns are still lacking. Different from the above scenarios, there are some common attributes in the takeaway scenario which have a weak correlation with the user's historical behavior; for example, milk tea is naturally suitable to be recommended at afternoon tea time. On the other hand, the historical behaviors of users imply their personal dietary preferences.
      </p>
      <p>
        To tackle the above problems, we propose a Spatiotemporal-Enhanced Network (StEN) to better meet users' temporal and spatial demands. Specially, StEN applies a Spatiotemporal Profile Activation (StPro) module to model the user's common spatiotemporal preference by activating attribute features (user and item). For the personalized spatiotemporal preference of users, a novel Spatiotemporal Preference Activation (StPre) and a Spatiotemporal-aware Target Attention (StTA) module are proposed. StPre disassembles the spatiotemporal preference embodied by the user's historical behavior in detail, which includes Temporal Evolving Activation (TEA), Temporal Periodic Fusion (TPF) and Spatial Preference Activation (SPA), while StTA employs different spatiotemporal information to generate different parameters and feeds them into target attention to improve the personalized spatiotemporal awareness of the model. In addition, we have released an industrial dataset for the takeaway industry to make up for the lack of public datasets in this community.
      </p>
      <p>
        DL4SR'22: Workshop on Deep Learning for Search and Recommendation, co-located with the 31st ACM International Conference on Information and Knowledge Management (CIKM), October 17-21, 2022, Atlanta, USA. * Corresponding author. Emails: lin.lsc@alibaba-inc.com (S. Lin); yicongyu.yyc@alibaba-inc.com (Y. Yu); jixiyu.jxy@alibaba-inc.com (X. Ji); hengxu.hhx@alibaba-inc.com (H. He); jj229618@alibaba-inc.com (J. Jia); guodong.cao@alibaba-inc.com (G. Cao); huning.hu@alibaba-inc.com (N. Hu)
      </p>
      <sec id="sec-1-1">
        <title>Contributions</title>
        <p>All our contributions can be summarized as follows:</p>
        <p>• StEN applies a Spatiotemporal Profile Activation (StPro) module to model the user's common spatiotemporal preference by activating attribute features (user and item).</p>
        <p>• For the personalized spatiotemporal preference of users, a novel Spatiotemporal Preference Activation (StPre) is proposed, which disassembles the spatiotemporal preference embodied by the user's historical behavior in detail and extracts preferences through three small modules: Temporal Evolving Activation (TEA), Temporal Periodic Fusion (TPF) and Spatial Preference Activation (SPA).</p>
        <p>• We also propose a Spatiotemporal-aware Target Attention (StTA) module, which employs different spatiotemporal information to generate different parameters and feeds them into target attention to improve the personalized spatiotemporal awareness of the model.</p>
        <p>• In addition, we have released an industrial dataset for the takeaway industry to make up for the lack of public datasets in this community. Experimental results demonstrate that our method achieves state-of-the-art performance on three large-scale industrial datasets, and the online A/B testing results further show its practical value.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2. Related Work</title>
        <sec id="sec-1-2-1">
          <title>2.1. Sequence-based Model</title>
          <p>
            Earlier deep CTR approaches hoped to eliminate the complicated work of feature engineering and focused more on automatically mining the correlations between features[
            <xref ref-type="bibr" rid="ref6 ref7">6, 7, 8, 9, 10</xref>
            ]. Later on, researchers[
            <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
            ] found that the user's historical behavior sequence contains richer and more direct information, which brought breakthroughs to the entire recommendation community. Many researches focus on exploring potential interests in the user's historical behavior sequence. They extract sequence features by incorporating structures such as pooling, RNNs and attention into the model.
          </p>
          <p>
            YoutubeDNN[11] proposes a feature embedding method on items and then takes the average value to extract historical sequence features. DIN[
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] believes that the interests in the user's historical behavior sequence are diverse. Faced with a particular product, only part of the interests associated with that product will influence the user's behavior. Based on this, DIN designs a local activation module to extract different user interests from the sequence for various target commodities. DIEN[
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] further explores the interrelationships between users' historical behaviors and proposes the concept of user interest evolution. It designs an auxiliary loss and a structure based on GRU.
          </p>
          <p>
            Inspired by the success of the self-attention mechanism in sequence-to-sequence tasks, BST[12] leverages a transformer layer instead of GRU to mine information about the user's interest. DSIN[
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] observes that the user's interests in a short period are concentrated, while long-term interests are scattered. It splits the sequence into different sessions and explores the information through the self-attention mechanism and a Bi-LSTM module. SIM[13] proposes an interest mining method for life-long user sequences. However, full historical behavior sequences of users are very long, which may lead to time-consuming and noise problems. To overcome this, SIM provides a search-based long sequence extraction method to extract top-k behavior sequences from life-long sequences through soft and hard search technology.
          </p>
        </sec>
        <sec id="sec-1-2-2">
          <title>2.2. Time Aware Attention Model</title>
          <p>The above deep CTR models do not explicitly make use of the click time information in the user's historical behavior, where the click time information has an impact on the user's evolutionary behavior and the user's periodic behavior. The user's evolutionary behavior denotes that the user's interest changes over time, and the user's periodic behavior indicates the user's periodic actions. Specially, TIEN[14] pays more attention to the user's evolutionary behavior, and believes that the closer a historical behavior is to the current time, the greater its weight should be. TLSAN[15] leverages the absolute value of the time difference and then uses its reciprocal as the time position embedding. TiSASRec[16] models items' relative time intervals by sine and cosine functions to explore the evolutionary behavior of users and then utilizes items' absolute temporal signals, such as month (M), weekday (W), date (D) and hour (H), to detect the periodic behavior of users. TimelyRec[17] captures potential irregularity information in the user's periodic patterns, and then integrates the information to compute the similarity between the target time and user interactions with an attention mechanism.</p>
        </sec>
        <sec id="sec-1-2-3">
          <title>2.3. Spatiotemporal Model</title>
          <p>Spatial location is also important for some location-aware platforms, such as Facebook Places[18] and Airbnb[19]. Thus, it is natural to integrate temporal information and spatial location to optimize recommendation models. However, due to the complexity of model design, publicly available existing work is limited. CalendarGNN[20] utilizes GNN and GRU to extract the segmented time and geographic information in the user's historical behavior sequence. While effective, it is applied to article browsing on web pages without regard to the geographic location of the item, so it is not suitable for our takeaway industry. TRISAN[21] extracts the spatiotemporal information from the user's historical behavior sequence by employing two spatial activation modules and one temporal similarity activation module in the model.</p>
          <p>However, it does not detail the information contained in the user's spatiotemporal behavior, which leads to insufficient exploration of spatiotemporal information. While TRISAN is of great relevance for our purposes, unfortunately, the method has not been open-sourced and the dataset used in that paper is not publicly available, so we cannot perform method comparisons with it in Section 4.</p>
          <p>[Figure 1: Overall architecture of StEN. An embedding layer maps the User Embedding (User Id, User Views in the last 30 days, ...), Target Spatiotemporal Embedding (Hour, User Geohash, ...), Target Query Embedding (Target Id, Target Category Id, ...) and User Behavior Embedding (Item Id, City Id, ...) into the StPro, StPre (TEA, TPF, SPA) and StTA (Concat, MatMul, Softmax) modules, followed by a dense tower of FC+BN+LReLU layers (1024, 512, 256) that outputs the CTR.]</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Spatiotemporal-Enhanced Network</title>
      <p>In this paper, we denote x = (i, u, b, st) ∈ 𝒳 as the input data, where i is the target item feature, u is the user feature, b is the user click behavior and st is the spatiotemporal feature.</p>
      <p>In particular, we geocode the user's latitude and longitude into a 6-character geohash (geohash-6), which is then combined with the user's Area-of-Interest (AOI)[22] and serves as the spatial feature s in this paper. The temporal feature is represented by the hour of day, the time period of day (breakfast, lunch, afternoon tea, dinner and night snack) and the day of the week. User features u include user id, user gender and other features, while item features i include item id, item category and other features. Before all features enter the model, we perform a vectorized representation of them; for convenience of description, in the remainder of this article i, u, b and st all represent the embedding vectors of the corresponding features. Denoting y ∈ 𝒴 as the click label, our CTR prediction task can be defined as:
p(y = 1 | x) = f(x; θ), x ∈ 𝒳 (1)
where f(x; θ) is a probability value obtained after we forward the input data x through any CTR network and then activate it with a sigmoid function, and θ represents the parameters of the network. Typically, each of our user history behaviors includes the item b_i, the item's location b_s, the click time b_t and the click period of time b_p. The CTR task of Equation 1 is then mainly achieved by minimizing the following cross-entropy loss function during training,
ℒ(x, y; θ) = −(1/N) Σ_{j=1}^{N} [ y_j log f(x_j; θ) + (1 − y_j) log(1 − f(x_j; θ)) ] (2)
where y_j ∈ {0, 1} is the ground-truth label, N is the mini-batch size and j is the index of the input data. We set N to 1024 in this paper.</p>
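      <p>As a concrete illustration, the prediction of Equation 1 and the loss of Equation 2 can be sketched in NumPy as follows (a minimal sketch in which a single linear layer stands in for a real CTR network; all array names and shapes are illustrative, not from the paper):</p>
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(x, theta):
    """Toy stand-in for a CTR network: a linear layer over the
    concatenated feature embeddings, activated by a sigmoid (Eq. 1)."""
    return sigmoid(x @ theta)

def ctr_loss(x, y, theta):
    """Mini-batch cross-entropy loss of Eq. 2."""
    p = f(x, theta)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 16))    # batch of concatenated embeddings, N = 1024
y = rng.integers(0, 2, size=1024)  # ground-truth click labels
theta = rng.normal(size=16)
loss = ctr_loss(x, y, theta)
print(loss)                        # a positive scalar
```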
      <sec id="sec-3-1">
        <title>3.2. Spatiotemporal Profile Activation</title>
        <p>
          This module is mainly used to capture common spatiotemporal preferences that are less correlated with user behavior. E-commerce scenarios only need to consider the personalized behavior of the user, but in the takeaway scenario we also need to consider the impact of time and location on users and items. For instance, there is a natural difference between a user's orders in the workplace and in the residential area. Therefore, we use the spatiotemporal features st to extract common spatiotemporal preferences for the static item and user features. Below we take the user feature as an example,
a_u = σ( Linear_u(st) · u / √d_u ) (3)
where Linear_u(·) ∈ R^(d_st×d_u) is a linear transformation of st, d_st is the last dimension of st and d_u is the last dimension of u. Inspired by [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], we then concatenate a_u and u together with their difference and element-wise product to get the final activation value h_u = (a_u, u, a_u − u, a_u * u).
        </p>
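        <p>Under the shapes stated above, the activation of Equation 3 and the concatenation that follows can be sketched as below (a hedged NumPy sketch that interprets the product as element-wise gating; the weight matrix and dimensions are illustrative, not trained parameters):</p>
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stpro_user(st, u, W):
    """Spatiotemporal Profile Activation for the user feature (Eq. 3).
    st: (d_st,) spatiotemporal embedding; u: (d_u,) user embedding;
    W: (d_st, d_u) linear transformation."""
    d_u = u.shape[0]
    a_u = sigmoid((st @ W) * u / np.sqrt(d_u))     # activation value a_u
    # concatenate a_u, u, their difference and element-wise product
    return np.concatenate([a_u, u, a_u - u, a_u * u])

rng = np.random.default_rng(0)
st, u = rng.normal(size=8), rng.normal(size=16)
W = rng.normal(size=(8, 16))
h_u = stpro_user(st, u, W)
print(h_u.shape)   # (64,)
```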
        <p>Through the same activation method, we can obtain the final activation value of the item, denoted h_i. Finally, we concatenate the above activation values to obtain the spatiotemporal profile activation value h_StPro. Fig. 2(a) shows the structure.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.3. Spatiotemporal Preference Activation</title>
        <p>We further propose a Spatiotemporal Preference Activation (StPre) to model the personalized spatiotemporal preference embodied by user behaviors in detail.</p>
        <sec id="sec-3-2-1">
          <title>3.3.1. Temporal Evolving Activation (TEA)</title>
          <p>The time sequence of user clicks has a certain impact on the current behavior. For example, a user who frequently clicks on milk tea in a short period of time will be more willing to click on dessert in the next time slot. To model this temporal evolving pattern, we first calculate the time interval t_Δ between the request time and each historical behavior click time b_t. We then eliminate noise by applying a nonlinear transformation to the time interval, thus obtaining the temporal evolution factor e_t,</p>
          <p>e_t = W_2( ReLU( W_1( −t_Δ ) ) ) + ( −t_Δ ) (4)
where W_1 ∈ R^(L×d_h) and W_2 ∈ R^(d_h×L) denote two fully connected layers, t_Δ ∈ R^(1×L), d_h is the hidden size and L is the sequence length we set. In this paper, we abbreviate the structure of Equation 4 as FFN. We then normalize the temporal evolution factor e_t through a softmax function to get the weight of temporal evolution w_t. After that, w_t helps to obtain temporal activation features related to the behavior order,
h_t = w_t · Softmax( Linear(b) ) · b (5)
where Linear(·) ∈ R^(d_b×1) and d_b is the last dimension of the feature b. Finally, our robust temporal evolution fusion feature is obtained by h_TEA = w_mean * MeanPooling(FFN(b)) + w_ti * h_t, where the mean weight w_mean and the time interval weight w_ti are two trainable weight parameters used to balance the output. The module is depicted in Fig. 2(c).</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.3.2. Temporal Periodic Fusion (TPF)</title>
          <p>User historical behavior contains rich but scattered behavioral interests. However, when we explore user behavior from the perspective of time periods, we find that users' behavioral interests are more concentrated and periodic. The model would be messy if we directly learned mixed user behavior without any behavioral slices.</p>
          <p>In this case, we propose a Temporal Periodic Fusion module to learn the user's periodic preference in the takeaway industry.</p>
          <p>Based on the period of time b_p, we first divide the user historical behavior b into five time slices {b_bre, b_lun, b_aft, b_din, b_nig}. We then feed each period-of-time sequence into the FFN and mean pooling in turn to get the characteristics of breakfast behaviors h_bre, lunch behaviors h_lun, afternoon tea behaviors h_aft, dinner behaviors h_din and night snack behaviors h_nig. Taking the breakfast behavior as an example,
h_bre = MeanPooling( FFN( b_bre ) ) (6)</p>
        </sec>
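        <p>The period-of-time slicing of TPF can be sketched as follows (a minimal NumPy sketch; the FFN is reduced to a single nonlinearity and all names and shapes are illustrative):</p>
```python
import numpy as np

PERIODS = ["breakfast", "lunch", "afternoon_tea", "dinner", "night_snack"]

def ffn(x):
    """Toy stand-in for the FFN abbreviated from Eq. 4."""
    return np.tanh(x)

def tpf(behaviors, period_ids):
    """Temporal Periodic Fusion: slice behaviors by period of time,
    apply FFN + mean pooling per slice (Eq. 6), then average the
    per-period characteristics into one periodic representation."""
    slices = []
    for p in range(len(PERIODS)):
        mask = period_ids == p
        if mask.any():
            slices.append(ffn(behaviors[mask]).mean(axis=0))
    return np.mean(slices, axis=0)

rng = np.random.default_rng(0)
b = rng.normal(size=(30, 16))        # 30 historical behavior embeddings
p_ids = rng.integers(0, 5, size=30)  # period-of-day id of each click
h_tpf = tpf(b, p_ids)
print(h_tpf.shape)   # (16,)
```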
        <p>Further, to obtain a more general periodic representation h_TPF, we fuse the above periodic characteristics through an average operation. Fig. 2(d) illustrates an outline of this architecture.</p>
        <sec id="sec-3-2-3">
          <title>3.3.3. Spatial Preference Activation (SPA)</title>
          <p>The user's geographic location affects his personalized dietary choices. For example, when the user works at a company, he may choose rice, and when the user is at home, he may prefer fried chicken. We call this the user's spatial preference. To capture this spatial preference, we utilize the spatial features s and combine them with the user feature u. We then feed the combined values into a fully connected layer and activate through a sigmoid function to get the geolocation activation value w_s,
w_s = σ( Linear_s( (s, u) ) ) (7)
where Linear_s(·) ∈ R^(d_su×1) and d_su is the dimension of the combined value of s and u. Further, we use w_s to activate all of the user history behaviors to explore the user's spatial preference h_SPA through FFN and mean pooling. The architecture can be observed in Fig. 2(b).</p>
          <p>Finally, we fuse the outputs of the above three small modules together to obtain our final spatiotemporal preference activation value h_StPre = h_TEA + w_TPF * h_TPF + w_SPA * h_SPA, where the period-of-time weight w_TPF and the spatial weight w_SPA are also two trainable weight parameters used to balance the output.</p>
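          <p>The SPA gate of Equation 7 and the final StPre fusion can be sketched as below (a hedged NumPy sketch; the fusion weights are shown as plain scalars rather than trained parameters, and h_tea, h_tpf are assumed precomputed):</p>
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spa_gate(s, u, w):
    """Geolocation activation value (Eq. 7): sigmoid of a linear layer
    over the concatenated spatial feature s and user feature u."""
    return sigmoid(np.concatenate([s, u]) @ w)   # scalar in (0, 1)

rng = np.random.default_rng(0)
s, u = rng.normal(size=8), rng.normal(size=16)
w = rng.normal(size=24)            # Linear_s(.) in R^(d_su x 1)
w_s = spa_gate(s, u, w)

# use w_s to activate every history behavior, then FFN + mean pooling
b = rng.normal(size=(30, 16))
h_spa = np.tanh(w_s * b).mean(axis=0)

# fuse the three StPre sub-modules (illustrative fixed weights)
h_tea, h_tpf = rng.normal(size=16), rng.normal(size=16)
w_tpf, w_spa = 0.5, 0.5            # trainable weights in the real model
h_stpre = h_tea + w_tpf * h_tpf + w_spa * h_spa
print(h_stpre.shape)   # (16,)
```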
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.4. Spatiotemporal-aware Target Attention</title>
        <p>To more effectively explore the spatiotemporal relationships between historical user behavior and the target item, we propose a Spatiotemporal-aware Target Attention (StTA) mechanism. Drawing on the ideas of CAN[23] and AdaptPGM[24], we generate different parameters through spatiotemporal information for target attention, thereby improving the personalized spatiotemporal awareness of the model. Taking the target item embedding i as an example, a fully-connected layer on the spatiotemporal feature st generates
P_i = W_p · st + b_p → (w_i, b_i) (8)
where W_p ∈ R^(d_st×(d*k+k)) and b_p ∈ R^(d*k+k) are the parameters of the fully-connected layer, d is the dimension of the input embedding (such as the target item embedding i or the user behavior embedding b) and k is the dimension of the final output embedding. We then split P_i into two parts (w_i, b_i) as the parameters of the subsequent target attention fully connected layer: the first d*k parameters are reshaped into w_i and the last k parameters serve as b_i. In the same way, we can obtain P_k (w_k, b_k) and P_v (w_v, b_v) through the spatiotemporal feature st. After that, we utilize the primitive target attention mechanism to obtain the final module output h_StTA,
Q = w_i · i + b_i,
K = w_k · b + b_k,
V = w_v · b + b_v,
h_StTA = Softmax( Q · K^T / √d_k ) · V (9)
where d_k is the dimension of Q. Fig. 1(a) illustrates the structure.</p>
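        <p>The parameter generation of Equation 8 followed by the attention of Equation 9 can be sketched as below (a hedged NumPy sketch; the mapping of the three generated parameter sets onto query, key and value follows our reading of the text, and every shape is illustrative):</p>
```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gen_params(st, W, bias, d, k):
    """Eq. 8: a fully-connected layer on the spatiotemporal feature st
    emits d*k + k numbers, split into a weight (d, k) and a bias (k,)."""
    p = st @ W + bias
    return p[: d * k].reshape(d, k), p[d * k :]

rng = np.random.default_rng(0)
d, k, d_st, L = 16, 8, 8, 30
st = rng.normal(size=d_st)
i = rng.normal(size=d)           # target item embedding
b = rng.normal(size=(L, d))      # behavior sequence embeddings

# one generated parameter set per projection (query, key, value)
params = [gen_params(st, rng.normal(size=(d_st, d * k + k)),
                     rng.normal(size=d * k + k), d, k) for _ in range(3)]
(w_q, b_q), (w_k, b_k), (w_v, b_v) = params

Q = i @ w_q + b_q                          # (k,)
K = b @ w_k + b_k                          # (L, k)
V = b @ w_v + b_v                          # (L, k)
h_stta = softmax(K @ Q / np.sqrt(k)) @ V   # Eq. 9
print(h_stta.shape)   # (8,)
```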
      </sec>
      <sec id="sec-3-4">
        <title>3.5. Dense Tower for StEN</title>
        <p>Once we have all the feature vector representations, we can fuse all the above module outputs to get the final input z_0 = (h_StPro, h_StPre, h_StTA). A three-layer perceptron structure is then applied,
z_{l+1} = LReLU( BN( Linear_l( z_l ) ) ) (10)
where l = 0, 1, 2. We then get the prediction of a click via a sigmoid activation, p(y = 1 | x) = σ( Linear_o( z_3 ) ), where Linear_o(·) ∈ R^(d_3×1). Finally, we optimize the parameters of the whole model by Equation 2 defined above. The detail is illustrated in Fig. 1(a).</p>
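        <p>The three-layer tower of Equation 10 can be sketched as follows (a minimal NumPy sketch: BatchNorm is reduced to per-feature standardization, the weights are random stand-ins, and the layer widths follow Fig. 1):</p>
```python
import numpy as np

def lrelu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def batchnorm(z, eps=1e-6):
    """Toy BN: standardize each feature over the mini-batch."""
    return (z - z.mean(axis=0)) / (z.std(axis=0) + eps)

def dense_tower(z0, sizes=(1024, 512, 256), seed=0):
    """Eq. 10: z_{l+1} = LReLU(BN(Linear_l(z_l))), l = 0, 1, 2,
    followed by a sigmoid output unit for the click probability."""
    rng = np.random.default_rng(seed)
    z = z0
    for width in sizes:
        W = rng.normal(size=(z.shape[1], width)) * 0.01
        z = lrelu(batchnorm(z @ W))
    w_out = rng.normal(size=(z.shape[1], 1))
    return 1.0 / (1.0 + np.exp(-(z @ w_out)))

# z0 stands for the concatenation of h_StPro, h_StPre and h_StTA
z0 = np.random.default_rng(1).normal(size=(4, 48))
p = dense_tower(z0)
print(p.shape)   # (4, 1)
```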
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. EXPERIMENTS</title>
      <sec id="sec-4-1">
        <title>4.1. Datasets</title>
        <p>Due to the lack of public spatiotemporal datasets in the takeaway industry, we conducted experimental comparisons on three industrial datasets (D1, D2 and D3) collected from Ele.me, a major LBS platform in China. Dataset D1 mainly recommends stores to users and consists of over 5 billion samples. Datasets D2 and D3 mainly recommend meals to users and contain more than 500 million and 100 million samples, respectively. For D3, we collected one week of data from the server logs as the training set and one day of data as the test set. We have publicly released dataset D3 to further advance the exploration of spatiotemporal patterns in the LBS community. The details of our datasets can be seen in Table 1.</p>
        <sec id="sec-4-1-1">
          <title>4.2. Experimental Settings</title>
          <p>All models in this paper are implemented with Python 2.7 and TensorFlow 1.4. AdagradDecay[25] is chosen as our optimizer to train the model. To avoid overfitting in the early stage of model training and to maintain training stability, we adopt a warm-up[26] strategy for all methods: we set the learning rate to 0.001 and gradually increase it to 0.015 within 1M steps. We set the batch size N to 1024. We repeated all experiments five times and averaged the metrics to obtain more reliable results. In our experiments, we adopt Area Under Curve (AUC) and RelaImpr[27] as our evaluation metrics.</p>
          <p>To show the effectiveness of our method, we select three well-known and industry-proven CTR prediction models as our baselines.</p>
          <p>DIN: Deep Interest Network designs a local activation module to capture the information in the user behavior sequence that affects the user's behavior when facing the target item. At the same time, DIN does not model the interrelationships among items in a sequence of actions.</p>
          <p>DHAN: Deep Hierarchical Attention Network designs a set of attention networks with multi-dimensional and multi-level structures, which can capture the interest expression of users in various dimensions. At the same time, the attention network can extract features that are similar to the knowledge expression of a tree structure.</p>
          <p>DIEN: Deep Interest Evolution Network adapts the interest evolution factors in user behavior. It designs an AUGRU-based module to model the evolution process and trend of user interests.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.3. Overall Performance Comparison</title>
          <p>Table 2: Overall performance on D1, D2 and D3. StPro: Spatiotemporal Profile Activation. StPre: Spatiotemporal Preference Activation. DIN+StPro+StPre, DHAN+StPro+StPre and DIEN+StPro+StPre are three variation models to investigate the generalization of our modules.
Model: D1 / D2 / D3
DIN: 0.7209 / 0.7294 / 0.6403
DHAN: 0.7265 / 0.7312 / 0.6419
DIEN: 0.7346 / 0.7452 / 0.6531
DIN+StPro+StPre: 0.7236 / 0.7324 / 0.6434
DHAN+StPro+StPre: 0.7271 / 0.7336 / 0.6445
DIEN+StPro+StPre: 0.7348 / 0.7458 / 0.6571
StEN: 0.7353 / 0.7535 / 0.6627</p>
          <p>
            Table 2 compares StEN with three well-known CTR prediction models on D1, D2 and D3. We find that DHAN[28] performs better than DIN[
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] on all datasets due to the addition of a multi-dimensional and multi-level attention mechanism. For example, DHAN surpasses DIN by a margin of 0.56% on dataset D1. Notably, a 0.1% improvement in AUC is significant for online model deployment to improve the actual CTR in production. Due to the excellent performance of recurrent modules in exploring user behavior sequences, DIEN[
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] outperforms DHAN on all datasets. However, it is worth noting that recurrent neural networks such as LSTM have slow training and prediction problems and are prone to high response times when serving online. By comparison, our StEN advances all of them to a new level: we achieve AUC = 0.7353, 0.7535 and 0.6627 on D1, D2 and D3, respectively, and our method is 0.96% higher than the current best result (DIEN) on dataset D3.
          </p>
          <p>At the same time, to investigate the generalization of our modules, we conducted variation experiments by adding StPre and StPro to the above baseline models. Note that the main difference among the three baselines is the attention module, so our StTA is not added, to avoid interference. It can be observed from Table 2 that when we directly adapt our two proposed activation modules to the three baselines, there is a stable improvement in performance. For example, DIN obtains a significant improvement of 0.27% on D1, 0.30% on D2 and 0.31% on D3, while DIEN shows a weaker improvement of 0.02% on D1, 0.06% on D2 and 0.4% on D3. All these variation models further demonstrate that our proposed modules have good generalizability and can be added to other existing models as plug-and-play modules.</p>
          <p>[Figure 3: (a) Eleme App homepage; (b) Eleme App recommendations page.]</p>
        </sec>
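        <p>The warm-up schedule described in the experimental settings (a learning rate of 0.001 grown to 0.015 within 1M steps) can be sketched as follows (an illustrative linear ramp; the paper does not state the exact ramp shape):</p>
```python
def warmup_lr(step, base_lr=0.001, peak_lr=0.015, warmup_steps=1_000_000):
    """Linearly ramp the learning rate from base_lr to peak_lr over
    warmup_steps, then hold it at peak_lr."""
    frac = min(step / warmup_steps, 1.0)
    return base_lr + frac * (peak_lr - base_lr)

print(warmup_lr(0))           # 0.001
print(warmup_lr(500_000))     # halfway up the ramp
print(warmup_lr(2_000_000))   # held at the peak rate
```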
      </sec>
      <sec id="sec-4-2">
        <title>4.4. Ablation Study</title>
        <p>To investigate the effectiveness of our proposed method, we conduct ablation studies in Table 3. Our BaseModel in this paper consists of the primitive target attention module mentioned in Section 3.4. As observed from Table 3, each module plays a different positive role after being added.</p>
        <p>We then show the effect of Spatiotemporal Profile Activation (StPro) by adding it to the BaseModel. As observed from Table 3, "w/ StPro" brings a relatively stable improvement. In particular, compared to the BaseModel, the offline AUC rises from 0.7332 to 0.7345 (+0.13%) and from 0.7414 to 0.7474 (+0.6%) when tested on D1 and D2, respectively. The results demonstrate that Spatiotemporal Profile Activation is an effective way to model the user's common spatiotemporal preference.</p>
        <p>Next, we validate the effectiveness of Spatiotemporal Preference Activation (StPre). As reported in Table 3, "w/ StPre" increases the results of the BaseModel by 0.17% and 1.07% on datasets D1 and D2, respectively. To see the effect of the three small modules (TEA, TPF and SPA) in StPre, we also performed ablation experiments in Table 3. We observe that SPA shows the best performance on dataset D1, while TEA achieves better performance on dataset D2. This illustrates that in different scenarios the user's spatiotemporal preferences have different emphases, and the specific focus needs to be determined per scenario.</p>
        <p>We also evaluate the effect of the Spatiotemporal-aware Target Attention (StTA) mechanism. In Table 3, we observe a significant improvement after adding StTA into the system. For example, "w/ StTA" achieves an offline AUC of 0.7350 on dataset D1, which is higher than the BaseModel by 0.18%. The improvement demonstrates that our proposed target attention mechanism meets the user's spatiotemporal demands better than the primitive target attention module, and injecting StTA into the model improves the effectiveness of the system in LBS. Furthermore, our StEN (StPre+StPro+StTA) consistently improves on the results of "w/ StPre", "w/ StPro" and "w/ StTA", because more appropriate spatiotemporal enhancement is conducted by integrating the three modules we propose in this paper.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.5. Online A/B Testing</title>
        <sec id="sec-4-3-1">
          <p>We have deployed our method on the Ele.me platform and
conducted an online A/B test for one month in November
2021 under a bucket test: one bucket serves the
BaseModel defined in Section 4.4, and the other
bucket serves our model StEN. Compared with the
online-serving BaseModel, our method increased the
one-hop CTR by 1.6%, the second-hop CTR by 2.4%, the
order volume by 2.1%, and the order UV by 2.4%. These
online gains are crucial for the
recommendation systems of Ele.me. On the one hand, an
efficient model improves user click efficiency. On the
other hand, the emphasis on spatiotemporal
characteristics also improves the user experience and
increases users’ stickiness to the platform.</p>
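The online lifts quoted above are relative improvements of the treatment bucket over the control bucket; a small sketch of how such a figure is read (the bucket counts below are invented for illustration, not Ele.me numbers):

```python
def relative_lift(treatment, control):
    """Relative improvement of treatment over control, in percent."""
    return (treatment - control) / control * 100

# Invented bucket counts (clicks / impressions), for illustration only.
ctr_base = 1200 / 60000   # control (BaseModel) CTR = 2.00%
ctr_sten = 1219 / 60000   # treatment (StEN) CTR ≈ 2.03%
print(round(relative_lift(ctr_sten, ctr_base), 2))  # → 1.58
```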
          <p>For better understanding, we also compare the
recommendation results of the online-serving model with our
StEN on the Ele.me platform, as shown in Figure 3. The
items (red box) on the left of Figure 3(a) and Figure 3(b)
are not suitable for afternoon tea, being more
appropriate for breakfast and staple food, respectively,
whereas our StEN (green box) recommends the sweetmeats and milk
tea that suit afternoon tea. Therefore, StEN
does a better job of capturing users’ strong spatial and
temporal demands and can improve the user experience.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <sec id="sec-5-1">
        <p>In this paper, we propose a novel spatiotemporal-enhanced
network, StEN. In particular, StEN applies a
StPro module to capture common spatiotemporal
preferences by activating attribute features. A StPre module is
further applied to model in detail the personalized spatiotemporal
preferences embodied in user behaviors.
Moreover, a StTA mechanism is adopted to generate different
parameters for target attention at different locations and
times, thereby improving the personalized
spatiotemporal awareness of the model. Comprehensive experiments
are conducted on three large-scale industrial datasets,
and the results demonstrate the state-of-the-art
performance of our methods.
</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>