<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An End-to-end Model of Predicting Diverse Ranking On Heterogeneous Feeds</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zizhe Gao</string-name>
          <email>zizhe.gzz@alibaba-inc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zheng Gao</string-name>
          <email>gao27@indiana.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Heng Huang</string-name>
          <email>gongchong.hh@taobao.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhuoren Jiang</string-name>
          <email>jiangzhr3@mail.sysu.edu.cn</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuliang Yan</string-name>
          <email>yuliang.yyl@alibaba-inc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alibaba Group</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Indiana University Bloomington</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Sun Yat-sen University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>As an external aid for online shopping, multimedia content (feeds) plays an important role in the e-Commerce field. Feeds in the form of posts, item lists and videos bring richer auxiliary information and more authentic assessments of commodities (items). At Alibaba, the largest Chinese online retailer, a content search engine (CSE) is used for feed recommendation alongside the traditional item search engine (ISE). However, the diversity of feed types makes it challenging for the CSE to rank heterogeneous feeds. In this paper, a two-step end-to-end model consisting of Heterogeneous Type Sorting and Homogeneous Feed Ranking is proposed to address this problem. For the first step, an independent Multi-Armed Bandit (iMAB) model is proposed first, and an improved personalized Markov Deep Neural Network (pMDNN) model is developed afterwards. For the second step, an existing Deep Structured Semantic Model (DSSM) is used for homogeneous feed ranking. An A/B test in the Alibaba production environment shows that, by considering user preference and feed type dependency, the pMDNN model significantly outperforms the iMAB model on the heterogeneous feed ranking problem.</p>
      </abstract>
      <kwd-group>
        <kwd>e-Commerce Search</kwd>
        <kwd>Heterogeneous Feed Ranking</kwd>
        <kwd>Multi-Armed Bandit</kwd>
        <kwd>Deep Neural Network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>SIGIR 2018 eCom, July 2018, Ann Arbor, Michigan, USA. © 2018 Copyright held by the owner/author(s).</p>
      <p>Search engines play a vital role in the e-Commerce industry because they steer users’ potential purchasing behavior. Designing an effective ranking algorithm is therefore a key challenge for every search engine. Traditionally, e-Commerce search engines are item search engines (ISE), meaning that the result returned for a given query is a ranked list of items. However, with the boom of diverse multimedia types, it is no longer enough to recommend items solely on direct item information such as reviews. With the emergence of self-media on the internet, more and more users are willing to share their opinions publicly, which lets them promote themselves while offering shopping tips to other users. The shared multimedia comes in various types, such as post (article), list and video, which provide more detailed information about items, such as item introductions, maintenance guidelines and usage demonstrations. In this paper, a piece of multimedia content is called a “feed”. As the external information offered in feeds can help users make better purchasing choices, the content search engine (CSE) was developed to recommend high-quality feeds to users. In recent years, jointly operating an ISE and a CSE in e-Commerce has been shown to attract more user attention and increase the click-through rate on item pages. One example is shown in Figure 1. A user can issue queries in both ISE and CSE to retrieve an item ranking list as well as a feed ranking list. As items and feeds can be mapped to each other based on feed content (for example, feed 1 and feed k are both related to item 1, and feed k also discusses item k), the user can take advice from feeds to make purchase decisions. Hence, improving the quality of heterogeneous feed ranking in CSE is of great value to e-Commerce.</p>
      <p>Currently, two challenges remain for heterogeneous feed ranking. First, cross-domain knowledge between ISE and CSE needs to be explored to support better CSE ranking performance. Second, as the content types in CSE are heterogeneous (post, list and video), novel algorithms need to be designed to solve the heterogeneous type sorting problem.</p>
      <p>
        Although heterogeneous feed ranking is a new topic, some previous studies have offered possible solutions to similar problems. To address the cross-domain challenge, the HEGS model proposed by [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] uses a two-step approach: it first samples data from all domains with a clustering algorithm constrained by KL divergence, and then uses a regression model to learn an optimized label for the input data. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] introduces a novel cross-domain ranking method that transfers the preference order between domains. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] applies matrix factorization to heterogeneous datasets and uses user reviews to generate an item ranking list for users. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] uses cross-domain collaborative filtering techniques to identify the most important heterogeneous data resource across all domains.
      </p>
      <p>
        The multi-armed bandit (MAB) framework has been widely used to deal with heterogeneous ranking problems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] develops a fast MAB algorithm that considers past user behaviors. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] introduces two online learning algorithms based on user click records to rank diverse documents on the web. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] applies an MAB model to user side information and develops an epoch-greedy model to recommend the most relevant ad on a web page. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] formalizes two MAB models for e-Commerce, an ε-greedy solution and an independent solution. In recent years, deep learning techniques have become popular in recommendation and information retrieval. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposes a content-based recommendation system that uses a rich feature set built from web browsing history and search queries to represent users, and maps users and items into a latent space via a deep learning approach. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] presents a deep model that learns item properties and user behaviors jointly from review text. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] develops an RNN model that predicts user shopping behavior from clickstream features.
      </p>
      <p>
        Alibaba, the largest Chinese online retailer, offers both ISE and CSE to users in its mobile application. Building on the theoretical and practical foundations mentioned above, this paper aims to solve the heterogeneous feed ranking problem in the Alibaba CSE and to recommend suitable feeds that support users’ purchasing choices. We divide the heterogeneous feed ranking problem in CSE into two phases: Heterogeneous Type Sorting and Homogeneous Feed Ranking. We mainly focus on the first phase, heterogeneous type sorting, and handle the second phase with an existing, well-known model, the Deep Structured Semantic Model (DSSM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In this paper, the ith “slot” is the ith position in the ranking list that holds a feed. Its type is determined in the first step. In the second step, feeds of the proper type are selected and filled into the corresponding slots. The two steps are trained together in the proposed end-to-end model. To solve the heterogeneous type sorting task, two novel models are proposed based on different assumptions about user preference and slot dependency. An independent Multi-Armed Bandit (iMAB) model ranks feeds under the assumption that slots are independent of each other, and produces a global model for feed ranking. A personalized Markov Deep Neural Network (pMDNN) model jointly selects feed types for all slots and offers a personalized feed type result for each user. The iMAB model relies solely on CSE records and generates the feed type for each slot independently through statistical estimation. The pMDNN model integrates both ISE and CSE historical records to build user profiles and uses a three-layer neural network to generate the feed types for all slots at the same time. The feed types generated by either model are then used by a Deep Structured Semantic Model (DSSM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to predict relevant feeds for each slot. Results of an A/B test in the Alibaba production environment show that, by considering user preference and feed type dependency, the pMDNN model outperforms the iMAB model for heterogeneous feed ranking in the CSE.
      </p>
      <p>The contribution of this paper is fourfold. First, we integrate cross-domain knowledge from both ISE and CSE to generate the feed ranking list in CSE. Second, an end-to-end model is designed to solve the heterogeneous feed ranking problem, which avoids introducing extra parameters during model training. Third, two novel models are designed and compared for the heterogeneous feed type selection problem; user preference and feed type dependency are both considered in the second model. Fourth, A/B tests of the two models are conducted in the Alibaba production environment to provide a convincing comparison between them.</p>
      <p>The paper is organized as follows: Section 2 introduces the basic concepts of our paper, including the Alibaba business background and the data used in our approach; Section 3 describes how our model is designed to deal with the heterogeneous feed ranking problem; Section 4 presents the experimental results; and Section 5 draws conclusions and points out future work.</p>
    </sec>
    <sec id="sec-2">
      <title>BASIC CONCEPTS</title>
    </sec>
    <sec id="sec-3">
      <title>Business Background</title>
      <p>In our line of business, Alibaba owns both a CSE (Content Search Engine) and an ISE (Item Search Engine), which interact closely with each other to create an online shopping environment for users. Every item in ISE is associated with a set of feeds in CSE, and users can move and search freely between the two search engines.</p>
      <p>Using Alibaba ISE and CSE together benefits users’ online shopping. Users who log in to Alibaba and interact with the search engines generally have some intention. However, if they only search for items in ISE, they may get lost when facing numerous items. For example, on the official Alibaba website, hot categories such as clothing may contain thousands of items, and each item is labelled with dozens of keywords such as fashionable style, slim cut and Korean-like style. This makes it hard for users to pick out appropriate items from a huge set of candidates without any guidance.</p>
      <p>Consequently, to help users avoid hesitation and to offer them reliable shopping suggestions, the CSE came into being as a shopping guide for users. Given user queries, the CSE returns a feed ranking list instead of an item ranking list. Feeds come in the formats of post (article), list (item list) and video. They are produced by “Daren”s, who are experts in a certain e-Commerce field (clothing, travelling, cosmetics, etc.). In their feeds, “Daren”s introduce the pros and cons of certain items and give personal advice on specific subjects based on their domain knowledge. A post feed is an article that describes the properties of particular items; a list feed is a set of recommended items for a specific field; a video feed is a short video that demonstrates the suggested items. By taking suggestions from “Daren”s, users can make better choices when purchasing items online.</p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Feed historical data in CSE top 5 slots</p>
        </caption>
        <table>
          <thead>
            <tr><th>Slot</th><th>post pv</th><th>post ipv</th><th>list pv</th><th>list ipv</th><th>video pv</th><th>video ipv</th></tr>
          </thead>
          <tbody>
            <tr><td>slot 0</td><td>1731732</td><td>230592</td><td>73540</td><td>5460</td><td>0</td><td>0</td></tr>
            <tr><td>slot 1</td><td>1704866</td><td>177854</td><td>49348</td><td>4288</td><td>0</td><td>0</td></tr>
            <tr><td>slot 2</td><td>785993</td><td>63730</td><td>696546</td><td>51032</td><td>234</td><td>0</td></tr>
            <tr><td>slot 3</td><td>949621</td><td>57600</td><td>693031</td><td>36384</td><td>0</td><td>0</td></tr>
            <tr><td>slot 4</td><td>755625</td><td>36886</td><td>825816</td><td>63276</td><td>0</td><td>0</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Daily data in the production environment shows that users move between the two search engines frequently. Before users enter CSE, most of them already have search records in ISE. This implies that users are actually willing to seek advice from “Daren”s. Offering better CSE search results can help users find suitable items more easily and make purchases afterwards, which is the primary goal of e-Commerce. However, challenges remain. First, as feed types are heterogeneous, the fitness of different feed types for a given query is not directly comparable. For example, whether a list feed is better than a post feed depends heavily on user preference. Second, the majority of users entering the Alibaba CSE bring with them behaviors from ISE. How to use this cross-domain information to build user profiles and form a personalized feed ranking in CSE also needs to be explored.</p>
    </sec>
    <sec id="sec-4">
      <title>Data Preparation</title>
      <p>In our approach, we aim to return a heterogeneous feed ranking list <inline-formula><tex-math><![CDATA[R_l(\mathit{feed}) \mid u, q]]></tex-math></inline-formula> given a query q issued by a user u. Each of the top K ranked feeds is displayed in a “slot” in CSE, ordered from top to bottom. To learn the independent Multi-Armed Bandit (iMAB) model and the personalized Markov Deep Neural Network (pMDNN) model, which are introduced in the next section, both slot-related statistical data (global information) and user click streaming data (personalized information) need to be obtained.</p>
      <p>2.2.1 Slot Related Statistical Data. One assumption about user preference is that the feed types in different slots are independent of each other and that, for each slot, the probability associated with each of the three candidate feed types (post, list, video) follows its own Beta distribution. Therefore, to estimate the prior distribution p(θ | α, β, T, s) of the ratio θ for a feed type, given all candidate types T in a slot s, the related statistical data of each slot must be known so that α and β can be estimated. The slot-related statistical data contains two parts: online real-time data and offline historical data. Online real-time data refers to the streaming data on the number of clicks and displays that users produce each day for a particular slot type. Offline historical data refers to the total number of page views (pv) and item page views (ipv) over the past N days. The online daily data is streaming data that can only be observed in real time, while the offline historical data can be retrieved from the repository. The statistics of the top 5 slots are shown in Table 1.</p>
      <p>As Table 1 shows, the total numbers of pv and ipv vary across slots and feed types, and video feeds hardly appear in the top 5 slots in CSE. This indicates that, from a global view, users prefer different feed types in different slots.</p>
      <table-wrap id="tab-2">
        <label>Table 2</label>
        <caption>
          <p>An example of user personalized data for each slot</p>
        </caption>
        <table>
          <thead>
            <tr><th>user_feature</th><th>query_feature</th><th>feed type</th></tr>
          </thead>
          <tbody>
            <tr><td>0.0073, 0.4694, ..., 0.0809</td><td>-0.0135, -0.0278, ..., 0.00613</td><td>100</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>2.2.2 User Click Streaming Data. User behavior sequence data from both ISE and CSE is also useful for training a personalized feed ranking. To build user profiles, we set a window size w and only consider the latest w behaviors each user takes in ISE. The behaviors are represented as two types of triplets, &lt;user, issue, query&gt; and &lt;user, click, item&gt;. The number of times a user clicks on an item shows the strength of the relationship between the user and the item, while the number of times a user issues the same query shows the strength of the relationship between the user and the query. Based on that, an embedding of a given dimension is later learned for each user and query in the same latent space. Moreover, the feed type in each slot is encoded as a one-hot vector. In the end, all users, queries and feed types are represented as vectors. An example is shown in Table 2. The first two columns are the learned representations of a user, f<sub>u</sub>, and of an issued query, f<sub>q</sub>. The third column is the one-hot representation of the feed type, f<sub>t</sub>, in each slot.</p>
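      <p>As an illustrative sketch of the profile construction described above, the latest w ISE behaviors of each user can be turned into weighted user-query and user-item relations, and a feed type into a one-hot vector. The function names, the sample log, the window size and the order of feed types in the one-hot vector are assumptions made for illustration, not the production pipeline.</p>
      <preformat><![CDATA[
```python
from collections import Counter, defaultdict

FEED_TYPES = ["post", "list", "video"]  # order assumed for the one-hot encoding


def build_edge_weights(behaviors, w):
    """Count relation strength over each user's latest w ISE behaviors.

    behaviors: chronologically ordered (user, action, target) triplets, where
    action is "issue" (target is a query) or "click" (target is an item).
    """
    per_user = defaultdict(list)
    for user, action, target in behaviors:
        per_user[user].append((action, target))

    weights = Counter()
    for user, events in per_user.items():
        for action, target in events[-w:]:  # keep only the latest w behaviors
            weights[(user, action, target)] += 1
    return weights


def one_hot(feed_type):
    """Encode a feed type as a one-hot list over FEED_TYPES."""
    return [1 if t == feed_type else 0 for t in FEED_TYPES]


# Hypothetical behavior log, oldest first.
log = [
    ("u1", "issue", "dress"),
    ("u1", "click", "item_7"),
    ("u1", "issue", "dress"),
    ("u1", "click", "item_7"),
    ("u2", "issue", "lipstick"),
]
weights = build_edge_weights(log, w=3)
```
]]></preformat>
      <p>With w = 3, the oldest behavior of u1 falls outside the window, so only one of its two “dress” queries contributes to the user-query weight.</p>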
    </sec>
    <sec id="sec-5">
      <title>METHODS</title>
      <p>We aim to produce a heterogeneous feed ranking that better matches users’ preferences. The whole process consists of a Heterogeneous Type Sorting step and a Homogeneous Feed Ranking step. For the first step, an independent Multi-Armed Bandit (iMAB) model is designed for the slot-independent scenario, and an improved personalized Markov Deep Neural Network (pMDNN) model is designed for the slot-dependent scenario. Sections 3.1 and 3.2 introduce the two models individually. For the second step, a DSSM model is used to assign feeds of the proper type to each slot; the details are introduced in Section 3.3. The pMDNN model can be trained together with DSSM to form an end-to-end model.</p>
      <p>3.1 Independent Multi-Armed Bandit. In the iMAB model, the evaluation metric of heterogeneous feed ranking is the ratio θ between ipv and pv. A higher θ means that when a user browses a feed in CSE, the user is more likely to click on it, so θ can be used to evaluate how well the heterogeneous feed ranking fits users’ real needs. Hence, for each independent slot, we estimate a prior distribution of the ratio θ for each feed type, and aim to choose the feed type that yields the highest θ value.</p>
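      <p>Because the Beta distribution is defined on the interval [0, 1] and its shape is controlled by two parameters α and β, it is reasonable to assume that the ratio θ of each type follows the prior distribution <inline-formula><tex-math><![CDATA[\theta_i \sim B(\alpha_i^0, \beta_i^0)]]></tex-math></inline-formula>, where <inline-formula><tex-math><![CDATA[i \in U = \{\mathit{post}, \mathit{list}, \mathit{video}\}]]></tex-math></inline-formula>. Here <inline-formula><tex-math><![CDATA[\alpha_i^0]]></tex-math></inline-formula> is the historical ipv count of type i and <inline-formula><tex-math><![CDATA[\beta_i^0]]></tex-math></inline-formula> is the difference between the historical pv count and the historical ipv count of type i. This choice is made because the expectation of <inline-formula><tex-math><![CDATA[B(\alpha_i^0, \beta_i^0)]]></tex-math></inline-formula> is <inline-formula><tex-math><![CDATA[\frac{\alpha_i^0}{\alpha_i^0 + \beta_i^0}]]></tex-math></inline-formula>, which is the historical ratio between ipv and pv. The posterior ratio distribution can then be updated with the online real-time stream data each day, and is represented as <inline-formula><tex-math><![CDATA[\theta_i \mid D_i \sim B(\alpha_i^0 + \lambda D_i^{ipv},\ \beta_i^0 + \lambda (D_i^{pv} - D_i^{ipv}))]]></tex-math></inline-formula>, where D<sub>i</sub> refers to the incoming data of feed type i each day and λ is a time impact factor, as new data should have more influence on the updated ratio distribution.</p>
      <p>Algorithm 1: independent Multi-Armed Bandit (closing steps)</p>
      <preformat>        p(type) = exp(θ_type) / Σ_{j ∈ types} exp(θ_j)
        draw type based on p(type)
        set type in slot
    end for</preformat>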
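      <p>The prior and posterior parameters described above can be sketched in a few lines. The historical counts are the slot 0 “post” figures from Table 1 and λ = 10 is the value reported in Section 4.1; the daily counts and the function names are made up for illustration.</p>
      <preformat><![CDATA[
```python
def beta_posterior(hist_pv, hist_ipv, day_pv, day_ipv, lam=10.0):
    """Return the posterior Beta parameters for one feed type in one slot."""
    alpha0 = hist_ipv                # prior alpha: historical ipv count
    beta0 = hist_pv - hist_ipv       # prior beta: historical pv minus ipv
    alpha = alpha0 + lam * day_ipv   # daily clicks, weighted by lambda
    beta = beta0 + lam * (day_pv - day_ipv)  # daily non-clicks, weighted
    return alpha, beta


def beta_mean(alpha, beta):
    """Expectation of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Slot 0, type "post" (Table 1); the daily counts below are hypothetical.
hist_pv, hist_ipv = 1731732, 230592
alpha, beta = beta_posterior(hist_pv, hist_ipv, day_pv=1000, day_ipv=200)

prior_mean = beta_mean(hist_ipv, hist_pv - hist_ipv)  # historical ipv / pv
post_mean = beta_mean(alpha, beta)
```
]]></preformat>
      <p>Because the hypothetical day has a click ratio of 0.2, which is above the historical ratio, the posterior mean moves slightly upward from the prior mean.</p>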
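      <p>In the end, we apply a two-step sampling strategy to choose the type of each slot. First, for each feed type i, a value θ<sub>i</sub> is randomly drawn as the estimate of the ratio between ipv and pv, following the probability distribution below:
        <disp-formula id="eq-1"><label>(1)</label><tex-math><![CDATA[p(\theta_i \mid D_i) = \frac{\theta_i^{\alpha_i' - 1} (1 - \theta_i)^{\beta_i' - 1}}{B(\alpha_i', \beta_i')}]]></tex-math></disp-formula>
        where <inline-formula><tex-math><![CDATA[B(\alpha_i', \beta_i')]]></tex-math></inline-formula> is a constant given <inline-formula><tex-math><![CDATA[\alpha_i']]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[\beta_i']]></tex-math></inline-formula>, with <inline-formula><tex-math><![CDATA[\alpha_i' = \alpha_i^0 + \lambda D_i^{ipv}]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[\beta_i' = \beta_i^0 + \lambda (D_i^{pv} - D_i^{ipv})]]></tex-math></inline-formula>.</p>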
      <p>Second, a Softmax function is applied over all feed types to generate a normalized selection probability for each feed type:
        <disp-formula id="eq-2"><label>(2)</label><tex-math><![CDATA[p(i) = \frac{\exp(\theta_i)}{\sum_{j \in U} \exp(\theta_j)}]]></tex-math></disp-formula>
        where i refers to one of the three feed types and θ<sub>i</sub> is the random value drawn from the posterior probability distribution shown in Formula 1.</p>
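      <p>The two-step sampling can be sketched as follows, using only the Python standard library. The Beta parameters for “post” and “list” are taken from the slot 0 counts in Table 1 without any daily update; the (1, 1) pseudo-count for “video”, whose counts are zero, is an assumption added so that the Beta draw is defined, and the function name is hypothetical.</p>
      <preformat><![CDATA[
```python
import math
import random


def choose_feed_type(posteriors, rng):
    """Pick a feed type for one slot.

    posteriors: dict mapping feed type to its Beta parameters (alpha, beta).
    """
    # Step 1: draw one theta per feed type from its Beta distribution.
    thetas = {t: rng.betavariate(a, b) for t, (a, b) in posteriors.items()}
    # Step 2: softmax over the sampled thetas.
    z = sum(math.exp(v) for v in thetas.values())
    probs = {t: math.exp(v) / z for t, v in thetas.items()}
    # Draw the slot type from the normalized probabilities.
    types = list(probs)
    chosen = rng.choices(types, weights=[probs[t] for t in types], k=1)[0]
    return chosen, probs


posteriors = {
    "post": (230592, 1731732 - 230592),  # slot 0 ipv, pv - ipv (Table 1)
    "list": (5460, 73540 - 5460),
    "video": (1, 1),                     # assumed pseudo-count, not from the paper
}
chosen, probs = choose_feed_type(posteriors, random.Random(0))
```
]]></preformat>
      <p>Since every sampled θ lies in [0, 1], the softmax probabilities of any two types differ by a factor of at most e, so every feed type keeps a noticeable chance of being selected.</p>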
      <p>In this way, feed types in all slots are selected independently. The pseudo code of the whole procedure is shown in Algorithm 1.</p>
      <p>3.2 Personalized Markov Deep Neural Network. Dependent heterogeneous feed type selection is determined by three factors: the user, the query and the feed types of previous slots on the same page. First, different users may have different preferences for items under the same query. For example, when one user searches for “dress”, she may want to see more posts describing the dress, while other users might prefer lists because they want to see more item choices rather than the introduction of a single item. Second, user preference for the feed type in the current slot may be influenced by the previous feed types, which can be regarded as a Markov process. For example, users rarely want to see a single type of feed in all slots; they generally expect some diversity of feed types. Third, different queries should also result in different feed type allocations across slots. To integrate user preference, query and previously recommended feed types, we propose a personalized Markov Deep Neural Network (pMDNN) model to generate the recommended feed type <inline-formula><tex-math><![CDATA[t_i \mid (\mathit{user}, \mathit{query}, t_1, \ldots, t_{i-1})]]></tex-math></inline-formula> for the ith slot. The whole model can be decomposed into two sub-tasks, a user &amp; query representation learning task and a personalized slot type prediction task, as illustrated in Figure 2.</p>
      <p>
        3.2.1 Representation of User and Query. According to our statistics, more than 80 percent of users in CSE come from ISE. Hence, using their user behavior sequence data, we can construct a graph that describes the relationships between users, queries and items. After that, node2vec [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] applies a skip-gram model to learn the embeddings of users and queries. The detailed pipeline is shown in the upper part of Figure 2 and the objective function is listed below:
        <disp-formula id="eq-3"><label>(3)</label><tex-math><![CDATA[O(\vec{f}_v) = \log \sigma(\vec{f}_t \cdot \vec{f}_v) + k\, \mathbb{E}_{u \in P_{noise}} \left[ -\log \sigma(\vec{f}_u \cdot \vec{f}_v) \right] ]]></tex-math></disp-formula>
        where <inline-formula><tex-math><![CDATA[\vec{f}_v]]></tex-math></inline-formula> is the embedding of the current node v, t is a positive neighbour node of v, and u is a negatively sampled node for v. That is, given a node v, we learn a node embedding that maximizes the probability of generating its positive neighbour node t and minimizes the probability of generating the negative nodes sampled from P<sub>noise</sub>.
      </p>
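      <p>A small numerical sketch of Equation 3 for a single node pair is given below. It evaluates the objective as written above, approximating the expectation over the noise distribution by the mean over a handful of sampled negative embeddings; the toy vectors and function names are assumptions. Common word2vec-style implementations use log σ(−f<sub>u</sub> · f<sub>v</sub>) for the negative term instead, which pushes the embeddings in the same direction.</p>
      <preformat><![CDATA[
```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def objective(f_v, f_t, negatives, k):
    """Equation 3 for one (v, t) pair.

    negatives: embeddings of nodes sampled from the noise distribution; their
    mean stands in for the expectation over P_noise.
    """
    positive = np.log(sigmoid(f_t @ f_v))
    negative = np.mean([-np.log(sigmoid(f_u @ f_v)) for f_u in negatives])
    return float(positive + k * negative)


f_v = np.array([1.0, 0.0])   # current node
f_t = np.array([2.0, 0.0])   # positive neighbour, aligned with f_v

# A negative sample pointing away from f_v versus one aligned with it.
separated = objective(f_v, f_t, [np.array([-2.0, 0.0])], k=1)
aligned = objective(f_v, f_t, [np.array([2.0, 0.0])], k=1)
```
]]></preformat>
      <p>The objective is larger when the negative sample points away from the current node, which is the behaviour the training procedure is meant to reward.</p>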
      <p>The middle part of Figure 2 shows how the node embeddings are trained. The input layer is the one-hot encoding of a node. The weight matrix W holds the embeddings of all nodes and projects the one-hot input into a |D|-dimensional latent space. Training then maximizes the probability of generating the neighbour nodes of the input node.</p>
      <p>In the end, all users and queries have embedding representations of a given dimension, and we use the user &amp; query embeddings as the input for slot feed type prediction.</p>
      <p>3.2.2 Type prediction. We aim to predict the feed type in each slot given the user, the query and the feed types of previous slots. Hence, our objective is:
        <disp-formula id="eq-4"><label>(4)</label><tex-math><![CDATA[\operatorname*{arg\,max}_{\Phi} \prod_{i=1}^{K} p(\Phi(X_i) = c \mid u_i, q_i, f_i)]]></tex-math></disp-formula>
        where X<sub>i</sub> is the input feature vector for the ith slot, which is built from the user u<sub>i</sub>, the query q<sub>i</sub> and the previous slot feed types f<sub>i</sub>. Φ is the function that transforms input feature vectors into the output feed type, and c is the true feed type of the current slot. Our goal is to maximize the joint probability of correctly predicting the slot feed types.</p>
      <p>To simplify the pMDNN model and speed it up, only a first-order Markov process over slot feed types is used. That is, to predict the feed type of the ith slot, only the feed type of the (i − 1)th slot is taken into account. This raises a problem when predicting the feed type of the first slot for a user u, because there is no previous slot. To generate pseudo information for the first slot, the favorite item of user u in ISE is identified according to the number of views and the dwell time. We then map this item in ISE to its related feed f in CSE and use the type of f as a substitute.</p>
      <p>We build the pMDNN model to recommend the feed type given the embeddings of the user and the query as well as the previous slot types. The input layer is the concatenation of the user embedding (U), the query embedding (Q) and the previous slot types (T). User and query embeddings are learned via node2vec on the constructed graph. The construction of the input layer can be written as:
        <disp-formula id="eq-5"><label>(5)</label><tex-math><![CDATA[X = U \oplus Q \oplus T]]></tex-math></disp-formula></p>
      <p>After that, three fully connected hidden layers are attached to the input layer. Each layer is a linear transformation, and cross entropy is used as the loss function. The activation function of each hidden layer is ReLU, and the output layer applies Softmax, so the output is a vector containing a probability distribution over the three feed types for the specific slot. The model is trained with gradient descent and back propagation until convergence.</p>
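      <p>The layer stack described above can be sketched as a forward pass in NumPy. The 128-dimensional embeddings follow Section 4.1; the hidden width of 64, the random weights and inputs, the order of feed types and the greedy argmax decoding across slots are assumptions made for illustration, and the sketch omits training.</p>
      <preformat><![CDATA[
```python
import numpy as np

FEED_TYPES = ["post", "list", "video"]  # assumed order of the one-hot encoding


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()


def pmdnn_forward(u, q, t_prev, weights):
    """Return a probability distribution over feed types for one slot."""
    h = np.concatenate([u, q, t_prev])  # input layer: user, query, previous type
    for w in weights[:-1]:
        h = relu(w @ h)                 # three ReLU hidden layers
    return softmax(weights[-1] @ h)     # Softmax output layer


def predict_page(u, q, first_prev, weights, k):
    """Decode k slots in order, feeding each predicted type to the next slot."""
    prev, page = first_prev, []
    for _ in range(k):
        idx = int(np.argmax(pmdnn_forward(u, q, prev, weights)))
        page.append(FEED_TYPES[idx])
        prev = np.eye(len(FEED_TYPES))[idx]  # one-hot of the predicted type
    return page


rng = np.random.default_rng(0)
dim, hidden = 128, 64
sizes = [2 * dim + len(FEED_TYPES), hidden, hidden, hidden, len(FEED_TYPES)]
weights = [rng.normal(0.0, 0.1, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

u, q = rng.normal(size=dim), rng.normal(size=dim)
first_prev = np.array([1.0, 0.0, 0.0])  # pseudo type for the first slot
probs = pmdnn_forward(u, q, first_prev, weights)
page = predict_page(u, q, first_prev, weights, k=5)
```
]]></preformat>
      <p>The first-slot input plays the role of the pseudo type derived from the user's favorite item in ISE; each later slot only sees the type predicted for the slot directly before it, matching the first-order Markov simplification.</p>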
      <p>
        <disp-formula id="eq-6"><label>(6)</label><tex-math><![CDATA[\begin{aligned} L_1 &= \mathrm{ReLU}(w_0 \cdot X) \\ L_2 &= \mathrm{ReLU}(w_1 \cdot L_1) \\ L_3 &= \mathrm{ReLU}(w_2 \cdot L_2) \\ L &= \mathrm{Softmax}(w_3 \cdot L_3) \end{aligned}]]></tex-math></disp-formula>
        Here L<sub>1</sub>, L<sub>2</sub> and L<sub>3</sub> are the three hidden layers and L is the output layer, whose prediction is compared with the true feed type label of the current slot. The pMDNN model is trained in an offline phase, and the trained model is then used to serve real-time user requests. The first part of Figure 2 illustrates this workflow.
      </p>
      <p>
        The next step is to rank homogeneous feeds and fill them into the corresponding slots. For example, if slot<sub>i</sub>, slot<sub>j</sub> and slot<sub>k</sub> are chosen to have the “post” feed type, we need to rank all post feeds and select the top 3 feeds with the highest relevance scores for the issued query. As all types of feeds are associated with textual information such as titles, an existing Deep Structured Semantic Model (DSSM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is applied to rank all post feeds to fill the three slots.
      </p>
      <p>In DSSM, instead of encoding each word with a one-hot representation, a word hashing method decomposes each word into letter n-grams, which reduces the dimensionality of the word representation.</p>
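      <p>Word hashing can be illustrated with letter trigrams, the setting used in the original DSSM work: a word is wrapped in boundary marks and split into overlapping letter n-grams, and a text is represented by its n-gram counts. The function names and the whitespace tokenization below are simplifying assumptions.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def letter_ngrams(word, n=3):
    """Split a word into letter n-grams after adding boundary marks."""
    marked = "#" + word + "#"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


def hash_text(text, n=3):
    """Bag of letter n-grams for a whitespace-tokenized text."""
    counts = Counter()
    for word in text.lower().split():
        counts.update(letter_ngrams(word, n))
    return counts


trigrams = letter_ngrams("good")
title_vector = hash_text("good dress good price")
```
]]></preformat>
      <p>The vocabulary of letter trigrams is far smaller than a word vocabulary, which is where the dimension reduction comes from; unseen words still map onto known trigrams.</p>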
      <p>Afterwards, a Deep Neural Network (DNN) model takes the query and the feeds as its input layer, and trains the model parameters by maximizing the likelihood of the clicked documents given the queries across the training set. Equivalently, the model minimizes the following loss function:</p>
      <p>
        <disp-formula id="eq-7"><label>(7)</label><tex-math><![CDATA[L(\Lambda) = -\log \prod_{(Q, D^+)} p(D^+ \mid Q)]]></tex-math></disp-formula>
        where Λ denotes the parameter set of the neural networks, D<sup>+</sup> is the true labelled feed and Q is the user-issued query. The model is trained using gradient-based numerical optimization algorithms.</p>
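      <p>The loss in Equation 7 can be sketched numerically as follows. In the original DSSM formulation, p(D<sup>+</sup> | Q) is a softmax over smoothed cosine similarities between the query vector and the candidate document vectors; the smoothing factor value, the toy two-dimensional vectors and the function names are assumptions, and the vectors stand in for the outputs of the trained network.</p>
      <preformat><![CDATA[
```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def click_probability(q, clicked, unclicked, gamma=10.0):
    """Softmax over smoothed cosine similarities; index 0 is the clicked feed."""
    sims = np.array([cosine(q, d) for d in [clicked] + list(unclicked)])
    logits = gamma * sims              # gamma is an assumed smoothing factor
    e = np.exp(logits - logits.max())  # shift for numerical stability
    return float(e[0] / e.sum())


def dssm_loss(pairs, gamma=10.0):
    """Negative log of the product of p(D+ | Q), written as a sum of logs."""
    return float(-sum(np.log(click_probability(q, pos, negs, gamma))
                      for q, pos, negs in pairs))


q = np.array([1.0, 0.0])
near, far = np.array([1.0, 0.0]), np.array([0.0, 1.0])

good_loss = dssm_loss([(q, near, [far])])  # clicked feed matches the query
bad_loss = dssm_loss([(q, far, [near])])   # clicked feed does not match
```
]]></preformat>
      <p>The loss is small when the clicked feed is the candidate most similar to the query and grows when an unclicked feed is more similar, which is what gradient-based training exploits.</p>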
      <p>In the end, given a query, all candidate feeds can be ranked by the probability computed by this model. DSSM can be trained together with pMDNN to form an end-to-end model, as shown in Figure 2, whereas it still needs to be trained separately from the iMAB model.</p>
    </sec>
    <sec id="sec-6">
      <title>EXPERIMENT</title>
      <p>We conduct our experiments on the item search engine (ISE) and content search engine (CSE) of the Alibaba production environment, with the data introduced in Section 2.2, which is randomly split into 80% for training and 20% for testing. User behavior sequences collected from Alibaba logs over N = 90 days are used to construct a behavior graph, from which the embeddings of users and queries are learned.</p>
    </sec>
    <sec id="sec-7">
      <title>Model Setup</title>
      <p>
        For the iMAB model, in the online part, we implement a real-time Flink [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] job that parses user behavior logs and extracts a series of statuses that represent whether a user clicked or browsed the feeds displayed in different slots. These user behaviors are then synchronized to the iMAB model as online rewards. As the volume of Alibaba user behavior logs is huge, to keep the latency of the Flink job low we assign 256 workers to parsing and joining and 64 workers to aggregating. As expected, online rewards are transferred to the iMAB model in under 3 seconds, which makes it possible to select arms from a probability distribution based on the latest user behaviors. In the offline part, more than 100 million ipv and pv records are aggregated to estimate the Beta distribution. Based on empirical study, we set λ = 10 as the time impact factor in the iMAB model.
      </p>
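      <p>Besides, the pMDNN model needs a training phase, in which the loss function, optimizer and parameters are set as follows:</p>
      <list list-type="bullet">
        <list-item><p>User and query representation: 128-dimensional graph embedding</p></list-item>
        <list-item><p>Feed type representation: one-hot vector</p></list-item>
        <list-item><p>Activation function: ReLU and Softmax</p></list-item>
        <list-item><p>Loss function: cross entropy</p></list-item>
        <list-item><p>Optimizer: gradient descent optimizer</p></list-item>
        <list-item><p>Learning rate: 0.0001</p></list-item>
        <list-item><p>Epochs: 100,000</p></list-item>
      </list>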
      <p>The trained pMDNN model, exported in the saved_model format, is served in CSE. It receives real-time online requests that contain the user, the query and the preceding feed type, converts them into embedding vectors, and predicts the slot types in order. The default settings of the original DSSM are applied in our model.</p>
      <p>4.2 A/B Test. We deploy our proposed models to three buckets, each of which handles an equal share of user requests assigned by a hash partition function. We select 5 major indices to compare the performance of iMAB and pMDNN. pv stands for the number of displayed items, while pv click is the number of displayed items that are clicked. Similarly, uv is the total number of distinct users who entered CSE and uv click is the number of users who clicked feeds. uv CTR is the ratio of users who clicked to users who entered CSE.</p>
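      <p>The bucket assignment and the five indices can be sketched as below. The use of CRC32 as the hash partition function, the shape of the impression log and the reading of uv CTR as uv click divided by uv are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
import zlib


def bucket_of(user_id, n_buckets=3):
    """Stable hash partition of users into buckets (CRC32 is an assumption)."""
    return zlib.crc32(user_id.encode("utf-8")) % n_buckets


def summarize(impressions):
    """Compute the five indices from (user_id, clicked) pairs, one per display."""
    pv = pv_click = 0
    users, clickers = set(), set()
    for user_id, clicked in impressions:
        pv += 1
        users.add(user_id)
        if clicked:
            pv_click += 1
            clickers.add(user_id)
    uv, uv_click = len(users), len(clickers)
    return {
        "pv": pv,
        "pv_click": pv_click,
        "uv": uv,
        "uv_click": uv_click,
        "uv_ctr": uv_click / uv if uv else 0.0,
    }


# Hypothetical impression log for one bucket.
log = [("u1", True), ("u1", False), ("u2", False), ("u3", True), ("u3", True)]
stats = summarize(log)
bucket = bucket_of("u1")
```
]]></preformat>
      <p>Hashing on the user identifier keeps every request of a given user in the same bucket, so each user only ever sees one of the compared ranking methods.</p>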
      <p>Table 3 shows the experimental results, in which pMDNN
generally outperforms iMAB when both are compared with the primitive offline ranking
method. uv click and uv CTR are especially important in our
scenario: the increase in uv click shows that more users
tend to use CSE to facilitate their shopping experience, while
the boost in uv CTR shows that users who entered CSE are genuinely
interested in the model's ranking results. pv click also shows
that our proposed models work well, since more users are willing to
click feeds after issuing queries.</p>
      <p>Based on pv click and uv CTR, we can conclude that pMDNN
is superior to iMAB because it applies cross-domain knowledge and
optimizes the ranking results over the whole page. In addition, incorporating user
preference information can increase the probability that a user
clicks, as shown by uv click.</p>
    </sec>
    <sec id="sec-8">
      <title>5 CONCLUSION</title>
      <p>To facilitate user purchasing behavior, the content search engine (CSE)
has emerged as a supplement to the item search engine (ISE). Item
introductions, shopping guides and expert advice in post, list and video
form can be offered in CSE, which makes it critical to users’
shopping choices, especially when they face hundreds of item candidates.
Providing a diverse and personalized feed ranking result can help
users with item selection.</p>
      <p>In this paper, we presented an end-to-end model for predicting
diverse ranking on heterogeneous feeds. It is a two-step approach:
Heterogeneous Type Sorting followed by Homogeneous Feed Ranking. In
the first step, two models, independent Multi-Armed Bandit (iMAB)
and personalized Markov Deep Neural Network (pMDNN), are
proposed to tackle heterogeneous data sorting. As an online learning
algorithm, iMAB combines historical statistics with online rewards;
it converges quickly in each slot but fails to optimize the
whole-page result. Consequently, we put forward pMDNN, which is based
on ISE-to-CSE cross-domain knowledge and formalized as a
one-order Markov process. It not only provides user-preferred feeds
on each slot but also fixes the problem of slot independence. An existing
DSSM model then leverages deep learning techniques to rank same-type
feeds. An A/B test in the Alibaba production environment
shows that pMDNN outperforms iMAB on most of the well-known
metrics used in the e-commerce field. Future work will incorporate
more cross-domain knowledge, such as purchasing intention, into the
ranking result, as well as further analysis of user sequential data in
ISE.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Róbert</given-names>
            <surname>Busa-Fekete</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eyke</given-names>
            <surname>Hüllermeier</surname>
          </string-name>
          .
          <article-title>A survey of preference-based online learning with bandit algorithms</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and
          <string-name>
            <given-names>Kostas</given-names>
            <surname>Tzoumas</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Apache flink: Stream and batch processing in a single engine</article-title>
          .
          <source>Bulletin of the IEEE Computer Society Technical Committee on Data Engineering</source>
          <volume>36</volume>
          ,
          <issue>4</issue>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Ali Mamdouh</given-names>
            <surname>Elkahky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yang</given-names>
            <surname>Song</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiaodong</given-names>
            <surname>He</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A multi-view deep learning approach for cross domain user modeling in recommendation systems</article-title>
          .
          <source>In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee</source>
          ,
          <fpage>278</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Grover</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>node2vec: Scalable feature learning for networks</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM</source>
          ,
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Po-Sen</given-names>
            <surname>Huang</surname>
          </string-name>
          , Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and
          <string-name>
            <given-names>Larry</given-names>
            <surname>Heck</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Learning deep structured semantic models for web search using clickthrough data</article-title>
          .
          <source>In Proceedings of the 22nd ACM international conference on Conference on information &amp; knowledge management. ACM</source>
          ,
          <fpage>2333</fpage>
          -
          <lpage>2338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Mohsen</given-names>
            <surname>Jamali</surname>
          </string-name>
          and
          <string-name>
            <given-names>Laks</given-names>
            <surname>Lakshmanan</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>HeteroMF: recommendation in heterogeneous information networks using context dependent factor models</article-title>
          .
          <source>In Proceedings of the 22nd international conference on World Wide Web. ACM</source>
          ,
          <fpage>643</fpage>
          -
          <lpage>654</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Pushmeet</given-names>
            <surname>Kohli</surname>
          </string-name>
          , Mahyar Salek, and
          <string-name>
            <given-names>Greg</given-names>
            <surname>Stoddard</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A Fast Bandit Algorithm for Recommendation to Users With Heterogenous Tastes</article-title>
          .
          <source>In AAAI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>John</given-names>
            <surname>Langford</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tong</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>The epoch-greedy algorithm for multiarmed bandits with side information</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
          <fpage>817</fpage>
          -
          <lpage>824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Louëdec</surname>
          </string-name>
          , Max Chevalier, Josiane Mothe, Aurélien Garivier, and
          <string-name>
            <given-names>Sébastien</given-names>
            <surname>Gerchinovitz</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A Multiple-Play Bandit Algorithm Applied to Recommender Systems</article-title>
          .
          <source>In FLAIRS Conference</source>
          .
          <fpage>67</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Filip</given-names>
            <surname>Radlinski</surname>
          </string-name>
          , Robert Kleinberg, and
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Learning diverse rankings with multi-armed bandits</article-title>
          .
          <source>In Proceedings of the 25th international conference on Machine learning. ACM</source>
          ,
          <fpage>784</fpage>
          -
          <lpage>791</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Shaghayegh</given-names>
            <surname>Sahebi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Brusilovsky</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>It takes two to tango: An exploration of domain pairs for cross-domain collaborative filtering</article-title>
          .
          <source>In Proceedings of the 9th ACM Conference on Recommender Systems. ACM</source>
          ,
          <fpage>131</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Xiaoxiao</given-names>
            <surname>Shi</surname>
          </string-name>
          , Qi Liu, Wei Fan,
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Philip S</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Predictive modeling with heterogeneous sources</article-title>
          .
          <source>In Proceedings of the 2010 SIAM International Conference on Data Mining. SIAM</source>
          ,
          <fpage>814</fpage>
          -
          <lpage>825</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Arthur</given-names>
            <surname>Toth</surname>
          </string-name>
          , Louis Tan, Giuseppe Di Fabbrizio, and
          <string-name>
            <given-names>Ankur</given-names>
            <surname>Datta</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Predicting Shopping Behavior with Mixture of RNNs</article-title>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Bo</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jie</given-names>
            <surname>Tang</surname>
          </string-name>
          , Wei Fan, Songcan Chen,
          <string-name>
            <given-names>Zi</given-names>
            <surname>Yang</surname>
          </string-name>
          , and Yanzhu Liu.
          <year>2009</year>
          .
          <article-title>Heterogeneous cross domain ranking in latent space</article-title>
          .
          <source>In Proceedings of the 18th ACM conference on Information and knowledge management. ACM</source>
          ,
          <fpage>987</fpage>
          -
          <lpage>996</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Lei</given-names>
            <surname>Zheng</surname>
          </string-name>
          , Vahid Noroozi, and
          <string-name>
            <given-names>Philip S</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Joint deep modeling of users and items using reviews for recommendation</article-title>
          .
          <source>In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM</source>
          ,
          <fpage>425</fpage>
          -
          <lpage>434</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>