<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Heterogeneous Conversational Recommender System for Financial Products</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mao Kang</string-name>
          <email>kangmao028@pingan.com.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ye Bi</string-name>
          <email>biye645@pingan.com.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhenyu Wu</string-name>
          <email>wuzhenyu447@pingan.com.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jianming Wang</string-name>
          <email>wangjianming888@pingan.com.cn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jing Xiao</string-name>
          <email>xiaojing661@pingan.com.cn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Ping An Technology (Shenzhen) Co., Ltd</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Ping An Technology (Shenzhen) Co., Ltd</institution>
          ,
          <addr-line>Shenzhen</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <author-notes>
        <fn>
          <p>ACM Reference Format: Mao Kang, Ye Bi, Zhenyu Wu, Jianming Wang, and Jing Xiao. 2020. A Heterogeneous Conversational Recommender System for Financial Products.</p>
        </fn>
      </author-notes>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Financial products recommendation differs from e-commerce and web recommendation: financial products have fewer available items, are more expensive, are purchased less frequently and are subject to user-specific constraints. Research on financial products recommendation is limited, and current industry applications still focus on exploiting machine learning techniques. Behavioral Finance theory states that financial decisions are affected by psychological behavior biases, which are generally identified through conversations with professional advisors. Moreover, in a conversation customers actively express subjective requirements and interests that cannot be inferred from their static structured data. Inspired by this, we propose a heterogeneous conversational recommender system (HConvoNet) that considers not only the customer’s static profile but also implicit behavior biases and interests, and is therefore adaptive to the customer. The proposed framework consists of two modules: a profile module and a conversation module. The profile module aims to capture the customer’s important static needs, while the conversation module aims to extract behavior biases and dynamic interests. By integrating the two modules, HConvoNet can recommend financial products in an adaptive way. Experiments are conducted on three internal datasets from Ping An Insurance, with the goal of predicting the customer’s purchase intention. We compare our model with several baselines and find that the proposed model achieves a significant improvement.</p>
      </abstract>
      <kwd-group>
        <kwd>Conversational Recommender System</kwd>
        <kwd>Financial Products Recommendation</kwd>
        <kwd>Heterogeneous Modelling</kwd>
        <kwd>Deep Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Recommender systems; •
Computing methodologies → Information extraction;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>Recommender systems are extensively used in various areas. Most
research focuses on collaborative-filtering and content-based
filtering. Collaborative-filtering assumes that users who agreed in the
past will like similar items. Content-based filtering recommends
items similar to those the user liked in the past. E-commerce
companies like Amazon, eBay and Alibaba use well-developed
collaborative-filtering algorithms to recommend products. Video
and music websites like YouTube and Spotify use content-based
filtering to recommend playlists.</p>
      <p>In recent years, research has extended to recommending
financial products and insurance (we refer to them collectively as
“financial products”). Financial products recommendation is quite
different from the recommendation scenarios mentioned above. E-commerce
companies usually have large amounts of data and frequent user
actions, whereas financial products have fewer available items and
are not frequently purchased. Besides, they are usually more
expensive and subject to user-specific constraints. A knowledge-based
recommender system is a specific type of recommender system
that uses a knowledge base and the user profile to make
personalized recommendations. It is typically applied in domains where
collaborative-filtering and content-based filtering cannot be applied,
such as financial products recommendation. Most current
studies on this topic remain within the scope of constraint-based or
case-based reasoning. In practice, building a knowledge base is
complicated and costly, so practical applications still rely on
exploiting customer profiles with machine learning techniques such as
Random Forest and Generalized Linear Models, which are valued for
their simplicity, robustness and explainability.</p>
      <p>
        Recommending financial products requires a thorough
understanding of the financial decision-making process. Behavioral Finance [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
studies the psychology of the financial decision-making process. It
states that market participants are not rational and are subject to
multiple behavior biases, which in turn affect decision-making.
Some typical biases are overconfidence bias, herding bias
and status quo bias. Overconfidence bias occurs when market
participants overestimate their intuitive ability and underestimate risk.
Herding bias occurs when individuals follow the crowd’s decision. Status
quo bias refers to the tendency to stay in the current state and an
unwillingness to make changes.
      </p>
      <p>Financial advisors usually identify behavior biases from the
statements and questions of customers during a conversation. The
most suitable suggestions are then given by taking these behavior biases
into account. Besides, we believe more dynamic interests and
subjective requirements can be observed in a conversation. Inspired
by that, in this paper we propose a heterogeneous conversational
recommender system (HConvoNet), which integrates unstructured
conversation data with the structured profile and makes more adaptive
recommendations. In brief, our proposed framework consists of two
modules: a customer profile module and a conversation module. The
profile module aims to capture the customer’s important static needs,
while the conversation module aims to extract behavior biases and
dynamic interests. This is feasible because most companies have stored
large amounts of conversation data from routine business activities such as
telemarketing.</p>
      <p>
        We model the structured profile data with a deep model, adopting the
DeepFM framework [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To capture the information embedded in
a conversation comprehensively, we build the architecture using
a two-level bidirectional Gated Recurrent Unit (GRU) with a
self-attention mechanism. The lower level encodes each single utterance
and the upper level encodes the whole conversation, taking into account
contextual interactions among utterances.
      </p>
      <p>We conduct experiments on three internal datasets from Ping
An Insurance: ESB, Wuyou and Anxin, which are popular insurance
products at Ping An Insurance. In a conversation between an insurance
agent and a customer, the agent usually asks multiple questions in order
to infer the customer’s insurance needs and preferences. The objective is to
predict the customer’s purchase intention. The baseline models include
methods popular in industry and several variants of HConvoNet. Results
show that our proposed model achieves a significant improvement over
the baselines.</p>
      <p>To summarize, we make the following main contributions:</p>
      <list list-type="bullet">
        <list-item>
          <p>We propose a heterogeneous conversational
recommender system (HConvoNet) for financial products,
which adapts to customer behavior biases and dynamic
interests.</p>
        </list-item>
        <list-item>
          <p>The proposed HConvoNet integrates structured customer
profile data with unstructured conversation data and adopts
recent NLP techniques.</p>
        </list-item>
        <list-item>
          <p>The proposed HConvoNet can be applied to most practical
cases and has considerable commercial value.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>In this paper, we propose an innovative heterogeneous
conversational recommender system (HConvoNet) for financial products.
The most related domains are recommender system and textual
information extraction. In this section, we will discuss the related
work.</p>
      <p>
        Recommender System. Factorization Machines[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] is a
classical approach to model feature interactions using factorized
parameters. Field-aware FM[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is one of the variants of FM, which adds
the field index into feature space. FNN[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], Wide &amp; Deep[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
DeepFM[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] are examples of using deep neural networks to learn
more complex feature interactions. Deep learning techniques have
also been applied to collaborative-filtering and content-based
recommendation like [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] exploited RNN to develop a
session-based recommender system. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] uses RNN to build a
recommender system for movie recommendation. Google developed a
two-stage deep learning framework for YouTube video recommendation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] propose hybrid models, which use deep learning to
learn features of various domains.
      </p>
      <p>
        However, most existing research exploits the objective
nature of items and users and ignores unstructured data, which is subjective
to the user and affects the decision. Some work focuses on
mining short text reviews to capture user sentiment, such as [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], but
these approaches are not suitable for financial products.
      </p>
      <p>
        Textual Information Extraction. The recurrent neural network
(RNN) is a standard way to extract sequential information. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
extends RNN to a bidirectional RNN. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposes the framework
of long short-term memory (LSTM). Gated recurrent unit (GRU),
proposed by [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], is seen as a better network to capture long
sequential relationships. All these networks have succeeded in many
natural language processing tasks. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] uses LSTM for sentiment
analysis. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] proposes to use biLSTM to extract relationship. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
uses biLSTM for speech recognition. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] uses GRU for emotion
recognition and [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] uses GRU for document classification.
      </p>
    </sec>
    <sec id="sec-4">
      <title>PROPOSED FRAMEWORK</title>
    </sec>
    <sec id="sec-5">
      <title>Problem Definition</title>
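      <p>The dataset contains unstructured conversation transcripts and
structured profile data, <inline-formula><tex-math>D = \{C_s, P_s, y_s\}_{s=1}^{N}</tex-math></inline-formula>, where <inline-formula><tex-math>N</tex-math></inline-formula> is the number
of samples, and <inline-formula><tex-math>C_s</tex-math></inline-formula>, <inline-formula><tex-math>P_s</tex-math></inline-formula> and <inline-formula><tex-math>y_s</tex-math></inline-formula> represent the conversation transcript,
structured profile data and label of sample <inline-formula><tex-math>s</tex-math></inline-formula>, respectively. Each
conversation contains multiple utterances spoken either by the agent
or the customer, <inline-formula><tex-math>C = \{u_i\}_{i=1}^{n}</tex-math></inline-formula>, where <inline-formula><tex-math>u_i</tex-math></inline-formula> represents utterance <inline-formula><tex-math>i</tex-math></inline-formula> and <inline-formula><tex-math>n</tex-math></inline-formula> is
the number of utterances in the conversation. Each utterance consists of
multiple words, <inline-formula><tex-math>u_i = \{w_{i,j}\}_{j=1}^{K_i}</tex-math></inline-formula>, where <inline-formula><tex-math>K_i</tex-math></inline-formula> is the number of words in
utterance <inline-formula><tex-math>i</tex-math></inline-formula>. We aim to use this heterogeneous data to predict the customer’s
preference.</p>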
      <p>The overall architecture of our proposed framework is
shown in Figure 1. The framework can be explained in three parts:
the profile module, the conversation module and the fusion part. We
describe each of them in the following sections.</p>
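      <p>As a rough illustration of the data layout defined in this section, the following sketch represents one sample as a conversation (a list of utterances, each a list of words), a structured profile and a label. The class name, field names and toy values are hypothetical and are not taken from the datasets used in this paper.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One element (C_s, P_s, y_s) of the dataset D; all names are illustrative."""
    conversation: List[List[str]]   # C_s: utterances u_i, each a list of words w_ij
    profile: Dict[str, float]       # P_s: structured profile features
    label: int                      # y_s: purchase label


# Toy sample, purely illustrative.
sample = Sample(
    conversation=[["hello"], ["which", "plan", "covers", "accidents"]],
    profile={"age": 35.0, "has_policy": 1.0},
    label=1,
)

n = len(sample.conversation)                 # number of utterances in C_s
k = [len(u) for u in sample.conversation]    # K_i, the word count of utterance i
print(n, k)
```
      </preformat>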
    </sec>
    <sec id="sec-6">
      <title>Profile Module</title>
      <p>
        The profile module takes the form of DeepFM[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which has two
parts: FM part and DNN part.
      </p>
      <p>
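        FM part. FM is good at handling sparse data and can model the
first-order impact of all features and the second-order interactions among them.
According to [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], FM can be expressed as:
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>y_{FM} = \langle \omega, x \rangle + \sum_{j_1=1}^{d} \sum_{j_2=j_1+1}^{d} \langle V_{j_1}, V_{j_2} \rangle \, x_{j_1} \cdot x_{j_2}</tex-math>
        </disp-formula>
where <inline-formula><tex-math>\omega</tex-math></inline-formula> and <inline-formula><tex-math>V_j</tex-math></inline-formula> are parameters to estimate, <inline-formula><tex-math>\omega \in \mathbb{R}^{d}</tex-math></inline-formula>, <inline-formula><tex-math>V_j \in \mathbb{R}^{k}</tex-math></inline-formula> (<inline-formula><tex-math>k</tex-math></inline-formula> is
given as the feature embedding size) and <inline-formula><tex-math>\langle \cdot, \cdot \rangle</tex-math></inline-formula> is the dot product.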
      </p>
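      <p>The FM expression can be read as a linear term plus a sum over feature pairs. The following minimal sketch evaluates it directly with nested loops on toy values; the function name and inputs are illustrative assumptions, and a practical implementation would use a vectorized form inside a deep learning framework.</p>
      <preformat preformat-type="code">
```python
from typing import List


def fm_score(x: List[float], w: List[float], v: List[List[float]]) -> float:
    """Return the FM output: first-order term plus pairwise interaction term."""
    d = len(x)
    # First-order term: dot product of the weights and the features.
    first_order = sum(w[j] * x[j] for j in range(d))
    # Second-order term: for every pair (j1, j2) with j2 after j1, add the
    # dot product of the two latent vectors times the two feature values.
    second_order = 0.0
    for j1 in range(d):
        for j2 in range(j1 + 1, d):
            dot = sum(a * b for a, b in zip(v[j1], v[j2]))
            second_order += dot * x[j1] * x[j2]
    return first_order + second_order


# Toy values, purely illustrative: d = 3 features, embedding size k = 2.
x = [1.0, 0.0, 2.0]
w = [0.5, -1.0, 0.25]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(fm_score(x, w, v))
```
      </preformat>
      <p>Because the second feature is zero, only the pair formed by the first and third features contributes to the interaction term, which shows how sparse inputs prune most pairs.</p>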
      <p>[Figure 1: Overall architecture of the proposed framework. The profile module (dense embeddings of fields 1 to m feeding an FM part with 1st-order and 2nd-order terms and a DNN part) outputs the profile embedding <inline-formula><tex-math>y_{prof}</tex-math></inline-formula>. The conversation module (word embeddings, an utterance-level bidirectional GRU and a conversation-level bidirectional GRU, each followed by self-attention with max pooling) outputs the conversation embedding <inline-formula><tex-math>y_{convo}</tex-math></inline-formula>. Both outputs feed a fully connected layer followed by softmax.]</p>
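      <p>DNN part. The DNN part models more complex non-linear
interactions among feature embeddings. The output of the embedding
layer is fed into the deep neural network, which follows the forward process:
        <disp-formula id="eq-2">
          <label>(2)</label>
          <tex-math>a^{(l+1)} = \sigma(W_f^{(l)} \cdot a^{(l)} + b_f^{(l)})</tex-math>
        </disp-formula>
where <inline-formula><tex-math>\sigma</tex-math></inline-formula> is the activation function, <inline-formula><tex-math>l</tex-math></inline-formula> is the layer depth, <inline-formula><tex-math>W_f^{(l)}</tex-math></inline-formula> is the
weight matrix and <inline-formula><tex-math>b_f^{(l)}</tex-math></inline-formula> is the bias. We use ReLU as the activation
function and take the output of the last layer <inline-formula><tex-math>a^{(L)}</tex-math></inline-formula> as the DNN part representation
<inline-formula><tex-math>y_{DNN}</tex-math></inline-formula>.</p>
      <p>The final representation of the customer profile is the
concatenation of the FM part and the DNN part:
        <disp-formula id="eq-3">
          <label>(3)</label>
          <tex-math>y_{prof} = [y_{FM}; y_{DNN}]</tex-math>
        </disp-formula>
      </p>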
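      <p>The forward process of the DNN part and the final concatenation can be sketched as follows. This is a minimal illustration in plain Python with hand-picked toy weights; the helper names, layer sizes and the stand-in FM output are assumptions made for the example, not the configuration used in the experiments.</p>
      <preformat preformat-type="code">
```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def relu(z: Vector) -> Vector:
    """Element-wise ReLU activation."""
    return [max(0.0, value) for value in z]


def dense(weights: Matrix, bias: Vector, a: Vector) -> Vector:
    """Compute W . a + b for one layer."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(weights, bias)]


def dnn_forward(a0: Vector, layers: List[Tuple[Matrix, Vector]]) -> Vector:
    """Apply relu(W . a + b) layer by layer and return the last activation."""
    a = a0
    for weights, bias in layers:
        a = relu(dense(weights, bias, a))
    return a


# Toy values, purely illustrative.
embedding = [1.0, -2.0]
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.5]),  # 2 inputs, 3 units
    ([[1.0, 1.0, 1.0]], [0.0]),                                 # 3 inputs, 1 unit
]
y_dnn = dnn_forward(embedding, layers)
y_fm = [2.0]             # stand-in for the FM part output
y_prof = y_fm + y_dnn    # list concatenation plays the role of [y_FM; y_DNN]
print(y_prof)
```
      </preformat>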
    </sec>
    <sec id="sec-7">
      <title>Conversation Module</title>
      <p>
        The conversation module takes advantage of the cutting-edge
natural language processing techniques. It can be seen as a two-level
bidirectional GRU. The lower level encodes each single utterance
and the upper level encodes the whole conversation considering
contextual interactions among utterances. Besides, we propose the
use of self-attention mechanism[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to focus on more important
information in utterance level embedding and conversation level
embedding.
      </p>
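      <p>Utterance level. Suppose a single utterance <inline-formula><tex-math>u_i</tex-math></inline-formula> contains <inline-formula><tex-math>K_i</tex-math></inline-formula> words,
<inline-formula><tex-math>u_i = \{w_{i,j}\}_{j=1}^{K_i}</tex-math></inline-formula>. For
each word <inline-formula><tex-math>w_{i,j}</tex-math></inline-formula>, we have:
        <disp-formula id="eq-4">
          <label>(4)</label>
          <tex-math>\overrightarrow{h}_{i,j} = \mathrm{GRU}(e(w_{i,j}), \overrightarrow{h}_{i,j-1})</tex-math>
        </disp-formula>
        <disp-formula id="eq-5">
          <label>(5)</label>
          <tex-math>\overleftarrow{h}_{i,j} = \mathrm{GRU}(e(w_{i,j}), \overleftarrow{h}_{i,j+1})</tex-math>
        </disp-formula>
where <inline-formula><tex-math>e(w_{i,j})</tex-math></inline-formula> is the word embedding obtained from pre-trained
word embeddings. The forward and backward hidden states are
concatenated into <inline-formula><tex-math>h_{i,j} = [\overrightarrow{h}_{i,j}; \overleftarrow{h}_{i,j}]</tex-math></inline-formula>. Suppose the dimension of a
unidirectional hidden state is <inline-formula><tex-math>m</tex-math></inline-formula>. Then <inline-formula><tex-math>h_{i,j}</tex-math></inline-formula> has a dimension of <inline-formula><tex-math>2m</tex-math></inline-formula>.</p>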
      <p>
        We apply self-attention mechanism[
        <xref ref-type="bibr" rid="ref16">16</xref>
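        ] to the concatenated
hidden states to pay more attention to important words. We denote
<inline-formula><tex-math>H_i = (h_{i,1}; h_{i,2}; \ldots; h_{i,j}; \ldots; h_{i,K_i})</tex-math></inline-formula>, where <inline-formula><tex-math>H_i \in \mathbb{R}^{K_i \times 2m}</tex-math></inline-formula>. The weight
matrix in the self-attention mechanism is calculated as: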
      </p>
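      <p>
        <disp-formula id="eq-6">
          <label>(6)</label>
          <tex-math>A_i = \mathrm{softmax}\left(\frac{H_i \cdot H_i^{T}}{\sqrt{2m}}\right)</tex-math>
        </disp-formula>
where <inline-formula><tex-math>A_i \in \mathbb{R}^{K_i \times K_i}</tex-math></inline-formula> and <inline-formula><tex-math>\sqrt{2m}</tex-math></inline-formula> is a scale factor. The self-attended
hidden states for words are then computed as:
        <disp-formula id="eq-7">
          <label>(7)</label>
          <tex-math>H_i^{sa} = A_i \cdot H_i</tex-math>
        </disp-formula>
where <inline-formula><tex-math>H_i^{sa}</tex-math></inline-formula> has the same shape as <inline-formula><tex-math>H_i</tex-math></inline-formula>, which is <inline-formula><tex-math>K_i \times 2m</tex-math></inline-formula>. The
single utterance embedding is then obtained by max-pooling over
all words’ self-attended hidden states:
        <disp-formula id="eq-8">
          <label>(8)</label>
          <tex-math>e(u_i) = \mathrm{maxpool}(H_i^{sa})</tex-math>
        </disp-formula>
where <inline-formula><tex-math>e(u_i) \in \mathbb{R}^{2m}</tex-math></inline-formula>.</p>
      <p>Conversation level. We obtain the conversation embedding in
a similar way to the utterance embedding. Suppose a conversation
consists of <inline-formula><tex-math>n</tex-math></inline-formula> utterances. We feed the utterance embeddings obtained
from the previous step into another bidirectional GRU:
        <disp-formula id="eq-9">
          <label>(9)</label>
          <tex-math>\overrightarrow{h}_{i} = \mathrm{GRU}(e(u_i), \overrightarrow{h}_{i-1})</tex-math>
        </disp-formula>
        <disp-formula id="eq-10">
          <label>(10)</label>
          <tex-math>\overleftarrow{h}_{i} = \mathrm{GRU}(e(u_i), \overleftarrow{h}_{i+1})</tex-math>
        </disp-formula>
We concatenate the forward and backward hidden states <inline-formula><tex-math>h_i = [\overrightarrow{h}_{i}; \overleftarrow{h}_{i}]</tex-math></inline-formula>
and represent all concatenated hidden states as an <inline-formula><tex-math>n \times 2m</tex-math></inline-formula>
matrix <inline-formula><tex-math>H</tex-math></inline-formula>.</p>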
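      <p>Again, utterances are not of the same importance. We apply the
self-attention mechanism to learn the relative weights:
        <disp-formula id="eq-11">
          <label>(11)</label>
          <tex-math>A = \mathrm{softmax}\left(\frac{H \cdot H^{T}}{\sqrt{2m}}\right)</tex-math>
        </disp-formula>
where <inline-formula><tex-math>\sqrt{2m}</tex-math></inline-formula> is a scale factor. The self-attended hidden states matrix
for utterances is then computed as:
        <disp-formula id="eq-12">
          <label>(12)</label>
          <tex-math>H^{sa} = A \cdot H</tex-math>
        </disp-formula>
The final conversation embedding is obtained by max-pooling over
all utterances’ self-attended hidden states:
        <disp-formula id="eq-13">
          <label>(13)</label>
          <tex-math>y_{convo} = \mathrm{maxpool}(H^{sa})</tex-math>
        </disp-formula>
      </p>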
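      <p>The self-attention and max-pooling step, which is applied in the same form at the utterance level and at the conversation level, can be sketched in plain Python as follows. The sketch assumes that the softmax is taken row-wise and that the scale factor equals the square root of the hidden state width; the function names are illustrative, and the toy matrix stands in for hidden states that would come from a trained bidirectional GRU.</p>
      <preformat preformat-type="code">
```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention_maxpool(h: Matrix) -> List[float]:
    """Return maxpool(softmax(H H^T / sqrt(width)) H) for hidden states H."""
    width = len(h[0])  # equals 2m when H holds bidirectional states
    scale = math.sqrt(width)
    # Attention weights A: row-wise softmax over scaled dot products.
    attention = [
        softmax([sum(a * b for a, b in zip(h_i, h_j)) / scale for h_j in h])
        for h_i in h
    ]
    # Self-attended states: each row is a weighted sum of all rows of H.
    attended = [
        [sum(weight * h_j[c] for weight, h_j in zip(a_row, h)) for c in range(width)]
        for a_row in attention
    ]
    # Max pooling over positions (words or utterances), one value per column.
    return [max(row[c] for row in attended) for c in range(width)]


# Toy hidden states for two positions, purely illustrative.
hidden = [[1.0, 0.0], [0.0, 1.0]]
pooled = self_attention_maxpool(hidden)
print(pooled)
```
      </preformat>
      <p>The pooled vector has the same width as one hidden state regardless of how many positions the input contains, which is what allows utterances and conversations of different lengths to share downstream layers.</p>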
    </sec>
    <sec id="sec-8">
      <title>Making Prediction</title>
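      <p>To generate the prediction, we concatenate the outputs of the
profile module and the conversation module and feed the result into a
fully connected layer followed by a softmax function:
        <disp-formula id="eq-14">
          <label>(14)</label>
          <tex-math>y_{final} = [y_{prof}; y_{convo}]</tex-math>
        </disp-formula>
        <disp-formula id="eq-15">
          <label>(15)</label>
          <tex-math>\hat{y} = \mathrm{softmax}(W \cdot y_{final} + b)</tex-math>
        </disp-formula>
The categorical cross-entropy is used as the loss function:
        <disp-formula id="eq-16">
          <label>(16)</label>
          <tex-math>loss = -\sum_{i=1}^{N} \sum_{j=1}^{C} y_{i,j} \log(\hat{y}_{i,j})</tex-math>
        </disp-formula>
where <inline-formula><tex-math>y_{i,j}</tex-math></inline-formula> and <inline-formula><tex-math>\hat{y}_{i,j}</tex-math></inline-formula> are the ground truth and the prediction, respectively.</p>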
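      <p>The fusion and loss computation can be sketched as follows. This is a minimal plain-Python illustration with toy weights; the function names are assumptions, and skipping zero-valued targets in the loss is an implementation choice made here to avoid evaluating the logarithm where it does not contribute.</p>
      <preformat preformat-type="code">
```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(y_prof: List[float], y_convo: List[float],
            weights: List[List[float]], bias: List[float]) -> List[float]:
    """Concatenate both module outputs, apply one dense layer, then softmax."""
    y_final = y_prof + y_convo
    logits = [sum(w * f for w, f in zip(row, y_final)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def cross_entropy(targets: List[List[float]], predictions: List[List[float]]) -> float:
    """Sum of -y * log(y_hat) over samples and classes (one-hot targets)."""
    return -sum(
        y * math.log(p)
        for target_row, pred_row in zip(targets, predictions)
        for y, p in zip(target_row, pred_row)
        if y > 0.0
    )


# Toy values, purely illustrative: two classes, identity weights.
probs = predict([1.0], [0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
loss = cross_entropy([[1.0, 0.0]], [probs])
print(probs, loss)
```
      </preformat>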
    </sec>
    <sec id="sec-9">
      <title>EXPERIMENTS</title>
    </sec>
    <sec id="sec-10">
      <title>Dataset</title>
      <p>We conduct our experiments on three internal datasets from Ping
An Insurance: ESB, Wuyou and Anxin, which are
three popular insurance products at Ping An Insurance. ESB
is a medical insurance product. Wuyou is a universal insurance
product with an investment component. Anxin is an accident
insurance product. All three datasets contain unstructured conversation
data and structured customer profile data. Labels are derived from
the customer’s purchase records within 15
days after the conversation. The objective is to predict the customer’s purchase intention
given the customer’s profile and conversation data. The time window of our
datasets is May 2019. Because the class distribution is unbalanced, we further
downsample the datasets to a ratio of roughly 1:5. Table 1 provides
detailed information about each dataset. We randomly take 80% as
the training set and 20% as the test set. We further partition the
training set into a development set and a validation set with an 80/20
ratio.</p>
      <p>We follow a general feature engineering process to preprocess
the structured data. We preprocess the conversation data in the
following steps: (1) we extract the textual transcripts of the
conversation audio using Automatic Speech Recognition (ASR)
and clean the data to reduce the noise introduced by
this step; (2) we segment each utterance into tokens with the
jieba package and add some business terminology; (3) we remove
all non-alphanumeric characters, stop words and words with a frequency
lower than two; (4) we use the publicly available 300-dimensional
word2vec vectors trained on a large corpus across various domains.
Words not in word2vec are randomly initialized.</p>
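      <p>Step (3) of the preprocessing can be illustrated with the following sketch, which operates on utterances that are already tokenized (the ASR and jieba steps are omitted). The function name, the stop word set and the example tokens are hypothetical, and counting frequencies over the whole corpus is an assumption.</p>
      <preformat preformat-type="code">
```python
from collections import Counter
from typing import List, Set


def filter_tokens(utterances: List[List[str]], stop_words: Set[str],
                  min_count: int = 2) -> List[List[str]]:
    """Drop non-alphanumeric tokens, stop words and tokens rarer than min_count."""
    # str.isalnum() is Unicode-aware, so Chinese characters count as alphanumeric.
    kept = [[t for t in utt if t.isalnum() and t not in stop_words]
            for utt in utterances]
    # Count each remaining token over all utterances.
    counts = Counter(t for utt in kept for t in utt)
    return [[t for t in utt if counts[t] >= min_count] for utt in kept]


# Toy tokenized utterances, purely illustrative.
utterances = [
    ["insurance", ",", "the", "price"],
    ["insurance", "price", "?"],
    ["hello"],
]
cleaned = filter_tokens(utterances, stop_words={"the"})
print(cleaned)
```
      </preformat>
      <p>An utterance can become empty after filtering, as the last one does here, so a downstream pipeline would need to decide whether to drop or pad such utterances.</p>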
      <p>
        Training. We adopt Adam[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] as the optimizer and set the
initial learning rate to <inline-formula><tex-math>2 \times 10^{-4}</tex-math></inline-formula>. An annealing strategy is applied by
halving the learning rate every 20 epochs. For
regularization, we apply dropout with a rate of 0.5. Early stopping
with a patience of 10 is adopted to terminate training based on the
F-measure on the validation set.
      </p>
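      <p>The learning rate schedule and the early stopping rule described above can be sketched as follows. The sketch assumes zero-based epoch numbering and treats an epoch without a strict improvement in validation F-measure as a non-improving epoch; both are assumptions, as are the function and class names.</p>
      <preformat preformat-type="code">
```python
def learning_rate(epoch: int, initial: float = 2e-4, step: int = 20) -> float:
    """Halve the initial rate once for every completed block of `step` epochs."""
    return initial * 0.5 ** (epoch // step)


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 10) -> None:
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, f_measure: float) -> bool:
        if f_measure > self.best:
            # New best validation score: remember it and reset the counter.
            self.best = f_measure
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy run with a short patience, purely illustrative.
stopper = EarlyStopping(patience=2)
decisions = [stopper.should_stop(score) for score in [0.5, 0.6, 0.6, 0.55]]
print(learning_rate(0), learning_rate(20), decisions)
```
      </preformat>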
    </sec>
    <sec id="sec-11">
      <title>Evaluation Metrics</title>
      <p>We adopt the F-measure as our evaluation metric. The F-measure is the
harmonic mean of precision and recall and is often used for
measuring performance in industry and many research fields.</p>
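      <p>As a worked illustration, the following sketch computes the F-measure for binary labels as the harmonic mean of precision and recall. Treating the purchase class as the positive class (value 1) and returning zero when there are no true positives are assumptions of this sketch.</p>
      <preformat preformat-type="code">
```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, purely illustrative: 2 true positives, 1 false positive, 1 false negative.
score = f_measure([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
print(score)
```
      </preformat>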
    </sec>
    <sec id="sec-12">
      <title>Results</title>
      <p>We also test the impact of self-attention and of different pooling
methods. Table 3 presents the performance on the three datasets. We
see that HConvoNet achieves better performance than
HConvoNet<sub>nsa</sub>, indicating the effect of the self-attention mechanism. Comparing
meanpool and maxpool, we find that the difference is negligible.
Our proposed HConvoNet achieves the best result in most cases.</p>
    </sec>
    <sec id="sec-13">
      <title>CONCLUSIONS</title>
      <p>In this paper, we propose an innovative heterogeneous
conversational recommender system (HConvoNet) for financial products.
We improve the traditional recommendation by integrating
unstructured conversation data with structured profile data, thus
considering customer static needs, behavior biases and dynamic
interests. Future work could include exploring different methods to
fuse heterogeneous data and involving multiple modalities of the
conversation, like audio.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Heng-Tze</given-names>
            <surname>Cheng</surname>
          </string-name>
          , Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye,
          <string-name>
            <given-names>Glen</given-names>
            <surname>Anderson</surname>
          </string-name>
          , Gregory S. Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and
          <string-name>
            <given-names>Hemal</given-names>
            <surname>Shah</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Wide &amp; Deep Learning for Recommender Systems</article-title>
          . In DLRS@RecSys.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Kyunghyun</given-names>
            <surname>Cho</surname>
          </string-name>
          , Bart van Merrienboer,
          <string-name>
            <given-names>Dzmitry</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>On the Properties of Neural Machine Translation: Encoder-Decoder Approaches</article-title>
          .
          <source>ArXiv abs/1409.1259</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Covington</surname>
          </string-name>
          , Jay Adams, and
          <string-name>
            <given-names>Emre</given-names>
            <surname>Sargin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Deep Neural Networks for YouTube Recommendations</article-title>
          . In RecSys.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Graves</surname>
          </string-name>
          ,
          Navdeep Jaitly, and Abdel-rahman Mohamed
          .
          <year>2013</year>
          .
          <article-title>Hybrid speech recognition with Deep Bidirectional LSTM</article-title>
          .
          <source>2013 IEEE Workshop on Automatic Speech Recognition and Understanding</source>
          (
          <year>2013</year>
          ),
          <fpage>273</fpage>
          -
          <lpage>278</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Huifeng</given-names>
            <surname>Guo</surname>
          </string-name>
          , Ruiming Tang, Yunming Ye,
          <string-name>
            <given-names>Zhenguo</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiuqiang</given-names>
            <surname>He</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>DeepFM: A Factorization-Machine based Neural Network for CTR Prediction</article-title>
          .
          <source>ArXiv abs/1703.04247</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Balázs</given-names>
            <surname>Hidasi</surname>
          </string-name>
          , Massimo Quadrana, Alexandros Karatzoglou, and
          <string-name>
            <given-names>Domonkos</given-names>
            <surname>Tikk</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Parallel Recurrent Neural Network Architectures for Feature-rich Session-based Recommendations</article-title>
          . In RecSys.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jürgen</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Long Short-Term Memory</article-title>
          .
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <year>1997</year>
          ),
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Wenxiang</given-names>
            <surname>Jiao</surname>
          </string-name>
          , Haiqin Yang,
          <string-name>
            <given-names>Irwin</given-names>
            <surname>King</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michael R.</given-names>
            <surname>Lyu</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>HiGRU: Hierarchical Gated Recurrent Units for Utterance-Level Emotion Recognition</article-title>
          .
          <source>In NAACL-HLT.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Yu-Chin</given-names>
            <surname>Juan</surname>
          </string-name>
          , Yong Zhuang,
          <string-name>
            <given-names>Wei-Sheng</given-names>
            <surname>Chin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Field-aware Factorization Machines for CTR Prediction</article-title>
          . In RecSys.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Diederik P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          .
          <source>CoRR abs/1412.6980</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Soujanya</given-names>
            <surname>Poria</surname>
          </string-name>
          , Erik Cambria, Devamanyu Hazarika, Navonil Majumder, Amir Zadeh, and
          <string-name>
            <given-names>Louis-Philippe</given-names>
            <surname>Morency</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Context-Dependent Sentiment Analysis in User-Generated Videos</article-title>
          .
          <source>In ACL.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Preethi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. Venkata</given-names>
            <surname>Krishna</surname>
          </string-name>
          , Mohammad S. Obaidat, Vankadara Saritha, and
          <string-name>
            <given-names>Sumanth</given-names>
            <surname>Yenduri</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Application of Deep Learning to Sentiment Analysis for recommender system on cloud</article-title>
          .
          <source>2017 International Conference on Computer, Information and Telecommunication Systems (CITS)</source>
          (
          <year>2017</year>
          ),
          <fpage>93</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Rendle</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Factorization Machines</article-title>
          .
          <source>2010 IEEE International Conference on Data Mining</source>
          (
          <year>2010</year>
          ),
          <fpage>995</fpage>
          -
          <lpage>1000</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Mike</given-names>
            <surname>Schuster</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kuldip K.</given-names>
            <surname>Paliwal</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Bidirectional recurrent neural networks</article-title>
          .
          <source>IEEE Trans. Signal Processing</source>
          <volume>45</volume>
          (
          <year>1997</year>
          ),
          <fpage>2673</fpage>
          -
          <lpage>2681</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Shefrin</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Beyond Greed and Fear: Understanding Behavioral Finance and the Psychology of Investing</article-title>
          . Oxford University Press. https://books.google.com/books?id=hX18tBx3VPsC
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
          <string-name>
            <given-names>Aidan N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Lukasz Kaiser, and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Attention Is All You Need</article-title>
          .
          <source>In NIPS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Hao</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Naiyan</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dit-Yan</given-names>
            <surname>Yeung</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Collaborative Deep Learning for Recommender Systems</article-title>
          .
          <source>In KDD.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Xinxi</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ye</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Improving Content-based and Hybrid Music Recommendation using Deep Learning</article-title>
          .
          <source>In ACM Multimedia.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Chao-Yuan</given-names>
            <surname>Wu</surname>
          </string-name>
          , Amr Ahmed, Alex Beutel,
          <string-name>
            <given-names>Alexander J.</given-names>
            <surname>Smola</surname>
          </string-name>
          , and
          <string-name>
            <given-names>How</given-names>
            <surname>Jing</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Recurrent Recommender Networks</article-title>
          .
          <source>In WSDM.</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Zichao</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Diyi</given-names>
            <surname>Yang</surname>
          </string-name>
          , Chris Dyer, Xiaodong He,
          <string-name>
            <given-names>Alexander J.</given-names>
            <surname>Smola</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Eduard H.</given-names>
            <surname>Hovy</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Hierarchical Attention Networks for Document Classification</article-title>
          .
          <source>In HLT-NAACL.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Fuzheng</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Nicholas Jing Yuan, Defu Lian, Xing Xie, and
          <string-name>
            <given-names>Wei-Ying</given-names>
            <surname>Ma</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Collaborative Knowledge Base Embedding for Recommender Systems</article-title>
          .
          <source>In KDD.</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Weinan</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Tianming Du, and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction</article-title>
          .
          <source>arXiv abs/1601.02376</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Lei</given-names>
            <surname>Zheng</surname>
          </string-name>
          , Vahid Noroozi, and
          <string-name>
            <given-names>Philip S.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Joint Deep Modeling of Users and Items Using Reviews for Recommendation</article-title>
          .
          <source>In WSDM.</source>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Peng</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Wei Shi, Jun Tian, Zhenyu Qi,
          <string-name>
            <given-names>Bingchen</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hongwei</given-names>
            <surname>Hao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bo</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification</article-title>
          .
          <source>In ACL.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>