<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neural Feature Embedding for User Response Prediction in Real-Time Bidding (RTB)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Enno Shioji</string-name>
          <email>Enno.Shioji@adform.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Masayuki Arai</string-name>
          <email>arai@ics.teikyo-u.ac.jp</email>
        </contrib>
      </contrib-group>
      <fpage>8</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>In the area of ad targeting, predicting user responses is essential for many applications such as Real-Time Bidding (RTB). Many of the features available in this domain are sparse categorical features. This presents a challenge especially when the user responses to be predicted are rare, because each feature will have only very few positive examples. Recently, neural embedding techniques such as word2vec, which learn distributed representations of words using occurrence statistics in the corpus, have been shown to be effective in many Natural Language Processing tasks. In this paper, we use a real-world data set to show that a similar technique can be used to learn distributed representations of features from users' web history, and that such representations can be used to improve the accuracy of commonly used models for predicting rare user responses.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Predicting the probability of a user response such as a click, conversion etc. given
an ad impression is crucial for many advertisement applications, such as
Real-Time Bidding (RTB). Because of their efficiency, linear models such as logistic
regression are the most widely used for this purpose[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The models are
commonly trained on sparse categorical features such as user agent, IDs of visited
websites etc., which are encoded as sparse binary features via one-hot
encoding[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. One of the prominent problems with these models is the sparsity of data.
Especially when feature interaction is used, the feature representation becomes
extremely sparse, making it difficult to exploit the features efficiently. Moreover,
traditionally the industry has focused on predicting clicks, but recently the focus
has shifted to optimizing for other, much rarer user responses like conversions,
which exacerbates this problem[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We refer to this problem as the feature sparsity
problem.
      </p>
      <p>
        A similar issue has been recognized in Natural Language Processing (NLP)[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Many mainstream models rely on the bag-of-words representation, which suffers
from the same issue outlined above. Recently, neural embedding techniques
known as word2vec, paragraph2vec etc., which map words and documents into a
low-dimensional vector space, have been shown to yield state-of-the-art results in
various NLP tasks[
        <xref ref-type="bibr" rid="ref10 ref8">10, 8</xref>
        ]. In this approach, occurrence statistics in the corpus
are used to learn distributed word representations that are much more amenable to
generalization.
      </p>
      <p>In this paper, we use a real-world data set to show that a similar technique can
be applied to user response prediction in RTB. As in Natural
Language Processing, a large amount of user web history can be used to learn
high-quality feature representations, which can then be used to predict (rare)
user responses. The technique was shown to improve the accuracy of commonly
used models, especially when labeled data was scarce.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Various methods have been employed to address the feature sparsity problem.
For example, higher-order category information, derived from human annotation
or from the data via unsupervised methods such as topic modelling, clustering
etc.[
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ], has been used to improve generalization. Other techniques such as
counting features can also help by allowing rare features to contribute jointly[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Another category of solutions involves embedding sparse categorical features
into a low-dimensional vector space. Various feature transformation methods that
yield dense features have been investigated in conjunction with deep neural
networks, resulting in improvements over major state-of-the-art models[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Zhang et
al. also investigated a framework they refer to as implicit look-alike modelling,
in which entities like users, web pages, ads etc. are mapped into a latent
vector space using both general web browsing behaviour and ad response behaviour
data[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        In this paper, we report initial results of applying a feature transformation
technique similar to neural word embedding to user response prediction in RTB.
The technique has been successfully applied to other domains, such as product
recommendation [
        <xref ref-type="bibr" rid="ref11 ref2">11, 2</xref>
        ]. The technique shares the benefits of its counterpart in
NLP, such as the ability to encode feature sequences, the ability to
incrementally update the embeddings with new data, and the availability of numerous
improvements and extensions that have been developed since its advent. The
result opens up exciting opportunities to apply techniques that have been
successfully used with neural word embeddings, such as deep neural networks.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Neural Feature Embedding for User Response Prediction</title>
      <p>
        We first provide a brief overview of the neural word embedding technique
developed by Mikolov et al.[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. We consider one of its simplest forms, the Continuous
Bag-of-Words Model (CBOW) with a single context window. Given a word t in
the corpus and a surrounding word c, we parametrise θ such that the conditional
probability p(t | c; θ) is maximized over the corpus. p(t | c; θ) can be modelled using
the soft-max as follows:
      </p>
      <p>p(t | c; θ) = e^(v_t · v_c) / Σ_{c′ ∈ C} e^(v_t · v_c′)   (1)</p>
      <p>
        where v_t and v_c ∈ ℝ^n are vector representations of t and c, and C is the set
of all available contexts. n is a hyper-parameter that determines the size of the
embedding, and is chosen empirically. Note that we use distinct representations
for target and context, following the literature. This objective is straightforward
but expensive to calculate. To alleviate this problem, a technique called negative
sampling[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is used, wherein random pairs (t, c) are sampled from the corpus
and assumed to be wrong. This yields the following objective:
      </p>
      <p>arg max_θ  Σ_{(t,c) ∈ D} log 1/(1 + e^(−v_t · v_c)) + Σ_{(t,c) ∈ D′} log 1/(1 + e^(v_t · v_c))   (2)</p>
      <p>where D is the set of all target-context pairs in the corpus and D′ is a set of randomly
generated (t, c) pairs. The objective is now cheap to calculate.</p>
      <p>
        In this paper, we consider a dataset consisting of ad impressions. When an
ad is shown to a user, some of the browsing history of that user is available as a
sequence of content IDs. It is thus relatively straightforward to apply techniques
such as CBOW[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], skip-gram[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] etc. to this data. For this experiment we chose
to discard the sequence of the content IDs and only use the co-occurrence
information. More specifically, we generated our positive (t, c) pairs by randomly
sampling content IDs from the set of content IDs the user had consumed at the
time of the impression, and our negative pairs randomly from the corpus. It is
known that the probability distribution of such sampling influences the quality
of the embeddings[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], but we used a uniform distribution for this initial
experiment. We then used the resulting content embeddings as features in our user
response model, for which we use logistic regression.
      </p>
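      <p>The pair-generation scheme described above can be sketched as follows; make_pairs is a hypothetical helper of ours, shown with a uniform negative distribution as in this initial experiment:</p>

```python
import random

def make_pairs(content_ids, corpus_ids, n_neg=1, rng=None):
    """Sample one positive (t, c) pair from the set of content IDs a user had
    consumed at impression time (order discarded, only co-occurrence kept),
    and n_neg negative pairs drawn uniformly from the whole corpus."""
    rng = rng or random.Random(0)
    t, c = rng.sample(list(content_ids), 2)  # two co-occurring IDs -> positive pair
    positives = [(t, c)]
    negatives = [(t, rng.choice(corpus_ids)) for _ in range(n_neg)]
    return positives, negatives
```

      <p>A non-uniform negative distribution (e.g. frequency-smoothed sampling as in word2vec) could be substituted here, which the paper notes is known to influence embedding quality.</p>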
    </sec>
    <sec id="sec-5">
      <title>Experiment and Discussion</title>
      <sec id="sec-5-1">
        <title>Dataset</title>
        <p>
          We used a real-world RTB dataset provided by Adform. Each record in the
data corresponds to an ad impression, and the records are ordered chronologically. A record
consists of a binary label that indicates whether the user subsequently clicked
the ad (click), and a set of content IDs (content_ids) the user had consumed
in the past 30 days, up to the time of the impression. The data was taken from
Adform's impression logs of July 2016. Records for which no content_ids were
available were filtered out. Further, negative examples were down-sampled at
a rate of 0.01, as the data is extremely imbalanced. After the down-sampling,
there were 5.0M examples in total, with 1.1M positive examples. There were
891K distinct content IDs. A newer, larger version of the dataset with additional
fields has been published [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The content_ids correspond to feature c9 in this
dataset.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>Experiment Protocol</title>
        <p>
          The experiment consisted of an unsupervised stage and a supervised stage.
Unsupervised stage. Content embeddings were learned from content_ids as
described above; i.e. the click field was discarded and not used for this stage.
Out of the 5.0M data instances, the oldest 4.0M were used for this stage. We
trained the embeddings with varying embedding sizes n (2^k, k ∈ [1..7]). TensorFlow[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
was used to implement this stage.
        </p>
        <p>
          Supervised stage. In the supervised stage, binary classifiers that predict
click were trained using different features (see below). For all experiments,
logistic regression with L2 regularization was used. Out of the remaining 1.0M
data instances, the newest 30% (300K) were held out as a validation dataset. The
training was done with varying amounts of data (0.3K, 1K, 10K, 100K) that were
randomly sampled from the remaining data (700K). To evaluate the performance
of the models, the area under the ROC curve (AUC) was used, which is a commonly
used metric for evaluating user response prediction models in RTB[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Grid
search was performed with varying regularization strength (10^k, k ∈ [−2..1]) and
embedding size, and the best result was used as the measurement. scikit-learn[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] was
used for the implementation. Below is the list of features we compared:
SB: Sparse Binary. content_ids were encoded as sparse binary features
via one-hot encoding. This is our baseline.
        </p>
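        <p>The SB encoding amounts to a multi-hot vector over the content vocabulary. A minimal dense sketch (in practice, with 891K distinct IDs, one would use a sparse matrix such as scipy.sparse.csr_matrix; the helper name is ours):</p>

```python
import numpy as np

def sparse_binary(content_ids, vocab_size):
    """SB baseline: one binary indicator per distinct content ID,
    set to 1 for every ID in the user's content_ids."""
    x = np.zeros(vocab_size)
    x[list(content_ids)] = 1.0
    return x
```
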
        <p>DR: Distributed Representation. Each dimension of the resulting
embeddings was scaled by its maximum absolute value. For each content_id
in content_ids, the corresponding embedding was looked up, and the
mean of the embeddings was used as the feature vector. The resulting
feature vector thus had the same length n as the embeddings.</p>
        <p>SB+DR: Sparse Binary and Distributed Representation. The feature vectors of
SB and DR were concatenated.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Performance Comparison and Discussion</title>
        <p>Table 1 shows the best results obtained for each condition using the
aforementioned grid search. The results of SB+DR and DR are compared against SB (our
baseline). DR outperforms SB when training data is scarce. SB+DR outperforms
SB in all conditions, with a larger margin when training data is scarcer. This is
likely because when training data is scarce, the sparsity issue is more acute and
thus the ability to generalize across features has a larger effect. However, when
a large amount of data is available, the lower-dimensional feature representation
of DR likely limits the degree of differentiation between individual content IDs.
When SB and DR are concatenated, both advantages can be preserved.</p>
        <p>Figure 1 shows the difference in AUC from the SB baseline for DR and
SB+DR, for varying embedding sizes (n). Increasing n improves AUC, but the
return diminishes after about 16 dimensions.</p>
        <p>In this paper, we reported initial results of applying a neural feature embedding
technique to user response prediction in RTB, using a real-world dataset. To
the best of our knowledge, this is the first time this technique has been applied to this
problem. We have demonstrated that the technique can improve the performance
of a model commonly used in the industry, especially when labeled data is scarce
and the feature sparsity problem is thus most acute. The fact that a large
amount of data can readily be used to train the feature embeddings, and
that the commonly used logistic regression can be used at prediction time, makes
the result well suited for industrial implementation.</p>
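        <p>The supervised stage and its AUC evaluation can be sketched with scikit-learn as follows. The data here is synthetic and the split sizes are toy stand-ins for the 700K/300K chronological protocol described above:</p>

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))     # stand-in feature vectors (e.g. DR with n = 16)
# synthetic click labels correlated with the first feature dimension
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# chronological split: older records for training, newest held out for validation
X_tr, y_tr, X_va, y_va = X[:350], y[:350], X[350:], y[350:]

clf = LogisticRegression(C=1.0)    # L2-penalized; C is the inverse regularization strength
clf.fit(X_tr, y_tr)
auc = roc_auc_score(y_va, clf.predict_proba(X_va)[:, 1])
```

        <p>A grid over C (the inverse regularization strength) and the embedding size n, keeping the best validation AUC per condition, reproduces the protocol of the experiment.</p>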
        <p>
          The result also opens up exciting opportunities to apply improvements and
techniques that have been developed around neural word embeddings, such as
incorporating global context, using multiple representations per word[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ],
optimizing the embeddings for a specific supervised task using target labels[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], using
a global log-bilinear regression instead of the earlier local context window
methods[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], applying deep neural networks on the embeddings etc.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barham</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghemawat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Irving</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kudlur</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murray</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steiner</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tucker</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasudevan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warden</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wicke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>TensorFlow: A system for large-scale machine learning</article-title>
          .
          <source>CoRR abs/1605.08695</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Barkan</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koenigstein</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Item2vec: Neural item embedding for collaborative filtering</article-title>
          .
          <source>CoRR abs/1603.04259</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ducharme</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincent</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janvin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A neural probabilistic language model</article-title>
          .
          <source>J. Mach. Learn. Res</source>
          .
          <volume>3</volume>
          ,
          <fpage>1137</fpage>
          –
          <lpage>1155</lpage>
          (Mar
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dalessandro</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hook</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perlich</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Provost</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Evaluating and Optimizing Online Advertising: Forget the Click, But There are Good Proxies</article-title>
          . Social Science Research Network Working Paper Series (Oct
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atallah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herbrich</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowers</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Candela</surname>
            ,
            <given-names>J.Q.n.</given-names>
          </string-name>
          :
          <article-title>Practical lessons from predicting clicks on ads at Facebook</article-title>
          .
          <source>In: Proceedings of the Eighth International Workshop on Data Mining for Online Advertising</source>
          . pp. 5:1–5:9. ADKDD'14, ACM, New York, NY, USA (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>E.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>Improving word representations via global context and multiple word prototypes</article-title>
          . In:
          <source>Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1</source>
          . pp.
          <fpage>873</fpage>
          –
          <lpage>882</lpage>
          . Association for Computational Linguistics (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Labutov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lipson</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Re-embedding words</article-title>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of sentences and documents</article-title>
          .
          <source>CoRR abs/1405.4053</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>CoRR abs/1301.3781</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>CoRR abs/1310.4546</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Nedelec</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smirnova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasile</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Content2vec: Specializing joint representations of product images and text for the task of product recommendation</article-title>
          .
          <source>Unpublished Manuscript</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          –
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
          </string-name>
          , C.D.:
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          –
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Shioji</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Adform click prediction dataset</article-title>
          .
          <source>Harvard Dataverse</source>
          doi:10.7910/DVN/TADBY7 (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Display advertising with real-time bidding (RTB) and behavioural targeting</article-title>
          .
          <source>CoRR abs/1610.03013</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Implicit look-alike modelling in display ads: Transfer collaborative filtering to CTR estimation</article-title>
          .
          <source>CoRR abs/1601.02377</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Deep learning over multi-field categorical data: A case study on user response prediction</article-title>
          .
          <source>CoRR abs/1601.02376</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Real-time bidding benchmarking with ipinyou dataset</article-title>
          .
          <source>CoRR abs/1407.7073</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>