<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Attribution of Customers' Actions Based on Machine Learning Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Timur Kadyrov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry I. Ignatov</string-name>
          <email>dignatov@hse.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>Russian Federation</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a multichannel attribution model based on gradient boosting over trees and compare it with state-of-the-art models: bagged logistic regression, the Markov chains approach, and the Shapley value. Experiments on digital advertising datasets show that the proposed model outperforms the considered solutions in terms of the ROC AUC metric. In addition, we address the problem of predicting the probability of conversion by a consumer using an ensemble of the analysed algorithms; the resulting meta-features were enriched with consumer data and data on the offline activities of the advertising campaign.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-touch attribution</kwd>
        <kwd>Gradient boosting</kwd>
        <kwd>Digital advertising</kwd>
        <kwd>Data-driven marketing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years, the volume of advertising on the Internet has begun to reach the
volume of advertising on television [
        <xref ref-type="bibr" rid="ref1 ref3">1,3</xref>
        ], which together make up 80% of the
advertising market. According to the annual issue of the World Survey of the
Entertainment and Media Industry, we should expect a decrease in the share of
television advertising and a significant increase in the share of online advertising
by 2021 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Changes in the advertising and marketing industries are
encouraging advertisers to pay particular attention to researching online advertising
campaigns. One of the effective methods of researching advertising campaigns
on the Internet is multichannel attribution. In the past decade, due to the lack
of data on advertising campaigns, methods based on intuition and heuristics
were used to solve the multichannel attribution problem, and they did not always
give adequate results. Currently, each advertiser uses the services of advertising
servers, which count the number of user interactions with advertising,
such as impressions and clicks. Thanks to the data stored on advertising servers,
advertisers are moving away from heuristic-based attribution and solving this problem
based on data, which in turn makes it possible to build more complex and accurate
models based on machine learning. The data-driven approach includes several
well-known methods for solving problems that are part of the multichannel
attribution spectrum. These are machine learning methods such as Bagged Logistic
Regression, Hidden Markov Chains, the Shapley value approach, Survival Analysis,
relative weights, and probabilistic approaches. Each of these methods has its
own advantages over the others, which
allows advertisers to choose a method based
on their individual needs and goals. However, it is worth noting that
gradient boosting has proven to be one of
the most effective algorithms in the machine learning toolbox. We therefore assume that the Gradient
Boosting method copes better with the tasks of multichannel attribution, because
the data sets used for classification are highly unbalanced.
      </p>
      <p>Copyright © 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>The paper is organised as follows. Section 2 introduces the mathematical
formalisation of the problem. Section 3 briefly describes how to address the multichannel
attribution problem with ensemble learning techniques such as Gradient
Boosting over Decision Trees and Stacking. Related work on both heuristic-based and
data-driven approaches is summarised in Section 4. Section 5 is devoted to
experiments with the data of three real Internet advertising campaigns. Finally,
Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Problem Statement</title>
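      <p>Let us consider a set of users or clients <inline-formula><tex-math>U = \{u_1, \ldots, u_n\}</tex-math></inline-formula> and a set of
advertisement channels <inline-formula><tex-math>C = \{c_1, \ldots, c_p\}</tex-math></inline-formula>. Our data are represented by a matrix-vector
pair <inline-formula><tex-math>(A, y)</tex-math></inline-formula>, where <inline-formula><tex-math>A \in \mathbb{R}^{n \times p}</tex-math></inline-formula> is a matrix of user-to-channel interactions whose
element <inline-formula><tex-math>a_{ij}</tex-math></inline-formula> is the number of interactions of the user <inline-formula><tex-math>u_i \in U</tex-math></inline-formula> with the
advertising channel <inline-formula><tex-math>c_j \in C</tex-math></inline-formula>, while <inline-formula><tex-math>y</tex-math></inline-formula> is a binary vector of size <inline-formula><tex-math>n</tex-math></inline-formula> with <inline-formula><tex-math>y_i = 1</tex-math></inline-formula> if
user <inline-formula><tex-math>u_i</tex-math></inline-formula> performed the conversion action and <inline-formula><tex-math>y_i = 0</tex-math></inline-formula> otherwise.</p>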
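      <p>As an illustration only, the pair (A, y) could be assembled from a raw interaction log along the following lines. The log format (a list of user/channel pairs plus a set of converted users), the function name <monospace>build_interaction_matrix</monospace>, and the toy data are assumptions made for this sketch and are not taken from the data sets used in the paper.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict


def build_interaction_matrix(events, conversions):
    """Return (users, channels, A, y) from a list of (user, channel) events.

    A[i][j] counts the interactions of users[i] with channels[j];
    y[i] is 1 if users[i] is in `conversions`, else 0.
    """
    users = sorted({u for u, _ in events})
    channels = sorted({c for _, c in events})
    col = {c: j for j, c in enumerate(channels)}
    counts = defaultdict(lambda: [0] * len(channels))
    for user, channel in events:
        counts[user][col[channel]] += 1
    A = [counts[u] for u in users]
    y = [1 if u in conversions else 0 for u in users]
    return users, channels, A, y


# Hypothetical toy log: (user id, channel name) pairs.
events = [("u1", "video"), ("u1", "banner"), ("u1", "video"),
          ("u2", "banner"), ("u3", "social")]
users, channels, A, y = build_interaction_matrix(events, conversions={"u1"})
print(channels)  # column order of A
print(A)         # one row of interaction counts per user
print(y)         # conversion labels
```
      </preformat>
      <p>Rows and columns are sorted only to make the layout reproducible; any fixed ordering of users and channels would serve equally well.</p>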
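      <p>Let the function <inline-formula><tex-math>f(A_i; \theta)</tex-math></inline-formula> describe a classifier with parameters <inline-formula><tex-math>\theta</tex-math></inline-formula> that receives as input the vector of interactions <inline-formula><tex-math>A_i</tex-math></inline-formula> (the <inline-formula><tex-math>i</tex-math></inline-formula>-th row of <inline-formula><tex-math>A</tex-math></inline-formula>) of the consumer <inline-formula><tex-math>u_i</tex-math></inline-formula> with the advertising channels <inline-formula><tex-math>C</tex-math></inline-formula>;
the function returns a value <inline-formula><tex-math>r</tex-math></inline-formula>, which reflects the chance of a
conversion action by this consumer. Under certain conditions, the value <inline-formula><tex-math>r</tex-math></inline-formula> can
be transformed into the probability that the consumer will take a conversion action;
one of these conditions is monotonicity, meaning that the higher the value of <inline-formula><tex-math>r</tex-math></inline-formula>, the
higher the probability of a conversion. It is necessary to find parameters <inline-formula><tex-math>\theta</tex-math></inline-formula>
under which the classifier gives the best probability estimates in terms of
the selected metrics. The weight of the influence of an advertising channel on
the decision to perform the conversion action is defined as follows: we
choose <inline-formula><tex-math>k</tex-math></inline-formula> consumers <inline-formula><tex-math>U' \subseteq U</tex-math></inline-formula> and obtain for them the matrix <inline-formula><tex-math>I' \in \mathbb{R}^{k \times p}</tex-math></inline-formula>, whose rows <inline-formula><tex-math>I'_i</tex-math></inline-formula> are the interaction vectors of the chosen consumers. We introduce
an indicator function <inline-formula><tex-math>\mathbb{1}(u_i, c_j)</tex-math></inline-formula>, equal to 1 if the consumer <inline-formula><tex-math>u_i</tex-math></inline-formula> interacted with the channel <inline-formula><tex-math>c_j</tex-math></inline-formula> and 0 otherwise;
then we express the influence of the advertising channel as</p>
      <disp-formula><tex-math>\mathrm{impact}(c_j) = \frac{1}{k} \sum_{u_i \in U'} f(I'_i; \theta)\, \mathbb{1}(u_i, c_j).</tex-math></disp-formula>
    </sec>
    <sec>
      <title>Ensemble Learning for Multi-Channel Attribution</title>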
      <sec id="sec-2-1">
        <title>Gradient Boosting over Decision Trees</title>
        <p>
          Boosting over decision trees is considered one of the most efficient machine
learning algorithms. The idea of the boosting approach is as follows: it iteratively trains
new base classifiers that improve the composition of the previously chosen ones, i.e.
each new classifier compensates for the errors of the composition of all the previous
ones. Gradient boosting, in turn, optimises a differentiable loss function. The
initial idea of boosting arose from the question [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]: is it possible to obtain a strong
classifier from many weak classifiers? Due to the effectiveness of this machine
learning approach, it is an important part of many search engines [
          <xref ref-type="bibr" rid="ref14 ref4">14,4</xref>
          ] and a tool of
choice for data scientists who have won many machine learning competitions
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. An additional motivation for using this machine learning approach comes
from the reduction of the task of multi-channel attribution to a binary
classification problem. We need to train a classifier that can sort consumers
into two classes: those who will perform the conversion action and those who
will not. Since only 0.5%–2% of all consumers who saw ads reach conversion,
this classification problem contains highly unbalanced classes. Gradient boosting
over trees is one of the techniques that address this class of problems well.
Here, we use one of the most successful implementations of gradient boosting
over trees, named XGBoost.
        </p>
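        <p>The boosting idea described above can be sketched in a few lines of plain Python. This is not XGBoost and not the configuration used in the experiments: it is a minimal illustration in which each round fits a one-split regression stump to the residuals y - p of the log-loss. The function names, the toy data, the learning rate, and the number of rounds are all assumptions of the sketch, and both classes are assumed to be present in y. The helper <monospace>channel_impact</monospace> applies the impact definition from the problem statement, under the additional assumption that the indicator equals 1 when the interaction count is positive.</p>
        <preformat preformat-type="code">
```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares regression stump, returned as (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if t >= row[j]]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or best[0] > sse:
                best = (sse, j, t, lm, rm)
    return best[1:] if best else None


def stump_value(stump, row):
    j, t, lm, rm = stump
    return lm if t >= row[j] else rm


def fit_boosting(X, y, n_rounds=20, learning_rate=0.5):
    """Gradient boosting on the log-loss; each stump fits the residuals y - p."""
    prior = sum(y) / len(y)  # assumes both classes occur in y
    bias = math.log(prior / (1.0 - prior))
    stumps, scores = [], [bias] * len(X)
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, row)
                  for s, row in zip(scores, X)]
    return bias, stumps, learning_rate


def predict_proba(model, row):
    bias, stumps, lr = model
    return sigmoid(bias + lr * sum(stump_value(s, row) for s in stumps))


def log_loss(model, X, y):
    eps = 1e-12
    return -sum(yi * math.log(predict_proba(model, r) + eps)
                + (1 - yi) * math.log(1 - predict_proba(model, r) + eps)
                for r, yi in zip(X, y)) / len(y)


def channel_impact(model, X, j):
    """impact(c_j): scores of users who touched channel j, averaged over all chosen users."""
    return sum(predict_proba(model, row) for row in X if row[j] > 0) / len(X)


# Hypothetical toy data: two channels, eight users.
X = [[2, 0], [1, 1], [0, 3], [0, 1], [1, 0], [0, 0], [3, 1], [0, 2]]
y = [1, 1, 0, 0, 0, 0, 1, 0]
model = fit_boosting(X, y)
baseline = (model[0], [], model[2])  # prior-only model, no stumps
print(log_loss(baseline, X, y), log_loss(model, X, y))
print([round(channel_impact(model, X, j), 3) for j in range(2)])
```
        </preformat>
        <p>A production setting would rely on a tested library implementation and on held-out data for evaluation; the sketch only shows how successive weak learners reduce the training loss.</p>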
      </sec>
      <sec id="sec-2-2">
        <title>Other ensemble approaches and Stacking</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        Today, there are several methods for solving the multichannel attribution
problem. They can be divided into two types: heuristic-based approaches and
data-based approaches [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Heuristic approaches</title>
        <p>Let us consider heuristic-based multichannel attribution techniques.</p>
        <p>
          1. Last-touch attribution is the most common attribution method and is
largely based on intuition. It assigns all the "weight" to the
last channel after which the consumer completed the conversion action. However,
this approach is essentially flawed [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], since it does not take into account the
effect of the other channels through which the consumer was attracted.
        </p>
        <p>2. First-touch attribution, by contrast, gives the greatest
"weight" to the first channel with which the consumer interacted.
Although this method may be useful for understanding how consumers become involved
in an advertising campaign, it does not allow one to correctly assess the
impact of advertising channels on consumers. For this reason, this approach is
used much less often than the others.
        </p>
        <p>3. In the case of linear-touch attribution, the "weight" is distributed equally
among all channels leading to the conversion.</p>
        <p>4. Time decay attribution is a model in which the largest "weights" are given
to the most recent consumer interactions with advertising channels.</p>
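        <p>The four heuristic rules above can be written down compactly. The sketch below is illustrative only: a consumer path is assumed to be an ordered list of channel names ending in a conversion, first-touch is simplified to give the whole weight to the first channel, and the exponential, position-based decay with a <monospace>half_life</monospace> parameter is one possible choice for time decay that the text does not prescribe.</p>
        <preformat preformat-type="code">
```python
def last_touch(path):
    """All weight goes to the last channel before the conversion."""
    return {path[-1]: 1.0}


def first_touch(path):
    """Simplification: all weight goes to the first channel."""
    return {path[0]: 1.0}


def linear_touch(path):
    """Every touch receives the same share of the weight."""
    weights = {}
    for channel in path:
        weights[channel] = weights.get(channel, 0.0) + 1.0 / len(path)
    return weights


def time_decay(path, half_life=1.0):
    """Assumed decay: a touch `half_life` positions earlier gets half the weight."""
    raw = [2.0 ** (-(len(path) - 1 - k) / half_life) for k in range(len(path))]
    total = sum(raw)
    weights = {}
    for channel, w in zip(path, raw):
        weights[channel] = weights.get(channel, 0.0) + w / total
    return weights


# Hypothetical converting path, oldest touch first.
path = ["video", "banner", "social", "banner"]
for rule in (last_touch, first_touch, linear_touch, time_decay):
    print(rule.__name__, rule(path))
```
        </preformat>
        <p>Each rule returns weights that sum to one for a single path; campaign-level weights would be obtained by summing over all converting paths.</p>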
      </sec>
      <sec id="sec-3-2">
        <title>Bagged Logistic Regression</title>
        <p>
          In order to evaluate the impact of advertising channels on consumers'
decision to perform a conversion action, it is proposed to use logistic regression [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
Therefore, the task of multi-channel attribution is reduced to a classification
problem in which all consumers are divided into two classes: those who
performed the conversion action and those who did not perform it within the
short term (during the analysis period).
        </p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] the authors propose the use of logistic regression because of its easily
interpretable coefficients. To cope with the multicollinearity of
independent variables, they propose the bagging technique, which leads to
a stable and reproducible result while maintaining the easy interpretation of
ordinary logistic regression. The training of this meta-algorithm takes place in two
stages:
1. A predetermined proportion of the observations is sampled from the data set.
Features are sampled in the same way, with their own predetermined proportion.
A logistic regression is trained on the resulting data set, and
the estimated logistic regression coefficients
are recorded.
2. Step 1 is repeated M times. The final estimate of each logistic regression
coefficient is obtained by averaging the coefficients obtained
over all iterations.
        </p>
        <p>
          The proportions of observations and features, as well as the number of
iterations M, are hyperparameters of bagged logistic regression. The authors
conclude that for proportions that are not close to 0 or 1, the results of
the meta-algorithm are similar, and the number of iterations does not affect the
results [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
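        <p>The two-stage procedure above can be sketched as follows. This is a plain-Python illustration rather than the cited authors' implementation: the logistic regression is fitted by simple batch gradient descent, a coefficient is averaged over the iterations in which its feature was sampled (one possible reading of "the average of all the coefficients"), and all names, shares, and the toy data are assumptions.</p>
        <preformat preformat-type="code">
```python
import math
import random


def fit_logistic(X, y, n_iter=300, lr=0.1):
    """Batch gradient descent on the logistic log-loss; returns (bias, weights)."""
    p = len(X[0])
    bias, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for row, yi in zip(X, y):
            z = bias + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        bias -= lr * grad_b / len(X)
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
    return bias, w


def bagged_logistic(X, y, n_models=25, row_share=0.5, col_share=0.5, seed=0):
    """Average coefficients over M models fitted on random row/column subsamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    sums, counts = [0.0] * p, [0] * p
    for _ in range(n_models):
        rows = rng.sample(range(n), max(1, round(row_share * n)))
        cols = rng.sample(range(p), max(1, round(col_share * p)))
        sub_X = [[X[i][j] for j in cols] for i in rows]
        sub_y = [y[i] for i in rows]
        _, w = fit_logistic(sub_X, sub_y)
        for j, wj in zip(cols, w):
            sums[j] += wj
            counts[j] += 1
    # Features never sampled keep a coefficient of 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Hypothetical toy data: channel 0 is touched only by converters,
# channel 2 only by non-converters, channel 1 by both.
X = [[2, 0, 0], [1, 1, 0], [3, 0, 0], [1, 0, 0],
     [0, 1, 1], [0, 0, 2], [0, 1, 1], [0, 0, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
coefs = bagged_logistic(X, y)
print([round(c, 3) for c in coefs])
```
        </preformat>
        <p>The averaged coefficients keep the interpretation of ordinary logistic regression: a positive value indicates a channel associated with conversion in the sampled models.</p>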
      </sec>
      <sec id="sec-3-3">
        <title>Markov Chains approach</title>
        <p>
          In this approach, the authors propose using a graph model based on Markov
chains [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Markov chains are probabilistic models that can represent
dependencies between successive observations of a random variable.
        </p>
        <p>The visits of all consumers are represented as chains in a Markov graph. Formally,
this model can be formulated as follows.</p>
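        <p>Let us consider a Markov graph <inline-formula><tex-math>M = (S, W)</tex-math></inline-formula>, which is composed of the
states <inline-formula><tex-math>S = \{s_1, \ldots, s_n\}</tex-math></inline-formula> along with the associated transition matrix <inline-formula><tex-math>W</tex-math></inline-formula> with edge
weights <inline-formula><tex-math>w_{ij} = P(X_t = s_j \mid X_{t-1} = s_i)</tex-math></inline-formula>, such that <inline-formula><tex-math>0 \le w_{ij} \le 1</tex-math></inline-formula> and <inline-formula><tex-math>\sum_{j=1}^{n} w_{ij} = 1</tex-math></inline-formula>
for all <inline-formula><tex-math>i</tex-math></inline-formula>.</p>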
        <p>Consumer chains contain one or more interactions with advertising channels.
In this model, each state corresponds to one advertising channel. Three special
states are also introduced: START is a state that represents the beginning of the
consumer chain, CONVERSION is a state indicating the successful completion of
the conversion action, and for chains that were not completed by the conversion
action, the NULL state is introduced.</p>
        <p>The element of the transition matrix wij corresponds to the probability that
after interacting with advertising channel i, interaction with advertising channel
j will follow. For the first channel in the chain, an incoming connection from the
START state is added. If the consumer's sequence of actions ends with a
conversion, then after the last interaction with the advertising channel, a connection
to the CONVERSION state is added. Otherwise, it falls into the NULL state,
and each CONVERSION state goes into the NULL state as well.</p>
        <p>Since the number of parameters in such a model grows exponentially with
the chain length, the authors limit themselves to a maximum chain order of four.</p>
        <p>To assess the impact of each advertising channel on the conversion action,
the authors propose using the effect of removing the advertising channel <inline-formula><tex-math>s_i</tex-math></inline-formula> from
the model and tracking the change in the probability of reaching CONVERSION
from the START state. Since this removal effect reflects well the degree of change in
conversion, it can serve as an estimate of the contribution of each channel.</p>
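        <p>The removal effect can be illustrated on a first-order chain. The sketch below is a simplification under stated assumptions: the model described above allows chain orders up to four, whereas this code uses order one; the transition from CONVERSION to NULL is omitted because it does not change the probability of reaching CONVERSION; a removed channel is treated as if it led straight to NULL; and the function names and toy paths are hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict


def transition_probs(paths):
    """First-order transition probabilities from (channels, converted) paths."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        end = "CONVERSION" if converted else "NULL"
        states = ["START"] + list(channels) + [end]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}


def conversion_probability(probs, removed=None, n_sweeps=200):
    """P(reach CONVERSION from START); a removed channel behaves like NULL."""
    value = defaultdict(float)  # unknown states (e.g. NULL) default to 0.0
    value["CONVERSION"] = 1.0
    for _ in range(n_sweeps):
        for state, nxt in probs.items():
            if state == removed:
                continue
            value[state] = sum(p * value[b] for b, p in nxt.items() if b != removed)
    return value["START"]


def removal_effects(probs):
    """Relative drop in conversion probability when each channel is removed."""
    base = conversion_probability(probs)
    channels = [s for s in probs if s != "START"]
    return {c: 1.0 - conversion_probability(probs, removed=c) / base
            for c in channels}


# Hypothetical consumer paths: (ordered channels, converted?).
paths = [(["video", "banner"], True), (["banner"], False),
         (["video"], False), (["social", "banner"], True)]
probs = transition_probs(paths)
print(conversion_probability(probs))
print(removal_effects(probs))
```
        </preformat>
        <p>In this toy example every converting path passes through the banner channel, so removing it eliminates all conversions, while removing either of the other channels removes a third of them. The removal effects are often normalised to sum to one before being reported as attribution weights.</p>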
      </sec>
      <sec id="sec-3-4">
        <title>Shapley value approach</title>
        <p>
          This attribution methodology is equivalent to the Shapley value solution to
value distribution in cooperative game theory [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. We refer interested readers
to [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] for a thorough treatment of the first application of the Shapley value
distribution methodology to value allocation in advertising attribution. Here,
we instead provide the reader with a tailored machine learning interpretation
of Shapley values named SHAP (SHapley Additive exPlanation) values [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
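        <p>To compute SHAP values, the authors define <inline-formula><tex-math>f_x(S) = E[f(x) \mid x_S]</tex-math></inline-formula>, where
<inline-formula><tex-math>E[f(x) \mid x_S]</tex-math></inline-formula> is the expected value of the function conditioned on a subset <inline-formula><tex-math>S</tex-math></inline-formula> of
the input features.</p>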
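        <p>SHAP values combine these conditional expectations with the classic Shapley
values from game theory to attribute a value <inline-formula><tex-math>\phi_i</tex-math></inline-formula> to each feature:</p>
        <disp-formula><label>(1)</label><tex-math>\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \left( f_x(S \cup \{i\}) - f_x(S) \right),</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>N</tex-math></inline-formula> is the set of all input features and <inline-formula><tex-math>M = |N|</tex-math></inline-formula> is their number. In our advertising setting, channels play
the role of features.</p>
        <p>All experiments performed in this work were carried out on real data from
advertising campaigns of advertisers from the food industry and the production of
consumer goods. The experiments were carried out on three data sets.</p>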
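        <p>For a small number of channels, the Shapley weighting in Eq. (1) can be evaluated exactly by enumerating all subsets. The sketch below is a hedged illustration: instead of the conditional expectations used by SHAP, it takes an arbitrary coalition value function supplied by the caller, and the conversion rates in the example are invented for demonstration only.</p>
        <preformat preformat-type="code">
```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    m = len(players)
    phi = {}
    for i in players:
        others = [q for q in players if q != i]
        total = 0.0
        for size in range(m):  # subsets of the other players have sizes 0..m-1
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical conversion rates observed for each set of channels shown.
rates = {
    frozenset(): 0.0,
    frozenset({"video"}): 0.10,
    frozenset({"banner"}): 0.20,
    frozenset({"video", "banner"}): 0.40,
}
phi = shapley_values(["video", "banner"], rates.__getitem__)
print(phi)
```
        </preformat>
        <p>The values sum to the rate of the full channel set, which is the efficiency property of the Shapley value. The enumeration grows exponentially with the number of channels, so it is practical only for small channel sets.</p>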
        <p>Each data set contains time-stamped data on the interactions of consumers with
advertising messages. For each consumer and advertising message, the following is
known: the site on which the interaction took place, the category of the advertising
message (online video, promo post on a social network, banner advertising,
etc.), the creative ID, the consumer action category (ad display, click, conversion), the time
stamp, and the identifier of the location of the advertisement on the site.</p>
        <p>It is worth noting that two of the three data sets have a great advantage over
the third. In fact, most data on advertising campaigns is collected only in special
advertising servers, where each consumer is identified by a cookie. This approach
has several disadvantages:</p>
        <list list-type="bullet">
          <list-item><p>A consumer may have multiple cookies, for example, when the consumer uses
multiple devices;</p></list-item>
          <list-item><p>One cookie may belong to multiple consumers, for example, when more than
one person uses the same device;</p></list-item>
          <list-item><p>Advertising servers may "nullify" some cookies when trying to use
third-party targeting services;</p></list-item>
          <list-item><p>In such data, many cookies may be assigned to crawlers, not real persons;</p></list-item>
          <list-item><p>Browsers update cookies every month.</p></list-item>
        </list>
        <p>To cope with the above problems, each cookie is matched with the account
of a real person on the advertiser's website and only after that the consumer ID
is used.</p>
        <p>Below are the statistics for each advertising campaign.</p>
        <p>For advertising campaign 2, additional data about consumers are known: city
of residence, region of residence, platform used (web, iOS, Android, etc.), browser
used, categories of purchased items, and data on the advertiser's offline activities
(TV advertising, outdoor advertising, etc.). These additional data were used in
conjunction with meta-attributes to train the meta-algorithm.</p>
        <p>As follows from Table 5, in our case, the proposed method for solving the
multichannel attribution problem provides better results than the
state-of-the-art solutions according to the ROC AUC metric. However, it can be seen from</p>
        <p>In this work, we considered the problem of multichannel attribution, which is
devoted to assessing the impact of advertising channels on consumer conversion
actions. In the scientific literature, solutions to the multichannel attribution problem are
usually divided into two approaches: heuristic attribution, which is based on intuition,
and data-driven attribution. Due to the growing interest in and development of
cloud technologies in the modern world, many advertisers are beginning to
collect more data about their advertising campaigns, which makes data-driven
approaches the most popular.</p>
        <p>We considered different approaches to multi-channel attribution: the
approach based on game theory, the approach based on Markov chains, and bagging
of logistic regressions, among others.</p>
        <p>To solve the multichannel attribution problem, we used Gradient Boosting
method and an ensemble of classical algorithms to increase the accuracy of
predicting the probability of a conversion action by a consumer.</p>
        <p>The quality of the proposed algorithms was compared with that of conventional
solutions to the multichannel attribution problem on three real data sets. The
proposed solution gave the best result among all of them in terms of ROC AUC. Thus,
the ensemble classification used has improved the ability to estimate the
likelihood that a consumer will perform a conversion action.</p>
        <p>
          One direction for future work might be the analysis of market segments
(represented by groups of users) with respect to the channels where a positive
response was recorded, by means of object-attribute biclustering [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>Acknowledgements. The study was implemented in the framework of the
Basic Research Program at the National Research University Higher School of
Economics (Sections 2 and 5), and funded by the Russian Academic
Excellence Project '5-100'. The second author was also supported by the Russian Science
Foundation (Sections 1, 3, and 4) under grant 17-11-01276 at the St. Petersburg
Department of Steklov Mathematical Institute of Russian Academy of Sciences,
Russia.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>The volume of advertising in its distribution media in 2017</article-title>
          . https://www.akarussia.ru/knowledge/market_size/id8180 – Official website of the Association of Communication Agencies of Russia (
          <year>2017</year>
          ), [Online; last accessed 30-Sep-2019]
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>The Official Blog of Kaggle.com</article-title>
          . http://blog.kaggle.com/tag/xgboost (
          <year>2019</year>
          ), [Online; last accessed 30-Sep-2019]
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <article-title>The volume of advertising in the means of its distribution in the first half of 2019</article-title>
          . http://www.akarussia.ru/knowledge/market_size/id8955 – Official website of the Association of Communication Agencies of Russia (
          <year>2019</year>
          ), [Online; last accessed 30-Sep-2019]
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Amini</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Truong</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Goutte</surname>, <given-names>C.</given-names></string-name>:
          <article-title>A boosting algorithm for learning bipartite ranking functions with partially labeled data</article-title>.
          In: <source>Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>, SIGIR 2008, Singapore, July 20-24, 2008. pp.
          <fpage>99</fpage>–<lpage>106</lpage>
          (<year>2008</year>). https://doi.org/10.1145/1390334.1390354
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Anderl</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Becker</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>von Wangenheim</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Schumann</surname>, <given-names>J.H.</given-names></string-name>:
          <article-title>Mapping the customer journey: A graph-based framework for online attribution modeling</article-title>
          (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Dalessandro</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Perlich</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Stitelman</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Provost</surname>, <given-names>F.</given-names></string-name>:
          <article-title>Causally motivated attribution for online advertising</article-title>.
          In: <source>Proceedings of the Sixth International Workshop on Data Mining for Online Advertising and Internet Economy</source>. pp.
          <fpage>7:1</fpage>–<lpage>7:9</lpage>. ADKDD '12, ACM, New York, NY, USA
          (<year>2012</year>). https://doi.org/10.1145/2351356.2351363
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>van Eeden</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Chow</surname>, <given-names>W.</given-names></string-name>:
          <source>Perspectives from the Global Entertainment &amp; Media Outlook 2019–2023</source>. https://www.pwc.com/gx/en/industries/tmt/media/outlook.html
          (<year>2019</year>), [Online; last accessed 30-Sep-2019]
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Ignatov</surname>, <given-names>D.I.</given-names></string-name>,
          <string-name><surname>Kuznetsov</surname>, <given-names>S.O.</given-names></string-name>,
          <string-name><surname>Poelmans</surname>, <given-names>J.</given-names></string-name>:
          <article-title>Concept-based biclustering for internet advertisement</article-title>.
          In: <source>12th IEEE International Conference on Data Mining Workshops</source>, ICDM Workshops, Brussels, Belgium, December 10, 2012. pp.
          <fpage>123</fpage>–<lpage>130</lpage>
          (<year>2012</year>). https://doi.org/10.1109/ICDMW.2012.100
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Lovett</surname>, <given-names>J.</given-names></string-name>:
          <article-title>Attribution Methods and Models: A Marketers Framework</article-title>
          . Web Analytics Demystified, Inc. http://media.dmnews.com/documents/52/attribution_methods_and_models_12971.pdf
          (<year>2014</year>), [Online; last accessed 30-Sep-2019]
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><surname>Lundberg</surname>, <given-names>S.M.</given-names></string-name>,
          <string-name><surname>Lee</surname>, <given-names>S.I.</given-names></string-name>:
          <article-title>A unified approach to interpreting model predictions</article-title>.
          In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.)
          <source>Advances in Neural Information Processing Systems 30</source>, pp.
          <fpage>4765</fpage>–<lpage>4774</lpage>. Curran Associates, Inc.
          (<year>2017</year>), http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><surname>Schapire</surname>, <given-names>R.E.</given-names></string-name>:
          <article-title>The strength of weak learnability</article-title>.
          <source>Machine Learning</source>
          <volume>5</volume>,
          <fpage>197</fpage>–<lpage>227</lpage>
          (<year>1990</year>). https://doi.org/10.1007/BF00116037
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><surname>Shao</surname>, <given-names>X.</given-names></string-name>,
          <string-name><surname>Li</surname>, <given-names>L.</given-names></string-name>:
          <article-title>Data-driven multi-touch attribution models</article-title>.
          In: <source>Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, San Diego, CA, USA, August 21-24, 2011. pp.
          <fpage>258</fpage>–<lpage>264</lpage>
          (<year>2011</year>). https://doi.org/10.1145/2020408.2020453
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><surname>Shapley</surname>, <given-names>L.S.</given-names></string-name>:
          <article-title>A value for n-person games</article-title>.
          <source>Contributions to the Theory of Games</source>
          <volume>2</volume>(<issue>28</issue>),
          <fpage>307</fpage>–<lpage>317</lpage>
          (<year>1953</year>)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name><surname>Valizadegan</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Jin</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Zhang</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Mao</surname>, <given-names>J.</given-names></string-name>:
          <article-title>Learning to rank by optimizing NDCG measure</article-title>.
          In: <source>Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009</source>. Proceedings of a meeting held 7-10 December 2009, Vancouver, British Columbia, Canada. pp.
          <fpage>1883</fpage>–<lpage>1891</lpage>
          (<year>2009</year>), http://papers.nips.cc/paper/3758-learning-to-rank-by-optimizing-ndcg-measure
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name><surname>Wolpert</surname>, <given-names>D.H.</given-names></string-name>:
          <article-title>Stacked generalization</article-title>.
          <source>Neural Networks</source>
          <volume>5</volume>(<issue>2</issue>),
          <fpage>241</fpage>–<lpage>259</lpage>
          (<year>1992</year>). https://doi.org/10.1016/S0893-6080(05)80023-1
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name><surname>Zhang</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Wei</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Ren</surname>, <given-names>J.</given-names></string-name>:
          <article-title>Multi-touch attribution in online advertising with survival theory</article-title>.
          In: <source>2014 IEEE International Conference on Data Mining, ICDM 2014</source>, Shenzhen, China, December 14-17, 2014. pp.
          <fpage>687</fpage>–<lpage>696</lpage>
          (<year>2014</year>). https://doi.org/10.1109/ICDM.2014.130
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>