<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Uplift Modeling with ROC: An SRL Case Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Houssam Nassif</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Finn Kuusisto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elizabeth S. Burnside</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jude Shavlik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Wisconsin</institution>
          ,
          <addr-line>Madison</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>40</fpage>
      <lpage>45</lpage>
      <abstract>
        <p>Uplift modeling is a classi cation method that determines the incremental impact of an action on a given population. Uplift modeling aims at maximizing the area under the uplift curve, which is the di erence between the subject and control sets' area under the lift curve. Lift and uplift curves are seldom used outside of the marketing domain, whereas the related ROC curve is frequently used in multiple areas. Achieving a good uplift using an ROC-based model instead of lift may be more intuitive in several areas, and may help uplift modeling reach a wider audience. We alter SAYL, an uplift-modeling statistical relational learner, to use ROC instead of lift. We test our approach on a screening mammography dataset. SAYL-ROC outperforms SAYL on our data, though not significantly, suggesting that ROC can be used for uplift modeling. On the other hand, SAYL-ROC returns larger models, reducing interpretability.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Uplift modeling is a modeling and classi cation method initially used in
marketing to determine the incremental impact of an advertising campaign on a given
population [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Seminal work includes Radcli e and Surry's true response
modeling [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Lo's true lift model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and Hansotia and Rukstales' incremental value
modeling [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In some applications, especially medical decision support systems,
gaining insight into the underlying classi cation logic can be as important as
system performance. Reviewing the classi cation logic in medical problems can
be an important method to discover disease patterns that may not be known
or easily otherwise gleaned from the data. Such insight can be achieved using
rule-learners. Decision trees [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], inductive logic programming (ILP) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and
statistical relational learning (SRL) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] methods have been proposed.
      </p>
      <p>Uplift modeling aims at maximizing uplift, which is the di erence in a model
or intervention M 's lift scores over the subject and control sets:
U plif tM = Lif tM (subject)</p>
      <p>
        Lif tM (control):
(1)
Given a fraction such that 0 1, a model M 's lift is de ned as the
number of positive examples amongst the model's -highest ranking examples.
Uplift thus captures the additional number of positive examples obtained due to
the intervention. Quality of an uplift model is often evaluated by computing an
uplift curve [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], generated by ranging from 0 to 1 and plotting U plif tM . The
higher the uplift curve, the more pro table a marketing model/intervention is.
The area under the uplift curve (AUU) is often used as a metric to optimize.
      </p>
      <p>Let P be the number of positive examples and N the number of negative
examples in a given dataset D. Lift represents the number of true positives
detected by model m amongst the top-ranked fraction . Varying 2 [0; 1]
produces a lift curve. The area under the lift curve (AUL) for a given model and
data becomes:</p>
      <p>Z
AU L =</p>
      <p>Lif t(D; )d
1 P +N</p>
      <p>X ( k+1
2
k=1
k)(Lif t(D; k+1) + Lif t(D; k)) (2)</p>
      <p>Let s be the subject set, and c the controls. For a given , we can rewrite
equation 1 as:</p>
      <p>U plif tM ( ) = Lif tM (s; )</p>
      <p>Lif tM (c; ):
Since uplift is a function of a single value for , the area under the uplift curve
(AUU) is the di erence between the areas under the lift curves (AUL) for the
subjects and the controls, (AU L):</p>
      <p>AU U = AU Ls</p>
      <p>AU Lc =
(AU L):</p>
      <p>
        Lift and uplift curves are seldom used outside of the marketing domain,
whereas the related ROC curve is frequently used in the machine learning and
biomedical informatics communities. Especially in the biomedical domain, using
ROC may be more intuitive, and may help uplift modeling reach a wider
audience. This work investigates the use of the area under the ROC curve (AUR)
as an alternate scoring method, while still resulting in a good model uplift. We
alter SAYL [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the state-of-the-art relational uplift modeling algorithm, to
select rules that optimize (AU R) instead of (AU L). We test our approach on
a screening mammography dataset.
2
      </p>
      <p>Lift and ROC Area Under the Curve
There is a strong connection between AUL and AUR. Let
probability for the positive class or skew, then:</p>
      <p>P
= P +N be the prior
(</p>
      <p>2
AU L = P
+ (1</p>
      <p>) AU R) [11; p:549]:</p>
      <p>
        Uplift modeling aims at optimizing uplift, the di erence in lift over two sets.
It constructs a new classi er such that:
(AU L ) &gt;
(AU L)
As discussed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], by expanding and simplifying we get:
      </p>
      <p>AU Ls</p>
      <p>AU Lc &gt; AU Ls</p>
      <p>AU Lc
Ps( s + (1
2
s)AU Rs )</p>
      <p>Pc( c</p>
      <p>2
Ps(1
s)AU Rs
Ps(1</p>
      <p>Pc(1
(3)
(4)
(5)
(6)
and nally</p>
      <p>AU Rs
AU Rc</p>
      <p>AU Rs &gt; Pc 1
AU Rc Ps 1
c :
s</p>
      <p>In a balanced dataset, we have c = s = 12 and Pc = Ps, so we have that
Pc 1 c = 1. If the subject and control sets have the same numbers and skew,
Ps 1 s
we can conclude that (AU L ) &gt; (AU L) implies (AU R ) &gt; (AU R).
If the two sets are skewed or their numbers di er, we cannot guarantee that
(AU L ) &gt; (AU L) implies (AU R ) &gt; (AU R), as we can increase uplift
with rules that have similar accuracy but cover more cases in the positive set. In
general, the two metrics are related, with uplift being more sensitive to variations
in coverage when the two groups have di erent size.
(7)
3</p>
    </sec>
    <sec id="sec-2">
      <title>SAYL-ROC</title>
      <p>
        SAYL [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a Statistical Relational Learner based on SAYU [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that integrates
uplift modeling with the search for relational rules. Similar to SAYU, every
valid rule generated is used to construct a Bayesian network (alongside with
current theory rules) via propositionalization, but instead of constructing a single
classi er, SAYL constructs two TAN [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] classi ers; one Bayes net for each of the
subject and control groups. Both classi ers use the same set of attributes, but
are trained only on examples from their respective groups. SAYL uses the TAN
generated probabilities to construct the lift and uplift curves. If a rule improves
AUU by threshold , the rule is added to the attribute set. Otherwise, SAYL
continues the search.
      </p>
      <p>Algorithm 1 SAYL</p>
      <p>Rs fg; M0s; M0c InitClassif iers(Rs)
while DoSearch() do
es+ RandomSeed();
?es+ saturate(e);
while c reduce(?es+ ) do</p>
      <p>M s; M c LearnClassif iers(Rs [ fcg);
if Better(M s; M c; M0s; M0c) then</p>
      <p>Rs Rs [ fcg; M0s; M0c M s; M c;
break
end if
end while
end while</p>
      <p>
        The SAYL algorithm is shown as Algorithm 1. SAYL maintains a current set
of clauses, Rs, and current reference classi ers for the subjects M s and controls
M c. SAYL requires separate training and tuning sets, accepting a rule only
when it improves the score on both sets. This requirement is extended with the
threshold of improvement , and a minimal rule coverage requirement minpos.
Finally, SAYL has two search modes, greedy and exploration. Refer to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for
details.
      </p>
      <p>SAYL guides the rule search by using the AUU score. It computes AUU by
computing AUL for each of the groups using the two classi ers, and returning the
di erence (AU L) (Equation 4). We implement SAYL-ROC, a SAYL variant
that computes AUR instead for each of the groups using the two classi ers, and
returns (AU R) as a rule score to guide the search. SAYL thus optimizes for
(AU L), while SAYL-ROC optimizes for (AU R).</p>
      <p>AU Rs
AU Rc</p>
      <p>AU Rs &gt; 0:86:</p>
      <p>AU Rc
4</p>
    </sec>
    <sec id="sec-3">
      <title>Experimental Results</title>
      <p>
        We test SAYL-ROC on a breast cancer mammography dataset, fully described
in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Our subject and control sets are respectively older and younger patients
with con rmed breast cancer. Positive instances have in situ cancer, and negative
instances have invasive cancer. The aim is to maximize the in situ cases' uplift.
      </p>
      <p>The older cohort has 132 in situ and 401 invasive cases, while the younger
132
one has 110 in situ and 264 invasive. The skews are Ps = 132, s = 132+401
110
(older), and Pc = 110, c = 110+264 (younger). Thus equation 7 becomes:
(8)</p>
      <p>We use 10-fold cross-validation, making sure all records pertaining to the
same patient are in the same fold. We run both SAYL and SAYL-ROC with a
time limit of one hour per fold. For each cross-validated run, we use 4 training,
5 tuning and 1 testing folds. For each fold, we used the best combination of
parameters according to a 9-fold internal cross-validation using 4 training, 4
tuning and 1 testing folds. We try both search modes, vary minpos between 7
and 13 (respectively 5% and 10% of older in situ examples), and set to 1%,
5% and 10%. We evaluate the nal SAYL and SAYL-ROC models using their
nal uplift curves, concatenated from the results of each testing set.</p>
      <p>
        Table 1 compares SAYL-ROC, SAYL, and the ILP-based methods Di
erential Prediction Search (DPS) and Model Filtering (MF) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], both of which had
minpos = 13 (10% of older in situ). A baseline random classi er achieves an
AUU of 11. We use the paired Mann-Whitney test at the 95% con dence level
to compare two sets of experiments. We plot the uplift curves in Figure 1.
AU Lc
SAYL and SAYL-ROC signi cantly outperform previous methods (Table 1,
Figure 1), but there is no signi cant di erence between the two. Even though
SAYLROC is optimizing for (AU R) during its training phase, it returns a slightly
better testing (AU L) than SAYL, which optimizes for (AU L).
      </p>
      <p>This result suggests that, on a moderately subject/control skewed data, AUR
can indeed be used for uplift modeling. ROC is more frequently used than lift,
and may be more intuitive in many domains. Nevertheless, more experiments
are needed to establish ROC-based uplift performance. We plan on measuring
(AU L) vs. (AU R) for various Equation 7 skews.</p>
      <p>
        SAYL-ROC produces as many rules as ILP-based methods, more than twice
that of SAYL. The ILP theory is a collection of independent rules that each
individually increases uplift [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It is thus easy to interpret the nal model. SAYL and
SAYL-ROC theory rules are conditioned on each other as nodes in a Bayesian
network, decreasing rule interpretability especially in larger graphs. Individual
rules may not increase uplift, but the nal network does. At an average of 9:3
rules, a SAYL model is interpretable, whereas at 24:7, SAYL-ROC sacri ces
interpretability.
      </p>
      <p>We note that Equation 7 depends on both the positive number and skew.
Even if the subject and control positive skews were equal, say Pc = 100, Nc =
200, Ps = 10 and Ns = 20, we will have 11 s
c = 1 but PPsc = 10, maintaining a
subject/control Equation 7 skew.</p>
      <p>This work uses the de nition of lift as the number of positives amongst the
highest ranking examples. An alternative lift de nition is the fraction of positives
amongst the -highest ranking examples. Equation 7 then becomes:
(9)</p>
      <p>AU Rs &gt; 1
AU Rc 1
eliminating the dependence on the number of positive instances. We plan on
investigating how (AU L) and (AU R) empirically relate under this de nition.</p>
      <p>In conclusion, SAYL-ROC exhibits a similar performance to SAYL on our
data, suggesting that ROC can be used for uplift modeling. SAYL-ROC returns
larger models, reducing interpretability. More experiments are needed to test
ROC-based uplift over di erent subject/control skews.</p>
      <p>Acknowledgments We thank NIH grant R01-CA165229, the Carbone Cancer
Center, and NCI grant P30CA014520 for support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burnside</surname>
            ,
            <given-names>E.S.</given-names>
          </string-name>
          , de Castro Dutra,
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>Page</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Santos</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          :
          <article-title>An integrated approach to learning Bayesian Networks of rules</article-title>
          .
          <source>In: Proceedings of the 16th European Conference on Machine Learning</source>
          . pp.
          <volume>84</volume>
          {
          <fpage>95</fpage>
          .
          <string-name>
            <surname>Porto</surname>
          </string-name>
          ,
          <string-name>
            <surname>Portugal</surname>
          </string-name>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geiger</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldszmidt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Bayesian network classi ers</article-title>
          .
          <source>Machine Learning</source>
          <volume>29</volume>
          ,
          <volume>131</volume>
          {
          <fpage>163</fpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hansotia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rukstales</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Incremental value modeling</article-title>
          .
          <source>Journal of Interactive Marketing</source>
          <volume>16</volume>
          (
          <issue>3</issue>
          ),
          <volume>35</volume>
          {
          <fpage>46</fpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lo</surname>
            ,
            <given-names>V.S.:</given-names>
          </string-name>
          <article-title>The true lift model - a novel data mining approach to response modeling in database marketing</article-title>
          .
          <source>SIGKDD Explorations</source>
          <volume>4</volume>
          (
          <issue>2</issue>
          ),
          <volume>78</volume>
          {
          <fpage>86</fpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Nassif</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ayvaci</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shavlik</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burnside</surname>
            ,
            <given-names>E.S.:</given-names>
          </string-name>
          <article-title>Uncovering age-speci c invasive and DCIS breast cancer rules using Inductive Logic Programming</article-title>
          .
          <source>In: ACM International Health Informatics Symposium (IHI)</source>
          . pp.
          <volume>76</volume>
          {
          <fpage>82</fpage>
          .
          <string-name>
            <surname>Arlington</surname>
            ,
            <given-names>VA</given-names>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Nassif</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuusisto</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burnside</surname>
            ,
            <given-names>E.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shavlik</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Santos</given-names>
            <surname>Costa</surname>
          </string-name>
          , V.:
          <article-title>Score as you lift (SAYL): A statistical relational learning approach to uplift modeling</article-title>
          .
          <source>In: European Conference on Machine Learning (ECML-PKDD)</source>
          . pp.
          <volume>595</volume>
          {
          <fpage>611</fpage>
          .
          <string-name>
            <surname>Prague</surname>
          </string-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Nassif</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Santos</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Burnside</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.S.</given-names>
            ,
            <surname>Page</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          :
          <article-title>Relational di erential prediction</article-title>
          .
          <source>In: European Conference on Machine Learning (ECML-PKDD)</source>
          . pp.
          <volume>617</volume>
          {
          <fpage>632</fpage>
          .
          <string-name>
            <surname>Bristol</surname>
          </string-name>
          , UK (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Radcli e,
          <string-name>
            <given-names>N.J.</given-names>
            ,
            <surname>Surry</surname>
          </string-name>
          , P.D.:
          <article-title>Di erential response analysis: Modeling true response by isolating the e ect of a single action</article-title>
          .
          <source>In: Credit Scoring and Credit Control VI. Edinburgh</source>
          ,
          <string-name>
            <surname>Scotland</surname>
          </string-name>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Radcli e,
          <string-name>
            <given-names>N.J.</given-names>
            ,
            <surname>Surry</surname>
          </string-name>
          , P.D.:
          <article-title>Real-world uplift modelling with signi cance-based uplift trees</article-title>
          .
          <source>White Paper TR-2011-1</source>
          ,
          <string-name>
            <given-names>Stochastic</given-names>
            <surname>Solutions</surname>
          </string-name>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Rzepakowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaroszewicz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Decision trees for uplift modeling with single and multiple treatments</article-title>
          .
          <source>Knowledge and Information Systems</source>
          <volume>32</volume>
          ,
          <fpage>303</fpage>
          {
          <fpage>327</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <article-title>Tu ery, S.: Data Mining and Statistics for Decision Making</article-title>
          . John Wiley &amp; Sons, 2nd edn. (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>