<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Adaptive Incremental Learning for Statistical Relational Models Using Gradient-Based Boosting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yulong Gu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Missier</string-name>
          <email>paolo.missier@newcastle.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing Science, Newcastle University Newcastle upon Tyne</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>4</fpage>
      <lpage>6</lpage>
      <abstract>
<p>We consider the problem of incrementally learning models from relational data. Most existing learning methods for statistical relational models use batch learning, which becomes computationally expensive and eventually infeasible for large datasets. The majority of previous work in relational incremental learning assumes that the model's structure is given and only the model's parameters need to be learned. In this paper, we propose algorithms that can incrementally learn the model's parameters and structure simultaneously. These algorithms are based on the successful formalisation of the relational functional gradient boosting system (RFGB), and extend the classical propositional ensemble methods to relational learning for handling evolving data streams.</p>
      </abstract>
      <kwd-group>
        <kwd>Gradient-Based Boosting</kwd>
        <kwd>Incremental Learning</kwd>
<kwd>Hoeffding Bound</kwd>
        <kwd>Ensemble Methods</kwd>
        <kwd>Statistical Relational Learning</kwd>
        <kwd>Relational Regression Tree</kwd>
        <kwd>Concept Drift</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        There have already been extensions that enable the RFGB system to handle imbalanced data with Soft Margin [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and incomplete data with Structural EM [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In this paper, inspired by HTILDE-RT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and DWM [6], we further develop the RFGB system for incremental learning: we upgrade the RRT used in RFGB with the Hoeffding bound and the concept adaptation strategy successfully employed in CVFDT [7], and we combine the upgraded RRT with ensemble methods through a Hoeffding-based stability metric to cope with concept drift.
      </p>
      <p>In this work, the Hoeffding Relational Regression Tree (HRRT), Relational Incremental Boosting (RIB) and the Relational Boosted Forest (RBF) are introduced for adaptive incremental learning in the SRL setting. This paper is organized as follows: in Section 2, the background, including RFGB and CVFDT, is reviewed; in Section 3, rule stability, HRRT, RIB, and RBF are explained; in Section 4, conclusions and future work are provided.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <sec id="sec-2-1">
        <title>Relational Functional Gradient Boosting</title>
        <p>
          Friedman [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] proposed a boosting approach where functional gradients are computed for each example over the
objective function. The RFGB [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] considers the objective function as a sigmoid function (Equation 2). These
gradients correspond to the di erence between the true label and predicted probability for an example and are
then used to generate a regression dataset. In each iteration, a RRT i is learned to t to these gradients and
added into the potential function m:
        </p>
        <p>ψm = ψ0 + ∆1 + ... + ∆m (1)</p>
        <p>This approach learns a boosted probabilistic model that corresponds to the local conditional probability distribution in some SRL models.</p>
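        <p>To make the boosting step concrete, the following minimal Python sketch illustrates one RFGB-style iteration under the sigmoid objective referred to as Equation 2: the functional gradient of an example is the difference between its true label and its predicted probability, and a regression tree ∆ is fitted to these gradients. The learner fit_rrt, the tree interface and the example representation are hypothetical placeholders; the actual RFGB implementation operates on relational examples.</p>
        <preformat>
import math

def sigmoid(x):
    # predicted probability P(y = 1 | example) when psi(example) = x
    return 1.0 / (1.0 + math.exp(-x))

def boosting_round(psi_trees, examples, fit_rrt):
    """One functional-gradient iteration: fit an RRT (Delta) to the gradients.

    psi_trees : list of trees whose summed predictions give the potential psi
    examples  : iterable of (example, label) pairs with label in {0, 1}
    fit_rrt   : hypothetical learner fitting a relational regression tree
                to (example, gradient) pairs
    """
    regression_data = []
    for x, y in examples:
        psi_x = sum(t.predict(x) for t in psi_trees)   # current potential value
        gradient = y - sigmoid(psi_x)                  # true label - predicted prob.
        regression_data.append((x, gradient))
    delta = fit_rrt(regression_data)                   # Delta_i fitted to gradients
    psi_trees.append(delta)                            # psi_m = psi_0 + ... + Delta_m
    return psi_trees
        </preformat>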
        <p>The propositional incremental decision tree learner VFDT [8] uses Hoeffding bounds to guarantee that its output is asymptotically nearly identical to that of a batch learner. CVFDT [7] introduces a concept adaptation strategy to VFDT for handling concept drift: it keeps the incrementally learned tree consistent with a sliding window of examples. In this paper, we call the contents of a sliding window the window data. The concept adaptation strategy is implemented as follows. The statistics of the window data are encoded into the sufficient statistics of the nodes in the tree. These sufficient statistics are then used to periodically check whether the Hoeffding bound holds at each node; if not, the algorithm creates an alternate sub-tree for the node to compete with the failed sub-tree in terms of prediction accuracy. If the alternate sub-tree beats the failed one, the failed sub-tree and the old conflicting rules encoded in it are discarded and replaced by the alternate sub-tree. The disadvantage of eliminating old conflicting rules entirely is that the model only supports the current window data, which usually results in larger prediction variance.</p>
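        <p>The following condensed Python sketch (all names are ours, not CVFDT's actual interface; grow_subtree and accuracy are injected helpers) outlines this adaptation strategy at a single inner node: the window statistics feed a periodic Hoeffding-bound check, a failing node grows an alternate sub-tree, and the alternate replaces the node only once it wins on prediction accuracy.</p>
        <preformat>
def concept_adaptation(node, window, epsilon, grow_subtree, accuracy):
    # Hypothetical sketch of CVFDT-style concept adaptation at one inner node.
    node.update_sufficient_statistics(window)        # encode the window data
    delta_g = node.gain(node.working_test) - node.gain(node.second_best_test)
    if delta_g &lt; epsilon:                            # Hoeffding bound fails here
        if node.alternate is None:
            node.alternate = grow_subtree(window)    # start a competing sub-tree
        elif accuracy(node.alternate, window) &gt; accuracy(node, window):
            return node.alternate                    # old conflicting rules discarded
    return node
        </preformat>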
        <p>In the following sections, we introduce our algorithms, which use ensemble methods to handle concept drift by allowing the coexistence of conflicting rules in the relational setting.</p>
        <p>In this paper, we qualify a tree as stable with the following considerations: the tree is highly consistent with the window data, and the rules encoded in the tree are real rules that can, with high confidence, interpret the distribution associated with the data generator. We boost a tree when it is stable, so that the objective function (Equation 2) is best optimized for the current window data and the newly found stable rules can be solidified and transformed into established rules. On the other hand, a stable tree is highly resistant to concept drift, as the newly found stable rules require many counter-examples to be invalidated. Inspired by DWM [6], in which ensemble methods are shown by extensive experiments to be efficient at adapting to drifting concepts, we propose to employ ensemble methods to tackle the concept drift problem in the relational setting.</p>
        <p>In the presence of concept drift, we want to know how stable the rules that the tree has learned so far are, as an indicator of the stability of the tree. We introduce a metric that measures the stability of a tree as follows.</p>
        <p>Definition 1. Define the Rule Stability of a model as the size n of the smallest change to the sample D that may cause a conflicting rule r′ to become superior to the learned rule r, as shown in Equation 3, where D′ is D after the change:</p>
        <p>Learner : (Diff(D, D′) = n, r) → r′ (3)</p>
        <p>According to CVFDT [7], to guarantee that the rule learned from a data sample is the real rule r of the corresponding population with desired confidence 1 − δ, the condition ∆Ḡ(Xa, Xb) &gt; ε must be met, where ∆Ḡ(Xa, Xb) = Ḡ(Xa) − Ḡ(Xb), Ḡ(Xi) is the average of the results of the splitting function G(Xi), Xa and Xb are respectively the working test and the second-best test at a node, and ε is a bound calculated from the sample size, the choice of splitting criterion, and δ. As the streaming process continues, a rule r′ conflicting with rule r might be introduced. For r′ to be adopted, the condition ∆Ḡ(Xa, Xb) &lt; ε must be met to trigger the concept adaptation strategy. Therefore, with confidence 1 − δ, the size of the smallest change that may cause r′ to become superior to r in the CVFDT context is Tolerance = ∆Ḡ(Xa, Xb) − ε. By Definition 1, the Tolerance measures the rule stability of an inner node, and TreeTol, the sum of the Tolerance of every inner node in a tree, denotes the stability of the tree.</p>
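        <p>As a concrete reading of these quantities, the Python sketch below computes the Hoeffding bound ε = sqrt(R² ln(1/δ) / 2n) for a sample of size n, where R is the range of the splitting criterion G, and the resulting Tolerance of an inner node; the function names are ours.</p>
        <preformat>
import math

def hoeffding_bound(value_range, delta, n):
    # epsilon = sqrt(R^2 ln(1/delta) / 2n); holds with confidence 1 - delta
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def node_tolerance(gain_best, gain_second_best, value_range, delta, n):
    # Tolerance = (mean gain of working test - mean gain of second-best test)
    #             - epsilon: the margin beyond the Hoeffding bound
    return (gain_best - gain_second_best) - hoeffding_bound(value_range, delta, n)
        </preformat>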
      </sec>
      <sec id="sec-2-2">
        <title>Hoeffding Relational Regression Tree</title>
        <p>
          Our HRRT algorithm upgrades the RRT learner used in RFGB by incorporating methods from CVFDT [7]. Its test search space includes conjunctions of recursive and aggregated predicates, obtained by using a refinement operator with θ-subsumption and aggregate conditions [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Most of the algorithm is similar to CVFDT, except that HRRT uses the learning-from-interpretations setting; for a detailed explanation, please refer to CVFDT and HTILDE-RT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. We implement TreeTol in the StabilityCheck function with a user-defined threshold to periodically check whether the tree qualifies as stable: the tree is stable when TreeTol is greater than the threshold. HRRT and StabilityCheck are the building blocks of the following algorithms.
        </p>
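        <p>A minimal sketch of such a StabilityCheck, assuming a hypothetical node interface and reusing hoeffding_bound from the sketch above:</p>
        <preformat>
def stability_check(tree, threshold, delta):
    # TreeTol is the sum of the per-node Tolerances (Definition 1); the tree
    # qualifies as stable when TreeTol exceeds the user-defined threshold.
    tree_tol = 0.0
    for node in tree.inner_nodes():
        eps = hoeffding_bound(node.criterion_range, delta, node.example_count)
        tree_tol += (node.gain_best - node.gain_second_best) - eps
    return tree_tol &gt; threshold
        </preformat>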
      </sec>
      <sec id="sec-2-3">
        <title>Relational Incremental Boosting</title>
        <p>As shown in Algorithm 1, the initial tree ψ is incrementally learned by HRRT and is boosted once it has satisfied the StabilityCheck (line 9). The empty tree ∆ is then trained to fit the functional gradients of the boosted tree ψ for the examples in the window data (lines 5-7). In the presence of concept drift, the error introduced by the old conflicting rules in ψ will be corrected by ∆. When ∆ has also met the StabilityCheck and the execution signal S is true (line 6), the algorithm adds ∆ to ψ and then boosts their sum: since ∆ itself is just a functional gradient tree (FGT) of ψ, only their composite can represent the newly found stable rules.</p>
        <p>Algorithm 1 Relational Incremental Boosting
1: procedure RIB(DataStream, p)
2:   Initialize empty trees ψ and ∆
3:   for each d in DataStream do
4:     After every p examples do {ψ, S} ← EvalCentre(ψ, d)
5:     if ψ.boosted then ∆ ← HRRT(∆, GradExpGen(ψ, d))
6:       if StabilityCheck(∆) and S then Boosting(ψ + ∆) and reset ∆
7:       end if
8:     else ψ ← HRRT(ψ, d)
9:       if StabilityCheck(ψ) then Boosting(ψ), ψ.boosted ← True
10:      end if
11:    end if
12:  end for
13:  return (ψ + ∆)
14: end procedure</p>
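        <p>Read procedurally, Algorithm 1 corresponds to the Python-style sketch below. HRRT, GradExpGen, Boosting, EvalCentre and StabilityCheck stand for the components described in this paper, but their interfaces here, like EmptyTree and the + that composes two trees' regression values, are illustrative assumptions only.</p>
        <preformat>
def rib(data_stream, p, hrrt, grad_exp_gen, boosting, eval_centre, stability_check):
    psi, delta = EmptyTree(), EmptyTree()     # psi: initial tree, delta: its FGT
    s = True                                  # execution signal from EvalCentre
    for i, d in enumerate(data_stream, start=1):
        if i % p == 0:                        # after every p examples
            psi, s = eval_centre(psi, d)
        if psi.boosted:
            # fit delta to the functional gradients of the boosted psi
            delta = hrrt(delta, grad_exp_gen(psi, d))
            if stability_check(delta) and s:
                boosting(psi + delta)         # boost the composite, not delta alone
                delta = EmptyTree()           # reset delta
        else:
            psi = hrrt(psi, d)                # grow the initial tree
            if stability_check(psi):
                boosting(psi)
                psi.boosted = True
    return psi + delta
        </preformat>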
        <p>The execution signal S is set by EvalCentre (line 4), which handles two crucial issues shared by RIB and the following RBF. One issue is that the complexity of the resulting model grows with the increasing size of the learned data. Inspired by DWM [6], we introduce an evaluation centre that periodically evaluates the contribution to error of each FGT in the resulting model and discards the FGTs that perform poorly over time. In this way, the model's complexity decreases at the expense of its integrity. The other issue is that, if the concept never drifts, new boosted trees will be added to the model constantly while providing only trivial improvements in performance. In this case, EvalCentre evaluates the global performance of the model and stops further boosting by setting the signal S to false once the performance reaches a pre-defined threshold, indicating that strong consistency of the model with the window data has been achieved.</p>
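        <p>One possible shape for EvalCentre, covering both issues, is sketched below; the error-attribution and performance measures are simplifying assumptions of ours, not a specification of the actual component.</p>
        <preformat>
def eval_centre(model, window, err_threshold, perf_threshold):
    # Issue 1: discard FGTs whose contribution to error is persistently poor,
    # trading some of the model's integrity for lower complexity
    model.fgts = [t for t in model.fgts
                  if t.error_contribution(window) &lt; err_threshold]
    # Issue 2: once global performance reaches the pre-defined threshold, the
    # model is strongly consistent with the window data, so stop boosting
    s = model.performance(window) &lt; perf_threshold
    return model, s
        </preformat>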
      </sec>
      <sec id="sec-2-4">
        <title>Relational Boosted Forest</title>
        <p>In this explanation of RBF, the model makes predictions based on the weighted average of the regression values of the trees in the forest, with normalized weights; other prediction models can be applied depending on the scenario. As shown in Algorithm 2, the forest and its weights are initialized along with an empty tree ψ and its associated weight w set to 1 (line 2). When ψ has passed the StabilityCheck and the execution signal S is true, the boosted ψ and its weight w are added to the forest and the weights respectively (lines 7-8). When a boosted tree makes a mistake in a predictive attempt, the EvalCentre in RBF decreases its weight. In this way, the forest is recursively populated: each boosted tree contains established rules that co-exist in the forest, and the weights are dynamically tuned to adapt to the window data. In response to the complexity problem, poorly performing trees whose weight falls below a pre-defined threshold are removed from the forest. The signal S is set in the same way as in RIB, to handle the case where the concept does not drift.</p>
        <p>Algorithm 2 Relational Boosted Forest
1: procedure RBF(DataStream, p)
2:   Initialize empty Forest &amp; Weights, empty tree ψ and w ← 1
3:   for each d in DataStream do
4:     After every p examples do
5:       {Forest, Weights, S} ← EvalCentre(Forest, Weights, d)
6:     ψ ← HRRT(ψ, d)
7:     if StabilityCheck(ψ) and S then Boosting(ψ)
8:       Add ψ to Forest, w to Weights and reset ψ, w
9:     end if
10:  end for
11:  return {Forest, Weights}
12: end procedure</p>
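        <p>For concreteness, the weighted-average prediction and the mistake-driven weight decay could look as follows; this is a sketch under our own naming, with beta and the removal floor as assumed hyper-parameters and sigmoid as in the earlier RFGB sketch.</p>
        <preformat>
def rbf_predict(forest, weights, example):
    # weighted average of the trees' regression values with normalized weights
    total = sum(weights)
    return sum((w / total) * tree.predict(example)
               for tree, w in zip(forest, weights))

def penalise_mistake(forest, weights, example, label, beta=0.5, floor=0.01):
    # decrease the weight of every boosted tree that predicts the wrong label;
    # trees whose weight falls below the floor are removed from the forest
    for i, tree in enumerate(forest):
        predicted = 1 if sigmoid(tree.predict(example)) &gt;= 0.5 else 0
        if predicted != label:
            weights[i] *= beta
    kept = [(t, w) for t, w in zip(forest, weights) if w &gt;= floor]
    return [t for t, _ in kept], [w for _, w in kept]
        </preformat>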
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions and Future Work</title>
      <p>
        In this paper, we have introduced three adaptive incremental learning algorithms: HRRT, RIB and RBF. All of these algorithms can incrementally and adaptively learn the parameters and the structure simultaneously for SRL models such as Relational Dependency Networks (RDNs) and Markov Logic Networks (MLNs). RIB and RBF extend the classical ensemble methods for the first time to relational scenarios for handling drifting concepts. As they are developed within the RFGB system, RIB and RBF can be naturally integrated with algorithms designed for that system, such as Soft Margin [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Structural EM [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], for modelling imbalanced and incomplete data in an incremental fashion.
      </p>
      <p>In future work, we will evaluate these algorithms on some of the standard SRL benchmark datasets, comparing their performance against each other and against other state-of-the-art online structure learning algorithms.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Natarajan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khot</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kersting</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutmann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shavlik</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Gradient-based boosting for statistical relational learning: The relational dependency network case</article-title>
          .
          <source>Mach. Learn</source>
          .
          <volume>86</volume>
          ,
          <fpage>25</fpage>
          -
          <lpage>56</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>Greedy function approximation: A gradient boosting machine</article-title>
          .
          <source>Ann. Stat</source>
          .
          <volume>29</volume>
          ,
          <fpage>1189</fpage>
          -
          <lpage>1232</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khot</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kersting</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kunapuli</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauser</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Natarajan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Learning from Imbalanced Data in Relational Domains: A Soft Margin Approach</article-title>
          .
          <source>Proc. IEEE Int. Conf. Data Mining (ICDM)</source>
          ,
          <fpage>1085</fpage>
          -
          <lpage>1090</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Khot</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Natarajan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kersting</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shavlik</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Gradient-based boosting for statistical relational learning: the Markov logic network and missing data cases</article-title>
          .
          <source>Mach. Learn</source>
          .
          <volume>100</volume>
          ,
          <fpage>75</fpage>
          -
          <lpage>100</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Menezes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaverucha</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>HTILDE-RT: Scaling up relational regression trees for very large datasets</article-title>
          .
          <source>Inductive Log. Program.</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>