<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1038/s42256</article-id>
      <title-group>
        <article-title>Feature Importance and Effects</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maximilian Muschalik</string-name>
          <email>Maximilian.Muschalik@lmu.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabian Fumagalli</string-name>
          <email>ffumagalli@techfak.uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barbara Hammer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eyke Hüllermeier</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bielefeld University</institution>
          ,
          <addr-line>D-33619 Bielefeld</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LMU Munich</institution>
          ,
          <addr-line>D-80539 Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>2</volume>
      <issue>2020</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>In dynamic machine learning environments, where data streams continuously evolve, traditional explanation methods struggle to remain faithful to the underlying model or data distribution. Therefore, this work presents a unified framework for efficiently computing incremental model-agnostic global explanations tailored for time-dependent models. By extending static model-agnostic methods such as Permutation Feature Importance, SAGE, and Partial Dependence Plots into the online learning context, the proposed framework enables the continuous updating of explanations as new data becomes available. These incremental variants ensure that global explanations remain relevant while minimizing computational overhead. The framework also addresses key challenges related to data distribution maintenance and perturbation generation in online learning, offering time- and memory-efficient solutions such as geometric reservoir-based sampling for data replacement.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable Artificial Intelligence</kwd>
        <kwd>Interpretable Machine Learning</kwd>
        <kwd>Online Learning</kwd>
        <kwd>Concept Drift</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In applied machine learning, data often evolves over time, which necessitates changes to prediction
models. Ensuring the reliability of such time-dependent models is increasingly important in high-stakes
applications, such as financial services [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], sensor [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] and network [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] analysis. In recent years,
eXplainable Artificial Intelligence (XAI) has targeted time-dependent explanations of predictions
that react to changes in the underlying data distributions and prediction models [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In extreme cases,
where data is observed sequentially over time from a data stream, models are updated incrementally
with each new observation, which is known as online learning or incremental learning [6]. In this context,
re-computing XAI methods from scratch can become computationally infeasible; hence, incremental
variants have been proposed [7, 8, 9, 10, 11].
      </p>
      <p>In this work, we present a unified framework for efficiently computing incremental variants of
model-agnostic global explanations (MAGEs). We demonstrate that existing incremental XAI techniques
are subsumed by the incremental MAGE framework. Furthermore, static MAGEs cover a wide range
of existing model-agnostic XAI methods, including Shapley interactions [12], which expands the range
of efficient incremental XAI techniques for interpreting black-box online learning models.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>We first introduce background on model-agnostic global explanations (Section 2.1), as well as online
learning from data streams (Section 2.2). We consider a trained black-box model f : X → Y with
input domain X equipped with a d-dimensional feature representation D = {1, …, d}, e.g. X = ℝ^d, and
output domain Y. We do not make any further assumptions on the model architecture and instead
only allow access to the model by predicting instances. This is known as model-agnostic explanation
[13].</p>
      <sec id="sec-2-1">
        <title>2.1. Model-Agnostic Global Explanations</title>
        <p>A global explanation of a black-box model considers the behavior of f across a whole labeled dataset
(x_i, y_i) ∈ X × Y with i = 1, …, n. Global feature importance (FI) is an instance of global explanations that
outputs an importance score φ^FI : D → ℝ for every feature j ∈ D [14]. Global FI measures the change in
a model's performance if the model's access to this feature's information is restricted. Permutation FI
(PFI) [15, 16] is computed by permuting the values of the target feature and measuring the change in
performance across a dataset. By permuting the feature's values, the model's access to this information
is limited and, thus, PFI yields an efficient way to compute global FI. However, a feature's contribution
to the model's performance might strongly depend on other features. Therefore, perturbing a
single feature's value in the presence of all remaining features is a limitation of PFI. Shapley additive
global importance (SAGE) [14, 17] accounts for this limitation by computing the increase in loss across
sampled permutations π : D → D over the feature space. For such a permutation π, each feature
j ∈ D appears at a certain position. SAGE measures the average change in loss when j is added to the
features preceding it in π. By sampling over several permutations, an approximation of
the Shapley Value (SV) [18] is obtained, a concept from cooperative game theory that guarantees that
the SAGE values fairly decompose the overall loss. While global FI quantifies the impact of individual
features, it is limited in its expressivity.</p>
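As an illustration of PFI, the following sketch permutes one feature column and measures the resulting change in average loss. This is a minimal marginal-perturbation example under our own hypothetical toy model and squared loss, not the implementation from [15, 16].

```python
import random

def permutation_feature_importance(model, X, y, loss, feature, n_repeats=10, seed=0):
    """PFI for one feature: average increase in loss after permuting its column."""
    rng = random.Random(seed)
    base = sum(loss(yi, model(xi)) for xi, yi in zip(X, y)) / len(X)
    increases = []
    for _ in range(n_repeats):
        col = [xi[feature] for xi in X]
        rng.shuffle(col)  # break the association between this feature and the target
        X_perm = [xi[:feature] + (v,) + xi[feature + 1:] for xi, v in zip(X, col)]
        perm_loss = sum(loss(yi, model(xi)) for xi, yi in zip(X_perm, y)) / len(X)
        increases.append(perm_loss - base)
    return sum(increases) / n_repeats

# toy setting: the model (and the target) depend only on feature 0
model = lambda x: x[0]
loss = lambda y_true, y_pred: (y_true - y_pred) ** 2
X = [(float(i), float(i % 3)) for i in range(50)]
y = [x[0] for x in X]
pfi_used = permutation_feature_importance(model, X, y, loss, feature=0)
pfi_unused = permutation_feature_importance(model, X, y, loss, feature=1)
```

Permuting the used feature raises the loss, while permuting the ignored feature leaves it unchanged, which is exactly the restriction-of-information idea described above.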
        <p>To understand Feature Effects (FE), Partial Dependence Plots (PDPs) [19] visualize the effect of
imputing a specified feature's value across all observations and compute the average prediction over all
observations when this feature's value is set. The PDP visualizes this average prediction across a
range of different values, which allows one to globally interpret the effect of changing this feature's value
on average [20]. Besides PDPs, there exist other FE methods [20, 21, 22] with extensions to regional
explanations [20, 23]. Another way of quantifying FEs is by using interaction indices that distribute
contributions to all individuals and groups of features up to a maximum group size k. In recent work,
several Shapley-based interaction indices have been proposed [24, 25, 26], as well as methods for their
efficient computation in a model-agnostic setting [12, 24, 25, 27, 28, 29]. Model-agnostic global explanations
have been widely applied in static environments [30]; in practice, however, data is often of a dynamic nature,
and explanations become outdated when models are adapted over time.</p>
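The PDP computation described above can be sketched in a few lines; the linear toy model and grid are our own hypothetical choices for illustration.

```python
def partial_dependence(model, X, feature, grid):
    """Marginal PDP: for each grid value, impute it into every observation
    and average the model's predictions."""
    curve = []
    for value in grid:
        preds = [model(x[:feature] + (value,) + x[feature + 1:]) for x in X]
        curve.append(sum(preds) / len(preds))
    return curve

# additive toy model: 2 * x0 + x1
model = lambda x: 2.0 * x[0] + x[1]
X = [(float(i), float(i % 5)) for i in range(20)]
curve = partial_dependence(model, X, feature=0, grid=[0.0, 1.0, 2.0])
```

For this additive model, the curve recovers the slope of feature 0 shifted by the average contribution of the remaining feature, which is the "average prediction across a range of values" the text describes.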
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Online Learning From Data Streams</title>
        <p>
          In many real-world applications [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], data is observed sequentially over time. In an extreme setting, we
observe a data stream (x_0, y_0), …, (x_t, y_t), where at time t the data point (x_t, y_t) is observed. The goal of
online learning [6] is to train a time-dependent model f_t by using the current observation (x_t, y_t) once to
obtain an updated model f_{t+1}, i.e.
        </p>
        <p>IncrementalUpdate(f_t, x_t, y_t) ⟶ f_{t+1} .</p>
        <p>
          Prominent instances of online learning algorithms include Hoeffding adaptive trees [31] and adaptive
random forests [32], where splits and tree structures are replaced if they become outdated. Other
training schemes, such as stochastic gradient descent, inherently allow for incremental updates [6]. Online
learning is especially important if the underlying data distribution changes over time. This phenomenon
is known as concept drift and occurs in many applications [33]. Detecting concept drift and reacting
adequately by updating the model accordingly is one of the major applications of incremental learning
[6]. A common approach to detecting concept drift is via accuracy-based drift detectors, where a sudden
change in the model's accuracy indicates a change of distributions [33]. Recently, it was proposed
to enhance such detection schemes using global FI methods [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. However, the computation of such
methods is a challenging problem that has mainly been considered in static scenarios.
        </p>
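A minimal sketch of such an incremental update, using online stochastic gradient descent on a linear model with squared loss (our own illustrative setting; the stream and learning rate are hypothetical):

```python
import random

def incremental_update(weights, x, y, lr):
    """One online SGD step on squared loss: maps f_t and (x_t, y_t) to f_{t+1}."""
    pred = sum(w * xi for w, xi in zip(weights, x))
    err = pred - y
    return [w - lr * err * xi for w, xi in zip(weights, x)]

# stream drawn from y = 3 * x0: each observation is used exactly once
rng = random.Random(0)
weights = [0.0]
for _ in range(2000):
    x = [rng.uniform(-1.0, 1.0)]
    weights = incremental_update(weights, x, 3.0 * x[0], lr=0.1)
```

Each observation is consumed once and then discarded, matching the data-stream constraint; after sufficiently many steps the weight approaches the true parameter.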
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. A Unified Framework for Explaining Change in Models and Data</title>
      <p>We now present a unified framework for efficiently computing incremental variants of
model-agnostic global explanations in an online learning setting. In a static setting, a global explanation
is typically computed for individual features (global FI) or groups of features (global FE), which we
summarize in the following definition.</p>
      <p>Definition 1 (Explanation Domain ℰ). Global explanations are computed for every element in the explanation domain ℰ,
which is a collection of features and interactions ℰ ⊆ 2^D.</p>
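To make Definition 1 concrete, the explanation domain can be enumerated directly; the following sketch is our own illustrative example (the feature indices and order bound are hypothetical), not code from the referenced methods.

```python
from itertools import combinations

def explanation_domain(features, max_order):
    """Enumerate an explanation domain: all feature subsets of size 1..max_order."""
    domain = []
    for k in range(1, max_order + 1):
        domain.extend(frozenset(s) for s in combinations(features, k))
    return domain

features = [0, 1, 2]
fi_domain = explanation_domain(features, 1)    # singletons only, as in global FI
int_domain = explanation_domain(features, 2)   # singletons plus pairwise interactions
```

With max_order = 1 the domain matches global FI; raising the bound adds the interaction terms discussed in Section 2.1.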
      <p>Given an explanation domain, the explanation can be computed for each element in a static setting.</p>
      <p>Definition 2 (Static MAGE). A static Model-Agnostic Global Explanation (MAGE) Φ : ℰ → ℝ for
a set of features S ∈ ℰ is</p>
      <p>Φ(S) := (1/n) ∑_{i=1}^{n} φ_S(x_i, y_i, f, B_{i,S}) .</p>
      <p>Here, B_{i,S} is a set of data points and φ_S is a method-specific explanation function.</p>
      <p>Typically, the perturbation data B_{i,S} is constructed by using a combination of the data point x_i and
another sampled data point x̃, where the values of the features in S and S̄ := D ∖ S are taken from either
x_i or x̃ [14]. Thereby, the sampling of x̃ may be done dependently or independently of x_i. Instantiations
of static MAGEs include PFI [15], where ℰ contains individual features and φ_S measures the increase in
loss. Therein, B_{i,S} includes a single data point constructed from the values of x_i for the features in S̄ and
the values for S from another data point obtained from the dataset using a permutation. SAGE [14] is
also covered by this framework by choosing φ_S as the average over sampled permutations of D, as
described in Section 2.1. Lastly, PDPs [19] are contained in this framework, where φ_S is chosen as the
prediction of a combination of x_i and x̃ ∈ B_{i,S}, where B_{i,S} contains the data points for which the PDP
is visualized.</p>
      <p>Having established a unified view on static MAGEs, we now turn our focus to an online learning
setting as described in Section 2.2. Using the data points observed up to time t, a naive way to compute
MAGEs is via Definition 2 as</p>
      <p>Φ_t(S) := (1/t) ∑_{i=0}^{t−1} φ_S(x_i, y_i, f_t, B_{i,S}) . (1)</p>
      <p>
        Re-computing Eq. 1 at every time step t is an exhaustive operation, since static MAGEs are already
time-consuming when computed once [14]. Moreover, Eq. 1 requires storing the full data stream,
which is typically considered infeasible. As a remedy, practitioners might restrict the computation to
a time window of fixed size [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, reducing the number of observations lowers the quality
of the explanation by increasing its variance. In the following, we propose a framework for an
incremental computation of Φ_t, similar to the incremental update of the model f_t. Our goal is to leverage
the previously calculated MAGE and update this explanation using the currently available data point, i.e.
      </p>
      <p>IncrementalUpdate(Φ_t, x_t, y_t, f_t) ⟶ Φ_{t+1} .</p>
      <p>By introducing a smoothing parameter 0 &lt; α &lt; 1, we define the incremental MAGE.</p>
      <p>Definition 3 (Incremental MAGE). Let 0 &lt; α &lt; 1. We define an incremental MAGE as</p>
      <p>Φ_t(S) := (1 − α) ⋅ Φ_{t−1}(S) + α ⋅ φ_S(x_t, y_t, f_t, B_{t,S}) .</p>
      <p>The incremental MAGE computes a single term of the sum in Eq. 1 at each time step and exploits the
previously computed MAGE values. This drastically reduces the computational complexity: the cumulative
effort up to time t equals that of computing the MAGE once with Eq. 1, yet the incremental MAGE yields
an estimate Φ_t at every time step t without requiring additional computational resources. Incremental
variants of PFI [7] and SAGE [8], as well as PDP [9], have recently been proposed; they can be viewed as
instantiations of incremental MAGEs.</p>
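The recursion in Definition 3 is a simple exponential smoothing of per-step explanation scores; a sketch with a hypothetical drift in the score stream (our own illustrative values) shows how the estimate tracks change:

```python
def incremental_mage_update(phi_prev, phi_step, alpha):
    """Definition 3: Phi_t = (1 - alpha) * Phi_{t-1} + alpha * phi_t."""
    return (1.0 - alpha) * phi_prev + alpha * phi_step

# per-step scores jump from 1.0 to 5.0 halfway through the stream (concept drift);
# the smoothed estimate forgets the old regime at a rate set by alpha
estimate, alpha = 0.0, 0.05
for t in range(1000):
    score = 5.0 if t >= 500 else 1.0
    estimate = incremental_mage_update(estimate, score, alpha)
```

A larger alpha reacts faster to drift but yields a noisier estimate; a smaller alpha averages over more history, mirroring the variance trade-off of fixed time windows discussed above.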
      <p>A major challenge in computing incremental MAGEs is the maintenance of the perturbation dataset
B_{t,S} over time, i.e. efficiently constructing perturbed data points that adhere to the data distribution.
Reservoir sampling [34] has been adapted to efficiently store the data distribution with minimal
resources [7]. Geometric sampling [7] proposes to store a reservoir of fixed length, where data points
are replaced over time and more recent observations have a higher probability of being present in the
reservoir than older observations. This mechanism allows a time-dependent marginal data distribution
to be maintained with limited resources. More advanced techniques maintain conditional
distributions using online decision trees and allow for conditional sampling, as required, for instance, in
conditional SAGE [8]. It has been shown that the two sampling techniques yield substantially different
explanations [35]. Geometric sampling with marginal distributions highlights the structure of the
model, whereas observational approaches via conditional sampling include the data distribution in the
explanation [35].</p>
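The geometric reservoir mechanism can be sketched as follows; the constant swap probability and reservoir size are our own illustrative assumptions rather than the exact procedure of [7].

```python
import random

def geometric_reservoir_update(reservoir, x, swap_prob, rng):
    """With fixed probability, overwrite a uniformly chosen slot with the new
    observation; a stored point's survival time is then geometrically
    distributed, so recent observations dominate the reservoir."""
    if rng.random() >= 1.0 - swap_prob:  # equivalent to swapping with probability swap_prob
        reservoir[rng.randrange(len(reservoir))] = x
    return reservoir

rng = random.Random(42)
reservoir = list(range(10))  # warm start with the first ten stream elements
for t in range(10, 10_000):
    reservoir = geometric_reservoir_update(reservoir, t, swap_prob=0.5, rng=rng)
```

Memory stays fixed at the reservoir size regardless of the stream length, which is what makes the perturbation dataset B_{t,S} maintainable online.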
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>We summarized popular model-agnostic global explanation techniques, such as the FI-based PFI and SAGE,
as well as the FE-based PDP, into the MAGE framework for static learning environments. We then proposed
the incremental MAGE framework to directly compute these explanations for online learning on data
streams. Incremental MAGE allows previous estimates of MAGEs to be updated incrementally at each time
step using minimal resources. We have shown that incremental variants, such as iPFI, iSAGE, and iPDP,
can be subsumed under the incremental MAGE framework. Incremental MAGE also offers opportunities to
expand the range of incremental MAGE techniques. For instance, recently proposed methods
to estimate Shapley interactions [12, 25, 29] may be placed in the incremental MAGE framework to
discover complex interactions beyond isolated FEs. Moreover, with an increasing variety of explanations
at different complexity levels, human-centered presentations and visualizations are important future
work.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We gratefully acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research
Foundation): TRR 318/1 2021 – 438445824.</p>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[6] V. Losing, B. Hammer, H. Wersing, Incremental On-line Learning: A Review and Comparison of State of the Art Algorithms, Neurocomputing 275 (2018) 1261–1274. doi:10.1016/j.neucom.2017.06.084.</p>
      <p>[7] F. Fumagalli, M. Muschalik, E. Hüllermeier, B. Hammer, Incremental Permutation Feature Importance (iPFI): Towards Online Explanations on Data Streams, Machine Learning 112 (2023) 4863–4903. doi:10.1007/s10994-023-06385-y.</p>
      <p>[8] M. Muschalik, F. Fumagalli, B. Hammer, E. Hüllermeier, iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams, in: Machine Learning and Knowledge Discovery in Databases: Research Track - European Conference, ECML PKDD 2023, Turin, Italy, September 18-22, 2023, Proceedings, Part III, volume 14171 of Lecture Notes in Computer Science, Springer, 2023, pp. 428–445. doi:10.1007/978-3-031-43418-1_26.</p>
      <p>[9] M. Muschalik, F. Fumagalli, R. Jagtani, B. Hammer, E. Hüllermeier, iPDP: On Partial Dependence Plots in Dynamic Modeling Scenarios, in: Explainable Artificial Intelligence - First World Conference, xAI 2023, Lisbon, Portugal, July 26-28, 2023, Proceedings, Part I, volume 1901 of Communications in Computer and Information Science, Springer, 2023, pp. 177–194. doi:10.1007/978-3-031-44064-9_11.</p>
      <p>[10] A. P. Cassidy, F. A. Deviney, Calculating Feature Importance in Data Streams with Concept Drift Using Online Random Forest, in: 2014 IEEE International Conference on Big Data (Big Data 2014), 2014, pp. 23–28. doi:10.1109/BigData.2014.7004352.</p>
      <p>[11] H. M. Gomes, R. F. d. Mello, B. Pfahringer, A. Bifet, Feature Scoring Using Tree-Based Ensembles for Evolving Data Streams, in: 2019 IEEE International Conference on Big Data (Big Data 2019), 2019, pp. 761–769.</p>
      <p>[12] F. Fumagalli, M. Muschalik, P. Kolpaczki, E. Hüllermeier, B. Hammer, SHAP-IQ: Unified Approximation of Any-Order Shapley Interactions, in: Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023), 2023.</p>
      <p>[13] A. Adadi, M. Berrada, Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI), IEEE Access 6 (2018) 52138–52160. doi:10.1109/ACCESS.2018.2870052.</p>
      <p>[14] I. Covert, S. M. Lundberg, S.-I. Lee, Understanding Global Feature Contributions With Additive Importance Measures, in: Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS 2020), 2020, pp. 17212–17223.</p>
      <p>[15] L. Breiman, Random Forests, Machine Learning 45 (2001) 5–32.</p>
      <p>[16] A. Fisher, C. Rudin, F. Dominici, All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously, Journal of Machine Learning Research 20 (2019) 1–81.</p>
      <p>[17] G. Casalicchio, C. Molnar, B. Bischl, Visualizing the Feature Importance for Black Box Models, volume 11051 of Lecture Notes in Computer Science, Springer International Publishing, Cham, 2019, pp. 655–670. doi:10.1007/978-3-030-10925-7_40.</p>
      <p>[18] L. S. Shapley, A Value for n-Person Games, in: Contributions to the Theory of Games (AM-28), Volume II, Princeton University Press, New Jersey, USA, 1953, pp. 307–318.</p>
      <p>[19] J. H. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics 29 (2001) 1189–1232. URL: http://www.jstor.org/stable/2699986.</p>
      <p>[20] J. Herbinger, B. Bischl, G. Casalicchio, REPID: Regional Effect Plots with Implicit Interaction Detection, in: International Conference on Artificial Intelligence and Statistics, AISTATS 2022, volume 151 of Proceedings of Machine Learning Research, PMLR, 2022, pp. 10209–10233. URL: https://proceedings.mlr.press/v151/herbinger22a.html.</p>
      <p>[21] D. W. Apley, J. Zhu, Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (2016). URL: https://api.semanticscholar.org/CorpusID:88522102.</p>
      <p>[22] S. M. Lundberg, G. G. Erion, H. Chen, A. J. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, S. Lee, From Local Explanations to Global Understanding with Explainable AI for Trees, Nature Machine Intelligence 2 (2020).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Clements</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yousefi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Efimov</surname>
          </string-name>
          ,
          <article-title>Sequential Deep Learning for Credit Risk Monitoring with Tabular Financial Data</article-title>
          , CoRR abs/2012.15330 (
          <year>2020</year>
          ). arXiv:2012.15330.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bahri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bifet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Maniu</surname>
          </string-name>
          ,
          <article-title>Data stream analysis: Foundations, major tasks and tools</article-title>
          ,
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          <volume>11</volume>
          (
          <year>2021</year>
          )
          <article-title>e1405</article-title>
          . doi:10.1002/widm.1405.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Davari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Veloso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gama</surname>
          </string-name>
          ,
          <article-title>Predictive maintenance based on anomaly detection using deep learning for air production unit in the railway industry</article-title>
          ,
          <source>in: 8th IEEE International Conference on Data Science and Advanced Analytics (DSAA</source>
          <year>2021</year>
          ), IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . doi:10.1109/DSAA53316.2021.9564181.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Atli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <article-title>Online Feature Ranking for Intrusion Detection Systems</article-title>
          , CoRR abs/1803.00530 (
          <year>2018</year>
          ). arXiv:1803.00530.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Muschalik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fumagalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hüllermeier</surname>
          </string-name>
          ,
          <article-title>Agnostic explanation of model change based on feature importance</article-title>
          ,
          <source>Künstliche Intell</source>
          .
          <volume>36</volume>
          (
          <year>2022</year>
          )
          <fpage>211</fpage>
          -
          <lpage>224</lpage>
          . URL: https://doi.org/10.1007/s13218-022-00766-6. doi:10.1007/s13218-022-00766-6.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>