<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SHAP-Guided Regularization in Machine Learning Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amal Saadallah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lamarr Institute for Machine Learning and AI</institution>
          ,
          <addr-line>Dortmund</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Feature attribution methods such as SHapley Additive exPlanations (SHAP) have become instrumental in understanding machine learning models, but their role in guiding model optimization remains underexplored. In this paper, we propose a SHAP-guided regularization framework that incorporates feature importance constraints into model training to enhance both predictive performance and interpretability. Our approach applies entropy-based penalties to encourage sparse, concentrated feature attributions while promoting stability across samples. The framework is applicable to both regression and classification tasks. As a first exploration, we investigate regularization of tree-based models using TreeSHAP. Through extensive experiments on benchmark regression and classification datasets, we demonstrate that our method improves generalization performance while ensuring robust and interpretable feature attributions. The proposed technique offers a novel, explainability-driven regularization approach, making machine learning models both more accurate and more reliable.</p>
      </abstract>
      <kwd-group>
        <kwd>SHapley Additive exPlanations (SHAP)</kwd>
        <kwd>Regularization</kwd>
        <kwd>Tree-based Models</kwd>
        <kwd>Explainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        As machine learning models become increasingly complex, their interpretability and robustness are
critical concerns across various domains, from finance and healthcare to autonomous systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
While deep learning and gradient-boosted trees have shown remarkable predictive power, their
black-box nature makes them difficult to trust in high-stakes applications. To address this, explainability
techniques such as SHapley Additive exPlanations (SHAP) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] have been widely adopted to quantify
feature importance, offering insights into model decisions. However, while SHAP values help interpret
trained models [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], they are rarely incorporated directly into the training process to improve model
behavior.
      </p>
      <p>In this work, we introduce SHAP-guided regularization, a novel approach that integrates feature
importance constraints into model optimization. Our method introduces two key regularization terms:
a SHAP entropy penalty, which encourages the model to rely on a sparse, well-distributed subset of
important features, and a SHAP stability penalty, which ensures that feature attributions remain stable
across different samples, reducing sensitivity to small perturbations in the data. By embedding these
explainability-driven constraints into the learning objective, our method enhances both predictive
accuracy and interpretability. The framework is applicable to both regression and classification tasks,
and first experiments have shown that it is particularly effective for tree-based models such as
LightGBM, XGBoost, and CatBoost.</p>
      <p>We evaluate our approach on a diverse set of benchmark datasets, comparing its performance
against standard models. Our results show that SHAP-guided regularization improves generalization
by reducing overfitting to spurious correlations, enhances interpretability by concentrating feature
importance on the most relevant predictors, and increases robustness by ensuring stable attributions
across samples.</p>
      <p>The rest of this paper is structured as follows: Section 2 discusses related works, including SHAP-based
model interpretation and feature importance-driven regularization. Section 3 details our SHAP-guided
regularization framework and training procedure. Section 4 presents empirical results, demonstrating
the effectiveness of our approach across regression and classification tasks. Finally, Section 5 concludes
with insights and future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <sec id="sec-2-1">
        <title>2.1. Feature Importance and Explainability in Machine Learning</title>
        <p>
          Interpretability in machine learning has gained significant attention, particularly in domains where
model decisions impact critical outcomes, such as finance, healthcare, and autonomous systems.
Traditional feature importance measures, such as permutation importance [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and Gini importance in
decision trees [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], provide insights into model behavior but often suffer from instability and bias toward
correlated features.
        </p>
        <p>
          SHapley Additive exPlanations (SHAP) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is a widely used approach that attributes feature
importance based on cooperative game theory principles. Unlike other methods, SHAP ensures fair and
consistent feature attribution, making it a popular tool for understanding model predictions.
However, most applications of SHAP focus on post hoc analysis—explaining trained models—rather than
integrating feature attributions into the learning process [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Regularization for Improved Generalization and Interpretability</title>
        <p>
          Regularization techniques such as L1 (Lasso) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and L2 (Ridge) penalties [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] are commonly employed
to improve model generalization by controlling feature weights. While these methods help prevent
overfitting, they do not explicitly guide the model to focus on the most meaningful features. Other forms
of feature selection, such as tree-based pruning [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and attention mechanisms in deep learning [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ],
aim to refine model decision-making but often rely on heuristic approaches rather than interpretable
attributions like SHAP values.
        </p>
        <p>
          Some studies have explored feature importance-driven regularization. For instance, Alvarez-Melis
and Jaakkola [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] propose stability-driven constraints to ensure consistent model explanations across
similar samples. Meanwhile, Lundberg et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] discuss the use of SHAP for feature selection but
do not incorporate it into the training objective. To our knowledge, no prior work has introduced a
SHAP-guided regularization framework that is applicable to both regression and classification tasks
while explicitly optimizing for interpretability, stability, and predictive performance.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. SHAP-Guided Learning: Bridging Interpretability and Optimization</title>
        <p>
          A few recent works have begun exploring SHAP-integrated learning. In [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], a neural network
architecture is proposed that incorporates Shapley values as latent representations. This design allows for intrinsic,
layer-wise explanations during the model’s forward pass, facilitating explanation regularization during
training and enabling rapid computation of explanations at inference time. The authors in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] propose
X-SHIELD, a regularization technique that enhances model explainability by selectively masking input
features based on explanations. Seamlessly integrated into the objective function, X-SHIELD improves
both the performance and interpretability of AI models. SHAPNN [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] is a deep learning architecture
tailored for tabular data, integrating Shapley values as a regularization mechanism during training.
This approach not only provides valid explanations without additional computational overhead but
also enhances model performance and robustness in handling streaming data.
        </p>
      <p>Our proposed SHAP-guided regularization framework bridges this gap by incorporating SHAP-based
entropy and stability penalties that encourage sparse and robust feature attributions. This makes the
method applicable to both regression and classification in a unified manner and enhances generalization
while preserving explainability, a crucial factor in real-world decision-making. In the next section, we
formalize our approach, detailing the mathematical formulation, training procedure, and advantages of
SHAP-guided regularization.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. SHAP-Based Regularization for Learning Models</title>
        <p>Our method integrates SHAP values into the model training process by introducing regularization terms
based on entropy and stability of the feature attributions. The goal is to improve both the predictive
performance and the interpretability of the model by guiding its focus towards the most relevant
features while maintaining stable feature importance across similar inputs. This section describes how
we incorporate SHAP-guided regularization into the model’s loss function.</p>
        <p>Given a set of training samples $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ represents the feature vector and $y_i$ the target,
our objective is to learn a model $f(x)$ that minimizes a regularized loss function. For both regression
and classification tasks, the total loss function $\mathcal{L}_{\text{total}}$ can be defined as:</p>
        <p>$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}} + \lambda_1 \mathcal{L}_{\text{entropy}} + \lambda_2 \mathcal{L}_{\text{stability}} \quad (1)$$
where $\mathcal{L}_{\text{task}}$ is the standard loss function for the task (e.g., mean squared error for regression or binary
cross-entropy for classification), $\mathcal{L}_{\text{entropy}}$ is the entropy penalty that encourages sparse and interpretable
feature importance distributions, and $\mathcal{L}_{\text{stability}}$ is the stability penalty that promotes consistency in SHAP
attributions across similar samples. $\lambda_1$ and $\lambda_2$ are the regularization hyperparameters that control the
influence of the interpretability penalties.</p>
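        <p>To make the objective concrete, the following minimal sketch combines the three terms for a regression task. The helper names shap_entropy and shap_stability (implemented under Sections 3.2.1 and 3.2.2 below) and the default weights are illustrative assumptions, not an implementation published with this paper:</p>
        <preformat>
import numpy as np

def total_loss(y_true, y_pred, shap_values, lam1=0.1, lam2=0.1):
    """Eq. 1: task loss plus the two SHAP-based penalties.

    shap_values: array of shape (N, d), one SHAP value per sample
    and feature. lam1 and lam2 play the roles of lambda_1 and lambda_2.
    """
    task = np.mean((y_true - y_pred) ** 2)  # MSE as the regression task loss
    return task + lam1 * shap_entropy(shap_values) + lam2 * shap_stability(shap_values)
        </preformat>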
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Regularization Terms Based on SHAP</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. SHAP Entropy Penalty (Sparsity)</title>
          <p>The entropy penalty $\mathcal{L}_{\text{entropy}}$ is designed to sparsify the model’s focus on important features. It is
calculated as the Shannon entropy of the normalized SHAP values across all features for each prediction:</p>
          <p>$$\mathcal{L}_{\text{entropy}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{d} \hat{\phi}_{ij} \log(\hat{\phi}_{ij}) \quad (2)$$
where $N$ is the number of samples, $d$ is the number of features, and $\hat{\phi}_{ij}$ represents the normalized
absolute SHAP value for the $j$-th feature in the $i$-th sample. The entropy captures the uncertainty in the
feature importance. The penalty encourages models to focus on a small subset of important features,
reducing the influence of irrelevant ones. A higher penalty weight $\lambda_1$ leads to sparser explanations.</p>
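          <p>A minimal NumPy sketch of this penalty, assuming SHAP values arrive as an (N, d) array; the function name and the smoothing constant are illustrative choices of ours:</p>
          <preformat>
import numpy as np

def shap_entropy(shap_values, eps=1e-12):
    """Eq. 2: mean Shannon entropy of normalized absolute SHAP values."""
    abs_vals = np.abs(shap_values)
    # Normalize each row so the attributions form a distribution over features.
    phi_hat = abs_vals / (abs_vals.sum(axis=1, keepdims=True) + eps)
    # Shannon entropy per sample, averaged over the N samples.
    return -np.mean(np.sum(phi_hat * np.log(phi_hat + eps), axis=1))
          </preformat>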
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. SHAP Stability Penalty (Consistency)</title>
          <p>The stability regularization term, $\mathcal{L}_{\text{stability}}$, is designed to enforce consistency in the model’s explanations
by penalizing variations in SHAP values across similar input samples. Specifically, it quantifies how
much the attribution of feature importance fluctuates between different but similar data points. Given a
dataset of $N$ samples and their associated SHAP values $\phi_{ij}$ for feature $j$ of sample $i$, the stability loss is
defined as:</p>
          <p>$$\mathcal{L}_{\text{stability}} = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{\substack{i'=1 \\ i' \neq i}}^{N} \sum_{j=1}^{d} \left| \phi_{ij} - \phi_{i'j} \right| \quad (3)$$
where $\phi_{ij}$ and $\phi_{i'j}$ denote the SHAP values for the $j$-th feature of samples $i$ and $i'$ respectively, and
$d$ is the number of features. This formulation measures the average pairwise discrepancy in feature
attributions across all sample pairs, normalized by the total number of comparisons. By minimizing
$\mathcal{L}_{\text{stability}}$, the model is encouraged to produce SHAP value distributions that are smooth and consistent
across similar instances, thereby enhancing the robustness and reliability of the explanations. The
regularization coefficient $\lambda_2$ controls the strength of this penalty; increasing $\lambda_2$ places greater emphasis
on producing stable, coherent explanations during model optimization.</p>
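          <p>A compact sketch of this penalty under the same (N, d) array convention as above; the vectorized pairwise difference is an implementation choice of ours, not taken from the paper:</p>
          <preformat>
import numpy as np

def shap_stability(shap_values):
    """Eq. 3: mean pairwise L1 discrepancy of SHAP values across samples."""
    n = shap_values.shape[0]
    # Pairwise absolute differences, shape (N, N, d); diagonal terms are zero.
    diffs = np.abs(shap_values[:, None, :] - shap_values[None, :, :])
    # Summing over all ordered pairs counts each unordered pair twice,
    # which matches the 2 / (N (N - 1)) normalization in Eq. 3.
    return diffs.sum() / (n * (n - 1))
          </preformat>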
          <p>Our SHAP-guided regularization method offers several notable advantages. Firstly, by penalizing
entropy and enforcing stability, the approach ensures that the model emphasizes the most critical
features, leading to sparse and consistent feature attributions. This enhances interpretability, as the
model’s decisions become more transparent and understandable. Secondly, the incorporation of these
regularization terms aids in reducing overfitting. By guiding the model to depend on a smaller, more
stable subset of features, it promotes better generalization to unseen data. Thirdly, the framework’s
flexibility allows its application to both regression and classification tasks, providing a unified approach
across different problem domains.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>We conduct experiments on 10 diverse datasets spanning regression and classification tasks. These
datasets vary in size, feature dimensionality, and complexity, ensuring a comprehensive evaluation of
our proposed SHAP-guided training approach. Table 1 summarizes the dataset characteristics.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. SHAP-Guided Model</title>
          <p>
            LightGBM [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] was selected as the foundational model for implementing SHAP-guided regularization
due to several compelling attributes. Its histogram-based algorithm significantly enhances
computational efficiency by discretizing continuous feature values into discrete bins, thereby accelerating
training processes and reducing memory usage. Additionally, LightGBM’s inherent support for
TreeSHAP (SHapley Additive exPlanations) facilitates precise estimation of feature importance, making
it particularly suitable for interpretability-focused modifications. The model’s scalability is another
advantage, as it adeptly manages large datasets with extensive feature sets. Furthermore, LightGBM
consistently delivers robust performance across both classification and regression tasks. By integrating
SHAP-guided regularization into LightGBM, the objective is to harmonize predictive accuracy with
enhanced feature interpretability, ensuring that the model not only performs well but also provides
transparent insights into its decision-making processes.
          </p>
          <p>Model Training Procedure. Our first exploration of the combined loss function started by applying
SHAP-guided regularization within the gradient-boosting framework, specifically using LightGBM for
both classification and regression tasks. The training procedure proceeds as follows (a code sketch is given after this list):
1. Initialization: Initialize the LightGBM model with default hyperparameters. Set the
regularization hyperparameters $\lambda_1$ and $\lambda_2$ based on experimental settings.
2. Iterative Training: Train the model using LightGBM’s iterative boosting mechanism. At each
iteration $t$, we train a new decision tree and update the model’s parameters.
3. Loss Function Update: After each boosting iteration, the total loss $\mathcal{L}_{\text{total}}$ is computed, which
includes the task loss $\mathcal{L}_{\text{task}}$ and the regularization terms $\mathcal{L}_{\text{entropy}}$ and $\mathcal{L}_{\text{stability}}$. The model parameters
are then updated to minimize this total loss function.
4. Model Evaluation: After training, the model is evaluated on a validation set using appropriate
metrics (e.g., F1 score and AUC for classification, RMSE for regression).
          </p>
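          <p>The paper does not publish the exact integration into LightGBM’s objective, so the following sketch only illustrates the monitored outer loop: train for a given number of boosting rounds, compute TreeSHAP attributions on a validation set, and retain the configuration that minimizes the combined loss of Eq. 1. The candidate round counts are assumptions, and shap_entropy and shap_stability come from the sketches in Section 3.2:</p>
          <preformat>
import lightgbm as lgb
import numpy as np
import shap

def fit_shap_guided(X_tr, y_tr, X_val, y_val, lam1=0.1, lam2=0.1):
    """Illustrative outer loop, not the authors' exact training procedure."""
    candidates = []
    for n_rounds in (50, 100, 200, 400):  # illustrative boosting budgets
        model = lgb.LGBMRegressor(n_estimators=n_rounds)
        model.fit(X_tr, y_tr)
        # TreeSHAP attributions on the validation set, shape (N, d).
        shap_vals = shap.TreeExplainer(model).shap_values(X_val)
        task = np.mean((y_val - model.predict(X_val)) ** 2)
        loss = task + lam1 * shap_entropy(shap_vals) + lam2 * shap_stability(shap_vals)
        candidates.append((loss, model))
    # Keep the model with the smallest combined loss (Eq. 1).
    return min(candidates, key=lambda c: c[0])[1]
          </preformat>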
          <p>Hyperparameter Tuning and Optimization. To fine-tune the performance of the SHAP-guided
method, we use cross-validation to select optimal values for $\lambda_1$ and $\lambda_2$. Typically, a grid search or
random search is employed to find the combination of hyperparameters that minimizes the combined
loss function, as sketched below.</p>
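          <p>A minimal grid search over the two weights, reusing fit_shap_guided from the sketch above; the grid values and holdout-based selection are illustrative assumptions rather than the paper’s exact protocol:</p>
          <preformat>
from itertools import product
import numpy as np
import shap

def tune_lambdas(X_tr, y_tr, X_val, y_val, grid=(0.0, 0.01, 0.1, 1.0)):
    """Select (lambda_1, lambda_2) by the combined loss on a holdout set."""
    scored = []
    for lam1, lam2 in product(grid, repeat=2):
        model = fit_shap_guided(X_tr, y_tr, X_val, y_val, lam1=lam1, lam2=lam2)
        shap_vals = shap.TreeExplainer(model).shap_values(X_val)
        task = np.mean((y_val - model.predict(X_val)) ** 2)
        loss = task + lam1 * shap_entropy(shap_vals) + lam2 * shap_stability(shap_vals)
        scored.append((loss, (lam1, lam2)))
    return min(scored, key=lambda s: s[0])[1]
          </preformat>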
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Evaluation</title>
          <p>To assess the effectiveness of SHAP-guided regularization, we compare our proposed SHAP-guided
LightGBM model against several tree-based baseline machine learning models commonly used for
structured data tasks (Decision Tree, Random Forest, LightGBM, XGBoost, and CatBoost).</p>
          <p>Our SHAP-guided LightGBM extends the standard LightGBM model by incorporating SHAP-based
regularization terms that encourage interpretability and stability in feature attributions.</p>
          <p>To evaluate the performance of different models, we utilize the following metrics tailored for
regression and classification tasks:
• Regression: RMSE (Root Mean Squared Error), $R^2$, SHAP Entropy, Top-k Concentration
(quantifies how concentrated SHAP attributions are among the top-k features, as sketched below), and Stability.
• Classification: F1-score, AUC (Area Under the Curve), SHAP Entropy, Top-k Concentration,
and Stability.</p>
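          <p>The paper does not give a formula for Top-k Concentration; a plausible reading, used here purely as an assumption, is the fraction of total absolute SHAP mass carried by the k globally most important features:</p>
          <preformat>
import numpy as np

def topk_concentration(shap_values, k=3):
    """Hypothetical Top-k Concentration: share of absolute SHAP mass
    held by the k features with the largest mean |SHAP| values."""
    mean_abs = np.abs(shap_values).mean(axis=0)  # global importance per feature
    top_k = np.sort(mean_abs)[-k:]
    return top_k.sum() / (mean_abs.sum() + 1e-12)
          </preformat>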
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>For regression tasks, the SHAP-guided LightGBM maintains competitive predictive performance
while improving interpretability. The model achieves an RMSE of 11.45, which is comparable to standard
LightGBM (11.78) and outperforms other baselines. Similarly, the $R^2$ score remains at 0.83, confirming
that the model retains its ability to explain variance in the data. In terms of interpretability, SHAP
Entropy is reduced to 1.12, indicating that feature importance is more concentrated and less dispersed
compared to standard LightGBM (1.17) and Random Forest (1.55). This suggests that SHAP-guided
regularization encourages a more structured attribution pattern, enhancing transparency in feature
importance. Furthermore, Top-k Concentration improves to 0.89, surpassing the standard LightGBM
(0.86) and XGBoost (0.87), meaning that the model places greater emphasis on the most relevant features.
Stability remains at 0.63, aligning closely with baseline models, demonstrating that the regularization
does not introduce fluctuations in feature attributions.</p>
        <p>For classification tasks, similar trends are observed. The SHAP-guided LightGBM achieves an F1-score
of 0.9207 and an AUC of 0.9641, both slightly surpassing the standard LightGBM (0.9141 F1-score, 0.9592
AUC). This indicates that the introduction of SHAP-based regularization does not degrade predictive
performance. More importantly, SHAP Entropy is reduced to 1.6542, compared to 1.8261 for LightGBM
and 2.1783 for Random Forest, highlighting a more refined and focused attribution distribution. Top-k
Concentration is the highest among all models (0.8905), confirming that the model consistently assigns
importance to a small subset of critical features, which enhances interpretability. Stability remains
competitive at 0.8604, slightly lower than LightGBM (0.8647) but higher than other baselines, ensuring
robustness in feature attributions.</p>
        <p>Figure 1 shows SHAP summary plots for the standard LightGBM (baseline model, panel a) and the
SHAP-guided LightGBM (panel b) on the airfoil regression dataset. The SHAP regularization
promotes stability by assigning similar feature importance to similar samples (visible as more condensed
regions in Figure 1b). This is confirmed further by Figure 2, which shows lower variance of SHAP
values across different features for the SHAP-guided model on the same dataset.</p>
        <p>Overall, these results demonstrate that SHAP-guided regularization effectively enhances
interpretability without compromising predictive accuracy. The method successfully reduces SHAP Entropy, leading
to sparser and more meaningful feature attributions, while increasing Top-k Concentration, ensuring the
model prioritizes the most relevant features. Stability remains comparable to non-regularized models,
confirming that the proposed method does not introduce instability in feature attributions. These
findings indicate that SHAP-guided learning can serve as a powerful tool for balancing interpretability
and predictive performance in tree-based models.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We introduced a first exploration of SHAP-guided loss training. First experiments on LightGBM showed
that SHAP-based regularization promotes interpretable and stable feature attributions while maintaining
strong predictive performance.</p>
      <p>SHAP regularization requires further detailed exploration, as it appears to be a promising tool for:
• Improving SHAP-based interpretability metrics without degrading accuracy.
• Enhancing feature attribution stability across datasets.
• Providing a novel approach to balancing predictive performance with interpretability in ML
models.</p>
      <p>These insights demonstrate that SHAP-guided learning is a promising direction for explainable
machine learning.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author has not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rad</surname>
          </string-name>
          ,
          <article-title>Opportunities and challenges in explainable artificial intelligence (xai): A survey</article-title>
          ,
          <source>arXiv preprint arXiv:2006.11371</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Molnar</surname>
          </string-name>
          ,
          <article-title>Interpretable machine learning</article-title>
          ,
          <source>Lulu.com</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Belle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Papantonis</surname>
          </string-name>
          ,
          <article-title>Principles and practice of explainable machine learning</article-title>
          ,
          <source>Frontiers in big Data</source>
          <volume>4</volume>
          (
          <year>2021</year>
          )
          <fpage>688969</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Extracting spatial effects from machine learning model using local interpretation method: An example of shap and xgboost</article-title>
          ,
          <source>Computers, Environment and Urban Systems</source>
          <volume>96</volume>
          (
          <year>2022</year>
          )
          <fpage>101845</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Altmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Toloşi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lengauer</surname>
          </string-name>
          ,
          <article-title>Permutation importance: a corrected feature importance measure</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>26</volume>
          (
          <year>2010</year>
          )
          <fpage>1340</fpage>
          -
          <lpage>1347</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B. H.</given-names>
            <surname>Menze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Kelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Masuch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Himmelreich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bachert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Petrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Hamprecht</surname>
          </string-name>
          ,
          <article-title>A comparison of random forest and its gini importance with standard chemometric methods for the feature selection and classification of spectral data</article-title>
          ,
          <source>BMC bioinformatics</source>
          <volume>10</volume>
          (
          <year>2009</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Antwarg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shapira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rokach</surname>
          </string-name>
          ,
          <article-title>Explaining anomalies detected by autoencoders using shapley additive explanations</article-title>
          ,
          <source>Expert systems with applications</source>
          <volume>186</volume>
          (
          <year>2021</year>
          )
          <fpage>115736</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosales</surname>
          </string-name>
          ,
          <article-title>Optimization methods for l1-regularization</article-title>
          , University of British Columbia,
          <source>Technical Report TR-2009-19</source>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rostamizadeh</surname>
          </string-name>
          ,
          <article-title>L2 regularization for learning kernels</article-title>
          ,
          <source>arXiv preprint arXiv:1205.2653</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>A rough set approach to feature selection based on power set tree</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>24</volume>
          (
          <year>2011</year>
          )
          <fpage>275</fpage>
          -
          <lpage>281</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Brauwers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Frasincar</surname>
          </string-name>
          ,
          <article-title>A general survey on attention mechanisms in deep learning</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>35</volume>
          (
          <year>2021</year>
          )
          <fpage>3279</fpage>
          -
          <lpage>3298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Alvarez Melis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jaakkola</surname>
          </string-name>
          ,
          <article-title>Towards robust interpretability with self-explaining neural networks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>31</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. I.</given-names>
            <surname>Inouye</surname>
          </string-name>
          ,
          <article-title>Shapley explanation networks</article-title>
          ,
          <source>arXiv preprint arXiv:2104.02297</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sevillano-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luengo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <article-title>X-shield: Regularization for explainable artificial intelligence</article-title>
          ,
          <source>arXiv preprint arXiv:2404.02611</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Shapnn: Shapley value regularized tabular neural network</article-title>
          ,
          <source>arXiv preprint arXiv:2309.08799</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Finley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Lightgbm: A highly efficient gradient boosting decision tree</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>