<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ECAI</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>SHAP-Driven Explainability in Survival Analysis for Predictive Maintenance Applications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Monireh Kargar-Sharif-Abad</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zahra Kharazian</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ioanna Miliou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tony Lindgren</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Stockholm University, Department of Computer and Systems Sciences</institution>
          ,
          <addr-line>Kista, SE-164 07</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>19</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>In the dynamic landscape of industrial operations, ensuring machines operate without interruption is crucial for maintaining optimal productivity levels. To avoid unexpected equipment failures, minimize downtime, and improve operational efficiency, estimating the Remaining Useful Life is very important in Predictive Maintenance. Survival analysis is a beneficial approach in this context due to its ability to handle censored data (here referring to industrial assets that have not experienced a failure during the study period). Recently, with a large increase in the amount of recorded data, machine learning survival models have been developed to find more complex patterns in predicting failure. However, the black-box nature of these models requires the use of explainable AI for greater transparency and interpretability. In this paper, we evaluate three machine learning-based survival analysis methods (Random Survival Forest, Gradient Boosting Survival Analysis, and Survival Support Vector Machine) and a traditional survival analysis model (Cox Proportional Hazards) using real-world data from SCANIA AB that includes 90% censored data. Results indicate that Random Survival Forest outperforms the other models. In addition, we employ SHAP analysis to provide global and local explanations, highlighting the importance and interaction of features in our best-performing model. To overcome the limitation of applying SHAP to survival output, we utilize a surrogate model. Finally, SHAP identifies specific influential features, shedding light on their effects and interactions. This methodology tackles the inherent black-box nature of machine learning-based survival analysis models, providing valuable insights into their predictions. The findings from our SHAP analysis confirm the pivotal role of these identified features and their interactions, thereby enriching our comprehension of the factors influencing Remaining Useful Life predictions.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable Artificial Intelligence</kwd>
        <kwd>Predictive Maintenance</kwd>
        <kwd>Survival Analysis</kwd>
        <kwd>XPdM</kwd>
        <kwd>Censored data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the era of Industry 4.0, Predictive Maintenance (PdM) has become a cornerstone of modern
manufacturing, leveraging IoT and digitization to increase machine longevity and
efficiency. In fact, self-monitoring machinery that can predict and prevent failures helps minimize
downtime and optimize maintenance scheduling [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A key aspect of PdM is estimating
industrial assets’ Remaining Useful Life (RUL), which leads to appropriate maintenance actions, cost
reductions, and operational efficiency improvements [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. One of the significant challenges in
PdM is handling censored data in which the exact failure time of components is not observed [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
To handle censored data, survival analysis (SA) models were initially developed in clinical
research. These models apply statistical techniques to estimate the timing of events. Traditional
SA techniques, such as the Cox Proportional Hazards (CPH) model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], provide valuable insights
into survival probabilities and hazard rates [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, these models rely on restrictive
assumptions, such as linearity and proportional hazards, which limit their applicability and
performance in complex industrial settings [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        To overcome these limitations, machine learning-based survival analysis (ML-based SA)
models have been developed [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ]. These advanced models, including random survival
forests and deep learning approaches, provide superior predictive performance [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] but are
often criticized for their lack of transparency. The “black-box” nature of these models makes
it challenging for professionals to distinguish the factors influencing the model’s outputs and
trust in their prediction results. Understanding these factors is essential for domain experts to
trust the models and make informed maintenance decisions in real-world scenarios [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. For
instance, feature importance analysis provides knowledge about the most influential factors
that affect failure. Moreover, it enables domain experts to estimate the usefulness of new
sensors and thus enables the calculation of optimal sensor equipment.
      </p>
      <p>
        Explainable AI (XAI) methods, such as SHAP (SHapley Additive exPlanations) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], have been
introduced to address interpretability issues in predictive models. In fact, in the field of PdM,
several research studies focused on improving the interpretability of machine learning models
[13, 14, 15, 16, 17]. Despite these efforts, the application of XAI to ML-based SA models in PdM
remains underexplored [13]. One reason for this is that XAI methods are primarily designed
for conventional machine learning models, like classifiers and regressors, which provide point
predictions. In contrast, survival models produce functions, such as survival or hazard functions,
representing the probability of events occurring over time rather than single-point outcomes.
This difference necessitates the development of specialized XAI methods for survival models
[18].
      </p>
      <p>Moreover, the scarcity of real-world data in the development of RUL prediction models
poses a significant challenge, especially for academia. Only a minority (24.14%) of datasets
used in research accurately depict actual industrial conditions [19]. Leveraging real-world data
improves the quality and cost-effectiveness of maintenance strategies and products [20].</p>
      <p>This paper focuses on evaluating several ML-based SA models against the traditional CPH
model for predicting the RUL of a specific component in truck engines manufactured by SCANIA
AB in Sweden, using a real-world dataset [21, 22]. The assessment is carried out by applying
Harrell’s concordance index (C-index), which is a standard evaluation metric in survival models,
and evaluates how well they predict the ranking of survival times. Moreover, we utilize SHAP
analysis to get insight into the factors that are most influential for the best-performing ML-based
SA model, enhancing its interpretability. Given the challenge of applying XAI methods to SA
models, we first employ a surrogate ML model to transform the SA output into a regression
problem. This enables the possibility of applying XAI techniques to the surrogate regression
model, thereby providing insights into the predictions of the original SA model. Overall,
integrating SHAP with ML-based survival models for predictive maintenance and RUL estimation
offers a promising solution to the challenge of model interpretability when the data are
predominantly censored. Consequently, this integration promotes the development of
transparent and reliable predictive models that can significantly improve maintenance strategies
and decision-making processes in industrial applications.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Background</title>
      <sec id="sec-2-1">
        <title>2.1. Survival Analysis</title>
        <p>
          Survival analysis is a statistical method that analyzes and estimates the time until an event of
interest happens. More specifically, it provides insights into the probability of survival beyond
a specific time point t, defined as S(t) = Pr(T &gt; t) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], where T represents the survival time
variable. Traditional SA models vary based on their assumptions regarding the survival time
distribution; non-parametric methods make no assumptions about the distribution of survival
times, semi-parametric methods assume some distribution aspects, and parametric methods
fully specify the distribution [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
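          <p>For illustration (not part of the original paper), the survival function can be estimated non-parametrically with the Kaplan–Meier estimator; a minimal Python sketch:</p>

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times:  observed times (failure or censoring) for each unit
    events: 1 if the event (failure) was observed, 0 if censored
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)  # units still under observation at t
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        survival *= 1.0 - deaths / at_risk           # multiply conditional survival
        curve.append((t, survival))
    return curve

# toy example: two observed failures (t = 2, 4), two censored units (t = 3, 5)
km = kaplan_meier([2, 3, 4, 5], [1, 0, 1, 0])
```

          <p>Note that the unit censored at t = 3 still counts as at risk for the event at t = 2 but not at t = 4, which is how censored observations contribute information without an observed failure time.</p>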
        <p>
          Survival models were initially developed in the clinical domain [23] to estimate the lifetime
of patients where their end of life was not observed and was censored. Then, they expanded
to other fields, such as engineering and industrial domains like PdM [
          <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
          ]. Depending on
the field of applying SA, the definition of the event of interest varies between death, recovery,
failure of the machine, etc.
        </p>
        <p>
          Censored data is inevitable and commonly encountered in real-world datasets, especially in
PdM, where the quality of components is usually high. Many components may not fail within
the data collection/observation time. As a consequence, the application of survival analysis
models in predictive maintenance, especially in estimating RUL, is experiencing significant
growth. Traditional survival analysis techniques, such as the CPH model, have been used
to handle highly censored data in predicting the RUL of assets such as turbofan engines [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
and mobile working assets [24]. However, due to their strict assumptions, which may not
always hold [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and the larger modeling capacity of ML models, there is increasing
exploration of integrating machine learning with survival analysis.
        </p>
        <p>
          ML-based SA models outperform traditional SA models in many cases. For instance, Voronov
et al. [25] applied Random Survival Forest (RSF) to predict truck battery life, demonstrating its
effectiveness with highly censored datasets. In another study, Rahat et al. [26] found RSF superior
to Gradient Boosting (GB) in predicting RUL, with a lower mean absolute error. Vallarino [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
also compared models for predicting startup failure, and the results indicated that RSF achieved
the highest accuracy among the compared models.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Explainability in SA</title>
        <p>
          In recent years, the application of XAI in ML-based SA has attracted increasing attention
from researchers across various domains [
          <xref ref-type="bibr" rid="ref3 ref27 ref28">3, 27, 28</xref>
          ]. In particular, various XAI methods have
been designed and developed to interpret and explain machine learning models. Among these
XAI methods, SHAP analysis is popular for interpreting ML-based SA models. This method
is designed based on game theory, where every feature is considered a game player that
contributes a specific value to the prediction output [29, 30]. Notably, SHAP analysis
provides both local and global interpretability, enhancing the transparency of complex models [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
For instance, in clinical studies, Moncada-Torres et al. [29] found that ML models, especially
Extreme Gradient Boosting (XGBoost), could outperform conventional Cox models in predicting
survival among breast cancer patients. They applied SHAP analysis to provide clear insights
into model decisions. Moreover, Sarica et al. [30] showed RSF’s superior performance over Cox
models in predicting Alzheimer’s disease progression. Additionally, they applied SHAP to the
RSF output to make its predictions more transparent.
        </p>
        <p>Integrating XAI into ML-based survival models holds significant promise for providing
interpretable, accurate predictions in PdM contexts. However, further research is needed to
refine these methods and fully realize their potential, particularly in the PdM domain [18, 30, 31].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology and Problem Formulation</title>
      <p>The overall methodology comprises four main steps: 1) Data preparation, 2) Survival modeling, 3)
Regression, and 4) Explainer, as illustrated in Fig. 1. These steps are elaborated in the
following subsections and summarized in Algorithm 1.</p>
      <sec id="sec-3-1">
        <title>3.1. Data preparation</title>
        <p>Let D = {X_1, ..., X_N} denote a given set of multivariate time series X_i, in which N is the total
number of time series. In our study, the dataset is collected from three sources of information:
operational, time-to-event, and specification data. For each time series X_i (here referring to
all readouts of vehicle i), the algorithm selects a random representative readout r_j ∈ X_i, where
j is a uniformly sampled random index, from all the readouts of vehicle i to manage the size
and complexity of the dataset in processing (lines 1 and 2 of Algorithm 1). In the next step
(line 3), the algorithm converts the dataset into a format suitable for survival analysis. In this
setting, each data point is characterized by the three elements (x, δ, y), where x, δ, and y are a
d-dimensional feature vector, an event indicator (δ = 1 when experiencing the event, δ = 0 in
case of censoring), and the observed time for an individual readout, respectively.</p>
        <p>Considering our problem, for a random readout of each vehicle, we have r_j: (x_j, δ_j, y_j),
where the observed time y_j is the true target and, for non-censored observations, can be
calculated as y_j = t_failure − t_readout. For censored observations, y_j = t_censoring − t_readout,
where the censoring time equals the time of the last observation. Finally, the prepared dataset
undergoes the preprocessing step (line 4) to address missing values, encode categorical features,
and remove highly correlated features.</p>
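        <p>The readout selection and target construction described above can be sketched in Python as follows; the helper name and input layout are illustrative assumptions, not from the paper:</p>

```python
import random

def build_survival_point(readouts, t_failure=None, t_last_obs=None):
    """Turn one vehicle's readouts into a survival data point (x, delta, y).

    readouts:   list of (t_readout, feature_vector) pairs for the vehicle
    t_failure:  failure time if the event was observed, else None
    t_last_obs: time of the last observation (used as the censoring time)
    """
    t_readout, x = random.choice(readouts)      # one random representative readout
    if t_failure is not None:
        delta, y = 1, t_failure - t_readout     # event observed: y = t_failure - t_readout
    else:
        delta, y = 0, t_last_obs - t_readout    # censored: y = t_censoring - t_readout
    return x, delta, y

# failed vehicle: readout at t = 10, failure recorded at t = 35
x_f, delta_f, y_f = build_survival_point([(10.0, [0.5, 1.2])], t_failure=35.0)
# censored vehicle: readout at t = 10, observed until t = 100 without failure
x_c, delta_c, y_c = build_survival_point([(10.0, [0.5, 1.2])], t_last_obs=100.0)
```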
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Survival process</title>
        <p>In this step, the dataset is subsequently divided into training and testing sets. The training set
is used for developing the survival models ℳSA (line 6). The trained model is then employed
to predict the survival curves/functions of the test samples (line 7), and these predictions are
evaluated using the C-index (line 8), which is explained in Section 5.1.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Regression</title>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Explainer</title>
        <p>To facilitate the application of explainability approaches like SHAP, a surrogate regression
model ℳsurrogate is trained on the predictions made by the ℳSA on the training data (line 9). In
other words, this approach translates the complex output of the survival models (i.e., survival
curves) into a compatible format (i.e., point prediction) with standard regression techniques.
Finally, the SHAP explainer is applied to the surrogate model ℳsurrogate to explain the prediction
output. The SHAP explainer provides a detailed understanding of how each feature contributes
to the model’s predictions by assigning SHAP values to each feature. For a given prediction f(x),
in which x is the vector of input features, the SHAP value φ(i, x) represents the contribution of
feature i. Features that strongly push a specific prediction upward have large positive SHAP
values (φ(i, x) &gt; 0). Conversely, uninformative features have SHAP values close to zero
(φ(i, x) ≈ 0), and features with a negative impact on the prediction will have negative SHAP
values (φ(i, x) &lt; 0).</p>
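        <p>These sign conventions follow from the Shapley-value definition. As a brute-force illustration (not the paper's implementation), assuming an interventional value function that averages over a background dataset, exact Shapley values can be computed as follows; SHAP's Tree/Kernel explainers approximate this efficiently:</p>

```python
import itertools, math
import numpy as np

def shapley_values(f, x, background):
    """Exact Shapley values for prediction f(x), marginalizing absent
    features over a background dataset (interventional expectation).
    Exponential in the number of features; for illustration only."""
    n = len(x)

    def value(coalition):
        # features in `coalition` keep their values from x; the rest are
        # replaced by background rows, then predictions are averaged
        X = np.repeat(x[None, :], len(background), axis=0)
        absent = [j for j in range(n) if j not in coalition]
        X[:, absent] = background[:, absent]
        return f(X).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in itertools.combinations(others, k):
                weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

# demo: for a linear model, phi_i reduces to w_i * (x_i - mean(background_i))
rng = np.random.default_rng(0)
background = rng.normal(size=(20, 3))
w = np.array([2.0, -1.0, 0.0])
linear_model = lambda X: X @ w
x = np.array([1.0, 2.0, 3.0])
phi = shapley_values(linear_model, x, background)
```

        <p>The efficiency property also holds: the values sum to f(x) minus the mean prediction over the background, which is what the base value in SHAP plots represents.</p>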
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Empirical Evaluation</title>
      <p>In this study, we evaluate the performance of three ML-based SA models, namely RSF, Gradient
Boosting Survival Analysis (GBSA), and Survival Support Vector Machine (SSVM), against one
traditional survival analysis model (i.e., CPH), using the C-index as the evaluation metric. Based
on the results, the best-performing model is identified and subjected to surrogate modeling.
Subsequently, SHAP analysis is employed on the output of the surrogate model, providing a
comprehensive understanding of the model’s predictive behavior. The SHAP analysis includes
global explanations conducted for all instances in the test dataset, SHAP dependency plots
applied for the four most influential features, and local explanations explored for three instances
ranging from high to low risk. Through these analyses, the influence of individual features on
the model’s predictions is examined, highlighting both the magnitude and direction of their
impact. This approach allows the key factors affecting the model’s performance and their
implications for survival predictions to be elucidated. For this experiment, Python was utilized
alongside packages such as scikit-survival and SHAP for model development and analysis.</p>
      <p>Algorithm 1: Survival Analysis with SHAP Explainer for PdM</p>
      <p>Input: Multivariate time series dataset D, ℳ_SA, ℳ_surrogate</p>
      <p>Output: Φ: SHAP values for explainability of the survival model
1 for each time series/vehicle X_i ∈ D do
2   Select a random index j ∼ Uniform(1, |X_i|);
3   Selected readout r_j ← (x_j, δ_j, y_j);
4 Preprocessing;
5 Data split: D into training set (D_train) and test set (D_test);
6 ℳ = ℳ_SA.fit(D_train);
7 Calculate survival functions for the test set D_test;
8 C-index(ℳ, D_test);
9 ℳ_surrogate.fit(D_train, ℳ_SA.predict(D_train));
10 Explainer = SHAP(ℳ_surrogate, D_train);
11 Compute SHAP values to explain the surrogate model’s predictions;
12 Φ = SHAP(ℳ_surrogate, D_test);</p>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>The data used in this study is a publicly available, real-world dataset from SCANIA AB,
focusing on a specific anonymized engine component, called Component X, of SCANIA heavy
trucks [21, 22]. This dataset is ideal for investigating the interpretability of survival models for
predicting the RUL of heavy vehicle components, as it provides extensive real-world operational
data without the need for time-consuming raw data collection.</p>
        <p>The dataset consists of operational data, repair records (time to event), and truck
specifications from 23,550 distinct trucks, organized into training, testing, and validation sets. For our
analysis, we only utilized the training data because its size was sufficiently large to support our
experiments. The operational dataset is a multivariate time series, with the ’time_step’ column
indicating the period each vehicle has been operating with Component X. It comprises 105
features which represent 14 operational variables collected by truck sensors and stored in vehicle
control units. These features contain numerical data, carefully selected and anonymized by
experts and named by number codes such as “Number_index”. The specification dataset includes
categorical data representing the truck configuration, named after seven distinct specifications
and their categories [22].</p>
        <p>The repair records dataset includes the “length_of_study_time_step” column, which shows the
number of operational time steps since the component started working, and the “in_study_repair”
column that serves as a class label, with 1 indicating a repair and 0 indicating no repair during
the observation time. The dataset is mostly censored (90.35%), with the majority of the data
indicating no repair during the observation period. The dataset is publicly available at
https://snd.se/en/catalogue/dataset/2024-34.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Data Preprocessing</title>
        <p>After merging the operational and specification data, the dataset contains 112 features. By
one-hot encoding the categorical data, using the get_dummies function, the number of features
increases to 195. Following the removal of highly correlated features, 104 features remain. We
opted to select one random readout from each vehicle. This approach also simulates real-world
scenarios when the complete data is not available for each vehicle from the start of its operation.
Next, the RUL is computed as explained in Section 3.1. It is worth mentioning that the dataset
is split into training and testing sets, with 80% allocated for training and 20% for testing.</p>
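        <p>A minimal sketch of these preprocessing steps, assuming pandas and a hypothetical 0.95 correlation threshold (the paper does not state its threshold):</p>

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, columns, threshold=0.95):
    """Drop one feature from each pair whose absolute Pearson
    correlation exceeds `threshold` (threshold is an assumption)."""
    corr = df[columns].corr().abs()
    # keep only the upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# toy frame: one categorical spec column and a duplicated sensor feature
df = pd.DataFrame({
    "spec": ["A", "B", "A", "B"],
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with s1
    "s3": [1.0, -1.0, 2.0, -2.0],
})
df = pd.get_dummies(df, columns=["spec"])            # one-hot encode categoricals
df, dropped = drop_highly_correlated(df, ["s1", "s2", "s3"])
```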
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Survival Analysis Models</title>
        <p>The SA models evaluated in this study include one traditional model and three ML-based models.
The following sections provide a brief overview of each model.</p>
        <sec id="sec-4-3-1">
          <title>4.3.1. Traditional Model</title>
          <p>
            The CPH model is a semi-parametric survival analysis model widely employed for its
interpretability and effectiveness across numerous applications [32]. It is utilized to estimate the
effect of covariates on the risk of an event [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. However, this model relies on strict assumptions that may not always hold true. These include
assuming constant hazard ratios over time and depending on linear combinations of covariates,
which may fail to capture complex and nonlinear relationships in the data.
          </p>
        </sec>
        <sec id="sec-4-3-2">
          <title>4.3.2. ML-based Models</title>
          <p>
            Random Survival Forests [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] is a variation of the traditional random forest algorithm adapted for
survival analysis to predict the survival time of components. Instead of using a single decision
tree, RSFs generate multiple survival trees based on bootstrapped samples of the data. The final
survival prediction for a new observation is obtained by averaging the survival functions of all
the trees in the forest [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ].
          </p>
          <p>Gradient Boosting, similar to Random Forests, is a powerful ensemble learning method known
for its strong performance across various applications. In survival analysis, this technique,
referred to as gradient boosting survival analysis, is adapted to predict survival times [33].</p>
          <p>Survival Support Vector Machines are an adaptation of the traditional support vector machine
framework explicitly designed for survival analysis. They handle censored data by using kernel
functions, which allow for the efficient modeling of complex, high-dimensional feature spaces.
This enables SSVMs to provide accurate risk predictions and effectively manage censored data
in survival analysis [34].</p>
        </sec>
        <sec id="sec-4-3-3">
          <title>4.3.3. Hyperparameter Tuning</title>
          <p>The hyperparameters of the ML-based models are optimized using a grid search. A 5-fold
cross-validation is performed to select hyperparameters that generalize well to unseen data.
The set of hyperparameters for each model used in grid search and the best value for each is
summarized in Table 1.</p>
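          <p>Since scikit-survival estimators follow the scikit-learn API, the tuning loop can be sketched with GridSearchCV as below; the grid values are hypothetical (the paper's actual grids are in its Table 1), and an ordinary RandomForestRegressor stands in for the survival models so the sketch runs without scikit-survival:</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# synthetic stand-in data (the real study uses the SCANIA readouts)
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=120)

# hypothetical hyperparameter grid
param_grid = {"n_estimators": [25, 50], "max_depth": [3, 5]}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)
best = search.best_params_
```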
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Surrogate Model</title>
        <p>In this study, a Random Forest (RF) regression model is employed as the surrogate function.
The RF regression model is trained on the RSF output, providing point predictions, which is
compatible with SHAP analysis.</p>
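        <p>A minimal sketch of this surrogate step, with a GradientBoostingRegressor standing in for the trained RSF (an assumption, so the example runs without scikit-survival): the teacher's point risk scores become regression targets for an RF whose tree structure SHAP's TreeExplainer can then handle.</p>

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

# stand-in "survival" model producing point risk scores
# (in the study this role is played by the trained RSF)
teacher = GradientBoostingRegressor(random_state=0).fit(X, y)
risk_scores = teacher.predict(X)

# surrogate RF regression trained to mimic the teacher's output
surrogate = RandomForestRegressor(random_state=0).fit(X, risk_scores)
fidelity = r2_score(risk_scores, surrogate.predict(X))
```

        <p>Checking the surrogate's fidelity to the teacher (here via R²) is important: SHAP explains the surrogate, so its explanations only transfer to the survival model insofar as the surrogate reproduces it.</p>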
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Evaluation</title>
        <p>This study uses the C-index as the assessment metric for survival analysis models due to its
effectiveness in gauging predictive performance within survival analysis tasks. The C-index
evaluates a model’s predictive accuracy in survival analysis by assessing its ability to correctly
rank pairs of samples based on their survival times. To accommodate censored observations,
the index is computed by summing the concordance values for all compatible pairs and dividing
by the total number of such pairs. This comprehensively evaluates the model’s capability to
accurately predict event order. A higher C-index denotes superior predictive accuracy, with a
score of 1 signifying perfect prediction and 0.5 representing random guessing [35].</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.6. SHAP Analysis</title>
        <p>In this study, various SHAP analysis tools are utilized to interpret the model’s predictions. For
global explanations, SHAP summary plots and dependency plots are used. For local explanations,
SHAP force plots are employed.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <sec id="sec-5-1">
        <title>5.1. Performance evaluation</title>
        <p>To assess the models’ performance on an unseen dataset, each model was trained on the entire
training dataset and then tested on the test dataset, demonstrating their generalization
capabilities. Additionally, the training score for each model is reported to ensure that no overfitting
has occurred. As shown in Table 2, the RSF model outperformed not only the traditional CPH
but also two other ML models, namely GBSA and SSVM. This superior performance of the RSF
model highlights its robustness and effectiveness in accurately predicting survival outcomes
compared to the other models evaluated.</p>
        <p>Additionally, the RSF survival probability curves for ten randomly selected instances are
depicted in Figure 2, providing essential insights into the model’s predictive behavior for further
analysis. Lower survival probabilities (e.g., instance 460) indicate a higher risk of failure, while
higher survival probabilities (e.g., instance 4699) denote healthier components with lower failure
risks. Analyzing these curves allows for observation of the predicted durability and risk profiles
of the selected trucks.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. SHAP Analysis</title>
        <p>The following sections discuss the results of the global explanation of the RSF prediction, as
well as the local explanation for three example trucks with different survival behaviors.</p>
        <sec id="sec-5-2-1">
          <title>5.2.1. Global Explanations</title>
          <p>SHAP summary plot. Figure 3a illustrates the summary plot of SHAP values for the 4710 instances
(test dataset). This plot provides a global explanation of the surrogate model output across
trucks of the test dataset. Each point in the plot corresponds to an individual truck. Their
position along the x-axis represents the SHAP value, indicating the impact of that feature on
the model’s prediction for that specific truck. Features are ordered along the y-axis by their
importance, determined by the mean of their absolute SHAP values. Thus, features higher on
the plot are more significant to the model’s overall predictions. The plot displays the SHAP
values of every important feature and their impacts on the model output. The vertical axis
shows the 30 most important features out of a total of 104 features, arranged in descending order
of importance. Additionally, each feature is represented by a line extending from negative to
positive SHAP values, color-coded with red indicating higher feature values and blue indicating
lower feature values. The negative zone represents a tendency towards censored events (y = 0),
while the positive zone indicates a tendency towards failure occurrences (y = 1). In addition,
the most important features not only have higher mean SHAP values but also exhibit a wider
range of SHAP values along the x-axis. This indicates that these features have a more significant
and varied impact on the model’s predictions across different instances. As shown in Figure 3a,
features 666_0 and 309_0 are the most important features, with the highest mean SHAP values.
As the color coding indicates, higher values (red) of features 666_0, 309_0, 158_8,
and 837_0 and lower values (blue) of feature 167_3 are associated with a greater
likelihood of failure occurrence (higher SHAP value). The same interpretation applies to the
rest of the features in this plot as well.</p>
          <p>Figure 3b represents the mean SHAP value for each feature, allowing for a comparison
between features. For instance, feature 666_0 exhibits the highest impact on the model
output, twice as influential as feature 272_2 and three times as influential as feature
158_1.</p>
          <p>SHAP dependence plot. This is a valuable tool provided by SHAP analysis: it shows how features
influence the model’s output and reveals interactions between features. Figure 4 illustrates
SHAP dependence plots for the four most significant features in our study. These plots demonstrate the
general influence of feature A (color-coded on the right y-axis) on feature B (x-axis for feature
B’s value and left y-axis for its SHAP value) across multiple instances. This plot highlights how
variations in feature A afect the contribution of feature B to the model’s predictions, illustrating
the combined efects of these features on the overall model output. In addition, it identifies the
turning points where the feature value results in a zero SHAP value, indicating a neutral impact
on the model’s prediction at those specific values. For example, Figure 4a shows that when the
feature 666_0 value surpasses 0.1e6 on the x-axis, the corresponding SHAP value turns positive.
This indicates that beyond this threshold, feature 666_0 contributes to predicting failure events
in the model’s output. Regarding feature interaction, when feature 272_1 has a higher value
(indicated by red color), the SHAP value associated with feature 666_0 moves closer to zero.
This means that a high value of 272_1 reduces the impact of 666_0 on the prediction output,
regardless of whether the effect of 666_0 is positive or negative. This interpretation extends to
the other features shown in Figure 4.</p>
        </sec>
        <sec id="sec-5-2-2">
          <title>5.2.2. Local Explanations</title>
          <p>The SHAP force diagram is a valuable tool that provides local explanations for individual
predictions. It clarifies the influence of each feature on the model’s prediction for a particular
instance, illustrating both the direction and magnitude of each feature’s impact. The blue arrows
pointing to the left indicate that lower values of certain features push the model output lower
(e.g., towards 0, indicating that no failure occurred). Conversely, red arrows pointing to
the right indicate that higher values of these features push the model output higher (e.g.,
towards 1, indicating that a failure occurred).</p>
          <p>Figures 5, 6 and 7 depict instances classified as low risk, medium risk, and high risk,
respectively, based on their survival probability curves (see Figure 2). Interestingly, these instances
share a common set of influential features, including 666_0, 309_0, 158_8, 837_0, and 167_3,
as identified by SHAP global analysis (Figure 3). In addition to these common features, each
instance exhibits specific features that are particularly influential. For example, in the low-risk
instance (see Figure 5), feature 459_12 is important, and in the high-risk instance (see Figure 7),
feature 397_4 is identified as a key feature. Addressing these features is crucial for mitigating
potential failures or adverse events in maintenance scenarios.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we tackled the challenge of incorporating explainability to survival models using
real-world data, which has more than 90% censored enteries, from truck engine components
manufactured by SCANIA AB. We evaluated the performance of three machine learning-based
survival analysis models against the traditional Cox Proportional Hazards model for predicting
the remaining useful life of truck components. The RSF model emerged as the best-performing
model. To address the inherent black-box nature of ML-based survival analysis models, we
utilized SHAP analysis, providing both global and local insights into feature importance and
interactions. To make the RSF output compatible with SHAP analysis, a surrogate model was
first fitted to the RSF output, and SHAP analysis was then applied exclusively to that
surrogate. This comprehensive approach not only identified key factors affecting model
predictions but also demonstrated the potential of SHAP analysis in making complex models
more transparent and understandable. Our work can be considered one of the first
attempts to integrate XAI techniques into survival analysis, which in turn can enhance trust in
predictive models and provide invaluable support for decision-making in real-world industrial
scenarios.</p>
      <p>Future work could investigate the performance of other machine learning models, particularly
Deep Learning-based approaches, to enhance predictions of remaining useful life for heavy
vehicle components. Additionally, exploring changes in feature importance over time, by
utilizing all operational readouts, could provide insights into the dynamic nature of predictive
features. Engaging domain experts, such as maintenance engineers or equipment manufacturers,
in future studies would ensure that the models incorporate relevant domain knowledge and
meet practical requirements for predictive maintenance applications.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
      <p>This work has been partially funded by Scania CV AB and the Vinnova program for Strategic
Vehicle Research and Innovation (FFI) through the project RAPIDS (grant no. 2021-02522).</p>
      <p>[13] L. Cummins, A. Sommers, S. B. Ramezani, S. Mittal, J. Jabour, M. Seale, S. Rahimi,
Explainable predictive maintenance: A survey of current methods, challenges and opportunities,
arXiv preprint arXiv:2401.07871 (2024).
[14] S. Matzka, Explainable artificial intelligence for predictive maintenance applications, in:
2020 Third International Conference on Artificial Intelligence for Industries (AI4I), IEEE,
2020, pp. 69–74.
[15] S. Vollert, M. Atzmueller, A. Theissler, Interpretable machine learning: A brief survey
from the predictive maintenance perspective, in: 2021 26th IEEE international conference
on emerging technologies and factory automation (ETFA), IEEE, 2021, pp. 01–08.
[16] M. Kozielski, Contextual explanations for decision support in predictive maintenance, Applied Sciences 13 (2023) 10068.</p>
      <p>
[17] C. W. Hong, C. Lee, K. Lee, M.-S. Ko, K. Hur, Explainable artificial intelligence for the
remaining useful life prognosis of the turbofan engines, in: 2020 3rd IEEE International
Conference on Knowledge Innovation and Invention (ICKII), IEEE, 2020, pp. 144–147.
[18] A. Alabdallah, S. Pashami, T. Rögnvaldsson, M. Ohlsson, SurvSHAP: A proxy-based algorithm
for explaining survival models with SHAP, in: 2022 IEEE 9th international conference on
data science and advanced analytics (DSAA), IEEE, 2022, pp. 1–10.
[19] C. Ferreira, G. Gonçalves, Remaining useful life prediction and challenges: A literature
review on the use of machine learning methods, Journal of Manufacturing Systems 63
(2022) 550–562.
[20] Y. Zhang, P. Tiňo, A. Leonardis, K. Tang, A survey on neural network interpretability, IEEE Transactions on Emerging Topics in Computational Intelligence 5 (2021) 726–742.</p>
      <p>
[21] T. Lindgren, O. Steinert, O. Andersson Reyna, Z. Kharazian, S. Magnusson, SCANIA
Component X Dataset: A Real-World Multivariate Time Series Dataset for Predictive Maintenance,
2024. URL: https://doi.org/10.58141/1w9m-yz81. doi:10.58141/1w9m-yz81.
[22] Z. Kharazian, T. Lindgren, S. Magnússon, O. Steinert, O. A. Reyna, SCANIA Component X
dataset: A real-world multivariate time series dataset for predictive maintenance, arXiv
preprint arXiv:2401.15199 (2024).
[23] I. Etikan, S. Abubakar, R. Alkassim, The Kaplan-Meier estimate in survival analysis, Biom Biostat Int J 5 (2017) 00128.</p>
      <p>
[24] Z. Yang, J. Kanniainen, T. Krogerus, F. Emmert-Streib, Prognostic modeling of predictive
maintenance with survival analysis for mobile work equipment, Scientific Reports 12
(2022) 8529.
[25] S. Voronov, E. Frisk, M. Krysander, Data-driven battery lifetime prediction and confidence
estimation for heavy-duty trucks, IEEE Transactions on Reliability 67 (2018) 623–639.
[26] M. Rahat, Z. Kharazian, P. S. Mashhadi, T. Rögnvaldsson, S. Choudhury, Bridging the gap:
A comparative analysis of regressive remaining useful life prediction and survival analysis
methods for predictive maintenance, in: PHM Society Asia-Pacific Conference, volume 4,
2023.
[27] R. Csalódi, Z. Bagyura, J. Abonyi, Mixture of survival analysis models-cluster-weighted
weibull distributions, IEEE Access 9 (2021) 152288–152299.
[28] A. Kapuria, D. G. Cole, Integrating survival analysis with bayesian statistics to forecast the
remaining useful life of a centrifugal pump conditional to multiple fault types, Energies
16 (2023) 3707.
[29] A. Moncada-Torres, M. C. van Maaren, M. P. Hendriks, S. Siesling, G. Geleijnse, Explainable
machine learning can outperform cox regression predictions and provide insights in breast
cancer survival, Scientific reports 11 (2021) 6968.
[30] A. Sarica, F. Aracri, M. G. Bianco, F. Arcuri, A. Quattrone, A. Quattrone, A. D. N. Initiative,
Explainability of random survival forests in predicting conversion risk from mild cognitive
impairment to alzheimer’s disease, Brain Informatics 10 (2023) 31.
[31] R. Passera, S. Zompi, J. Gill, A. Busca, Explainable machine learning (xai) for survival
in bone marrow transplantation trials: A technical report, BioMedInformatics 3 (2023)
752–768.
[32] M. J. Bradburn, T. G. Clark, S. B. Love, D. G. Altman, Survival analysis part II: multivariate
data analysis – an introduction to concepts and methods, British Journal of Cancer 89 (2003)
431–436.
[33] T. Hothorn, P. Bühlmann, S. Dudoit, A. Molinaro, M. J. Van Der Laan, Survival ensembles, Biostatistics 7 (2006) 355–373.</p>
      <p>
[34] S. Pölsterl, N. Navab, A. Katouzian, An efficient training algorithm for kernel survival
support vector machines, arXiv preprint arXiv:1611.07054 (2016).
[35] I. Vasilev, M. Petrovskiy, I. Mashechkin, Sensitivity of survival analysis metrics,
Mathematics 11 (2023) 4246.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Industry 4.0-potentials for predictive maintenance</article-title>
          ,
          <source>in: 6th international workshop of advanced manufacturing and automation</source>
          , Atlantis Press,
          <year>2016</year>
          , pp.
          <fpage>42</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pashami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nowaczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jakubowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Paiva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Davari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bobek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jamshidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sarmadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alabdallah</surname>
          </string-name>
          , et al.,
          <source>Explainable predictive maintenance, arXiv preprint arXiv:2306.05120</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <article-title>Machine learning for survival analysis: A survey</article-title>
          ,
          <source>ACM Computing Surveys (CSUR) 51</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <article-title>Regression models and life-tables</article-title>
          ,
          <source>Journal of the Royal Statistical Society: Series B (Methodological) 34</source>
          (
          <year>1972</year>
          )
          <fpage>187</fpage>
          -
          <lpage>202</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hrnjica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Softic</surname>
          </string-name>
          ,
          <article-title>The survival analysis for a predictive maintenance in manufacturing</article-title>
          ,
          <source>in: Advances in Production Management Systems. Artificial Intelligence for Sustainable and Resilient Production Systems: IFIP WG 5</source>
          .7 International Conference, APMS 2021, Nantes, France, September 5-9,
          <year>2021</year>
          , Proceedings, Part III, Springer,
          <year>2021</year>
          , pp.
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vallarino</surname>
          </string-name>
          ,
          <article-title>Machine learning survival models restrictions: the case of startups time to failed with collinearity-related issues</article-title>
          ,
          <source>Journal of Economic Statistics</source>
          <volume>1</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Van Belle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Pelckmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Van Huffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Suykens</surname>
          </string-name>
          ,
          <article-title>Support vector methods for survival analysis: a comparison between ranking and regression approaches</article-title>
          ,
          <source>Artificial intelligence in medicine 53</source>
          (
          <year>2011</year>
          )
          <fpage>107</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Katzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Shaham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cloninger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kluger</surname>
          </string-name>
          ,
          <article-title>Deepsurv: personalized treatment recommender system using a cox proportional hazards deep neural network</article-title>
          ,
          <source>BMC medical research methodology</source>
          <volume>18</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ishwaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. B.</given-names>
            <surname>Kogalur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Blackstone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lauer</surname>
          </string-name>
          ,
          <article-title>Random survival forests</article-title>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rahat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kharazian</surname>
          </string-name>
          ,
          <article-title>Survloss: A new survival loss function for neural networks to process censored data</article-title>
          ,
          <source>in: PHM Society European Conference</source>
          , volume
          <volume>8</volume>
          ,
          <year>2024</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V.</given-names>
            <surname>Hassija</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chamola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mahapatra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Scardapane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Spinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mahmud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <article-title>Interpreting black-box models: a review on explainable artificial intelligence</article-title>
          ,
          <source>Cognitive Computation 16</source>
          (
          <year>2024</year>
          )
          <fpage>45</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>