SHAP-Driven Explainability in Survival Analysis for Predictive Maintenance Applications

Monireh Kargar-Sharif-Abad 1,*, Zahra Kharazian 1,*, Ioanna Miliou 1 and Tony Lindgren 1
1 Stockholm University, Department of Computer and Systems Sciences, Kista, SE-164 07, Sweden

Abstract
In the dynamic landscape of industrial operations, ensuring that machines operate without interruption is crucial for maintaining optimal productivity. To avoid unexpected equipment failures, minimize downtime, and improve operational efficiency, estimating the Remaining Useful Life is a central task in Predictive Maintenance. Survival analysis is a beneficial approach in this context because of its ability to handle censored data (here referring to industrial assets that have not experienced a failure during the study period). Recently, with the large increase in the amount of recorded data, machine learning survival models have been developed to capture more complex patterns when predicting failure. However, the black-box nature of these models requires the use of explainable AI for greater transparency and interpretability. In this paper, we evaluate three Machine Learning-based Survival Analysis methods (Random Survival Forest, Gradient Boosting Survival Analysis, and Survival Support Vector Machine) and a traditional Survival Analysis model (Cox Proportional Hazards) using real-world data from SCANIA AB that includes 90% censored data. Results indicate that Random Survival Forest outperforms the other models. In addition, we employ SHAP analysis to provide global and local explanations, highlighting the importance and interaction of features in our best-performing model. To overcome the limitation of applying SHAP to survival output, we utilize a surrogate model. Finally, SHAP identifies specific influential features, shedding light on their effects and interactions. This methodology tackles the inherent black-box nature of machine learning-based survival analysis models, providing valuable insights into their predictions. The findings from our SHAP analysis confirm the pivotal role of these identified features and their interactions, thereby enriching our comprehension of the factors influencing Remaining Useful Life predictions.

Keywords
Explainable Artificial Intelligence, Predictive Maintenance, Survival Analysis, XPdM, Censored data

1. Introduction
In the era of Industry 4.0, Predictive Maintenance (PdM) has become a cornerstone of modern manufacturing, leveraging IoT and digitization to increase machine longevity and efficiency. In fact, self-monitoring machinery that can predict and prevent failures helps minimize downtime and optimize maintenance scheduling [1].

HAII5.0: Embracing Human-Aware AI in Industry 5.0, at ECAI 2024, 19 October 2024, Santiago de Compostela, Spain.
* Corresponding authors.
Email: moka6903@student.su.se (M. Kargar-Sharif-Abad); Zahra.kharazian@dsv.su.se (Z. Kharazian); ioanna.miliou@dsv.su.se (I. Miliou); tony@dsv.su.se (T. Lindgren)
URL: https://www.su.se/profiles/zakh1874-1.623373 (Z. Kharazian); https://www.su.se/profiles/iomi2003-1.548427 (I. Miliou); https://www.dsv.su.se/~tony (T. Lindgren)
ORCID: 0000-0002-8430-1606 (Z. Kharazian); 0000-0002-1357-1967 (I. Miliou); 0000-0001-7713-1381 (T. Lindgren)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
A key aspect of PdM is to estimate industrial assets' Remaining Useful Life (RUL), which leads to appropriate maintenance actions, cost reductions, and operational efficiency improvements [2]. One of the significant challenges in PdM is handling censored data, in which the exact failure time of components is not observed [3]. To handle censored data, survival analysis (SA) models were initially developed in clinical research. These models apply statistical techniques to estimate the timing of events. Traditional SA techniques, such as the Cox Proportional Hazards (CPH) model [4], provide valuable insights into survival probabilities and hazard rates [5]. However, these models rest on restrictive assumptions, such as linearity and proportional hazards, which limit their applicability and performance in complex industrial settings [6]. To overcome these limitations, machine learning-based survival analysis (ML-based SA) models have been developed [7, 8, 9]. These advanced models, including random survival forests and deep learning approaches, provide superior predictive performance [10] but are often criticized for their lack of transparency. The "black-box" nature of these models makes it challenging for professionals to identify the factors influencing the model's outputs and to trust its predictions. Understanding these factors is essential for domain experts to trust the models and make informed maintenance decisions in real-world scenarios [11]. For instance, feature importance analysis provides knowledge about the most influential factors affecting a failure. Moreover, it enables domain experts to estimate the usefulness of new sensors and thus supports the selection of optimal sensor equipment. Explainable AI (XAI) methods, such as SHAP (SHapley Additive exPlanations) [12], have been introduced to address interpretability issues in predictive models. In the field of PdM, several research studies have focused on improving the interpretability of machine learning models [13, 14, 15, 16, 17]. Despite these efforts, the application of XAI to ML-based SA models in PdM remains underexplored [13]. One reason is that XAI methods are primarily designed for conventional machine learning models, such as classifiers and regressors, which provide point predictions. In contrast, survival models produce functions, such as survival or hazard functions, representing the probability of events occurring over time rather than single-point outcomes. This difference necessitates the development of specialized XAI methods for survival models [18]. Moreover, the scarcity of real-world data in the development of RUL prediction models poses a significant challenge, especially for academia. Only a minority (24.14%) of the datasets used in research accurately depict actual industrial conditions [19]. Leveraging real-world data improves the quality and cost-effectiveness of maintenance strategies and products [20]. This paper focuses on evaluating several ML-based SA models against the traditional CPH model for predicting the RUL of a specific component in truck engines manufactured by SCANIA AB in Sweden, using a real-world dataset [21, 22]. The assessment is carried out using Harrell's concordance index (C-index), a standard evaluation metric for survival models that measures how well they rank predicted survival times.
Moreover, we utilize SHAP analysis to gain insight into the factors that are most influential for the best-performing ML-based SA model, enhancing its interpretability. Given the challenge of applying XAI methods to SA models, we first employ a surrogate ML model to transform the SA output into a regression problem. This makes it possible to apply XAI techniques to the surrogate regression model, thereby providing insights into the predictions of the original SA model. Overall, integrating SHAP with ML-based survival models for predictive maintenance and RUL estimation offers a promising solution to the challenge of model interpretability when the data contain a vast majority of censored observations. Consequently, this integration promotes the development of transparent and reliable predictive models that can significantly improve maintenance strategies and decision-making processes in industrial applications.

2. Related Background

2.1. Survival Analysis
Survival analysis is a statistical method that analyzes and estimates the time until an event of interest happens. More specifically, it provides insights into the probability of survival beyond a specific time point t, defined as S(t) = Pr(T > t) [3], where T represents the survival time variable. Traditional SA models vary based on their assumptions regarding the survival time distribution: non-parametric methods make no assumptions about the distribution of survival times, semi-parametric methods assume some aspects of the distribution, and parametric methods fully specify the distribution [3]. Survival models were initially developed in the clinical domain [23] to estimate the lifetime of patients whose end of life was not observed and was therefore censored. They later expanded to other fields, such as engineering and industrial domains like PdM [5, 10]. Depending on the field in which SA is applied, the definition of the event of interest varies between death, recovery, failure of a machine, and so on. Censored data is inevitable and commonly encountered in real-world datasets, especially in PdM, where the quality of components is usually high and many components do not fail within the data collection/observation period. As a consequence, the application of survival analysis models in predictive maintenance, especially for estimating RUL, is experiencing significant growth. Traditional survival analysis techniques, such as the CPH model, have been used to handle highly censored data when predicting the RUL of assets such as turbofan engines [5] and mobile working assets [24]. However, due to their strict assumptions, which do not always hold [6], and given the larger modeling capacity of ML models, there is increasing exploration of integrating machine learning with survival analysis. ML-based SA models outperform traditional SA models in many cases. For instance, Voronov et al. [25] applied Random Survival Forest (RSF) to predict truck battery life, demonstrating its effectiveness on highly censored datasets. In another study, Rahat et al. [26] found RSF superior to Gradient Boosting (GB) in predicting RUL, with a lower mean absolute error. Vallarino [6] also compared models for predicting startup failure, and the results indicated that RSF achieved the highest accuracy among the compared models.
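For reference, the CPH model mentioned above can be written in its standard textbook form (not specific to this study): the hazard of an asset with covariate vector x is a shared baseline hazard scaled by a log-linear covariate term,

h(t | x) = h_0(t) exp(β^T x),    S(t | x) = S_0(t)^{exp(β^T x)},

and the proportional-hazards and log-linearity assumptions visible in this expression are precisely the restrictions that motivate the ML-based alternatives evaluated in this paper.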
2.2. Explainability in SA
In recent years, the application of XAI in ML-based SA has attracted increasing attention from researchers across various domains [3, 27, 28]. In particular, various XAI methods have been designed and developed to interpret and explain machine learning models. Among these methods, SHAP analysis is popular for interpreting ML-based SA models. It is grounded in game theory: every feature is treated as a player in a game and contributes a specific value to the prediction output [29, 30]. SHAP analysis provides both local and global interpretability, enhancing the transparency of complex models [12]. For instance, in clinical studies, Moncada-Torres et al. [29] found that ML models, especially Extreme Gradient Boosting (XGBoost), could outperform conventional Cox models in predicting survival among breast cancer patients, and they applied SHAP analysis to provide clear insights into the model's decisions. Moreover, Sarica et al. [30] showed RSF's superior performance over Cox models in predicting Alzheimer's disease progression and applied SHAP to the RSF output to make its predictions more transparent. Integrating XAI into ML-based survival models holds significant promise for providing interpretable, accurate predictions in PdM contexts. However, further research is needed to refine these methods and fully realize their potential, particularly in the PdM domain [30, 18, 31].

3. Methodology and Problem Formulation
The overall methodology consists of four main steps: 1) Data preparation, 2) Survival modeling, 3) Regression, and 4) Explainer, as illustrated in Fig. 1. The details of these steps are elaborated in the following subsections and are also summarized in Algorithm 1.

Figure 1: The methodology framework.

3.1. Data preparation
Let D = {V_1, ..., V_N} denote a given set of multivariate time series V_i, where N is the total number of time series. In our study, the dataset is collected from three sources of information: operational, time-to-event, and specification data. For each time series V_i (here referring to all readouts of vehicle i), the algorithm selects a random representative readout r_ij ∈ V_i, where j is a uniformly sampled random index over all readouts of vehicle i, to manage the size and complexity of the dataset during processing (lines 1 and 2 of Algorithm 1). In the next step (line 3), the algorithm converts the dataset into a format suitable for survival analysis. In this setting, each data point is characterized by a triple (X, δ, T), where X, δ, and T are the F-dimensional feature vector, the event indicator (δ = 1 when the event is experienced, δ = 0 in case of censoring), and the observed time of an individual readout, respectively. Considering our problem, for a random readout of each vehicle we have r_ij : (X_ij, δ_i, T_ij), where the observed time is the true target and can be calculated for non-censored observations as T_ij = T_ij^failure − T_ij^readout. For censored observations, T_ij = T_ij^censoring − T_ij^readout, where the censoring time equals the time of the last observation. Finally, the prepared dataset undergoes the preprocessing step (line 4) to address missing values, encode categorical features, and remove highly correlated features.

3.2. Survival process
In this step, the dataset is divided into training and testing sets. The training set is used for developing the survival models M_SA (line 6).
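To make the data-preparation and training steps concrete, the following minimal sketch (using scikit-survival) mirrors lines 1–6 of Algorithm 1. The variable names — X for the feature frame, delta for the event indicator, and t_readout, t_failure, t_last_obs for the relevant time steps of each vehicle's selected readout — are placeholders introduced here for illustration, and the RSF hyperparameters are the best values later reported in Table 1; this is a sketch of the described procedure, not the authors' exact code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

# Observed time per selected readout: time to failure if a repair was observed,
# time to the last observation otherwise. (np.where evaluates both branches
# elementwise, so NaNs in t_failure for censored vehicles are simply discarded.)
T = np.where(delta == 1,
             t_failure - t_readout,
             t_last_obs - t_readout)

# scikit-survival expects a structured (event, time) target.
y = Surv.from_arrays(event=delta.astype(bool), time=T)

# 80/20 train/test split, as used in the experiments.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit one of the survival learners M_SA; RSF is shown here as an example.
rsf = RandomSurvivalForest(n_estimators=100, max_depth=30,
                           min_samples_split=30, min_samples_leaf=20,
                           random_state=0)
rsf.fit(X_train, y_train)
```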
The trained model is then employed to predict the survival curves/functions of the test samples (line 7), and these predictions are evaluated using the C-index (line 8), which is explained in Section 4.5.

3.3. Regression
To facilitate the application of explainability approaches such as SHAP, a surrogate regression model M_surrogate is trained on the predictions made by M_SA on the training data (line 9). In other words, this approach translates the complex output of the survival models (i.e., survival curves) into a format (i.e., point predictions) compatible with standard regression techniques.

3.4. Explainer
Finally, the SHAP explainer is applied to the surrogate model M_surrogate to explain the prediction output. The SHAP explainer provides a detailed understanding of how each feature contributes to the model's predictions by assigning a SHAP value to each feature. For a given prediction f(x), where x is the input feature vector, the SHAP value φ_i(v_f,x) represents the contribution of feature i. Features that push a specific prediction strongly upwards have large positive SHAP values (φ_i(v_f,x) > 0). Conversely, uninformative features have SHAP values close to zero (φ_i(v_f,x) ≈ 0), and features with a negative impact on the prediction have negative SHAP values (φ_i(v_f,x) < 0).

4. Empirical Evaluation
In this study, we evaluate the performance of three ML-based SA models, namely RSF, Gradient Boosting Survival Analysis (GBSA), and Survival Support Vector Machines (SSVMs), against one traditional survival analysis model (i.e., CPH), using the C-index as the evaluation metric. Based on the results, the best-performing model is identified and subjected to surrogate modeling. Subsequently, SHAP analysis is employed on the output of the surrogate model, providing a comprehensive understanding of the model's predictive behavior. The SHAP analysis includes global explanations conducted for all instances in the test dataset, SHAP dependence plots for the four most influential features, and local explanations for three instances ranging from high to low risk. Through these analyses, the influence of individual features on the model's predictions is examined, highlighting both the magnitude and direction of their impact. This approach allows the key factors affecting the model's performance and their implications for survival predictions to be elucidated. For this experiment, Python was utilized alongside packages such as scikit-survival and SHAP for model development and analysis.

Algorithm 1: Survival Analysis with SHAP Explainer for PdM
Input: multivariate time series dataset D, M_SA, M_surrogate
Output: Φ, the SHAP values explaining the survival model
1:  for each time series/vehicle V_i ∈ D do
2:      select a random index j ∼ Uniform(1, m_i);
3:      selected readout r_i ← r_ij : (X_ij, δ_i, T_ij);
4:  preprocessing;
5:  data split: D into training set D_train and test set D_test;
6:  learner = M_SA.fit(D_train);
7:  calculate survival functions for the test set D_test;
8:  evaluate(learner, D_test);
9:  M_surrogate.fit(X_train, M_SA.predict(X_train));
10: SHAP(M_surrogate, D_train);
11: compute SHAP values to explain the surrogate model's predictions;
12: Φ = SHAP(M_surrogate, D_test);

4.1. Dataset
The data used in this study is a publicly available, real-world dataset from SCANIA AB (https://snd.se/en/catalogue/dataset/2024-34), focusing on a specific anonymized engine component, called Component X, of SCANIA heavy trucks [21, 22].
This dataset is ideal for investigating the interpretability of survival models for predicting the RUL of heavy-vehicle components, as it provides extensive real-world operational data without the need for time-consuming raw data collection. The dataset consists of operational data, repair records (time to event), and truck specifications from 23,550 distinct trucks, organized into training, testing, and validation sets. For our analysis, we only utilized the training data, because its size was sufficiently large to support our experiments. The operational dataset is a multivariate time series, with the 'time_step' column indicating how long each vehicle has been operating with Component X. It comprises 105 features, which represent 14 operational variables collected by truck sensors and stored in the vehicle control units. These features contain numerical data, carefully selected and anonymized by experts and named with number codes such as "Number_index". The specification dataset includes categorical data representing the truck configuration, named after seven distinct specifications, each with its own categories [22]. The repair records dataset includes the "length_of_study_time_step" column, which gives the number of operational time steps since the component started working, and the "in_study_repair" column, which serves as the class label, with 1 indicating a repair and 0 indicating no repair during the observation time. The dataset is mostly censored (90.35%), with the majority of the data indicating no repair during the observation period.

4.2. Data Preprocessing
After merging the operational and specification data, the dataset contains 112 features. One-hot encoding the categorical data with the get_dummies function increases the number of features to 195. Following the removal of highly correlated features, 104 features remain. We opted to select one random readout from each vehicle; this also simulates real-world scenarios in which complete data is not available for each vehicle from the start of its operation. Next, the RUL is computed as explained in Section 3.1. Finally, the dataset is split into training and testing sets, with 80% allocated for training and 20% for testing.

4.3. Survival Analysis Models
The SA models evaluated in this study include one traditional model and three ML-based models. The following sections provide a brief overview of each model.

4.3.1. Traditional Model
The CPH model is a semi-parametric survival analysis model widely employed for its interpretability and effectiveness across numerous applications [32]. It is utilized to estimate the effect of covariates on the risk of an event [4]. However, despite its popularity, the model relies on strict assumptions that may not always hold: it assumes constant hazard ratios over time and depends on linear combinations of covariates, which may fail to capture complex, nonlinear relationships in the data.

4.3.2. ML-based Models
Random Survival Forests [9] are a variation of the traditional random forest algorithm adapted for survival analysis to predict the survival time of components. Instead of using a single decision tree, RSFs generate multiple survival trees based on bootstrapped samples of the data. The final survival prediction for a new observation is obtained by averaging the survival functions of all the trees in the forest [9].
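Continuing the earlier sketch (with rsf and X_test as defined there), this prediction step might look as follows in scikit-survival; the code is illustrative rather than the study's exact implementation:

```python
import matplotlib.pyplot as plt

# Each test instance receives a step-function estimate of S(t), obtained by
# averaging the survival functions of the individual trees in the forest.
surv_fns = rsf.predict_survival_function(X_test)

# Curves of the kind shown later in Figure 2.
for fn in surv_fns[:10]:
    plt.step(fn.x, fn(fn.x), where="post")
plt.xlabel("time step")
plt.ylabel("survival probability")

# Aggregated risk scores, used later for ranking and for the C-index.
risk_scores = rsf.predict(X_test)
```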
Gradient Boosting, similar to Random Forests, is a powerful ensemble learning method known for its strong performance across various applications. In survival analysis, this technique, referred to as Gradient Boosting Survival Analysis, is adapted to predict survival times [33].

Survival Support Vector Machines are an adaptation of the traditional support vector machine framework explicitly designed for survival analysis. They handle censored data by using kernel functions, which allow for the efficient modeling of complex, high-dimensional feature spaces. This enables SSVMs to provide accurate risk predictions and to effectively manage censored data in survival analysis [34].

4.3.3. Hyperparameter Tuning
The hyperparameters of the ML-based models are optimized using grid search with 5-fold cross-validation, so that the selected hyperparameters generalize well to unseen data. The parameter space searched for each model and the best value found for each hyperparameter are summarized in Table 1.

Table 1
Results of hyperparameter tuning

Model | Hyperparameter   | Parameter space        | Best value
RSF   | n_estimator      | [100, 400]             | 100
      | max_depth        | [15, 20, 30]           | 30
      | min_sample_split | [30, 40, 50]           | 30
      | min_sample_leaf  | [10, 20, 30]           | 20
GBSA  | n_estimator      | [50, 100]              | 50
      | max_depth        | [5, 6]                 | 5
      | min_sample_split | [30, 50]               | 50
      | min_sample_leaf  | [10, 20]               | 20
      | learning_rate    | [0.5, 1]               | 0.5
SSVM  | kernel           | [rbf, linear, sigmoid] | linear
      | alpha            | [0.0001, 3, 7, 10]     | 3
      | gamma            | [0, 0.5, 1]            | 0

4.4. Surrogate Model
In this study, a Random Forest (RF) regression model is employed as the surrogate function. The RF regression model is trained on the RSF output and provides point predictions, a format compatible with SHAP analysis.

4.5. Evaluation
This study uses the C-index as the assessment metric for the survival analysis models, owing to its effectiveness in gauging predictive performance in survival analysis tasks. The C-index evaluates a model's predictive accuracy by assessing its ability to correctly rank pairs of samples based on their survival times. To accommodate censored observations, the index is computed by summing the concordance values over all comparable pairs and dividing by the total number of such pairs, giving a comprehensive evaluation of the model's ability to predict the order of events. A higher C-index denotes superior predictive accuracy, with a score of 1 signifying perfect prediction and 0.5 representing random guessing [35].

4.6. SHAP Analysis
In this study, various SHAP analysis tools are utilized to interpret the model's predictions. For global explanations, SHAP summary plots and dependence plots are used. For local explanations, SHAP force plots are employed.
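Bringing Sections 4.4–4.6 together, the evaluation, surrogate, and explanation steps can be sketched as below, continuing the running example (rsf, X_train, X_test, y_test, risk_scores). Using the RSF risk score as the surrogate's regression target is a simplifying assumption made here to keep the sketch short; the paper only states that the surrogate is trained on the survival model's predictions.

```python
import shap
from sklearn.ensemble import RandomForestRegressor
from sksurv.metrics import concordance_index_censored

# Harrell's C-index on the test set (Section 4.5).
c_index = concordance_index_censored(y_test["event"], y_test["time"],
                                     risk_scores)[0]

# Surrogate RF regressor trained on the survival model's point-valued output
# (Section 4.4), which makes the pipeline compatible with standard SHAP tools.
surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(X_train, rsf.predict(X_train))

# SHAP values for the test instances (Section 4.6).
explainer = shap.TreeExplainer(surrogate)
shap_values = explainer.shap_values(X_test)

# Global explanations: beeswarm summary plot and mean-|SHAP| bar plot.
shap.summary_plot(shap_values, X_test)
shap.summary_plot(shap_values, X_test, plot_type="bar")

# Local explanation: force plot for a single test instance.
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0],
                matplotlib=True)
```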
5. Results

5.1. Performance evaluation
To assess the models' performance on an unseen dataset, each model was trained on the entire training dataset and then tested on the test dataset, demonstrating their generalization capabilities. Additionally, the training score for each model is reported to ensure that no overfitting has occurred. As shown in Table 2, the RSF model outperformed not only the traditional CPH but also the two other ML models, namely GBSA and SSVM. This superior performance of the RSF model highlights its robustness and effectiveness in accurately predicting survival outcomes compared to the other models evaluated.

Table 2
The performance of different models

Model           | RSF    | GBSA   | SSVM   | CPH
C-index (train) | 0.7376 | 0.7297 | 0.6096 | 0.7210
C-index (test)  | 0.7577 | 0.7434 | 0.6259 | 0.6986

Additionally, the RSF survival probability curves for ten randomly selected instances are depicted in Figure 2, providing essential insights into the model's predictive behavior for further analysis. Lower survival probabilities (e.g., instance 460) indicate a higher risk of failure, while higher survival probabilities (e.g., instance 4699) denote healthier components with lower failure risks. Analyzing these curves allows for observation of the predicted durability and risk profiles of the selected trucks.

Figure 2: RSF survival probability plot of 10 randomly selected instances.

5.2. SHAP Analysis
The following sections discuss the results of the global explanation of the RSF prediction, as well as the local explanation for three examples of trucks with different survival behaviors.

5.2.1. Global Explanations

SHAP summary plot
Figure 3a illustrates the summary plot of SHAP values for the 4710 instances of the test dataset. This plot provides a global explanation of the surrogate model output across the trucks of the test dataset. Each point in the plot corresponds to an individual truck; its position along the x-axis represents the SHAP value, indicating the impact of that feature on the model's prediction for that specific truck. Features are ordered along the y-axis by their importance, determined by the mean of their absolute SHAP values, so features higher on the plot are more significant to the model's overall predictions. The plot displays the SHAP values of every important feature and their impact on the model output. The vertical axis shows the 30 most important features out of a total of 104, arranged in descending order of importance. Additionally, each feature is represented by a line extending from negative to positive SHAP values, color-coded with red indicating higher feature values and blue indicating lower feature values. The negative zone represents a tendency towards censored events (y = 0), while the positive zone indicates a tendency towards failure occurrences (y = 1). In addition, the most important features not only have higher mean SHAP values but also exhibit a wider range of SHAP values along the x-axis, indicating that these features have a more significant and varied impact on the model's predictions across different instances. As shown in Figure 3a, features 666_0 and 309_0 are the most important features, with the highest mean SHAP values. As the color coding indicates, higher values (red) of features 666_0, 309_0, 158_8, and 837_0 and lower values (blue) of feature 167_3 are associated with a greater likelihood of failure (higher SHAP values). The same interpretation applies to the rest of the features in this plot. Figure 3b represents the mean SHAP value for each feature, allowing for a comparison between the features. For instance, feature 666_0 exhibits the highest impact on the model output, being roughly twice as influential as feature 272_2 and three times as influential as feature 158_1.

Figure 3: Global explanation of SHAP analysis on the RSF output (for all 4710 instances); (a) SHAP summary plot, (b) mean SHAP value.

SHAP dependence plot
The dependence plot is another valuable tool provided by SHAP analysis. It shows how features influence the model's output and reveals interactions between features.
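A dependence plot of the kind discussed here can be generated directly from the SHAP values computed above. The feature name "666_0" is taken from the paper's feature ranking and is assumed to match the column name in X_test; with interaction_index="auto", SHAP colours the points by the feature it estimates to interact most strongly:

```python
# Dependence plot for the top-ranked feature, coloured by its strongest
# estimated interaction partner.
shap.dependence_plot("666_0", shap_values, X_test, interaction_index="auto")
```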
Figure 4 shows SHAP dependence plots for the four most influential features in our study. These plots demonstrate the general influence of feature A (color-coded on the right y-axis) on feature B (x-axis for feature B's value, left y-axis for its SHAP value) across multiple instances. Each plot highlights how variations in feature A affect the contribution of feature B to the model's predictions, illustrating the combined effects of these features on the overall model output. In addition, it identifies the turning points where the feature value results in a zero SHAP value, indicating a neutral impact on the model's prediction at those specific values. For example, Figure 4a shows that when the value of feature 666_0 surpasses 0.1e6 on the x-axis, the corresponding SHAP value turns positive. This indicates that beyond this threshold, feature 666_0 contributes to predicting failure events in the model's output. Regarding feature interaction, when feature 272_1 has a higher value (indicated by red color), the SHAP value associated with feature 666_0 moves closer to zero. This means that a high value of 272_1 reduces the impact of 666_0 on the prediction output, regardless of whether the effect of 666_0 is positive or negative. This interpretation extends to the other features shown in Figure 4.

Figure 4: SHAP dependence plots for the first four most influential features on the prediction output; (a) feature 666_0, (b) feature 309_0, (c) feature 158_8, (d) feature 837_0.

5.2.2. Local Explanations
The SHAP force diagram is a valuable tool that provides local explanations for individual predictions. It clarifies the influence of each feature on the model's prediction for a particular instance, illustrating both the direction and magnitude of each feature's impact. Blue arrows pointing to the left indicate that lower values of certain features are associated with a lower model output (e.g., y = 0, indicating no failure occurred). Conversely, red arrows pointing to the right indicate that higher values of these features correspond to a higher model output (e.g., y = 1, indicating a failure occurred). Figures 5, 6, and 7 depict instances classified as low risk, medium risk, and high risk, respectively, based on their survival probability curves (see Figure 2). Interestingly, these instances share a common set of influential features, including 666_0, 309_0, 158_8, 837_0, and 167_3, as identified by the SHAP global analysis (Figure 3). In addition to these common features, each instance exhibits specific features that are particularly influential. For example, in the low-risk instance (see Figure 5) feature 459_12 is important, and in the high-risk instance (see Figure 7) feature 397_4 is identified as a key feature. Addressing these features is crucial for mitigating potential failures or adverse events in maintenance scenarios.

Figure 5: SHAP force plot, instance 4699, a low-risk component.
Figure 6: SHAP force plot, instance 120, a medium-risk component.
Figure 7: SHAP force plot, instance 460, a high-risk component.

6. Conclusion
In this paper, we tackled the challenge of incorporating explainability into survival models using real-world data from truck engine components manufactured by SCANIA AB, with more than 90% censored entries. We evaluated the performance of three machine learning-based survival analysis models against the traditional Cox Proportional Hazards model for predicting the remaining useful life of truck components.
The RSF model emerged as the best-performing model. To address the inherent black-box nature of ML-based survival analysis models, we utilized SHAP analysis, providing both global and local insights into feature importance and interactions. To make the RSF output compatible with SHAP analysis, first, a surrogate model was applied to the RSF output. Subsequently, SHAP analysis was exclusively applied to the surrogate model. This comprehensive approach not only identified key factors affecting model predictions but also demonstrated the potential of SHAP analysis in making complex models more transparent and understandable. Our work, in fact, can be considered as one of the first attempts to integrate XAI techniques into survival analysis, which in turn can enhance trust in predictive models and provide invaluable support for decision-making in real-world industrial scenarios. Future work could investigate the performance of other machine learning models, particularly Deep Learning-based approaches, to enhance predictions of remaining useful life for heavy vehicle components. Additionally, exploring changes in feature importance over time, by utilizing all operational readouts could provide insights into the dynamic nature of predictive features. Engaging domain experts, such as maintenance engineers or equipment manufacturers, in future studies, would ensure that the models incorporate relevant domain knowledge and meet practical requirements for predictive maintenance applications. 7. Acknowledgments This work has been partially funded by Scania CV AB and the Vinnova program for Strategic Vehicle Research and Innovation (FFI) through the project RAPIDS (grant no. 2021-02522). References [1] Z. Li, K. Wang, Y. He, Industry 4.0-potentials for predictive maintenance, in: 6th inter- national workshop of advanced manufacturing and automation, Atlantis Press, 2016, pp. 42–46. [2] S. Pashami, S. Nowaczyk, Y. Fan, J. Jakubowski, N. Paiva, N. Davari, S. Bobek, S. Jamshidi, H. Sarmadi, A. Alabdallah, et al., Explainable predictive maintenance, arXiv preprint arXiv:2306.05120 (2023). [3] P. Wang, Y. Li, C. K. Reddy, Machine learning for survival analysis: A survey, ACM Computing Surveys (CSUR) 51 (2019) 1–36. [4] D. R. Cox, Regression models and life-tables, Journal of the Royal Statistical Society: Series B (Methodological) 34 (1972) 187–202. [5] B. Hrnjica, S. Softic, The survival analysis for a predictive maintenance in manufacturing, in: Advances in Production Management Systems. Artificial Intelligence for Sustainable and Resilient Production Systems: IFIP WG 5.7 International Conference, APMS 2021, Nantes, France, September 5–9, 2021, Proceedings, Part III, Springer, 2021, pp. 78–85. [6] D. Vallarino, Machine learning survival models restrictions: the case of startups time to failed with collinearity-related issues, Journal of Economic Statistics 1 (2023) 1–15. [7] V. Van Belle, K. Pelckmans, S. Van Huffel, J. A. Suykens, Support vector methods for survival analysis: a comparison between ranking and regression approaches, Artificial intelligence in medicine 53 (2011) 107–118. [8] J. L. Katzman, U. Shaham, A. Cloninger, J. Bates, T. Jiang, Y. Kluger, Deepsurv: personalized treatment recommender system using a cox proportional hazards deep neural network, BMC medical research methodology 18 (2018) 1–12. [9] H. Ishwaran, U. B. Kogalur, E. H. Blackstone, M. S. Lauer, Random survival forests (2008). [10] M. Rahat, Z. 
Kharazian, Survloss: A new survival loss function for neural networks to process censored data, in: PHM Society European Conference, volume 8, 2024, pp. 7–7. [11] V. Hassija, V. Chamola, A. Mahapatra, A. Singal, D. Goel, K. Huang, S. Scardapane, I. Spinelli, M. Mahmud, A. Hussain, Interpreting black-box models: a review on explainable artificial intelligence, Cognitive Computation 16 (2024) 45–74. [12] S. M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, Advances in neural information processing systems 30 (2017). [13] L. Cummins, A. Sommers, S. B. Ramezani, S. Mittal, J. Jabour, M. Seale, S. Rahimi, Explain- able predictive maintenance: A survey of current methods, challenges and opportunities, arXiv preprint arXiv:2401.07871 (2024). [14] S. Matzka, Explainable artificial intelligence for predictive maintenance applications, in: 2020 third international conference on artificial intelligence for industries (ai4i), IEEE, 2020, pp. 69–74. [15] S. Vollert, M. Atzmueller, A. Theissler, Interpretable machine learning: A brief survey from the predictive maintenance perspective, in: 2021 26th IEEE international conference on emerging technologies and factory automation (ETFA), IEEE, 2021, pp. 01–08. [16] M. Kozielski, Contextual explanations for decision support in predictive maintenance, Applied Sciences 13 (2023) 10068. [17] C. W. Hong, C. Lee, K. Lee, M.-S. Ko, K. Hur, Explainable artificial intelligence for the remaining useful life prognosis of the turbofan engines, in: 2020 3rd ieee international conference on knowledge innovation and invention (ickii), IEEE, 2020, pp. 144–147. [18] A. Alabdallah, S. Pashami, T. Rögnvaldsson, M. Ohlsson, Survshap: a proxy-based algorithm for explaining survival models with shap, in: 2022 IEEE 9th international conference on data science and advanced analytics (DSAA), IEEE, 2022, pp. 1–10. [19] C. Ferreira, G. Gonçalves, Remaining useful life prediction and challenges: A literature review on the use of machine learning methods, Journal of Manufacturing Systems 63 (2022) 550–562. [20] Y. Zhang, P. Tiňo, A. Leonardis, K. Tang, A survey on neural network interpretability, IEEE Transactions on Emerging Topics in Computational Intelligence 5 (2021) 726–742. [21] T. Lindgren, O. Steinert, O. Andersson Reyna, Z. Kharazian, S. Magnusson, SCANIA Compo- nent X Dataset: A Real-World Multivariate Time Series Dataset for Predictive Maintenance, 2024. URL: https://doi.org/10.58141/1w9m-yz81. doi:10.58141/1w9m-yz81. [22] Z. Kharazian, T. Lindgren, S. Magnússon, O. Steinert, O. A. Reyna, Scania component x dataset: A real-world multivariate time series dataset for predictive maintenance, arXiv preprint arXiv:2401.15199 (2024). [23] I. Etikan, S. Abubakar, R. Alkassim, The kaplan-meier estimate in survival analysis, Biom Biostat Int J 5 (2017) 00128. [24] Z. Yang, J. Kanniainen, T. Krogerus, F. Emmert-Streib, Prognostic modeling of predictive maintenance with survival analysis for mobile work equipment, Scientific Reports 12 (2022) 8529. [25] S. Voronov, E. Frisk, M. Krysander, Data-driven battery lifetime prediction and confidence estimation for heavy-duty trucks, IEEE Transactions on Reliability 67 (2018) 623–639. [26] M. Rahat, Z. Kharazian, P. S. Mashhadi, T. Rögnvaldsson, S. Choudhury, Bridging the gap: A comparative analysis of regressive remaining useful life prediction and survival analysis methods for predictive maintenance, in: PHM Society Asia-Pacific Conference, volume 4, 2023. [27] R. Csalódi, Z. Bagyura, J. 
Abonyi, Mixture of survival analysis models-cluster-weighted weibull distributions, IEEE Access 9 (2021) 152288–152299. [28] A. Kapuria, D. G. Cole, Integrating survival analysis with bayesian statistics to forecast the remaining useful life of a centrifugal pump conditional to multiple fault types, Energies 16 (2023) 3707. [29] A. Moncada-Torres, M. C. van Maaren, M. P. Hendriks, S. Siesling, G. Geleijnse, Explainable machine learning can outperform cox regression predictions and provide insights in breast cancer survival, Scientific reports 11 (2021) 6968. [30] A. Sarica, F. Aracri, M. G. Bianco, F. Arcuri, A. Quattrone, A. Quattrone, A. D. N. Initiative, Explainability of random survival forests in predicting conversion risk from mild cognitive impairment to alzheimer’s disease, Brain Informatics 10 (2023) 31. [31] R. Passera, S. Zompi, J. Gill, A. Busca, Explainable machine learning (xai) for survival in bone marrow transplantation trials: A technical report, BioMedInformatics 3 (2023) 752–768. [32] M. J. Bradburn, T. G. Clark, S. B. Love, D. G. Altman, Survival analysis part ii: multivariate data analysis–an introduction to concepts and methods, British journal of cancer 89 (2003) 431–436. [33] T. Hothorn, P. Bühlmann, S. Dudoit, A. Molinaro, M. J. Van Der Laan, Survival ensembles, Biostatistics 7 (2006) 355–373. [34] S. Pölsterl, N. Navab, A. Katouzian, An efficient training algorithm for kernel survival support vector machines, arXiv preprint arXiv:1611.07054 (2016). [35] I. Vasilev, M. Petrovskiy, I. Mashechkin, Sensitivity of survival analysis metrics, Mathe- matics 11 (2023) 4246.