                                A Comparative Assessment of eXplainable AI Tools in
                                Predicting Hard Disk Drive Health⋆
                                (extended abstract)

                                Flora Amato1 , Antonino Ferraro1 , Antonio Galli1,* , Valerio La Gatta1 ,
                                Francesco Moscato2 , Vincenzo Moscato1 , Marco Postiglione1 , Carlo Sansone1 and
                                Giancarlo Sperlì1
1 Department of Electrical Engineering and Information Technology, University of Naples Federico II, Italy
2 Department of Information Engineering, Electrical Engineering and Applied Mathematics, University of Salerno, Italy


                                               Abstract
                                               In addressing the challenge of optimizing maintenance operations in Industry 4.0, recent efforts have
                                               focused on predictive maintenance frameworks. However, the effectiveness of these frameworks, largely
                                               relying on complex deep learning models, is hindered by their lack of explainability. To address this, we
                                               employ eXplainable Artificial Intelligence (XAI) methodologies to make the decision-making process more
                                               understandable for humans. Our study, based on a previous work, specifically explores explanations
                                               for predictions made by a recurrent neural network-based model designed for a three-dimensional
                                               dataset, used to estimate the Remaining Useful Life (RUL) of Hard Disk Drives (HDDs). We compare the
                                               explanations provided by different XAI tools, emphasizing the utility of global and local explanations in
                                               supporting predictive maintenance tasks. Using the Backblaze Dataset and a Long Short-Term Memory
                                               (LSTM) prediction model, our developed explanation framework evaluates Local Interpretable Model-
                                               Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) tools. Results show that SHAP
                                               outperforms LIME across various metrics, establishing itself as a suitable and effective solution for HDD
                                               predictive maintenance applications.

                                               Keywords
                                               eXplainable Artificial Intelligence, Predictive Maintenance, LSTM-based model, Deep Learning




                                1. Introduction
                                Over the past decade, numerous companies have increasingly turned their attention to Ar-
                                tificial Intelligence (AI) and Machine Learning (ML) techniques. This shift is driven by the
                                potential of these technologies to design models that support practitioners across various tasks,
                                leveraging abundant data. Notable applications include predictive maintenance [1], product
                                recommendation [2], and labor market analysis [3].
                                   A paradigm shift is evident towards the adoption of more sophisticated models based on
                                Deep Learning (DL) [4, 5]. This transition is fueled by their enhanced accuracy in handling

                                SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy
* Corresponding author.
flora.amato@unina.it (F. Amato); antonino.ferraro@unina.it (A. Ferraro); antonio.galli@unina.it (A. Galli);
valerio.lagatta@unina.it (V. La Gatta); fmoscato@unisa.it (F. Moscato); vincenzo.moscato@unina.it (V. Moscato);
marco.postiglione@unina.it (M. Postiglione); carlo.sansone@unina.it (C. Sansone); giancarlo.sperli@unina.it (G. Sperlì)
                                             © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




larger datasets, facilitated by advancements in computing power, particularly attributed to the
evolution of Graphics Processing Units (GPUs).
   Recent research efforts, exemplified by studies such as [6], underscore a significant industry
challenge—the maintenance of technological equipment.
   Today, despite the ongoing shift towards Industry 4.0 and the emerging Industry 5.0, many
companies still rely on periodic and corrective maintenance strategies [7]. Industry 5.0 in-
troduces a novel manufacturing paradigm emphasizing collaboration between machines and
humans to enhance efficiency, productivity, and worker well-being [8]. This paradigm shift
involves combining human and equipment capabilities, creating digital twins of entire systems,
and implementing artificial intelligence for automatic and efficient industrial processes [9, 10].
   In dynamic industrial settings, there is a rising demand for automated predictive maintenance
systems analyzing extensive data volumes through condition monitoring [11]. Predictive
maintenance aims to optimize costs by maximizing equipment’s Remaining Useful Life (RUL),
offering a potential return on investment of up to 100% and reducing correction costs by up to
60% [12, 13]. Approaches for predictive maintenance are categorized into three groups: physical
model-based, data-driven, and hybrid [14]. Physical model-based approaches face challenges in
modeling complex systems, while data-driven methods learn system behavior from historical
data. Hybrid methods combine both approaches [15].
   In recent years, the widespread use of deep learning (DL) models in various industrial
applications, such as fault diagnosis [16], classification [17], and predicting industrial Key
Performance Indicators (KPIs) [18], has been fueled by increased computing power. Despite
their impressive results, these models, often considered as black boxes, face resistance due to the
need for interpretability, tractability, and reliability in line with the demand for ethical AI [19]. In
the context of Industry 5.0, marked by collaborative efforts between machines and humans [20],
the explanation of AI model predictions (explainability) becomes crucial. This has given rise
to eXplainable Artificial Intelligence (XAI), defined as systems capable of elucidating decision
logic, revealing strengths and weaknesses in decision-making, and offering insights into future
behavior [21]. A significant predictive maintenance task involves estimating the Remaining
Useful Life (RUL) of Hard Disk Drives (HDDs) [22], crucial for data centers. In this study, we
conduct a systematic evaluation of XAI methodologies to explain predictions made by a Long
Short-Term Memory (LSTM)-based model assessing HDD health. Despite its superior accuracy,
precision, and recall [23], the LSTM model lacks explainability due to its reliance on a three-
dimensional dataset $(x_{samples}, y_{timesteps}, z_{features})$, combining spatial and temporal features.
   This paper represents an extended abstract of a recent proposal [24], in which the authors
present an explanation framework that evaluates the effectiveness of XAI tools, focusing on
Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations
(SHAP), using the Backblaze dataset. This effort represents one of the first attempts to eval-
uate the practical utility of XAI tools in real application contexts, both methodologically and
operationally.
   The structure of the paper is as follows: Section 2 presents a systematic overview of XAI
tools. The proposed framework, consisting of two modules (prediction and explanation), is
detailed in Section 3. Main findings regarding the explanation of the prediction module on the
Backblaze dataset using LIME and SHAP are discussed in Section 4, along with their empirical
evaluation. Section 5 concludes the paper and suggests possible future directions.
2. Comprehensive Analysis of XAI
Three fundamental concepts have been introduced to support XAI methodologies: Interpretabil-
ity, which entails the ability to explain in terms understandable to humans [25]; Explainability,
associated with the role of explanation as a bridge between humans and decision-makers [26];
and Transparency, indicating inherent understandability [27]. A clear distinction is evident be-
tween models designed for interpretation (transparent models) and those necessitating external
XAI techniques for explanation (post-hoc models).
   In the first category, encompassing three levels [25], each level includes its predecessors:
Algorithmic transparency involves the user’s understanding of the model’s process to generate
output data from its input; Decomposability pertains to the ability to explain each component of
the model, including input, parameters, and calculations; and Simulability refers to the model’s
ability to be simulated. Post-hoc techniques can be categorized as model-agnostic or model-specific,
depending on whether they rely on the model's internals. The former may involve model simplification,
local explanation, feature relevance estimation, and visualization techniques [25].
   Most techniques for simplification rely on rule extraction, with notable examples being
LIME [28] and Anchors [29]. In particular, LIME builds local linear models around predictions
of an opaque model in order to explain them.
   The second category aims to describe the behavior of black-box models by classifying or
measuring the influence, relevance, or importance of each feature in the model’s prediction.
Noteworthy algorithmic approaches in this category include SHAP [30] and Partial Dependence
Plot (PDP). In particular, SHAP computes an additive feature importance score for a particular
prediction with desired properties. The third category comprises visual explanation techniques,
generating visualizations from only the inputs and outputs of a black-box model.
   We explore two crucial XAI tools, LIME and SHAP, capable of handling three-dimensional
datasets. Our objective is to aid practitioners in the decision-making process, enhancing the
comprehension of AI model outputs. Specifically, we seek to elucidate predictions made by the
LSTM-based model for assessing HDD health status using these tools, which offer both global
and local explanations, highlighting the key features influencing the predictions.


3. Framework
The rapid growth in technology services has escalated the demand for archive space, making
Hard Disk Drives (HDDs) the primary storage solution in data centers. This shift has increased
the risk of downtime, data loss, and unavailability in data centers. Predicting the health status
of HDDs is crucial for optimizing maintenance strategies, reducing costs, and extending the
HDDs’ Remaining Useful Life (RUL). Commonly, health status prediction relies on analyzing
Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) attributes, often implemented
through complex deep-based models. However, their black-box nature poses challenges in
understanding predictions.
   Our focus is on investigating eXplainable Artificial Intelligence (XAI) techniques for LSTM-
based models applied to real-world scenarios, specifically HDDs’ health status prediction.
The complexity of these models necessitates XAI tools, such as LIME and SHAP, to provide
explanations for predictions. The designed framework for HDDs’ Remaining Useful Life (RUL)
estimation utilizes an LSTM-based model, focusing on the analysis of dependencies between
S.M.A.R.T. attributes over time for multi-class health status prediction. In particular, it is
composed of two modules: i) a Prediction Module and ii) an Explanation Module. The three-
dimensional dataset employed is explained solely with LIME and SHAP.

3.1. Prediction module
The prediction module utilizes an LSTM-based model from [23], consisting of two stacked LSTM
layers with 128 units, followed by a dense layer with a unit count equal to the number of classes
and softmax activation. This model exploits temporal dependencies in S.M.A.R.T. features over
a time-window to predict HDD health status across four classes (Alert, Warning, Very Fair, and
Good). The input to each LSTM layer is a three-dimensional data structure with dimensions
$(z, w, n)$, where $z$, $w$, and $n$ represent the total number of sequences, the time window size,
and the number of features, respectively. The model predicts HDD health status at time $t+1$ as a multi-class
classification task, assigning each feature sequence to one of the classes (health levels) based on
the sequence $(a_{t-w+1}, \dots, a_{t-1}, a_t)$.
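To make the architecture concrete, the following is a minimal sketch of the prediction module as described above, assuming a Keras/TensorFlow implementation; the window length and the number of S.M.A.R.T. features are illustrative assumptions, while the layer sizes and optimizer settings follow the text and Section 4.

```python
# Minimal sketch of the prediction module (two stacked LSTM layers + softmax head).
# WINDOW and N_FEATURES are assumed values, not taken from the original paper.
import tensorflow as tf

N_CLASSES = 4      # Alert, Warning, Very Fair, Good
WINDOW = 15        # assumed time-window length w
N_FEATURES = 9     # assumed number of S.M.A.R.T. features n

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, N_FEATURES)),       # one (w, n) sequence per sample
    tf.keras.layers.LSTM(128, return_sequences=True),        # first stacked LSTM layer
    tf.keras.layers.LSTM(128),                                # second LSTM layer
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),  # one unit per health class
])

# Optimizer settings reported in Section 4: Adam with learning rate 0.001.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```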

3.2. Explanation module
The explanation module seeks to identify features influencing the model’s decision, especially
in predicting false positives or misclassifications. Given the multidimensional nature of the
problem, two XAI tools (SHAP and LIME) were concurrently applied for the task to compare
their explanations. SHAP employs Shapley values, derived from cooperative game theory, to
evaluate each feature’s contribution to the prediction. Utilizing the DeepExplainer explainer,
based on 4,000 samples and the trained model, SHAP approximates conditional expectations,
providing both global and local explanations.
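A minimal sketch of this step with the shap library, assuming the trained LSTM from Section 3.1 and a three-dimensional training array (variable names and the random background selection are assumptions):

```python
# Hypothetical sketch: explaining the LSTM with shap.DeepExplainer.
# X_train and X_test are assumed arrays of shape (n_sequences, WINDOW, N_FEATURES).
import numpy as np
import shap

rng = np.random.default_rng(0)
background = X_train[rng.choice(len(X_train), size=4000, replace=False)]  # 4,000 background samples

explainer = shap.DeepExplainer(model, background)    # approximates conditional expectations
shap_values = explainer.shap_values(X_test)          # typically one array per class for multi-class models
```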
   In contrast, LIME explains the model by observing how predictions change with perturbed
data. The RecurrentTabularExplainer explainer, an extension of LimeTabularExplainer for 3D
data, used the entire training set for input, producing local explanations. Unlike SHAP, LIME
allows input datasets larger than 5,000 elements and calculates feature relevance through
locally weighted linear models.
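The corresponding LIME side can be sketched as follows, under the assumption that the lime package's RecurrentTabularExplainer is used with the S.M.A.R.T. feature names and class labels mentioned in this paper (the exact names, label ordering, and arguments are illustrative):

```python
# Hypothetical sketch: local explanations with lime's RecurrentTabularExplainer.
# Feature names, class ordering, and array names are illustrative assumptions.
from lime.lime_tabular import RecurrentTabularExplainer

feature_names = ["PowerOnHours", "TemperatureCelsius", "SeekErrorRate", "SpinUpTime",
                 "RawReadErrorRate", "HighFlyWrites", "ReportedUncorrectableErrors",
                 "RawReallocatedSectorsCount", "RawCurrentPendingSectorCount"]
class_names = ["Good", "Very Fair", "Warning", "Alert"]

lime_explainer = RecurrentTabularExplainer(
    X_train,                        # entire training set, shape (n_sequences, WINDOW, N_FEATURES)
    feature_names=feature_names,
    class_names=class_names,
    mode="classification",
)

# Explain a single sequence; LIME reports features as <name>_t-<i> over the time window.
exp = lime_explainer.explain_instance(X_test[8223], model.predict,
                                      labels=(3,), num_features=10)
print(exp.as_list(label=3))         # contributions for the Alert class (index assumed)
```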
   While SHAP offers both global and local explanations, LIME focuses on local explanations.
SHAP values can be aggregated (mean or median) for a global representation by comparing
features across all dataset instances.


4. Experimental Results
In this section, we explore the evaluation conducted using SHAP and LIME on the LSTM-based
model to explain predictions regarding HDDs’ health status, as described in Section 3.1. We
selected this model due to its superior performance across various metrics in this task. The
training process involved a maximum of 150 epochs, a batch size of 500, and a learning rate of
0.001, employing Adam [31] as the optimizer. Detailed results for each class, along with overall
outcomes based on Macro averaging, are presented in Table 1.
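For completeness, a minimal sketch of the corresponding training call with these settings, assuming the Keras model of Section 3.1 and one-hot encoded labels (array names and the validation split are assumptions):

```python
# Hypothetical sketch of the training run with the reported settings.
# X_train, y_train, X_val, y_val are assumed arrays; y_* are one-hot labels over the four classes.
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=150, batch_size=500)
```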
                        Metric      Good     Very Fair   Warning   Alert    Overall
                        Accuracy    99.21%   87.80%      78.10%    84.42%   98.45%
                        Precision   99.90%   74.40%      71.80%    75.50%   98.33%
                        Recall      99.10%   89.60%      85.40%    82.00%   98.34%
                        F1          99.50%   81.30%      78.00%    78.60%   91.48%

Table 1
Results of the model on the Backblaze dataset detailed by each class. The values in the Overall column
are computed according to the Macro averaging measure.




Figure 1: SHAP - Summary Bar Plot


4.1. SHAP
SHAP is the initial framework employed to explore the explanation task regarding HDD health
status assessment. It offers diverse analyses, including Summary bar plot, Summary plot, and
Dependence plot, providing both global and local explanations.

4.1.1. Global explanation
This analysis concentrates on the Summary Bar Plot, offering a global explanation to discern
features influencing the model’s performance based on their Shapley values. The absolute
Shapley values per feature ($I_j$), representing S.M.A.R.T. attributes for a single HDD within a
time window, are summed over $n$ samples and sorted by decreasing importance.
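Assuming shap_values is the per-class output of the DeepExplainer sketched in Section 3.2, this aggregation can be written roughly as follows (array shapes and name lists are assumptions):

```python
# Hypothetical sketch: global feature importance (I_j) from the per-class SHAP values.
import numpy as np

for c, cls in enumerate(class_names):
    # Sum |Shapley value| over samples and time steps -> one score per S.M.A.R.T. feature.
    importance = np.abs(shap_values[c]).sum(axis=(0, 1))      # shape: (N_FEATURES,)
    ranking = np.argsort(importance)[::-1]
    print(cls, [feature_names[j] for j in ranking[:4]])        # top-4 features per class
```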
Figure 1 illustrates the SHAP feature importance for the four predicted classes. Notably,
Power On Hours (POH) emerges as the most critical feature, followed by Temperature Celsius
(TC), Seek Error Rate (SER), and Spin Up Time (SUT). The analysis highlights how TC becomes
increasingly significant as HDD status deteriorates, particularly in alert and warning classes.
This investigation focuses on correctly classified samples to ensure the accuracy of the analysis.
(a) Decision Plot: $Alert_{correct}$, element 8223; (b) Decision Plot: $Alert_{misclassified}$, element 8371

Figure 2: Decision plots for the Alert class using SHAP (x-axis: model output value; features listed with their values in brackets).
(a) Waterfall Plot: $Alert_{correct}$, element 8223; (b) Waterfall Plot: $Alert_{misclassified}$, element 8371

Figure 3: Waterfall plots for the Alert class using SHAP (x-axis: model output, starting from the expected value E[f(X)]).


4.1.2. Local explanation
For local explanations, SHAP provides different types of plots (Single element Decision Plot,
Waterfall Plot), which have been applied to each HDD health status level to explain the prediction
module's output. The plots related to the Alert class are reported and discussed below.
    Illustrated in Figure 2a, the central vertical line in the Decision plot signifies the model’s base
value. From the plot’s bottom, the prediction line depicts the aggregation of Shapley values
(i.e., feature effects) from the base value to the ultimate model score at the top. Each feature is
denoted with its value in brackets, and the slope represents the contribution of that feature to
the prediction.
Comparing Figures 2a and 2b, it is evident that sample 8371 is misclassified as belonging to
the Alert class because the model relies heavily on the TC and POH features for this classification.
    Moreover, Waterfall plots (refer to Figure 3a and 3b) are tailored for individual prediction
explanations, expecting a single row of an Explanation object as input. The bottom of the
Waterfall plot starts with the model’s expected output, and each subsequent row illustrates
how the positive (red) or negative (blue) contribution of each feature shifts the value from
the expected output to the model’s actual output. In this context, the Waterfall plot not only
provides more information but also enhances clarity regarding the contributions of each feature.
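As an illustration, such views can be produced with the shap plotting API roughly as follows (a sketch under the earlier assumptions; the class index and the time-expanded feature naming are assumptions):

```python
# Hypothetical sketch: local SHAP views for one HDD sequence (indices taken from Figures 2-3).
import shap

cls, idx = 3, 8223                                   # assumed Alert class index and element 8223
names = [f"{f}_t-{t}" for t in range(WINDOW)         # time-expanded feature names (labeling assumed)
         for f in feature_names]
values = shap_values[cls][idx].flatten()             # (WINDOW * N_FEATURES,) Shapley values

# Decision plot: cumulative feature effects from the base value to the final model score.
shap.decision_plot(explainer.expected_value[cls], values, feature_names=names)

# Waterfall plot: per-feature shifts from the expected output E[f(X)] to the actual output.
shap.plots.waterfall(shap.Explanation(values=values,
                                      base_values=explainer.expected_value[cls],
                                      feature_names=names))
```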
4.2. LIME
In this section, we investigate the prediction of HDD health status using the LIME explainability
framework. In Figure 4a, we employ LIME to analyze the model prediction for sample 8223,
displaying the contribution of all features at each time instant. The features positively influencing
the Alert class prediction include $TC_{t-6}$, $POH_{t-13}$, $RRER_{t-11}$, $TC_{t-2}$, $TC_{t-1}$, and $RRER_{t-2}$.
The explanation for the misclassification of sample 8371 in Figure 4b reveals POH and TC across
different time instants as the most confusing features.




(a) LIME plot: $Alert_{correct}$, element 8223; (b) LIME plot: $Alert_{misclassified}$, element 8371

Figure 4: LIME plots for the Alert class.



4.3. Quantitative Evaluation
In this section, we conduct an empirical evaluation using the axiomatic explanation consistency
framework [32]. The framework consists of two steps: (1) axiomatic and (2) explanation
consistency. This involves computing metrics such as Identity, Stability, and Separability on test
sets by explaining different objects with their corresponding predictions multiple times.
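A rough sketch of how these checks could be computed is given below; it reflects our reading of the identity, stability, and separability axioms, and the distance choices are assumptions rather than the framework's reference implementation:

```python
# Hypothetical sketch of the explanation-consistency checks (our paraphrase of the axioms).
import numpy as np

def identity(explain, x, repeats=2):
    """Identical instances must receive identical explanations."""
    first = explain(x)
    return all(np.allclose(first, explain(x)) for _ in range(repeats - 1))

def separability(explanations):
    """Non-identical instances should not share identical explanations."""
    n = len(explanations)
    ok = sum(not any(np.allclose(explanations[i], explanations[j])
                     for j in range(n) if j != i)
             for i in range(n))
    return ok / n

def stability(explanations, labels):
    """Instances predicted in the same class should have more similar explanations
    than instances predicted in different classes (one possible operationalization)."""
    expl, labels = np.asarray(explanations), np.asarray(labels)
    ok = 0
    for i in range(len(expl)):
        d = np.linalg.norm(expl - expl[i], axis=1)
        same, diff = labels == labels[i], labels != labels[i]
        same[i] = False
        if same.any() and diff.any():
            ok += d[same].mean() <= d[diff].mean()
    return ok / len(expl)
```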
   Table 2 displays the results for each metric on the test sets, representing the percentage of
instances satisfying each defined metric. Green highlights the highest performance, while red
indicates the lowest. LIME shows poor performance in the Identity metric due to the uniform
and random sample technique, unlike SHAP, which satisfies the identity metric for all instances.
LIME outperforms SHAP in the Stability metric with 95.5% compared to 85.5%. Both tools
achieve the maximum result (100%) for the Separability metric, though this axiom may not be
significant due to the non-linear nature of the problem.
   Table 3 evaluates the tools’ performance in terms of confidence intervals, employing a boot-
strap procedure. The analysis includes investigating feature contributions to model predictions
and comparing results with a white-box model’s ground truth.
                                            LSTM - Backblaze data-set
                                                     LIME                  SHAP
                             Identity                  0%                  100%
                             Stability               95.5%                 85.5%
                             Separability             100%                 100%

Table 2
Evaluation of interpretability frameworks on Backblaze data-set

                          Features                       LIME          SHAP
                          SpinUpTime                     0.58±0.009    0.001±0.005
                          RawReallocatedSectorsCount     0.345±0.007   -0.001±0.008
                          RawReadErrorRate               0.234±0.006   -0.001±0.005
                          HighFlyWrites                  0.082±0.004   0.001±0.007
                          RawCurrentPendingSectorCount   0.058±0.004   -0.001±0.023
                          SeekErrorRate                  0.052±0.004   -0.0004±0.0172
                          PowerOnHours                   0.028±0.038   0.001±0.009
                          ReportedUncorrectableErrors    0.035±0.027   0.001±0.005
                          TemperatureCelsius             0.04±0.003    0.001±0.004

Table 3
Mean and standard deviation of LIME and SHAP explanations.


5. Conclusions
The widespread use of deep neural networks presents challenges in result interpretation due
to their complex structures. Despite this, their high performance in critical applications like
predictive maintenance necessitates eXplainable AI (XAI). LSTM-based models, designed for
learning long-term dependencies, are ideal for predictive maintenance tasks. This study focuses
on explaining predictions of a multi-class LSTM model assessing HDD health. With the three-
dimensional input data, LIME and SHAP were chosen as the primary XAI tools, handling such
data effectively. Comparison using the identity, separability, and stability metrics showed LIME
and SHAP reaching 0% and 100%, respectively, for identity, and both achieving 100% for separability.
LIME excelled in stability over SHAP (95.5% vs. 85.5%). While SHAP provides comprehensive
explanations, LIME’s RecurrentTabularExplainer specializes in recurrent networks, detailing
feature contributions across all time instances within a window. Yet, limitations in XAI tools’
completeness and correctness measures need addressing. Continuous user engagement is
crucial for evaluation, especially in tailoring explanations for different users. Concerns also
exist regarding model confidence and potential biases in the learning process.
   Future work will explore explanations for predictions from different deep networks in various
industrial applications, using diverse real-world datasets. Validating results with field experts
remains crucial for enhancing confidence in AI models through XAI.


Acknowledgments
We acknowledge financial support from the PNRR project “Future Artificial Intelligence Research
(FAIR)” – CUP E63C22002150007
References
 [1] D. Markudova, S. Mishra, L. Cagliero, L. Vassio, M. Mellia, E. Baralis, L. Salvatori, R. Loti,
     Preventive maintenance for heterogeneous industrial vehicles with incomplete usage data,
     Computers in Industry 130 (2021) 103468. doi:https://doi.org/10.1016/j.compind.
     2021.103468.
 [2] M.-C. Chiu, J.-H. Huang, S. Gupta, G. Akman, Developing a personalized recommendation
     system in a smart product service system based on unsupervised learning model, Comput-
     ers in Industry 128 (2021) 103421. doi:https://doi.org/10.1016/j.compind.2021.
     103421.
 [3] L. Malandri, F. Mercorio, M. Mezzanzanica, N. Nobani, Meet-lm: A method for embeddings
     evaluation for taxonomic data in the labour market, Computers in Industry 124 (2021)
     103341. doi:https://doi.org/10.1016/j.compind.2020.103341.
 [4] B. Mao, Z. M. Fadlullah, F. Tang, N. Kato, O. Akashi, T. Inoue, K. Mizutani, Routing or
     computing? the paradigm shift towards intelligent computer network packet transmission
     based on deep learning, IEEE Transactions on Computers 66 (2017) 1946–1960. doi:10.
     1109/TC.2017.2709742.
 [5] A. Diez-Olivan, J. Del Ser, D. Galar, B. Sierra, Data fusion and machine learning for
     industrial prognosis: Trends and perspectives towards industry 4.0, Information Fusion 50
     (2019) 92–111. doi:https://doi.org/10.1016/j.inffus.2018.10.005.
 [6] R. De Luca, A. Ferraro, A. Galli, M. Gallo, V. Moscato, G. Sperli, A deep attention based ap-
     proach for predictive maintenance applications in iot scenarios, Journal of Manufacturing
     Technology Management 34 (2023) 535–556.
 [7] V. J. Ramírez-Durán, I. Berges, A. Illarramendi, Towards the implementation of industry 4.0:
     A methodology-based approach oriented to the customer life cycle, Computers in Industry
     126 (2021) 103403. doi:https://doi.org/10.1016/j.compind.2021.103403.
 [8] X. Xu, Y. Lu, B. Vogel-Heuser, L. Wang, Industry 4.0 and industry 5.0—inception, conception
     and perception, Journal of Manufacturing Systems 61 (2021) 530–535. doi:https://doi.
     org/10.1016/j.jmsy.2021.10.006.
 [9] P. K. R. Maddikunta, Q.-V. Pham, P. B, N. Deepa, K. Dev, T. R. Gadekallu, R. Ruby,
     M. Liyanage, Industry 5.0: A survey on enabling technologies and potential appli-
     cations, Journal of Industrial Information Integration 26 (2022) 100257. doi:https:
     //doi.org/10.1016/j.jii.2021.100257.
[10] A. Du, Y. Shen, Q. Zhang, L. Tseng, M. Aloqaily, Cracau: Byzantine machine learning meets
     industrial edge computing in industry 5.0, IEEE Transactions on Industrial Informatics 18
     (2022) 5435–5445. doi:10.1109/TII.2021.3097072.
[11] L. Silvestri, A. Forcina, V. Introna, A. Santolamazza, V. Cesarotti, Maintenance transforma-
     tion through industry 4.0 technologies: A systematic literature review, Computers in Indus-
     try 123 (2020) 103335. doi:https://doi.org/10.1016/j.compind.2020.103335.
[12] C. Coleman, S. Damodaran, M. Chandramouli, E. Deuel, Making maintenance smarter,
     Deloitte University Press (2017).
[13] Y. Lavi, The rewards and challenges of predictive maintenance, InfoQ (2018).
[14] L. Liao, F. Köttig, A hybrid framework combining data-driven and model-based methods
     for system remaining useful life prediction, Applied Soft Computing 44 (2016) 191–199.
     doi:https://doi.org/10.1016/j.asoc.2016.03.013.
[15] Z. Gao, C. Cecati, S. X. Ding, A survey of fault diagnosis and fault-tolerant techniques—part
     i: Fault diagnosis with model-based and signal-based approaches, IEEE transactions on
     industrial electronics 62 (2015) 3757–3767.
[16] S. Ma, F. Chu, Ensemble deep learning-based fault diagnosis of rotor bearing systems, Com-
     puters in Industry 105 (2019) 143–152. doi:https://doi.org/10.1016/j.compind.
     2018.12.012.
[17] Z. Li, Y. Wang, K. Wang, A deep learning driven method for fault classification and
     degradation assessment in mechanical equipment, Computers in Industry 104 (2019) 1–10.
     doi:https://doi.org/10.1016/j.compind.2018.07.002.
[18] Q. Sun, Z. Ge, Deep learning for industrial kpi prediction: When ensemble learning meets
     semi-supervised data, IEEE Transactions on Industrial Informatics 17 (2020) 260–269.
[19] J. Zhu, A. Liapis, S. Risi, R. Bidarra, G. Youngblood, Explainable ai for designers: A
     human-centered perspective on mixed-initiative co-creation, 2018 IEEE Conference on
     Computational Intelligence and Games (CIG) (2018) 1–8.
[20] S. Zeb, A. Mahmood, S. A. Khowaja, K. Dev, S. A. Hassan, N. M. F. Qureshi, M. Gidlund,
     P. Bellavista, Industry 5.0 is coming: A survey on intelligent nextg wireless networks as
     technological enablers, arXiv preprint arXiv:2205.09084 (2022).
[21] X.-H. Li, C. C. Cao, Y. Shi, W. Bai, H. Gao, L. Qiu, C. Wang, Y. Gao, S. Zhang, X. Xue,
     L. Chen, A survey of data-driven and knowledge-aware explainable ai, IEEE Transactions
     on Knowledge and Data Engineering (2020) 1–1. doi:10.1109/TKDE.2020.2983930.
[22] T. Zonta, C. A. da Costa, R. da Rosa Righi, M. J. de Lima, E. S. da Trindade, G. P. Li, Predictive
     maintenance in the industry 4.0: A systematic literature review, Computers & Industrial
     Engineering 150 (2020) 106889. doi:http://doi.org/10.1016/j.cie.2020.106889.
[23] A. De Santo, A. Galli, M. Gravina, V. Moscato, G. Sperli, Deep learning for hdd health
     assessment: an application based on lstm, IEEE Transactions on Computers (2020) 1–1.
     doi:10.1109/TC.2020.3042053.
[24] A. Ferraro, A. Galli, V. Moscato, G. Sperlì, Evaluating explainable artificial intelligence
     tools for hard disk drive predictive maintenance, Artificial Intelligence Review 56 (2023)
     7279–7314.
[25] A. Barredo Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado,
     S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, F. Herrera, Explainable
     artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward
     responsible ai, Information Fusion 58 (2020) 82–115.
[26] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, D. Pedreschi, A survey of
     methods for explaining black box models, ACM Comput. Surv. 51 (2018). doi:10.1145/
     3236009.
[27] Z. C. Lipton, The mythos of model interpretability: In machine learning, the concept
     of interpretability is both important and slippery., Queue 16 (2018) 31–57. URL: https:
     //doi.org/10.1145/3236386.3241340. doi:10.1145/3236386.3241340.
[28] M. T. Ribeiro, S. Singh, C. Guestrin, "why should i trust you?": Explaining the predictions
     of any classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on
     Knowledge Discovery and Data Mining, KDD ’16, Association for Computing Machinery,
     New York, NY, USA, 2016, p. 1135–1144. URL: https://doi.org/10.1145/2939672.2939778.
     doi:10.1145/2939672.2939778.
[29] M. T. Ribeiro, S. Singh, C. Guestrin, Anchors: High-precision model-agnostic explanations,
     Proceedings of the AAAI Conference on Artificial Intelligence 32 (2018).
[30] S. M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, in:
     I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett
     (Eds.), Advances in Neural Information Processing Systems, volume 30, Curran Associates,
     Inc., 2017, pp. 4768–4777.
[31] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint
     arXiv:1412.6980 (2014).
[32] F. Doshi-Velez, B. Kim, Towards a rigorous science of interpretable machine learning,
     arXiv preprint arXiv:1702.08608 (2017).