Infodeslib: Python Library for Dynamic Ensemble Learning using Late Fusion of Multimodal Data

Firuz Juraev 1, Shaker El-Sappagh 1,2 and Tamer Abuhmed 1,*
1 College of Computing and Informatics, Sungkyunkwan University, South Korea
2 Faculty of Computer Science and Engineering, Galala University, Egypt

Abstract
There has been a notable increase in research focusing on dynamic selection (DS) techniques within the field of ensemble learning. This has led to the development of various techniques for ensembling multiple classifiers for a specific instance or set of instances during the prediction phase. Despite this progress, the design and development of DS approaches with late fusion settings and their explainability remain unexplored. This work proposes an open-source Python library, Infodeslib, to address this gap. The library provides an implementation of several DS techniques, including four dynamic classifier selection and seven dynamic ensemble selection techniques, all of which are integrated with late data fusion settings and novel explainability features. Infodeslib offers flexibility and customization options, making it a versatile tool for various complex applications that require the fusion of multimodal data and various explainability features. Multimodal data, which integrates information from diverse sources or sensor modalities, is a common and essential setting for real-world problems, enhancing the robustness and depth of data analysis. These data can be fused in two main ways: early fusion, where different modalities are combined at the feature level before model training, and late fusion, where each modality is processed separately and the results are combined at the decision level. The library is fully documented following the Read the Docs standards. The documentation, code, and examples are available anonymously on GitHub at https://github.com/InfoLab-SKKU/infodeslib.
Keywords: Ensemble of classifiers, Dynamic classifier selection, Dynamic ensemble selection, Multimodal data fusion, Late fusion, Machine learning, Explainable AI, Python

Ensemble learning is a thriving domain within the fields of machine learning and pattern recognition [1, 2]. With all the diverse ensemble classifiers available, each classifier approaches the problem from a different perspective. The main idea of ensemble learning is to leverage a group of classifiers to provide comprehensive coverage of the learned task [3]. By utilizing diverse models that exhibit distinct decision boundaries, ensemble learning seeks to maximize the accuracy and effectiveness of the overall classification process. As a result, the performance of an ensemble classifier is better than that of any of its base classifiers [4, 5], because each base classifier concentrates on a specific region of the error space, and combining the decisions of these classifiers improves the overall ensemble's decisions. Ensemble learning approaches can be broadly classified into two categories: static and dynamic selection approaches [6, 7]. In static selection [8, 9], a predetermined group of classifiers is selected, and this group is utilized to make decisions for each new test instance. In dynamic selection [10, 11, 12], a new group of classifiers is selected for each test instance, and this group is employed to make a decision for that specific instance.

Since real-world datasets are often complex and consist of multiple feature groups or so-called 'modalities', ensemble learning is a popular candidate for combining multiple models to improve the performance and robustness of predictive models. One approach to ensemble learning is early fusion, where all modalities are merged in a pool for the classifiers to capture the potential interactions and interdependencies among the modalities using either static [13] or dynamic selection [14].

Another approach to ensemble learning is late fusion, or decision fusion, where each classifier in the pool is trained with different feature groups or combinations of feature groups to achieve greater diversity in the model pool. This diversity is crucial for constructing a robust ensemble that can effectively generalize to previously unseen data. Moreover, late fusion provides more flexibility, as classifiers are assigned to different modalities, considering that certain classifiers are best suited to model certain modalities [15].

In the current literature, late fusion-based ensemble learning is only available with static classifier selection [16], and most of these studies show the superiority of late fusion over early fusion for static ensembles [17, 18, 19]. This motivates us to explore the performance of late fusion in dynamic selection compared to early fusion; however, to the best of our knowledge, no study or implementation has been conducted to examine the performance of late fusion in dynamic selection settings. This work aims to implement different types of dynamic selection techniques in the late fusion setting. By doing so, we can explore the performance of late fusion-based ensemble learning under dynamic selection modeling, gaining a deeper understanding of its potential advantages and limitations.

The resulting late fusion-based dynamic ensembles are expected to improve the performance of the resulting classifiers. However, these models are black boxes and not understandable. Trustworthy classifiers that are applicable in the real world need to be interpretable. Explainable AI (XAI) has gained significant attention in recent years [20], as it is crucial to provide insights into the decision-making process of machine learning models. However, despite the growing interest in this area, there is a lack of explainability features for ensemble learning techniques, which are increasingly used in complex real-world applications to improve the trustworthiness of the resulting models. To the best of our knowledge, no study in the literature and no Python packages are provided to implement XAI capabilities for dynamic ensemble classifiers. This study aims to address this research gap by developing a Python package that offers novel explainability techniques for ensemble models, making them accessible and informative for both domain experts and developers.

KiL'24: Workshop on Knowledge-infused Learning co-located with 30th ACM KDD Conference, August 26, 2024, Barcelona, Spain
* Corresponding author.
fjuraev@g.skku.edu (F. Juraev); shaker@skku.edu (S. El-Sappagh); tamer@skku.edu (T. Abuhmed)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

The contributions of the study are as follows:

• We extend the literature on dynamic ensemble modeling by implementing four dynamic classifier selection techniques and seven dynamic ensemble selection techniques, incorporating a late fusion of multiple modalities (see Table 1).
• We propose three types of novel explainability that provide deep and suitable XAI for dynamic selection techniques: case-based reasoning, deep-based classifier contributions, and local feature importance.
• We compare the performance of the proposed techniques with existing approaches on four well-known real-world multimodal datasets: Alzheimer's Disease Neuroimaging Initiative (ADNI), Credit Card Clients, National Alzheimer's Coordinating Center, and Parkinson's Progression Markers Initiative (PPMI). We also tested the proposed techniques on the Samarkand Neonatal Center dataset, which was collected by our team with the help of physicians.
• The implemented techniques have been included in a standard public library called 'Infodeslib', following the industry-standard PEP 8 coding guidelines, and Infodeslib is also clearly documented in accordance with the Read the Docs standards: https://infodeslib.readthedocs.io/en/latest/
• We offer a wide range of valuable functions that enable the assessment and evaluation of the quality and efficacy of the selected pool.

The study is organized as follows. Section 1 highlights the software framework of the proposed late fusion dynamic ensemble learning. Section 2 presents installation and usage, Section 3 describes model explainability, Section 4 discusses the performance analysis, and Section 5 introduces possible package extensions. Section 6 concludes the paper.

1. Late Fusion Dynamic Ensemble Framework

In this section, we provide an overview of the late fusion dynamic ensemble framework in algorithmic and visual formats. This encompasses a thorough dissection of the primary stages involved, along with step-by-step explanations of the framework's methodology.

Since the late fusion dynamic ensemble utilizes the decision values obtained from each modality and fuses them using a specific fusion mechanism M (such as averaging, weighted averaging, majority voting, etc.), let us assume that classifier c_i is applied to modality f_i. The final prediction can be expressed as:

    p = M(c_1(f_1), c_2(f_2), ..., c_m(f_m))    (1)

The proposed concept of dynamic selection with late fusion is illustrated in Figure 1, which outlines a framework consisting of three key stages: training, selection, and prediction. Additionally, the concept is detailed algorithmically in Algorithm 1.

Table 1: Infodeslib implemented DS methods.

    Technique                                Selection   Reference
    Modified Rank (MR)                       DCS         Sabourin et al. [21]
    Overall Local Accuracy (OLA)             DCS         Woods et al. [22]
    Local Class Accuracy (LCA)               DCS         Woods et al. [22]
    Modified Local Accuracy (MLA)            DCS         P.C. Smits [23]
    DES-KNN                                  DES         Soares et al. [24, 25]
    K-Nearest Oracles Eliminate (KNORA-E)    DES         Ko et al. [10]
    K-Nearest Oracles Union (KNORA-U)        DES         Ko et al. [10]
    Weighted KNORA-E (KNORA-E-W)             DES         Ko et al. [10]
    Weighted KNORA-U (KNORA-U-W)             DES         Ko et al. [10]
    DES Performance (DES-P)                  DES         Woloszynski et al. [26]
    K-Nearest Output Profiles (KNOP)         DES         Cavalin et al. [27]

Training Phase. A pool of classifiers is selected and assigned different feature sets. The classifiers within the pool are selected based on their diversity, ensuring a wide range of decision-making capabilities. Each feature set used by the selected classifiers is extracted from the same modality to generate a homogeneous feature set. For example, in the medical domain, demographic and MRI features are different modalities that could be used to train two different classifiers. Each classifier in the pool is then trained and optimized with its designated feature set, resulting in a pool of trained classifiers to be utilized in the next phases (lines 1-4 in Algorithm 1).

Selection Phase. During the selection phase (lines 5-12 of Algorithm 1), a region of competence (RoC) is determined for a given new test instance by selecting the nearest samples from the validation data (DSEL). Subsequently, each classifier in the pool is evaluated on the samples within the RoC, and a measure of competence is calculated for each classifier. The specific method employed to compute the competence varies depending on the chosen DS technique (lines 9-10 in Algorithm 1). Once the competencies of all classifiers in the pool are calculated, the DS techniques use their own selection criteria to identify the most competent classifiers. These criteria are specific to each DS technique. If no classifier satisfies the competence criteria of a given DS technique, all classifiers in the pool are selected to make the final decision.

Prediction Phase. During the final phase, the selected classifiers are utilized to predict the class of a given test instance, and their individual predictions are combined to generate a final prediction. To provide more accurate decisions, each of the selected classifiers can be weighted based on its level of competence during the aggregation process (line 13 in Algorithm 1).

2. Installation and Usage

Users can conveniently install the most recent version of Infodeslib via pip, the Python package manager, by executing the command pip install infodeslib. Alternatively, the library can be installed from GitHub using the command pip install git+https://github.com/InfoLab-SKKU/infodeslib.
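To make the late fusion of Eq. (1) concrete, the following minimal sketch trains one scikit-learn classifier per modality and fuses their class probabilities by averaging, one possible choice of the fusion mechanism M. The synthetic data, the column split into two "modalities", and the classifier choices are illustrative assumptions, not part of Infodeslib.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a multimodal dataset: columns 0-5 act as one
# "modality", columns 6-11 as another.
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
feature_sets = [list(range(0, 6)), list(range(6, 12))]
pool = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)]

# Training phase: each classifier sees only its own feature set.
for clf, cols in zip(pool, feature_sets):
    clf.fit(X_tr[:, cols], y_tr)

# Late (decision-level) fusion: average the per-modality class probabilities.
proba = np.mean(
    [clf.predict_proba(X_te[:, cols]) for clf, cols in zip(pool, feature_sets)],
    axis=0,
)
y_pred = proba.argmax(axis=1)
```

Replacing the mean with a weighted mean or a majority vote over `predict` outputs yields the other fusion mechanisms mentioned above.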
To use the implemented methods in Infodeslib, a list of classifiers and a list of feature sets must be provided as input. The classifiers in the list can be of any type from the scikit-learn library and should be trained on the corresponding feature set before being used as input.

Once the pool of classifiers and feature sets has been initialized, the method fit(X_dsel, y_dsel) is applied to fit the dynamic selection method, where (X_dsel, y_dsel) is the validation dataset (DSEL) with true labels. Predictions for each test instance x can be obtained using either the predict(x) or predict_proba(x) methods. In the example provided below, we demonstrate the steps involved in implementing the KNORA-U technique.

    from infodeslib.des.knorau import KNORAU

    pool_classifiers = [classifier1, ..., classifierN]
    # feature_set1 is a list of columns
    feature_sets = [feature_set1, ..., feature_setN]

    # Initialize the DS model
    knorau = KNORAU(pool_classifiers, feature_sets)

    # Fit the dynamic selection model
    knorau.fit(X_dsel, y_dsel)

    # Predict new examples
    knorau.predict(X_test, plot=True)

    # Check performance (based on accuracy)
    knorau.score(X_test, y_test)

When utilizing the predict(X) method, an additional parameter plot can be included to obtain explainability for each test instance. By setting plot=True, explainability for the given test instance can be visualized through a variety of methods (see more details in Section 3).

Figure 1: The architecture of the proposed late fusion dynamic ensemble learning framework implemented by Infodeslib. [The figure depicts the training phase (a pool of base classifiers, each trained on its own feature set), the selection phase (a region of competence Φ drawn from the DSEL data and the resulting ensemble of competent classifiers EoC), and the prediction phase with aggregation, model evaluation, and the explainability interface: feature importance, contribution of models, and case-based reasoning.]

Algorithm 1 Late fusion DES method
Input: Pool of classifiers C, training dataset D_tr, validation dataset D_va, testing dataset D_te, feature set F, and neighborhood size K
Output: EoC*_t, an ensemble of classifiers for each testing sample t in D_te
 1: for each classifier c_i in C do
 2:     Optimize f_i in F for c_i;
 3:     Optimize and train c_i on D_tr with feature set f_i;
 4: end for
 5: for each testing sample t in D_te do
 6:     Find Ψ as the K nearest neighbors of the testing sample t in D_va;
 7:     for each sample ψ_i in Ψ do
 8:         for each classifier c_i in C do
 9:             Calculate competence of c_i on Ψ;
10:             Select ensemble of competent classifiers EoC*_t;
11:         end for
12:     end for
13:     Use the ensemble EoC*_t to classify t;
14: end for

Infodeslib Methods. Figure 2 provides an overview of the key methods of our library, while other supporting methods are available in the documentation of the library. Some of these methods, such as fit(), predict(), predict_proba(), and score(), are well known and require no detailed explanation; there are several other methods that are particularly useful for pool generation and for obtaining information about new test samples. To facilitate pool generation, we have implemented three additional methods: get_average_accuracy(), get_pool_diversity(), and get_coverage_score(). The get_average_accuracy() method computes the average performance of the classifiers in the pool on the validation data. The get_pool_diversity() method calculates the diversity between classifiers in the pool and requires the diversity measure type as a parameter. It supports several diversity functions such as Q-statistic, Correlation Coefficient, Disagreement Measure, Double Fault, Negative Double Fault, and Ratio Errors. The get_coverage_score() method determines the number of samples in the DSEL data that can be accurately predicted by any model in the given pool. This information is particularly useful for evaluating the coverage of the pool and ensuring that all samples are accurately classified by at least one model. The prediction process in machine learning often involves the use of ensemble methods, where multiple classifiers are combined to improve performance.
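The coverage idea behind get_coverage_score() can be illustrated in a few lines: a pool "covers" a DSEL sample if at least one of its members predicts that sample correctly. The helper pool_coverage below is a hypothetical re-implementation of that idea for illustration, not Infodeslib's actual code; the data, pool, and feature split are likewise stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def pool_coverage(pool, feature_sets, X_dsel, y_dsel):
    """Fraction of DSEL samples correctly predicted by at least one classifier."""
    # hits[i, j] is True when classifier j labels DSEL sample i correctly.
    hits = np.column_stack(
        [clf.predict(X_dsel[:, cols]) == y_dsel
         for clf, cols in zip(pool, feature_sets)]
    )
    return hits.any(axis=1).mean()

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_train, y_train = X[:200], y[:200]
X_dsel, y_dsel = X[200:], y[200:]
feature_sets = [list(range(0, 5)), list(range(5, 10))]
pool = [KNeighborsClassifier(n_neighbors=3), DecisionTreeClassifier(random_state=1)]
for clf, cols in zip(pool, feature_sets):
    clf.fit(X_train[:, cols], y_train)

coverage = pool_coverage(pool, feature_sets, X_dsel, y_dsel)
```

By construction the coverage is never below the best single-classifier accuracy; a value well below 1.0 signals DSEL regions that every pool member misclassifies, suggesting the pool lacks diversity there.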
Within these ensembles, three methods play a crucial role: get_region_of_competence(x), estimate_competence(roc), and select(competences). The get_region_of_competence(x) method identifies the region of competence for a given test sample by returning the k nearest neighbors from the validation dataset. This is achieved by applying the k-nearest neighbors algorithm. The estimate_competence(roc) method calculates the competence of each classifier in the ensemble on the region of competence. The competence calculation differs depending on the technique being used. For example, the k-Nearest Oracles Union (KNORA-U) technique calculates the accuracy of each classifier on the region of competence. The Dynamic Ensemble Selection KNN (DESKNN) technique, on the other hand, computes each classifier's accuracy and diversity on the RoC and uses these metrics to assess its competence. The select(competences) method selects the most competent classifiers from the ensemble to make a prediction. Different techniques may use different criteria for determining the competence of a classifier, such as the number of samples classified correctly within the region of competence. For instance, the KNORA-U technique selects a classifier if it has classified at least one sample within the region of competence. Once the competent classifiers have been identified, their competence values are used as weights in aggregating their predictions.

To evaluate a single test instance, our library includes the get_rareness_score(x) method, which provides a detailed description of the instance. The method evaluates whether there are many samples similar to the given instance in the training and validation datasets, allowing users to determine the rarity of the instance. If the instance is an outlier, the method provides information about how far it is from other classes. Furthermore, the get_rareness_score(x) method uses K-means clustering to suggest a potential class for the instance and generates tables indicating which features of the instance make it similar to this class. This approach provides valuable insights into the characteristics of the instance and its potential classification, aiding in the development of more accurate models.

Figure 2: The overall schema of the software architecture. [The figure summarizes the API: fit(X, y) prepares the DS model by pre-processing the information required to apply the DS methods; predict(X, plot=False) returns the class label for each sample in X, with plot=True for getting the explainability; predict_proba(X) returns the probabilities for each sample in X; score(X, y) returns the mean accuracy on the given data and labels. Pool-level: get_average_accuracy() returns the mean accuracy of classifiers in the pool; get_pool_diversity() returns the mean and list of diversity scores between classifiers in the pool; get_coverage_score() explains how well the given pool of classifiers can cover the task on the validation data. Single-instance: get_region_of_competence(x) returns the k nearest samples of the given test sample from the validation dataset; estimate_competence(roc) returns the competences of each base classifier on the k nearest samples from the RoC; select(competences) returns all base classifiers that are competent enough; get_rareness_score(x) explains how rare the given test sample is in the training and validation data. Hyperparameters: k, DFP, knn_metric, dimensionality_reduction, reduction_technique, n_components, cbr_features.]

Figure 3: Local feature importance of each selected classifier. [SHAP values of the top features for each selected classifier.]

Hyperparameters. Optimizing hyperparameters is a critical step for improving the performance of ensemble learning models. This can be achieved through various techniques, including basic approaches such as grid search and random search in scikit-learn, as well as more advanced techniques like genetic algorithms, Bayesian optimization, and others. Our library is designed to work seamlessly with other Python packages such as TPOT [28], Scikit-Optimize [29], Optuna [30], Hyperopt [31], BayesianOptimization [32], GPyOpt [33], Optunity [34], and similar packages that implement these advanced optimization techniques. This allows users to leverage a variety of optimization methods to obtain the best possible hyperparameters for their ensemble models.

In our library, there are several key hyperparameters that users can adjust to optimize the performance of ensemble learning models. We present these key hyperparameters along with their default values, which have been shown to produce satisfactory results in the majority of cases. One of the main hyperparameters is k (default: 7), which represents the number of neighbors to be considered when determining the region of competence. Another important hyperparameter is DFP (default: False), which stands for dynamic frienemy pruning and is particularly useful for imbalanced datasets. In addition, users can also specify the knn_metric (default: 'minkowski'), which determines the distance metric used when computing distances between the test sample and other samples in the validation dataset. Our library provides several common metrics such as Minkowski, cosine, Manhattan, and Euclidean, as well as the option for users to define their own custom metric function. To handle high-dimensional datasets, we also offer a dimensionality_reduction (default: False) hyperparameter, which allows users to reduce the number of dimensions used in calculating distances between samples. This can be achieved using either Principal Component Analysis (PCA) or Kernel PCA, or by specifying a custom dimensionality reduction technique via the reduction_technique (default: 'pca') hyperparameter. The n_components (default: 20) hyperparameter determines the number of components to be retained if a reduction technique is selected. Lastly, for those interested in explainability, our library provides the cbr_features (default: None) hyperparameter, which allows users to specify a list of important features to be included in the similar-cases data for Case-Based Reasoning.

3. Model Explainability

In the current version of our library, we offer three main XAI techniques: case-based reasoning, deep-based classifier contribution, and local feature importance [20]. The case-based reasoning technique aims to offer domain experts an explanation of the model's prediction process for a given test sample by presenting them with similar samples and their corresponding labels found within the region of competence. This approach closely resembles how domain experts make decisions in real-world situations, as they frequently compare current cases with historical ones from their experience. The deep-based classifier contribution technique enables users to comprehend the contribution of each selected classifier in the decision-making process for a given test sample. Finally, the local feature importance technique is a prevalent explainability method that identifies the most crucial features and their corresponding SHAP values for each selected classifier.
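The per-instance pipeline described earlier, get_region_of_competence(x), estimate_competence(roc), and select(competences), can be sketched for a single query under KNORA-U-style criteria. Everything below (the data, the pool, the helper logic, and the default k = 7) is a simplified stand-alone illustration, not the library's internal implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=2)
X_train, y_train = X[:200], y[:200]
X_dsel, y_dsel = X[200:280], y[200:280]  # validation data (DSEL)
x_query = X[280:281]                     # one new test instance

pool = [LogisticRegression(max_iter=1000).fit(X_train, y_train),
        DecisionTreeClassifier(random_state=2).fit(X_train, y_train)]

# Region of competence: the k nearest DSEL samples to the query.
k = 7
nn = NearestNeighbors(n_neighbors=k).fit(X_dsel)
roc_idx = nn.kneighbors(x_query, return_distance=False)[0]

# Competence estimation: number of RoC samples each classifier labels
# correctly (the KNORA-U criterion).
competences = np.array(
    [(clf.predict(X_dsel[roc_idx]) == y_dsel[roc_idx]).sum() for clf in pool]
)

# Selection: keep every classifier correct on at least one RoC sample;
# if none qualifies, fall back to the whole pool.
selected = [i for i, c in enumerate(competences) if c >= 1] or list(range(len(pool)))

# Aggregation: each selected classifier votes with its competence as weight.
votes = np.zeros(len(np.unique(y_train)))
for i in selected:
    votes[pool[i].predict(x_query)[0]] += competences[i]
prediction = int(votes.argmax())
```

Swapping the competence definition (e.g., accuracy plus diversity on the RoC) turns this into a DESKNN-style scheme, while keeping the same three-step structure.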
Case-based reasoning. For example, in the case of the KNORA-U technique, in the selection phase, the nearest neighbors of each test instance are estimated in the validation dataset based on their close similarity to the test sample. The selected samples are used to generate the region of competence for evaluating and selecting classifiers in the pool. Figure 4 a) illustrates an example in which the given test sample (light blue x) falls within the area of class 2, and seven nearest samples are selected, six of which belong to class 2 (blue dots), while one belongs to class 1 (green dot). This finding suggests that, for the given test sample, the chance of it being classified as class 2 is high. Moreover, these samples can also be leveraged for conducting case-based reasoning, which may be particularly valuable for physicians, given that our dataset is in the medical domain. Figure 4 b) provides comprehensive information about all nearest samples within the region of competence, enabling physicians to compare and contrast similar samples and their corresponding labels or diagnoses.

Figure 4: Estimating a region of competence (RoC) and providing details about the selected sample for RoC. [Panel a): estimating the region of competence (RoC) in the validation dataset. Panel b): detailed information about the samples in the region of competence, listed with selected features (Feature 7, Feature 8, Feature 10, Feature 11, ..., Feature 72, Feature 84) and their target labels.]

Deep-based classifiers contributions. After selecting the group of classifiers for making the final decision, it may be unclear how each classifier in the pool contributed to the decision or what their individual predictions were for the new test sample. In order to provide a more comprehensive understanding of the decision-making process, an additional level of explainability can be utilized. This is illustrated in Figure 5, which provides detailed information about each classifier in the pool, including its competence level, individual prediction on the new test sample, and confidence level. This explanation provides valuable insight for the development of an ensemble model, as it allows developers to identify classifiers that may have a negative impact on decision-making. For instance, as shown in Figure 5, it is evident that most selected classifiers predict the label of the given test sample as 2 with high confidence, while the SVC classifier predicts it as 3. The SVC classifier demonstrates a higher level of competence in the region of competence, indicating that it has a more significant influence on the decision. If this classifier consistently has a negative impact on many test samples, it may be possible to remove it from the pool of classifiers.

Figure 5: The contribution of each selected classifier on the final decision.

    Selected Classifier    Competence   Prediction   Confidence
    [Classifier 1] XGB     1.00         2            0.99
    [Classifier 2] XGB     1.00         2            0.99
    [Classifier 3] MLP     0.42         1            0.39
    [Classifier 4] SVC     0.71         3            0.22
    [Classifier 5] XGB     1.00         2            0.99
    [Classifier 6] KNN     1.00         2            1.00
    (Classes: 0: AD, 1: sMCI, 2: CN, 3: pMCI)

Local feature importance. In addition to understanding how the classifiers contributed to the decision-making process, it is also important to identify which features were particularly influential in making those decisions. For the example mentioned earlier, we provide the local feature importance for each selected classifier, which can be visualized as shown in Figure 3.

Furthermore, our proposed ensemble models have the ability to provide interpretable explanations using two approaches: surrogate model explainability and post-hoc explainability methods. The surrogate model approach involves creating a simplified model that roughly represents the behavior of the original ensemble model and using this model to explain the ensemble's decisions. On the other hand, post-hoc explainability techniques involve analyzing the ensemble model's decisions after they have been made and providing explanations based on the input features that contributed the most to the decision. Both methods treat the ensemble model as a black box model.

4. Performance Analysis

Within this section, we compare the performance of the proposed architecture with existing approaches. We provide an overview of the datasets that have been utilized, along with a detailed analysis of our proposed techniques.

4.1. Evaluation Datasets

In this section, we outline the five datasets utilized to compare Infodeslib with existing models.

Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset [35]. The study includes a total of 1,371 subjects, with a male gender representation of 54.5%.
Participants have been classified into four distinct categories based on their clinical diagnosis: Cognitive Normal (CN), Stable Mild Cognitive Impairment (sMCI), Progressive Mild Cognitive Impairment (pMCI), and Alzheimer's Disease (AD) [36]. The distribution of these classes is as follows: 419 CN, 473 sMCI, 140 pMCI, and 339 AD individuals. The dataset has four distinct modalities or feature groups, which contain demographics, cognitive scores, assessment tests, and MRI features.

Table 2: Performance of the different ML approaches using the ADNI dataset.

    Model Type                Model      Accuracy     Precision    Recall       F1
    Single Models             XGB        87.11±2.32   87.50±2.63   87.11±2.32   87.03±2.49
                              LGBM       86.74±1.58   87.34±1.96   86.74±1.58   86.70±1.72
                              RF         87.11±1.96   87.51±2.22   87.11±1.96   87.08±2.08
    [Early] Static Ensemble   Voting     88.08±1.94   88.31±2.07   88.08±1.94   88.05±2.02
                              Stacking   86.87±2.02   87.67±2.11   86.87±2.02   86.80±2.15
    [Early] Dynamic Ensemble  DESP       88.61±1.96   88.72±2.15   88.61±1.96   88.55±2.08
                              KNOP       88.71±1.91   88.80±2.09   88.71±1.91   88.66±2.03
    [Late] Static Ensemble    Voting     89.29±1.67   89.39±1.81   89.29±1.67   89.24±1.74
                              Stacking   87.65±1.59   88.13±1.64   87.65±1.59   87.60±1.74
    [Late] Dynamic Ensemble   KNORAU     89.52±2.01   89.77±2.01   89.52±2.01   89.46±2.10
                              KNORAU-W   89.84±1.83   90.29±2.03   89.81±1.83   89.80±1.91

Credit Card Clients dataset. The study includes a vast participant cohort of 30,000 individuals, with the dataset sourced from the UC Irvine Machine Learning Repository [37]. This is a classification problem that involves determining whether or not a client will make their next payment. The two distinct classes are labeled as 'no' and 'yes', with 23,364 and 6,636 instances, respectively. The dataset has four distinct modalities of features, including demographics, financial, and payment history features.

Table 3: Performance of the different ML approaches on the Credit Card Clients dataset.

    Model Type                Model      Accuracy     Precision    Recall       F1
    Single Models             XGB        81.96±0.84   89.05±0.72   72.87±1.28   80.15±1.02
                              LGBM       80.43±0.64   86.88±0.66   71.69±0.92   78.56±0.76
                              RF         79.90±0.78   87.49±0.70   69.76±1.13   77.63±0.95
    [Early] Static Ensemble   Voting     83.94±0.74   88.52±0.72   78.01±1.11   82.93±0.84
                              Stacking   82.68±0.69   86.72±0.77   77.17±0.95   81.67±0.77
    [Early] Dynamic Ensemble  DESKNN     83.99±0.38   89.34±0.53   77.18±0.60   82.82±0.42
                              KNORAE     84.16±0.66   88.81±0.64   78.17±0.97   83.15±0.75
    [Late] Static Ensemble    Voting     85.72±0.44   89.83±0.57   80.56±0.73   84.94±0.49
                              Stacking   85.08±0.47   89.30±0.36   79.71±0.94   84.23±0.57
    [Late] Dynamic Ensemble   KNOP       86.65±0.23   91.64±0.28   80.66±0.52   85.80±0.28
                              KNORAU-W   86.73±0.29   91.76±0.33   80.70±0.64   85.87±0.36

National Alzheimer's Coordinating Center (NACC) dataset [38]. In this study, we examined a total of 37,547 patients, focusing on the Global Clinical Dementia Rating (CDRGLOB) as the primary task. CDRGLOB categorizes patients into four classes based on dementia severity: no impairment (8,253 patients), mild impairment (15,097 patients), moderate impairment (8,346 patients), and severe impairment (5,851 patients). Our analysis included six specific modalities for investigation: demographics, physical health, medications, health history, the neuropsychiatric inventory questionnaire, and the geriatric depression scale. These modalities were chosen to comprehensively assess various aspects related to dementia and overall patient health [39].

Parkinson's Progression Markers Initiative (PPMI) dataset [40].
Our study involves 952 patients and fo- dividual models, as well as the static ensemble with early cuses on a binary classification task to differentiate between fusion, the dynamic ensemble with early fusion, the static healthy individuals and those diagnosed with Parkinson’s ensemble with late fusion, and our proposed technique - disease (PD). Among these patients, 389 are categorized the dynamic ensemble with late fusion setting. From each as healthy, while 563 have been diagnosed with PD. The group, we selected the best-performed techniques and the dataset encompasses various information modalities, in- results show that our dynamic ensemble techniques, KNO- cluding subject characteristics, biospecimen data, medical RAU and KNORAU-W outperform all existing approaches history records, motor function assessments, and non-motor with 89.52% and 89.84% accuracy. In comparison, a static features. This comprehensive dataset enables a thorough ensemble with late fusion, voting classifier, achieves an accu- analysis to identify potential diagnostic markers and factors racy of 89.29%. This performance is close to the performance associated with PD, facilitating improved understanding of our model and surpasses that of early fusion techniques. and diagnosis of the disease [41]. This result supports our claim for the significance of late Samarkand Neonatal Center dataset. Our study fusion in producing accurate ensemble models. involved 347 neonates from the intensive care unit at Results based on Credit Card Clients dataset. Table 3 Samarkand Neonatal Center. The dataset was collected by and Figure 6 b) present the results obtained from the analysis our team by collaborating physicians in the hospital for a of the Credit Card Clients dataset, following a similar format binary classification task to predict whether a neonate sur- to the previous dataset. Our proposed techniques have once vives or passes away. 
Among these neonates, 303 survived again outperformed the existing approaches in this instance. and 44 died during the study period. The dataset comprises Specifically, KNOP and KNORAU-W, utilizing the late fusion a comprehensive set of features categorized into multiple setting, have achieved the highest accuracy scores of 86.65% modalities: demographic information, the mother’s medical and 86.73%, respectively. In comparison, the static ensemble history and information, general notes on the neonate’s methods that apply late fusion, specifically the voting and condition, results from blood tests, and APGAR scores (a stacking classifiers, demonstrate accuracies of 85.72% and standardized assessment of a neonate’s health at birth). 85.08%, respectively. In contrast, the ensemble methods that employ early fusion achieve the highest accuracy of 4.2. Results 84.16%, with the dynamic selection technique known as KNORA-E. These results support our argument regarding This section contains a comprehensive analysis and compar- the importance of utilizing late fusion for the purpose of ison of various machine-learning approaches against our producing highly accurate ensemble models. proposed late-fusion dynamic ensemble selection model. Results based on NACC dataset. Table 4 highlights We collect and present the testing results for each of the the results from the analysis of the National Alzheimer’s considered models. To ensure greater consistency in the Coordinating Center dataset, structured similarly to the pre- results, we have applied the 10-holdout testing method [42]. vious dataset. Among all existing techniques, the dynamic The results are presented in the form of (mean ± standard ensemble models with late fusion demonstrate notably su- deviation). As a pool of classifiers, we utilized the following Table 4 Table 6 Performance of the ML approaches on the NACC dataset. 
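The dynamic selection rules behind these results share one idea: a classifier's competence is judged in the neighborhood of each query. Under KNORA-U (the union rule), every classifier that correctly labels at least one of the query's k nearest validation neighbors joins the ensemble and casts one vote per neighbor it labeled correctly; KNORAU-W additionally weights this evidence. A self-contained sketch of the KNORA-U rule with toy one-dimensional data and threshold "classifiers" (the data, thresholds, and helper names are illustrative, not the paper's models):

```python
# Sketch of the KNORA-U (union) dynamic ensemble selection rule.
# Toy 1-D data and threshold "classifiers", for illustration only.

def knn_indices(query, X_val, k):
    """Indices of the k validation points nearest to the query."""
    return sorted(range(len(X_val)), key=lambda i: abs(X_val[i] - query))[:k]

def knora_u_predict(query, classifiers, X_val, y_val, k=3):
    """Each classifier gets one vote per validation neighbor it labels
    correctly; the class with the most votes wins."""
    neighbors = knn_indices(query, X_val, k)
    votes = {}
    for clf in classifiers:
        correct = [i for i in neighbors if clf(X_val[i]) == y_val[i]]
        if correct:                       # competent on >= 1 neighbor
            pred = clf(query)
            votes[pred] = votes.get(pred, 0) + len(correct)
    if not votes:                         # fall back to plain majority vote
        for clf in classifiers:
            pred = clf(query)
            votes[pred] = votes.get(pred, 0) + 1
    return max(votes, key=votes.get)

# Toy pool: each "classifier" predicts 1 when x exceeds its threshold.
pool = [lambda x, t=t: int(x > t) for t in (0.2, 0.5, 0.8)]
X_val = [0.1, 0.3, 0.6, 0.9]
y_val = [0, 0, 1, 1]
```

In the late fusion setting studied here, each classifier in the pool additionally sees only its own modality's features; the selection rule itself is unchanged.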
Table 4: Performance of the ML approaches on the NACC dataset.

Model Type                Model       Accuracy      Precision     Recall        F1
Single Models             GB          85.70±1.16    85.71±1.16    85.77±1.07    85.56±1.19
                          XGB         86.30±1.47    86.30±1.47    86.32±1.48    86.18±1.51
                          RF          86.79±0.74    86.79±0.74    86.76±0.82    86.69±0.76
[Early] Static Ensemble   Voting      87.52±0.90    87.53±0.90    87.42±0.93    87.40±0.90
                          Stacking    87.17±1.29    87.17±1.29    87.20±1.19    87.11±1.26
[Early] Dynamic Ensemble  DESKNN      87.30±1.16    87.61±1.16    87.37±1.06    87.39±1.14
                          KNORAU      88.34±1.44    88.34±1.44    88.34±1.41    88.27±1.45
[Late] Static Ensemble    Voting      89.39±1.34    89.39±1.34    89.43±1.30    89.36±1.33
                          Stacking    89.11±0.89    89.11±0.89    89.15±0.95    89.07±0.91
[Late] Dynamic Ensemble   KNORAU-W    90.20±1.10    90.20±1.10    90.30±1.09    90.20±1.11
                          DESP        91.16±0.93    91.21±0.93    91.14±0.89    91.17±0.92

Table 6: Performance of the different ML approaches on the Samarkand Neonatal Center dataset.

Model Type                Model       Accuracy      Precision     Recall        F1
Single Models             RF          69.34±4.66    69.34±4.66    73.70±4.20    67.71±5.44
                          XGB         69.74±7.64    69.74±7.64    74.29±6.31    67.77±9.16
                          LGBM        70.07±8.58    70.07±8.58    72.65±7.52    68.74±9.62
[Early] Static Ensemble   Voting      73.03±8.03    73.03±8.03    75.50±6.57    72.00±8.95
                          Stacking    71.45±5.11    71.45±5.11    75.70±4.39    70.06±5.87
[Early] Dynamic Ensemble  KNORAU      71.64±6.84    71.64±6.84    75.30±5.37    70.28±7.74
                          DESP        72.45±6.03    75.95±6.03    72.48±5.24    71.48±7.05
[Late] Static Ensemble    Voting      75.07±6.94    75.07±6.94    77.96±5.19    74.12±7.90
                          Stacking    74.21±7.72    74.21±7.72    77.95±5.38    72.84±9.14
[Late] Dynamic Ensemble   KNORAU-W    75.66±7.44    75.66±7.44    78.04±6.09    74.90±8.05
                          KNOP        77.57±5.81    77.57±5.81    80.58±4.21    76.84±6.35
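Every (mean ± standard deviation) entry in these tables comes from repeating a holdout split ten times and aggregating the scores. A stdlib-only sketch of that 10-holdout protocol, with a stand-in majority-class model in place of the actual classifier pool (the data and the baseline are illustrative):

```python
import random
import statistics

def repeated_holdout(X, y, train_and_score, repeats=10, test_frac=0.2, seed=0):
    """Shuffle, split, and score `repeats` times; return (mean, std)."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)
        cut = int(len(idx) * (1 - test_frac))
        train, test = idx[:cut], idx[cut:]
        scores.append(train_and_score(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test]))
    return statistics.mean(scores), statistics.stdev(scores)

def majority_baseline(X_tr, y_tr, X_te, y_te):
    """Stand-in model: always predicts the training split's majority class."""
    pred = max(set(y_tr), key=y_tr.count)
    return sum(label == pred for label in y_te) / len(y_te)

# Toy data with a 70/30 class balance, so the baseline scores near 0.70.
X = [[float(i)] for i in range(100)]
y = [0] * 70 + [1] * 30
mean_acc, std_acc = repeated_holdout(X, y, majority_baseline)
```

Reporting the standard deviation alongside the mean, as in the tables above, makes the stability of each technique across splits visible rather than relying on a single lucky partition.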
Specifically, the weighted KNORAU (KNORAU-W) and DESP achieve the highest scores, at 90.20% and 91.16%, respectively. Given the substantial dataset size, the results are well balanced across the various metrics.

Table 5: Performance of the ML approaches on the PPMI dataset.

Model Type                Model       Accuracy      Precision     Recall        F1
Single Models             RF          92.40±1.00    93.40±0.90    92.10±1.00    92.10±1.00
                          XGB         93.40±1.50    93.60±1.80    93.10±1.40    93.10±1.40
                          LGBM        93.90±1.60    93.90±2.00    93.70±1.40    93.70±1.40
[Early] Static Ensemble   Voting      94.20±0.70    94.20±0.80    94.00±0.70    94.00±0.70
                          Stacking    94.10±0.90    94.00±1.10    93.90±0.90    93.90±0.90
[Early] Dynamic Ensemble  KNOP        94.20±1.10    94.30±1.20    93.90±1.20    93.90±1.20
                          KNORAU      94.30±0.90    94.40±1.10    94.00±0.90    94.00±0.90
[Late] Static Ensemble    Voting      94.60±0.90    94.60±1.10    94.30±0.90    94.30±0.90
                          Stacking    94.50±0.80    94.70±0.80    94.20±0.80    94.20±0.80
[Late] Dynamic Ensemble   DESP        95.00±0.90    95.20±0.90    94.70±1.00    94.70±1.00
                          KNOP        95.10±0.60    95.40±0.70    94.70±0.60    94.70±0.60

[Figure 6: Contribution of each selected classifier to the final decision, comparing accuracy under early and late fusion. a) ADNI dataset. b) Credit Card Clients dataset.]

Results based on PPMI dataset. Table 5 presents the results obtained from the analysis of the Parkinson's Progression Markers Initiative dataset, following a format similar to the previous datasets. Within this dataset, the techniques DESP and KNOP, utilizing late fusion settings, exhibit the most robust performance among the compared algorithms, achieving accuracies of 95.0% and 95.1%, respectively. Additionally, static ensemble models with late fusion settings demonstrate strong performance, at 94.6% accuracy using a voting technique. These results only marginally exceed those achieved with LGBM alone, which reached 93.9%. The fact that LGBM achieved such high accuracy suggests that the task at hand is not very complex. Improving accuracy beyond this point becomes more challenging when a basic technique like LGBM already performs well; reaching significantly higher accuracies with more advanced methods is difficult because the task is relatively straightforward.

Results based on Samarkand Neonatal Center dataset. Table 6 presents the results obtained from analyzing the Samarkand Neonatal Center ICU dataset, following a similar structure to the previous datasets. Due to the dataset's small size, the results may not be consistent or balanced across different metrics. Nonetheless, our proposed late fusion-based dynamic ensemble models achieve notably higher performance compared to the other techniques, reaching 77.57% accuracy with the KNOP technique.

Across all five datasets analyzed, the importance of late fusion can be seen in the results. In each dataset, the dynamic ensemble models with late fusion settings outperform the other existing models. Combining late fusion with dynamic ensemble learning consistently delivers promising and improved results. This highlights the effectiveness and reliability of employing late fusion techniques within dynamic ensemble models across various datasets.

5. Library extension

The primary focus of our paper is to introduce a novel approach to dynamic ensemble selection (DES) that utilizes a late fusion strategy for effectively fusing multimodal data and offers a high degree of explainability for dynamic selection techniques. Our current library offers implementations of four dynamic classifier selection and seven dynamic ensemble selection techniques that use a late fusion strategy. In addition, the library includes several features and options that enhance its performance and capability. Furthermore, the library provides three different types of explainability to help users gain insights into the decision-making processes of the models. Finally, the library has been designed to be compatible with other important libraries, allowing users to easily integrate it into their existing workflows.
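As a rough, library-agnostic illustration of the decision-level fusion the library builds on (a conceptual sketch, not Infodeslib's actual API): each modality's classifier scores an instance independently, and only those class-probability outputs are combined. The per-modality probabilities below are hypothetical placeholders:

```python
# Sketch of decision-level (late) fusion: per-modality classifiers emit
# class probabilities independently; the fused decision averages them.
# The probability vectors below are hypothetical, for illustration only.

def late_fuse(per_modality_probs):
    """Average class-probability vectors from per-modality classifiers
    and return the index of the winning class."""
    n = len(per_modality_probs)
    n_classes = len(per_modality_probs[0])
    fused = [sum(p[c] for p in per_modality_probs) / n
             for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)

# e.g., three modality-specific classifiers scoring one instance:
probs = [
    [0.30, 0.70],   # demographics model
    [0.55, 0.45],   # clinical-scores model
    [0.20, 0.80],   # imaging model
]
# fused probabilities: [0.35, 0.65] -> class 1
```

Because fusion happens only at this decision level, each modality can be modeled by whichever classifier suits it best, which is the diversity argument made throughout the paper.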
We plan to continue exploring the domain of explainability in ensemble learning by proposing additional techniques for providing comprehensive explanations to domain experts. Our goal is to enhance our library's ability to provide context-based explanations that are tailored to the specific needs of users. Additionally, we aim to incorporate what-if explainability features that enable developers to gain deeper insights into the behavior of their ensemble models. These features will be included in future versions of our library.

Through our experimental evaluations, we have discovered that selecting an appropriate pool of classifiers with matching feature groups is a critical aspect of successful ensemble modeling. However, identifying the ideal combination of classifiers for the pool remains a challenging task. In future versions of our library, we plan to address this issue by developing an automatic optimization process for the selection of the optimal pool of classifiers. We believe this to be a crucial task in the field of ensemble learning, and we are committed to exploring ways to simplify this process and make it more effective.

6. Conclusion

This paper presents a novel approach to dynamic selection using a late fusion setting, which is implemented across four dynamic classifier selection and seven dynamic ensemble selection techniques. This late fusion-based approach is particularly well-suited for complex tasks based on multimodal datasets containing multiple feature groups, which are common in real-world scenarios. As a result, the role of late fusion is crucial in the context of ensemble learning for ensuring diversity in the pool of classifiers. Furthermore, we introduce a novel approach to explainability for dynamic selection techniques. Our proposed approach goes beyond the traditional methods and provides a more in-depth and nuanced understanding of the dynamic selection process. The effectiveness of our proposed techniques is evaluated through a comprehensive comparison with existing baseline approaches. The experimental results demonstrate the superior performance of our proposed techniques over the existing approaches, highlighting the potential of our approach to improve the accuracy and reliability of ensemble learning systems.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C1011198), the Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) under the ICT Creative Consilience Program (IITP-2021-2020-0-01821), and the AI Platform to Fully Adapt and Reflect Privacy-Policy Changes (No. RS-2022-II220688).

References

[1] H. Xiao, Z. Xiao, Y. Wang, Ensemble classification based on supervised clustering for credit scoring, Applied Soft Computing 43 (2016) 73–86.
[2] D. Di Nucci, F. Palomba, R. Oliveto, A. De Lucia, Dynamic selection of classifiers in bug prediction: An adaptive method, IEEE Transactions on Emerging Topics in Computational Intelligence 1 (2017) 202–212.
[3] O. Sagi, L. Rokach, Ensemble learning: A survey, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8 (2018) e1249.
[4] L. I. Kuncheva, A theoretical study on six classifier fusion strategies, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 281–286.
[5] M. Fernández-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classifiers to solve real world classification problems?, The Journal of Machine Learning Research 15 (2014) 3133–3181.
[6] A. S. Britto Jr, R. Sabourin, L. E. Oliveira, Dynamic selection of classifiers—a comprehensive review, Pattern Recognition 47 (2014) 3665–3680.
[7] X. Dong, Z. Yu, W. Cao, Y. Shi, Q. Ma, A survey on ensemble learning, Frontiers of Computer Science 14 (2020) 241–258.
[8] L. Breiman, Bagging predictors, Machine Learning 24 (1996) 123–140.
[9] G. Sakkis, I. Androutsopoulos, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos, P. Stamatopoulos, Stacking classifiers for anti-spam filtering of e-mail, arXiv preprint cs/0106040 (2001).
[10] A. H. Ko, R. Sabourin, A. S. Britto Jr, From dynamic classifier selection to dynamic ensemble selection, Pattern Recognition 41 (2008) 1718–1731.
[11] R. M. Cruz, R. Sabourin, G. D. Cavalcanti, Dynamic classifier selection: Recent advances and perspectives, Information Fusion 41 (2018) 195–216.
[12] F. Juraev, S. El-Sappagh, E. Abdukhamidov, F. Ali, T. Abuhmed, Multilayer dynamic ensemble model for intensive care unit mortality prediction of neonate patients, Journal of Biomedical Informatics 135 (2022) 104216.
[13] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, The Journal of Machine Learning Research 12 (2011) 2825–2830.
[14] R. M. Cruz, L. G. Hafemann, R. Sabourin, G. D. Cavalcanti, DESlib: A dynamic ensemble selection library in Python, The Journal of Machine Learning Research 21 (2020) 283–287.
[15] K. Liu, Y. Li, N. Xu, P. Natarajan, Learn to combine modalities in multimodal deep learning, arXiv preprint arXiv:1805.11730 (2018).
[16] S. Raschka, MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack, The Journal of Open Source Software 3 (2018). URL: https://joss.theoj.org/papers/10.21105/joss.00638. doi:10.21105/joss.00638.
[17] S. El-Sappagh, F. Ali, T. Abuhmed, J. Singh, J. M. Alonso, Automatic detection of Alzheimer's disease progression: An efficient information fusion approach with heterogeneous ensemble classifiers, Neurocomputing 512 (2022) 203–224.
[18] C. G. Snoek, M. Worring, A. W. Smeulders, Early versus late fusion in semantic video analysis, in: Proceedings of the 13th Annual ACM International Conference on Multimedia, 2005, pp. 399–402.
[19] F. Juraev, S. El-Sappagh, T. Abuhmed, Explainable dynamic ensemble framework for classification based on the late fusion of heterogeneous multimodal data, in: Intelligent Systems Conference, Springer, 2023, pp. 555–570.
[20] S. Ali, T. Abuhmed, S. El-Sappagh, K. Muhammad, J. M. Alonso-Moral, R. Confalonieri, R. Guidotti, J. Del Ser, N. Díaz-Rodríguez, F. Herrera, Explainable artificial intelligence (XAI): What we know and what is left to attain trustworthy artificial intelligence, Information Fusion (2023) 101805.
[21] M. Sabourin, A. Mitiche, D. Thomas, G. Nagy, Classifier combination for hand-printed digit recognition, in: Proceedings of the 2nd International Conference on Document Analysis and Recognition (ICDAR'93), IEEE, 1993, pp. 163–166.
[22] K. Woods, W. P. Kegelmeyer, K. Bowyer, Combination of multiple classifiers using local accuracy estimates, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 405–410.
[23] P. C. Smits, Multiple classifier systems for supervised remote sensing image classification based on dynamic classifier selection, IEEE Transactions on Geoscience and Remote Sensing 40 (2002) 801–813.
[24] R. G. Soares, A. Santana, A. M. Canuto, M. C. P. de Souto, Using accuracy and diversity to select classifiers to build ensembles, in: The 2006 IEEE International Joint Conference on Neural Network Proceedings, IEEE, 2006, pp. 1310–1316.
[25] M. C. de Souto, R. G. Soares, A. Santana, A. M. Canuto, Empirical comparison of dynamic classifier selection methods based on diversity and accuracy for building ensembles, in: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), IEEE, 2008, pp. 1480–1487.
[26] T. Woloszynski, M. Kurzynski, P. Podsiadlo, G. W. Stachowiak, A measure of competence based on random classification for dynamic ensemble selection, Information Fusion 13 (2012) 207–213.
[27] P. R. Cavalin, R. Sabourin, C. Y. Suen, Dynamic selection approaches for multiple classifier systems, Neural Computing and Applications 22 (2013) 673–688.
[28] R. S. Olson, J. H. Moore, TPOT: A tree-based pipeline optimization tool for automating machine learning, in: Workshop on Automatic Machine Learning, PMLR, 2016, pp. 66–74.
[29] T. Head, M. Kumar, H. Nahrstaedt, G. Louppe, I. Shcherbatyi, scikit-optimize/scikit-optimize: v0.8.1, Zenodo (2020).
[30] T. Akiba, S. Sano, T. Yanase, T. Ohta, M. Koyama, Optuna: A next-generation hyperparameter optimization framework, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
[31] J. Bergstra, D. Yamins, D. Cox, Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures, in: International Conference on Machine Learning, PMLR, 2013, pp. 115–123.
[32] F. Nogueira, et al., Bayesian optimization: Open source constrained global optimization tool for Python, URL: https://github.com/fmfn/BayesianOptimization (2014).
[33] The GPyOpt authors, GPyOpt: A Bayesian optimization framework in Python, http://github.com/SheffieldML/GPyOpt, 2016.
[34] M. Claesen, J. Simm, D. Popovic, Y. Moreau, B. De Moor, Easy hyperparameter search using Optunity, arXiv preprint arXiv:1412.1114 (2014).
[35] S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga, L. Beckett, The Alzheimer's disease neuroimaging initiative, Neuroimaging Clinics of North America 15 (2005) 869.
[36] N. Rahim, T. Abuhmed, S. Mirjalili, S. El-Sappagh, K. Muhammad, Time-series visual explainability for Alzheimer's disease progression detection for smart healthcare, Alexandria Engineering Journal 82 (2023) 484–502.
[37] A. Asuncion, D. Newman, UCI machine learning repository, 2007.
[38] D. L. Beekly, E. M. Ramos, W. W. Lee, W. D. Deitrich, M. E. Jacka, J. Wu, J. L. Hubbard, T. D. Koepsell, J. C. Morris, W. A. Kukull, et al., The National Alzheimer's Coordinating Center (NACC) database: The uniform data set, Alzheimer Disease & Associated Disorders 21 (2007) 249–258.
[39] N. Rahim, S. El-Sappagh, H. Rizk, O. A. El-serafy, T. Abuhmed, Information fusion-based Bayesian optimized heterogeneous deep ensemble model based on longitudinal neuroimaging data, Applied Soft Computing 162 (2024) 111749.
[40] K. Marek, D. Jennings, S. Lasch, A. Siderowf, C. Tanner, T. Simuni, C. Coffey, K. Kieburtz, E. Flagg, S. Chowdhury, et al., The Parkinson Progression Marker Initiative (PPMI), Progress in Neurobiology 95 (2011) 629–635.
[41] M. Junaid, S. Ali, F. Eid, S. El-Sappagh, T. Abuhmed, Explainable machine learning models based on multimodal time-series data for the early detection of Parkinson's disease, Computer Methods and Programs in Biomedicine 234 (2023) 107495.
[42] C. Sammut, G. I. Webb (Eds.), Holdout Evaluation, Springer US, Boston, MA, 2010, pp. 506–507. URL: https://doi.org/10.1007/978-0-387-30164-8_369. doi:10.1007/978-0-387-30164-8_369.