Infodeslib: Python Library for Dynamic Ensemble Learning using Late Fusion of Multimodal Data

Firuz Juraev 1, Shaker El-Sappagh 1,2 and Tamer Abuhmed 1,*
1 College of Computing and Informatics, Sungkyunkwan University, South Korea
2 Faculty of Computer Science and Engineering, Galala University, Egypt

Abstract
There has been a notable increase in research focusing on dynamic selection (DS) techniques within the field of ensemble learning. This has led to the development of various techniques for ensembling multiple classifiers for a specific instance or set of instances during the prediction phase. Despite this progress, the design and development of DS approaches with late fusion settings and their explainability remain unexplored. This work proposes an open-source Python library, Infodeslib, to address this gap. The library provides an implementation of several DS techniques, including four dynamic classifier selection and seven dynamic ensemble selection techniques, all of which are integrated with late data fusion settings and novel explainability features. Infodeslib offers flexibility and customization options, making it a versatile tool for various complex applications that require the fusion of multimodal data and various explainability features. Multimodal data, which integrates information from diverse sources or sensor modalities, is a common and essential setting for real-world problems, enhancing the robustness and depth of data analysis. These data can be fused in two main ways: early fusion, where different modalities are combined at the feature level before model training, and late fusion, where each modality is processed separately and the results are combined at the decision level. The library is fully documented following the Read the Docs standards. The documentation, code, and examples are available anonymously on GitHub at https://github.com/InfoLab-SKKU/infodeslib.
Keywords: Ensemble of classifiers, Dynamic classifier selection, Dynamic ensemble selection, Multimodal data fusion, Late fusion, Machine learning, Explainable AI, Python

Ensemble learning is a thriving domain within the fields of machine learning and pattern recognition [1, 2]. With all the diverse ensemble classifiers available, each classifier approaches the problem from a different perspective. The main idea of ensemble learning is to leverage a group of classifiers to provide comprehensive coverage of the learned task [3]. By utilizing diverse models that exhibit distinct decision boundaries, ensemble learning seeks to maximize the accuracy and effectiveness of the overall classification process. As a result, the performance of an ensemble classifier is better than that of any of its base classifiers [4, 5], because each base classifier concentrates on a specific region of the error space, and combining the decisions of these classifiers improves the overall ensemble's decisions. Ensemble learning approaches can be broadly classified into two categories: static and dynamic selection approaches [6, 7]. In static selection [8, 9], a predetermined group of classifiers is selected, and this group is utilized to make decisions for each new test instance. In dynamic selection [10, 11, 12], a new group of classifiers is selected for each test instance, and this group is employed to make a decision for that specific instance.

Since real-world datasets are often complex and consist of multiple feature groups or so-called 'modalities', ensemble learning is a popular candidate for combining multiple models to improve the performance and robustness of predictive models. One approach to ensemble learning is early fusion, where all modalities are merged in a pool for the classifiers to capture the potential interactions and interdependencies among the modalities using either static [13] or dynamic selection [14].

Another approach to ensemble learning is late fusion, or decision fusion, where each classifier in the pool is trained with different feature groups or combinations of feature groups to achieve greater diversity in the model pool. This diversity is crucial for constructing a robust ensemble that can effectively generalize to previously unseen data. Moreover, late fusion provides more flexibility, as classifiers are assigned to different modalities, considering that certain classifiers are best suited to model certain modalities [15].

In the current literature, late fusion-based ensemble learning is only available with static classifier selection [16], and most of these studies show the superiority of late fusion over early fusion for static ensembles [17, 18, 19]. This motivates us to explore the performance of late fusion in dynamic selection compared to early fusion; however, to the best of our knowledge, no study or implementation has been conducted to examine the performance of late fusion in dynamic selection settings. This work aims to implement different types of dynamic selection techniques in the late fusion setting. By doing so, we can explore the performance of late fusion-based ensemble learning under dynamic selection modeling, gaining a deeper understanding of its potential advantages and limitations.

The resulting late fusion-based dynamic ensembles are expected to improve the performance of the resulting classifiers. However, these models are black boxes and not understandable. Trustworthy classifiers that are applicable in the real world need to be interpretable. Explainable AI (XAI) has gained significant attention in recent years [20], as it is crucial to provide insights into the decision-making process of machine learning models. However, despite the growing interest in this area, there is a lack of explainability features for ensemble learning techniques, which are increasingly used in complex real-world applications to improve the trustworthiness of the resulting models. To the best of our knowledge, no study in the literature and no Python packages are provided to implement XAI capabilities for dynamic ensemble classifiers. This study aims to address this research gap by developing a Python package that offers novel explainability techniques for ensemble models, making them accessible and informative for both domain experts and developers.

KiL'24: Workshop on Knowledge-infused Learning co-located with 30th ACM KDD Conference, August 26, 2024, Barcelona, Spain
* Corresponding author.
fjuraev@g.skku.edu (F. Juraev); shaker@skku.edu (S. El-Sappagh); tamer@skku.edu (T. Abuhmed)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

The contributions of the study are as follows:

• We extend the literature on dynamic ensemble modeling by implementing four dynamic classifier selection techniques and seven dynamic ensemble selection techniques, incorporating a late fusion of multiple modalities (see Table 1).
• We propose three types of novel explainability that provide deep and suitable XAI for dynamic selection techniques: case-based reasoning, deep-based classifier contributions, and local feature importance.
• We compare the performance of the proposed techniques with existing approaches on four well-known real-world multimodal datasets: Alzheimer's Disease Neuroimaging Initiative (ADNI), Credit Card Clients, National Alzheimer's Coordinating Center, and Parkinson's Progression Markers Initiative (PPMI). We also tested the proposed techniques on the Samarkand Neonatal Center dataset, which was collected by our team with the help of physicians.
• The implemented techniques have been included in a standard public library called 'Infodeslib', following the industry-standard PEP 8 coding guidelines, and Infodeslib is also clearly documented in accordance with the Read the Docs standards: https://infodeslib.readthedocs.io/en/latest/
• We offer a wide range of valuable functions that enable the assessment and evaluation of the quality and efficacy of the selected pool.

The study is organized as follows. Section 1 highlights the software framework of the proposed late fusion dynamic ensemble learning. Section 2 presents installation and usage, Section 3 describes model explainability, Section 4 discusses the performance analysis, and Section 5 introduces possible package extensions. Section 6 concludes the paper.

1. Late Fusion Dynamic Ensemble Framework

In this section, we provide an overview of the late fusion dynamic ensemble framework in algorithmic and visual formats. This encompasses a thorough dissection of the primary stages involved, along with step-by-step explanations of the framework's methodology.

Since the late fusion dynamic ensemble utilizes the decision values obtained from each modality and fuses them using a specific fusion mechanism M (such as averaging, weighted averaging, majority voting, etc.), let us assume that classifier c_i is applied to modality f_i. The final prediction can be expressed as:

    p = M(c_1(f_1), c_2(f_2), ..., c_m(f_m))    (1)

The proposed concept of dynamic selection with late fusion is illustrated in Figure 1, which outlines a framework consisting of three key stages: training, selection, and prediction. Additionally, the concept is detailed algorithmically in Algorithm 1.

Table 1: Infodeslib implemented DS methods.

    Technique                                Selection   Reference
    Modified Rank (MR)                       DCS         Sabourin et al. [21]
    Overall Local Accuracy (OLA)             DCS         Woods et al. [22]
    Local Class Accuracy (LCA)               DCS         Woods et al. [22]
    Modified Local Accuracy (MLA)            DCS         P.C. Smits [23]
    DES-KNN                                  DES         Soares et al. [24, 25]
    K-Nearest Oracles Eliminate (KNORA-E)    DES         Ko et al. [10]
    K-Nearest Oracles Union (KNORA-U)        DES         Ko et al. [10]
    Weighted KNORA-E (KNORA-E-W)             DES         Ko et al. [10]
    Weighted KNORA-U (KNORA-U-W)             DES         Ko et al. [10]
    DES Performance (DES-P)                  DES         Woloszynski et al. [26]
    K-Nearest Output Profiles (KNOP)         DES         Cavalin et al. [27]

Training Phase. A pool of classifiers is selected and assigned different feature sets. The classifiers within the pool are selected based on their diversity, ensuring a wide range of decision-making capabilities. Each feature set used by the selected classifiers is extracted from the same modality to generate a homogeneous feature set. For example, in the medical domain, demographic and MRI features are different modalities that could be used to train two different classifiers. Each classifier in the pool is then trained and optimized with its designated feature set, resulting in a pool of trained classifiers to be utilized in the next phases (lines 1-4 in Algorithm 1).

Selection Phase. During the selection phase (lines 5-12 of Algorithm 1), a region of competence (RoC) is determined for a given new test instance by selecting the nearest samples from the validation data (DSEL). Subsequently, each classifier in the pool is evaluated on the samples within the RoC, and a measure of competence is calculated for each classifier. The specific method employed to compute the competence varies depending on the chosen DS technique (lines 9-10 in Algorithm 1). Once the competencies of all classifiers in the pool are calculated, the DS techniques use their own selection criteria to identify the most competent classifiers. These criteria are specific to each DS technique. If no classifier satisfies the competence criteria of a given DS technique, all classifiers in the pool are selected to make the final decision.

Prediction Phase. During the final phase, the selected classifiers are utilized to predict the class of a given test instance, and their individual predictions are combined to generate a final prediction. To provide more accurate decisions, each of the selected classifiers can be weighted based on its level of competence during the aggregation process (line 13 in Algorithm 1).

2. Installation and Usage

Users can conveniently install the most recent version of Infodeslib via pip, the Python package manager, by executing the command pip install infodeslib. Alternatively, the library can be installed from GitHub using the command pip install git+https://github.com/InfoLab-SKKU/infodeslib.
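To make the late fusion of Eq. (1) concrete, the following minimal sketch trains one scikit-learn classifier per modality and fuses their class probabilities by averaging, one possible choice of the fusion mechanism M. The synthetic data, the column split into two "modalities", and the classifier choices are illustrative assumptions, not part of Infodeslib.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a multimodal dataset: columns 0-5 act as one
# "modality", columns 6-11 as another.
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
feature_sets = [list(range(0, 6)), list(range(6, 12))]
pool = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)]

# Training phase: each classifier sees only its own feature set.
for clf, cols in zip(pool, feature_sets):
    clf.fit(X_tr[:, cols], y_tr)

# Late (decision-level) fusion: average the per-modality class probabilities.
proba = np.mean(
    [clf.predict_proba(X_te[:, cols]) for clf, cols in zip(pool, feature_sets)],
    axis=0,
)
y_pred = proba.argmax(axis=1)
```

Replacing the mean with a weighted mean or a majority vote over `predict` outputs yields the other fusion mechanisms mentioned above.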
To use the implemented methods in Infodeslib, a list of classifiers and a list of feature sets must be provided as input. The classifiers in the list can be of any type from the scikit-learn library and should be trained on the corresponding feature set before being used as input.

Once the pool of classifiers and feature sets has been initialized, the method fit(X_dsel, y_dsel) is applied to fit the dynamic selection method, where (X_dsel, y_dsel) is the validation dataset (DSEL) with true labels. Predictions for each test instance x can be obtained using either the predict(x) or predict_proba(x) methods. In the example provided below, we demonstrate the steps involved in implementing the KNORA-U technique.

    from infodeslib.des.knorau import KNORAU

    pool_classifiers = [classifier1, ..., classifierN]
    # feature_set1 is a list of columns
    feature_sets = [feature_set1, ..., feature_setN]

    # Initialize the DS model
    knorau = KNORAU(pool_classifiers, feature_sets)

    # Fit the dynamic selection model
    knorau.fit(X_dsel, y_dsel)

    # Predict new examples
    knorau.predict(X_test, plot=True)

    # Check performance (based on accuracy)
    knorau.score(X_test, y_test)

When utilizing the predict(X) method, an additional parameter plot can be included to obtain explainability for each test instance. By setting plot=True, explainability for the given test instance can be visualized through a variety of methods (see more details in Section 3).

Figure 1: The architecture of the proposed late fusion dynamic ensemble learning framework implemented by Infodeslib. [The figure depicts the training phase (a pool of base classifiers, each trained on its own feature set), the selection phase (a region of competence Φ drawn from the DSEL data and the resulting ensemble of competent classifiers EoC), and the prediction phase with aggregation, model evaluation, and the explainability interface: feature importance, contribution of models, and case-based reasoning.]

Algorithm 1 Late fusion DES method
Input: Pool of classifiers C, training dataset D_tr, validation dataset D_va, testing dataset D_te, feature set F, and neighborhood size K
Output: EoC*_t, an ensemble of classifiers for each testing sample t in D_te
 1: for each classifier c_i in C do
 2:     Optimize f_i in F for c_i;
 3:     Optimize and train c_i on D_tr with feature set f_i;
 4: end for
 5: for each testing sample t in D_te do
 6:     Find Ψ as the K nearest neighbors of the testing sample t in D_va;
 7:     for each sample ψ_i in Ψ do
 8:         for each classifier c_i in C do
 9:             Calculate competence of c_i on Ψ;
10:             Select ensemble of competent classifiers EoC*_t;
11:         end for
12:     end for
13:     Use the ensemble EoC*_t to classify t;
14: end for

Infodeslib Methods. Figure 2 provides an overview of the key methods of our library, while other supporting methods are available in the documentation of the library. Some of these methods, such as fit(), predict(), predict_proba(), and score(), are well known and require no detailed explanation; there are several other methods that are particularly useful for pool generation and for obtaining information about new test samples. To facilitate pool generation, we have implemented three additional methods: get_average_accuracy(), get_pool_diversity(), and get_coverage_score(). The get_average_accuracy() method computes the average performance of the classifiers in the pool on the validation data. The get_pool_diversity() method calculates the diversity between classifiers in the pool and requires the diversity measure type as a parameter. It supports several diversity functions such as Q-statistic, Correlation Coefficient, Disagreement Measure, Double Fault, Negative Double Fault, and Ratio Errors. The get_coverage_score() method determines the number of samples in the DSEL data that can be accurately predicted by any model in the given pool. This information is particularly useful for evaluating the coverage of the pool and ensuring that all samples are accurately classified by at least one model. The prediction process in machine learning often involves the use of ensemble methods, where multiple classifiers are combined to improve performance.
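The coverage idea behind get_coverage_score() can be illustrated in a few lines: a pool "covers" a DSEL sample if at least one of its members predicts that sample correctly. The helper pool_coverage below is a hypothetical re-implementation of that idea for illustration, not Infodeslib's actual code; the data, pool, and feature split are likewise stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def pool_coverage(pool, feature_sets, X_dsel, y_dsel):
    """Fraction of DSEL samples correctly predicted by at least one classifier."""
    # hits[i, j] is True when classifier j labels DSEL sample i correctly.
    hits = np.column_stack(
        [clf.predict(X_dsel[:, cols]) == y_dsel
         for clf, cols in zip(pool, feature_sets)]
    )
    return hits.any(axis=1).mean()

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_train, y_train = X[:200], y[:200]
X_dsel, y_dsel = X[200:], y[200:]
feature_sets = [list(range(0, 5)), list(range(5, 10))]
pool = [KNeighborsClassifier(n_neighbors=3), DecisionTreeClassifier(random_state=1)]
for clf, cols in zip(pool, feature_sets):
    clf.fit(X_train[:, cols], y_train)

coverage = pool_coverage(pool, feature_sets, X_dsel, y_dsel)
```

By construction the coverage is never below the best single-classifier accuracy; a value well below 1.0 signals DSEL regions that every pool member misclassifies, suggesting the pool lacks diversity there.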
Within these ensembles, three methods play a crucial role: get_region_of_competence(x), estimate_competence(roc), and select(competences). The get_region_of_competence(x) method identifies the region of competence for a given test sample by returning the k nearest neighbors from the validation dataset. This is achieved by applying the k-nearest neighbors algorithm. The estimate_competence(roc) method calculates the competence of each classifier in the ensemble on the region of competence. The competence calculation differs depending on the technique being used. For example, the k-Nearest Oracles Union (KNORA-U) technique calculates the accuracy of each classifier on the region of competence. The Dynamic Ensemble Selection KNN (DESKNN) technique, on the other hand, computes each classifier's accuracy and diversity on the RoC and uses these metrics to assess its competence. The select(competences) method selects the most competent classifiers from the ensemble to make a prediction. Different techniques may use different criteria for determining the competence of a classifier, such as the number of samples classified correctly within the region of competence. For instance, the KNORA-U technique selects a classifier if it has classified at least one sample within the region of competence. Once the competent classifiers have been identified, their competence values are used as weights in aggregating their predictions.

To evaluate a single test instance, our library includes the get_rareness_score(x) method, which provides a detailed description of the instance. The method evaluates whether there are many samples similar to the given instance in the training and validation datasets, allowing users to determine the rarity of the instance. If the instance is an outlier, the method provides information about how far it is from other classes. Furthermore, the get_rareness_score(x) method uses K-means clustering to suggest a potential class for the instance and generates tables indicating which features of the instance make it similar to this class. This approach provides valuable insights into the characteristics of the instance and its potential classification, aiding in the development of more accurate models.

Figure 2: The overall schema of the software architecture. [The figure summarizes the API: fit(X, y) prepares the DS model by pre-processing the information required to apply the DS methods; predict(X, plot=False) returns the class label for each sample in X, with plot=True for getting the explainability; predict_proba(X) returns the probabilities for each sample in X; score(X, y) returns the mean accuracy on the given data and labels. Pool-level: get_average_accuracy() returns the mean accuracy of classifiers in the pool; get_pool_diversity() returns the mean and list of diversity scores between classifiers in the pool; get_coverage_score() explains how well the given pool of classifiers can cover the task on the validation data. Single-instance: get_region_of_competence(x) returns the k nearest samples of the given test sample from the validation dataset; estimate_competence(roc) returns the competences of each base classifier on the k nearest samples from the RoC; select(competences) returns all base classifiers that are competent enough; get_rareness_score(x) explains how rare the given test sample is in the training and validation data. Hyperparameters: k, DFP, knn_metric, dimensionality_reduction, reduction_technique, n_components, cbr_features.]

Figure 3: Local feature importance of each selected classifier. [SHAP values of the top features for each selected classifier.]

Hyperparameters. Optimizing hyperparameters is a critical step for improving the performance of ensemble learning models. This can be achieved through various techniques, including basic approaches such as grid search and random search in scikit-learn, as well as more advanced techniques like genetic algorithms, Bayesian optimization, and others. Our library is designed to work seamlessly with other Python packages such as TPOT [28], Scikit-Optimize [29], Optuna [30], Hyperopt [31], BayesianOptimization [32], GPyOpt [33], Optunity [34], and similar packages that implement these advanced optimization techniques. This allows users to leverage a variety of optimization methods to obtain the best possible hyperparameters for their ensemble models.

In our library, there are several key hyperparameters that users can adjust to optimize the performance of ensemble learning models. We present these key hyperparameters along with their default values, which have been shown to produce satisfactory results in the majority of cases. One of the main hyperparameters is k (default: 7), which represents the number of neighbors to be considered when determining the region of competence. Another important hyperparameter is DFP (default: False), which stands for dynamic frienemy pruning and is particularly useful for imbalanced datasets. In addition, users can also specify the knn_metric (default: 'minkowski'), which determines the distance metric used when computing distances between the test sample and other samples in the validation dataset. Our library provides several common metrics such as Minkowski, cosine, Manhattan, and Euclidean, as well as the option for users to define their own custom metric function. To handle high-dimensional datasets, we also offer a dimensionality_reduction (default: False) hyperparameter, which allows users to reduce the number of dimensions used in calculating distances between samples. This can be achieved using either Principal Component Analysis (PCA) or Kernel PCA, or by specifying a custom dimensionality reduction technique via the reduction_technique (default: 'pca') hyperparameter. The n_components (default: 20) hyperparameter determines the number of components to be retained if a reduction technique is selected. Lastly, for those interested in explainability, our library provides the cbr_features (default: None) hyperparameter, which allows users to specify a list of important features to be included in the similar-cases data for Case-Based Reasoning.

3. Model Explainability

In the current version of our library, we offer three main XAI techniques: case-based reasoning, deep-based classifier contribution, and local feature importance [20]. The case-based reasoning technique aims to offer domain experts an explanation of the model's prediction process for a given test sample by presenting them with similar samples and their corresponding labels found within the region of competence. This approach closely resembles how domain experts make decisions in real-world situations, as they frequently compare current cases with historical ones from their experience. The deep-based classifier contribution technique enables users to comprehend the contribution of each selected classifier in the decision-making process for a given test sample. Finally, the local feature importance technique is a prevalent explainability method that identifies the most crucial features and their corresponding SHAP values for each selected classifier.
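The per-instance pipeline described earlier, get_region_of_competence(x), estimate_competence(roc), and select(competences), can be sketched for a single query under KNORA-U-style criteria. Everything below (the data, the pool, the helper logic, and the default k = 7) is a simplified stand-alone illustration, not the library's internal implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=2)
X_train, y_train = X[:200], y[:200]
X_dsel, y_dsel = X[200:280], y[200:280]  # validation data (DSEL)
x_query = X[280:281]                     # one new test instance

pool = [LogisticRegression(max_iter=1000).fit(X_train, y_train),
        DecisionTreeClassifier(random_state=2).fit(X_train, y_train)]

# Region of competence: the k nearest DSEL samples to the query.
k = 7
nn = NearestNeighbors(n_neighbors=k).fit(X_dsel)
roc_idx = nn.kneighbors(x_query, return_distance=False)[0]

# Competence estimation: number of RoC samples each classifier labels
# correctly (the KNORA-U criterion).
competences = np.array(
    [(clf.predict(X_dsel[roc_idx]) == y_dsel[roc_idx]).sum() for clf in pool]
)

# Selection: keep every classifier correct on at least one RoC sample;
# if none qualifies, fall back to the whole pool.
selected = [i for i, c in enumerate(competences) if c >= 1] or list(range(len(pool)))

# Aggregation: each selected classifier votes with its competence as weight.
votes = np.zeros(len(np.unique(y_train)))
for i in selected:
    votes[pool[i].predict(x_query)[0]] += competences[i]
prediction = int(votes.argmax())
```

Swapping the competence definition (e.g., accuracy plus diversity on the RoC) turns this into a DESKNN-style scheme, while keeping the same three-step structure.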
Case-based reasoning. For example, in the case of the KNORA-U technique, in the selection phase, the nearest neighbors of each test instance are estimated in the validation dataset based on their close similarity to the test sample. The selected samples are used to generate the region of competence for evaluating and selecting classifiers in the pool. Figure 4 a) illustrates an example in which the given test sample (light blue x) falls within the area of class 2, and seven nearest samples are selected, six of which belong to class 2 (blue dots), while one belongs to class 1 (green dot). This finding suggests that, for the given test sample, the chance of it being classified as class 2 is high. Moreover, these samples can also be leveraged for conducting case-based reasoning, which may be particularly valuable for physicians, given that our dataset is in the medical domain. Figure 4 b) provides comprehensive information about all nearest samples within the region of competence, enabling physicians to compare and contrast similar samples and their corresponding labels or diagnoses.

Figure 4: Estimating a region of competence (RoC) and providing details about the selected sample for RoC. [Panel a): estimating the region of competence (RoC) in the validation dataset. Panel b): detailed information about the samples in the region of competence, listed with selected features (Feature 7, Feature 8, Feature 10, Feature 11, ..., Feature 72, Feature 84) and their target labels.]

Deep-based classifiers contributions. After selecting the group of classifiers for making the final decision, it may be unclear how each classifier in the pool contributed to the decision or what their individual predictions were for the new test sample. In order to provide a more comprehensive understanding of the decision-making process, an additional level of explainability can be utilized. This is illustrated in Figure 5, which provides detailed information about each classifier in the pool, including its competence level, individual prediction on the new test sample, and confidence level. This explanation provides valuable insight for the development of an ensemble model, as it allows developers to identify classifiers that may have a negative impact on decision-making. For instance, as shown in Figure 5, it is evident that most selected classifiers predict the label of the given test sample as 2 with high confidence, while the SVC classifier predicts it as 3. The SVC classifier demonstrates a higher level of competence in the region of competence, indicating that it has a more significant influence on the decision. If this classifier consistently has a negative impact on many test samples, it may be possible to remove it from the pool of classifiers.

Figure 5: The contribution of each selected classifier on the final decision.

    Selected Classifier    Competence   Prediction   Confidence
    [Classifier 1] XGB     1.00         2            0.99
    [Classifier 2] XGB     1.00         2            0.99
    [Classifier 3] MLP     0.42         1            0.39
    [Classifier 4] SVC     0.71         3            0.22
    [Classifier 5] XGB     1.00         2            0.99
    [Classifier 6] KNN     1.00         2            1.00
    (Classes: 0: AD, 1: sMCI, 2: CN, 3: pMCI)

Local feature importance. In addition to understanding how the classifiers contributed to the decision-making process, it is also important to identify which features were particularly influential in making those decisions. For the example mentioned earlier, we provide the local feature importance for each selected classifier, which can be visualized as shown in Figure 3.

Furthermore, our proposed ensemble models have the ability to provide interpretable explanations using two approaches: surrogate model explainability and post-hoc explainability methods. The surrogate model approach involves creating a simplified model that roughly represents the behavior of the original ensemble model and using this model to explain the ensemble's decisions. On the other hand, post-hoc explainability techniques involve analyzing the ensemble model's decisions after they have been made and providing explanations based on the input features that contributed the most to the decision. Both methods treat the ensemble model as a black box model.

4. Performance Analysis

Within this section, we compare the performance of the proposed architecture with existing approaches. We provide an overview of the datasets that have been utilized, along with a detailed analysis of our proposed techniques.

4.1. Evaluation Datasets

In this section, we outline the five datasets utilized to compare Infodeslib with existing models.

Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset [35]. The study includes a total of 1,371 subjects, with a male gender representation of 54.5%.
Participants have been classified into four distinct categories based on their clinical diagnosis: Cognitive Normal (CN), Stable Mild Cognitive Impairment (sMCI), Progressive Mild Cognitive Impairment (pMCI), and Alzheimer's Disease (AD) [36]. The distribution of these classes is as follows: 419 CN, 473 sMCI, 140 pMCI, and 339 AD individuals. The dataset has four distinct modalities or feature groups, which contain demographics, cognitive scores, assessment tests, and MRI features.

Table 2: Performance of the different ML approaches using the ADNI dataset.

    Model Type                Model      Accuracy     Precision    Recall       F1
    Single Models             XGB        87.11±2.32   87.50±2.63   87.11±2.32   87.03±2.49
                              LGBM       86.74±1.58   87.34±1.96   86.74±1.58   86.70±1.72
                              RF         87.11±1.96   87.51±2.22   87.11±1.96   87.08±2.08
    [Early] Static Ensemble   Voting     88.08±1.94   88.31±2.07   88.08±1.94   88.05±2.02
                              Stacking   86.87±2.02   87.67±2.11   86.87±2.02   86.80±2.15
    [Early] Dynamic Ensemble  DESP       88.61±1.96   88.72±2.15   88.61±1.96   88.55±2.08
                              KNOP       88.71±1.91   88.80±2.09   88.71±1.91   88.66±2.03
    [Late] Static Ensemble    Voting     89.29±1.67   89.39±1.81   89.29±1.67   89.24±1.74
                              Stacking   87.65±1.59   88.13±1.64   87.65±1.59   87.60±1.74
    [Late] Dynamic Ensemble   KNORAU     89.52±2.01   89.77±2.01   89.52±2.01   89.46±2.10
                              KNORAU-W   89.84±1.83   90.29±2.03   89.81±1.83   89.80±1.91

Credit Card Clients dataset. The study includes a vast participant cohort of 30,000 individuals, with the dataset sourced from the UC Irvine Machine Learning Repository [37]. This is a classification problem that involves determining whether or not a client will make their next payment. The two distinct classes are labeled as 'no' and 'yes', with 23,364 and 6,636 instances, respectively. The dataset has four distinct modalities of features, including demographics, financial, and payment history features.

Table 3: Performance of the different ML approaches on the Credit Card Clients dataset.

    Model Type                Model      Accuracy     Precision    Recall       F1
    Single Models             XGB        81.96±0.84   89.05±0.72   72.87±1.28   80.15±1.02
                              LGBM       80.43±0.64   86.88±0.66   71.69±0.92   78.56±0.76
                              RF         79.90±0.78   87.49±0.70   69.76±1.13   77.63±0.95
    [Early] Static Ensemble   Voting     83.94±0.74   88.52±0.72   78.01±1.11   82.93±0.84
                              Stacking   82.68±0.69   86.72±0.77   77.17±0.95   81.67±0.77
    [Early] Dynamic Ensemble  DESKNN     83.99±0.38   89.34±0.53   77.18±0.60   82.82±0.42
                              KNORAE     84.16±0.66   88.81±0.64   78.17±0.97   83.15±0.75
    [Late] Static Ensemble    Voting     85.72±0.44   89.83±0.57   80.56±0.73   84.94±0.49
                              Stacking   85.08±0.47   89.30±0.36   79.71±0.94   84.23±0.57
    [Late] Dynamic Ensemble   KNOP       86.65±0.23   91.64±0.28   80.66±0.52   85.80±0.28
                              KNORAU-W   86.73±0.29   91.76±0.33   80.70±0.64   85.87±0.36

National Alzheimer's Coordinating Center (NACC) dataset [38]. In this study, we examined a total of 37,547 patients, focusing on the Global Clinical Dementia Rating (CDRGLOB) as the primary task. CDRGLOB categorizes patients into four classes based on dementia severity: no impairment (8,253 patients), mild impairment (15,097 patients), moderate impairment (8,346 patients), and severe impairment (5,851 patients). Our analysis included six specific modalities for investigation: demographics, physical health, medications, health history, the neuropsychiatric inventory questionnaire, and the geriatric depression scale. These modalities were chosen to comprehensively assess various aspects related to dementia and overall patient health [39].

Parkinson's Progression Markers Initiative (PPMI) dataset [40].
Our study involves 952 patients and fo- dividual models, as well as the static ensemble with early cuses on a binary classification task to differentiate between fusion, the dynamic ensemble with early fusion, the static healthy individuals and those diagnosed with Parkinson’s ensemble with late fusion, and our proposed technique - disease (PD). Among these patients, 389 are categorized the dynamic ensemble with late fusion setting. From each as healthy, while 563 have been diagnosed with PD. The group, we selected the best-performed techniques and the dataset encompasses various information modalities, in- results show that our dynamic ensemble techniques, KNO- cluding subject characteristics, biospecimen data, medical RAU and KNORAU-W outperform all existing approaches history records, motor function assessments, and non-motor with 89.52% and 89.84% accuracy. In comparison, a static features. This comprehensive dataset enables a thorough ensemble with late fusion, voting classifier, achieves an accu- analysis to identify potential diagnostic markers and factors racy of 89.29%. This performance is close to the performance associated with PD, facilitating improved understanding of our model and surpasses that of early fusion techniques. and diagnosis of the disease [41]. This result supports our claim for the significance of late Samarkand Neonatal Center dataset. Our study fusion in producing accurate ensemble models. involved 347 neonates from the intensive care unit at Results based on Credit Card Clients dataset. Table 3 Samarkand Neonatal Center. The dataset was collected by and Figure 6 b) present the results obtained from the analysis our team by collaborating physicians in the hospital for a of the Credit Card Clients dataset, following a similar format binary classification task to predict whether a neonate sur- to the previous dataset. Our proposed techniques have once vives or passes away. 
Among these neonates, 303 survived again outperformed the existing approaches in this instance. and 44 died during the study period. The dataset comprises Specifically, KNOP and KNORAU-W, utilizing the late fusion a comprehensive set of features categorized into multiple setting, have achieved the highest accuracy scores of 86.65% modalities: demographic information, the mother’s medical and 86.73%, respectively. In comparison, the static ensemble history and information, general notes on the neonate’s methods that apply late fusion, specifically the voting and condition, results from blood tests, and APGAR scores (a stacking classifiers, demonstrate accuracies of 85.72% and standardized assessment of a neonate’s health at birth). 85.08%, respectively. In contrast, the ensemble methods that employ early fusion achieve the highest accuracy of 4.2. Results 84.16%, with the dynamic selection technique known as KNORA-E. These results support our argument regarding This section contains a comprehensive analysis and compar- the importance of utilizing late fusion for the purpose of ison of various machine-learning approaches against our producing highly accurate ensemble models. proposed late-fusion dynamic ensemble selection model. Results based on NACC dataset. Table 4 highlights We collect and present the testing results for each of the the results from the analysis of the National Alzheimer’s considered models. To ensure greater consistency in the Coordinating Center dataset, structured similarly to the pre- results, we have applied the 10-holdout testing method [42]. vious dataset. Among all existing techniques, the dynamic The results are presented in the form of (mean ± standard ensemble models with late fusion demonstrate notably su- deviation). As a pool of classifiers, we utilized the following Table 4 Table 6 Performance of the ML approaches on the NACC dataset. 
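The dynamic selection rules behind these results share one idea: a classifier's competence is judged in the neighborhood of each query. Under KNORA-U (the union rule), every classifier that correctly labels at least one of the query's k nearest validation neighbors joins the ensemble and casts one vote per neighbor it labeled correctly; KNORAU-W additionally weights this evidence. A self-contained sketch of the KNORA-U rule with toy one-dimensional data and threshold "classifiers" (the data, thresholds, and helper names are illustrative, not the paper's models):

```python
# Sketch of the KNORA-U (union) dynamic ensemble selection rule.
# Toy 1-D data and threshold "classifiers", for illustration only.

def knn_indices(query, X_val, k):
    """Indices of the k validation points nearest to the query."""
    return sorted(range(len(X_val)), key=lambda i: abs(X_val[i] - query))[:k]

def knora_u_predict(query, classifiers, X_val, y_val, k=3):
    """Each classifier gets one vote per validation neighbor it labels
    correctly; the class with the most votes wins."""
    neighbors = knn_indices(query, X_val, k)
    votes = {}
    for clf in classifiers:
        correct = [i for i in neighbors if clf(X_val[i]) == y_val[i]]
        if correct:                       # competent on >= 1 neighbor
            pred = clf(query)
            votes[pred] = votes.get(pred, 0) + len(correct)
    if not votes:                         # fall back to plain majority vote
        for clf in classifiers:
            pred = clf(query)
            votes[pred] = votes.get(pred, 0) + 1
    return max(votes, key=votes.get)

# Toy pool: each "classifier" predicts 1 when x exceeds its threshold.
pool = [lambda x, t=t: int(x > t) for t in (0.2, 0.5, 0.8)]
X_val = [0.1, 0.3, 0.6, 0.9]
y_val = [0, 0, 1, 1]
```

In the late fusion setting studied here, each classifier in the pool additionally sees only its own modality's features; the selection rule itself is unchanged.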
Table 4: Performance of the ML approaches on the NACC dataset.

Model Type                Model       Accuracy      Precision     Recall        F1
Single Models             GB          85.70±1.16    85.71±1.16    85.77±1.07    85.56±1.19
                          XGB         86.30±1.47    86.30±1.47    86.32±1.48    86.18±1.51
                          RF          86.79±0.74    86.79±0.74    86.76±0.82    86.69±0.76
[Early] Static Ensemble   Voting      87.52±0.90    87.53±0.90    87.42±0.93    87.40±0.90
                          Stacking    87.17±1.29    87.17±1.29    87.20±1.19    87.11±1.26
[Early] Dynamic Ensemble  DESKNN      87.30±1.16    87.61±1.16    87.37±1.06    87.39±1.14
                          KNORAU      88.34±1.44    88.34±1.44    88.34±1.41    88.27±1.45
[Late] Static Ensemble    Voting      89.39±1.34    89.39±1.34    89.43±1.30    89.36±1.33
                          Stacking    89.11±0.89    89.11±0.89    89.15±0.95    89.07±0.91
[Late] Dynamic Ensemble   KNORAU-W    90.20±1.10    90.20±1.10    90.30±1.09    90.20±1.11
                          DESP        91.16±0.93    91.21±0.93    91.14±0.89    91.17±0.92

Table 6: Performance of the different ML approaches on the Samarkand Neonatal Center dataset.

Model Type                Model       Accuracy      Precision     Recall        F1
Single Models             RF          69.34±4.66    69.34±4.66    73.70±4.20    67.71±5.44
                          XGB         69.74±7.64    69.74±7.64    74.29±6.31    67.77±9.16
                          LGBM        70.07±8.58    70.07±8.58    72.65±7.52    68.74±9.62
[Early] Static Ensemble   Voting      73.03±8.03    73.03±8.03    75.50±6.57    72.00±8.95
                          Stacking    71.45±5.11    71.45±5.11    75.70±4.39    70.06±5.87
[Early] Dynamic Ensemble  KNORAU      71.64±6.84    71.64±6.84    75.30±5.37    70.28±7.74
                          DESP        72.45±6.03    75.95±6.03    72.48±5.24    71.48±7.05
[Late] Static Ensemble    Voting      75.07±6.94    75.07±6.94    77.96±5.19    74.12±7.90
                          Stacking    74.21±7.72    74.21±7.72    77.95±5.38    72.84±9.14
[Late] Dynamic Ensemble   KNORAU-W    75.66±7.44    75.66±7.44    78.04±6.09    74.90±8.05
                          KNOP        77.57±5.81    77.57±5.81    80.58±4.21    76.84±6.35
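Every (mean ± standard deviation) entry in these tables comes from repeating a holdout split ten times and aggregating the scores. A stdlib-only sketch of that 10-holdout protocol, with a stand-in majority-class model in place of the actual classifier pool (the data and the baseline are illustrative):

```python
import random
import statistics

def repeated_holdout(X, y, train_and_score, repeats=10, test_frac=0.2, seed=0):
    """Shuffle, split, and score `repeats` times; return (mean, std)."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)
        cut = int(len(idx) * (1 - test_frac))
        train, test = idx[:cut], idx[cut:]
        scores.append(train_and_score(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test]))
    return statistics.mean(scores), statistics.stdev(scores)

def majority_baseline(X_tr, y_tr, X_te, y_te):
    """Stand-in model: always predicts the training split's majority class."""
    pred = max(set(y_tr), key=y_tr.count)
    return sum(label == pred for label in y_te) / len(y_te)

# Toy data with a 70/30 class balance, so the baseline scores near 0.70.
X = [[float(i)] for i in range(100)]
y = [0] * 70 + [1] * 30
mean_acc, std_acc = repeated_holdout(X, y, majority_baseline)
```

Reporting the standard deviation alongside the mean, as in the tables above, makes the stability of each technique across splits visible rather than relying on a single lucky partition.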
Specifically, the weighted KNORAU (KNORAU-W) and DESP achieve the highest scores, at 90.20% and 91.16%, respectively. Given the substantial dataset size, the results are well balanced across the various metrics.

Table 5: Performance of the ML approaches on the PPMI dataset.

Model Type                Model       Accuracy      Precision     Recall        F1
Single Models             RF          92.40±1.00    93.40±0.90    92.10±1.00    92.10±1.00
                          XGB         93.40±1.50    93.60±1.80    93.10±1.40    93.10±1.40
                          LGBM        93.90±1.60    93.90±2.00    93.70±1.40    93.70±1.40
[Early] Static Ensemble   Voting      94.20±0.70    94.20±0.80    94.00±0.70    94.00±0.70
                          Stacking    94.10±0.90    94.00±1.10    93.90±0.90    93.90±0.90
[Early] Dynamic Ensemble  KNOP        94.20±1.10    94.30±1.20    93.90±1.20    93.90±1.20
                          KNORAU      94.30±0.90    94.40±1.10    94.00±0.90    94.00±0.90
[Late] Static Ensemble    Voting      94.60±0.90    94.60±1.10    94.30±0.90    94.30±0.90
                          Stacking    94.50±0.80    94.70±0.80    94.20±0.80    94.20±0.80
[Late] Dynamic Ensemble   DESP        95.00±0.90    95.20±0.90    94.70±1.00    94.70±1.00
                          KNOP        95.10±0.60    95.40±0.70    94.70±0.60    94.70±0.60

[Figure 6: Contribution of each selected classifier to the final decision, comparing accuracy under early and late fusion. a) ADNI dataset. b) Credit Card Clients dataset.]

Results based on PPMI dataset. Table 5 presents the results obtained from the analysis of the Parkinson's Progression Markers Initiative dataset, following a format similar to the previous datasets. Within this dataset, the techniques DESP and KNOP, utilizing late fusion settings, exhibit the most robust performance among the compared algorithms, achieving accuracies of 95.0% and 95.1%, respectively. Additionally, static ensemble models with late fusion settings demonstrate strong performance, at 94.6% accuracy using a voting technique. These results only marginally exceed those achieved with LGBM alone, which reached 93.9%. The fact that LGBM achieved such high accuracy suggests that the task at hand is not very complex. Improving accuracy beyond this point becomes more challenging when a basic technique like LGBM already performs well; reaching significantly higher accuracies with more advanced methods is difficult because the task is relatively straightforward.

Results based on Samarkand Neonatal Center dataset. Table 6 presents the results obtained from analyzing the Samarkand Neonatal Center ICU dataset, following a similar structure to the previous datasets. Due to the dataset's small size, the results may not be consistent or balanced across different metrics. Nonetheless, our proposed late fusion-based dynamic ensemble models achieve notably higher performance compared to the other techniques, reaching 77.57% accuracy with the KNOP technique.

Across all five datasets analyzed, the importance of late fusion can be seen in the results. In each dataset, the dynamic ensemble models with late fusion settings outperform the other existing models. Combining late fusion with dynamic ensemble learning consistently delivers promising and improved results. This highlights the effectiveness and reliability of employing late fusion techniques within dynamic ensemble models across various datasets.

5. Library extension

The primary focus of our paper is to introduce a novel approach to dynamic ensemble selection (DES) that utilizes a late fusion strategy for effectively fusing multimodal data and offers a high degree of explainability for dynamic selection techniques. Our current library offers implementations of four dynamic classifier selection and seven dynamic ensemble selection techniques that use a late fusion strategy. In addition, the library includes several features and options that enhance its performance and capability. Furthermore, the library provides three different types of explainability to help users gain insights into the decision-making processes of the models. Finally, the library has been designed to be compatible with other important libraries, allowing users to easily integrate it into their existing workflows.
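As a rough, library-agnostic illustration of the decision-level fusion the library builds on (a conceptual sketch, not Infodeslib's actual API): each modality's classifier scores an instance independently, and only those class-probability outputs are combined. The per-modality probabilities below are hypothetical placeholders:

```python
# Sketch of decision-level (late) fusion: per-modality classifiers emit
# class probabilities independently; the fused decision averages them.
# The probability vectors below are hypothetical, for illustration only.

def late_fuse(per_modality_probs):
    """Average class-probability vectors from per-modality classifiers
    and return the index of the winning class."""
    n = len(per_modality_probs)
    n_classes = len(per_modality_probs[0])
    fused = [sum(p[c] for p in per_modality_probs) / n
             for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)

# e.g., three modality-specific classifiers scoring one instance:
probs = [
    [0.30, 0.70],   # demographics model
    [0.55, 0.45],   # clinical-scores model
    [0.20, 0.80],   # imaging model
]
# fused probabilities: [0.35, 0.65] -> class 1
```

Because fusion happens only at this decision level, each modality can be modeled by whichever classifier suits it best, which is the diversity argument made throughout the paper.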
We plan to continue exploring the domain of explainability in ensemble learning by proposing additional techniques for providing comprehensive explanations to domain experts. Our goal is to enhance our library's ability to provide context-based explanations that are tailored to the specific needs of users. Additionally, we aim to incorporate what-if explainability features that enable developers to gain deeper insights into the behavior of their ensemble models. These features will be included in future versions of our library.

Through our experimental evaluations, we have discovered that selecting an appropriate pool of classifiers with matching feature groups is a critical aspect of successful ensemble modeling. However, identifying the ideal combination of classifiers for the pool remains a challenging task. In future versions of our library, we plan to address this issue by developing an automatic optimization process for the selection of the optimal pool of classifiers. We believe this to be a crucial task in the field of ensemble learning, and we are committed to exploring ways to simplify this process and make it more effective.

6. Conclusion

This paper presents a novel approach to dynamic selection using a late fusion setting, which is implemented across four dynamic classifier selection and seven dynamic ensemble selection techniques. This late fusion-based approach is particularly well-suited for complex tasks based on multimodal datasets containing multiple feature groups, which are common in real-world scenarios. As a result, the role of late fusion is crucial in the context of ensemble learning for ensuring diversity in the pool of classifiers. Furthermore, we introduce a novel approach to explainability for dynamic selection techniques. Our proposed approach goes beyond the traditional methods and provides a more in-depth and nuanced understanding of the dynamic selection process. The effectiveness of our proposed techniques is evaluated through a comprehensive comparison with existing baseline approaches. The experimental results demonstrate the superior performance of our proposed techniques over the existing approaches, highlighting the potential of our approach to improve the accuracy and reliability of ensemble learning systems.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C1011198), the Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) under the ICT Creative Consilience Program (IITP-2021-2020-0-01821), and the AI Platform to Fully Adapt and Reflect Privacy-Policy Changes (No. RS-2022-II220688).

References

[1] H. Xiao, Z. Xiao, Y. Wang, Ensemble classification based on supervised clustering for credit scoring, Applied Soft Computing 43 (2016) 73–86.
[2] D. Di Nucci, F. Palomba, R. Oliveto, A. De Lucia, Dynamic selection of classifiers in bug prediction: An adaptive method, IEEE Transactions on Emerging Topics in Computational Intelligence 1 (2017) 202–212.
[3] O. Sagi, L. Rokach, Ensemble learning: A survey, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8 (2018) e1249.
[4] L. I. Kuncheva, A theoretical study on six classifier fusion strategies, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 281–286.
[5] M. Fernández-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classifiers to solve real world classification problems?, The Journal of Machine Learning Research 15 (2014) 3133–3181.
[6] A. S. Britto Jr, R. Sabourin, L. E. Oliveira, Dynamic selection of classifiers—a comprehensive review, Pattern Recognition 47 (2014) 3665–3680.
[7] X. Dong, Z. Yu, W. Cao, Y. Shi, Q. Ma, A survey on ensemble learning, Frontiers of Computer Science 14 (2020) 241–258.
[8] L. Breiman, Bagging predictors, Machine Learning 24 (1996) 123–140.
[9] G. Sakkis, I. Androutsopoulos, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos, P. Stamatopoulos, Stacking classifiers for anti-spam filtering of e-mail, arXiv preprint cs/0106040 (2001).
[10] A. H. Ko, R. Sabourin, A. S. Britto Jr, From dynamic classifier selection to dynamic ensemble selection, Pattern Recognition 41 (2008) 1718–1731.
[11] R. M. Cruz, R. Sabourin, G. D. Cavalcanti, Dynamic classifier selection: Recent advances and perspectives, Information Fusion 41 (2018) 195–216.
[12] F. Juraev, S. El-Sappagh, E. Abdukhamidov, F. Ali, T. Abuhmed, Multilayer dynamic ensemble model for intensive care unit mortality prediction of neonate patients, Journal of Biomedical Informatics 135 (2022) 104216.
[13] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, The Journal of Machine Learning Research 12 (2011) 2825–2830.
[14] R. M. Cruz, L. G. Hafemann, R. Sabourin, G. D. Cavalcanti, DESlib: A dynamic ensemble selection library in Python, The Journal of Machine Learning Research 21 (2020) 283–287.
[15] K. Liu, Y. Li, N. Xu, P. Natarajan, Learn to combine modalities in multimodal deep learning, arXiv preprint arXiv:1805.11730 (2018).
[16] S. Raschka, MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack, The Journal of Open Source Software 3 (2018). URL: https://joss.theoj.org/papers/10.21105/joss.00638. doi:10.21105/joss.00638.
[17] S. El-Sappagh, F. Ali, T. Abuhmed, J. Singh, J. M. Alonso, Automatic detection of Alzheimer's disease progression: An efficient information fusion approach with heterogeneous ensemble classifiers, Neurocomputing 512 (2022) 203–224.
[18] C. G. Snoek, M. Worring, A. W. Smeulders, Early versus late fusion in semantic video analysis, in: Proceedings of the 13th Annual ACM International Conference on Multimedia, 2005, pp. 399–402.
[19] F. Juraev, S. El-Sappagh, T. Abuhmed, Explainable dynamic ensemble framework for classification based on the late fusion of heterogeneous multimodal data, in: Intelligent Systems Conference, Springer, 2023, pp. 555–570.
[20] S. Ali, T. Abuhmed, S. El-Sappagh, K. Muhammad, J. M. Alonso-Moral, R. Confalonieri, R. Guidotti, J. Del Ser, N. Díaz-Rodríguez, F. Herrera, Explainable artificial intelligence (XAI): What we know and what is left to attain trustworthy artificial intelligence, Information Fusion (2023) 101805.
[21] M. Sabourin, A. Mitiche, D. Thomas, G. Nagy, Classifier combination for hand-printed digit recognition, in: Proceedings of the 2nd International Conference on Document Analysis and Recognition (ICDAR'93), IEEE, 1993, pp. 163–166.
[22] K. Woods, W. P. Kegelmeyer, K. Bowyer, Combination of multiple classifiers using local accuracy estimates, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 405–410.
[23] P. C. Smits, Multiple classifier systems for supervised remote sensing image classification based on dynamic classifier selection, IEEE Transactions on Geoscience and Remote Sensing 40 (2002) 801–813.
[24] R. G. Soares, A. Santana, A. M. Canuto, M. C. P. de Souto, Using accuracy and diversity to select classifiers to build ensembles, in: The 2006 IEEE International Joint Conference on Neural Network Proceedings, IEEE, 2006, pp. 1310–1316.
[25] M. C. de Souto, R. G. Soares, A. Santana, A. M. Canuto, Empirical comparison of dynamic classifier selection methods based on diversity and accuracy for building ensembles, in: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), IEEE, 2008, pp. 1480–1487.
[26] T. Woloszynski, M. Kurzynski, P. Podsiadlo, G. W. Stachowiak, A measure of competence based on random classification for dynamic ensemble selection, Information Fusion 13 (2012) 207–213.
[27] P. R. Cavalin, R. Sabourin, C. Y. Suen, Dynamic selection approaches for multiple classifier systems, Neural Computing and Applications 22 (2013) 673–688.
[28] R. S. Olson, J. H. Moore, TPOT: A tree-based pipeline optimization tool for automating machine learning, in: Workshop on Automatic Machine Learning, PMLR, 2016, pp. 66–74.
[29] T. Head, M. Kumar, H. Nahrstaedt, G. Louppe, I. Shcherbatyi, scikit-optimize/scikit-optimize: v0.8.1, Zenodo (2020).
[30] T. Akiba, S. Sano, T. Yanase, T. Ohta, M. Koyama, Optuna: A next-generation hyperparameter optimization framework, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
[31] J. Bergstra, D. Yamins, D. Cox, Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures, in: International Conference on Machine Learning, PMLR, 2013, pp. 115–123.
[32] F. Nogueira, et al., Bayesian optimization: Open source constrained global optimization tool for Python, URL: https://github.com/fmfn/BayesianOptimization (2014).
[33] The GPyOpt authors, GPyOpt: A Bayesian optimization framework in Python, http://github.com/SheffieldML/GPyOpt, 2016.
[34] M. Claesen, J. Simm, D. Popovic, Y. Moreau, B. De Moor, Easy hyperparameter search using Optunity, arXiv preprint arXiv:1412.1114 (2014).
[35] S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga, L. Beckett, The Alzheimer's disease neuroimaging initiative, Neuroimaging Clinics of North America 15 (2005) 869.
[36] N. Rahim, T. Abuhmed, S. Mirjalili, S. El-Sappagh, K. Muhammad, Time-series visual explainability for Alzheimer's disease progression detection for smart healthcare, Alexandria Engineering Journal 82 (2023) 484–502.
[37] A. Asuncion, D. Newman, UCI machine learning repository, 2007.
[38] D. L. Beekly, E. M. Ramos, W. W. Lee, W. D. Deitrich, M. E. Jacka, J. Wu, J. L. Hubbard, T. D. Koepsell, J. C. Morris, W. A. Kukull, et al., The National Alzheimer's Coordinating Center (NACC) database: The uniform data set, Alzheimer Disease & Associated Disorders 21 (2007) 249–258.
[39] N. Rahim, S. El-Sappagh, H. Rizk, O. A. El-serafy, T. Abuhmed, Information fusion-based Bayesian optimized heterogeneous deep ensemble model based on longitudinal neuroimaging data, Applied Soft Computing 162 (2024) 111749.
[40] K. Marek, D. Jennings, S. Lasch, A. Siderowf, C. Tanner, T. Simuni, C. Coffey, K. Kieburtz, E. Flagg, S. Chowdhury, et al., The Parkinson Progression Marker Initiative (PPMI), Progress in Neurobiology 95 (2011) 629–635.
[41] M. Junaid, S. Ali, F. Eid, S. El-Sappagh, T. Abuhmed, Explainable machine learning models based on multimodal time-series data for the early detection of Parkinson's disease, Computer Methods and Programs in Biomedicine 234 (2023) 107495.
[42] C. Sammut, G. I. Webb (Eds.), Holdout Evaluation, Springer US, Boston, MA, 2010, pp. 506–507. URL: https://doi.org/10.1007/978-0-387-30164-8_369. doi:10.1007/978-0-387-30164-8_369.