=Paper=
{{Paper
|id=Vol-3894/paper9
|storemode=property
|title=Infodeslib: Python Library for Dynamic Ensemble Learning using Late Fusion of Multimodal Data
|pdfUrl=https://ceur-ws.org/Vol-3894/paper9.pdf
|volume=Vol-3894
|authors=Firuz Juraev,Shaker El-Sappagh,Tamer Abuhmed
|dblpUrl=https://dblp.org/rec/conf/kil/JuraevEA24
}}
==Infodeslib: Python Library for Dynamic Ensemble Learning using Late Fusion of Multimodal Data==
Firuz Juraev¹, Shaker El-Sappagh¹,² and Tamer Abuhmed¹,*
¹ College of Computing and Informatics, Sungkyunkwan University, South Korea
² Faculty of Computer Science and Engineering, Galala University, Egypt
Abstract
There has been a notable increase in research focusing on dynamic selection (DS) techniques within the field of ensemble learning.
This leads to the development of various techniques for ensembling multiple classifiers for a specific instance or set of instances
during the prediction phase. Despite this progress, the design and development of DS approaches with late fusion settings and their
explainability remain unexplored. This work proposes an open-source Python library, Infodeslib, to address this gap. The library
provides an implementation of several DS techniques, including four dynamic classifier selections and seven dynamic ensemble selection
techniques, all of which are integrated with late data fusion settings and novel explainability features. Infodeslib offers flexibility and
customization options, making it a versatile tool for various complex applications that require the fusion of multimodal data and various
explainability features. Multimodal data, which integrates information from diverse sources or sensor modalities, is a common and
essential setting for real-world problems, enhancing the robustness and depth of data analysis. These data can be fused in two main ways:
early fusion, where different modalities are combined at the feature level before model training, and late fusion, where each modality is
processed separately and the results are combined at the decision level. The library is fully documented following the Read the Docs
standards. The documentation, code, and examples are available anonymously on GitHub at https://github.com/InfoLab-SKKU/infodeslib.
Keywords
Ensemble of classifiers, Dynamic classifier selection, Dynamic ensemble selection, multimodal data fusion, Late fusion, Machine learning,
Explainable AI, Python.
KiL’24: Workshop on Knowledge-infused Learning co-located with the 30th ACM KDD Conference, August 26, 2024, Barcelona, Spain
* Corresponding author.
fjuraev@g.skku.edu (F. Juraev); shaker@skku.edu (S. El-Sappagh); tamer@skku.edu (T. Abuhmed)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Ensemble learning is a thriving domain within the fields of machine learning and pattern recognition [1, 2]. Among the diverse ensemble classifiers available, each classifier approaches the problem from a different perspective. The main idea of ensemble learning is to leverage a group of classifiers to provide comprehensive coverage of the learned task [3]. By utilizing diverse models that exhibit distinct decision boundaries, ensemble learning seeks to maximize the accuracy and effectiveness of the overall classification process. As a result, an ensemble typically performs better than any of its base classifiers [4, 5]: each base classifier concentrates on a specific region of the error space, and combining the decisions of these classifiers improves the overall ensemble’s decisions. Ensemble learning approaches can be broadly classified into two categories: static and dynamic selection [6, 7]. In static selection [8, 9], a predetermined group of classifiers is selected, and this group is utilized to make decisions for every new test instance. In dynamic selection [10, 11, 12], a new group of classifiers is selected for each test instance, and this group is employed to make the decision for that specific instance.

Since real-world datasets are often complex and consist of multiple feature groups, or so-called ‘modalities’, ensemble learning is a popular candidate for combining multiple models to improve the performance and robustness of predictive models. One approach is early fusion, where all modalities are merged into a single pool so that the classifiers can capture the potential interactions and interdependencies among the modalities, using either static [13] or dynamic selection [14]. Another approach is late fusion, or decision fusion, where each classifier in the pool is trained with different feature groups, or combinations of feature groups, to achieve greater diversity in the model pool. This diversity is crucial for constructing a robust ensemble that can effectively generalize to previously unseen data. Moreover, late fusion provides more flexibility, as classifiers are assigned to different modalities on the grounds that certain classifiers are best suited to model certain modalities [15].

In the current literature, late fusion-based ensemble learning is available only with static classifier selection [16], and most of these studies show the superiority of late fusion over early fusion for static ensembles [17, 18, 19]. This motivates us to explore the performance of late fusion in dynamic selection compared to early fusion; however, to the best of our knowledge, no study or implementation has examined the performance of late fusion in dynamic selection settings. This work implements different types of dynamic selection techniques in the late fusion setting. By doing so, we can explore the performance of late fusion-based ensemble learning under dynamic selection modeling, gaining a deeper understanding of its potential advantages and limitations.

The resulting late fusion-based dynamic ensembles are expected to improve the performance of the resulting classifiers. However, these models are black boxes and hard to understand, while trustworthy classifiers that are applicable in the real world need to be interpretable. Explainable AI (XAI) has gained significant attention in recent years [20], as it is crucial to provide insights into the decision-making process of machine learning models. Despite the growing interest in this area, there is a lack of explainability features for ensemble learning techniques, which are increasingly used in complex real-world applications to improve the trustworthiness of the resulting models. To the best of our knowledge, no study in the literature and no Python package implements XAI capabilities for dynamic ensemble classifiers. This study addresses this research gap by developing a Python package that offers novel explainability techniques for ensemble models, making them accessible and informative for both domain experts and developers.
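The contrast between the two fusion styles described above can be made concrete with a short, self-contained sketch on synthetic two-modality data. This is an illustration only, not Infodeslib code; the names `modality_a` and `modality_b` are hypothetical.

```python
# Early fusion: concatenate modalities, train one model.
# Late fusion: one model per modality, combine at the decision level.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
modality_a = rng.normal(size=(n, 5))   # e.g. demographic features
modality_b = rng.normal(size=(n, 8))   # e.g. imaging features
y = (modality_a[:, 0] + modality_b[:, 0] > 0).astype(int)

# Early fusion: a single classifier sees all features at once.
early_X = np.hstack([modality_a, modality_b])
early_model = LogisticRegression().fit(early_X, y)

# Late fusion: each classifier sees only its own modality; their
# probability outputs are averaged at the decision level.
model_a = LogisticRegression().fit(modality_a, y)
model_b = LogisticRegression().fit(modality_b, y)
late_proba = (model_a.predict_proba(modality_a)
              + model_b.predict_proba(modality_b)) / 2
late_pred = late_proba.argmax(axis=1)

print(early_model.score(early_X, y), (late_pred == y).mean())
```

In the late-fusion branch, swapping either per-modality model for one better suited to that modality changes only that branch, which is the flexibility argument made above.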
The contributions of the study are as follows:
• We extend the literature on dynamic ensemble modeling by implementing four dynamic classifier selection (DCS) techniques and seven dynamic ensemble selection (DES) techniques, incorporating late fusion of multiple modalities (see Table 1).
• We propose three types of novel explainability that provide deep and suitable XAI for dynamic selection techniques: case-based reasoning, deep-based classifier contributions, and local feature importance.
• We compare the performance of the proposed techniques with existing approaches on four well-known, real-world multimodal datasets: Alzheimer’s Disease Neuroimaging Initiative (ADNI), Credit Card Clients, National Alzheimer’s Coordinating Center (NACC), and Parkinson’s Progression Markers Initiative (PPMI). We also tested the proposed techniques on the Samarkand Neonatal Center dataset, which was collected by our team with the help of physicians.
• The implemented techniques are released in a standard public library called ‘Infodeslib’, following the industry-standard PEP 8 coding guidelines, and Infodeslib is clearly documented in accordance with the Read the Docs standards: https://infodeslib.readthedocs.io/en/latest/
• We offer a wide range of valuable functions that enable the assessment and evaluation of the quality and efficacy of the selected pool.

Table 1
Infodeslib implemented DS methods.

Technique                               | Selection | Reference
Modified Rank (MR)                      | DCS       | Sabourin et al. [21]
Overall Local Accuracy (OLA)            | DCS       | Woods et al. [22]
Local Class Accuracy (LCA)              | DCS       | Woods et al. [22]
Modified Local Accuracy (MLA)           | DCS       | P.C. Smits [23]
DES-KNN                                 | DES       | Soares et al. [24, 25]
K-Nearest Oracles Eliminate (KNORA-E)   | DES       | Ko et al. [10]
K-Nearest Oracles Union (KNORA-U)       | DES       | Ko et al. [10]
Weighted KNORA-E (KNORA-E-W)            | DES       | Ko et al. [10]
Weighted KNORA-U (KNORA-U-W)            | DES       | Ko et al. [10]
DES Performance (DES-P)                 | DES       | Woloszynski et al. [26]
K-Nearest Output Profiles (KNOP)        | DES       | Cavalin et al. [27]

The study is organized as follows. Section 1 highlights the software framework of the proposed late fusion dynamic ensemble learning. Section 2 presents installation and usage, Section 3 describes the model explainability features, Section 4 discusses the performance analysis, and Section 5 introduces possible package extensions. Section 6 concludes the paper.

1. Late Fusion Dynamic Ensemble Framework

In this section, we provide an overview of the late fusion dynamic ensemble framework in algorithmic and visual formats. This encompasses a thorough dissection of the primary stages involved, along with step-by-step explanations of the framework’s methodology.

The late fusion dynamic ensemble utilizes the decision values obtained from each modality and fuses them using a specific fusion mechanism M (such as averaging, weighted averaging, majority voting, etc.). Assuming that classifier c_i is applied to modality f_i, the final prediction p can be expressed as:

    p = M(c_1(f_1), c_2(f_2), ..., c_m(f_m))    (1)

The proposed concept of dynamic selection with late fusion is illustrated in Figure 1, which outlines a framework consisting of three key stages: training, selection, and prediction. Additionally, the concept is detailed algorithmically in Algorithm 1.

Training Phase. A pool of classifiers is selected and assigned different feature sets. The classifiers within the pool are selected based on their diversity, ensuring a wide range of decision-making capabilities. Each feature set used by the selected classifiers is extracted from the same modality to generate a homogeneous feature set. For example, in the medical domain, demographic and MRI features are different modalities that could be used to train two different classifiers. Each classifier in the pool is then trained and optimized with its designated feature set, resulting in a pool of trained classifiers to be utilized in the next phases (lines 1-4 in Algorithm 1).

Selection Phase. During the selection phase (lines 5-12 of Algorithm 1), a region of competence (RoC) is determined for a given new test instance by selecting the nearest samples from the validation data (DSEL). Subsequently, each classifier in the pool is evaluated on the samples within the RoC, and a measure of competence is calculated for each classifier. The specific method employed to compute the competence varies depending on the chosen DS technique (lines 9-10 in Algorithm 1). Once the competencies of the classifiers in the pool are calculated, the DS techniques use their own selection criteria to identify the most competent classifiers. These criteria are specific to each DS technique. If no classifier satisfies the criteria for a given DS technique, all classifiers in the pool are selected to make the final decision.

Prediction Phase. During the final phase, the selected classifiers are utilized to predict the class of a given test instance, and their individual predictions are combined to generate a final prediction. To provide more accurate decisions, each of the selected classifiers can be weighted based on its level of competence during the aggregation process (line 13 in Algorithm 1).
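The fusion mechanism M of Eq. (1), including the competence-based weighting used in the prediction phase, can be illustrated with a minimal numpy sketch. All probabilities and weights here are made up for illustration; the library’s internal implementation may differ.

```python
# Eq. (1) as code: fuse per-modality classifier outputs c_i(f_i)
# with a (competence-)weighted average of probability vectors.
import numpy as np

def late_fuse(probas, weights=None):
    """probas: (m, n_classes) array, one probability vector per classifier.
    weights: optional per-classifier competence weights."""
    probas = np.asarray(probas, dtype=float)
    w = np.ones(len(probas)) if weights is None else np.asarray(weights, float)
    fused = (w[:, None] * probas).sum(axis=0) / w.sum()
    return fused, int(fused.argmax())

# Three per-modality classifiers vote on one binary-problem sample.
probas = [[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]]
fused_eq, label_eq = late_fuse(probas)                    # plain averaging
fused_w, label_w = late_fuse(probas, weights=[3, 3, 1])   # trust c_1, c_2 more
print(label_eq, label_w)  # -> 1 0
```

With equal weights the third classifier tips the decision to class 1; weighting the first two classifiers more strongly flips the fused decision to class 0, which is exactly why competence-based weighting matters.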
(Figure 1 graphic: Training Phase, Selection Phase, and Prediction panels over the training, DSEL, and testing data, the pool of trained classifiers, the region of competence Φ, the selected classifiers EoC, and the Explanation Interface and Model Evaluation components for classes 0: AD, 1: sMCI, 2: CN, 3: pMCI.)
Figure 1: The architecture of the proposed late fusion dynamic ensemble learning framework implemented by Infodeslib.
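The training phase of Figure 1, with one classifier fitted per modality, can be sketched as follows. The column indices used as feature sets and the chosen base models are illustrative assumptions, not Infodeslib internals.

```python
# Training phase sketch: fit each base classifier only on the columns
# of its own modality, yielding a pool of per-modality models.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
X_train = rng.normal(size=(150, 10))
y_train = (X_train[:, 0] + X_train[:, 5] > 0).astype(int)

# Hypothetical feature sets: column indices of each modality.
feature_sets = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
base_models = [LogisticRegression(),
               DecisionTreeClassifier(max_depth=3),
               LogisticRegression()]

# Each classifier sees only its designated feature set.
pool = [model.fit(X_train[:, cols], y_train)
        for model, cols in zip(base_models, feature_sets)]
print(len(pool))  # -> 3
```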
Algorithm 1: Late fusion DES method
Input: pool of classifiers C, training dataset D_tr, validation dataset D_va, testing dataset D_te, feature set F, and neighborhood size K
Output: EoC*_t, an ensemble of classifiers for each testing sample t in D_te
 1: for each classifier c_i in C do
 2:     Optimize f_i in F for c_i;
 3:     Optimize and train c_i on D_tr with feature set f_i;
 4: end for
 5: for each testing sample t in D_te do
 6:     Find Ψ as the K nearest neighbors of t in D_va;
 7:     for each sample ψ_i in Ψ do
 8:         for each classifier c_i in C do
 9:             Calculate the competence of c_i on Ψ;
10:             Select the ensemble of competent classifiers EoC*_t;
11:         end for
12:     end for
13:     Use the ensemble EoC*_t to classify t;
14: end for

2. Installation and Usage

Users can conveniently install the most recent version of Infodeslib via pip, the Python package manager, by executing the command pip install infodeslib. Alternatively, the library can be installed from GitHub, using the command pip install git+https://github.com/InfoLab-SKKU/infodeslib.

To use the implemented methods in Infodeslib, a list of classifiers and feature sets must be provided as input. The classifiers in the list can be of any type from the scikit-learn library and should be trained on the corresponding feature set before being used as input.

Once the pool of classifiers and feature sets has been initialized, the method fit(X_dsel, y_dsel) is applied to fit the dynamic selection method, where (X_dsel, y_dsel) is the validation dataset (DSEL) with true labels. Predictions for each test instance x can be obtained using either the predict(x) or predict_proba(x) methods. In the example provided below, we demonstrate the steps involved in implementing the KNORA-U technique.

    from infodeslib.des.knorau import KNORAU

    # The pool classifiers are trained beforehand, one per feature set.
    pool_classifiers = [classifier1, ..., classifierN]
    # Each feature set is a list of columns (one modality).
    feature_sets = [feature_set1, ..., feature_setN]

    # Initialize the DS model.
    knorau = KNORAU(pool_classifiers, feature_sets)

    # Fit the dynamic selection model on the DSEL data.
    knorau.fit(X_dsel, y_dsel)

    # Predict new examples.
    knorau.predict(X_test, plot=True)

    # Check performance (based on accuracy).
    knorau.score(X_test, y_test)

When utilizing the predict(X) method, an additional parameter plot can be included to obtain explainability for each test instance. By setting plot=True, the explainability for the given test instance can be visualized through a variety of methods (see more details in Section 3).

Infodeslib Methods. Figure 2 provides an overview of the key methods of our library, while other supporting methods are available in the library’s documentation. Some of these methods, such as fit(), predict(), predict_proba(), and score(), are well known and require no detailed explanation; several other methods are particularly useful for pool generation and for obtaining information about new test samples. To facilitate pool generation, we have implemented three additional methods: get_average_accuracy(), get_pool_diversity(), and get_coverage_score(). The get_average_accuracy() method computes the average performance of the classifiers in the pool on the validation data. The get_pool_diversity() method calculates the diversity between classifiers in the pool and requires the diversity measure type as a parameter; it supports several diversity functions such as the Q-statistic, Correlation Coefficient, Disagreement Measure, Double Fault, Negative Double Fault, and Ratio of Errors. The get_coverage_score() method determines the number of samples in the DSEL data that can be accurately predicted by any model in the given pool. This information is particularly useful for evaluating the coverage of the pool and ensuring that all samples are accurately classified by at least one model.

Three further methods play a crucial role in the prediction process: get_region_of_competence(x), estimate_competence(roc), and select(competences). The get_region_of_competence(x) method identifies the region of competence for a given test sample by returning its k nearest neighbors from the validation dataset, computed with the k-nearest neighbors algorithm. The estimate_competence(roc) method calculates the competence of each classifier in the ensemble on the region of competence. The competence calculation differs depending on the technique being used: for example, the K-Nearest Oracles Union (KNORA-U) technique calculates the accuracy of each classifier on the region of competence, whereas the Dynamic Ensemble Selection KNN (DESKNN) technique computes each classifier’s accuracy and diversity on the RoC and uses both metrics to assess its competence. The select(competences) method selects the most competent classifiers from the ensemble to make a prediction. Different techniques may use different criteria for determining the competence of a classifier, such as the number of samples classified correctly within the region of competence. For instance, the KNORA-U technique selects a classifier if it has correctly classified at least one sample within the region of competence. Once the competent classifiers have been identified, their competence values are used as weights when aggregating their predictions. To evaluate a single test instance, our library includes the get_rareness_score(x) method, which provides a detailed description of the instance. The method evaluates whether there are many samples similar to the given instance in the training and validation datasets, allowing users to determine the rarity of the instance. If the instance is an outlier, the method provides information about how far it is from the other classes. Furthermore, get_rareness_score(x) uses K-means clustering to propose a potential class for the instance and generates tables indicating which features of the instance make it similar to this class. This approach provides valuable insights into the characteristics of the instance and its potential classification, aiding the development of more accurate models.
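The KNORA-U selection rule described above (keep every classifier that correctly labels at least one RoC sample, and weight it by the number of RoC samples it gets right) can be written as a simplified standalone function. This is a sketch of the rule, not the library’s internal code; the fallback to the whole pool mirrors the behavior described in Section 1.

```python
# Simplified KNORA-U selection over a region of competence (RoC).
import numpy as np

def knora_u_select(roc_labels, roc_predictions):
    """roc_labels: (k,) true labels of the RoC samples.
    roc_predictions: (n_classifiers, k) predictions of each classifier
    on the RoC. Returns (selected indices, competence weights)."""
    roc_labels = np.asarray(roc_labels)
    roc_predictions = np.asarray(roc_predictions)
    # Competence = number of RoC samples each classifier classifies correctly.
    competence = (roc_predictions == roc_labels).sum(axis=1)
    selected = np.flatnonzero(competence > 0)   # at least one correct hit
    if selected.size == 0:                      # fall back to the whole pool
        selected = np.arange(len(roc_predictions))
    return selected, competence[selected]

roc_y = [2, 2, 2, 1, 2]
preds = [[2, 2, 2, 1, 2],    # perfect on the RoC -> weight 5
         [2, 2, 1, 1, 2],    # four correct        -> weight 4
         [0, 0, 0, 0, 0]]    # never correct       -> excluded
idx, w = knora_u_select(roc_y, preds)
print(idx, w)  # -> [0 1] [5 4]
```

The returned weights can then be used directly in the weighted aggregation step of the prediction phase.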
(Figure 2 graphic, summarized. Core methods: fit(X, y) prepares the DS model by pre-processing the information required to apply the DS methods; predict(X, plot=False) returns the class label for each sample in X, with plot=True adding the explainability; predict_proba(X) returns the probabilities for each sample in X; score(X, y) returns the mean accuracy on the given data and labels. Pool methods: get_average_accuracy() returns the mean accuracy of the classifiers in the pool; get_pool_diversity() returns the mean and the list of diversity scores between classifiers in the pool; get_coverage_score() returns how well the given pool of classifiers covers the task on the validation data. Single-instance methods: get_region_of_competence(x) returns the k nearest samples to the given test sample from the validation dataset; estimate_competence(roc) returns the competence of each base classifier on the k nearest samples from the RoC; select(competences) returns all base classifiers that are competent enough; get_rareness_score(x) reports how rare the given test sample is with respect to the training and validation data. Hyperparameters: k (int), the number of neighbors used to estimate the competence of the base classifiers; DFP (boolean), whether dynamic frienemy pruning is applied; knn_metric (str or callable), the distance metric utilized by the k-NN classifier; dimensionality_reduction (boolean), whether dimension reduction is applied; reduction_technique (str or callable), the technique utilized for dimension reduction; n_components (int), the number of components to keep; cbr_features (list), the features to show in case-based reasoning XAI.)
Figure 2: The overall schema of the software architecture.

(Figure 3 graphic: SHAP-value bar charts of the top features for each selected classifier.)
Figure 3: Local feature importance of each selected classifier.

Hyperparameters. Optimizing hyperparameters is a critical step for improving the performance of ensemble learning models. This can be achieved through various techniques, including basic approaches such as grid search and random search in scikit-learn, as well as more advanced techniques like genetic algorithms, Bayesian optimization, and others. Our library is designed to work seamlessly with other Python packages such as TPOT [28], Scikit-Optimize [29], Optuna [30], Hyperopt [31], BayesianOptimization [32], GPyOpt [33], Optunity [34], and similar packages that implement these advanced optimization techniques. This allows users to leverage a variety of optimization methods to obtain the best possible hyperparameters for their ensemble models.

In our library, there are several key hyperparameters that users can adjust to optimize the performance of ensemble learning models. We present these key hyperparameters along with their default values, which have been shown to produce satisfactory results in the majority of cases. One of the main hyperparameters is k (default: 7), which represents the number of neighbors to be considered when determining the region of competence. Another important hyperparameter is DFP (default: False), which stands for the dynamic frienemy pruning technique and is particularly useful for imbalanced datasets. In addition, users can specify the knn_metric (default: 'minkowski'), which determines the distance metric used when computing distances between the test sample and other samples in the validation dataset. Our library provides several common metrics such as Minkowski, cosine, Manhattan, and Euclidean, as well as the option for users to define their own custom metric function. To handle high-dimensional datasets, we also offer a dimensionality_reduction (default: False) hyperparameter, which allows users to reduce the number of dimensions used in calculating distances between samples. This can be achieved using either Principal Component Analysis (PCA) or Kernel PCA, or by specifying a custom dimensionality reduction technique via reduction_technique (default: 'pca'). The n_components (default: 20) hyperparameter determines the number of components to be retained if a reduction technique is selected. Lastly, for those interested in explainability, our library provides the cbr_features (default: None) hyperparameter, which allows users to specify a list of important features to be included in the similar-cases data for case-based reasoning.
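To illustrate why the dimensionality-reduction options matter when computing RoC distances, here is a standalone scikit-learn sketch of finding k = 7 neighbors in the raw space versus a 20-component PCA space. This mirrors the role of the dimensionality_reduction / n_components settings but is not Infodeslib code.

```python
# Region-of-competence retrieval in raw vs. PCA-reduced feature space.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
X_val = rng.normal(size=(100, 50))   # high-dimensional validation (DSEL) data
x = rng.normal(size=(1, 50))         # a new test sample

# Without reduction: neighbors found in the raw 50-dimensional space.
nn_raw = NearestNeighbors(n_neighbors=7, metric="minkowski").fit(X_val)
raw_idx = nn_raw.kneighbors(x, return_distance=False)[0]

# With reduction: project to 20 components first (cf. n_components=20),
# then find the region of competence in the reduced space.
pca = PCA(n_components=20, random_state=0).fit(X_val)
nn_pca = NearestNeighbors(n_neighbors=7).fit(pca.transform(X_val))
pca_idx = nn_pca.kneighbors(pca.transform(x), return_distance=False)[0]

print(sorted(raw_idx), sorted(pca_idx))  # the neighbor sets may differ
```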
3. Model Explainability

In the current version of our library, we offer three main XAI techniques: case-based reasoning, deep-based classifier contribution, and local feature importance [20]. The case-based reasoning technique aims to offer domain experts an explanation of the model’s prediction process for a given test sample by presenting them with similar samples, and their corresponding labels, found within the region of competence. This approach closely resembles how domain experts make decisions in real-world situations, as they frequently compare current cases with historical ones from their experience. The deep-based classifier contribution technique enables users to comprehend the contribution of each selected classifier to the decision-making process for a given test sample. Finally, the local feature importance technique is a prevalent explainability method that identifies the most crucial features and their corresponding SHAP values for each selected classifier.

Case-based reasoning. For example, in the case of the KNORA-U technique, in the selection phase the nearest neighbors of each test instance are found in the validation dataset based on their close similarity to the test sample. The selected samples are used to generate the region of competence for evaluating and selecting classifiers in the pool. Figure 4 a) illustrates an example in which the given test sample (light blue x) falls within the area of class 2, and seven nearest samples are selected, six of which belong to class 2 (blue dots), while one belongs to class 1 (green dot). This finding suggests that, for the given test sample, the chance of it being classified as class 2 is high. Moreover, these samples can also be leveraged for conducting case-based reasoning, which may be particularly valuable for physicians, given that our dataset is in the medical domain. Figure 4 b) provides comprehensive information about all nearest samples within the region of competence, enabling physicians to compare and contrast similar samples and their corresponding labels or diagnoses.

(Figure 4 graphic: a) a scatter plot of the RoC around the test sample in the validation data for classes 0: AD, 1: sMCI, 2: CN, 3: pMCI; b) a table of the RoC samples with selected feature values and their target labels.)
Figure 4: Estimating a region of competence (RoC) and providing details about the samples selected for the RoC.
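The neighborhood inspection behind Figure 4 can be sketched in a few lines: retrieve the k nearest validation samples for a test point and tally their labels. The data below is synthetic and chosen to reproduce the 6-versus-1 neighborhood from the example; it is an illustration, not library code.

```python
# Case-based reasoning sketch: inspect the labels of the k nearest
# validation samples around a test point.
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

# Seven 2-D validation samples; six of class 2, one of class 1.
X_val = np.array([[0.1, 0.1], [0.2, 0.1], [0.2, 0.3], [0.3, 0.2],
                  [0.1, 0.3], [0.3, 0.3], [0.8, 0.8]])
y_val = np.array([2, 2, 2, 2, 2, 2, 1])
x_test = np.array([[0.2, 0.2]])

nn = NearestNeighbors(n_neighbors=7).fit(X_val)
roc_idx = nn.kneighbors(x_test, return_distance=False)[0]
counts = Counter(y_val[roc_idx].tolist())
print(counts.most_common(1))  # class 2 dominates the neighborhood
```

Presenting the retrieved rows themselves (rather than just the tally) is what turns this into the case-based explanation shown in Figure 4 b).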
Deep-based classifier contributions. After selecting the group of classifiers for making the final decision, it may be unclear how each classifier in the pool contributed to the decision or what their individual predictions were for the new test sample. To provide a more comprehensive understanding of the decision-making process, an additional level of explainability can be utilized. This is illustrated in Figure 5, which provides detailed information about each classifier in the pool, including its competence level, individual prediction on the new test sample, and confidence level. This explanation provides valuable insight for the development of an ensemble model, as it allows developers to identify classifiers that may have a negative impact on decision-making. For instance, as shown in Figure 5, most selected classifiers predict the label of the given test sample as 2 with high confidence, while the SVC classifier predicts it as 3. The SVC classifier demonstrates a relatively high level of competence in the region of competence, indicating that it has a significant influence on the decision. If this classifier consistently has a negative impact on many test samples, it may be advisable to remove it from the pool of classifiers.

(Figure 5 graphic: the competence, prediction, and confidence of each selected classifier, e.g. [Classifier 1] XGB: 1.00, 2, 0.99; [Classifier 2] XGB: 1.00, 2, 0.99; [Classifier 3] MLP: 0.42, 1, 0.39; [Classifier 4] SVC: 0.71, 3, 0.22; [Classifier 5] XGB: 1.00, 2, 0.99; [Classifier 6] KNN: 1.00, 2, 1.00.)
Figure 5: The contribution of each selected classifier to the final decision.

Local feature importance. In addition to understanding how the classifiers contributed to the decision-making process, it is also important to identify which features were particularly influential in making those decisions. For the example mentioned earlier, we provide the local feature importance of each selected classifier, which can be visualized as in Figure 3.

Furthermore, our proposed ensemble models can provide interpretable explanations using two approaches: surrogate model explainability and post-hoc explainability methods. The surrogate model approach involves creating a simplified model that roughly represents the behavior of the original ensemble model and using this model to explain the ensemble’s decisions. Post-hoc explainability techniques, on the other hand, analyze the ensemble model’s decisions after they have been made and provide explanations based on the input features that contributed the most to the decision. Both methods treat our ensemble model as a black box.
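The surrogate-model approach described above can be sketched with scikit-learn: a shallow, interpretable decision tree is fitted to mimic the predictions of an opaque ensemble. The random forest here is a generic stand-in for the black-box model, not an Infodeslib ensemble.

```python
# Surrogate-model explainability sketch: train an interpretable tree
# on the black-box ensemble's predictions (not on the true labels).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
ensemble_pred = black_box.predict(X)   # treat the model as opaque from here on

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, ensemble_pred)
fidelity = (surrogate.predict(X) == ensemble_pred).mean()
print(round(fidelity, 2))  # fraction of ensemble decisions the tree reproduces
```

The fidelity score indicates how faithfully the simple tree mirrors the ensemble; the tree’s split rules then serve as the explanation.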
4. Performance Analysis

In this section, we compare the performance of the proposed architecture with existing approaches. We provide an overview of the datasets that have been utilized, along with a detailed analysis of our proposed techniques.

4.1. Evaluation Datasets

In this section, we outline the five datasets utilized to compare Infodeslib with existing models.

Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset [35]. The study includes a total of 1,371 subjects, with a male representation of 54.5%. Participants have been classified into four distinct categories based on their clinical diagnosis: Cognitive Normal (CN), Stable Mild Cognitive Impairment (sMCI), Progressive Mild Cognitive Impairment (pMCI), and Alzheimer’s Disease (AD) [36]. The distribution of these classes is as follows: 419 CN, 473 sMCI, 140 pMCI, and 339 AD individuals. The dataset has four distinct modalities or feature groups, which contain demographics, cognitive scores, assessment tests, and MRI features.

Table 2
Performance of the different ML approaches using the ADNI dataset.

Model Type               | Model    | Accuracy   | Precision  | Recall     | F1
Single Models            | XGB      | 87.11±2.32 | 87.50±2.63 | 87.11±2.32 | 87.03±2.49
                         | LGBM     | 86.74±1.58 | 87.34±1.96 | 86.74±1.58 | 86.70±1.72
                         | RF       | 87.11±1.96 | 87.51±2.22 | 87.11±1.96 | 87.08±2.08
[Early] Static Ensemble  | Voting   | 88.08±1.94 | 88.31±2.07 | 88.08±1.94 | 88.05±2.02
                         | Stacking | 86.87±2.02 | 87.67±2.11 | 86.87±2.02 | 86.80±2.15
[Early] Dynamic Ensemble | DESP     | 88.61±1.96 | 88.72±2.15 | 88.61±1.96 | 88.55±2.08
                         | KNOP     | 88.71±1.91 | 88.80±2.09 | 88.71±1.91 | 88.66±2.03
[Late] Static Ensemble   | Voting   | 89.29±1.67 | 89.39±1.81 | 89.29±1.67 | 89.24±1.74
                         | Stacking | 87.65±1.59 | 88.13±1.64 | 87.65±1.59 | 87.60±1.74
[Late] Dynamic Ensemble  | KNORAU   | 89.52±2.01 | 89.77±2.01 | 89.52±2.01 | 89.46±2.10
                         | KNORAU-W | 89.84±1.83 | 90.29±2.03 | 89.81±1.83 | 89.80±1.91

Credit Card Clients dataset. The study includes a vast participant cohort of 30,000 individuals, with the dataset sourced from the UC Irvine Machine Learning Repository [37]. This is a classification problem that involves determining whether or not a client will make their next payment. The two distinct classes are labeled ‘no’ and ‘yes’, with 23,364 and 6,636 instances, respectively. The dataset has four distinct modalities of features, including demographics, financial, and payment history features.

Table 3
Performance of the different ML approaches on the Credit Card Clients dataset.

Model Type               | Model    | Accuracy   | Precision  | Recall     | F1
Single Models            | XGB      | 81.96±0.84 | 89.05±0.72 | 72.87±1.28 | 80.15±1.02
                         | LGBM     | 80.43±0.64 | 86.88±0.66 | 71.69±0.92 | 78.56±0.76
                         | RF       | 79.90±0.78 | 87.49±0.70 | 69.76±1.13 | 77.63±0.95
[Early] Static Ensemble  | Voting   | 83.94±0.74 | 88.52±0.72 | 78.01±1.11 | 82.93±0.84
                         | Stacking | 82.68±0.69 | 86.72±0.77 | 77.17±0.95 | 81.67±0.77
[Early] Dynamic Ensemble | DESKNN   | 83.99±0.38 | 89.34±0.53 | 77.18±0.60 | 82.82±0.42
                         | KNORAE   | 84.16±0.66 | 88.81±0.64 | 78.17±0.97 | 83.15±0.75
[Late] Static Ensemble   | Voting   | 85.72±0.44 | 89.83±0.57 | 80.56±0.73 | 84.94±0.49
                         | Stacking | 85.08±0.47 | 89.30±0.36 | 79.71±0.94 | 84.23±0.57
[Late] Dynamic Ensemble  | KNOP     | 86.65±0.23 | 91.64±0.28 | 80.66±0.52 | 85.80±0.28
                         | KNORAU-W | 86.73±0.29 | 91.76±0.33 | 80.70±0.64 | 85.87±0.36

National Alzheimer’s Coordinating Center (NACC) dataset [38]. In this study, we examined a total of 37,547 patients, focusing on the Global Clinical Dementia Rating (CDRGLOB) as the primary task. CDRGLOB categorizes patients into five classes based on dementia severity: no impairment (8,253 patients), mild impairment (15,097 patients), moderate impairment (8,346 patients), and severe impairment (5,851 patients). Our analysis included six specific modalities: demographics, physical health, medications, health history, the neuropsychiatric inventory questionnaire, and the geriatric depression scale. These modalities were chosen to comprehensively assess various aspects related to dementia and overall patient health [39].

Parkinson’s Progression Markers Initiative (PPMI)

… heterogeneous baseline algorithms: XGBoost (XGB), LightGBM (LGBM), Random Forest (RF), Support Vector Classifier (SVC), Multi-Layer Perceptron (MLP), Decision Tree (DT), and k-Nearest Neighbors (KNN).

Results based on ADNI dataset. Table 2 and Figure 6 a) show the top-performing results achieved by the in-
dataset [40]. Our study involves 952 patients and fo- dividual models, as well as the static ensemble with early
cuses on a binary classification task to differentiate between fusion, the dynamic ensemble with early fusion, the static
healthy individuals and those diagnosed with Parkinson’s ensemble with late fusion, and our proposed technique -
disease (PD). Among these patients, 389 are categorized the dynamic ensemble with late fusion setting. From each
as healthy, while 563 have been diagnosed with PD. The group, we selected the best-performed techniques and the
dataset encompasses various information modalities, in- results show that our dynamic ensemble techniques, KNO-
cluding subject characteristics, biospecimen data, medical RAU and KNORAU-W outperform all existing approaches
history records, motor function assessments, and non-motor with 89.52% and 89.84% accuracy. In comparison, a static
features. This comprehensive dataset enables a thorough ensemble with late fusion, voting classifier, achieves an accu-
analysis to identify potential diagnostic markers and factors racy of 89.29%. This performance is close to the performance
associated with PD, facilitating improved understanding of our model and surpasses that of early fusion techniques.
and diagnosis of the disease [41]. This result supports our claim for the significance of late
Samarkand Neonatal Center dataset. Our study fusion in producing accurate ensemble models.
involved 347 neonates from the intensive care unit at Results based on Credit Card Clients dataset. Table 3
Samarkand Neonatal Center. The dataset was collected by and Figure 6 b) present the results obtained from the analysis
our team by collaborating physicians in the hospital for a of the Credit Card Clients dataset, following a similar format
binary classification task to predict whether a neonate sur- to the previous dataset. Our proposed techniques have once
vives or passes away. Among these neonates, 303 survived again outperformed the existing approaches in this instance.
and 44 died during the study period. The dataset comprises Specifically, KNOP and KNORAU-W, utilizing the late fusion
a comprehensive set of features categorized into multiple setting, have achieved the highest accuracy scores of 86.65%
modalities: demographic information, the mother’s medical and 86.73%, respectively. In comparison, the static ensemble
history and information, general notes on the neonate’s methods that apply late fusion, specifically the voting and
condition, results from blood tests, and APGAR scores (a stacking classifiers, demonstrate accuracies of 85.72% and
standardized assessment of a neonate’s health at birth). 85.08%, respectively. In contrast, the ensemble methods
that employ early fusion achieve the highest accuracy of
4.2. Results 84.16%, with the dynamic selection technique known as
KNORA-E. These results support our argument regarding
This section contains a comprehensive analysis and compar-
the importance of utilizing late fusion for the purpose of
ison of various machine-learning approaches against our
producing highly accurate ensemble models.
proposed late-fusion dynamic ensemble selection model.
Results based on NACC dataset. Table 4 highlights
We collect and present the testing results for each of the
the results from the analysis of the National Alzheimer’s
considered models. To ensure greater consistency in the
Coordinating Center dataset, structured similarly to the pre-
results, we have applied the 10-holdout testing method [42].
vious dataset. Among all existing techniques, the dynamic
The results are presented in the form of (mean ± standard
ensemble models with late fusion demonstrate notably su-
deviation). As a pool of classifiers, we utilized the following
Table 4 Table 6
Performance of the ML approaches on the NACC dataset. Performance of the different ML approaches on Samarkand
Model Type Model Accuracy Precision Recall F1 Neonatal Center dataset.
GB 85.70+1.16 85.71+1.16 85.77+1.07 85.56+1.19 Model Type Model Accuracy Precision Recall F1
Single
XGB 86.30+1.47 86.30+1.47 86.32+1.48 86.18+1.51 RF 69.34+4.66 69.34+4.66 73.70+4.20 67.71+5.44
Models Single
RF 86.79+0.74 86.79+0.74 86.76+0.82 86.69+0.76 XGB 69.74+7.64 69.74+7.64 74.29+6.31 67.77+9.16
[Early] Static Voting 87.52+0.90 87.53+0.90 87.42+0.93 87.40+0.90 Models
LGBM 70.07+8.58 70.07+8.58 72.65+7.52 68.74+9.62
Ensemble Stacking 87.17+1.29 87.17+1.29 87.20+1.19 87.11+1.26 [Early] Static Voting 73.03+8.03 73.03+8.03 75.50+6.57 72.00+8.95
[Early] Dynamic DESKNN 87.30+1.16 87.61+1.16 87.37+1.06 87.39+1.14 Ensemble Stacking 71.45+5.11 71.45+5.11 75.70+4.39 70.06+5.87
Ensemble KNORAU 88.34+1.44 88.34+1.44 88.34+1.41 88.27+1.45 [Early] Dynamic KNORAU 71.64+6.84 71.64+6.84 75.30+5.37 70.28+7.74
[Late] Static Voting 89.39+1.34 89.39+1.34 89.43+1.30 89.36+1.33 Ensemble DESP 72.45+6.03 75.95+6.03 72.48+5.24 71.48+7.05
Ensemble Stacking 89.11+0.89 89.11+0.89 89.15+0.95 89.07+0.91 [Late] Static Voting 75.07+6.94 75.07+6.94 77.96+5.19 74.12+7.90
[Late] Dynamic KNORAU-W 90.20+1.10 90.20+1.10 90.30+1.09 90.20+1.11 Ensemble Stacking 74.21+7.72 74.21+7.72 77.95+5.38 72.84+9.14
Ensemble DESP 91.16+0.93 91.21+0.93 91.14+0.89 91.17+0.92 [Late] Dynamic KNORAU-W 75.66+7.44 75.66+7.44 78.04+6.09 74.90+8.05
Ensemble KNOP 77.57+5.81 77.57+5.81 80.58+4.21 76.84+6.35
Table 5 Early fusion Late fusion
Performance of the ML approaches on the PPMI dataset.
Model Type Model Accuracy Precision Recall F1
RF 92.40±1.00 93.40±0.90 92.10±1.00 92.10±1.00
Single
Accuracy
XGB 93.40±1.50 93.60±1.80 93.10±1.40 93.10±1.40
Models
LGBM 93.90±1.60 93.90±2.00 93.70±1.40 93.70±1.40
[Early] Static Voting 94.20±0.70 94.20±0.80 94.00±0.70 94.00±0.70
Ensemble Stacking 94.10±0.90 94.00±1.10 93.90±0.90 93.90±0.90
[Early] Dynamic KNOP 94.20±1.10 94.30±1.20 93.90±1.20 93.90±1.20
Ensemble KNORAU 94.30±0.90 94.40±1.10 94.00±0.90 94.00±0.90
[Late] Static Voting 94.60±0.90 94.60±1.10 94.30±0.90 94.30±0.90
Ensemble Stacking 94.50±0.80 94.70±0.80 94.20±0.80 94.20±0.80
[Late] Dynamic DESP 95.00±0.90 95.20±0.90 94.70±1.00 94.70±1.00
Ensemble KNOP 95.10±0.60 95.40±0.70 94.70±0.60 94.70±0.60
a) ADNI dataset.
perior performance. Specifically, the weighted KNORAU
(KNORAU-W) and DESP achieve the highest scores at 90.20% Accuracy
and 91.16%, respectively. Given the substantial dataset size,
the results are well-balanced across various metrics.
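The evaluation protocol used throughout Section 4.2, ten independent holdout splits over a heterogeneous classifier pool, reported as mean ± standard deviation, can be sketched with scikit-learn alone. This is an illustrative reconstruction, not Infodeslib's API; XGB and LGBM are omitted to keep the sketch dependency-free, and synthetic data stands in for the real cohorts.

```python
# Sketch of the 10-holdout protocol from Section 4.2 using scikit-learn only.
# NOT the Infodeslib API; the pool below is a subset of the paper's pool.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)

def make_pool():
    # Fresh, unfitted heterogeneous pool for each holdout repetition.
    return {
        "RF": RandomForestClassifier(n_estimators=100),
        "SVC": SVC(),
        "MLP": MLPClassifier(max_iter=500),
        "DT": DecisionTreeClassifier(),
        "KNN": KNeighborsClassifier(),
    }

scores = {name: [] for name in make_pool()}
for seed in range(10):  # 10 independent stratified holdout splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    for name, clf in make_pool().items():
        clf.fit(X_tr, y_tr)
        scores[name].append(clf.score(X_te, y_te))

for name, vals in scores.items():  # report mean ± std, as in Tables 2-6
    print(f"{name}: {100 * np.mean(vals):.2f} ± {100 * np.std(vals):.2f}")
```

Swapping the synthetic matrix for a real modality matrix and adding the boosted models would reproduce the setup behind the single-model rows of Tables 2-6.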
Figure 6: Contribution of each selected classifier to the final decision. (Panels: a) ADNI dataset; b) Credit Card Clients dataset. Y-axis: Accuracy. Legend: Early fusion, Late fusion.)

Results based on PPMI dataset. Table 5 presents the results obtained from the analysis of the Parkinson's Progression Markers Initiative dataset, following a format similar to the previous datasets. Within this dataset, the techniques DESP and KNOP, utilizing the late fusion setting, exhibit the most robust performance, achieving accuracies of 95.0% and 95.1%, respectively. Additionally, the static ensemble models with late fusion settings demonstrate strong performance at 94.6% accuracy using the voting technique. These results only marginally exceed those achieved with LGBM alone, which reaches 93.9%. The fact that LGBM achieves such high accuracy suggests that the task at hand is not very complex; improving accuracy beyond this point becomes more challenging when a basic technique like LGBM already performs well. Essentially, reaching significantly higher accuracies with more advanced methods is difficult because the task is relatively straightforward.

Results based on Samarkand Neonatal Center dataset. Table 6 presents the results obtained from analyzing the Samarkand Neonatal Center ICU dataset, following a similar structure to the previous datasets. Due to the dataset's small size, the results may not be consistent or balanced across the different metrics. Nonetheless, our proposed late fusion-based dynamic ensemble models achieve notably higher performance than the other techniques, reaching 77.57% accuracy with the KNOP technique.

Across all five datasets analyzed, the importance of late fusion can be seen in the results. In each dataset, the dynamic ensemble models with late fusion settings outperform the other existing models. Combining late fusion with dynamic ensemble learning consistently delivers promising and improved results. This highlights the effectiveness and reliability of employing late fusion techniques within dynamic ensemble models across various datasets.

5. Library extension

The primary focus of our paper is to introduce a novel approach to dynamic ensemble selection (DES) that utilizes a late fusion strategy for effectively fusing multimodal data and offers a high degree of explainability for dynamic selection techniques. Our current library offers implementations of four dynamic classifier selection and seven dynamic ensemble selection techniques that use a late fusion strategy. In addition, the library includes several features and options that enhance its performance and capability. Furthermore, the library provides three different types of explainability to help users gain insights into the decision-making processes of the models. Finally, the library has been designed to be compatible with other important libraries, allowing users to easily integrate it into their existing workflows.

We plan to continue exploring the domain of explainability in ensemble learning by proposing additional techniques for providing comprehensive explanations to domain experts. Our goal is to enhance our library's ability to provide context-based explanations that are tailored to the specific needs of users. Additionally, we aim to incorporate what-if explainability features that enable developers to gain deeper insights into the behavior of their ensemble models. These features will be included in future versions of our library.
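To make the late-fusion dynamic selection setting concrete, the sketch below trains a separate classifier pool per modality, judges each classifier's competence in a held-out dynamic selection (DSEL) region in the KNORA-U style, and fuses the per-modality votes only at prediction time. This is an illustrative reconstruction under simplifying assumptions (two synthetic feature groups, a small scikit-learn pool), not the Infodeslib implementation; production-grade selectors are provided by Infodeslib and DESlib [14].

```python
# Late fusion + KNORA-U-style dynamic ensemble selection, minimal sketch.
# Each modality keeps its own pool and DSEL neighborhood; class votes are
# fused across modalities at the very end (late fusion). Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=12, random_state=1)
modalities = [slice(0, 6), slice(6, 12)]  # two synthetic feature groups

X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=1)
X_dsel, X_te, y_dsel, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

def fit_modality(cols):
    """Train a small heterogeneous pool on one modality's features."""
    pool = [DecisionTreeClassifier(max_depth=5, random_state=0),
            KNeighborsClassifier(),
            LogisticRegression(max_iter=1000)]
    for clf in pool:
        clf.fit(X_tr[:, cols], y_tr)
    nn = NearestNeighbors(n_neighbors=7).fit(X_dsel[:, cols])
    return pool, nn

fitted = [fit_modality(cols) for cols in modalities]
n_classes = len(np.unique(y))

def predict(x):
    """KNORA-U per modality, then late fusion by summing class votes."""
    votes = np.zeros(n_classes)
    for (pool, nn), cols in zip(fitted, modalities):
        region = nn.kneighbors(x[cols].reshape(1, -1), return_distance=False)[0]
        for clf in pool:
            # Competence = number of DSEL neighbors this classifier gets right;
            # the classifier casts that many votes for its predicted class.
            competence = np.sum(clf.predict(X_dsel[region][:, cols]) == y_dsel[region])
            votes[clf.predict(x[cols].reshape(1, -1))[0]] += competence
    return int(np.argmax(votes))

y_pred = np.array([predict(x) for x in X_te])
print(f"late-fusion DES accuracy: {np.mean(y_pred == y_te):.3f}")
```

Because each modality keeps its own pool and competence region, a classifier that is locally competent on one modality still contributes votes when another modality is uninformative for that neighborhood, which is the diversity argument made above for late fusion.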
Through our experimental evaluations, we have discovered that selecting an appropriate pool of classifiers with matching feature groups is a critical aspect of successful ensemble modeling. However, identifying the ideal combination of classifiers for the pool remains a challenging task. In future versions of our library, we plan to address this issue by developing an automatic optimization process for selecting the optimal pool of classifiers. We believe this to be a crucial task in the field of ensemble learning, and we are committed to exploring ways to simplify this process and make it more effective.

6. Conclusion

This paper presents a novel approach to dynamic selection using a late fusion setting, which is implemented across four dynamic classifier selection and seven dynamic ensemble selection techniques. This late fusion-based approach is particularly well-suited for complex tasks based on multimodal datasets containing multiple feature groups, which are common in real-world scenarios. As a result, the role of late fusion is crucial in the context of ensemble learning for ensuring diversity in the pool of classifiers. Furthermore, we introduce a novel approach to explainability for dynamic selection techniques. Our proposed approach goes beyond the traditional methods and provides a more in-depth and nuanced understanding of the dynamic selection process. The effectiveness of our proposed techniques is evaluated through a comprehensive comparison with existing baseline approaches. The experimental results demonstrate the superior performance of our proposed techniques over the existing approaches, highlighting the potential of our approach to improve the accuracy and reliability of ensemble learning systems.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C1011198), by the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) under the ICT Creative Consilience Program (IITP-2021-2020-0-01821), and by the AI Platform to Fully Adapt and Reflect Privacy-Policy Changes grant (No. RS-2022-II220688).

References

[1] H. Xiao, Z. Xiao, Y. Wang, Ensemble classification based on supervised clustering for credit scoring, Applied Soft Computing 43 (2016) 73–86.
[2] D. Di Nucci, F. Palomba, R. Oliveto, A. De Lucia, Dynamic selection of classifiers in bug prediction: An adaptive method, IEEE Transactions on Emerging Topics in Computational Intelligence 1 (2017) 202–212.
[3] O. Sagi, L. Rokach, Ensemble learning: A survey, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8 (2018) e1249.
[4] L. I. Kuncheva, A theoretical study on six classifier fusion strategies, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 281–286.
[5] M. Fernández-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classifiers to solve real world classification problems?, The Journal of Machine Learning Research 15 (2014) 3133–3181.
[6] A. S. Britto Jr, R. Sabourin, L. E. Oliveira, Dynamic selection of classifiers—a comprehensive review, Pattern Recognition 47 (2014) 3665–3680.
[7] X. Dong, Z. Yu, W. Cao, Y. Shi, Q. Ma, A survey on ensemble learning, Frontiers of Computer Science 14 (2020) 241–258.
[8] L. Breiman, Bagging predictors, Machine Learning 24 (1996) 123–140.
[9] G. Sakkis, I. Androutsopoulos, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos, P. Stamatopoulos, Stacking classifiers for anti-spam filtering of e-mail, arXiv preprint cs/0106040 (2001).
[10] A. H. Ko, R. Sabourin, A. S. Britto Jr, From dynamic classifier selection to dynamic ensemble selection, Pattern Recognition 41 (2008) 1718–1731.
[11] R. M. Cruz, R. Sabourin, G. D. Cavalcanti, Dynamic classifier selection: Recent advances and perspectives, Information Fusion 41 (2018) 195–216.
[12] F. Juraev, S. El-Sappagh, E. Abdukhamidov, F. Ali, T. Abuhmed, Multilayer dynamic ensemble model for intensive care unit mortality prediction of neonate patients, Journal of Biomedical Informatics 135 (2022) 104216.
[13] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, The Journal of Machine Learning Research 12 (2011) 2825–2830.
[14] R. M. Cruz, L. G. Hafemann, R. Sabourin, G. D. Cavalcanti, DESlib: A dynamic ensemble selection library in Python, The Journal of Machine Learning Research 21 (2020) 283–287.
[15] K. Liu, Y. Li, N. Xu, P. Natarajan, Learn to combine modalities in multimodal deep learning, arXiv preprint arXiv:1805.11730 (2018).
[16] S. Raschka, Mlxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack, The Journal of Open Source Software 3 (2018). URL: https://joss.theoj.org/papers/10.21105/joss.00638. doi:10.21105/joss.00638.
[17] S. El-Sappagh, F. Ali, T. Abuhmed, J. Singh, J. M. Alonso, Automatic detection of Alzheimer's disease progression: An efficient information fusion approach with heterogeneous ensemble classifiers, Neurocomputing 512 (2022) 203–224.
[18] C. G. Snoek, M. Worring, A. W. Smeulders, Early versus late fusion in semantic video analysis, in: Proceedings of the 13th Annual ACM International Conference on Multimedia, 2005, pp. 399–402.
[19] F. Juraev, S. El-Sappagh, T. Abuhmed, Explainable dynamic ensemble framework for classification based on the late fusion of heterogeneous multimodal data, in: Intelligent Systems Conference, Springer, 2023, pp. 555–570.
[20] S. Ali, T. Abuhmed, S. El-Sappagh, K. Muhammad, J. M. Alonso-Moral, R. Confalonieri, R. Guidotti, J. Del Ser, N. Díaz-Rodríguez, F. Herrera, Explainable artificial intelligence (XAI): What we know and what is left to attain trustworthy artificial intelligence, Information Fusion (2023) 101805.
[21] M. Sabourin, A. Mitiche, D. Thomas, G. Nagy, Classifier combination for hand-printed digit recognition, in: Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR'93), IEEE, 1993, pp. 163–166.
[22] K. Woods, W. P. Kegelmeyer, K. Bowyer, Combination of multiple classifiers using local accuracy estimates, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 405–410.
[23] P. C. Smits, Multiple classifier systems for supervised remote sensing image classification based on dynamic classifier selection, IEEE Transactions on Geoscience and Remote Sensing 40 (2002) 801–813.
[24] R. G. Soares, A. Santana, A. M. Canuto, M. C. P. de Souto, Using accuracy and diversity to select classifiers to build ensembles, in: The 2006 IEEE International Joint Conference on Neural Network Proceedings, IEEE, 2006, pp. 1310–1316.
[25] M. C. de Souto, R. G. Soares, A. Santana, A. M. Canuto, Empirical comparison of dynamic classifier selection methods based on diversity and accuracy for building ensembles, in: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), IEEE, 2008, pp. 1480–1487.
[26] T. Woloszynski, M. Kurzynski, P. Podsiadlo, G. W. Stachowiak, A measure of competence based on random classification for dynamic ensemble selection, Information Fusion 13 (2012) 207–213.
[27] P. R. Cavalin, R. Sabourin, C. Y. Suen, Dynamic selection approaches for multiple classifier systems, Neural Computing and Applications 22 (2013) 673–688.
[28] R. S. Olson, J. H. Moore, TPOT: A tree-based pipeline optimization tool for automating machine learning, in: Workshop on Automatic Machine Learning, PMLR, 2016, pp. 66–74.
[29] T. Head, M. Kumar, H. Nahrstaedt, G. Louppe, I. Shcherbatyi, scikit-optimize/scikit-optimize: v0.8.1, Zenodo (2020).
[30] T. Akiba, S. Sano, T. Yanase, T. Ohta, M. Koyama, Optuna: A next-generation hyperparameter optimization framework, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
[31] J. Bergstra, D. Yamins, D. Cox, Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures, in: International Conference on Machine Learning, PMLR, 2013, pp. 115–123.
[32] F. Nogueira, et al., Bayesian optimization: Open source constrained global optimization tool for Python, https://github.com/fmfn/BayesianOptimization (2014).
[33] The GPyOpt authors, GPyOpt: A Bayesian optimization framework in Python, http://github.com/SheffieldML/GPyOpt, 2016.
[34] M. Claesen, J. Simm, D. Popovic, Y. Moreau, B. De Moor, Easy hyperparameter search using Optunity, arXiv preprint arXiv:1412.1114 (2014).
[35] S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga, L. Beckett, The Alzheimer's Disease Neuroimaging Initiative, Neuroimaging Clinics of North America 15 (2005) 869.
[36] N. Rahim, T. Abuhmed, S. Mirjalili, S. El-Sappagh, K. Muhammad, Time-series visual explainability for Alzheimer's disease progression detection for smart healthcare, Alexandria Engineering Journal 82 (2023) 484–502.
[37] A. Asuncion, D. Newman, UCI machine learning repository, 2007.
[38] D. L. Beekly, E. M. Ramos, W. W. Lee, W. D. Deitrich, M. E. Jacka, J. Wu, J. L. Hubbard, T. D. Koepsell, J. C. Morris, W. A. Kukull, et al., The National Alzheimer's Coordinating Center (NACC) database: The uniform data set, Alzheimer Disease & Associated Disorders 21 (2007) 249–258.
[39] N. Rahim, S. El-Sappagh, H. Rizk, O. A. El-serafy, T. Abuhmed, Information fusion-based Bayesian optimized heterogeneous deep ensemble model based on longitudinal neuroimaging data, Applied Soft Computing 162 (2024) 111749.
[40] K. Marek, D. Jennings, S. Lasch, A. Siderowf, C. Tanner, T. Simuni, C. Coffey, K. Kieburtz, E. Flagg, S. Chowdhury, et al., The Parkinson Progression Marker Initiative (PPMI), Progress in Neurobiology 95 (2011) 629–635.
[41] M. Junaid, S. Ali, F. Eid, S. El-Sappagh, T. Abuhmed, Explainable machine learning models based on multimodal time-series data for the early detection of Parkinson's disease, Computer Methods and Programs in Biomedicine 234 (2023) 107495.
[42] C. Sammut, G. I. Webb (Eds.), Holdout Evaluation, Springer US, Boston, MA, 2010, pp. 506–507. URL: https://doi.org/10.1007/978-0-387-30164-8_369. doi:10.1007/978-0-387-30164-8_369.