Generalizing Machine Learning Evaluation through the
                         Integration of Shannon Entropy and Rough Set Theory
                         Olga Cherednichenko¹, Dmytro Chernyshov², Dmytro Sytnikov² and Polina Sytnikova²
                         ¹ University of Lyon 2, 5 avenue Mendès, Lyon, 69676, France
                         ² National University of RadioElectronics, Nauky ave. 14, Kharkiv, 61166, Ukraine

                                             Abstract
                                             This research paper explores the integration of Shannon entropy and rough set theory,
                                             presenting a novel way to generalize evaluation in machine learning. The
                                             conventional application of entropy, primarily focused on information uncertainty, is extended through
                                             its combination with rough set theory to offer a deeper insight into data's intrinsic structure and the
                                             interpretability of machine learning models. We introduce a comprehensive framework that synergizes
                                             the granularity of rough set theory with the uncertainty quantification of Shannon entropy, applied
                                             across a spectrum of machine learning algorithms. Our methodology is rigorously tested on various
                                             datasets, showcasing its capability to not only assess predictive performance but also to illuminate the
                                             underlying data complexity and model robustness. The results underscore the utility of this integrated
                                             approach in enhancing the evaluation landscape of machine learning, offering a multi-faceted
                                             perspective that balances accuracy with a profound understanding of data attributes and model
                                             dynamics. This paper contributes a groundbreaking perspective to machine learning evaluation,
                                             proposing a method that encapsulates a holistic view of model performance, thereby facilitating more
                                             informed decision-making in model selection and application.

                                             Keywords
                                             Machine learning, entropy, information theory, rough set theory, model evaluation


                         1. Introduction
                         In the evolving landscape of machine learning, the quest for robust evaluation metrics that
                         transcend mere predictive accuracy is paramount. This research delves into an innovative
                         integration of two mathematical concepts: Shannon entropy[1] and rough set theory[2], to forge
                         a novel pathway in machine learning evaluation. Shannon entropy, a cornerstone in information
                         theory, quantifies the uncertainty or the informational content within a system. It has been
                         extensively applied across various domains, offering insights into the unpredictability or the
                         inherent informational richness of datasets. Rough set theory, on the other hand, provides a
                         framework to deal with vagueness and indiscernibility in data, enabling the analysis of data's
                         granularity and the discernment of patterns within an ambiguous informational landscape[3,4].
                            The intersection of these two theories presents a fertile ground for advancing machine
                         learning evaluation. Traditional metrics, while effective in gauging model performance, often
                         overlook the nuanced interplay of data features and their collective impact on the learning
                         process. The integration of entropy and rough set theory proposes a more holistic approach,
                         considering not just the outcome but the informational dynamics and structural intricacies of the
                         data being processed.
                            The primary objective of this research is to establish a methodological framework that
                         employs this integration to offer a more nuanced and comprehensive evaluation of machine
                         learning models. By embedding Shannon entropy's measure of uncertainty within the granular
                         perspective of rough set theory, this framework aims to illuminate aspects of model behavior and
                         data structure that are typically obscured in conventional evaluations.
   Through a comprehensive analysis, we aim to demonstrate the efficacy and applicability of our
approach, culminating in a discussion of potential applications, challenges, and future directions
for this line of research. In doing so, this paper aspires to contribute a new lens through which
machine learning models can be evaluated, enriching the toolkit available to researchers and
practitioners in the field.

                         COLINS-2024: 8th International Conference on Computational Linguistics and Intelligent Systems, April 12–13, 2024,
                         Lviv, Ukraine
                            olga.cherednichenko@univ-lyon2.fr (O. Cherednichenko); dmytro.chernyshov@nure.ua (D. Chernyshov);
                         dmytro.sytnikov@nure.ua (D. Sytnikov); polina.sytnikova@nure.ua (P. Sytnikova)
                             0000-0002-9391-5220 (O. Cherednichenko); 0009-0003-2773-7467 (D. Chernyshov); 0000-0003-1240-7900 (D.
                         Sytnikov); 0000-0002-6688-4641 (P. Sytnikova)
                                        © 2024 Copyright for this paper by its authors.
                                        Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                         CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

2. Related works
   2.1. Entropy in machine learning

Entropy, a concept from thermodynamics and information theory, plays a pivotal role in
understanding the uncertainty and informational content within a dataset in machine learning.
Originating from Claude Shannon's seminal work in 1948[1], entropy quantifies the
unpredictability or randomness of a system. In the context of machine learning, it provides a
measure of the impurity or diversity of the attributes or classes within a dataset.
   Shannon's entropy, defined as
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \qquad (1)$$
where $p(x_i)$ is the probability of occurrence of the i-th element in the dataset, serves as a
foundational metric in various machine learning algorithms, particularly in decision tree
classifiers (Figure 1). In these algorithms, entropy helps in determining the optimal points for
splitting the data, thereby enhancing the model's ability to classify or predict outcomes
accurately.
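As an illustration of how the entropy in (1) drives a tree split, the following minimal Python sketch computes the entropy of a label list and the information gain of one candidate split. The function names and the toy labels are illustrative assumptions and do not come from the experiments in this paper.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    # p * log2(1 / p) summed over the observed classes
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(labels, left, right):
    """Entropy reduction obtained by splitting `labels` into `left` and `right`."""
    n = len(labels)
    weighted = (len(left) / n) * shannon_entropy(left) \
        + (len(right) / n) * shannon_entropy(right)
    return shannon_entropy(labels) - weighted


# Toy labels (hypothetical): a perfectly balanced parent node
labels = ["yes", "yes", "no", "no"]
print(shannon_entropy(labels))                                  # 1.0
print(information_gain(labels, ["yes", "yes"], ["no", "no"]))   # 1.0
```

A split that separates the classes perfectly removes all of the parent's uncertainty, so its gain equals the parent entropy; a split that leaves both children as mixed as the parent has a gain of zero.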
   The application of entropy extends beyond tree-based models. It is instrumental in feature
selection, where the goal is to identify the most informative features that contribute to the
predictive power of a model. By evaluating the entropy of different feature subsets, machine
learning practitioners can eliminate redundant or irrelevant features, simplifying the model
without sacrificing performance.


Figure 1: A visual representation of entropy in a decision tree.

   Furthermore, entropy is employed in clustering algorithms to assess the homogeneity of
clusters. A lower entropy value indicates that the cluster contains predominantly similar
instances, while a higher value suggests a mixture of different instances, signaling the need for
further refinement in the clustering process.
   In the realm of information theory, the concept of joint entropy and conditional entropy also
provides insights into the relationships between variables. Joint entropy, $H(X, Y)$, quantifies the
uncertainty of a pair of random variables, while conditional entropy, $H(Y \mid X)$, measures the
uncertainty of a variable given the knowledge of another. These metrics are crucial in
understanding the dependencies and interactions among features in a dataset.
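A short sketch can make the two quantities tangible. It estimates joint entropy from paired observations and derives conditional entropy through the chain rule, H(Y | X) = H(X, Y) - H(X). The helper names and the toy samples are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy in bits."""
    n = len(values)
    return sum((c / n) * log2(n / c) for c in Counter(values).values())


def joint_entropy(xs, ys):
    """H(X, Y): entropy of the paired observations."""
    return entropy(list(zip(xs, ys)))


def conditional_entropy(ys, xs):
    """H(Y | X) computed with the chain rule H(X, Y) - H(X)."""
    return joint_entropy(xs, ys) - entropy(xs)


xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]                      # carries no information about xs
print(joint_entropy(xs, ys))           # 2.0
print(conditional_entropy(ys, xs))     # 1.0 (knowing X does not help)
print(conditional_entropy(xs, xs))     # 0.0 (X determines itself)
```

A feature whose conditional entropy with respect to another feature is near zero is largely redundant given that feature, which is the dependency signal described in the paragraph above.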
   The significance of entropy in machine learning is not just theoretical; it has practical
implications in model evaluation and comparison[5]. By analyzing the entropy of model
predictions, researchers can gain insights into the model's confidence and its ability to generalize
from training to unseen data. This is particularly relevant in the evaluation of probabilistic
models, where entropy can indicate the model's certainty in its predictions.
   Expanding further on the role of entropy in machine learning, it's crucial to understand its
application in the context of uncertainty quantification[6] and how it guides the learning process
in algorithms beyond the decision trees and clustering mentioned previously[7].
   Entropy's role in machine learning extends into the realms of unsupervised learning,
particularly in the optimization of models such as autoencoders[8] and in the evaluation of neural
network architectures. In autoencoders, for instance, entropy can be used to measure the
effectiveness of the data compression and reconstruction process, indicating how well the
network has captured the essential information of the input data[9].
   In neural networks, entropy is a key factor in understanding and optimizing the information
flow[10]. It can be used to analyze the layers of a network, providing insights into which layers
are contributing most to the reduction in uncertainty about the output[11]. This can guide the
design of more efficient and effective network architectures, optimizing the depth and width of
the network to balance complexity and performance.
   Additionally, entropy plays a pivotal role in reinforcement learning. In environments where
agents must make decisions under uncertainty, entropy can serve as a measure of the
randomness in the agent's policy, providing a balance between exploration (trying new things)
and exploitation (leveraging known strategies). High entropy in the policy indicates more
explorative behavior, which is particularly beneficial in the early stages of learning or in highly
dynamic environments[12].
   The concept of cross-entropy is also fundamental in machine learning, especially in
classification tasks[13]. It measures the difference between two probability distributions - the
true distribution and the predicted distribution, serving as a loss function in classification
problems, particularly in training deep learning models. By minimizing cross-entropy, models are
trained to improve their predictions, aligning them more closely with the true data distribution.
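For a single example, the cross-entropy loss can be sketched in a few lines. The function below uses the natural logarithm and clips predicted probabilities with a small `eps` to avoid `log(0)`; both choices, like the example distributions, are assumptions for illustration.

```python
from math import log


def cross_entropy(true_dist, pred_dist, eps=1e-12):
    """Cross-entropy (in nats) between a true and a predicted distribution."""
    return -sum(p * log(max(q, eps)) for p, q in zip(true_dist, pred_dist))


one_hot = [1.0, 0.0, 0.0]                         # true class is the first one
good = cross_entropy(one_hot, [0.7, 0.2, 0.1])    # about 0.357
poor = cross_entropy(one_hot, [0.1, 0.2, 0.7])    # about 2.303
print(round(good, 3), round(poor, 3))
```

The loss shrinks as the predicted probability of the true class approaches one, which is why minimizing it pulls predictions toward the true distribution.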
   Moreover, in the evaluation of generative models, such as Generative Adversarial Networks
(GANs), entropy helps in assessing the diversity of the generated samples[14]. It ensures that the
model generates a variety of outputs, not just replicating a subset of the training data, which is
crucial for the effectiveness and realism of the generated samples.
   In summary, entropy serves as a versatile tool in machine learning, aiding in decision-making,
feature selection, model evaluation, and providing a deeper understanding of the data's inherent
structure. Its ability to quantify uncertainty and diversity is invaluable in the quest to develop
robust and interpretable machine learning models.

    2.2. Rough set theory in data analysis

    Rough set theory, introduced by Zdzisław Pawlak in the early 1980s[2], provides a
mathematical framework to deal with vagueness and indiscernibility in information systems. It
is particularly useful in the realm of data analysis for handling imprecise or incomplete
information, offering a robust alternative to traditional statistical methods.
    The fundamental concept in rough set theory is that of approximation sets. Given an information
system with universe $U$ and an equivalence relation $R$, any subset $X \subseteq U$ can be
approximated using two sets, the lower and upper approximations. The lower approximation of $X$,
denoted as $\underline{R}(X)$, is the set of all elements that are certainly in $X$ based on the
available information. Conversely, the upper approximation of $X$, denoted as $\overline{R}(X)$,
comprises elements that possibly belong to $X$.
    Formally, these approximations are defined as follows:
$$\underline{R}(X) = \{x \in U : [x]_R \subseteq X\} \qquad (2)$$
$$\overline{R}(X) = \{x \in U : [x]_R \cap X \neq \emptyset\} \qquad (3)$$
    Here, $[x]_R$ represents the equivalence class of $x$ under the equivalence relation $R$, which groups
together indiscernible elements (elements that cannot be distinguished using the available
attributes).
    Rough set theory also introduces the concept of the boundary region, which is the set
difference between the upper and lower approximations. The boundary region, denoted as
$BN_R(X)$, represents the set of elements for which we cannot decisively determine whether they
belong to $X$ or not:
$$BN_R(X) = \overline{R}(X) \setminus \underline{R}(X) \qquad (4)$$
    In data analysis, these concepts allow for the classification of data into three regions: the
positive, negative, and boundary regions, corresponding to the lower approximation, the
complement of the upper approximation, and the boundary region, respectively.
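The three regions can be computed directly from the equivalence classes. The sketch below is a minimal illustration on an invented four-object information system; the attribute names, the target set, and the function names are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group object ids that are indiscernible on the chosen attributes."""
    classes = defaultdict(set)
    for obj_id, row in objects.items():
        classes[tuple(row[a] for a in attributes)].add(obj_id)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Return (lower, upper, boundary) approximations of the set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:      # block lies entirely inside the target
            lower |= block
        if block & target:       # block overlaps the target
            upper |= block
    return lower, upper, upper - lower


# Invented toy information system: objects 1 and 2 are indiscernible
objects = {
    1: {"fever": "yes", "cough": "yes"},
    2: {"fever": "yes", "cough": "yes"},
    3: {"fever": "no", "cough": "yes"},
    4: {"fever": "no", "cough": "no"},
}
flu = {1, 3}
lower, upper, boundary = approximations(objects, ["fever", "cough"], flu)
print(sorted(lower), sorted(upper), sorted(boundary))   # [3] [1, 2, 3] [1, 2]
negative = set(objects) - upper
print(sorted(negative))                                 # [4]
```

Objects 1 and 2 share the same attribute values but only one of them is in the target set, so both fall into the boundary region.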
    One of the key strengths of rough set theory is its ability to reduce data complexity without
significant loss of information[15]. Through attribute reduction, it identifies the essential features
necessary for data classification, eliminating redundant or irrelevant attributes. This process not
only simplifies the data but also enhances the interpretability of the resulting models, making it
a valuable tool in exploratory data analysis and decision-making.
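Attribute reduction can be sketched with the standard rough-set dependency degree, the fraction of objects whose equivalence class is consistent on the decision attribute. The greedy backward elimination below is one simple heuristic chosen for illustration; it is not necessarily the procedure used in [15], and the toy table is invented.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Indices of rows grouped by their values on `attrs`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """Fraction of rows lying in blocks that are pure in the decision attribute."""
    consistent = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)


def greedy_reduct(rows, attrs, decision):
    """Drop attributes one by one while the dependency degree is preserved."""
    full = dependency(rows, attrs, decision)
    kept = list(attrs)
    for a in attrs:
        trial = [b for b in kept if b != a]
        if trial and dependency(rows, trial, decision) == full:
            kept = trial
    return kept


# Invented decision table: attribute "c" alone determines the decision "d"
rows = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "no"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
print(greedy_reduct(rows, ["a", "b", "c"], "d"))   # ['c']
```

The result depends on the order in which attributes are tried, so a greedy pass returns one attribute subset that preserves the dependency degree, not necessarily the smallest one.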
    In machine learning, rough set theory has been applied to various tasks, including feature
selection, rule generation, and pattern recognition. By providing a mechanism to deal with
uncertainty and partial knowledge, it complements probabilistic and fuzzy approaches, offering
a different perspective on data analysis[5].
    In conclusion, rough set theory offers a unique lens through which to view data analysis,
emphasizing granularity, discernibility, and interpretability. Its integration into machine learning
paves the way for more nuanced and informed approaches to model evaluation and decision-
making, reinforcing the importance of understanding the intricacies of data in the era of big data
and complex models.


Figure 2: A conceptual diagram illustrating basic concepts in rough set theory.

    Taken together, the integration of Shannon entropy and rough set theory presents a significant
advancement in the methodology of machine learning evaluation. By merging these two concepts,
researchers and practitioners can gain deeper insights into the informational dynamics of data and the
performance of machine learning models.

3. Method
This section delineates the methods and techniques employed to integrate Shannon entropy and
rough set theory for enhancing machine learning model evaluation. The methodology is
structured to systematically address the research problem, providing a clear path from
theoretical underpinnings to practical application.
   The integration of Shannon entropy and rough set theory represents a pioneering approach in
the realm of machine learning, aiming to enhance the interpretability and efficacy of model
evaluation[16]. While entropy measures the uncertainty or randomness in information, rough set
theory provides a framework for dealing with ambiguity and granularity in data sets. The
convergence of these two theories offers a multifaceted lens through which the complexity and
structure of data can be analyzed more profoundly.
   Shannon entropy, traditionally used to quantify the amount of information in a system, can be
applied to the subsets of data delineated by rough set theory. In this context, entropy can measure
the information content within the boundaries of rough sets, offering insights into the
distribution and significance of data attributes. This application allows for a nuanced assessment
of data, highlighting the interplay between various features and their impact on the information
structure.
   It has been demonstrated that applying entropy within the framework of rough set theory
yields results that resonate closely with the characteristics of the boundary region, as depicted in
Figure 3. This alignment underscores a pivotal aspect of our integrated approach: as the
granularity decreases, the outcomes derived from the entropy calculations for granular data
begin to mirror those obtained from the analysis of the boundary region in rough set theory.
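One way to see why the two quantities move together is the following short derivation. It rests on an assumption of ours, since the text does not state the formula: the granular entropy is taken to be the size-weighted entropy of membership in a target set $X$ over the equivalence classes $G$ of a relation $R$ on a universe $U$, with the convention $0 \log 0 = 0$. Every granule that lies entirely inside or entirely outside $X$ is pure and contributes zero entropy, so only the granules that make up the boundary region remain in the sum.

```latex
\begin{aligned}
H(G) &= -p_G \log_2 p_G - (1 - p_G)\log_2 (1 - p_G),
  \qquad p_G = \frac{|G \cap X|}{|G|}, \\
H_R(X) &= \sum_{G \in U/R} \frac{|G|}{|U|}\, H(G)
        = \sum_{G \subseteq BN_R(X)} \frac{|G|}{|U|}\, H(G)
        \;\le\; \frac{|BN_R(X)|}{|U|}.
\end{aligned}
```

The inequality uses $H(G) \le 1$ bit for a binary membership variable. Under this reading, the granular entropy is bounded by the relative size of the boundary region and vanishes exactly when the boundary region is empty.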
   Upon establishing the theoretical basis for integrating Shannon entropy with rough set theory,
the next phase involves the practical application of these concepts to machine learning datasets.
The data is first subjected to a granulation process, where it is divided into subsets based on the
equivalence relations dictated by rough set theory. This granulation is crucial as it forms the basis
for subsequent entropy calculations, allowing us to examine the data at varying levels of
granularity.


Figure 3: Boundary region and entropy change over decreasing granularity.
    Once the data is granulated, Shannon entropy is computed for each subset to quantify the
informational content present within these granules. This step is pivotal as it provides a measure
of the uncertainty or randomness associated with each granule, offering valuable insights into the
underlying structure of the data. The entropy values obtained from this process are then analyzed
in conjunction with the boundary regions defined by rough set theory, enabling a comprehensive
evaluation of the data's complexity and informational content.
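The granulate-then-measure procedure can be sketched as follows. The sketch assumes equal-width binning of one numeric attribute into 2**bits granules and a size-weighted entropy over granules; neither choice is specified in the text, and the eight data points are invented.

```python
from collections import Counter, defaultdict
from math import log2


def granulate(values, bits):
    """Equal-width binning into 2**bits granules; returns one granule id per value."""
    lo, hi = min(values), max(values)
    n_bins = 2 ** bits
    width = (hi - lo) / n_bins or 1.0          # guard against constant attributes
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def group_labels(granule_ids, labels):
    groups = defaultdict(list)
    for g, y in zip(granule_ids, labels):
        groups[g].append(y)
    return groups


def granular_entropy(granule_ids, labels):
    """Size-weighted Shannon entropy (bits) of the labels inside each granule."""
    n = len(labels)
    total = 0.0
    for ys in group_labels(granule_ids, labels).values():
        h = sum((c / len(ys)) * log2(len(ys) / c) for c in Counter(ys).values())
        total += (len(ys) / n) * h
    return total


def boundary_fraction(granule_ids, labels, target):
    """Share of objects in granules that contain the target class only partly."""
    mixed = 0
    for ys in group_labels(granule_ids, labels).values():
        hits = sum(y == target for y in ys)
        if 0 < hits < len(ys):
            mixed += len(ys)
    return mixed / len(labels)


# Invented toy data: one numeric attribute and a binary label
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
for bits in range(4):
    ids = granulate(values, bits)
    print(bits, round(granular_entropy(ids, labels), 3),
          boundary_fraction(ids, labels, target=1))
# 0 1.0 1.0
# 1 0.811 1.0
# 2 0.5 0.5
# 3 0.0 0.0
```

On this toy example both the granular entropy and the relative size of the boundary region shrink as the granules become finer, which is the qualitative behavior shown in Figure 3.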
    Through this detailed methodological approach, the research aims to demonstrate the value
of combining Shannon entropy and rough set theory in enhancing the evaluation of machine
learning models, offering a new perspective that considers the intricate interplay between data
complexity and model performance.

4. Experiments
This section outlines the experimental framework designed to demonstrate the applicability of
the proposed technique, integrating Shannon entropy and rough set theory for the evaluation of
machine learning models. The experiments are structured to validate the methodology's
effectiveness and to illustrate its potential in offering deeper insights into model performance
and data complexity.
   The experiments are conducted across a variety of datasets, selected to cover a broad
spectrum of domains and complexities:
   1. “Titanic”[17] is a classic dataset in machine learning that includes passenger
   information from the ill-fated Titanic voyage. The objective is to predict survival
   outcomes based on features such as age, sex, class, and fare. This dataset allows us
   to explore how the proposed method handles binary classification tasks with relatively
   straightforward, structured data.
   2. “Microsoft Malware Detection”[18] is a malware detection dataset that presents a
   more complex challenge with a higher-dimensional feature space. It consists of
   characteristics of software used to determine whether it is malicious or benign. The
   complexity and feature richness of this dataset provide an opportunity to evaluate the
   proposed method's effectiveness in handling intricate, high-dimensional data.


Figure 4: Entropy over granularity curves for “Titanic” dataset

   Each dataset undergoes a standard preprocessing pipeline, including data cleaning,
granulation, feature scaling and the computation of entropy within the rough set framework.
Machine learning models are then trained on these datasets to ensure a comprehensive
evaluation across different types of algorithms.
   After preprocessing each dataset, we apply the proposed entropy-rough set framework to
create granulated views of the data, which then inform the training and evaluation of various
machine learning models. For each task—classification, regression, clustering, and
dimensionality reduction—the models are evaluated using both traditional metrics and our novel
entropy-rough set-based metric.
   For granularity, we will utilize the machine learning models themselves, analyzing how they
segment the dataset based on their intrinsic mechanisms of creating granular subsets. This
evaluation will focus on understanding the models' inherent data partitioning behavior and its
impact on the overall model performance.
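The idea of letting a model define the granules can be sketched with any function that maps a row to a region id. Below, two hand-written rule sets stand in for decision trees of depth one and two; with a library such as scikit-learn, the leaf indices returned by a fitted tree's `apply` method could play the same role. The rules, the rows, and the function names are hypothetical, and the rows are invented rather than taken from the Titanic dataset.

```python
from collections import Counter, defaultdict
from math import log2


def granular_entropy(rows, assign, target):
    """Size-weighted entropy (bits) of `target` inside the granules given by `assign`."""
    granules = defaultdict(list)
    for row in rows:
        granules[assign(row)].append(row[target])
    n = len(rows)
    total = 0.0
    for ys in granules.values():
        h = sum((c / len(ys)) * log2(len(ys) / c) for c in Counter(ys).values())
        total += (len(ys) / n) * h
    return total


def shallow_tree(row):
    """Stand-in for a depth-1 tree: one split on sex."""
    return 0 if row["sex"] == "female" else 1


def deeper_tree(row):
    """Stand-in for a depth-2 tree: an extra split on age for male passengers."""
    if row["sex"] == "female":
        return 0
    return 1 if row["age"] < 10 else 2


# Invented rows in the style of a passenger table
rows = [
    {"sex": "female", "age": 30, "survived": 1},
    {"sex": "female", "age": 8, "survived": 1},
    {"sex": "male", "age": 5, "survived": 1},
    {"sex": "male", "age": 40, "survived": 0},
    {"sex": "male", "age": 35, "survived": 0},
    {"sex": "male", "age": 7, "survived": 1},
]
print(round(granular_entropy(rows, shallow_tree, "survived"), 3))   # 0.667
print(granular_entropy(rows, deeper_tree, "survived"))              # 0.0
```

A model that partitions the data more finely leaves less label uncertainty inside its granules, so tracking this value across model complexity gives an entropy-over-granularity curve for that model.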
   The experimental results illustrate the performance trends of four different machine learning
models: Decision Tree, Random Forest, Logistic Regression, and KNN, as their complexity or
parameter tuning is varied.


Figure 5: Entropy over granularity curves for “Malware Detection” dataset

   Upon closer examination, with the x-axis representing the exponential scale of data split, the
trends observed in the performance of the machine learning models acquire a new dimension of
interpretation, emphasizing the impact of data volume on model performance (Figures 4 and 5).
   1. Decision tree: The initial low performance of the decision tree model at lower bits of data
       suggests that it is not capable of capturing essential patterns with limited information.
       However, as the amount of data increases exponentially, the model's performance
       improves, suggesting that the decision tree may be capable of capturing more patterns
       with more information provided.
   2. Random forest: The gradual improvement in the random forest model's performance
       across an increasing volume of data suggests that it is more robust to overfitting than the
       decision tree. This could be due to its ensemble nature, although generalization may still
       become harder as the data complexity grows with more bits of data.
   3. Logistic regression: The logistic regression model's relatively stable performance at the
       lower end of the data scale suggests it requires a minimal amount of data to establish its
       predictive patterns. However, the subsequent decline indicates that additional
       precision, especially when increasing exponentially, does not necessarily translate to
       improved performance.
   4. KNN: The KNN model's performance improvement with increased data bits is particularly
       noteworthy. The model benefits from larger data volumes, possibly because more data
       provides a better context for its instance-based learning approach, allowing for more
       accurate neighborhood estimations and, consequently, better performance.
   The results were analyzed using a combination of statistical methods and qualitative
assessments. The analysis aimed to illustrate not only the performance improvements but also
how the entropy and rough set-based metrics provided deeper insights into the model's
interaction with the data.
   Significantly, the experiments showed that in cases where traditional metrics suggested
multiple optimal hyperparameter configurations, the entropy and rough set-derived metrics
often identified a single configuration that offered superior performance in terms of
generalization, robustness, and interpretability.
   These findings underscore the potential of the proposed method to act as a crucial decision-
making tool in hyperparameter tuning, offering a more nuanced approach that goes beyond
conventional performance metrics. The results convincingly demonstrate that the integration of
Shannon entropy and rough set theory can lead to the selection of hyperparameter configurations
that not only optimize predictive performance but also ensure that the model is more aligned
with the underlying data structure and complexity.

5. Discussions
The experimental results provide a nuanced understanding of how different machine learning
models adapt to increasing volumes of data, as represented in an exponential scale of granular
data. This section discusses the implications of these findings, juxtaposing them with existing
research to draw broader conclusions about model behavior and data scalability.
    The findings from this research emphasize the dynamic interplay between data characteristics
and model performance, underscoring the necessity for a holistic approach to machine learning
that considers both the quantitative and qualitative aspects of data and algorithms.
    Both the decision tree and the random forest demonstrate an improvement in performance
with increased data volume, which seems intuitive as more data typically aids in model training.
This phenomenon aligns with research suggesting that decision tree-based models can benefit
from complex data. The random forest's more gradual improvement compared to the decision
tree could be attributed to its ensemble nature, providing a built-in mechanism to combat
overfitting, albeit not entirely negating the effect of data volume.
    The stability of logistic regression at lower data volumes and its subsequent decline resonate
with studies indicating that logistic regression models, being linear classifiers, have limited
capacity to benefit from massive data if the underlying relationships in the data are not linear or
if the additional data does not introduce new information. This observation is crucial for
practitioners, emphasizing the need to balance data volume with the inherent model capacity.
    The improvement in KNN's performance with more data contrasts with the other models,
highlighting its unique dependency on data volume for performance enhancement. This aligns
with the understanding that KNN models, which rely on neighborhood-based decision-making,
inherently scale their performance with more data points, improving the model's ability to make
informed predictions based on a richer context.
    The observed trends contribute to the broader discourse on the scalability of machine learning
models with respect to data volume. Previous studies have emphasized the importance of
matching model complexity with data complexity to avoid overfitting or underfitting. Our
findings corroborate this perspective, demonstrating that an exponential increase in data volume
does not uniformly translate to linear improvement of performance.
    Moreover, the distinctive behavior of KNN in our experiments underscores the importance of
model selection in the context of data availability. While some models like KNN thrive on larger
datasets, others may not leverage additional data effectively after a certain threshold. This
observation is particularly relevant in the era of big data, where the temptation to
indiscriminately increase dataset sizes is prevalent.
    The results from these experiments offer practical insights for machine learning practitioners:
        1. Model Selection: Practitioners should carefully consider the nature of their data and
             the corresponding model's capacity to handle data volume when selecting a machine
             learning algorithm.
        2. Data Preparation: The findings highlight the need for judicious data preprocessing
             and granulation, especially when dealing with large datasets, to ensure that models
             are not overwhelmed by data volume[19,20].
        3. Performance Evaluation: The integration of Shannon entropy and rough set theory
             for model evaluation provides a novel perspective that goes beyond traditional
             performance metrics, offering a deeper understanding of how models interact with
             data.
    The application of this method can also be instrumental in the domain of hyperparameter
optimization, providing a novel perspective to evaluate and select the optimal set of
hyperparameters for machine learning models. In hyperparameter optimization, the objective is
to find the set of hyperparameters that yields the best model performance, which can often be a
challenging and computationally intensive process.
    In this context, the proposed method can be used as an additional criterion in hyperparameter
tuning algorithms, such as grid search, random search, or Bayesian optimization. By evaluating
the entropy and rough set-derived metrics alongside conventional performance measures,
practitioners can gain deeper insights into the hyperparameter effects, potentially identifying
configurations that not only optimize predictive performance but also enhance model
interpretability and robustness.
    For instance, in a scenario where multiple hyperparameter configurations result in similar
accuracy, the entropy and rough set-based metrics could be the deciding factor, favoring
configurations that yield models with better generalization properties or more interpretable
structures. This approach could lead to more informed decision-making in hyperparameter
selection, ultimately resulting in models that are not only high-performing but also more aligned
with the underlying data structure and complexity.
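The tie-breaking scenario can be sketched as a two-stage selection: keep every configuration whose accuracy is within a tolerance of the best one, then pick the candidate with the best entropy-based score. The direction of the second criterion (lower is better), the tolerance, the field names, and all numbers below are assumptions for illustration and are not results from this study.

```python
def select_config(results, tolerance=0.005):
    """Pick the near-best-accuracy configuration with the lowest granular entropy."""
    best_acc = max(r["accuracy"] for r in results)
    candidates = [r for r in results if best_acc - r["accuracy"] <= tolerance]
    return min(candidates, key=lambda r: r["granular_entropy"])


# Hypothetical grid-search output: three configurations, two of them tied on accuracy
results = [
    {"params": {"max_depth": 3}, "accuracy": 0.840, "granular_entropy": 0.41},
    {"params": {"max_depth": 5}, "accuracy": 0.842, "granular_entropy": 0.29},
    {"params": {"max_depth": 9}, "accuracy": 0.801, "granular_entropy": 0.12},
]
print(select_config(results)["params"])   # {'max_depth': 5}
```

The third configuration has the lowest entropy but is excluded in the first stage, so the entropy-based score only decides among configurations that conventional metrics cannot separate.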
    Incorporating this method into hyperparameter optimization processes could significantly
enhance the efficiency and effectiveness of model tuning, providing a richer set of criteria to guide
the search for optimal hyperparameters and contributing to the development of more
sophisticated and nuanced machine learning models.
    By applying this method, it's possible to assess the impact of different hyperparameter
configurations on the model's ability to capture and utilize the information within the data. This
method can provide a more granular view of how changes in hyperparameters affect the model's
structure and performance, beyond traditional evaluation metrics.
    This approach offers a multifaceted perspective that enhances traditional evaluation metrics,
enabling a deeper understanding of a model's interaction with data.
    In classification tasks, this method can reveal subtle nuances in how models manage class
boundaries, especially in cases of imbalanced datasets or overlapping class distributions. By
assessing the entropy and rough set-based metrics, we can gauge a model's ability to discern
between classes effectively, not just its overall accuracy. This can lead to improved model designs
that are more sensitive to the intrinsic complexities of the data.
    For regression tasks, the integration of these theories helps in understanding how models
cope with noise and outliers. It can provide insights into the robustness of the model, indicating
how changes in hyperparameters affect the model's ability to generalize from the training data to
unseen data, which is crucial for predictive accuracy in real-world applications.
    In the realm of unsupervised learning, such as clustering and dimensionality reduction, the
proposed method introduces a novel approach to evaluate the quality of the clustering or the
representation of data in reduced-dimensional spaces. It allows us to assess whether the essential
structure of the data is preserved or if important information is lost, thus guiding the tuning of
hyperparameters to achieve more meaningful and interpretable results.
   Moreover, this approach motivates new hyperparameter tuning algorithms that integrate
entropy and rough set-derived metrics into their optimization criteria. Such algorithms
could automate the search for hyperparameters that not only optimize traditional
performance metrics but also ensure that the model captures the underlying data structure.
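   A minimal sketch of such a combined criterion, under stated assumptions, is shown
below: each candidate configuration is scored by its validation accuracy minus a weighted
conditional entropy of the true label given the predicted label. The scoring formula, the
`weight` parameter, and the configuration names are hypothetical and are not taken from
the experiments in this paper.

```python
from collections import Counter, defaultdict
from math import log2


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def conditional_entropy(y_true, y_pred):
    """H(true label | predicted label) in bits."""
    groups = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        groups[p].append(t)
    n = len(y_true)
    total = 0.0
    for members in groups.values():
        m = len(members)
        h = -sum((c / m) * log2(c / m) for c in Counter(members).values())
        total += (m / n) * h
    return total


def select_config(y_true, predictions_by_config, weight=0.5):
    """Pick the configuration with the best accuracy-minus-entropy score."""
    def score(config):
        y_pred = predictions_by_config[config]
        return accuracy(y_true, y_pred) - weight * conditional_entropy(y_true, y_pred)
    return max(predictions_by_config, key=score)


# Toy validation labels and predictions (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
predictions_by_config = {
    "config_a": [0, 0, 1, 1, 1, 1],
    "config_b": [0, 1, 1, 1, 2, 0],
}
best = select_config(y_true, predictions_by_config)
```

   Both toy configurations reach the same accuracy, so the entropy term alone decides
between them; in practice the weight would itself need to be chosen with care.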

6. Conclusions
This research explored the integration of Shannon entropy and rough set theory as a novel
method for evaluating machine learning models and applied it across various tasks,
including classification, regression, clustering, dimensionality reduction, compression, and
hyperparameter optimization. The experimental results demonstrated the method's potential to
provide deeper insights into model performance and data structure, offering a multifaceted
perspective that complements traditional evaluation metrics.
   In classification and regression tasks, the method revealed differences in how models
handle increasing data complexity and volume, highlighting the potential risks of overfitting and
underfitting in models like decision trees and logistic regression. For KNN, the method showed
improved performance with more data, underscoring the model's dependency on data
volume for its effectiveness.
   In clustering and dimensionality reduction, the proposed approach offered a novel metric to
assess the quality of clusters and the information preservation in reduced-dimensional spaces,
respectively. These applications underscored the method's versatility and its ability to enhance
the interpretability and efficacy of unsupervised learning tasks.
   The research also highlighted the method's applicability in compression, where it can serve as
a tool to evaluate the loss of information, and in hyperparameter optimization, where it provides
additional criteria to guide the selection of optimal hyperparameters.
   The integration of these concepts enhances the capacity to discern the subtle intricacies of
model performance and data interaction, providing a richer, more granular perspective on
machine learning efficacy and reliability. This is particularly vital as the field moves towards more
complex, data-driven decision-making processes, where the stakes of model accuracy and
reliability are higher. The ability to evaluate and fine-tune models with such precision is a crucial
step forward, ensuring that machine learning systems can be trusted and relied upon in diverse
applications, from healthcare to autonomous vehicles.
   Overall, the integration of Shannon entropy and rough set theory presents a promising avenue
for advancing machine learning model evaluation. It not only enriches the toolkit available to
practitioners and researchers but also opens up new possibilities for refining machine learning
models to achieve better performance, robustness, and interpretability.
   The prospects for this line of research are expansive. Future work can delve into more
extensive applications, explore the integration of this method with advanced machine learning
models, and investigate its potential in guiding the development of new algorithms. By building
on the foundational work presented here, subsequent research can further elucidate the
complexities of model-data interactions, driving the evolution of machine learning towards more
sophisticated and nuanced methodologies.

Acknowledgements
The research presented in this paper is funded by the French National Research Agency
(ANR), project ANR-19-CE23-0005 BI4people (Business intelligence for the people).
References
[1] C. E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal 27
     (3) (1948) 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x.
[2] Z. Pawlak, Rough sets, Int. J. Comput. Inf. Sci. 11 (1982) 341–356.
[3] Z. Wang, X. Zhang, J. Deng, The uncertainty measures for covering rough set models, Soft
     Computing 24 (2020) 11909–11929. doi:10.1007/s00500-020-05098-x.
[4] M. A. Geetha, D. P. Acharjya, N. Ch. S. N. Iyengar, Algebraic properties and measures of
     uncertainty in rough set on two universal sets based on multi-granulation, in: Proceedings of
     the 6th ACM India Computing Convention, ACM, 2013, doi: 10.1145/2522548.2523168.
[5] J. Xu, K. Qu, X. Meng, Y. Sun, Q. Hou, Feature selection based on multiview entropy measures
     in multiperspective rough set, International Journal of Intelligent Systems 37 (2022) 7200–
     7234. doi:10.1002/int.22878.
[6] J. Gawlikowski et al., A Survey of Uncertainty in Deep Neural Networks, arXiv preprint
     arXiv:2107.03342, 2021. doi:10.48550/ARXIV.2107.03342.
[7] A. R. Asadi, An Entropy-Based Model for Hierarchical Learning, arXiv preprint
     arXiv:2212.14681, 2022. doi:10.48550/arXiv.2212.14681.
[8] J. Mosiński, P. Biliński, T. Merritt, A. Ezzerg, D. Korzekwa, AE-Flow: AutoEncoder Normalizing
     Flow, arXiv:2312.16552, 2023. doi:10.48550/ arXiv.2312.16552.
[9] T. Ge, J. Hu, L. Wang, X. Wang, S.-Q. Chen, F. Wei, In-context Autoencoder for Context
     Compression in a Large Language Model, arXiv, 2023. doi:10.48550/ARXIV.2307.06945.
[10] R. Shwartz-Ziv, Information Flow in Deep Neural Networks, arXiv preprint
     arXiv:2202.06749, 2022. doi:10.48550/arXiv.2202.06749.
[11] A. Thuy, D. F. Benoit, Explainability through uncertainty: Trustworthy decision-making with
     neural     networks,        European       Journal    of   Operational     Research     (2023).
     doi:10.1016/j.ejor.2023.09.009.
[12] D. Tiapkin et al., Fast Rates for Maximum Entropy Exploration, arXiv preprint
     arXiv:2303.08059, 2023. doi:10.48550/ arXiv.2303.08059.
[13] A. Mao, M. Mohri, Y. Zhong, Cross-Entropy Loss Functions: Theoretical Analysis and
     Applications, arXiv (2023). doi:10.48550/ arXiv.2304.07288.
[14] D. Reshetova, Y. Bai, X. Wu, A. Ozgur, Understanding Entropic Regularization in GANs,
     arXiv:2111.01387, 2021. doi:10.48550/ arXiv.2111.01387.
[15] D. Sitnikov, O. Ryabov, An Algebraic Approach To Defining Rough Set Approximations And
     Generating Logic Rules, in: Data Mining V, WIT Press, 2004, pp. (specific page numbers if
     available). doi:10.2495/data040171.
[16] L. Zhao, Y. Yao, Subsethood Measures of Spatial Granules, arXiv, 2023. doi:10.48550/
     arXiv.2309.02662.
[17] W. Cukierski, Titanic - Machine Learning from Disaster, 2012. URL:
     https://kaggle.com/competitions/titanic.
[18] R. Ronen, M. Radu, C. Feuerstein, E. Yom-Tov, M. Ahmadi, Microsoft Malware Classification
     Challenge, arXiv (2018). doi:10.48550/ARXIV.1802.10135.
[19] D. Chernyshov, D. Sytnikov, Binary classification based on a combination of rough set theory
     and decision trees, Innovative Technologies and Scientific Solutions for Industries, (2023).
     doi:10.30837/itssi.2023.26.087.
[20] G. Chiaselotti, T. Gentile, F. Infusino, Decision systems in rough set theory: A set operatorial
     perspective, J. Algebra Its Appl. 18 (01) (2019) 1950004. doi:10.1142/s021949881950004x.