<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Interpretable Prototype Parts-based Neural Network for Medical Tabular Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jacek Karolczak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jerzy Stefanowski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Poznan University of Technology, Institute of Computing Science</institution>
          ,
          <addr-line>ul. Piotrowo 2, 60-695 Poznań</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>785</fpage>
      <lpage>794</lpage>
      <abstract>
        <p>The ability to interpret machine learning model decisions is critical in domains such as healthcare, where trust in model predictions is as important as their accuracy. Inspired by the development of prototype parts-based Deep Neural Networks in computer vision, we propose a new model for tabular data, specifically tailored to medical records, which requires discretization of diagnostic result norms. Unlike the original vision models that rely on spatial structure, our method employs trainable patching over features describing a patient to learn meaningful prototypical parts from structured data. These parts are represented as binary or discretized feature subsets. This allows the model to express prototypes in human-readable terms, enabling alignment with clinical language and case-based reasoning. Our proposed neural network is inherently interpretable and offers concept-based predictions by comparing the patient's description to learned prototypes in the latent space of the network. In experiments, we demonstrate that the model achieves classification performance competitive with widely used baseline models on medical benchmark datasets, while also offering transparency, bridging the gap between predictive performance and interpretability in clinical decision support.</p>
      </abstract>
      <kwd-group>
        <kwd>Interpretable Machine Learning</kwd>
        <kwd>Prototype Learning</kwd>
        <kwd>Case-Based Reasoning</kwd>
        <kwd>Learnable Discretization</kwd>
        <kwd>Tabular Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Machine learning (ML) has been increasingly used in medicine for many decades, in particular to improve
diagnostic accuracy, predict patient outcomes, and support clinical decision making by uncovering
complex patterns in medical data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Early applications of machine learning prioritized inherently
interpretable models that provided symbolic knowledge representations, such as Decision Trees (DT)
and rule-based systems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Encouraged by the initial successes of these approaches, researchers
began addressing more complex problems using more advanced models such as random forests (RF),
other ensembles, or even hybrid approaches [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Although these ML systems offer an improvement in
predictive performance [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], they do so at the expense of transparency and interpretability [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Nowadays, in many tasks, Deep Neural Networks (DNN) have become the most popular approach,
particularly for analyzing modalities such as images, time series, or text data. However, a significant
portion of clinical work still relies on tabular data, where the application of deep learning models,
due to their black-box nature, is less widespread and less appreciated. In the healthcare domain, the
reluctance to adopt DNN is partially driven by the difficulty in interpreting their decision-making
processes, making it challenging for physicians to analyze, validate, and ultimately trust their results
in real-world applications. As a result, there has been growing interest in using Explainable AI (XAI)
techniques [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to make machine learning models more transparent and understandable for clinical use.
      </p>
      <p>
        Currently, the landscape of XAI is dominated by feature importance methods, with SHAP [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
LIME [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] being among the most widely used. However, despite their popularity, these approaches
often produce abstract and incomprehensible explanations, even for machine learning experts, and can
be particularly challenging for physicians to understand [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. As a result, there is growing interest in
alternative paradigms that provide more intuitive and human-understandable insights aligned with the
way physicians reason about the patient and the diagnosis.
      </p>
      <p>
        In this context, prototypes – instances that represent groups of similar examples – have emerged as a
particularly promising explanation technique [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Since they correspond directly to input data, they
align more naturally with human reasoning processes and are generally easier to interpret, including
for medical professionals without specialized training in machine learning [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Prototypes can serve as
both local explanations by showing cases similar to the predicted instance and as global explanations by
presenting representative examples from the data. This makes them a powerful tool for understanding
both individual predictions and overall model reasoning.
      </p>
      <p>
        Inspired by the paper [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] on the prototypical part-based network for image classification, where
predictions are explained through interpretable patches rather than complete images, we explore how
similar principles can be adapted for tabular medical data. Despite the success of [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] in other domains,
prototype networks for tabular data remain underexplored, particularly in healthcare. This is notable
because medical data are often described in the language of discrete ranges rather than raw feature values. Discretized
variables are easier to interpret because they correspond to clear, meaningful categories, such as age
ranges, test result groups, or risk levels. These discrete features align better with clinical reasoning
and allow for more transparent decision-making. Using these features, models can offer more intuitive
explanations, helping physicians better understand the predictions and relate them to real-world clinical
scenarios.
      </p>
      <p>To address this gap, we propose a prototype-based neural network, called Model for Explainable
Diagnosis using Interpretable Concepts (MEDIC), specifically designed for tabular medical data. Our
approach introduces discrete prototypes, with the aim of improving interpretability while maintaining
strong predictive performance. While traditional models such as DTs offer symbolic interpretability,
their reliance on rigid rule structures may not align well with the complexity of clinical reasoning. In
contrast, our method adopts a prototype-based approach that enables more flexible, example-driven
explanations, allowing clinicians to interpret decisions through similarities to real, representative patient
cases. The goal of this study is to develop and evaluate this model in the context of medical records of
patients, with a focus on producing faithful and physician-friendly explanations.</p>
      <p>To ensure reproducibility, the code is publicly available in a GitHub repository1.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] claims that Deep Neural Networks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] do not provide significant performance advantages
over classical approaches such as random forest [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or gradient boosting (GB) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for clinical prediction
tasks utilizing tabular data, which may explain their limited adoption in the healthcare domain [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Although these ensemble methods consistently demonstrate strong performance on tabular
clinical data, they suffer from inherent opacity in their decision processes, creating
a critical need for effective explanation frameworks that can provide healthcare professionals with
transparent insights into model reasoning [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        The landscape of explainable AI approaches can be broadly categorized into two paradigms: feature
attribution methods and concept-based methods [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The first group was briefly discussed in Section 1.
The second category, prototype-based explanations (also called example-based or instance-based
explanations [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), has shown particular promise in aligning with human cognitive processes, especially
in domains where case-based reasoning is predominant. This is particularly true in the medical domain,
where such methods have been shown to be effective in improving interpretability and trust [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Prototype-based explanations generally fall into two families. The post-hoc family identifies
prototypes after model training, typically selecting representative instances from the training set. Notable
algorithms include MMD-Critic [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which employs maximum mean discrepancy to select prototypes
and criticisms, and optimization-based approaches like A-PETE [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and IKNN_PSLFW [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Although
straightforward to apply to tabular medical data, these methods often struggle with high-dimensional
      </p>
      <sec id="sec-2-1">
        <title>1https://github.com/jkarolczak/medic</title>
        <p>
          datasets containing many irrelevant features, which is a common characteristic in healthcare, where
comprehensive diagnostic panels frequently generate records containing redundant
information [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. In this context, it is important to guide the decision maker’s attention toward the specific
features of the prototype that the model considers relevant [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>
          The second family, ante-hoc or intrinsic prototype methods, integrates prototype reasoning directly
into the model architecture. Usually, these approaches represent prototypes not as complete instances
but as parts or feature conjunctions that participate in decision making through mechanisms such as
weighted voting. This direction gained significant attention following [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], where the approach was
originally proposed for image classification.
        </p>
        <p>
          ProtoPNet [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] represents a breakthrough in this area, introducing a convolutional neural architecture
where class predictions are based on similarities of the learned prototypical parts of images. The key
innovation of ProtoPNet was enabling interpretability through visualization of prototypical image patches
that the model "looks for" when making classifications. When classifying a new image, ProtoPNet
identifies similar-looking patches in the input and compares them to its learned prototypes, with the
similarity scores directly contributing to class predictions. This approach is particularly powerful for
medical imaging, where specific visual patterns (such as tumors or lesions) are diagnostically significant.
As documented in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] and demonstrated in applications like [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], such models enhance transparency
by highlighting medically relevant image regions and explicitly connecting them to learned prototypes
that represent typical visual manifestations of conditions.
        </p>
        <p>
          However, despite ProtoPNet’s successful application across various image processing tasks [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ],
adapting this architecture for tabular data presents unique challenges. Medical tabular data lack the
spatial structure of images that convolutional networks exploit, requiring fundamentally different
approaches to identify meaningful "parts" or feature conjunctions. To date, a comparable architecture
specifically designed for tabular medical records remains conspicuously absent from the literature.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. MEDIC: Model for Explainable Diagnosis using Interpretable Concepts</title>
      <p>
        In this section, for the first time, we propose the neural network MEDIC: Model for Explainable Diagnosis
using Interpretable Concepts. MEDIC is inspired by the prototypical parts paradigm proposed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and
is designed to produce accurate and inherently interpretable predictions, making it particularly
well suited for medical decision support, which usually requires human interpretation of proposed
results. The model decomposes the decision support process into a small number of meaningful,
human-understandable components: discretized input features describing the patient, interpretable feature
subsets (parts) and class prototypes grounded in real data. These elements enable case-based reasoning
and transparent justification of the proposed classification.
      </p>
      <p>At a high level, the architecture follows an interpretable processing pipeline consisting of four key
stages: (1) input discretization, which transforms continuous variables in the patient’s description into
symbolic bins; (2) part extraction, which identifies sparse and semantically coherent subsets of input
features; (3) prototype comparison, where each extracted part of the patient’s description is matched to
learned prototypes, stored as embeddings representing subsets of feature-value pairs from the
training data; and (4) classification of the considered instance based on its similarity to prototypes. The
complete MEDIC model is trained end-to-end to jointly learn all of these components in a supervised
setting.</p>
      <p>In clinical data, where features often come from heterogeneous sources (e.g., lab test values, vital
signs, diagnoses), such structured reasoning aligns well with domain expert expectations. Discretized
bins can reflect clinically relevant ranges of diagnostic tests (e.g., abnormally high glucose), sparse
parts mirror combinations that physicians would consider jointly (e.g., elevated CRP and fever), and
prototypes anchor predictions in real cases that can be inspected post hoc.</p>
      <p>We now describe the architecture in detail, starting with the interpretable discretization
of continuous input features.</p>
      <p>Figure 1: (a) Fuzzy binning: the input value is softly assigned to each bin based on proximity to the bin's center; the final encoding is a weighted combination, where the weights reflect similarity to the bin centers. (b) Hard binning: the input is deterministically assigned to the single bin with the nearest center; the encoding becomes a one-hot vector.</p>
      <sec id="sec-4-1">
        <title>3.1. Interpretable Discretization of Continuous Input Features</title>
        <p>The discretization of continuous medical variables (e.g., age, lab test values) into symbolic categories can
aid interpretation and facilitate reasoning about patient features. In this work, our aim is to ultimately
produce ranges of continuous features for symbolic interpretability. However, such a hard discretization
is hardly optimizable in gradient-based neural network training.</p>
        <p>To overcome this challenge, we introduce a fuzzy binning layer that enables a smooth, differentiable
approximation of hard discretization during training. This allows gradients to flow through the
discretization process and enables end-to-end optimization. After training, the soft representation can be
replaced with a deterministic hard binning for better interpretability.</p>
        <p>Fuzzy Binning To allow interpretable discretization of continuous input features, we introduce a
fuzzy binning layer that softly assigns each scalar feature value $x \in \mathbb{R}$ to a set of $K$ trainable bins. Each
bin $k$ is characterized by a learnable center $c_k \in \mathbb{R}$ and a shared bandwidth parameter $\sigma &gt; 0$. The soft
membership of $x$ to bin $k$ is defined using a Gaussian kernel:
$$d_k(x) = \frac{(x - c_k)^2}{2\sigma^2}, \quad (1)$$
$$\tilde{b}_k(x) = \frac{\exp(-d_k(x))}{\sum_{j=1}^{K} \exp(-d_j(x)) + \epsilon}, \quad (2)$$
where $\tilde{b}_k(x)$ denotes the normalized soft assignment and $\epsilon$ is a small constant added for numerical
stability. This results in a fuzzy, probabilistically weighted representation over bins, allowing each input
to contribute partially to multiple bins (see Figure 1a).</p>
        <p>The use of Gaussian kernels for fuzzy binning offers several advantages over direct distance-based
assignment (e.g., the $\ell_2$ norm). First, the smooth exponential decay naturally reflects uncertainty in
proximity, which is especially relevant when feature values lie near bin boundaries. Second, the
resulting softmax distribution is differentiable and normalized, facilitating gradient-based optimization
in Deep Neural Networks.</p>
        <p>Importantly, the bin centers $c_k$ and the shared bandwidth $\sigma$ are optimized jointly with other model
parameters during end-to-end training, allowing the discretization scheme to adapt to the data
distribution.</p>
        <p>After initial training of the network, the discretization is switched to the hard mode.
In the hard setup, the input is assigned to a single bin via a non-differentiable arg min operation over
squared distances:
$$\hat{b}(x) = \mathrm{one\_hot}\Big(\arg\min_{k}\,(x - c_k)^2\Big). \quad (3)$$</p>
        <p>The resulting representation is a one-hot2 vector (Figure 1b), which can be advantageous for symbolic
interpretation and comparison of prototypes. However, it lacks gradient flow, making it unsuitable for
end-to-end training.</p>
        <p>In the hard binning regime, the input feature values are partitioned into contiguous intervals derived
from the learned bin centers $\{c_k\}_{k=1}^{K}$. Specifically, each bin $k$ is associated with the interval
$$I_1 = \Big(-\infty, \tfrac{c_1 + c_2}{2}\Big), \qquad
I_k = \Big[\tfrac{c_{k-1} + c_k}{2}, \tfrac{c_k + c_{k+1}}{2}\Big) \;\; \text{for } 1 &lt; k &lt; K, \qquad
I_K = \Big[\tfrac{c_{K-1} + c_K}{2}, +\infty\Big), \quad (4)$$
such that any scalar input $x$ is discretized into the bin $k$ for which $x \in I_k$. This alternative representation
facilitates interpretation by decision makers.</p>
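Under the same illustrative assumptions (hand-picked centers rather than learned ones), hard binning per Eq. (3) reduces to an arg min over squared distances, and the interval boundaries are simply midpoints between consecutive sorted centers:

```python
import numpy as np

def hard_bin(x, centers):
    """Eq. (3): assign x to the single nearest bin center, as a one-hot vector."""
    k = int(np.argmin((x - centers) ** 2))
    out = np.zeros(len(centers))
    out[k] = 1.0
    return out

def bin_edges(centers):
    """Interval boundaries implied by sorted centers: midpoints between neighbours."""
    c = np.sort(centers)
    return (c[:-1] + c[1:]) / 2.0

centers = np.array([0.0, 1.0, 2.0])  # illustrative centers
edges = bin_edges(centers)           # boundaries at 0.5 and 1.5
onehot = hard_bin(0.9, centers)      # 0.9 lies in [0.5, 1.5), i.e. the middle bin
```

As the text notes, this step is non-differentiable, which is why it is applied only after the fuzzy training phase.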
      </sec>
      <sec id="sec-4-2">
        <title>3.2. MEDIC Architecture</title>
        <p>
          MEDIC is a neural network inspired by the interpretable prototypical parts-based classification paradigm
proposed in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and integrates symbolic input binning, feature extraction, prototype learning, and
class prediction based on association with learned prototypes. The overview of the entire MEDIC
architecture is shown in Figure 2.
        </p>
        <p>In the beginning, the raw input vector of features describing the patient is transformed by the network
into a sparse high-dimensional binary representation. Each continuous feature is processed by the binning
module introduced in Section 3.1. Meanwhile, categorical features undergo one-hot2 encoding. All
vectors coming from discretization are concatenated into a single $D'$-dimensional vector.</p>
        <p>Next, the binarized input is multiplied with a trainable set of $P$ patching masks, encoded as a matrix
$M \in \mathbb{R}^{P \times D'}$. Each mask selects and linearly combines a sparse subset of binary features, effectively
defining a part of the input instance (i.e., its description by features). Intuitively, each part can be seen as
a meaningful combination of clinical indicators – for example, high blood pressure in elderly patients or
elevated glucose and BMI. This encourages the model to focus on patterns that are not only interpretable
but also structured in a way that reflects domain knowledge.</p>
        <p>Each of the $P$ part vectors is passed through a shared feature extractor module, implemented as a
shallow multilayer perceptron with ReLU activations. This module transforms each sparse binary part
into a dense embedding – a compact vector of size $h$ that captures abstract and informative features.
Embeddings are designed to preserve meaningful relationships in the data while reducing dimensionality,
enabling the model to generalize across similar patterns. From an interpretability perspective, this step
summarizes each clinically relevant pattern into a low-dimensional representation that retains the most
diagnostically informative aspects.</p>
        <p>To facilitate interpretable decision-making, the network maintains a set of $m$ learnable prototype
vectors of size $h$. For each input part represented as an embedding, the $\ell_2$ distance is computed to every
prototype. This results in a $P \times m$ distance matrix, where each entry quantifies the dissimilarity between
a specific part and a prototype. A max-pooling operation across parts selects the most relevant part
for each prototype, yielding a vector of $m$ minimal distances, where each entry reflects the smallest
distance between a given prototype and the most similar embedding representing a part of the input
describing the patient. This enables comparison of each prototype to its best-matching part in the
patient description.</p>
        <p>2. A one-hot vector is a way of representing categories or intervals where only one entry is "on" (set to 1) and all others are
off; the entry of the matching bin or category will be marked as active. This makes it easy to interpret into which clinical range the value falls.</p>
        <p>To enable case-based reasoning, the network maintains a set of $m$ learnable prototype vectors, each
of dimension $h$. Conceptually, each prototype represents a summary of a typical clinical condition or
patient case learned from data. Each prototype is anchored in real patient data and corresponds to a
representative example that lies near the center of a cluster of similar cases, making it reflective of
common patterns observed across many patients. For every embedding corresponding to a part of the
patient description, the model computes the squared Euclidean ($\ell_2$) distance to each prototype, yielding
a $P \times m$ distance matrix. Each row corresponds to one patient description part and each column to a
prototype.</p>
        <p>Then a maximum-pooling operation is applied across the rows of this matrix (that is, across parts),
selecting for each prototype the input part that has the smallest distance to that prototype, effectively
identifying the most similar part. This produces a distance vector of length $m$, which summarizes how
closely the input aligns with each of the learned prototypes.</p>
        <p>Finally, this vector of distances is passed through a linear classification layer, producing a probability
distribution over the target classes. Since the classification decision is based directly on similarity to
interpretable prototypes, each linked to specific input parts, the resulting predictions can be traced back
and explained in terms of clinically meaningful comparisons to learned prototype parts.</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Three-Stage Training Procedure</title>
        <p>To ensure stable and interpretable network training, we adopt a three-stage training procedure that
enables the model to learn hard bins (intervals) and realistic prototype parts directly from the training
data, all within a gradient-based optimization framework.</p>
        <p>Stage 1: Initialization with Fuzzy Binning and Learnable Prototypes In the initial stage, the
entire network is trained end-to-end with fuzzy binning and randomly initialized learnable prototypes.
This setting ensures smooth gradient flow through the discretization modules, allowing the network to
co-adapt binning thresholds and part extraction masks.</p>
        <p>Fuzzy binning uses soft Gaussian kernels (Section 3.1, Figure 1a), which provide fuzzy assignments
across bins. The patching masks and prototypes are trained jointly using classification loss, combined
with auxiliary regularization terms: (1) L1 sparsity of patching masks, and (2) a diversity penalty to
encourage spread among prototypes, which are further discussed in Section 3.4.</p>
        <p>Stage 2: Hard Binning and Mask Discretization Once convergence in the training criterion is
achieved, the discretization mode is switched to a hard mode by replacing the fuzzy binning with hard
arg min bin selection, freezing binning thresholds (Section 3.1, Figure 1b). Additionally, patching masks
are binarized by thresholding to enforce strict binary groupings of input dimensions into parts.</p>
        <p>This transition enables symbolic interpretability and highlights which specific input features are
most relevant for each part. The rest of the network is fine-tuned using discretized inputs, preserving
the interpretability of the parts.</p>
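The mask binarization in Stage 2 can be sketched as a simple thresholding step; the threshold value below is an assumption, as the paper does not specify it.

```python
import numpy as np

def binarize_masks(M, tau=0.5):
    """Turn trained real-valued patching masks into strict binary feature groupings.
    tau is an illustrative threshold, not a value specified in the paper."""
    return (M > tau).astype(float)

M_bin = binarize_masks(np.array([[0.05, 0.8], [0.6, 0.1]]))
```

After this step each part is a hard subset of input dimensions, which is what makes the parts readable as feature conjunctions.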
        <p>Stage 3: Prototype Replacement with Real Parts Finally, the learned prototypes are replaced
with embeddings derived from parts of actual patient records in the training data, ensuring that
each prototype corresponds to a real and representative clinical case. For each prototype, the closest
embedded input part is identified using the $\ell_2$ distance. These real parts are then copied into the
prototype memory, replacing the synthetic prototypes. This step improves interpretability by anchoring
each prototype to an actual example from the data.</p>
        <p>This last step grounds the network’s reasoning in actual data, allowing domain experts to inspect
prototypical cases for each class. During this phase, the prototype embeddings are frozen, and only
the classification head is fine-tuned to maintain stable performance, as accuracy would otherwise be
expected to decline.</p>
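Stage 3 amounts to a nearest-neighbour search: each synthetic prototype is projected onto the closest real part embedding under squared L2 distance. A minimal sketch with toy vectors (the values are purely illustrative):

```python
import numpy as np

def replace_with_real_parts(prototypes, part_embeddings):
    """For each prototype, return the nearest real part embedding (squared L2)
    and the index of the training part it is now anchored to."""
    d = ((prototypes[:, None, :] - part_embeddings[None]) ** 2).sum(-1)  # (m, N)
    nearest = d.argmin(axis=1)
    return part_embeddings[nearest], nearest

protos = np.array([[0.0, 0.0], [1.0, 1.0]])                 # synthetic prototypes
parts = np.array([[0.1, 0.0], [0.9, 1.1], [5.0, 5.0]])      # embedded real parts
new_protos, idx = replace_with_real_parts(protos, parts)
```

The returned indices are what lets a domain expert trace each prototype back to a concrete patient record.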
        <p>Model and training complexity From a computational standpoint, MEDIC maintains a relatively
simple neural network architecture, comparable in training complexity to a shallow multi-layer
perceptron with five layers. The operations within the network consist primarily of matrix multiplications
and element-wise products, which are fully parallelizable on modern hardware and avoid sequential
dependencies inherent in architectures such as Recurrent Neural Networks. The final stage of prototype
selection, which matches learned embeddings to parts of real patient records, scales linearly with
the size of the dataset, making it efficient even for larger collections. As a result, both training and
inference remain computationally lightweight, with performance characteristics similar to other small
feed-forward neural networks.</p>
      </sec>
      <sec id="sec-4-4">
        <title>3.4. Objective Function and Regularization</title>
        <p>The model is trained using cross-entropy loss, later denoted as $\mathcal{L}_{\mathrm{CE}}$, as the standard objective for
classification tasks. To improve interpretability and promote efficient structure, two regularization
terms are added. The first is an $\ell_1$ sparsity penalty applied to the patching mask parameters:
$$\mathcal{L}_{\mathrm{sparsity}} = \lambda_{\mathrm{sparsity}} \cdot \ell_1(M) = \lambda_{\mathrm{sparsity}} \cdot \frac{1}{P D'} \sum_{i=1}^{P} \sum_{j=1}^{D'} |M_{ij}| , \quad (5)$$
where $M \in \mathbb{R}^{P \times D'}$ are the patching masks. This term encourages parts to rely on a minimal set of input
features describing each patient. The $\ell_1$ penalty is chosen because, unlike $\ell_2$, it promotes exact zeros in the
patching masks, effectively turning off irrelevant input dimensions and leading to sparser
and therefore more interpretable part-feature associations.</p>
        <p>The second term encourages diversity among prototypes by penalizing redundancy in their
representations:
$$\mathcal{L}_{\mathrm{diversity}} = -\lambda_{\mathrm{diversity}} \cdot \frac{1}{m(m-1)} \sum_{i \neq j} \| \mathbf{z}_i - \mathbf{z}_j \|^2 , \quad (6)$$
where $\mathbf{z}_i$ and $\mathbf{z}_j$ are prototype embeddings. This promotes coverage of distinct regions in the latent
space. The full training objective function is the sum of the three above-mentioned loss terms:
$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \mathcal{L}_{\mathrm{sparsity}} + \mathcal{L}_{\mathrm{diversity}} . \quad (7)$$</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experiments</title>
      <p>This section presents a comprehensive evaluation of our model, both from an interpretability and a
predictive performance perspective. First, in Section 4.1 we assess the predictive accuracy of
the method on three benchmark datasets, comparing its performance to selected baseline models.
Subsequently, in Section 4.2 we demonstrate how the learned prototypes may be applied in practice, through
a case study grounded in a real-world medical dataset. This analysis highlights the interpretability of
MEDIC and its ability to form clinically plausible representations.</p>
      <sec id="sec-5-1">
        <title>4.1. Predictive performance</title>
        <p>4.1.1. Experimental setup
Data To evaluate the proposed model, three publicly available medical datasets were selected:
Cirrhosis3, Chronic Kidney Disease (CKD)4, and Diabetes5. These datasets were chosen due to their clinical
relevance and inclusion of multiple laboratory measurements such as blood test results, namely:
• Cirrhosis: bilirubin, cholesterol, albumin, copper, triglycerides, alkaline phosphatase (ALP), serum
glutamic-oxaloacetic transaminase (SGOT), platelets, and prothrombin time;
• CKD: red blood cells, pus cells, pus cell clumps, blood glucose, blood urea, serum creatinine,
sodium, potassium, hemoglobin, packed cell volume, white blood cell count, and red blood cell
count;
• Diabetes: blood glucose and insulin.</p>
        <p>
          The enumerated tests are well suited for discretization. These datasets also include additional numerical
indicators, such as body mass index (BMI, in the Diabetes dataset), which further benefit from
discretization by enhancing interpretability. Moreover, all three datasets exhibit a class imbalance: Cirrhosis
contains 125, 19, and 168 instances for classes death, censored, and censored due to liver transplantation
respectively; CKD dataset consists of 115 and 43 instances for classes not CKD and CKD; and Diabetes
includes 500 negative and 268 positive samples for diabetes presence. These characteristics present
a realistic benchmark for evaluating the model’s ability to process numerical medical features while
addressing class imbalance, a common challenge in clinical predictive modeling [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
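The class counts reported above translate directly into imbalance ratios. A minimal illustrative sketch (not part of the original pipeline), using only the counts stated in the text:

```python
from collections import Counter

# Class counts as reported in the text for the three datasets
counts = {
    "Cirrhosis": Counter({"death": 125, "censored": 19, "transplant": 168}),
    "CKD": Counter({"notckd": 115, "ckd": 43}),
    "Diabetes": Counter({"negative": 500, "positive": 268}),
}

# Imbalance ratio: size of the largest class over the smallest
for name, c in counts.items():
    ratio = max(c.values()) / min(c.values())
    print(f"{name}: imbalance ratio {ratio:.1f}")
```

For Cirrhosis the ratio approaches 9:1, which is why a balance-sensitive criterion such as the g-mean (introduced below) is used instead of plain accuracy.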
Baselines To evaluate the effectiveness of our prototype-based method, we compare it with a set of
well-established baseline models commonly used in clinical machine learning tasks. Ensemble methods
such as Random Forest (RF) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and Gradient Boosting, specifically the XGBoost (XGB) implementation [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ],
serve as strong baselines due to their robustness, ability to capture non-linear feature interactions, and
proven success in medical applications [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. We include a Decision Tree (DT) model [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] as a reference
for interpretability, since it represents models that are interpretable by design and has proven
sufficient to solve some clinical problems [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Furthermore, we incorporate a simple
feedforward neural network, also known as a Multi-Layer Perceptron (MLP) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], to provide a baseline
comparison within the class of neural models. The MLP consists of an input layer, one or more hidden
layers with nonlinear activation functions, and an output layer with softmax activation for classification.
The Decision Tree, Random Forest, and MLP used the implementations from the scikit-learn6 Python
package; the XGB implementation comes from the XGBoost7 package.
        </p>
<p>Criterion To compare the performance of different models, we use the geometric mean (g-mean) of
sensitivity and specificity, defined as:
g-mean = √(sensitivity × specificity)   (8)
where sensitivity (also called recall) measures the proportion of actual positive cases correctly identified
and specificity measures the proportion of actual negative cases correctly identified:
sensitivity = TP / (TP + FN),   specificity = TN / (TN + FP)   (9)
3https://archive.ics.uci.edu/dataset/878/cirrhosis+patient+survival+prediction+dataset-1
4https://archive.ics.uci.edu/dataset/336/chronic+kidney+disease
5https://www.kaggle.com/datasets/mathchi/diabetes-data-set
6https://scikit-learn.org/
7https://xgboost.readthedocs.io/</p>
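Equations (8)-(9) can be computed directly from confusion-matrix counts. A minimal sketch (the TP/FN/TN/FP argument names are ours, not from the original):

```python
import math

def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity, Eq. (8)-(9)."""
    sensitivity = tp / (tp + fn)  # true-positive rate (recall)
    specificity = tn / (tn + fp)  # true-negative rate
    return math.sqrt(sensitivity * specificity)

# A classifier biased toward the majority class has decent accuracy
# (105/150 = 0.70) but a low g-mean, which exposes the bias:
print(round(g_mean(tp=10, fn=40, tn=95, fp=5), 3))  # → 0.436
```

This is precisely why the g-mean is preferred over accuracy for the imbalanced datasets used here: the biased classifier above is penalized for its poor sensitivity despite high specificity.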
        <p>
          The g-mean balances performance on both classes, ensuring the model is not biased toward the
majority class. This is especially useful in medical datasets with class imbalance, where one outcome
(e.g., disease presence) is much rarer. Unlike accuracy, g-mean provides a more balanced and clinically
meaningful measure by ensuring good performance on both positive and negative classes [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
Hyperparameter Optimization Hyperparameter optimization (HPO) is essential to achieve strong
performance and an unbiased comparison between models, particularly in settings that involve heterogeneous
architectures. For all evaluated models, we used the Tree-structured Parzen Estimator (TPE) approach
[
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] implemented in the Optuna framework8 to perform black-box optimization of key
hyperparameters. Each model was independently tuned using 100 optimization trials to maximize the g-mean
metric [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. To ensure a reliable estimation of predictive performance on unseen data, we adopt a 5-fold
cross-validation framework.
        </p>
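The evaluation loop described above (a g-mean criterion scored with 5-fold cross-validation) can be sketched with scikit-learn. In the actual experiments each candidate configuration was proposed by Optuna's TPE sampler over 100 trials; this illustrative sketch scores a single fixed configuration on synthetic imbalanced data instead:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_val_score

def g_mean_score(y_true, y_pred):
    """sqrt(sensitivity * specificity), the tuning criterion (Eq. 8)."""
    sensitivity = recall_score(y_true, y_pred, pos_label=1)
    specificity = recall_score(y_true, y_pred, pos_label=0)
    return np.sqrt(sensitivity * specificity)

# Imbalanced synthetic data standing in for a medical dataset
X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)

# One candidate configuration, evaluated with 5-fold cross-validation;
# a TPE-based optimizer would propose and score many such candidates.
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring=make_scorer(g_mean_score))
print(round(scores.mean(), 3))
```

The hyperparameter values shown (number of trees, depth) are placeholders; the real search spaces are summarized in Table 1.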
        <p>
          For MEDIC, we recommend a structured tuning strategy informed by prior experience. Begin by
setting the sparsity regularization weight to a relatively high value (e.g., ≈ 1.0) and the diversity weight to 0, together with a large number of
prototypes (e.g., 64). Gradually decrease the sparsity weight until the number of activated features in the prototype
parts stabilizes within a comprehensible range, ideally fewer than 5-7 features per prototype part.
Once this is achieved, incrementally increase the diversity weight to promote diversity in the activated parts, ensuring
that the prototype part lengths remain consistent. After arriving at interpretable and stable prototype
configurations, the remaining hyperparameters, including the number of prototypes (see Table 1) can
be automatically tuned using a hyperparameter optimization algorithm such as TPE [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ].
        </p>
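The first stage of this strategy can be sketched as a simple search loop. Here `avg_active_features` is a hypothetical probe (a toy monotone function, not MEDIC itself) standing in for retraining the model and measuring average prototype-part length:

```python
def avg_active_features(sparsity_w: float) -> float:
    """Hypothetical probe: retrain MEDIC with the given sparsity weight
    and return the average number of activated features per prototype
    part. A toy monotone stand-in is used here for illustration."""
    return 12.0 / (1.0 + 10.0 * sparsity_w)

# Stage 1: strong sparsity, diversity off, many prototypes allowed
sparsity_w, diversity_w, n_prototypes = 1.0, 0.0, 64

# Relax sparsity only while prototype parts stay comprehensible (< 7 features)
while avg_active_features(0.8 * sparsity_w) < 7:
    sparsity_w *= 0.8

# Stage 2 (not shown): gradually increase diversity_w while checking that
# prototype-part lengths remain stable, then hand the remaining
# hyperparameters over to a TPE-based optimizer.
print(round(sparsity_w, 3), round(avg_active_features(sparsity_w), 2))
```

The decay factor 0.8 and the toy response curve are assumptions for the sketch; in practice each step requires retraining and inspecting the learned parts.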
        <p>Table 1 summarizes the hyperparameters tuned for each model and the corresponding search spaces.
For implementation-specific details, we refer the reader to the respective baseline packages cited in the
Baselines paragraph above.
4.1.2. Results
The performance of the models is summarized in Table 2, using the geometric mean (g-mean) metric
on the three datasets. MEDIC demonstrates competitive performance, achieving the best g-mean on the
Cirrhosis and CKD datasets. On the Diabetes dataset, although XGB achieved the highest score, MEDIC
followed closely, within less than a percentage point, indicating comparable effectiveness.</p>
        <p>Table 3 shows the maximum allowed number of prototypes defined as a model setting
(hyperparameter) and the number of unique prototype parts actually discovered by the MEDIC model during training.
The results suggest that the model can self-regularize by reusing the same prototype multiple times
when no additional meaningful feature-value sets (prototype parts) can be identified. This indicates that
MEDIC avoids overfitting by focusing only on truly informative patterns, even when more prototypes
are allowed.</p>
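The reuse effect summarized in Table 3 can be illustrated by deduplicating prototype parts represented as sets of (feature, condition) pairs. A toy example with made-up conditions (not learned values):

```python
# Toy prototype parts as frozen sets of (feature, condition) pairs;
# duplicates model the reuse behavior described in the text.
prototype_parts = [
    frozenset({("Bilirubin", "[0.79, 3.43)"), ("Hepatomegaly", "= 0")}),
    frozenset({("Bilirubin", "[0.79, 3.43)"), ("Hepatomegaly", "= 0")}),  # reused
    frozenset({("Albumin", "[3.82, inf)"), ("Spiders", "= 0")}),
    frozenset({("Albumin", "[3.82, inf)"), ("Spiders", "= 0")}),          # reused
    frozenset({("Copper", "[103.76, inf)")}),
]

allowed = len(prototype_parts)          # count permitted by the hyperparameter
discovered = len(set(prototype_parts))  # unique parts actually learned
print(allowed, discovered)  # → 5 3
```

A gap between the two counts, as in Table 3, suggests the model stops inventing new patterns once the informative ones are exhausted.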
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Studying MEDIC in Action</title>
        <p>To show that MEDIC is interpretable and present how prototype parts learned by MEDIC look like, we
conducted a qualitative analysis using a case study for a single dataset – Cirrhosis. Our aim is to illustrate
how the MEDIC reasoning process works by comparing individual patient cases to representative clinical
patterns (prototypes) previously learned from the data.</p>
        <p>Table 4 shows the discretized feature intervals identified by the network. These intervals represent
meaningful partitions of the input space and often align with known clinical thresholds. For reference,
we compare them with the standard clinical intervals provided by the American College of Clinical
Pharmacy9.</p>
        <p>For example, the learned limit between intervals 1 and 2 for albumin is 3.7 g/dL, which closely matches
the clinical lower limit of 3.5 g/dL. Similarly, the learned limits for the prothrombin time (10.52-10.93
seconds) are well within the reference range of 10–13 seconds. Triglycerides also have a limit near
137 mg/dL, close to the reference value of &lt;150 mg/dL.</p>
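The agreement between learned cut-points and clinical reference limits can be quantified as a relative difference, using the values quoted above:

```python
# Learned cut-points quoted in the text vs. clinical reference limits
learned = {"Albumin (g/dL)": 3.70, "Triglycerides (mg/dL)": 137.0}
reference = {"Albumin (g/dL)": 3.5, "Triglycerides (mg/dL)": 150.0}

for feature, cut in learned.items():
    ref = reference[feature]
    rel_diff = abs(cut - ref) / ref
    print(f"{feature}: learned {cut} vs. reference {ref} ({rel_diff:.0%} apart)")
```

Both learned boundaries land within roughly 10% of the published reference limits, without those limits ever being supplied to the network.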
        <p>Earlier in this work, we justified using three bins per feature to intuitively capture low–normal–high</p>
        <sec id="sec-5-2-1">
          <title>9https://www.accp.com/docs/sap/Lab_Values_Table_PSAP.pdf</title>
          <p>ranges. However, for certain features in specific disease contexts, deviations in one single direction may
be clinically significant. Interestingly, in this experiment the observations suggest the network to exhibit
the ability to self-organize and adjust these bins accordingly. For example, for copper, the first interval
is efectively disabled by learning a negative upper bound (–8.98), which is not physiologically plausible,
thus disregarding it. Likewise, for platelets and albumin, the network forms exceptionally narrow
middle intervals, suggesting that small changes within this range may be critical for the classification.</p>
          <p>Although Table 4 shows intervals ranging from −∞ to ∞ for technical completeness, in practical
applications these can be translated into clinically relevant and bounded intervals. For example, lower
limits can be set to zero and upper limits can be capped according to known physiological limits, without
affecting the model’s performance. This translation can support physicians in interpreting the
model’s behavior more easily.</p>
          <p>Subsequently, MEDIC identified several prototype parts, specific combinations of clinical features
and value ranges, that it considers informative to predict patient outcomes related to cirrhosis. These
prototype parts are presented in Table 5.</p>
          <p>Next, we investigate how a specific patient case (shown in Table 6, classified as 0 – death) is internally
processed and classified by MEDIC through its similarity to the nearest learned prototype parts. Each
prototype part consists of a sparse conjunction of conditions over discretized or binary features, typically
involving only a small number of dimensions. For example, a prototype part may specify conditions
such as: Bilirubin level within [0.79, 3.43) mg/dL, absence of hepatomegaly (Hepatomegaly = 0), and
drug usage indicated (Drug = 1). These concise feature subsets capture clinically meaningful patterns
that contribute to the model’s decisions. MEDIC’s classification is driven by the similarity between the
patient’s description and the prototype parts, which are easily accessible and can be examined by the
user, thereby ofering transparent insight into the model’s reasoning process.</p>
          <p>The list of prototypes with the highest similarity to this example demonstrates how the model
constructs its reasoning by combining interpretable substructures. Many of these substructures align
with known clinical heuristics or highlight relevant feature interactions. For example, bilirubin, ALP
(alkaline phosphatase), and N_Days (duration since patient registration) appear frequently in the most
similar prototypes, highlighting their importance as clinical indicators influencing classification.
Bilirubin ∈ [0.79, 3.43) ∧ Hepatomegaly = 0 ∧ Spiders = 0
Albumin ∈ [3.82, ∞) ∧ Bilirubin ∈ [0.79, 3.43) ∧ Hepatomegaly = 0 ∧ Spiders = 0
Cholesterol ∈ [667, ∞) ∧ Copper ∈ [103.76, ∞) ∧ Hepatomegaly = 0 ∧ Spiders = 0
Bilirubin ∈ (−∞ , 0.79) ∧ Cholesterol ∈ (−∞ , 345) ∧ Hepatomegaly = 1 ∧ Spiders = 0
Albumin ∈ [3.70, 3.82) ∧ Bilirubin ∈ [0.79, 3.43) ∧ Cholesterol ∈ (−∞ , 345) ∧
Hepatomegaly = 0 ∧ Spiders = 0
Bilirubin ∈ [0.79, 3.43) ∧ Cholesterol ∈ (−∞ , 345) ∧ Hepatomegaly = 0 ∧ Platelets ∈
(−∞, 271) ∧ Spiders = 0
Bilirubin ∈ (−∞ , 0.79) ∧ Cholesterol ∈ (−∞ , 345) ∧ Hepatomegaly = 0 ∧ Platelets ∈
(−∞, 271) ∧ Spiders = 0
Albumin ∈ [3.70, 3.82) ∧ Bilirubin ∈ [0.79, 3.43) ∧ Cholesterol ∈ (−∞ , 345) ∧
Hepatomegaly = 0 ∧ Spiders = 0
Albumin ∈ [3.82, ∞) ∧ Bilirubin ∈ [0.79, 3.43) ∧ Cholesterol ∈ (−∞ , 345) ∧
Hepatomegaly = 1 ∧ Spiders = 0
ALP ∈ (−∞ , 3668) ∧ Bilirubin ∈ [0.79, 3.43) ∧ Hepatomegaly = 0 ∧ N_Days ∈ [2343,
∞) ∧ SGOT ∈ [80, 144) ∧ Tryglicerides ∈ (−∞, 137)
ALP ∈ (−∞ , 366) ∧ Bilirubin ∈ [0.79, 3.43) ∧ Drug = 1 ∧ Hepatomegaly = 0 ∧ N_Days
∈ (−∞, 2152) ∧ SGOT ∈ [80, 144) ∧ Tryglicerides ∈ [172, ∞)
Albumin ∈ [3.82, ∞) ∧ ALP ∈ (−∞ , 3668) ∧ Drug = 1 ∧ Hepatomegaly = 0 ∧ N_Days
∈ [2343, ∞) ∧ SGOT ∈ (−∞, 80) ∧ Tryglicerides ∈ [172, ∞)
ALP ∈ (−∞ , 3668) ∧ Bilirubin ∈ [0.79, 3.43) ∧ Drug = 1 ∧ Hepatomegaly = 0 ∧ N_Days
∈ (−∞ , 2152) ∧ Prothrombin ∈ (−∞ , 10.52) ∧ SGOT ∈ [80, 144) ∧ Tryglicerides ∈
[172, ∞)
Albumin ∈ [3.82, ∞) ∧ ALP ∈ (−∞ , 3668) ∧ Bilirubin ∈ [0.79, 3.43) ∧ Drug = 1 ∧
Hepatomegaly = 0 ∧ N_Days ∈ [2343, ∞) ∧ Prothrombin ∈ (−∞ , 10.52) ∧ SGOT ∈
[80, 144) ∧ Tryglicerides ∈ (−∞, 137)</p>
          <p>These prototypes highlight clinically relevant signals, such as low bilirubin, no hepatomegaly, and
shorter hospital stays (N_Days), as contributors to the classification. Furthermore, the inclusion of
interaction patterns, such as elevated triglycerides in the context of certain ranges of liver enzymes,
reflects how the network captures more nuanced decision logic than simple thresholding.
1. Similarity: 0.864</p>
          <p>Prototype: N_Days ∈ (−∞, 2152) ∧ Drug = 1 ∧ Hepatomegaly = 0 ∧ Bilirubin ∈ [0.79, 3.43) ∧
ALP ∈ (−∞, 366) ∧ SGOT ∈ [80, 144) ∧ Tryglicerides ∈ [172, ∞)
2. Similarity: 0.846</p>
          <p>Prototype: Hepatomegaly = 0 ∧ Spiders = 0 ∧ Cholesterol ∈ [667, ∞) ∧ Copper ∈ [103.76, ∞)
3. Similarity: 0.834</p>
          <p>Prototype: N_Days ∈ [2343, ∞) ∧ Hepatomegaly = 0 ∧ Bilirubin ∈ [0.79, 3.43) ∧
ALP ∈ (−∞, 3668) ∧ SGOT ∈ [80, 144) ∧ Tryglicerides ∈ (−∞, 137)
4. Similarity: 0.824</p>
          <p>Prototype: N_Days ∈ [2343, ∞) ∧ Drug = 1 ∧ Hepatomegaly = 0 ∧ Bilirubin ∈ [0.79, 3.43) ∧
Albumin ∈ [3.82, ∞) ∧ ALP ∈ (−∞, 3668) ∧ SGOT ∈ [80, 144) ∧ Tryglicerides ∈ (−∞, 137) ∧
Prothrombin ∈ (−∞, 10.52)
5. Similarity: 0.798</p>
          <p>Prototype: N_Days ∈ (−∞, 2152) ∧ Drug = 1 ∧ Hepatomegaly = 0 ∧ Bilirubin ∈ [0.79, 3.43) ∧
ALP ∈ (−∞, 3668) ∧ SGOT ∈ [80, 144) ∧ Tryglicerides ∈ [172, ∞) ∧ Prothrombin ∈ (−∞, 10.52)
Although MEDIC’s prototype parts may resemble rules derived from DTs, they differ fundamentally
in how they are used for decision making. DTs require strict rule satisfaction, whereas MEDIC allows
partial matches to prototype parts, enabling more flexible and probabilistic reasoning. This tolerance
to incomplete matches can improve robustness to noise, missing values, and borderline cases, which
are common issues in clinical data due to measurement variability, incomplete testing, or inconsistent
documentation.</p>
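The contrast between strict rule satisfaction and partial matching can be sketched with a discretized patient record. The fractional similarity below is a deliberate simplification of MEDIC's latent-space comparison, and the feature bins are illustrative:

```python
# Discretized patient record: feature -> assigned bin / binary value
patient = {"Bilirubin": 1, "Hepatomegaly": 0, "Drug": 1, "Spiders": 1}

# A prototype part as a conjunction of conditions
part = {"Bilirubin": 1, "Hepatomegaly": 0, "Spiders": 0}

# Decision-tree style: the rule fires only if every condition holds
strict_match = all(patient.get(f) == v for f, v in part.items())

# MEDIC style: partial credit for the fraction of satisfied conditions
partial_match = sum(patient.get(f) == v for f, v in part.items()) / len(part)

print(strict_match, round(partial_match, 2))  # → False 0.67
```

Under strict matching the single unsatisfied condition (Spiders) discards the rule entirely; partial matching still credits the patient's similarity to the pattern, which is what makes the model tolerant to borderline and incomplete cases.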
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>This work introduced MEDIC, a novel prototype parts-based neural network architecture that transforms
the approach to interpretability in machine learning for medical tabular data. Unlike conventional
post-hoc explanation methods that retrospectively justify black-box decisions, MEDIC represents a
paradigm shift toward inherently interpretable models that mimic clinical reasoning patterns. The core
innovation lies in our three-component architecture: (1) differentiable discretization that aligns with
medical thresholds, (2) sparse patching masks that identify clinically meaningful feature combinations,
and (3) prototype-based reasoning that grounds predictions in case-based comparisons, all unified
within an end-to-end trainable framework.</p>
      <p>Evaluation across three clinical datasets demonstrated that MEDIC achieves competitive and
sometimes superior predictive performance compared to established methods while providing transparent
decision processes. In particular, the model autonomously discovered discretization thresholds that
closely align with clinically recognized reference ranges, as evidenced in our cirrhosis case study where
albumin and prothrombin intervals closely matched established medical guidelines. Furthermore, the
prototype parts learned by the model reflected combinations of features that correspond to recognizable
diagnostic patterns, suggesting that MEDIC captures meaningful representations of clinical knowledge.</p>
      <p>The implications of this work extend beyond technical innovation only. By bridging the gap between
accuracy and interpretability, MEDIC addresses a critical barrier to AI adoption in healthcare, the lack
of interpretability, which undermines the trust of clinicians and regulatory acceptance. Our approach
supports collaborative human-AI decision making where the model’s reasoning can be verified, critiqued,
and integrated with clinical expertise.</p>
      <p>The interpretability of MEDIC is achieved by grounding each prediction in prototypical parts –
concise, clinically meaningful feature patterns drawn from real patient data and presented in natural,
domain-specific language. Such clarity is essential for building trust, enabling clinicians to understand
and validate the model’s reasoning, and ensuring that AI-assisted decisions can be confidently integrated
into medical practice.</p>
      <p>
        Several promising directions emerge for future research. First, incorporating domain-specific prior
knowledge into the prototype learning process could further align the model’s representations with
established medical understanding. Second, investigating methods for dynamic prototype adaptation
could enable the model to update its learned representations in response to changes in symptoms over
time. This drift in symptoms may result from evolving disease variants, treatment effects, or changes in
how diseases present across populations, as seen with COVID-19 [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Finally, conducting rigorous
user studies with medical professionals would assess the practical usefulness and cognitive accessibility
of the explanations generated by MEDIC, providing valuable insights into how this approach affects
clinical decision making and how the model’s explanations could be further optimized for maximum
utility.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This research was funded in part by National Science Centre, Poland OPUS grant no.
2023/51/B/ST6/00545 and in part by PUT SBAD 0311/SBAD/0752 grant.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
<p>The authors have not used generative AI tools in the creation of this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Christodoulou</surname>
          </string-name>
          , J. Ma, G. S. Collins,
          <string-name>
            <given-names>E. W.</given-names>
            <surname>Steyerberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Verbakel</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. Van Calster</surname>
          </string-name>
          ,
          <article-title>A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models</article-title>
          ,
          <source>Journal of Clinical Epidemiology</source>
          <volume>110</volume>
          (
          <year>2019</year>
          )
          <fpage>12</fpage>
          -
          <lpage>22</lpage>
. doi:10.1016/j.jclinepi.2019.02.004.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Podgorelec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kokol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stiglic</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Rozman</surname>
          </string-name>
          ,
          <article-title>Decision trees: An overview and their use in medicine</article-title>
          ,
          <source>Journal of medical systems 26</source>
          (
          <year>2002</year>
          )
          <fpage>445</fpage>
          -
          <lpage>63</lpage>
. doi:10.1023/A:1016409317640.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          , Random forests,
          <source>Machine Learning</source>
          <volume>45</volume>
          (
          <year>2001</year>
          )
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
. doi:10.1023/A:1010933404324.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Vlachas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Damianos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gousetis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Mouratidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kelepouris</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.-F. Kollias</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Asimopoulos</surname>
            ,
            <given-names>G. F.</given-names>
          </string-name>
          <string-name>
            <surname>Fragulis</surname>
          </string-name>
          ,
          <article-title>Random forest classification algorithm for medical industry data</article-title>
          ,
          <source>SHS Web of Conferences</source>
          <volume>139</volume>
          (
          <year>2022</year>
          )
          <article-title>03008</article-title>
. doi:10.1051/shsconf/202213903008.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bharati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R. H.</given-names>
            <surname>Mondal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Podder</surname>
          </string-name>
          ,
          <article-title>A review on explainable artificial intelligence for healthcare: Why, how</article-title>
          , and when?,
          <source>IEEE Transactions on Artificial Intelligence</source>
          <volume>5</volume>
          (
          <year>2024</year>
          )
          <fpage>1429</fpage>
          -
          <lpage>1442</lpage>
. doi:10.1109/TAI.2023.3266418.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bodria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Naretto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rinzivillo</surname>
          </string-name>
          ,
          <article-title>Benchmarking and survey of explanation methods for black box models</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>37</volume>
          (
          <year>2023</year>
          )
          <fpage>1719</fpage>
          -
          <lpage>1778</lpage>
. doi:10.1007/s10618-023-00933-9.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>in: Proceedings of the 31st International Conference on Neural Information Processing Systems</source>
          , NIPS'17, Curran Associates Inc.,
          <year>2017</year>
          , p.
          <fpage>4768</fpage>
          -
          <lpage>4777</lpage>
. doi:10.5555/3295222.3295230.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>"why should I trust you?": Explaining the predictions of any classifier</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , San Francisco, CA, USA, August 13-17,
          <year>2016</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Longo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brcic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cabitza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Confalonieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Ser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hayashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Khosravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lecue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Malgieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Páez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Samek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Speith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stumpf</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions</article-title>
          ,
          <source>Information Fusion</source>
          <volume>106</volume>
          (
          <year>2024</year>
          )
          <article-title>102301</article-title>
. doi:10.1016/j.inffus.2024.102301.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Molnar</surname>
          </string-name>
          ,
          <source>Interpretable Machine Learning</source>
          ,
          <volume>2</volume>
          <fpage>ed</fpage>
          .,
          <source>Independently published</source>
          ,
          <year>2022</year>
. URL: https://christophm.github.io/interpretable-ml-book.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bergen</surname>
          </string-name>
          ,
          <article-title>Prototype-based methods in explainable ai and emerging opportunities in the geosciences</article-title>
          ,
          <source>in: Int. Conf. on Machine Learning (ICML) 2024 AI for Science Workshop</source>
          , PLMR vol.
          <volume>235</volume>
          ,
          <year>2024</year>
. doi:10.48550/arXiv.2410.19856.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Barnett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <article-title>This looks like that: deep learning for interpretable image recognition</article-title>
          ,
          <source>in: Proceedings of the 33rd International Conference on Neural Information Processing Systems</source>
          , Curran Associates Inc., Red Hook, NY, USA,
          <year>2019</year>
          . doi:10.5555/3454287.3455088.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fürnkranz</surname>
          </string-name>
          ,
          <source>Decision Tree. In: Encyclopedia of Machine Learning</source>
          ,
          , Springer US, Boston, MA,
          <year>2010</year>
          , pp.
          <fpage>263</fpage>
          -
          <lpage>267</lpage>
          . doi:10.1007/978-0-387-30164-8_204.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>Xgboost: A scalable tree boosting system</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , KDD '16,
          <year>2016</year>
          . doi:10.1145/2939672.2939785.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Adeniran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Onebunne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>William</surname>
          </string-name>
          ,
          <article-title>Explainable ai (xai) in healthcare: Enhancing trust and transparency in critical decision-making</article-title>
          ,
          <source>World Journal of Advanced Research and Reviews</source>
          <volume>23</volume>
          (
          <year>2024</year>
          )
          <fpage>2647</fpage>
          -
          <lpage>2658</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Khanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. O.</given-names>
            <surname>Koyejo</surname>
          </string-name>
          ,
          <article-title>Examples are not enough, learn to criticize! criticism for interpretability</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>29</volume>
          ,
          , Curran Associates, Inc.,
          <year>2016</year>
          , pp.
          <fpage>2288</fpage>
          -
          <lpage>2296</lpage>
          . doi:10.5555/3157096.3157352.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Karolczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stefanowski</surname>
          </string-name>
          ,
          <article-title>A-PETE: Adaptive prototype explanations of tree ensembles</article-title>
          ,
          <source>in: Progress in Polish Artificial Intelligence Research</source>
          , volume
          <volume>5</volume>
          , Warsaw University of Technology,
          <year>2024</year>
          , pp.
          <fpage>2</fpage>
          -
          <lpage>8</lpage>
          . URL: https://pages.mini.pw.edu.pl/~estatic/pliki/PP-RAI_2024_proceedings.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>K-nearest neighbors rule combining prototype selection and local feature weighting for classification</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>243</volume>
          (
          <year>2022</year>
          )
          <fpage>108451</fpage>
          . doi:10.1016/j.knosys.2022.108451.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Beam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Kohane</surname>
          </string-name>
          ,
          <article-title>Big data and machine learning in health care</article-title>
          ,
          <source>JAMA</source>
          <volume>319</volume>
          (
          <year>2018</year>
          )
          <fpage>1317</fpage>
          -
          <lpage>1318</lpage>
          . doi:10.1001/jama.2017.18391.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Karolczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stefanowski</surname>
          </string-name>
          ,
          <article-title>This part looks alike this: identifying important parts of explained instances and prototypes</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2505.05597. arXiv:2505.05597.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>O.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <article-title>Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions</article-title>
          ,
          <source>in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence</source>
          , AAAI Press,
          <year>2018</year>
          . doi:10.5555/3504035.3504467.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L. A.</given-names>
            <surname>De Santi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. I.</given-names>
            <surname>Piparo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bargagna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Santarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Celi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Positano</surname>
          </string-name>
          ,
          <article-title>Part-prototype models in medical imaging: Applications and current challenges</article-title>
          ,
          <source>BioMedInformatics</source>
          <volume>4</volume>
          (
          <year>2024</year>
          )
          <fpage>2149</fpage>
          -
          <lpage>2172</lpage>
          . doi:10.3390/biomedinformatics4040115.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>G.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Stefenon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-C.</given-names>
            <surname>Yow</surname>
          </string-name>
          ,
          <article-title>The shallowest transparent and interpretable deep neural network for image recognition</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>15</volume>
          (
          <year>2025</year>
          )
          <fpage>13940</fpage>
          . doi:10.1038/s41598-025-92945-2.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Stefanowski</surname>
          </string-name>
          ,
          <article-title>Dealing with data difficulty factors while learning from imbalanced data</article-title>
          ,
          <source>In: Challenges in Computational Statistics and Data Mining</source>
          , Springer International Publishing, Cham,
          <year>2016</year>
          , pp.
          <fpage>333</fpage>
          -
          <lpage>363</lpage>
          . doi:10.1007/978-3-319-18781-5_17.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>D.</given-names>
            <surname>Brzezinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stefanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Susmaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Szczech</surname>
          </string-name>
          ,
          <article-title>Visual-based analysis of classification measures and their properties for class imbalanced problems</article-title>
          ,
          <source>Information Sciences</source>
          <volume>462</volume>
          (
          <year>2018</year>
          )
          <fpage>242</fpage>
          -
          <lpage>261</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bergstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bardenet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kégl</surname>
          </string-name>
          ,
          <article-title>Algorithms for hyper-parameter optimization</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>24</volume>
          ,
          , Curran Associates, Inc.,
          <year>2011</year>
          . doi:10.5555/2986459.2986743.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>V.-T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Porcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Pane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ravaud</surname>
          </string-name>
          ,
          <article-title>Course of post covid-19 disease symptoms over time in the compare long covid prospective e-cohort</article-title>
          ,
          <source>Nature Communications</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>1812</fpage>
          . doi:10.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>