A Novel Feature Vector for ECG Classification using Deep Learning
Oleksii Kovalchukᵃ, Pavlo Radiukᵃ, Oleksander Barmakᵃ, Sergii Petrovskyiᵃ and Iurii Krakᵇ,ᶜ
ᵃ Khmelnytskyi National University, 11, Instytuts’ka str., Khmelnytskyi, 29016, Ukraine
ᵇ Taras Shevchenko National University of Kyiv, 64/13, Volodymyrska str., Kyiv, 01601, Ukraine
ᶜ Glushkov Cybernetics Institute, 40, Glushkov ave., Kyiv, 03187, Ukraine


                 Abstract
                 In the past decade, deep learning techniques have been widely used in the healthcare
                 industry to detect heartbeats and diagnose heart conditions. However, these tools have
                 been criticized for being a “black box” and lacking transparency. Therefore, in this paper,
                 we propose a new approach to making the classification results obtained by deep learning
                 more comprehensible. We suggest forming a vector of features based on ECG signals that
                 correspond to specific heart conditions. This vector includes measurable characteristics of
                 the cardiac cycle, such as wave durations and amplitudes, which are typical and
                 understandable to healthcare professionals. This feature vector serves as input data for a
                 deep neural network that acts as a feature encoder and classifier. Our computational
                 experiments with the handcrafted feature vector achieved an average accuracy of 98.69%,
                 comparable to other deep learning tools based on the complete cardiac cycle. The results
                 of this study suggest that future research should focus on developing interpretable deep
                 learning tools that are transparent and comprehensible to healthcare professionals.

                 Keywords
                 Electrocardiogram signals, MIT-BIH arrhythmia database, feature extraction, deep
                 learning, explainable artificial intelligence

1. Introduction
    Electrocardiography is a commonly used method in diagnosing heart disease because it is a
straightforward and dependable way to monitor heart muscle activity. This process produces a visual
record of changes in electrical potentials caused by heart muscle excitation, known as an
electrocardiogram (ECG) [1]. In medical practice, ECG is an effective tool for identifying heart issues
like an irregular heartbeat or arrhythmia, which can lead to life-threatening heart diseases like
myocarditis or cardiosclerosis [2]. Prompt detection of arrhythmia is essential for successful treatment,
but manually analyzing ECG signals to identify arrhythmia under different conditions can be both time-
consuming and prone to errors [3].
    Recently, artificial intelligence (AI) methods and tools, particularly machine learning (ML) [4] and
deep learning (DL) [5], have been actively used to automate the diagnosis of arrhythmia based on ECG
signals. However, DL-based arrhythmia detection methods typically require large amounts of training
data to achieve satisfactory results, and there is a shortage of well-annotated ECG data in the public
domain for training multi-layer DL models [6]. Therefore, it is essential to develop new approaches to
preparing input data and forming sets of target features for an automated image classifier. To evaluate
the effectiveness of arrhythmia detection based on ECG signals, the Association for the Advancement of
Medical Instrumentation (AAMI) has established a standard [7] (ANSI/AAMI EC57:2012). However,
not all DL-based approaches to arrhythmia detection meet the AAMI standard. Some approaches report
near-perfect classification accuracy (>99%) [5], [8], but their results can be misleading and unreliable
when evaluated on the MIT-BIH database [9]: further experiments have shown that classification
accuracy decreases significantly when models are evaluated according to the AAMI guidelines.

IntelITSIS’2023: 4th International Workshop on Intelligent Information Technologies & Systems of Information Security, March 22–24, 2023, Khmelnytskyi, Ukraine
EMAIL: losha.kovalchyk1998@gmail.com (O. Kovalchuk); radiukpavlo@gmail.com (P. Radiuk); alexander.barmak@gmail.com (O. Barmak); petrovskijs69@gmail.com (S. Petrovskyi); yuri.krak@gmail.com (I. Krak).
ORCID: 0000-0001-9828-0941 (O. Kovalchuk); 0000-0003-3609-112X (P. Radiuk); 0000-0003-0739-9678 (O. Barmak); 0000-0002-0590-0484 (S. Petrovskyi); 0000-0002-8043-0785 (I. Krak).
© 2023 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

    Detecting heart pathologies is a critical task in the healthcare field, and it is essential not only to
obtain accurate results but also to explain how these results were obtained [10]. Thus, the scientific
community has emphasized the importance of developing AI-based systems for healthcare that are
based on principles of trust [11], which have been formalized into the concept of FATE (Fairness,
Accountability, Transparency, Ethics) in AI [12]. In other words, healthcare professionals should be
able to understand how AI results for medical issues were generated and have the tools to trust these
results. This issue is a significant challenge that needs to be considered carefully when developing AI
methods for healthcare [13].
    In this work, we consider the interpretability of AI decision-making for ECG signal classification,
defined as follows: the user of the system must be able to understand which features of the ECG signal
the AI method relied on to reach a particular decision about the possible presence of heart pathologies.
The proposed approach feeds the neural network a vector of features that represents the signal in a
manner similar to how healthcare professionals interpret it. This differs from previous approaches,
which used the entire cardiac cycle signal as input data. The feature vector is designed so that the deep
neural network performs no worse, in terms of classification metrics, than when it receives the full ECG
signal.
    The study presented here offers several important scientific contributions:
    •    We proposed a novel approach for representing ECG signals as a feature vector that uses
    characteristics commonly used by healthcare professionals.
    •    We created a new subset of data from the MIT-BIH database using the inter-patient paradigm.
    •    We showed that our approach, which uses ECG fragments, achieves classification accuracy
    similar to other methods that use the complete cardiac cycle signal.
    The article’s structure is as follows: Section 2 provides an analytical overview of deep learning
methods and techniques for processing ECG signals. Section 3 explains the data preprocessing and the
proposed approach for identifying heart pathologies from ECG signals. Section 4 presents the
experimental results on a benchmark dataset, including implementation details, and compares the
proposed approach with state-of-the-art methods. Finally, Section 5 summarizes the research findings.

2. Related works
    The process of using AI to analyze ECG signals generally involves several stages, as outlined in [8]:
1) preprocessing and noise reduction of the ECG signal, 2) segmenting heart contractions, 3) extracting
features, 4) training a model, and 5) classifying ECG signals using the trained model. Recently,
researchers have focused on identifying features from isolated fragments of the ECG signal (such as
pre-selected cardiac cycles) and building a DL model based on these features. Such an approach can
result in higher accuracy due to the detection of hidden features by the DL model. However, the
resulting model may be perceived as a “black box,” which may not provide the end user with an
understanding of the criteria used to make decisions about the classification of ECG signals.
    In recent years, researchers have focused on developing methods for processing ECG signals. For
instance, one study proposed a modified and extended Kalman filter structure [14] that can be used to
reduce noise in the ECG signal and to compress it. Another study [15] presented a method for
segmenting heart contractions using wavelet transforms (WTs), which can detect the QRS complex even
in the presence of tall P or T waves, strong noise, or baseline drift. In addition, a new fractional WT
technique was proposed in [16] to extract features from ECG signals, which were then used with
traditional ML algorithms to detect peaks and segments.
    In the past few years, researchers have proposed various methods to detect heart pathologies from
ECG signals. For example, [17] proposes a system that uses a combination of RR intervals, signal
morphology, and higher-order statistics. In [18], a system is presented that classifies arrhythmias based
on normalized RR intervals and morphological features using VP and regression modeling.
Additionally, [19] utilized a support vector machine classifier with reduced features derived from linear
discriminant analysis. In [20], the authors used a hidden Markov model of continuous signals for
arrhythmia detection, which combines temporal information and statistical knowledge of the ECG
signal. In [21], an end-to-end convolutional neural network (CNN) is proposed that directly accepts raw
ECG signals as input without prior feature selection. The construction of CNN architectures for
analyzing medical images and signals is described in [22]. Furthermore, [23] proposes a CNN-based
encoder-decoder (autoencoder) that classifies heartbeats from ECG signals using temporal and
statistical characteristics. In [24], a new approach based on the DenseNet architecture achieved over
95% accuracy in classifying ECG signals with signs of heart pathology, with class activation gradient
maps visualizing the specific ECG leads and parts of ECG waves that most strongly affect predictions.
    In this study, we have taken a different approach to processing ECG signals than previous studies,
such as [16], [21], and [23]. We created a feature vector based on pre-selected fragments of the ECG
signal using the “human-in-the-loop” principle. An autoencoder CNN was then applied to this vector to
detect hidden dependencies in the original ECG signal. As a result, the accuracy of the built model is
comparable to that of a model that accepts the full ECG signal as input. This approach can therefore
provide both the high classification accuracy inherent in DL methods and a mechanism for identifying
heart pathologies that is understandable to healthcare professionals.

3. Methodology of research
   Our research consists of the following steps: data preprocessing, feature extraction, and signal
classification. The stages of the utilized methodology are shown in Fig. 1.


Figure 1: Scheme of the methodology implementing the proposed approach.

3.1.    Dataset and data preprocessing
   For experimental research, this work utilized the MIT-BIH arrhythmia database [9], which contains
48 half-hour segments of two-channel ambulatory ECG recordings taken from 47 patients and sampled
at 360 Hz. Two or more cardiologists independently annotated each heartbeat in the dataset. According
to the AAMI guidelines, the fourteen original heartbeat types are grouped into seven categories, as
depicted in Fig. 2.


Figure 2: A scheme of the subset of the MIT-BIH database based on the inter-patient paradigm.

   The derivative set from the MIT-BIH database (Fig. 2) comprises seven types of heartbeats: 1) N –
normal beat; 2) V – premature ventricular contraction; 3) R – right bundle branch block beat; 4) L – left
bundle branch block beat; 5) A – atrial premature beat; 6) ! – ventricular flutter wave; 7) E – ventricular
escape beat.
   The MIT-BIH database can be divided into training and test subsets according to different
paradigms, namely intra-patient and inter-patient. Table 1 presents a detailed breakdown of the
data from the MIT-BIH database and the data used in this study.

Table 1
Heartbeat distribution by class of raw data in the MIT-BIH dataset.
 Data                             N          V        R        L       A       !      E       Total
 Full MIT-BIH set                 163,037    12,293   24,954   2,817   4,569   793    1,258   209,721
 Intra-patient, training (80%)    72,471     5,789    14,445   173     2,223   155    642     95,898
 Intra-patient, testing (20%)     18,118     1,447    2,889    346     556     310    161     23,827
 Inter-patient, training (DS1)    45,866     3,788    4,756    394     944     317    415     56,480
 Inter-patient, testing (DS2)     26,582     1,269    2,864    1,904   846     11     40      33,516


    Using ECG signals from the same patient in both the training and test sets is known as the
intra-patient paradigm, while using ECG signals from different patients is called the inter-patient
paradigm. The inter-patient paradigm avoids overlapping information: when the same patient contributes
beats to both sets, the model can learn that patient’s personal characteristics during training and then
recognize them during testing, which leads to inaccurate, overly optimistic classification results.
    In this study, only the derivative set of the MIT-BIH database was used according to the inter-patient
paradigm. The MIT-BIH subset was split into two groups, DS1 and DS2, based on patient identification
numbers. DS1 included patients 102, 107, 109, 110, 115, 116, 117, 120, 123, 125, 202, 204, 206, 208,
209, 216, 224, and 231, while DS2 included patients 101, 104, 106, 112, 114, 118, 124, 126, 201, 203,
207, 213, 214, 215, 219, 222, 225, 229, and 232 [17]. DS1 was used as the training dataset, while DS2
was used as the testing dataset to evaluate the model.
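   The patient-wise split described above can be sketched in Python as follows. This is a minimal illustration, not the authors’ code: it assumes each beat is available as a hypothetical (record_id, beat) pair and routes it to DS1 or DS2 by record number, using the record lists exactly as given in the text. Beats from records in neither list are dropped.

```python
# Record IDs exactly as listed in the text for the inter-patient split.
DS1_RECORDS = {102, 107, 109, 110, 115, 116, 117, 120, 123, 125,
               202, 204, 206, 208, 209, 216, 224, 231}
DS2_RECORDS = {101, 104, 106, 112, 114, 118, 124, 126, 201, 203,
               207, 213, 214, 215, 219, 222, 225, 229, 232}


def inter_patient_split(beats):
    """Split hypothetical (record_id, beat) pairs into DS1 (train) and DS2 (test)."""
    # A record in both sets would leak patient-specific traits into testing.
    if not DS1_RECORDS.isdisjoint(DS2_RECORDS):
        raise ValueError("DS1 and DS2 must not share any record")
    train, test = [], []
    for record_id, beat in beats:
        if record_id in DS1_RECORDS:
            train.append(beat)
        elif record_id in DS2_RECORDS:
            test.append(beat)
        # beats from records outside both lists are ignored
    return train, test
```

   Splitting at the record level, rather than shuffling individual beats, is what keeps beats of one patient out of both sets at once.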
    Raw ECG signals from the MIT-BIH database typically contain noise, including myoelectric
artifacts and baseline drift. Therefore, we decomposed all ECG signals into six levels with the
Daubechies 6 (db6) wavelet to eliminate noise. The wavelet coefficients from the 3rd to the 6th level
were retained and used for signal reconstruction, as recommended in [15]. After denoising, the ECG
signals were segmented into heart contractions based on the R-peak locations provided in the
MIT-BIH arrhythmia database.
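   A minimal sketch of the db6 denoising step, using the PyWavelets package (pywt.wavedec and pywt.waverec). The text does not state exactly which coefficients are kept, so the sketch assumes that “levels 3 to 6” means keeping the detail coefficients D3–D6 together with the level-6 approximation, and zeroing D1 and D2, which at 360 Hz cover roughly 45–180 Hz. Zeroing the approximation as well would additionally suppress baseline drift, but that is not stated in the text.

```python
import numpy as np
import pywt


def wavelet_denoise(signal, wavelet="db6", level=6, keep_levels=range(3, 7)):
    """Reconstruct an ECG signal from a subset of its DWT detail levels.

    Assumption: detail levels outside `keep_levels` (D1, D2 by default) are
    zeroed; the level-6 approximation is kept unchanged.
    """
    signal = np.asarray(signal, dtype=float)
    # For level=6, wavedec returns [A6, D6, D5, D4, D3, D2, D1].
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    for idx in range(1, len(coeffs)):
        detail_level = level - idx + 1  # coeffs[1] is D6, coeffs[-1] is D1
        if detail_level not in keep_levels:
            coeffs[idx] = np.zeros_like(coeffs[idx])
    # waverec can return one extra sample for odd-length input; trim it.
    return pywt.waverec(coeffs, wavelet)[: len(signal)]
```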

3.2.      Feature vector
   The research provides a novel approach to feature extraction that employs the “human-in-the-loop”
principle. The proposed technique feeds a classifier with a feature vector consisting of quantitative
properties of the ECG signal that are relevant to healthcare professionals in diagnosing heart diseases
and are easily understood by them. The approach comprises the following stages:
   •     Identifying the primary characteristics of the ECG signal, such as peaks and wave boundaries,
   using the neurokit2 library [25].
   •     Using the coordinates of the detected peaks and boundaries to create a list of cardiac cycles.
   Cardiac cycles with incorrectly defined peaks or boundaries were excluded, which was essential for
   constructing a valid feature vector.
   •     Forming a vectorized representation of each cardiac cycle. A vector representation is created
   for each derived cardiac cycle of 10 ms. This vector includes several amplitudes and durations of
   cardiac cycle waves, which are schematically illustrated in Fig. 3.
Figure 3: Amplitudes and durations of cardiac cycle waves included in the proposed feature vector:
(a) P-wave amplitude; (b) P-wave duration, ms; (c) PQ-interval duration, ms; (d) Q-wave amplitude;
(e) R-wave amplitude; (f) S-wave amplitude; (g) QRS-complex duration, ms; (h) ST-segment duration;
(i) T-wave amplitude; (j) T-wave duration, ms
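   The cycle-selection stage can be illustrated with a small Python sketch. It assumes that each detected cardiac cycle is a dict mapping hypothetical fiducial names to sample indices, with undetected points stored as NaN or None (delineators such as neurokit2’s ecg_delineate typically report missing waves as NaN). A cycle is kept only if all points are present and occur in physiological order; this ordering rule is an illustrative criterion, since the text does not state which checks were applied.

```python
import math

# Hypothetical fiducial names, in the order they must occur within one cycle.
FIDUCIAL_ORDER = ["p_on", "p_peak", "p_off", "q", "r", "s",
                  "t_on", "t_peak", "t_off"]


def is_valid_cycle(cycle):
    """Return True if every fiducial point is present and strictly ordered."""
    points = [cycle.get(name) for name in FIDUCIAL_ORDER]
    # Reject cycles with any undetected point (None or NaN).
    if any(p is None or math.isnan(p) for p in points):
        return False
    # Each point must come strictly after the previous one.
    return all(a < b for a, b in zip(points, points[1:]))


def filter_cycles(cycles):
    """Keep only cycles whose peaks and boundaries look correctly detected."""
    return [c for c in cycles if is_valid_cycle(c)]
```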
   The elements of the feature vector are briefly described below.

       1. P-wave amplitude.
       2. P-wave duration in milliseconds (ms).
       3. PQ-interval duration, ms.
       4. Q-wave amplitude.
       5. R-wave amplitude.
       6. S-wave amplitude.
       7. QRS-complex duration, ms.
       8. ST-segment duration.
       9. T-wave amplitude.
       10. T-wave duration, ms.
       11. Duration of the interval between the current and previous R peaks, ms (Fig. 4a).
       12. Duration of the interval between the current and following R peaks, ms (Fig. 4b).


Figure 4: (a) duration between the current and previous R peaks, ms; (b) duration between the current
and the following R peaks, ms.

       13. P-wave fragment, which lasts 80 ms and corresponds to 30 elements of the input signal of
           the cardiac cycle. These 30 elements are added to the feature vector.
       14. QRS-complex fragment, which lasts up to 100 ms and corresponds to 37 elements of the
           input signal of the cardiac cycle. These 37 elements are added to the feature vector.
       15. ST-segment fragment, which lasts up to 150 ms and corresponds to 56 elements of the
           input signal of the cardiac cycle. These 56 elements are added to the feature vector.
       16. T-wave fragment, which lasts 160 ms and corresponds to 60 elements of the input signal of
           the cardiac cycle. These 60 elements are added to the feature vector.

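   As a result of the above steps, the feature vector contains 195 quantitative elements: the 12 scalar
amplitudes and durations (items 1–12) plus 30 + 37 + 56 + 60 = 183 signal samples (items 13–16).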
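   The sketch below shows one way to assemble the 195-element vector from a denoised cycle. It rests on assumptions not stated in the text: fiducial points come as a hypothetical dict of sample indices, amplitudes are raw sample values, the PQ interval runs from P onset to QRS onset and the ST segment from QRS offset to T onset, and each fragment starts at the corresponding onset and has the sample count listed in items 13–16.

```python
import numpy as np

FS = 360  # MIT-BIH sampling rate, Hz

# Fragment start point -> number of samples, as listed in items 13-16.
FRAGMENT_LENGTHS = {"p_on": 30, "qrs_on": 37, "qrs_off": 56, "t_on": 60}


def to_ms(n_samples, fs=FS):
    return 1000.0 * n_samples / fs


def build_feature_vector(x, f, fs=FS):
    """Assemble the 195-element feature vector for one cardiac cycle.

    x: denoised signal; f: hypothetical dict of fiducial sample indices
    (p_on, p_peak, p_off, qrs_on, q, r, s, qrs_off, t_on, t_peak, t_off,
    r_prev, r_next).
    """
    x = np.asarray(x, dtype=float)
    scalars = [
        x[f["p_peak"]],                         # 1. P-wave amplitude
        to_ms(f["p_off"] - f["p_on"], fs),      # 2. P-wave duration, ms
        to_ms(f["qrs_on"] - f["p_on"], fs),     # 3. PQ interval, ms
        x[f["q"]],                              # 4. Q-wave amplitude
        x[f["r"]],                              # 5. R-wave amplitude
        x[f["s"]],                              # 6. S-wave amplitude
        to_ms(f["qrs_off"] - f["qrs_on"], fs),  # 7. QRS duration, ms
        to_ms(f["t_on"] - f["qrs_off"], fs),    # 8. ST-segment duration
        x[f["t_peak"]],                         # 9. T-wave amplitude
        to_ms(f["t_off"] - f["t_on"], fs),      # 10. T-wave duration, ms
        to_ms(f["r"] - f["r_prev"], fs),        # 11. previous RR interval, ms
        to_ms(f["r_next"] - f["r"], fs),        # 12. next RR interval, ms
    ]
    # 13-16: raw samples of the P, QRS, ST and T fragments.
    fragments = [x[f[start]: f[start] + n] for start, n in FRAGMENT_LENGTHS.items()]
    vec = np.concatenate([np.asarray(scalars, dtype=float)] + fragments)
    if vec.size != 195:
        raise ValueError(f"expected 195 elements, got {vec.size}")
    return vec
```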

3.3.    Feature extraction and classification
   The extraction of hidden features from the constructed feature vector and the subsequent
classification of heartbeats are performed by an autoencoder-type CNN [22], whose scheme is shown in Fig. 5.
Figure 5: The scheme of the utilized CNN.

   The CNN used in this study comprises nine layers: four convolutional layers, two subsampling
layers, two fully connected layers, and one SoftMax classification layer. The convolutional layers
C1, C2, C3, and C4 use convolution kernels of sizes 5, 5, 3, and 3, respectively. The convolution
operation is expressed mathematically as follows.
$$x_k^{l} = \mathrm{ReLU}\left(\sum_{i=1}^{M_k} x_i^{l-1} * w_{ik} + b_k\right), \tag{1}$$

where $x_k^{l}$ is the output value of the k-th neuron in the l-th layer, $M_k$ is the effective range of the
convolution kernel, $x_i^{l-1}$ is the output value of the i-th neuron in the (l-1)-th layer, $b_k$ is the
bias of the k-th neuron in the l-th layer, $w_{ik}$ is the kernel weight between the i-th neuron in
the (l-1)-th layer and the k-th neuron in the l-th layer, and ReLU is the activation function.
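   To make Eq. (1) concrete, here is a NumPy sketch of one convolutional layer followed by ReLU. It assumes valid padding and stride 1, which the text does not specify, and, as in most deep-learning frameworks, implements the `*` operator as cross-correlation (no kernel flip).

```python
import numpy as np


def conv1d_relu(x, w, b):
    """Convolutional layer of Eq. (1) with valid padding and stride 1.

    x: (L, C_in) outputs of layer l-1; w: (K, C_in, C_out) kernels;
    b: (C_out,) biases. Returns an array of shape (L - K + 1, C_out).
    """
    L, _ = x.shape
    K, _, c_out = w.shape
    out = np.empty((L - K + 1, c_out))
    for t in range(L - K + 1):
        window = x[t:t + K]  # (K, C_in) receptive field
        # Sum over kernel positions and input channels for every output k.
        out[t] = np.tensordot(window, w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)  # ReLU
```

   Under these assumptions, a layer with kernel size 5 (as in C1) applied to the 195-element feature vector shaped (195, 1) produces 191 outputs per channel.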
   In this study, the subsampling layers S1 and S2 of the CNN architecture use the MaxPool function.
These layers reduce the input size for the next layer and decrease the dimensionality of the ECG
signals to lessen the computational load. The MaxPool function in the subsampling layer is
formalized as follows.

$$x_k^{l} = \mathrm{subsample}\left(x_{k,\mathrm{cluster}}^{l-1}\right), \tag{2}$$

where $x_k^{l}$ is the output value of the k-th neuron in the l-th layer, subsample is the subsampling
operation (for MaxPool, the maximum over the pooling window), and $x_{k,\mathrm{cluster}}^{l-1}$ is the
output value of the k-th cluster in the (l-1)-th layer.
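   With MaxPool, Eq. (2) reduces each non-overlapping window (“cluster”) to its maximum. A minimal NumPy sketch, assuming a pool size of 2, which the text does not give:

```python
import numpy as np


def max_pool1d(x, pool=2):
    """Subsampling layer of Eq. (2) using MaxPool over non-overlapping windows.

    x: (L, C). Each cluster is `pool` consecutive positions; trailing
    positions that do not fill a whole window are dropped.
    """
    L, C = x.shape
    n = L // pool
    return x[: n * pool].reshape(n, pool, C).max(axis=1)
```

   With a pool size of 2, a 191-step feature map would shrink to 95 steps, roughly halving the work of the following layer.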
   The purpose of the fully connected layers F1 and F2 in the CNN architecture is to increase the number
of nonlinear operations. The fully connected operation is expressed as follows.

$$x_k^{l} = f\left(\sum_{i=1}^{N} x_i^{l-1} \cdot w_{ik} + b_k\right), \tag{3}$$

where $x_k^{l}$ is the output value of the k-th neuron in the l-th layer, $x_i^{l-1}$ is the output value of
the i-th neuron in the (l-1)-th layer, $b_k$ is the bias of the k-th neuron in the l-th layer, $w_{ik}$ is
the weight between the i-th neuron in the (l-1)-th layer and the k-th neuron in the l-th layer, $f$ is the
activation function, and N is the total number of neurons in the (l-1)-th layer.
   The output layer of the CNN uses the SoftMax activation function with seven outputs, one per
heartbeat type:

$$P(y = i \mid X) = \frac{e^{X^{T} w_i}}{\sum_{j=1}^{7} e^{X^{T} w_j}}, \quad i = 1, \dots, 7, \tag{4}$$

where $X$ is the output vector of the final fully connected layer and $w_i$ is the weight vector
connecting the final fully connected layer to the i-th output neuron.
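   Eqs. (3) and (4) amount to an affine map followed by an activation, with SoftMax at the output. The NumPy sketch below assumes ReLU for the unspecified activation f and subtracts the largest logit before exponentiating, which leaves the probabilities of Eq. (4) unchanged but avoids overflow.

```python
import numpy as np


def dense(x, w, b, activation=lambda z: np.maximum(z, 0.0)):
    """Fully connected layer of Eq. (3); ReLU is assumed for f.

    x: (N,) outputs of layer l-1; w: (N, K) weights w_ik; b: (K,) biases.
    """
    return activation(x @ w + b)


def softmax_output(X, W):
    """Class probabilities of Eq. (4); column i of W is the vector w_i."""
    z = X @ W
    z = z - z.max()  # numerical stability; probabilities are unchanged
    e = np.exp(z)
    return e / e.sum()
```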
   More details about the utilized CNN architecture are presented in our previous work [26].

3.4.    Evaluation criteria and experiment setup
   We denote the number of positive and negative cases in the initial dataset as P and N,
respectively. When a classifier is applied to the dataset, the objects are sorted into true positive (TP),
true negative (TN), false positive (FP), and false negative (FN) categories; for the seven-class problem,
these counts are taken for each class against all other classes. This study assessed the proposed
approach using the following statistical metrics.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{5}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{7}$$

$$F_1 = \frac{2TP}{2TP + FP + FN}. \tag{8}$$
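   Because Table 2 reports these metrics per class, they can be computed one-vs-rest from the multi-class confusion matrix. The NumPy sketch below does this; it also returns the overall accuracy as the share of beats on the diagonal, which is assumed here to be how the overall accuracy in Section 4 is defined.

```python
import numpy as np


def per_class_metrics(cm):
    """One-vs-rest Precision, Recall and F1 (Eqs. 6-8) plus overall accuracy.

    cm[t, p] counts samples of true class t predicted as class p.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, true class differs
    fn = cm.sum(axis=1) - tp  # true class, predicted as another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    accuracy = tp.sum() / cm.sum()  # fraction of all samples on the diagonal
    return precision, recall, f1, accuracy
```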
    The network was trained with the Adam optimizer for 10 epochs. We set the training parameters
based on our previous research [22], [26]: the learning rate was between 0.0001 and 0.001, the weight
decay was 0.0005, the momentum was 0.85, and the batch size was 64. Training took about 155 minutes
on a single GPU.
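   The optimizer settings can be illustrated with a NumPy implementation of a single Adam update. This is an interpretation rather than the authors’ configuration: it assumes that “momentum 0.85” refers to Adam’s first-moment coefficient β1, that weight decay is applied as a classic L2 term added to the gradient, and that β2 and ε take common default values (0.999 and 1e-8).

```python
import numpy as np


def adam_step(w, grad, state, lr=1e-3, beta1=0.85, beta2=0.999,
              eps=1e-8, weight_decay=5e-4):
    """One Adam update with an L2 weight-decay term added to the gradient.

    `state` holds the step counter "t" and moment estimates "m" and "v".
    """
    grad = grad + weight_decay * w
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
```

   In TensorFlow these choices would be passed to the framework’s optimizer instead of being coded by hand; the explicit version only makes the role of each parameter visible.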
    The computational experiments were performed with Python v3.9, neurokit2 [25], Scikit-learn [27],
and TensorFlow [28]. The experiments were run on a system with an eight-core Ryzen 2700 CPU,
32 GB RAM, and a single NVIDIA GeForce GTX 1080 GPU with 8 GB of video memory.

4. Results and discussion
   The proposed approach was tested using a stratified approach to create mutually exclusive training
and testing datasets. The training dataset was further divided into 90% for training and 10% for
validation, which are also mutually exclusive. The training and validation results are shown in Fig. 6.
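   The 90/10 division can be sketched as a per-class random split, assuming it is stratified by heartbeat class. With scikit-learn, train_test_split(X, y, test_size=0.1, stratify=y) has the same effect; the dependency-free NumPy version below makes the mechanics explicit.

```python
import numpy as np


def stratified_split(y, val_fraction=0.1, seed=0):
    """Return disjoint (train_idx, val_idx) that preserve class proportions."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(y):
        # Shuffle the indices of this class, then cut off the validation share.
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_val = int(round(val_fraction * len(idx)))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.sort(train_idx), np.sort(val_idx)
```

   Stratification matters most for the rare classes (! and E), which a purely random 10% draw could leave almost absent from the validation set.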
   In Fig. 6, the blue curve indicates the classification accuracy for the training set, while the green
curve represents the validation set.



Figure 6: Learning curves obtained by the CNN model built based on the proposed approach: (a)
accuracy and (b) loss.
    Both curves nearly overlap, indicating that the CNN model was trained effectively without
overfitting.
    The classification results of the proposed approach were compared with those of Gupta et al. [16],
Pokaprakarn et al. [21], and Sun et al. [23]. All three methods were originally introduced and evaluated
with the entire cardiac cycle as input data. The confusion matrices of the four approaches on the test
dataset are shown in Fig. 7.


Figure 7: Confusion matrices of heartbeat classification from ECG signals on the test dataset obtained by
the approaches of (a) Gupta et al., (b) Pokaprakarn et al., (c) Sun et al., and (d) the proposed approach.

   Ten experiments were conducted to confirm the stability and practical significance of the built CNN
model. Table 2 shows the statistical indicators obtained by the tested approaches on the test dataset.
   The proposed approach achieved an overall classification accuracy of 98.69%, while Gupta et al.
achieved 97.96%, Pokaprakarn et al. 98.12%, and Sun et al. 98.57%. These comparable accuracy values
indicate that the proposed model is stable and competitive. Notably, high accuracy was achieved both
with the entire cardiac cycle and with its individual fragments as input, demonstrating the practical
significance of the selected amplitudes and durations of cardiac cycle waves in the constructed
feature vector.
   Meanwhile, examining related works revealed opportunities to enhance the classification accuracy
of ECG signals and improve the transparency of the results obtained by using additional techniques.
One of the techniques is normalizing the ECG signal, which could assist the model in determining the
significance of individual signal fragments, ultimately increasing the model’s capability to detect
hidden patterns.
Table 2
Statistical indicators obtained by the testing approaches on the test dataset.
 Class   Approach                   Precision   Recall   F1     Number of samples
 N       Gupta et al. [16]          0.96        0.98     0.97   26582
         Pokaprakarn et al. [21]    0.98        1.0      0.99
         Sun et al. [23]            0.99        1.0      0.99
         Ours                       0.99        0.99     0.99
 V       Gupta et al. [16]          0.96        0.93     0.94   1269
         Pokaprakarn et al. [21]    0.96        0.95     0.95
         Sun et al. [23]            0.98        0.92     0.95
         Ours                       0.97        0.94     0.95
 R       Gupta et al. [16]          0.97        0.93     0.95   2864
         Pokaprakarn et al. [21]    1.0         0.91     0.95
         Sun et al. [23]            0.99        0.98     0.98
         Ours                       0.99        0.99     0.98
 L       Gupta et al. [16]          0.97        0.97     0.97   1904
         Pokaprakarn et al. [21]    1.0         0.99     0.99
         Sun et al. [23]            0.98        0.99     0.99
         Ours                       1.0         0.99     0.99
 A       Gupta et al. [16]          0.86        0.77     0.82   846
         Pokaprakarn et al. [21]    0.88        0.79     0.83
         Sun et al. [23]            0.89        0.79     0.84
         Ours                       0.85        0.84     0.85
 !       Gupta et al. [16]          0.62        0.33     0.49   11
         Pokaprakarn et al. [21]    0.67        0.36     0.47
         Sun et al. [23]            0.80        0.36     0.50
         Ours                       0.56        0.45     0.50
 E       Gupta et al. [16]          0.88        0.92     0.90   40
         Pokaprakarn et al. [21]    0.95        0.88     0.91
         Sun et al. [23]            1.0         0.78     0.87
         Ours                       0.85        0.97     0.91

   Furthermore, the model’s transparency could be enhanced by incorporating a person’s individual
characteristics, such as age, gender, weight, and height, into the constructed feature vector. These
characteristics are typically known to healthcare professionals during diagnosis but are usually omitted
in DL tools. Such modifications could improve both the classification accuracy and the mechanisms that
healthcare professionals use to detect heart disease pathologies.

5. Conclusion
    This study focuses on developing a novel approach to constructing a feature vector for detecting heart
diseases from ECG signals. This new approach aims to enhance the interpretability of the results
obtained. Unlike previous approaches that utilized the entire signal of the cardiac cycle as the input data
for neural networks, this approach inputs information about the signal in terms that doctors use. The
feature vector comprises discrete fragments of the cardiac cycle, along with the amplitude and duration
of the waves, which are crucial in diagnosing heart diseases. To test this approach, a derivative subset
of the MIT-BIH database was utilized under the inter-patient paradigm. The proposed approach
achieved an average accuracy of 98.69%, comparable to other methods based on a complete cardiac
cycle. High accuracy was achieved when using both the entire cardiac cycle and its individual
fragments, demonstrating the practical significance of the selected amplitudes and durations of cardiac
cycle waves in the constructed feature vector.
   Furthermore, an information system based on this approach could be implemented directly on
wearable devices for long-term ECG monitoring. Future research will aim to develop transparent and
understandable AI-based solutions for healthcare professionals.

6. References
[1] S. Kaplan Berkaya, A. K. Uysal, E. Sora Gunal, S. Ergin, S. Gunal, and M. B. Gulmezoglu, A
     survey on ECG analysis, Biomed. Signal Process. Control, vol. 43, pp. 216–235, May 2018,
     doi:10.1016/j.bspc.2018.03.003.
[2] K. M. Bonney, D. J. Luthringer, S. A. Kim, N. J. Garg, and D. M. Engman, Pathology and
     pathogenesis of Chagas heart disease, Annual Review of Pathology: Mechanisms of Disease, vol.
     14, no. 1, pp. 421–447, 2019, doi:10.1146/annurev-pathol-020117-043711.
[3] P. M. Tripathi, A. Kumar, R. Komaragiri, and M. Kumar, A review on computational methods for
     denoising and detecting ECG signals to detect cardiovascular diseases, Arch Computat Methods
     Eng, vol. 29, no. 3, pp. 1875–1914, May 2022, doi:10.1007/s11831-021-09642-2.
[4] I. Krak, A. Pashko, O. Khorozov, and O. Stelia, Physiological signals analysis, recognition
     and classification using machine learning algorithms, in Computer Modeling and Intelligent
     Systems, Zaporizhzhia, Ukraine, April 27–May 1, 2020, 2020, vol. 2608, pp. 955–965,
     doi:10.32782/cmis/2608-71.
[5] S. W. Chen, S. L. Wang, X. Z. Qi, S. M. Samuri, and C. Yang, Review of ECG detection and
     classification based on deep learning: Coherent taxonomy, motivation, open challenges and
     recommendations, Biomedical Signal Processing and Control, vol. 74, p. 103493, Apr. 2022,
     doi:10.1016/j.bspc.2022.103493.
[6] M. A. Reyna et al., Issues in the automated classification of multilead ECGs using heterogeneous
     labels and populations, Physiol. Meas., vol. 43, no. 8, p. 084001, Aug. 2022,
     doi:10.1088/1361-6579/ac79fd.
[7] ANSI/AAMI EC57:2012 (ANSI/AAMI EC 57:2012) - Testing and reporting performance results
     of cardiac rhythm and ST segment measurement algorithms, 2012. [Online]. Available:
     https://webstore.ansi.org/standards/aami/ansiaamiec572012ec57.
[8] S. K. Saini and R. Gupta, Artificial intelligence methods for analysis of electrocardiogram signals
     for cardiac abnormalities: State-of-the-art and future challenges, Artif Intell Rev, vol. 55, no. 2,
     pp. 1519–1565, Feb. 2022, doi:10.1007/s10462-021-09999-7.
[9] S. Kuila, N. Dhanda, and S. Joardar, Feature extraction and classification of MIT-BIH arrhythmia
     database, in Proceedings of the 2nd International Conference on Communication, Devices and
     Computing, Singapore, 2020, pp. 417–427, doi:10.1007/978-981-15-0829-5_41.
[10] J. Petch, S. Di, and W. Nelson, Opening the black box: The promise and limitations of explainable
     machine learning in cardiology, Can. J. Cardiol., vol. 38, no. 2, pp. 204–213, Sep. 2021,
     doi:10.1016/j.cjca.2021.09.004.
[11] E. Manziuk, O. Barmak, I. Krak, O. Mazurets, and T. Skrypnyk, Formal model of trustworthy
     artificial intelligence based on standardisation, in Proceedings of the 2nd International Workshop
     on Intelligent Information Technologies & Systems of Information Security (IntelITSIS-2021),
     Khmelnytskyi, Ukraine, March 24–26, 2021, vol. 2853, pp. 190–197. [Online]. Available:
     http://ceur-ws.org/Vol-2853/short18.pdf.
[12] A. Shaban-Nejad, M. Michalowski, J. S. Brownstein, and D. L. Buckeridge, Guest editorial
     explainable AI: Towards fairness, accountability, transparency and trust in healthcare, IEEE
     Journal of Biomedical and Health Informatics, vol. 25, no. 7, pp. 2374–2375, Jul. 2021,
     doi:10.1109/JBHI.2021.3088832.
[13] P. Radiuk, O. Kovalchuk, V. Slobodzian, E. Manziuk, and I. Krak, Human-in-the-loop approach
     based on MRI and ECG for healthcare diagnosis, in Proceedings of the 5th International
     Conference on Informatics & Data-Driven Medicine, Lyon, France, 18-20 November, 2022, vol.
     3302, pp. 9–20. [Online]. Available: https://ceur-ws.org/Vol-3302/paper1.pdf
[14] Z. Zhang, Q. Yu, Q. Zhang, N. Ning, and J. Li, A Kalman filtering based adaptive threshold
     algorithm for QRS complex detection, Biomedical Signal Processing and Control, vol. 58, p.
     101827, Apr. 2020, doi:10.1016/j.bspc.2019.101827.
[15] C. K. Jha and M. H. Kolekar, Cardiac arrhythmia classification using tunable Q-wavelet transform
     based features and support vector machine classifier, Biomedical Signal Processing and Control,
     vol. 59, p. 101875, May 2020, doi:10.1016/j.bspc.2020.101875.
[16] V. Gupta, M. Mittal, V. Mittal, A. K. Sharma, and N. K. Saxena, A novel feature extraction-based
     ECG signal analysis, J. Inst. Eng. India Ser. B, vol. 102, no. 5, pp. 903–913, Oct. 2021,
     doi:10.1007/s40031-021-00591-9.
[17] F. M. Dias, H. L. M. Monteiro, T. W. Cabral, R. Naji, M. Kuehni, and E. J. da S. Luz, Arrhythmia
     classification from single-lead ECG signals using the inter-patient paradigm, Computer Methods
     and Programs in Biomedicine, vol. 202, p. 105948, Apr. 2021, doi:10.1016/j.cmpb.2021.105948.
[18] N. Widatalla et al., Similarities between maternal and fetal RR interval tachograms and their
     association with fetal development, Frontiers in Physiology, vol. 13, 2022, Accessed: Feb. 04,
     2023. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fphys.2022.964755
[19] X. Tang, Z. Ma, Q. Hu, and W. Tang, A real-time arrhythmia heartbeats classification algorithm
     using parallel delta modulations and rotated linear-kernel support vector machines, IEEE
     Transactions on Biomedical Engineering, vol. 67, no. 4, pp. 978–986, Apr. 2020,
     doi:10.1109/TBME.2019.2926104.
[20] A. K. Sangaiah, M. Arumugam, and G.-B. Bian, An intelligent learning approach for improving
     ECG signal classification and arrhythmia analysis, Artificial Intelligence in Medicine, vol. 103, p.
     101788, Mar. 2020, doi:10.1016/j.artmed.2019.101788.
[21] T. Pokaprakarn, R. R. Kitzmiller, R. Moorman, D. E. Lake, A. K. Krishnamurthy, and M. R.
     Kosorok, Sequence to sequence ECG cardiac rhythm classification using convolutional recurrent
     neural networks, IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 2, pp. 572–580,
     Feb. 2022, doi:10.1109/JBHI.2021.3098662.
[22] I. Krak, O. Barmak, and P. Radiuk, Detection of early pneumonia on individual CT scans with
     dilated convolutions, in Proceedings of the 2nd International Workshop on Intelligent Information
     Technologies & Systems of Information Security (IntelITSIS-2021), Khmelnytskyi, Ukraine,
     March 24–26, 2021, vol. 2853, pp. 214–227. Accessed: May 09, 2021. [Online]. Available:
     http://ceur-ws.org/Vol-2853/
[23] L. Sun, Z. Zhong, Z. Qu, and N. Xiong, PerAE: An effective personalized autoencoder for ECG-
     based biometric in augmented reality system, IEEE Journal of Biomedical and Health Informatics,
     vol. 26, no. 6, pp. 2435–2446, Jun. 2022, doi:10.1109/JBHI.2022.3145999.
[24] V. Jahmunah, E. Y. K. Ng, R.-S. Tan, S. L. Oh, and U. R. Acharya, Explainable detection of
     myocardial infarction using deep learning models with Grad-CAM technique on ECG signals,
     Comput. Biol. Med., vol. 146, p. 105550, Jul. 2022, doi:10.1016/j.compbiomed.2022.105550.
[25] D. Makowski et al., NeuroKit2: A Python toolbox for neurophysiological signal processing, Behav
     Res, vol. 53, no. 4, pp. 1689–1696, Aug. 2021, doi:10.3758/s13428-020-01516-y.
[26] P. Radiuk, O. Barmak, and I. Krak, An approach to early diagnosis of pneumonia on individual
     radiographs based on the CNN information technology, The Open Bioinformatics Journal, vol.
     14, no. 1, pp. 92–105, Jun. 2021, doi:10.2174/1875036202114010093.
[27] F. Pedregosa et al., Scikit-learn: Machine learning in Python, arXiv, Jun. 05, 2018,
     doi:10.48550/arXiv.1201.0490.
[28] M. Abadi et al., TensorFlow: A system for large-scale machine learning, in Proceedings of the 12th
     USENIX Symposium on Operating Systems Design and Implementation, Savannah, GA, USA,
     2–4 November 2016, pp. 265–283. [Online]. Available:
     https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf