=Paper= {{Paper |id=Vol-3003/paper7 |storemode=property |title=Intelligent Evaluation of the Informative Features of Cardiac Studies Diagnostic Data using Shannon Method |pdfUrl=https://ceur-ws.org/Vol-3003/paper7.pdf |volume=Vol-3003 |authors=Kseniia Bazilevych,Serhii Krivtsov,Mykola Butkevych |dblpUrl=https://dblp.org/rec/conf/profitai/BazilevychKB21 }} ==Intelligent Evaluation of the Informative Features of Cardiac Studies Diagnostic Data using Shannon Method== https://ceur-ws.org/Vol-3003/paper7.pdf
Intelligent Evaluation of the Informative Features of Cardiac
Studies Diagnostic Data using Shannon Method
Kseniia Bazilevych, Serhii Krivtsov, Mykola Butkevych
 National Aerospace University “Kharkiv Aviation Institute”, Chkalow str., 17, Kharkiv, Ukraine

                Abstract
                The paper is devoted to the important issue of separating more informative data from less
                informative data for further analysis and use. This determines the relevance of the study. As a
                result of the study, the methods for assessing the informativeness of signs based on medical
                data were analyzed. On the basis of Shannon's method, a model for assessing information
                content has been built and a software package has been implemented. For the experimental
                study, data from 303 patients and 13 signs were used. The informative value was calculated
                for various groups of cardiac data. We found that the following signs are the most
                informative: tala, type of chest pain, colored vessels, angina pectoris, age. The Shannon
                method is also compared with other methods for assessing the informativeness of features.

                Keywords 1
                Features informativeness, Shannon method, diagnostics, heart disease, cardiac studies.

1. Introduction
   The COVID-19 pandemic caused by the SARS-CoV-2 coronavirus has become a real challenge
not only for health systems, but also for the economy around the world [1]. Announced March 11,
2020. It began with the discovery at the end of December 2019 in the city of Wuhan in the Hubei
province of central China. There are still no specific antiviral drugs for treatment or prevention
against the disease [2]. In severe cases, funds are used to maintain the functions of vital organs.
People of all ages are susceptible to infection. Severe forms of the disease are more likely to develop
in older people and in people with certain medical conditions, including asthma, diabetes, and heart
disease [3].
   The coronavirus pandemic has clearly demonstrated that we must act together and give our fight
against this crisis the necessary momentum to achieve the Sustainable Development Goals [4]. The
COVID-19 pandemic has accelerated the digitalization of all spheres of social activity [5]: education
[6], commerce [7], public administration [8], personnel management [9-10], logistics [11], etc.
Particular attention should be paid to the many approaches to digitalizing medicine [12]. In this area,
information technologies have been developed for insurance [13], decision-making [14], medical
diagnostics [15-16], epidemic control systems [17] and morbidity simulation [18]. In this article, we
will focus on the diagnostic problem that has arisen sharply in connection with the pandemic. There
are not enough people in hospitals [19], and COVID-19 is especially difficult with concomitant
diseases [20].
   Diseases of the cardiovascular system continue to be the leading cause of death in many countries
of the world. Every year 17 million people die from diseases of the cardiovascular system in the
world. According to the Centers for Disease Control and Prevention, life expectancy would be 10
years longer in the absence of such a high prevalence of cardiovascular diseases, covering all
countries and continents [21]. They lead to long-term disability of the adult population and require
colossal economic costs.

International Workshop of IT-professionals on Artificial Intelligence (ProfIT AI 2021), September 20-21, 2021, Kharkiv, Ukraine
EMAIL: ksenia.bazilevich@gmail.com (K. Bazilevych); krivtsovpro@gmail.com (S. Krivtsov); nikolai.butkevych@gmail.com
(M. Butkevych).
ORCID: 0000-0001-5332-9545 (K. Bazilevych); 0000-0001- 5214-0927 (S. Krivtsov); 0000-0001-8189-631x (M. Butkevych).
           ©️ 2021 Copyright for this paper by its authors.
           Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
           CEUR Workshop Proceedings (CEUR-WS.org)
    High-risk groups include people who have had heart attacks and strokes. It is important for
patients with repeated heart attacks and high blood pressure to be under medical supervision. High
cholesterol in the patient's blood contributes to narrowing of the blood vessels and requires long-term
medication. Excess weight, high blood sugar and a sedentary lifestyle have an extremely negative
effect on the state of the cardiovascular system, and smoking is one of the most common risk factors.
In the development of atherothrombosis, heredity and age play a significant role, and it is noted that in
recent years cardiovascular diseases have become significantly “younger” [22]. The growth and
occurrence of cardiovascular diseases in young people is associated not only with an incorrect
lifestyle, but also with increased neuropsychological stress. The Internet, TV, phones, radio give us
such a stream of information that our ancestor cannot cope with in a week. Negative emotions and
stress cause an increased amount of adrenaline in the blood, hence fear, anxiety, anxiety, panic, and
increased heart rate [23]. The state of the cardiovascular system quickly reacts to changes in mood,
and the constant imbalance between physical and neuropsychological stress leads to pathological
changes and the development of cardiovascular diseases.
    In Ukraine, cardiovascular diseases are the main cause of death among the population [24].
According to this indicator, the country remains one of the world leaders.
    According to the ranking data, based on the number of deaths of the population in Ukraine [25],
common causes are:
    1. Cardiovascular diseases (64.3%)
    2. Neoplasm (14.1%)
    3. Diseases of the digestive system (4.3%)
    4. Neurological disorders (3.1%)
    5. Self-harm and interpersonal violence (2.7%)
    Nationally, mortality from cardiovascular diseases over the past 29 years has increased by almost
8%: to 449,376 in 2019 and accounts for 64.3% of the total number of deaths, while in 1990 there
were 350,605 deaths from cardiovascular diseases, amounted to 56.5% respectively [26].
    Thus, aim of the paper is development of intelligent information system of heart diseases
diagnostics. To achieve the aim, we are going to develop model based on Shannon method to evaluate
the informative features of cardiac studies.

2. Informative features evaluation
    The informativeness of signs is a relative concept. One and the same system of signs can be
considered informative for solving some problems and uninformative for others. For example, in
medicine, some signs may be significant for the differential diagnosis of diabetes diseases [27], and
others for the diagnosis of heart diseases.
    In the tasks of medical diagnostics, patients act as objects. Signs characterize the results of
examinations, symptoms of diseases and the methods of treatment used. The specifics of modern
requirements for data processing in order to discover knowledge are as follows: data are large,
heterogeneous (binary, ordinal, quantitative), the results must be specific and understandable.
Examples of binary signs are gender, headache, weakness, nausea, etc. An ordinal sign is the severity
of the condition (mild, moderate, severe, life-threatening). Quantitative signs are age, pulse, blood
pressure, hemoglobin content in the blood, respiratory rate, drug dose, etc. The symptomatic
description of the patient is, in fact, a formalized medical history. Having accumulated a sufficient
number of precedents, it is possible to solve various problems: to classify the type of disease
(differential diagnosis), to determine the most appropriate method of treatment, to predict the duration
and outcome of the disease, to assess the risk of complications, and to find syndromes - the most
characteristic set of symptoms for a given disease. When studying objects characterized by a large
number of factors, it is often important to determine which of these factors most affect the properties
of objects of interest to us. In particular, the determination of the informativeness of factors is one of
the important stages in the analysis of the object under study.
    The block diagram of the method for informative features evaluation is shown in Figure 1.
Figure 1: The block diagram of the method for informative features evaluation.

   The symptomatic descriptions of patients are formalized case histories [28]. Having accumulated
the required number of use cases in the database, you can solve various problems:
   • classification of types of diseases;
   • differential diagnostics;
   • determination of effective methods of treatment;
   • prediction of the outcome of the disease;
   • prediction of the duration of the disease;
   • risk assessment of complications;
   • identification of syndromes that are most typical for this disease.
   Speaking about the tasks of medicine, the following features can be distinguished:
   • Quantitative features are features measured in a certain numerical scale.
   • Qualitative features are features used to express terms and concepts that do not have numerical
       values, which are measured in ordinal scales.
   • Nominal features are features measured in a naming scale (e.g. blood group). When analyzing
       such features, each mark of the nominal scale is converted to a boolean scale.
    It is also possible to single out various methods for assessing the informativeness of signs: energy
and information.
    The energy approach is based on the fact that the information content is assessed by the value of
the attribute. The signs are sorted by values, and those whose values are greater are considered the
most informative. For example, according to the amplitude-time analyzes of the electrocardiogram,
the amplitude of the R waves is considered the most informative signs among the amplitudes. But,
such approaches to assessing the information content may turn out to be poorly suitable for object
recognition. If some features are large in absolute values, but are almost the same for objects of
different classes, then by the values of these features it is difficult to assign objects to some classes.
Conversely, if the features are relatively small in magnitude, but differ greatly for objects of different
classes, then objects can be easily classified by their values.
    The method for determining the informativeness is selected depending on the purpose of the study,
the number of studied classes and medical data (coding methods, the number of gradations, the
sample size, etc.)
    Therefore, information methods are more suitable for classification in medical diagnostics,
according to which information of signs is considered as reliable differences between classes of
images in spaces of signs. If, when classifying objects, they need to be attributed to one of two
classes, then the differences in the probability distributions of features constructed from samples of
two compared classes can act as such a reliable difference.

3. Shannon method application
   Shannon's method suggests evaluating information content as a weighted average amount of
information per different grades of a feature [29]. In information theory, information is understood as
the value of the eliminated entropy.



                                                                      ,                               (1)

where G is the number of gradations of the feature;
     K is quantity of classes;
     Pi is the probability of the i-th gradation of the feature


                                                                  ,                                   (2)

where mi,k is the frequency of occurrence of the i-th grade in the K-th class,
      N is the total number of observations;
      Pi,k is probability of occurrence of the i-th gradation of a feature in the K-th class.


                                                              .                                 (3)
    Shannon's method gives an estimate of the informativeness as a normalized value, which varies
from 0 to 1. Therefore, the informativeness of a feature determined by Shannon's method can be said
in absolute terms: closer to 1 for high; closer to 0 for low.
    The block diagram of the Shannon method for informative features evaluation is shown in Figure
2.
Figure 2: The block diagram of the Shannon method for informative features evaluation.
4. Results
   The input data is a dataset of information on the diagnostic data of patients based on cardiac
studies, their age, gender, type of chest pain, cholesterol level, etc., a complete list of parameters in
Table 1.

Table 1
Parameters list
       Name of parameter                      Description                         Data type
              Name                                 ID                              Count
               Age                               Years                             Count
                Sex                               Sex                              String
         Chest pain type                       Pain type                           String
         Blood pressure                         Scores                             Count
           Cholesterol                          Scores                             Count
    Fasting blood sugar < 120                     +/-                                0/1
           Resting ECG                      Normal/Hyper                           String
      Maximum heart rate                        Scores                             Count
              Angina                              +/-                                0/1
               Peak                             Scores                              Float
              Slope                         Flat/Up/Down                           1/2/3
         Colored Vessels                         0/1/2                             0/1/2
               Thal                         Normal/Rev/Fix                         String

    Before software implementation of an information system, it is necessary to design it. For this, the
IDEF0 and DFD methodologies were used.
    The model is based on the concepts of an external entity, process, data storage (storage) and data
flow.
    An external entity is a material object or individual acting as sources or receivers of information,
for example, customers, personnel, suppliers, bank customers, and the like.
    Process is converting input data streams to output in accordance with a certain algorithm. Each
process in the system has its own number and is associated with the executor who performs this
transformation. As in the case of functional diagrams, physical transformation can be carried out by
computers, manually or by special devices. At the upper levels of the hierarchy, when the processes
have not yet been defined, instead of the concept of “process”, the concepts of “system” and
“subsystem” are used, which respectively denote the system as a whole or its functionally complete
part.
    A data warehouse is an abstract device for storing information. The type of device and methods of
placement, removal and storage for such a device are not detailed. Physically, it can be a database, a
file, a table in RAM, a card file on paper, and the like.
    Data flow is the process of transferring some information from a source to a receiver. Physically,
the process of transferring information can occur through cables under the control of a program or
software system, or manually with the participation of devices or people outside the designed system.
    The functional model of the system is presented in Figure 3.
    Decomposition of the system is presented in Figure 4.
Figure 3: The functional model of the information system.




Figure 4: Decomposition of the system.

    In total, for example, data from 303 patients and 13 features was taken (their age, gender, type of
chest pain, cholesterol level, ECG, blood pressure, maximum pressure, blood sugar level, type and
presence of tonsillitis, colored vessels, etc.)
    For software implementation, the C# programming language was used in the Microsoft Visual
Studio environment. To start the software package, you need to upload the data presented in the *.csv
file (Figure 5).
Figure 5: Setting the initial data for program use.

    The data is divided into two classes A – “Healthy” and B – “Sick”.
    The results of the calculation by the Shannon method for assessing the informativeness of the
attribute m = “Patient's age” is shown in Figure 6.




Figure 6: Result of the information system calculations.

   The numerical results are shown in Table 2.
Table 2
Informative features by Shannon method
                        Name of parameter                     Informativeness
                                Age                            0.8944500799
                                 Sex                           0.9604848895
                          Chest pain type                      0.9308938858
                          Blood pressure                       0.8607771991
                            Cholesterol                        0.8053139309
                     Fasting blood sugar < 120                 0.9336653947
                            Resting ECG                        0.9208676495
                       Maximum heart rate                      0.8094061731
                               Angina                          0.9151560649
                                Peak                           0.8490648688
                               Slope                           0.9011879836
                          Colored Vessels                      0.8930070608
                                Thal                           0.8971856127

   Shannon method gives an estimate of the informativeness of the investigated feature in the form of
a value, takes values from 0 to 1. In this case, it is believed that the closer I (x) to 1, the higher the
informativeness of the feature, on the contrary, the closer I (x) to 0, the lower the informative value of
x.

5. Conclusions
    As a result of the study, methods for assessing the informativeness of signs for medical data were
analyzed. The Shannon method was chosen as the most appropriate method for medical data. On the
basis of the Shannon method, a model for assessing the information content was built and a software
package was implemented. For the experimental study, data from 303 patients and 13 features were
used. The information content was calculated for various groups of cardiac data. We got that the
following signs are the most informative: thal, chest pain type, colored vessels, angina, age. The
Shannon method is used to determine the informativeness of a feature that is involved in the
recognition of two classes of objects. Also, comparisons of the Shannon method with other methods
(Kullback and Сumulative frequency method) for assessing the informativeness of features are made.

6. Acknowledgements
   The study was funded by the National Research Foundation of Ukraine in the framework of the
research project 2020.02/0404 on the topic “Development of intelligent technologies for assessing the
epidemic situation to support decision-making within the population biosafety management” [30].

7. References
[1] E.R. Fox, Budgeting in the time of COVID-19, American Journal of Health-System Pharmacy:
    official journal of the American Society of Health-System Pharmacists 77 (15) (2020) 1174-
    1175. doi: 10.1093/ajhp/zxaa185.
[2] M. Gavriatopoulou M, et. al., Emerging treatment strategies for COVID-19 infection, Clinical
    and Experimental Medicine 21 (2) (2021) 167-179. doi: 10.1007/s10238-020-00671-y.
[3] H. Ejaz, et. al., COVID-19 and comorbidities: Deleterious impact on infected patients, Journal of
    Infection and Public Health 13 (12) (2020) 1833-1839. doi: 10.1016/j.jiph.2020.07.014.
[4] K. Heggen, T.J. Sandset, E. Engebretsen, COVID-19 and sustainable development goals,
    Bulletin of World Health Organization 98 (10) (2020) 646. doi: 10.2471/BLT.20.263533.
[5] A. Abd-Alrazaq, et. al., Artificial Intelligence in the Fight Against COVID-19: Scoping Review,
     Journal of Medical Internet Research 22 (12) (2020) e20756. doi: 10.2196/20756.
[6] D. Chumachenko, V. Balitskii, T. Chumachenko, V. Makarova, M. Railian, Intelligent expert
     system of knowledge examination of medical staff regarding infections associated with the
     provision of medical care, CEUR Workshop Proceedings 2386 (2019) 321-330.
[7] P. Piletskiy, et. al., Development and Analysis of Intelligent Recommendation System Using
     Machine Learning Approach, Advances in Intelligent Systems and Computing 1113 (2020) 186-
     197. doi: 10.1007/978-3-030-37618-5_17.
[8] N. Davidich, et. al., Monitoring of urban freight flows distribution considering the human factor,
     Sustainable Cities and Society 75 (2021) 103168. doi: 10.1016/j.scs.2021.103168.
[9] N. Dotsenko, et. al. Modeling of the processes of stakeholder involvement in command
     management in a multi-project environment, Proceedings of 2018 IEEE 13th International
     Scientific and Technical Conference on Computer Sciences and Information Technologies 1
     (2018) 29-33. doi: 10.1109/STC-CSIT.2018.8526613
[10] N. Dotsenko, et. al. Project-oriented management of adaptive teams' formation resources in
     multi-project environment, CEUR Workshop Proceedings 2353 (2019) 911-920.
[11] M. Bielecki, et. al., Air travel and COVID-19 prevention in the pandemic and peri-pandemic
     period: A narrative review, Travel Medicine and Infectious Disease 39 (2021) 101915. doi:
     10.1016/j.tmaid.2020.101915.
[12] S.C. Mathews, et. al., Digital health: a path to validation, NPJ Digital Medicine 2 (2019) 38. doi:
     10.1038/s41746-019-0111-3.
[13] K. Bazilevych, et al. Stochastic modelling of cash flow for personal insurance fund using the
     cloud data storage, International Journal of Computing 17 (3) (2018) 153-162.
     doi: 10.47839/ijc.17.3.1035
[14] D. Chumachenko, et. al. On Intelligent Decision Making in Multiagent Systems in Conditions of
     Uncertainty, Proceedings of 2019 11th International Scientific and Practical Conference on
     Electronics and Information Technologies (2019) 150-154. doi: 10.1109/ELIT.2019.8892307
[15] M. Mazorchuck, et. al. Web-Application Development for Tasks of Prediction in Medical
     Domain, 2018 IEEE 13th International Scientific and Technical Conference on Computer
     Sciences and Information Technologies (CSIT) (2018) 5-8. doi: 10.1109/STC-
     CSIT.2018.8526684
[16] O. Skitsan, et. al., Evaluation of the informative features of cardiac studies diagnostic data using
     the Kullback method, CEUR Workshop Proceedings 2917 (2021) 186-195.
[17] D. Chumachenko, et. al. On-Line Data Processing, Simulation and Forecasting of the
     Coronavirus Disease (COVID-19) Propagation in Ukraine Based on Machine Learning
     Approach, Communications in Computer and Information Science 1158 (2020) 372-382. doi:
     10.1007/978-3-030-61656-4_25
[18] Yu. Polyvianna, et. al. Computer Aided System of Time Series Analysis Methods for Forecasting
     the Epidemics Outbreaks, 2019 15th International Conference on the Experience of Designing
     and Application of CAD Systems (2019) pp. 7.1-7.4. doi: 10.1109/CADSM.2019.8779344
[19] J. Wosik, et. al., Telehealth transformation: COVID-19 and the rise of virtual care, Journal of
     American Medical Informatics Association 27 (6) (2020) 957-962. doi: 10.1093/jamia/ocaa067
[20] M.S. Gold, et. al., COVID-19 and comorbidities: a systematic review and meta-analysis,
     Postgraduate Medicine 132 (8) (2020) 749-755. doi: 10.1080/00325481.2020.1786964.
[21] C.Y. Cheng, C.Y. Hsu, T.C. Wang, Y.C. Jeng, W.H. Yang, The risk of cardiac mortality in
     patients with status epilepticus: A 10-year study using data from the Centers for Disease Control
     and Prevention (CDC), Epilepsy and Behaviour 117 (2021) 107901. doi:
     10.1016/j.yebeh.2021.107901
[22] R.D. Bagnall, E.S. Singer, J. Tfelt-Hansen, Sudden Cardiac Death in the Young, Heart, Lung and
     Circulation 29 (4) (2020) 498-504. doi: 10.1016/j.hlc.2019.11.007.
[23] A. Tajbakhsh, et. al., COVID-19 and cardiac injury: clinical manifestations, biomarkers,
     mechanisms, diagnosis, treatment, and follow up, Expert Review of Anti-Infective Therapy 19
     (3) (2021) 345-357. doi: 10.1080/14787210.2020.1822737
[24] O. Makar, G. Siabrenko, Influence of physical activity on cardiovascular system and prevention
     of cardiovascular diseases (review), Georgian Medical News 285 (2018) 69-74.
[25] R.O. Moiseienko, N.G. Gojda, O.O. Dudina, N.M. Bodnaruk, Development of perinatal
     medicine in Ukraine in the context of international approaches, Wiadomosci Lekarskie 74 (3) 2
     (2021) 761-766.
[26] J. Luck, J.W. Peabody, L.M. DeMaria, C.S. Alvarado, R. Menon, Patient and provider
     perspectives on quality and health system effectiveness in a transition economy: evidence from
     Ukraine, Social Science and Medicine 114 (2014) 57-65. doi: 10.1016/j.socscimed.2014.05.034.
[27] T. Dudkina, et. al., Classification and prediction of diabetes disease using decision tree method,
     CEUR Workshop Proceedings 2824 (2021) 163–172
[28] D. Chumachenko, O. Sokolov, S. Yakovlev, Fuzzy recurrent mappings in multiagent simulation
     of population dynamics, International Journal of Computing 19 (2) (2020) 290-297.
     doi: 10.47839/ijc.19.2.1773.
[29] Qing Tian, T. Arbel, J. J. Clark, Shannon information based adaptive sampling for action
     recognition, 2016 23rd International Conference on Pattern Recognition (ICPR) (2016) 967-972,
     doi: 10.1109/ICPR.2016.7899761.
[30] S. Yakovlev, et. al., The concept of developing a decision support system for the epidemic
     morbidity control, CEUR Workshop Proceedings 2753 (2020) 265–274.