
Intelligent Discriminant Diagnosis of Heart Disease Cases

Qi Wang

Guici Chen

Wuhan University of Science and Technology, Wuhan, Hubei 430065, China

2022

21 23

The application of artificial intelligence in the medical field has greatly eased the gap between people's growing demand for medical resources and the actual shortage of those resources. In this paper, the combination of Fisher dimension reduction and the Hidden Markov Model (HMM) is applied to the intelligent diagnosis of heart disease cases. The index sequence of heart disease cases is simplified by Fisher dimension reduction. The HMMs of heart disease and non-heart disease are established by the Baum-Welch algorithm. The matching score between the observation sequence and each of the two HMMs is calculated by the Forward-Backward algorithm. The experimental results show that the diagnosis of heart disease cases by matching scores is reliable.

Keywords: Fisher dimension reduction; HMM; heart disease diagnosis; classification

2. Data Processing

accounting for 47.65%, and 390 groups of patients without heart disease, accounting for 52.35%. Then, the data is deconstructed and mapped. The processed data is shown in Table 1.

Table 1. Processed data
1 (MaxHR is within the range of its average +/- 2*variance), 0 (otherwise)
1 (ExerciseAngina = Yes), 0 (ExerciseAngina = No)
1 (Oldpeak > 0.5), 0 (otherwise)
1 (ST_Slope = Up), 0 (otherwise)
1 (ST_Slope = Flat), 0 (otherwise)
1 (ST_Slope = Down), 0 (otherwise)
Output class: 1 (heart disease), 0 (normal)
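The mapping rules listed for Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `map_record`, the output key names, the toy records, and the mean and variance values are hypothetical and are not taken from the paper's data set. The MaxHR band follows the text literally (average +/- 2*variance).

```python
def map_record(rec, maxhr_mean, maxhr_var):
    """Map one raw record (a dict) to the 0/1 indicators listed for Table 1."""
    # Band taken literally from the text: average +/- 2 * variance.
    lo, hi = maxhr_mean - 2 * maxhr_var, maxhr_mean + 2 * maxhr_var
    return {
        "MaxHR_in_range": int(lo <= rec["MaxHR"] <= hi),
        "ExerciseAngina": int(rec["ExerciseAngina"] == "Yes"),
        "Oldpeak_gt_0.5": int(rec["Oldpeak"] > 0.5),
        "ST_Slope_Up": int(rec["ST_Slope"] == "Up"),
        "ST_Slope_Flat": int(rec["ST_Slope"] == "Flat"),
        "ST_Slope_Down": int(rec["ST_Slope"] == "Down"),
    }

# Toy records with made-up values (hypothetical, for illustration only).
records = [
    {"MaxHR": 150, "ExerciseAngina": "No", "Oldpeak": 0.0, "ST_Slope": "Up"},
    {"MaxHR": 120, "ExerciseAngina": "Yes", "Oldpeak": 1.5, "ST_Slope": "Flat"},
    {"MaxHR": 300, "ExerciseAngina": "No", "Oldpeak": 0.6, "ST_Slope": "Down"},
]

# Hypothetical mean and variance; in practice they would be computed from the data.
mapped = [map_record(r, maxhr_mean=140.0, maxhr_var=25.0) for r in records]
```

Each categorical value becomes its own 0/1 indicator, which is how the original indicators grow into a larger set of binary ones.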

3. Combining HMM and Fisher's diagnostic model

3.1. Fisher dimension reduction

There are 19 groups of data indicators after deconstruction and mapping. To reduce the time and space complexity, we use the Fisher dimension reduction method to extract significant features and simplify the prediction indicators. The idea of Fisher dimension reduction is to project high-dimensional pattern samples onto the optimal discriminant vector ω, so as to extract classification information and compress the dimension of the feature space. After projection, the maximum inter-class distance and the minimum intra-class distance of the pattern samples in the new subspace are guaranteed. The intra-class scatter matrix $S_w$ and the inter-class scatter matrix $S_b$ are

$$S_w = \sum_{x \in X_0} (x - \mu_0)(x - \mu_0)^T + \sum_{x \in X_1} (x - \mu_1)(x - \mu_1)^T, \tag{1}$$

$$S_b = (\mu_0 - \mu_1)(\mu_0 - \mu_1)^T, \tag{2}$$

where $X_i$ is the set of samples of class $i$ and $\mu_i$ is their mean. The best projection direction ω is the direction that maximizes $\omega^T S_b \omega / \omega^T S_w \omega$. Here, the Lagrange multiplier method is used to solve for the best projection direction, which gives the discriminant function $y = \omega^T x$.
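As a minimal sketch of the projection step, the two-class Fisher direction can be computed with NumPy using the standard closed form ω ∝ S_w^{-1}(μ_0 − μ_1). The function names `fisher_direction` and `fisher_ratio` and the synthetic Gaussian data are assumptions made for illustration; the paper does not give an implementation.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Two-class Fisher direction, proportional to S_w^{-1} (mu0 - mu1)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    D0, D1 = X0 - mu0, X1 - mu1
    S_w = D0.T @ D0 + D1.T @ D1          # intra-class scatter matrix
    w = np.linalg.solve(S_w, mu0 - mu1)  # direction maximising the Fisher ratio
    return w / np.linalg.norm(w)

def fisher_ratio(w, X0, X1):
    """Inter-class over intra-class scatter of the projected samples y = w^T x."""
    y0, y1 = X0 @ w, X1 @ w
    between = (y0.mean() - y1.mean()) ** 2
    within = ((y0 - y0.mean()) ** 2).sum() + ((y1 - y1.mean()) ** 2).sum()
    return between / within

# Synthetic two-class data (hypothetical, not the heart disease data set).
rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 1.0, size=(200, 2))
X1 = rng.normal([2.0, 0.0], 1.0, size=(200, 2))
w = fisher_direction(X0, X1)
```

Comparing `fisher_ratio(w, X0, X1)` with the ratio of any other direction, such as a coordinate axis, shows that no alternative direction scores higher.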

3.2. Baum-Welch algorithm for solving HMM parameters

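To establish the HMMs of heart disease and non-heart disease, we first need to obtain the parameter λ = (A, B, Π) of each model. A is the state transition matrix composed of $a_{ij}$, B is the observation probability matrix composed of $b_j(k)$, and Π is the initial state probability distribution. First, the numerical sequences of ST_Slope UP, ASY, ExerciseAngina, Sex, and RestingBP are input into the HMM together with randomly given parameters $\pi_i$, $a_{ij}$, $b_j(k)$. Then $\xi_t(i, j)$ and $\gamma_t(i)$ are calculated to update the model parameters. $\xi_t(i, j)$ describes the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$, and $\gamma_t(i)$ describes the probability of being in state $q_i$ at time $t$. They are recorded as:

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}, \tag{3}$$

$$\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}, \tag{4}$$

where $\alpha_t(i)$ and $\beta_t(i)$ are the forward and backward probabilities (Section 3.3). Then update the model parameters:

$$a_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \tag{5}$$

$$b_j(k) = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \tag{6}$$

$$\pi_i = \gamma_1(i). \tag{7}$$

If the parameters have converged, the algorithm ends; otherwise, it continues to iterate. The parameter $\lambda_1 = (A_1, B_1, \Pi_1)$ of the HMM of heart disease and the parameter $\lambda_0 = (A_0, B_0, \Pi_0)$ of the HMM of non-heart disease are trained by the Baum-Welch algorithm.

3.3. Forward-Backward algorithm to distinguish the category

The Forward-Backward algorithm is used to calculate the matching score $P(O|\lambda_i)$ ($i = 0, 1$) between the observation index sequence O and the two models. The scores are compared to determine the category of the observation index sequence. The first step of the Forward algorithm is to calculate the forward probability $\alpha_1(i)$ of each state at time 1, the second step is to calculate the forward probability $\alpha_t(i)$ at times 2, 3, ..., T, and the final step is to calculate $P(O|\lambda)$:

$$\alpha_1(i) = \pi_i\, b_i(o_1), \tag{8}$$

$$\alpha_{t+1}(j) = \left( \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right) b_j(o_{t+1}), \tag{9}$$

$$P(O|\lambda) = \sum_{i=1}^{N} \alpha_T(i). \tag{10}$$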
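The three steps of the Forward algorithm, followed by the comparison of the two matching scores, can be sketched as follows. The function name `forward_score`, the two-state toy models, and all numbers are hypothetical; the paper does not report the number of hidden states or the trained parameters.

```python
import numpy as np

def forward_score(obs, pi, A, B):
    """Return P(O | lambda) for a discrete HMM via the forward recursion."""
    alpha = pi * B[:, obs[0]]          # step 1: alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # step 2: sum over previous states, then emit o
    return float(alpha.sum())          # step 3: P(O | lambda) = sum_i alpha_T(i)

# Two toy 2-state models over 2 observation symbols (made-up numbers).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B1 = np.array([[0.9, 0.1], [0.8, 0.2]])  # stands in for lambda_1, favours symbol 0
B0 = np.array([[0.2, 0.8], [0.1, 0.9]])  # stands in for lambda_0, favours symbol 1

obs = [0, 0, 1, 0]
score1 = forward_score(obs, pi, A, B1)
score0 = forward_score(obs, pi, A, B0)
label = 1 if score1 > score0 else 0      # assign the class with the larger score
```

For long sequences the raw probabilities underflow, so a practical version would rescale `alpha` at each step or work with logarithms; the sequences described in this paper are short enough that the unscaled recursion is adequate for a sketch.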

The Backward algorithm is the reverse process of the Forward algorithm, so it is not repeated here.
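The re-estimation step of Section 3.2 needs both the forward and the backward probabilities, so a compact sketch of one Baum-Welch iteration is given here. It is an illustration under assumptions: it trains on a single observation sequence, whereas the paper trains on a set of sequences, and the function names, initial parameters, and toy sequence are hypothetical.

```python
import numpy as np

def forward(obs, pi, A, B):
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(obs, A, B):
    beta = np.ones((len(obs), A.shape[0]))  # beta_T(i) = 1
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(obs, pi, A, B):
    """One re-estimation step; returns (pi, A, B, P(O | old parameters))."""
    obs = np.asarray(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    prob = alpha[-1].sum()
    gamma = alpha * beta / prob  # gamma_t(i), shape (T, N)
    # xi_t(i, j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O), shape (T-1, N, N)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / prob
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B = new_B / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, prob

# Hypothetical initial parameters and a toy observation sequence.
pi = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.7, 0.3], [0.4, 0.6]])
obs = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]

probs = []
for _ in range(10):
    pi, A, B, p = baum_welch_step(obs, pi, A, B)
    probs.append(p)
```

The recorded values in `probs` never decrease from one iteration to the next, which is the property that a convergence test on the likelihood relies on.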

4. Model experiment test

Through the Fisher discriminant function, we extracted 5 significant indicators from the original 19 groups of indicator data. The importance ranking of the five indicators is ST_Slope UP > ASY > ExerciseAngina > Sex > RestingBP.

To unify the input length of the index series, we add five opposite indexes to the observation index series of the HMM. Therefore, the observation index series of the HMM includes ten indicators: RestingBP high, RestingBP normal, Male, Female, ChestPainType yes, ChestPainType no, ExerciseAngina yes, ExerciseAngina no, ST_Slope-Up, and ST_Slope-Flat-Down. We select 80% of the observation index sequences in the data set as the training set for the HMMs. The model parameters are obtained by the Baum-Welch algorithm.
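One way to turn the five binary indicators into a fixed-length observation index sequence over the ten symbols, and to draw the 80% training split, is sketched below. The symbol names follow the spelling used in Table 3; the dictionary keys, the function names `encode` and `split`, the reading of "ChestPainType yes" as the ASY chest pain type, and the shuffling seed are assumptions made for illustration.

```python
import random

# The ten observation symbols, paired as (indicator, opposite indicator).
SYMBOLS = [
    "ST_Slope-Up", "ST_Slope-Flat-Down",
    "ChestPainType yes", "ChestPainType no",
    "ExerciseAngina yes", "ExerciseAngina no",
    "Male", "Female",
    "RestingBP high", "RestingBP normal",
]

# Hypothetical field names, listed in the importance order given in the text.
KEYS = ["st_slope_up", "chest_pain_asy", "exercise_angina", "male", "resting_bp_high"]

def encode(case):
    """Turn five 0/1 indicators into a length-5 sequence of symbol indices."""
    return [2 * pos + (0 if case[key] else 1) for pos, key in enumerate(KEYS)]

def split(cases, train_fraction=0.8, seed=0):
    """Shuffle a copy of the cases and cut it into training and held-out parts."""
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

case = {"st_slope_up": 0, "chest_pain_asy": 1, "exercise_angina": 0,
        "male": 1, "resting_bp_high": 0}
sequence = encode(case)
names = [SYMBOLS[k] for k in sequence]
train, held_out = split(list(range(10)))
```

Because every case emits exactly one symbol from each pair, all sequences have the same length, which is the purpose of adding the opposite indexes.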

Four groups of observation index sequences are selected, and the matching scores calculated by the Forward-Backward algorithm are shown in Figure 1. Figure 1 shows that the matching scores of the same index sequence under the two models differ, which indicates that observation sequences can be discriminated by their matching scores.

[Figure 1: matching scores of the selected index series under the HeartDisease and Non HeartDisease models]

We take all the data as the validation set and diagnose the heart disease cases through the matching scores between the observation index sequence and the two HMMs; the overall accuracy is 85.9%. To further verify the reliability of the model, we also used an ANN and a Decision Tree to diagnose heart disease cases after the same data processing. Their overall accuracies were 84.3% and 82.5%, respectively. The detailed classification results of the three models are shown in Table 2 (the model proposed in this paper is abbreviated as F-H).
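The overall and per-class accuracies of the kind reported in Table 2 can be computed with a small helper. This is a sketch only: the function name `accuracy_report` and the toy label lists are hypothetical and do not reproduce the paper's predictions.

```python
def accuracy_report(y_true, y_pred):
    """Overall accuracy plus the accuracy within each true class
    (1 = heart disease, 0 = normal)."""
    report = {"overall": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)}
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        report[cls] = sum(y_pred[i] == cls for i in idx) / len(idx)
    return report

# Toy labels (made up): four heart disease cases followed by six normal cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
report = accuracy_report(y_true, y_pred)
```

Reporting the two class-wise accuracies separately matters here because a missed heart disease case and a false alarm have very different consequences.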

From Table 2, we can see that the diagnostic accuracy for heart disease is highest for the F-H model (92.39%), followed by the Decision Tree model (87.18%) and the ANN model (85.47%). For non-heart disease, the diagnostic accuracy is highest for the ANN model (83.46%), followed by the F-H model (80%) and the Decision Tree model (78.29%).
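As a quick consistency check, the overall accuracy of each model should be close to the class-share-weighted average of its two per-class accuracies. The sketch below assumes that the validation set has the class shares stated in Section 2 (47.65% heart disease, 52.35% non-heart disease), which follows from using all the data for validation; the variable names are hypothetical.

```python
# Class shares from Section 2 and per-class accuracies from the text (in percent).
share = {"heart": 0.4765, "normal": 0.5235}
per_class = {
    "F-H": (92.39, 80.00),
    "Decision Tree": (87.18, 78.29),
    "ANN": (85.47, 83.46),
}

# Weighted average of the heart disease and non-heart disease accuracies.
overall = {
    name: share["heart"] * heart + share["normal"] * normal
    for name, (heart, normal) in per_class.items()
}
```

Under this assumption, the weighted averages come out at about 85.9% for F-H and 82.5% for the Decision Tree, matching the reported overall accuracies. The ANN comes out at about 84.4%, within roughly 0.1 percentage point of the reported 84.3%.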

5. Result analysis and summary

We deconstruct and map the original 11 groups of index data, expanding them to 19 groups, and then use Fisher dimension reduction to extract the important features that determine heart disease (ST_Slope UP, ChestPainType ASY, ExerciseAngina, Sex, and RestingBP) together with their importance ranking. Finally, the overall accuracy of the classification by the HMMs is 85.9%. Next, we give a simple analysis of the results. We counted the number of patients with heart disease and non-heart disease under several observation index sequences, as shown in Table 3.

From Table 3, we can see that under the same observation index sequence, the number of patients with heart disease is comparable to the number of patients without heart disease. As a result, even if the model parameters fully fit the characteristics of the training data set, the discrimination accuracy cannot reach a very high level. It also shows that the original indicators cannot completely and accurately describe the characteristics of the heart disease population.

Table 3. Number of patients under several observation index sequences
ST_Slope-Flat-Down, ChestPainType yes, ExerciseAngina no, Male, RestingBP normal: 25
ST_Slope-Flat-Down, ChestPainType yes, ExerciseAngina yes, Male, RestingBP normal: 20
ST_Slope-Up, ChestPainType no, ExerciseAngina no, Male, RestingBP normal: 10
ST_Slope-Flat-Down, ChestPainType yes, ExerciseAngina no, Male, RestingBP high: 9
ST_Slope-Flat-Down, ChestPainType no, ExerciseAngina yes, Female, RestingBP normal: 12
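The argument about identical observation index sequences can be made concrete. Any classifier that sees only the sequence must give one answer per distinct sequence, so its accuracy is bounded by always choosing the majority class of each sequence. The sketch below computes that bound; the function name `accuracy_ceiling` and the toy data are hypothetical and are not the counts of Table 3.

```python
from collections import Counter

def accuracy_ceiling(sequences, labels):
    """Best accuracy achievable by any rule that depends only on the sequence."""
    counts = Counter(zip(sequences, labels))
    # For each distinct sequence, the best rule predicts its majority class.
    best = sum(max(counts[(s, 0)], counts[(s, 1)]) for s in set(sequences))
    return best / len(labels)

# Toy data: two distinct sequences, each shared by patients of both classes.
sequences = [(1, 2, 5, 6, 9)] * 4 + [(0, 3, 5, 6, 9)] * 4
labels = [1, 1, 0, 0, 0, 0, 0, 1]
ceiling = accuracy_ceiling(sequences, labels)
```

In this toy case the first sequence is split two against two and the second three against one, so at most 5 of the 8 cases can be classified correctly, whatever model is fitted.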

6. Acknowledgements

Thanks to the teachers, classmates, friends, and family who contributed to this article.

7. References

[1] Wang Tingting, Xing Dengxiang. Research on the progress of artificial intelligence in medical applications [J]. Trauma and Critical Care Medicine, (2021). doi:10.16048/j.issn.2095-5561.

[2] Gong Gao, Huang Wenhua, Cao Shi, Chen Chaomin. Research progress in the application of artificial intelligence in medicine [J]. Chinese Journal of Medical Physics, (2021): 1044-1047.

[3] D.A. Qiu Shuang. Application of artificial intelligence diagnosis system based on DE Light framework in breast ultrasound, Master's thesis, University of Electronic Science and Technology of China, Chengdu, China, 2022.

[4] Zong Changfu, Yang Xiao, Wang Chang, Zhang Guangcai. Driver's driving intention identification and behavior prediction during vehicle steering [J]. Journal of Jilin University (Engineering Edition), (2009): 27-32. doi:10.13229/j.cnki.jdxbgxb2009.s1.023

[5] Sergios Theodoridis, Machine Learning (Second Edition), Chapter 7 - Classification: a Tour of the Classics, Editor(s): Sergios Theodoridis, Academic Press, 2020, Pages 301-350, doi:10.1016/B978-0-12-818803-3.00016-7.

[6] Shoba Ranganathan, Encyclopedia of Bioinformatics and Computational Biology, Hidden Markov Models, Monica Franzese, Antonella Iuliano, Academic Press, 2019, Pages 753-762, doi:10.1016/B978-0-12-809633-8.20488-3.