Interpretable Knowledge Mining for Heart Failure Prognosis Risk Evaluation

Shaobo Wang1,2⋆, Guangliang Liu2, Wenyan Zhu2, Zengtao Jiao2, Haichen Lv3, Jun Yan2, and Yunlong Xia3

1 Beijing University of Technology, Beijing, China
2 Yidu Cloud (Beijing) Technology Co Ltd., Beijing, China
{shaobo.wang,guangliang.liu,wenyan.zhu,zengtao.jiao,jun.yan}@yiducloud.cn
3 Department of Cardiology, The First Affiliated Hospital of Dalian Medical University, Dalian, China
lvhaichen@dmu.edu.cn, yunlongxia01@163.com

⋆ The author is an intern at Yidu Cloud. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. In this article, we propose a pipeline to mine interpretable knowledge from electronic health records (EHR) for the Heart Failure (HF) prognosis risk evaluation task. Mortality risk after first-diagnosed HF strongly affects patients' quality of life, and evaluating it helps physicians efficiently monitor patients' disease progression. Mining medically reasonable and interpretable knowledge to assist physicians in evaluating mortality risk is a non-trivial task. The proposed pipeline leverages a gradient-boosting-based predictive model to estimate the risk of HF prognosis, and discovers variables and decision rules from the predictive model. The mined knowledge is confirmed as interpretable and inspirable by physicians.

Keywords: Heart Failure · Knowledge Mining · Interpretability.

1 Introduction

HF is a clinical syndrome characterized by blood congestion in the pulmonary or systemic circulation, and/or by insufficient blood perfusion in organs and tissues. HF is the late stage of various heart diseases and can lead to serious manifestations. With high mortality and readmission rates, the prognosis of HF is often unsatisfactory, resulting in a considerable medical burden.

The incidence rate of HF in modern society has been increasing due to the aging of the population, changes in the disease spectrum and improved survival rates of various cardiovascular diseases [27]. The prevalence of HF in developed countries is 1.5%-2.0%, and, among people over 70 years old, the prevalence rate is higher than 10% [26]. Taking China as an example, as average life expectancy grows, more people, especially the elderly, suffer from chronic diseases, which causes many complications in HF patients. As a result, poor prognosis of HF cannot be entirely avoided. The 1-year all-cause mortality rate and 1-year readmission rate were 7.2% and 31.9% in patients with chronic stable HF, and 17.4% and 43.9% in patients with acute HF, respectively [24].

Some biomarkers have been used individually to predict outcomes of HF, for example BNP, age, cystatin C, serum uric acid and D-Dimer [10, 28]. Traditional biomarkers closely related to cardiovascular mortality in the general population, such as body mass index (BMI), serum cholesterol and blood pressure (BP), have been found useful for predicting outcomes of patients with Chronic Heart Failure (CHF) [5]. Because of the complexity of HF prognosis, analysis of multiple biomarkers is worth pursuing. Multi-biomarker strategies are gaining interest in tasks such as clinical assessment and risk stratification of HF patients, and previous studies have shown that multi-biomarker approaches achieve higher prognostic accuracy than any individual biomarker [8]. How multiple biomarkers affect the mortality rate of HF patients deserves further investigation.
Predictive models (PM) based on machine learning have advantages in portraying interactions between biomarkers. Another significant issue in medical applications is interpretability; therefore we employ an interpretable predictive model (IPM) to mine medical knowledge. For HF, widely recognized prognostic guidelines are not available, and current research on medical knowledge mining (MKM) in this field is not sufficient either. The motivation behind MKM for HF prognosis is to discover knowledge that can serve as a good supplement to medical guidelines. To the best of our knowledge, this is the first work on knowledge mining for 1-year in-hospital HF mortality risk evaluation. Our contributions are:

– We apply an interpretable model to understand the decisions made by predictive models for the task of 1-year in-hospital mortality risk evaluation of HF patients. The extracted knowledge is human-understandable.
– We set up a pipeline, incorporating medical expertise, to verify the extracted knowledge, so that the extracted knowledge is medically meaningful.
– We design a knowledge filtering method to extract knowledge that is applicable from the perspectives of both medical logic and statistical analysis.

2 Previous works

2.1 HF Risk Prediction

In recent years, predictive modelling, a powerful risk prediction tool, has been gaining increasing interest in the study of cardiovascular diseases. Early and effective intervention according to risk evaluation is of great significance for HF patients [6, 31]. There are many studies on predicting the outcome of HF based on statistical or machine learning methods. [20, 21] proposed the Seattle HF Model (SHFM) to predict the mortality rate of HF patients. [19] built a prediction model for HF mortality called Enhanced Feedback for Effective Cardiac Treatment (EFFECT), and [10] used a Classification And Regression Tree (CART) model to predict the in-hospital mortality of acutely decompensated HF and performed risk stratification. Logistic Regression (LR) is often used in research on HF prognosis [12], yet it fails to model non-linear relations among features. [13] concluded that it is more reasonable to construct a non-linear model than LR. In our study, we apply gradient-boosting-based algorithms to establish a one-year mortality prediction model for HF, and we choose RuleFit [11] for the subsequent medical knowledge mining task.

2.2 Medical Knowledge Mining

MKM aims to extract meaningful patterns from medical datasets; these patterns are expected to support physicians and patients in the process of screening, diagnosis, treatment, prognosis, health monitoring and management. A popular data source for MKM is the EHR, which records a patient's routine in a hospital, for example demographic data, diagnoses, laboratory test results, nursing records and prescriptions. Compared to general applications of knowledge mining, there are some specific difficulties in the study of MKM: data availability and data standardization [18]. Cancer, heart diseases and diabetes are the top 3 most common diseases considered in previous works, most of which focus on the diagnosis and prognosis stages [9]. [2] compared the performance of various machine learning models for the task of heart disease prediction.
[3] applied social network analysis, text mining, temporal analysis and higher order feature construction to healthcare data analysis. [7] used 13 attributes, such as gender and blood pressure, to estimate the likelihood of a patient being diagnosed with heart disease. [30] evaluated the performance of machine learning algorithms on four benchmark prediction tasks and suggested that recurrent neural networks achieved the most promising results in mortality prediction. According to [15], 64% of cardiology studies are devoted to classification techniques, and predictive modelling is the second most popular technique.

Previous research mainly focuses on diagnosis-related tasks or onset risk evaluation tasks. The Refined-Clinical Knowledge Model (R-CKM), a tree-based PM, can produce medical knowledge from EHR data [14]. [4] mined knowledge through the R-CKM and made it possible to enrich and optimize the medical guidelines for HF diagnosis. [22] developed a medical knowledge mining pipeline based on temporal pattern mining for early detection of Congestive HF, and the mined patterns can make more accurate predictions than the PM [22]. In contrast to previous studies, we interpret extracted patterns by incorporating physicians. The mined knowledge in our work not only conforms to medical common sense, but also gives supportive evidence to unverified hypotheses.

2.3 Interpretable Predictive Model

Interpretable machine learning (IML) has been a hot topic in current machine learning communities, especially due to the popularity of deep learning models. There is no clear mathematical definition of interpretability, though a natural language definition by Miller is 'Interpretability is the degree to which a human can understand the cause of a decision' [25]. Some methods, such as neural networks, have a very high capacity for feature abstraction and nonlinear fitting, yet the intermediate process is a black box, so medical experts tend to view these algorithms with suspicion.

IML methods can be categorized into three groups: interpretable models, model-agnostic methods and example-based explanation methods. Interpretable models include algorithms that are interpretable by themselves, for instance linear regression, logistic regression and the decision tree; the RuleFit model employed in this work is also an interpretable model [11]. Model-agnostic methods enjoy high flexibility because they can be applied to any model, but they might influence a model's performance adversely, like SHAP [38]. LIME [34], Anchor [35] and LORE [39] are representative model-agnostic methods that can generate rule-based explanations, but they only achieve local interpretability. In the medical domain, data is represented in a structured format, so example-based explanation methods do not apply [36].

Tree-based machine learning has been widely used in medical research because of its self-interpretability: the decision making process is humanly understandable. On the other hand, physicians are interested in figuring out the role of key variables at the population level. For instance, patients whose BNP is above 35 ng/L and whose ejection fraction (EF) is less than 40% will be diagnosed with HFrEF, while another patient whose EF is larger than 49% might be diagnosed with HFpEF even if his/her EF is around 90%. Given the aforementioned reasons, we employed RuleFit in this work. RuleFit consists of two components: a tree model and a linear model.
The tree model implements classification or regression, and the associated decision rules are extracted from the learned model. The extracted decision rules, together with the original features, are then fitted into the linear model.

3 Method

3.1 Data

In this study, we retrospectively collected EHR data of hospitalized patients diagnosed with HF between December 2010 and August 2018. Included patients are over 18 years old and were diagnosed with HF according to diagnosis guidelines. We used off-the-shelf natural language processing tools to structure and standardize the collected raw data. Finally, we enrolled 13,602 patients with HF, of whom 537 (3.95%) died within 1 year.

3.2 Outcomes

Firstly, we build a predictive model to predict the mortality risk of HF patients. Any patient with a clear hospital death record within one year is labelled as high risk, and the others are considered low risk. Then, we interpret the knowledge learned by the predictive model through calculating feature importance.

3.3 Feature engineering

Variables with a filling rate greater than 80% were selected; finally, a total of 73 features were extracted, including demographics (age, sex), living habits (smoking, drinking), previous medical history (comorbidities, surgery), etiology, vital signs, routine laboratory examinations, interventions and admission medications.

Given the normal range of each laboratory test item provided by the hospital, we discretize continuous features by labelling them with three tags (lower, normal and higher). For example, the normal range of white blood cell count (WBC) is [3.5, 9.5] ×10^9/L, so the value is tagged 'lower' if WBC < 3.5. Qualitative values are more meaningful than quantitative values in the process of making clinical assessments; generally speaking, physicians focus far more on which items are abnormal. Some continuous features are labelled with only two categories, such as basophils (BASO): the normal range of BASO is (0, 0.06] ×10^9/L, so it falls into two tags (normal and higher). In order to achieve human-understandable interpretability from RuleFit, we adopt one-hot feature representations; examples are shown in Table 1. After normalization, missing values are filled with mean values (for continuous features such as age) or modes (for one-hot features).

Table 1. One-hot feature representation examples

Feature     | One-hot Representation | Normal Range | Raw Data
WBC low     | (1,0,0)                | [3.5, 9.5]   | WBC < 3.5
WBC normal  | (0,1,0)                | [3.5, 9.5]   | 3.5 ≤ WBC ≤ 9.5
WBC high    | (0,0,1)                | [3.5, 9.5]   | WBC > 9.5
BASO normal | (1,0)                  | (0, 0.06]    | BASO ≤ 0.06
BASO high   | (0,1)                  | (0, 0.06]    | BASO > 0.06
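To make the tagging and one-hot encoding above concrete, the following is a minimal sketch of how raw laboratory values could be mapped to the lower/normal/higher tags of Table 1. The normal ranges and helper names here are illustrative assumptions, not the exact implementation used in this study.

```python
import pandas as pd

# Illustrative normal ranges (lower bound, upper bound); a lower bound of 0
# yields a two-tag feature (normal/higher), as for BASO.  Assumed values.
NORMAL_RANGES = {
    "WBC": (3.5, 9.5),    # x10^9/L
    "BASO": (0.0, 0.06),  # x10^9/L
}

def discretize(item: str, value: float) -> str:
    """Map a raw laboratory value to a qualitative tag."""
    low, high = NORMAL_RANGES[item]
    if value > high:
        return "higher"
    if low > 0 and value < low:
        return "lower"
    return "normal"

def one_hot(df: pd.DataFrame) -> pd.DataFrame:
    """Turn tagged laboratory items into one-hot columns such as 'WBC_lower'."""
    tags = pd.DataFrame({
        item: df[item].apply(lambda v: discretize(item, v))
        for item in NORMAL_RANGES
    })
    return pd.get_dummies(tags, prefix=list(NORMAL_RANGES), prefix_sep="_")

if __name__ == "__main__":
    raw = pd.DataFrame({"WBC": [2.8, 6.1, 11.2], "BASO": [0.02, 0.09, 0.05]})
    print(one_hot(raw))
```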
3.4 Predictive Modelling

We compare the performance of several predictive models: Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT) and RuleFit. LR is generally considered a baseline model for classification tasks, and SVM shows good performance in binary classification tasks. RuleFit applies a GBDT for decision rule generation, so a pure GBDT model is trained for comparison. The prediction outcome of RuleFit is defined as

F(x) = 1 / (1 + e^{-g(x)}),
g(x) = \hat{a}_0 + \sum_{k=1}^{K} \hat{a}_k r_k(x) + \sum_{j=1}^{n} \hat{b}_j l_j(x_j),

where r_k(x) is the feature representation of the k-th rule and x_j is the j-th original feature. To normalize the input feature x_j, it is transformed into l_j(x_j) = \min(\delta_j^+, \max(\delta_j^-, x_j)), where \delta_j^+ and \delta_j^- are the upper and lower quantiles of feature x_j, respectively. The linear parameters are then obtained with

(\hat{a}_k, \hat{b}_j) = \arg\min_{a_k, b_j} \sum_{i=1}^{N} Loss(y_i, F(x_i)) + \lambda \left( \sum_{k=1}^{K} |a_k| + \sum_{j=1}^{n} |b_j| \right).

Procedures for choosing the regularization coefficient \lambda are discussed in [37]. In this work, a squared-error ramp loss is used to achieve better robustness against out-of-distribution cases. The loss function is defined as

Loss(y_i, F(x_i)) = [y_i - \max(-1, \min(1, F(x_i)))]^2.

3.5 Knowledge Mining

Decision Rules Extraction. We decompose the trained GBDT into decision rules: any path starting from the root node of a tree, ending at an internal or terminal node, can be converted into a decision rule of the form IF x_1 > 60 AND x_2 = 1 AND x_3 = 0 THEN 1 ELSE 0. Rules r_m are defined as

r_m(x) = \prod_{j \in T_m} I(x_j \in s_{jm}),

where T_m is the set of features used in the m-th tree and I(\cdot) is the indicator function, equal to 1 if feature x_j lies in the value subset s_{jm} and 0 otherwise. An example rule is

r_{256}(x) = I(is digoxin) \cdot I(BIL is high) \cdot I(HGB is not low),

which is 1 if and only if all three conditions are met. The total number K of rules derived from the GBDT model is \sum_{m=1}^{M} 2(t_m - 1), where t_m is the number of terminal nodes of the m-th tree.

Feature Importance. RuleFit calculates the importance of the k-th rule feature as I_k = |\hat{a}_k| \sqrt{s_k (1 - s_k)}. The first term, |\hat{a}_k|, is the coefficient estimated above and measures predictive relevance; the second term, \sqrt{s_k (1 - s_k)}, is the standard deviation of the rule feature and mitigates the impact of feature scales. Here s_k = \frac{1}{N} \sum_{i=1}^{N} r_k(x_i) is the support on the training data, i.e., the proportion of training data points to which decision rule k applies. The importance of an original feature is calculated as I_j = |\hat{b}_j| \cdot std(l_j(x_j)), where std(l_j(x_j)) is the standard deviation of l_j(x_j). I_k and I_j measure global feature importance. An advantage of RuleFit is that no variables or rules are dropped before being fitted into the linear model, whereas inspirable decision rules or variables might be excluded if some of them were removed for the purpose of dimensionality reduction.
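The following is a minimal sketch, on synthetic data, of the rule extraction and scoring steps described above: decision rules are read off a trained GBDT, converted into binary rule features r_k(x), concatenated with winsorized linear terms l_j(x_j), fitted with an L1-penalized logistic regression, and ranked by I_k = |a_k| * sqrt(s_k(1 - s_k)). It approximates the RuleFit procedure rather than reproducing the authors' implementation; the toy data, quantile cut-offs and regularization strength are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

def extract_rules(gbdt):
    """Collect every non-empty root-to-node path of every tree as a rule:
    a list of (feature index, threshold, go_left) conditions."""
    rules = []
    for stage in gbdt.estimators_:           # one regression tree per stage (binary case)
        tree = stage[0].tree_
        def walk(node, conds):
            if conds:                        # skip the empty rule at the root
                rules.append(list(conds))
            if tree.children_left[node] == -1:
                return                       # leaf reached
            f, t = tree.feature[node], tree.threshold[node]
            walk(tree.children_left[node], conds + [(f, t, True)])
            walk(tree.children_right[node], conds + [(f, t, False)])
        walk(0, [])
    return rules

def rule_features(X, rules):
    """r_k(x): 1 if a sample satisfies every condition of rule k, else 0."""
    R = np.ones((X.shape[0], len(rules)))
    for k, conds in enumerate(rules):
        for f, t, go_left in conds:
            R[:, k] *= (X[:, f] <= t) if go_left else (X[:, f] > t)
    return R

def winsorize(X, q=0.025):
    """l_j(x_j) = min(delta_j+, max(delta_j-, x_j)) with quantile cut-offs."""
    lo, hi = np.quantile(X, q, axis=0), np.quantile(X, 1 - q, axis=0)
    return np.clip(X, lo, hi)

# Toy data standing in for the EHR features (assumption).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                  min_samples_leaf=15, random_state=0).fit(X, y)

rules = extract_rules(gbdt)
R = rule_features(X, rules)
Z = np.hstack([R, winsorize(X)])             # rule features plus linear terms

# The L1-penalized logistic regression plays the role of the sparse linear stage.
lin = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(Z, y)

# Rule importance I_k = |a_k| * sqrt(s_k * (1 - s_k)), with s_k the rule support.
a = lin.coef_.ravel()[:len(rules)]
s = R.mean(axis=0)
importance = np.abs(a) * np.sqrt(s * (1 - s))
for k in np.argsort(importance)[::-1][:5]:
    print(f"rule {k}: importance={importance[k]:.4f}, support={s[k]:.3f}")
```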
Knowledge Filtering. The linear model of RuleFit takes into account both the original features and the features constructed from decision rules. The final feature space is large and difficult to interpret. A straightforward way is to rank decision rules and individual variables by feature importance; however, there is no guarantee that all extracted knowledge is consistent with medical logic. To retain reasonable knowledge, we design a filtering method, incorporating medical experts, to revise the extracted knowledge. As shown in Table 2, there are three kinds of criteria. Firstly, decision rules whose mortality rate, calculated from the original data, is lower than 7.2% are rejected, because we are far more concerned with decision rules that are associated with death. Secondly, the retained rules are ranked by feature importance, and two cardiologists review them and mark them with two labels: is consistent with medical common sense, and is inspirable. Disagreements between their reviews are re-checked by a more experienced cardiologist to reach a final decision. All rules marked NO for the first label are filtered out. The second label, regarding inspiration, aims to detect knowledge that might support unverified medical hypotheses.

Table 2. Knowledge filtering criteria

Criteria           | Priority | Primary
Mortality          | 1        | Yes
Medical Expertise  | 1        | Yes
Feature Importance | 2        | No

3.6 Evaluation Metrics

We adopt sensitivity, specificity, accuracy, AUC and ROC to evaluate the performance of the predictive models. The first two metrics are popular in the medical domain, and they mathematically describe the performance of predictive models on high-risk patients and low-risk patients, respectively.

4 Experimental Results

4.1 Experimental setup

In our study, we divide the original dataset into a training set and a testing set with a ratio of 7:3 (training set with 9521 samples, testing set with 4081 samples). To address the data imbalance issue, we employ the Borderline-SMOTE algorithm [40], an over-sampling method that generates samples for the minority class. We implement 10-fold cross validation on the training set. For the Gradient Boosting Classifier, we train 100 classifiers, the depth of each tree is set to 3, nodes are not split if fewer than 182 samples are associated with them, and the minimum number of samples required at a leaf node is set to 15. The decision rules are integrated into an additional feature set which, combined with the original features, serves as input to the logistic regression model.
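As an illustration of this setup, the sketch below applies the 7:3 split, Borderline-SMOTE over-sampling via the imbalanced-learn library, and a GBDT with the hyper-parameters listed above. The synthetic stand-in data and the placement of over-sampling before cross validation are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from imblearn.over_sampling import BorderlineSMOTE

# Placeholder data standing in for the 13,602-patient EHR cohort (assumption).
X, y = make_classification(n_samples=5000, n_features=73, weights=[0.96, 0.04],
                           random_state=0)

# 7:3 train/test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Borderline-SMOTE over-samples the minority (high-risk) class near the border.
X_res, y_res = BorderlineSMOTE(random_state=0).fit_resample(X_train, y_train)

# GBDT with the hyper-parameters reported above: 100 trees of depth 3,
# at least 182 samples to split a node and at least 15 samples per leaf.
gbdt = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                  min_samples_split=182, min_samples_leaf=15,
                                  random_state=0)

# 10-fold cross validation on the (re-sampled) training set, then a final fit.
cv_auc = cross_val_score(gbdt, X_res, y_res, cv=10, scoring="roc_auc")
print(f"CV AUC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")

gbdt.fit(X_res, y_res)
test_auc = roc_auc_score(y_test, gbdt.predict_proba(X_test)[:, 1])
print(f"Test AUC: {test_auc:.3f}")
```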
4.2 Performance of the Predictive Models

Fig. 1. ROC curves of the PMs

Figure 1 shows the ROC curves of the four predictive models, and the results of the evaluation metrics are available in Table 3. According to Figure 1, RuleFit achieves an AUC of 0.92 and outperforms the other models by a large margin, and SVM is better than LR and GBDT thanks to its outstanding performance in binary classification tasks. The tree model component of RuleFit implements feature interaction and selection, which demonstrates the advantage of non-linear features. RuleFit achieves the best performance in both accuracy (0.97) and specificity (0.98), but its sensitivity of 0.45 is dramatically lower than that of the other models. The sensitivity of LR (0.66) is the best among the four PMs, and all models show worse sensitivity than specificity. This means they are not good at detecting high-risk HF patients, but are more likely to make correct decisions for low-risk HF patients. Another reason is the imbalanced source data.

Table 3. Evaluation results

Model   | Accuracy | Sensitivity | Specificity | AUC
LR      | 0.83     | 0.66        | 0.84        | 0.83
SVM     | 0.85     | 0.63        | 0.85        | 0.83
GBDT    | 0.88     | 0.54        | 0.89        | 0.81
RuleFit | 0.97     | 0.45        | 0.98        | 0.92

Table 4. Examples of mined knowledge

Rule                                                      | Mortality (%) | Importance | |Coef| | Death toll
Mono% low: no; Urea high: yes; CK-MB high: yes            | 19.74         | 0.047      | 0.23   | 92
gender: female; hs-cTnI high: yes; UA normal: no          | 19.39         | 0.090      | 0.37   | 145
temperature low: yes; Ca high: no; age > 81.5             | 17.31         | 0.014      | 0.12   | 9
TP low: yes; cardiac surgery history: yes; GGT normal: no | 15.00         | 0.016      | 0.13   | 12
PLT normal: yes; age > 81.5                               | 12.96         | 0.142      | 0.39   | 245

4.3 Evaluation of Mined Knowledge

After the filtering procedures, 110 valid rules (34.38% of the initial rules) remain. Table 4 lists the top-ranked decision rules and their statistical characteristics. The average mortality rate corresponding to the extracted rules is 6.26%, and 36 rules (11.25% of the initial rules) exceed the average Chinese HF mortality rate (7.2%). According to the physicians' evaluation, none of the mined knowledge contradicts medical common sense. For instance, the second rule in Table 4, 'gender: female; hs-cTnI high: yes; UA normal: no', interpreted as "a female patient with high hs-cTnI and abnormal UA", is in line with previous findings: [17] confirms hs-cTnI as a useful biomarker for CHF patients, and uric acid is an important prognostic marker for all-cause mortality of HF [29]. Also, some rules are inspirable and provide evidence for unconfirmed medical hypotheses. For example, the rule 'Mono% low: no; Urea high: yes; CK-MB high: yes' demonstrates the significance of Mono%, which has been shown by [32] to be relevant to the pathogenesis of cardiovascular diseases, although its impact on HF is unclear. The rule 'ALP high: yes; ChE low: yes; gender: male' indicates that a higher level of ALP and a lower level of ChE are aggravating factors for the death of HF patients, whereas these two biomarkers have not been seriously considered before and are worthy of further investigation. A highly interesting discovery is the rule 'PLT normal: yes; age > 81.5'. It is well known, and has been verified in [19], that age is a high-risk factor for HF. However, in clinical scenarios, a normal level of PLT is generally not regarded as an influential factor, let alone as a more important factor than age. This inverse association is referred to as "reverse epidemiology" or the "risk factor paradox", and it deserves more research.

5 Conclusion

In this work, we test four predictive models on the task of HF prognosis risk evaluation, and incorporate RuleFit and medical expertise to mine knowledge in an interpretable manner. The extracted knowledge, screened by our knowledge filtering method, is reasonable both statistically and medically. We found that detecting high-risk HF patients is difficult, while predicting outcomes for low-risk HF patients is easier. Our filtering method helps reject unacceptable results, though it is not scalable. Some of the extracted knowledge is valuable in providing statistical evidence to support physicians' hypotheses, and is also inspirable in discovering novel variables that have not been considered before.

6 Future Work

Despite the compelling results from our models, our work can be improved in several aspects. Firstly, rules embedded with intrinsic temporal dependencies would help mine knowledge for different clinical stages. Secondly, RuleFit treats every decision rule derived from the tree model component equally; instead, their spatial dependencies should be taken into account in the linear model. Thirdly, more features can be considered, such as cardiac ultrasound examinations and cardiac resynchronization therapy. Last but not least, scalable automatic evaluation of the extracted knowledge is non-trivial and indispensable; the difficulty of automatically evaluating mined medical knowledge stems from the medical expertise it requires. We suggest a multi-task learning solution to mine knowledge from a wide range of heart diseases in order to decrease the reliance on domain knowledge.

References

1. Aronson D., Mittleman M A., Burger A J.: Elevated blood urea nitrogen level as a predictor of mortality in patients admitted for decompensated heart failure. The American Journal of Medicine 116(7), 466–473 (2004)
2. Bhatla N., Jyoti K.: An analysis of heart disease prediction using different data mining techniques. International Journal of Engineering 1(8), 1–4 (2012)
3. Chandola V., Sukumar S R., Schryver J C.: Knowledge discovery from massive healthcare claims data. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1312–1320 (2013)
4. Choi D J., Park J J., Ali T., et al.: Artificial intelligence for the diagnosis of heart failure. NPJ Digital Medicine 3(1), 1–6 (2020)
5. Curcio F., Sasso G., Liguori I., et al.: The reverse metabolic syndrome in the elderly: Is it a "catabolic" syndrome? Aging Clinical and Experimental Research 30(6), 547–554 (2018)
6. Damen J A A G., Hooft L., Schuit E., et al.: Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ 353 (2016)
7. Dangare C S., Apte S S.: Improved study of heart disease prediction system using data mining classification techniques. International Journal of Computer Applications 47(10), 44–48 (2012)
8. Demissei B G., Cotter G., Prescott M F., et al.: A multimarker multi-time point-based risk stratification strategy in acute heart failure: results from the RELAX-AHF trial. European Journal of Heart Failure 19(8), 1001–1010 (2017)
9. Esfandiari N., Babavalian M R., Moghadam A M E., et al.: Knowledge discovery in medicine: Current issue and future trend. Expert Systems with Applications 41(9), 4434–4463 (2014)
10. Fonarow G C., Adams K F., Abraham W T., et al.: Risk stratification for in-hospital mortality in acutely decompensated heart failure: classification and regression tree analysis. JAMA 293(5), 572–580 (2005)
11. Friedman J H., Popescu B E.: Predictive learning via rule ensembles. Annals of Applied Statistics 2(3), 916–954 (2008)
12. Frizzell J D., Liang L., Schulte P J., et al.: Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure: comparison of machine learning and other statistical approaches. JAMA Cardiology 2(2), 204–209 (2017)
13. Golas S B., Shibahara T., Agboola S., et al.: A machine learning model to predict the risk of 30-day readmissions in patients with heart failure: a retrospective analysis of electronic medical records data. BMC Medical Informatics and Decision Making 18(1), 1–17 (2018)
14. Hussain M., Afzal M., Ali T., et al.: Data-driven knowledge acquisition, validation, and transformation into HL7 Arden Syntax. Artificial Intelligence in Medicine 92, 51–70 (2018)
15. Kadi I., Idri A., Fernandez-Aleman J L.: Knowledge discovery in cardiology: A systematic literature review. International Journal of Medical Informatics 97, 12–32 (2017)
16. Kalantar-Zadeh K., Block G., Horwich T., et al.: Reverse epidemiology of conventional cardiovascular risk factors in patients with chronic heart failure. Journal of the American College of Cardiology 43(8), 1439–1444 (2004)
17. Kawahara C., Tsutamoto T., Sakai H., et al.: Prognostic value of serial measurements of highly sensitive cardiac troponin I in stable outpatients with nonischemic chronic heart failure. American Heart Journal 162(4), 639–645 (2011)
18. Lavrač N., Zupan B.: Data mining in medicine. In: Data Mining and Knowledge Discovery Handbook. Springer, Boston (2005)
19. Lee D S., Austin P C., Rouleau J L., et al.: Predicting mortality among patients hospitalized for heart failure: derivation and validation of a clinical model. JAMA 290(19), 2581–2587 (2003)
20. Levy W C., Mozaffarian D., Linker D T., et al.: The Seattle heart failure model. Circulation 113(11), 1424–1433 (2006)
21. Levy W C., Aaronson K D., Dardas T F., et al.: Prognostic impact of the addition of peak oxygen consumption to the Seattle Heart Failure Model in a transplant referral population. The Journal of Heart and Lung Transplantation 31(8), 817–824 (2012)
22. Li J., Tan X., Xu X., et al.: Efficient mining template of predictive temporal clinical event patterns from patient electronic medical records. IEEE Journal of Biomedical and Health Informatics 23(5), 2138–2147 (2018)
23. Lv H., Yang X., Wang B., et al.: Machine learning-driven models to predict prognostic outcomes in patients hospitalized with heart failure using electronic health records: Retrospective study. Journal of Medical Internet Research 23(4), e24996 (2021)
24. Maggioni A P., Dahlström U., Filippatos G., et al.: EURObservational Research Programme: regional differences and 1-year follow-up results of the Heart Failure Pilot Survey (ESC-HF Pilot). European Journal of Heart Failure 15(7), 808–817 (2013)
25. Miller T.: Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267, 1–38 (2019)
26. Mosterd A., Hoes A W.: Clinical epidemiology of heart failure. Heart 93(9), 1137–1146 (2007)
27. van Riet E E S., Hoes A W., Wagenaar K P., et al.: Epidemiology of heart failure: the prevalence of heart failure and ventricular dysfunction in older adults over time. A systematic review. European Journal of Heart Failure 18(3), 242–252 (2016)
28. Lauren S., Michael M G.: Acute heart failure. Trends in Cardiovascular Medicine 30(2), 104–112 (2020)
29. Tamariz L., Harzand A., Palacio A., et al.: Uric acid as a predictor of all-cause mortality in heart failure: a meta-analysis. Congestive Heart Failure 17(1), 25–30 (2011)
30. Tang F., Xiao C., Wang F., et al.: Predictive modeling in urgent care: a comparative study of machine learning approaches. JAMIA Open 1(1), 87–98 (2018)
31. Taslimitehrani V., Dong G., Pereira N L., et al.: Developing EHR-driven heart failure risk prediction models using CPXR(Log) with the probabilistic loss function. Journal of Biomedical Informatics 60, 260–269 (2016)
32. Wrigley B J., Lip G Y H., Shantsila E.: The role of monocytes and inflammation in the pathophysiology of heart failure. European Journal of Heart Failure 13(11), 1161–1171 (2011)
33. Molnar C.: Interpretable Machine Learning. Lulu.com (2020)
34. Ribeiro M T., Singh S., Guestrin C.: "Why should I trust you?": Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016)
35. Ribeiro M T., Singh S., Guestrin C.: Anchors: High-precision model-agnostic explanations. AAAI Conference on Artificial Intelligence (AAAI) (2018)
36. Kim B., Khanna R., Koyejo O O.: Examples are not enough, learn to criticize! Criticism for interpretability. Advances in Neural Information Processing Systems (2016)
37. Friedman J H., Popescu B E.: Gradient directed regularization for linear regression and classification. Technical report, Stanford University, Department of Statistics (2004)
38. Lundberg S., Lee S I.: A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874 (2017)
39. Guidotti R., Monreale A., Ruggieri S., et al.: Local rule-based explanations of black box decision systems. arXiv preprint arXiv:1805.10820 (2018)
40. Han H., Wang W Y., Mao B H.:
Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Proceedings of the 2005 International Conference on Advances in Intelligent Computing (ICIC'05), Part I, 878–887 (2005)