=Paper=
{{Paper
|id=Vol-1663/bmaw2016_paper_1
|storemode=property
|title=A Risk Calculator for the Pulmonary Arterial Hypertension Based on a Bayesian Network
|pdfUrl=https://ceur-ws.org/Vol-1663/bmaw2016_paper_1.pdf
|volume=Vol-1663
|authors=Jidapa Kraisangka,Marek J. Druzdzel,Raymond L. Benza
|dblpUrl=https://dblp.org/rec/conf/uai/KraisangkaDB16
}}
==A Risk Calculator for the Pulmonary Arterial Hypertension Based on a Bayesian Network==
Jidapa Kraisangka and Marek J. Druzdzel (*), Decision System Laboratory, School of Information Sciences, University of Pittsburgh, Pittsburgh, PA

Raymond L. Benza, Advanced Heart Failure, Transplant, MCS and Pulmonary Hypertension, Allegheny Health Network, Allegheny General Hospital, Pittsburgh, PA

===Abstract===
Pulmonary arterial hypertension (PAH) is a severe and often deadly disease, originating from an increase in pulmonary vascular resistance. Its prevention and treatment are of vital importance to public health. A group of medical researchers proposed a calculator for estimating the risk of dying from PAH, available for a variety of computing platforms and widely used by health-care professionals. The PAH Risk Calculator is based on the Cox's Proportional Hazard (CPH) Model, a popular statistical technique used in risk estimation and survival analysis, based on data from a thoroughly collected and maintained Registry to Evaluate Early and Long-term Pulmonary Arterial Hypertension Disease Management (REVEAL Registry). In this paper, we propose an alternative approach to calculating the risk of PAH that is based on a Bayesian network (BN) model. Our first step has been to create a BN model that mimics the CPH model at the foundation of the current PAH Risk Calculator. The BN-based calculator reproduces the results of the current PAH Risk Calculator exactly. Because Bayesian networks do not require the somewhat restrictive assumptions of the CPH model and can readily combine data with expert knowledge, we expect that our approach will lead to an improvement over the current calculator. We plan to (1) learn the parameters of the BN model from the data captured in the REVEAL Registry, and (2) enhance the resulting BN model with medical expert knowledge. We have been collaborating closely on both tasks with the authors of the original PAH Risk Calculator.

===1 Introduction===
Pulmonary arterial hypertension (PAH) is a fatal, chronic, and life-changing disease originating from an increase in pulmonary vascular resistance, and leading to high blood pressure in the lung (Benza et al., 2010; Subias et al., 2010). Patients with PAH suffer from shortness of breath, chest pain, dizziness, fatigue, and possibly other symptoms depending on the progression of disease (Hayes, 2013). Currently, there is no cure for PAH and treatment is often determined based on the symptoms. With an early diagnosis and proper treatment, patients' lives can be extended by five or more years.

With the long-term goal to characterize the clinical course, treatment, and predictors of outcomes in patients with PAH in the United States, a group of medical researchers established a Registry to Evaluate Early and Long-term Pulmonary Arterial Hypertension Disease Management (REVEAL Registry) (Benza et al., 2010). The REVEAL registry is quite likely the most comprehensive collection of data of patients suffering from PAH and it has led to interesting insights improving the diagnosis, prediction, and treatment of PAH. One of the prominent applications of the REVEAL Registry is the PAH Risk Calculator (Benza et al., 2012), a statistical model learned from the REVEAL Registry data and predicting the survival of patients at risk for PAH. A computer implementation of the PAH Risk Calculator is available for a variety of computing platforms and widely used by health-care professionals (see http://www.pah-app.com/ for more information).

The PAH Risk Calculator is based on the Cox's Proportional Hazard (CPH) model (Cox, 1972), a popular statistical technique used in risk estimation and survival analysis. One weakness of this approach is that the underlying model can be only learned from data and is not readily amenable to refinement based on expert knowledge.
Another possible weakness is that the CPH model rests on several assumptions simplifying the interactions between the risk factors and the disease. While these assumptions are reasonable and the CPH model has been successfully used for decades, it is interesting to question them with a possible benefit in terms of model accuracy.

In this paper, we propose an alternative approach to calculating the risk of PAH that is based on a Bayesian network (BN) (Pearl, 1988) model. BNs are acyclic directed graphs in which vertices represent random variables and directed edges between pairs of vertices capture direct influences between the variables represented by the vertices. A BN captures the joint probability distribution among a set of variables both intuitively and efficiently, modeling explicitly independences among them. A representation of the joint probability distribution allows for calculation of probability distributions that are conditional on a subset of variables. This typically amounts to calculating the probability distributions over variables of interest given observations of other variables (e.g., probability of one-year survival given a set of observed risk factors). There is a well-developed theory expressing the relationship between causality and probability, and often the structure of a BN is given a causal interpretation. This is most convenient in terms of user interfaces, notably knowledge acquisition and explanation of results. The first step in our work has been to create a BN model that mimics the CPH model at the foundation of the current PAH Risk Calculator. In this, we use the BN interpretation of the CPH model proposed by Kraisangka and Druzdzel (2014). Our BN-based calculator reproduces the results of the current PAH Risk Calculator exactly.

Because Bayesian networks do not require the assumptions of the CPH model and can readily combine data with expert knowledge, we expect that our approach will eventually lead to an improvement over the current PAH Risk Calculator. Our mid- to long-term plans include (1) learning the parameters of the BN model directly from the data captured in the REVEAL Registry, and (2) enhancing the resulting BN model with medical expert knowledge. We are collaborating on both tasks with the team maintaining the REVEAL Registry and the authors of the original PAH Risk Calculator.

The remainder of this paper is structured as follows. Section 2 describes the problem of PAH, the CPH model, and the PAH Risk Calculator. Sections 3 and 4 describe application of Bayesian networks to risk estimation and the proposed BN-based PAH Risk Calculator. Finally, Section 5 describes our conclusions and future work.

(*) Also Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland.

BMAW 2016 - Page 1 of 59

===2 Pulmonary Arterial Hypertension===
This section introduces some facts related to pulmonary arterial hypertension (PAH), notably its risk factors, the Cox's Proportional Hazard (CPH) model, and the PAH Risk Calculator based on the CPH model.

====PAH Risk Factors====
Risk can be defined as the rate of an occurrence of a particular disease or adverse event (Irvine, 2004). Although PAH can occur at any age, in any race, and any ethnic background (Hayes, 2013), there are risk factors that make some people more susceptible. For example, females are at least two and a half times more susceptible than men to idiopathic PAH. Recently, medical care professionals treating PAH have relied on existing patient registries to understand PAH better. Several risk factors have been identified and used to develop prognostic models for guiding their therapeutic decision making. For example, a study based on the Registry to Evaluate Early and Long-Term Pulmonary Arterial Hypertension Disease Management (REVEAL) (Benza et al., 2010) extracted several demographic, functional, laboratory, and hemodynamic parameters associated with patient survival in PAH (Benza et al., 2012) by means of a multivariate Cox's proportional hazard model (CPH) (discussed in more detail in the following section).

By developing a prognosis model, physicians can assess short-term and long-term patient survival in the context of current treatment and clinical variables (Benza et al., 2012). Although prognostic tools for patient survival have improved the quality of predictions, the models are still imperfect and more research is needed on improving them.

====Cox's Proportional Hazard Model====
Hazard is a measure of risk at a small time interval t, which can be considered as a rate (Allison, 2010). In survival analysis, the hazard function can be represented by probability distributions (e.g., exponential distribution) or can be modeled by regression techniques. The Cox's proportional hazard model (CPH) (Cox, 1972) is a set of regression methods used in the assessment of survival based on its risk factors or explanatory variables. The probability of an individual surviving beyond time t can be estimated with respect to a hazard function (Allison, 2010). As defined originally by Cox (1972), the hazard regression model is expressed as

<math>\lambda(t) = \lambda_0(t) \exp(\beta' \cdot X).</math> (1)

This hazard model is composed of two main parts: the baseline hazard function, <math>\lambda_0(t)</math>, and the set of effect parameters, <math>\beta' \cdot X = \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n</math>. The baseline hazard function determines the risks at an underlying level of explanatory variables, i.e., when all explanatory variables are absent. The <math>\beta</math>s are the coefficients corresponding to the risk factors, X. According to Cox (1972), this <math>\lambda_0(t)</math> can be unspecified or can follow any distribution and be estimated from data.

The application of the CPH model relies on the assumption that the hazard ratio of two observations is constant over time (Cox, 1972). For example, the hazard ratio of a group of PAH patients having renal insufficiency to a group of PAH patients without renal insufficiency (control/baseline group) is estimated as 1.90. Under Cox's assumption, this means that patients with renal insufficiency always have a 90% higher risk of dying from PAH than patients without renal insufficiency. The ratio of two hazards is defined as

<math>\mathrm{HR} = \frac{\lambda_2(t)}{\lambda_1(t)} = \frac{\exp(\beta' X_2)}{\exp(\beta' X_1)}.</math> (2)

If the risk factors X are binary, their value can be expressed as presence (X = 1) or as absence or baseline (X = 0) of the risk factor. Once we know the hazard ratio of one group relative to another group, we can estimate the survival probability (Case et al., 2002) by

<math>S(t) = S_0(t)^{\mathrm{HR}}.</math> (3)

<math>S_0(t)</math> is the baseline survival probability estimated from the data, i.e., when all risk factors are absent or at their baseline value (X = 0) at any time t, while <math>\mathrm{HR}</math> is the hazard ratio of a group of interest to the baseline group. In other words, the survival probability of any patient relative to the baseline group can be estimated from

<math>S(t) = S_0(t)^{\exp(\beta' \cdot X)}.</math> (4)

An example of a CPH model used as a prognosis model for PAH patients is the REVEAL Registry Risk Score Calculator (Benza et al., 2012). The model, including 19 risk factors, was developed to predict a one-year survival probability. The main survivor function is

<math>S(t = 1) = S_0(1)^{\exp(\beta' \cdot X \gamma)},</math> (5)

where <math>S_0(1)</math> is the baseline survivor function of 1 year (0.9698) and <math>\gamma</math> in this equation is the shrinkage coefficient after model calibration (0.939) (Benza et al., 2010). The risk factors X (listed in Table 1) included PAH associated with portal hypertension (APAH-PoPH), PAH associated with connective tissue disease (APAH-CTD), family history of PAH (FPAH), modified New York Heart Association (NYHA)/World Health Organization (WHO) functional class I, III, and IV, men aged > 60, renal insufficiency, systolic blood pressure (SBP) < 110 mm Hg, heart rate > 92 beats per min, mean right atrial pressure (mRAP) > 20 mm Hg, 6-minute walking distance (6MWD) ≥ 440 m and < 165 m, brain natriuretic peptide (BNP) < 50 pg/mL and > 180 pg/mL, pulmonary vascular resistance (PVR) > 32 Wood units, percentage predicted diffusing capacity of lung for carbon monoxide (Dlco) ≥ 80% and ≤ 32%, and presence of pericardial effusion on echocardiogram. Most of the risk factors were associated with increasing mortality rate (indicated by a positive sign of <math>\beta</math> in Table 1), while only four factors were associated with increased one-year survival (indicated by a negative sign of <math>\beta</math> in Table 1).

{| class="wikitable"
|+ Table 1: A list of 19 binary risk factors, their corresponding coefficients <math>\beta</math>, and hazard ratios <math>\exp(\beta)</math> reported for the PAH REVEAL system (Benza et al., 2010).
! Risk factor <math>X_i</math> !! <math>\beta</math> !! <math>\exp(\beta)</math>
|-
| APAH-CTD || 0.7737 || 1.59
|-
| FPAH || 1.2801 || 3.60
|-
| APAH-PoPH || 0.4624 || 2.17
|-
| Male > 60 years of age || 0.7779 || 2.18
|-
| Renal insufficiency || 0.6422 || 1.90
|-
| NYHA Class I || -0.8740 || 0.42
|-
| NYHA Class III || 0.3454 || 1.41
|-
| NYHA Class IV || 1.1402 || 3.13
|-
| SBP < 110 mm Hg || 0.5128 || 1.67
|-
| Heart rate > 92 bpm || 0.3322 || 1.39
|-
| 6MWD ≥ 440 m || -0.5455 || 0.58
|-
| 6MWD < 165 m || 0.5210 || 1.68
|-
| BNP < 50 pg/mL || -0.6922 || 0.50
|-
| BNP > 180 pg/mL || 0.6791 || 1.97
|-
| Pericardial effusion || 0.3014 || 1.35
|-
| % Dlco ≥ 80% || -0.5317 || 0.59
|-
| % Dlco ≤ 32% || 0.3756 || 1.46
|-
| mRAP > 20 mm Hg || 0.5816 || 1.79
|-
| PVR > 32 Wood units || 1.4062 || 4.08
|}

To summarize the model output, patients were stratified into five risk groups according to their range of survival probability (Benza et al., 2010): the low risk group, where the predicted 1-year survival probability is > 95%, average risk with 90% to 95% survival, moderately high risk with 85% to 90% survival, high risk with 70% to 85% survival, and the very high risk group with survival probability < 70%.

====PAH Calculator====
A further application of the CPH model is in the form of a risk calculator. This simplified calculator is useful in everyday clinical practice by helping physicians to decide on patient therapies based on the level of risk (Benza et al., 2012). The calculator was designed by assigning a score to each variable according to its hazard ratio. Among the risk factors associated with increasing mortality (positive coefficients), two points were assigned to those whose hazard ratio is at least two, i.e., those with <math>\exp(\beta) \geq 2</math>, and one point was assigned to the other risk factors. Risk factors associated with decreasing mortality (negative coefficients) were assigned a negative score. Figure 1 shows all risk factors and the interpretation of their hazard ratios.

Figure 1: Cox's proportional-hazards of 1-year PAH patients survival variables (Benza et al., 2010) indicating increasing/decreasing mortality rate for each risk factor.

Figure 2 shows the user interface of the PAH Risk Calculator. Each risk factor from the CPH model is listed and mapped to its score. The calculator allows for adding and subtracting the score based on the data entered for an individual patient case. To avoid a negative total score, the base score of 6 is set as a starting score. The total score is interpreted in the same way as the survival probability given by the CPH model, i.e., it includes the low risk group with score ≤ 7, average risk with score = 8, moderately high risk with score = 9, high risk with score between 10 and 11, and the very high risk group with score ≥ 12. The score, defined as above, is simpler for health care providers to use than probabilities.

Figure 2: PAH risk score calculator (Benza et al., 2012) (electronic version developed by the United Therapeutics Europe Limited).

===3 Application of Bayesian Networks to Risk Calculation===
An alternative approach to the traditional survival analysis is the use of Bayesian networks (Pearl, 1988) to estimate risks. Compared to the CPH model and several other Artificial Intelligence and Machine Learning techniques, a Bayesian network can model explicitly the structure of the relationships among explanatory variables with their probability (Hanna and Lucas, 2001). A Bayesian network can be built from expert knowledge, available data, or a combination of both. If there exists a probabilistic interpretation of an existing modeling tool, as in the case of the CPH model, a BN model can also be an interpretation of the existing model. The structure of a Bayesian network can depict a complex structure of a problem and provide a way to infer posterior conditional probability distributions, useful for prognosis and diagnosis, including medical decision support systems (Husmeier et al., 2005).

To estimate risks using a Bayesian network, the prognosis can be created as a static model, i.e., it can predict the survival at a future point in time. For example, the work of Loghmanpour et al. (2015) focuses on risk assessment models for patients with left ventricular assist devices (LVADs). Bayesian networks have been shown to estimate the risk at various points in time (including 30 days, 90 days, 6 months, 1 year, and 2 years) with accuracy higher than traditional score-based methods (Loghmanpour et al., 2015). An alternative, more complex approach could use dynamic Bayesian networks (DBN), which are an extension of Bayesian networks modeling time explicitly. van Gerven et al. (2007) implemented a DBN for prognosis of patients that suffer from low-grade midgut carcinoid tumor. Instead of treating risk factors independently at each time point, the DBN model considered how the state of the patient changed under the influence of choices made by physicians. This model was shown to suit the temporal nature of medical problems throughout the course of care and to provide detailed prognostic predictions. However, DBNs require additional effort during model construction, for example expertise to structure the temporal interactions and large amounts of (complete) data, which translates to time-consuming efforts (van Gerven et al., 2007).

===4 Bayesian Network PAH Risk Calculator===
====BN Cox model====
With no access to the REVEAL Registry data, we created a Bayesian network model that is a formal interpretation of the CPH model, for which the parameters are reported in the literature (Benza et al., 2010). To this effect, we used the method proposed by Kraisangka and Druzdzel (2014).

We first created a Bayesian network structure by using all risk factors of the PAH CPH model. We converted all binary risk factors to random variables, which were the parents of the survival node. In our case, we have omitted the time variable, as the purpose of the PAH Risk Calculator is to capture the risk at one point in time (in this case, one year). Figure 3 shows the structure of the BNCox model for the BN-based calculator.

Figure 3: A Bayesian network representing the interaction among variables for the PAH CPH model. All random variables are from the original PAH CPH model and the Survival node was added to capture the survival probabilities from the CPH model.

In the next step, we created the conditional probability table for the survival node. The survival probabilities from a CPH model can be encoded into the conditional probabilities as

<math>\Pr(s \mid X_i, T = t) = S_0(t)^{e^{(\beta' X_i)}},</math> (6)

where s means the state of survived in the survival node, <math>X_i</math> are all risk factors, and T is the time point, which is 1 in this case.

We enumerated all configurations of the risk factors (the 19 binary risk factors generate <math>2^{19}</math> cases) and filled the conditional probability table of the survival node with the resulting survival probabilities. This allowed us to reproduce fully the PAH CPH model by means of a Bayesian network.

====BN Interpretation of the PAH Calculator====
The original PAH Risk Calculator uses the hazard ratios in the CPH model to derive the risk score for the calculator (Benza et al., 2012). We apply the same approach in our model. Equation 6 captures the survival probabilities s given the states of the risk factors. We can extract the hazard ratio of each variable by configuring the states of all other risk factors to be absent. For example, the hazard ratio of a risk factor <math>x_j</math> can be estimated from

<math>\mathrm{HR}_j = \frac{\log(\Pr(s \mid \bar{x}_1, \ldots, \bar{x}_{j-1}, x_j, \bar{x}_{j+1}, \ldots, \bar{x}_n))}{\log(\Pr(s \mid \bar{x}_1, \ldots, \bar{x}_{j-1}, \bar{x}_j, \bar{x}_{j+1}, \ldots, \bar{x}_n))},</math> (7)

where a bar denotes an absent risk factor. The probability <math>\Pr(s \mid \bar{x}_1, \ldots, \bar{x}_n)</math> in the denominator, in which all risk factors are absent, corresponds to the baseline survival probability in the CPH model (<math>S_0(1) = 0.9698</math>). Hence, with this equation, we can trace back all hazard ratios.

We use the same criteria as the original PAH Risk Calculator to convert the hazard ratio to the score, i.e., a score of 2 indicates at least a two-fold increase in risk of mortality compared to the baseline risk.

Figure 4 shows a screen shot of our prototype of the Bayesian network risk calculator. The left-hand pane allows for entering risk factors for a given patient case. The right-hand pane shows the calculated score and survival probabilities. Currently, our calculator is a Windows app running on a local server. The numerical risks that are produced by the BN calculator are identical to those of the original CPH-based PAH Risk Calculator (Benza et al., 2012).

Figure 4: A prototype for Bayesian network risk score calculator for a 1-year PAH prognosis model. The left-hand pane allows for entering risk factors for a given patient case. The right-hand pane shows the calculated score and survival probabilities.

===5 Conclusions and Future Work===
In this paper, we propose an alternative to the existing Pulmonary Arterial Hypertension (PAH) Risk Calculator that replaces the original Cox Proportional Hazard (CPH) model with a Bayesian network. Because we did not have access to the REVEAL Registry data, we created a Bayesian network model that uses the CPH parameters learned from the REVEAL Registry data and available in the literature. To this effect, we used a Bayesian network interpretation of the CPH model (Kraisangka and Druzdzel, 2014).

Our calculator reproduces the results of the current PAH Risk Calculator exactly. From this point of view, we have not yet offered a superior calculator. However, we plan to refine the calculator by (1) learning the parameters of the BN model from the data captured in the REVEAL Registry, and (2) enhancing the resulting BN model with medical expert knowledge. The extended model will relax the assumption of the multiplicative character of interactions between the risk factors and the survival variable. It will also relax the assumption that the hazard ratio is constant over time. Another direction of our work is allowing risk variables that are not binary. Instead of having 19 binary risk factors, we will be able to group those risk factors that are mutually exclusive, e.g., WHO Group or NYHA/WHO Functional Class. As a result, we can control the number of risk factors and reduce the complexity of the model. Yet another direction is allowing dependencies between the risk factors, something that is not straightforward in the CPH model. We should be able to refine the Bayesian network model by using expert knowledge or by training its elements from available data. The current calculator produces a patient-specific score based on the hazard ratio. Because the new Bayesian network model will no longer use the multiplicative CPH model, we plan to create new risk score criteria based on the probability of survival rather than the hazard ratio. We have little doubt that with some further modeling effort we should be able to obtain a superior calculator in the sense of producing higher accuracy of the risk estimate than the original CPH-based risk calculator.

===Acknowledgements===
We acknowledge the support of the National Institutes of Health under grant number U01HL101066-01 and the Faculty of Information and Communication Technology, Mahidol University, Thailand. Implementation of this work is based on GeNIe and SMILE, a Bayesian inference engine available free of charge for academic teaching and research use at http://www.bayesfusion.com/. While we take full responsibility for any remaining errors and shortcomings of this paper, we would like to thank anonymous reviewers for their valuable suggestions.

===References===
Allison, P. D. (2010). Survival Analysis Using SAS: A Practical Guide, Second Edition. SAS Institute Inc., Cary, NC.

Benza, R. L., Gomberg-Maitland, M., Miller, D. P., Frost, A., Frantz, R. P., Foreman, A. J., Badesch, D. B., and McGoon, M. D. (2012). The REVEAL registry risk score calculator in patients newly diagnosed with pulmonary arterial hypertension. Chest, 141(2):354–362.

Benza, R. L., Miller, D. P., Gomberg-Maitland, M., Frantz, R. P., Foreman, A. J., Coffey, C. S., Frost, A., Barst, R. J., Badesch, D. B., Elliott, C. G., Liou, T. G., and McGoon, M. D. (2010). Predicting survival in pulmonary arterial hypertension: Insights from the Registry to Evaluate Early and Long-term Pulmonary Arterial Hypertension disease management (REVEAL). Circulation, 122(2):164–172.

Case, L. D., Kimmick, G., Paskett, E. D., Lohman, K., and Tucker, R. (2002). Interpreting measures of treatment effect in cancer clinical trials. The Oncologist, 7(3):181–187.

Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2):187–220.

Hanna, A. A. and Lucas, P. J. (2001). Prognostic models in medicine: AI and statistical approaches. Method Inform Med, 40:1–5.

Hayes, G. B. (2013). Pulmonary Hypertension: A Patient's Survival Guide - Fifth Edition, 2013 Revision. Pulmonary Hypertension Association.

Husmeier, D., Dybowski, R., and Roberts, S. (2005). Probabilistic modeling in bioinformatics and medical informatics. Springer.

Irvine, E. J. (2004). Measurement and expression of risk: optimizing decision strategies. The American Journal of Medicine Supplements, 117(5):2–7.

Kraisangka, J. and Druzdzel, M. J. (2014). Discrete Bayesian network interpretation of the Cox's proportional hazards model. In van der Gaag, L. C. and Feelders, A. J., editors, Probabilistic Graphical Models, volume 8754 of Lecture Notes in Computer Science, pages 238–253. Springer International Publishing.

Loghmanpour, N. A., Kanwar, M. K., Druzdzel, M. J., Benza, R. L., Murali, S., and Antaki, J. F. (2015). A new Bayesian network-based risk stratification model for prediction of short-term and long-term LVAD mortality. ASAIO Journal, 61(3):313–323.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Subias, P. E., Mir, J. A. B., and Suberviola, V. (2010). Current diagnostic and prognostic assessment of pulmonary hypertension. Revista Española de Cardiología (English Edition), 63(5):583–596.

van Gerven, M. A., Taal, B. G., and Lucas, P. J. (2007). Dynamic Bayesian networks as prognostic models for clinical patient management. Journal of Biomedical Informatics, 41(4):515–529.
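
As a supplementary illustration of Equations 5 to 7 and of the risk groups described in Section 2, the following Python sketch fills a conditional probability table for a survival node and then recovers a hazard ratio from it. It is not the authors' GeNIe/SMILE implementation. The function and variable names are hypothetical, only four of the 19 risk factors from Table 1 are used (so the table has 2^4 rows rather than 2^19), and the handling of the exact boundaries between risk groups is an assumption, since the text gives only the ranges.

```python
import itertools
import math

S0 = 0.9698     # baseline 1-year survival S0(1), as reported in the text
GAMMA = 0.939   # shrinkage coefficient, as reported in the text

# Illustrative subset of the Table 1 coefficients (hypothetical key names).
BETA = {
    "renal_insufficiency": 0.6422,
    "nyha_class_iv": 1.1402,
    "sbp_below_110": 0.5128,
    "bnp_below_50": -0.6922,
}


def survival(present, gamma=1.0):
    """Probability of one-year survival given the risk factors in `present`.

    gamma=1.0 corresponds to Equation 6; gamma=GAMMA adds the shrinkage
    coefficient of Equation 5.
    """
    linear = sum(BETA[name] for name in present)
    return S0 ** math.exp(gamma * linear)


def build_cpt(gamma=1.0):
    """Enumerate every parent configuration of the survival node."""
    names = list(BETA)
    cpt = {}
    for states in itertools.product((0, 1), repeat=len(names)):
        present = [n for n, on in zip(names, states) if on]
        cpt[states] = survival(present, gamma)
    return names, cpt


def hazard_ratio(names, cpt, factor):
    """Equation 7: log Pr(s | only `factor` present) / log Pr(s | all absent)."""
    only = tuple(int(n == factor) for n in names)
    none = tuple(0 for _ in names)
    return math.log(cpt[only]) / math.log(cpt[none])


def risk_group(p):
    """Map a 1-year survival probability to a risk group (boundaries assumed)."""
    if p > 0.95:
        return "low"
    if p >= 0.90:
        return "average"
    if p >= 0.85:
        return "moderately high"
    if p >= 0.70:
        return "high"
    return "very high"


if __name__ == "__main__":
    names, cpt = build_cpt()
    print(len(cpt), "rows in the conditional probability table")
    print("recovered hazard ratio:", hazard_ratio(names, cpt, "renal_insufficiency"))
    p = survival(["nyha_class_iv"], gamma=GAMMA)
    print("survival:", p, "group:", risk_group(p))
```

Because the table is filled from Equation 6, the ratio of logarithms in `hazard_ratio` cancels the baseline term and returns the exponentiated coefficient of the chosen factor, which is the sense in which the hazard ratios can be traced back from the Bayesian network.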