Using Features from Pre-trained TimeNet for Clinical Predictions

Priyanka Gupta, Pankaj Malhotra, Lovekesh Vig, Gautam Shroff
TCS Research, New Delhi, India
{priyanka.g35, malhotra.pankaj, lovekesh.vig, gautam.shroff}@tcs.com

Abstract

Predictive models based on Recurrent Neural Networks (RNNs) for clinical time series have been successfully used for various tasks such as phenotyping, in-hospital mortality prediction, and diagnostics. However, RNNs require large labeled data for training and are computationally expensive to train. Pre-training a network on some supervised or unsupervised task on a dataset, and then fine-tuning it via transfer learning for a related end-task, can be an efficient way to leverage deep models in scenarios that lack computational resources or labeled data, or both. In this work, we consider an approach that leverages a deep RNN, namely TimeNet [Malhotra et al., 2017], pre-trained on a large number of diverse, publicly available time series from the UCR Repository [Chen et al., 2015]. TimeNet maps varying-length time series to fixed-dimensional feature vectors and acts as an off-the-shelf feature extractor. The TimeNet-based approach overcomes the need for hand-crafted features and allows the use of traditional, easy-to-train, and interpretable linear models for the end-task, while still leveraging features from a deep neural network. Empirical evaluation of the proposed approach on MIMIC-III data¹ suggests a promising direction for future exploration: our results are comparable to existing benchmarks, while our models require less training and hyperparameter tuning effort.

¹ TimeNet-based features for MIMIC-III time series are available on request from the authors.

1 Introduction

There has been growing interest in using deep learning models for various clinical prediction tasks from Electronic Health Records (EHR), e.g. Doctor AI [Choi et al., 2016] for medical diagnosis, Deep Patient [Miotto et al., 2016] to predict future diseases in patients, and DeepR [Nguyen et al., 2017] to predict unplanned readmission after discharge. With various medical parameters being recorded over time in EHR databases, Recurrent Neural Networks (RNNs) can be an effective way to model the sequential aspects of EHR data, e.g. for diagnoses [Lipton et al., 2015; Che et al., 2016; Choi et al., 2016], and for mortality prediction and length-of-stay estimation [Harutyunyan et al., 2017; Purushotham et al., 2017; Rajkomar et al., 2018].

However, training RNNs requires large labeled training data, like any other deep learning approach, and can be computationally inefficient because of the sequential nature of the computations. On the other hand, a deep network trained on diverse instances can provide generic features for unseen instances, e.g. VGGNet [Simonyan and Zisserman, 2014] for images. Also, fine-tuning a pre-trained network via transfer learning is often faster and easier than constructing and training a new network from scratch [Bengio, 2012]: the pre-trained network has already learned a rich set of features that can then be applied to a wide range of other, similar tasks.

Deep RNNs have been shown to perform hierarchical processing of time series, with different layers tackling different time scales [Hermans and Schrauwen, 2013; Malhotra et al., 2015]. TimeNet [Malhotra et al., 2017] is a general-purpose, multi-layered RNN trained on a large number of diverse time series from the UCR Time Series Archive [Chen et al., 2015] (refer to Section 3 for details) that has been shown to be useful as an off-the-shelf feature extractor for time series. TimeNet was trained on 18 different datasets simultaneously via an RNN autoencoder, in an unsupervised manner, on a reconstruction task. Features extracted from TimeNet have been found useful for classification on 25 datasets not seen during TimeNet's training, demonstrating its ability to provide meaningful features for unseen datasets.
In this work, we provide an efficient way to learn prediction models for clinical time series by leveraging general-purpose features from TimeNet. TimeNet maps variable-length clinical time series to fixed-dimensional feature vectors, which are subsequently used for patient phenotyping and in-hospital mortality prediction on the MIMIC-III database [Johnson et al., 2016] via easily trainable, non-temporal, linear classification models. We observe that TimeNet-based features can be used to build such classification models with very little training effort, while yielding performance comparable to models with hand-crafted features or carefully trained domain-specific RNNs, as benchmarked in [Harutyunyan et al., 2017; Song et al., 2017]. Further, we propose a simple mechanism that leverages the weights of the linear classification models to provide insights into the relevance of each raw input feature (physiological parameter) for a given phenotype (discussed in Section 4.2).

2 Related Work

TimeNet-based features have been shown to be useful for various tasks, including ECG classification [Malhotra et al., 2017]. In this work, we consider the application of TimeNet to phenotyping and in-hospital mortality prediction for multivariate clinical time series. Deep Patient [Miotto et al., 2016] leverages features from a pre-trained stacked autoencoder for EHR data; however, it does not exploit the temporal aspect of the data and uses a non-temporal model based on stacked autoencoders. Our approach extracts temporal features via TimeNet, incorporating the sequential nature of EHR data. Doctor AI [Choi et al., 2016] uses discretized medical codes (e.g. diagnosis, medication, procedure) from longitudinal patient visits in a purely supervised setting, while we use real-valued time series. Moreover, while approaches like Doctor AI require training a deep RNN from scratch, our approach leverages a general-purpose RNN for feature extraction.

[Harutyunyan et al., 2017] train a deep RNN on multiple prediction tasks simultaneously, including phenotyping and in-hospital mortality, to obtain a general-purpose deep RNN for clinical time series. They show that a single network trained on multiple tasks can capture generic features that work across tasks. We also leverage generic features for clinical time series, but using an RNN pre-trained on diverse time series across domains, which makes our approach more efficient. Further, we provide an approach to rank the raw input features in order of their relevance, which helps validate the learned models.
3 Background: TimeNet

TimeNet [Malhotra et al., 2017] is a pre-trained, off-the-shelf feature extractor for univariate time series, with three recurrent layers of 60 Gated Recurrent Units (GRUs) [Cho et al., 2014] each. TimeNet is an RNN trained via an autoencoder consisting of an encoder RNN and a decoder RNN, trained simultaneously using the sequence-to-sequence learning framework [Sutskever et al., 2014; Bahdanau et al., 2014], as shown in Figure 1(a). The RNN autoencoder is trained to obtain the parameters W_E of the encoder RNN f_E via a reconstruction task: for an input x_{1...T} = x_1, x_2, ..., x_T (x_i ∈ R), the target output time series x_{T...1} = x_T, x_{T−1}, ..., x_1 is the reverse of the input. The RNN encoder f_E provides a non-linear mapping of the univariate input time series to a fixed-dimensional vector representation z_T, i.e. z_T = f_E(x_{1...T}; W_E), followed by an RNN decoder f_D that provides a non-linear mapping of z_T back to a univariate time series, x̂_{T...1} = f_D(z_T; W_D), where W_E and W_D are the parameters of the encoder and decoder, respectively. The model is trained to minimize the average squared reconstruction error. Training on 18 diverse datasets simultaneously results in robust time series features being captured in z_T: the decoder relies on z_T as its only input to reconstruct the time series, forcing the encoder to capture all the relevant information in the time series into the fixed-dimensional vector z_T. This vector z_T is used as the feature vector for the input x_{1...T}, and is then used to train a simpler classifier (e.g. an SVM, as used in [Malhotra et al., 2017]) for the end task. TimeNet maps a univariate input time series to a 180-dimensional feature vector, where each dimension corresponds to the final output of one of the 60 GRUs in the 3 recurrent layers.

Figure 1: (a) TimeNet trained via an RNN encoder-decoder with three hidden GRU layers. (b) TimeNet-based feature extraction; TimeNet is shown unrolled over time. (c) Obtaining relevance scores for raw input features. Here, T: time series length; n: number of raw input features.
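The pre-trained TimeNet weights are not reproduced here, but the encoder side described above is simple to sketch. The following is a minimal, illustrative PyTorch version (the class name and the random toy input are ours, and a real TimeNet would load the weights learned on the UCR datasets): the 180-dimensional vector z_T is obtained by concatenating the final hidden state of each of the three 60-GRU layers.

```python
# Minimal sketch of a TimeNet-style encoder (architecture only; the actual
# pre-trained TimeNet weights are not reproduced here). A univariate series
# of length T is mapped to a 180-dim vector by concatenating the final
# hidden states of the 3 recurrent layers (3 x 60 GRUs).
import torch
import torch.nn as nn

class TimeNetStyleEncoder(nn.Module):
    def __init__(self, hidden_size=60, num_layers=3):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_size,
                          num_layers=num_layers, batch_first=True)

    def forward(self, x):                    # x: (batch, T, 1)
        _, h_n = self.gru(x)                 # h_n: (num_layers, batch, hidden)
        # Concatenate the last hidden state of each layer -> (batch, 180)
        return h_n.permute(1, 0, 2).reshape(x.size(0), -1)

encoder = TimeNetStyleEncoder()
z_T = encoder(torch.randn(8, 48, 1))         # 8 toy series of length T = 48
print(z_T.shape)                             # torch.Size([8, 180])
```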
4 TimeNet Features for Clinical Time Series

Consider a set D of labeled time series instances from an EHR database: D = {(x^(i), y^(i))}_{i=1}^{N}, where x^(i) is a multivariate time series, y^(i) ∈ {y_1, ..., y_C}, C is the number of classes, and N is the number of unique patients (in our experiments, we consider each episode of hospital stay of a patient as a separate data instance). In this work, we treat the presence or absence of a phenotype as a binary classification task, such that C = 2, and learn an independent model for each phenotype (unlike [Harutyunyan et al., 2017], which treats phenotyping as a multi-label classification problem). This allows us to build simple linear binary classification models, as described next in Section 4.1. In practice, the outputs of these binary classifiers can be considered together to estimate the set of phenotypes present in a patient. Similarly, mortality prediction is treated as a binary classification task, where the goal is to classify whether the patient will survive after admission to the ICU or not.

4.1 Classification using TimeNet Features

Feature Extraction for Multivariate Clinical Time Series. For a multivariate time series x = x_1 x_2 ... x_T, where x_t ∈ R^n, we consider the time series of each of the n raw input features (physiological parameters, e.g. glucose level, heart rate) independently, obtaining univariate time series x^j = x^j_1 x^j_2 ... x^j_T, j = 1...n. (Note: we use x instead of x^(i) and omit the superscript (i) for ease of notation.) We obtain the vector representation z^j_T = f_E(x^j; W_E) for x^j, where z^j_T ∈ R^c, using TimeNet as f_E with c = 180 (as described in Section 3). In general, the time series length T also depends on i, e.g. on the length of stay in hospital; we omit this for the sake of clarity, without loss of generality. In practice, we convert each time series to a common length T by suitable pre/post-padding with 0s. We concatenate the TimeNet features z^j_T of each raw input feature j to get the final feature vector z_T = [z^1_T, z^2_T, ..., z^n_T] for the time series x, where z_T ∈ R^m, m = n × c, as illustrated in Figure 1(b).

Using TimeNet-based Features for Classification. The final concatenated feature vector z_T is used as input for the phenotyping and mortality prediction classification tasks. We note that since c = 180 is large, z_T has a large number of features, m ≥ 180. We consider a linear mapping from the input TimeNet features z_T to the target label y, such that the estimate is ŷ = w · z_T, where w ∈ R^m, and constrain the linear model to use only a few of this large number of features. The weights w are obtained by minimizing the LASSO-regularized loss function [Tibshirani, 1996]:

    min_w  (1/N) Σ_{i=1}^{N} ( y^(i) − w · z_T^(i) )² + α ||w||_1        (1)

where y^(i) ∈ {0, 1}, ||w||_1 = Σ_{j=1}^{n} Σ_{k=1}^{c} |w_jk| is the L1-norm, w_jk is the weight assigned to the k-th TimeNet feature of the j-th raw feature, and α controls the extent of sparsity, with higher α implying more sparsity, i.e. fewer TimeNet features are selected for the final classifier.
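As a concrete illustration of Section 4.1, the sketch below strings together the per-feature extraction and the LASSO fit. It assumes the `encoder` from the previous snippet and scikit-learn; the helper name `timenet_features`, the toy shapes, and the use of `sklearn.linear_model.Lasso` (whose objective matches Equation (1) up to a constant factor) are our choices, not the authors' released code.

```python
# Sketch of Section 4.1, assuming `encoder` from the previous snippet and a
# dataset X of shape (N, T, n) already pre/post-padded to a common length T.
import numpy as np
import torch
from sklearn.linear_model import Lasso

def timenet_features(X):
    """Concatenate per-feature TimeNet embeddings: (N, T, n) -> (N, n*180)."""
    N, T, n = X.shape
    feats = []
    with torch.no_grad():
        for j in range(n):  # univariate series of raw feature j
            xj = torch.tensor(X[:, :, j:j + 1], dtype=torch.float32)
            feats.append(encoder(xj).numpy())      # (N, 180)
    return np.concatenate(feats, axis=1)           # z_T, shape (N, n*180)

# Toy data (the real inputs are the 76 MIMIC-III channels over T = 48 hours).
X = np.random.randn(32, 48, 4)
y = np.random.randint(0, 2, size=32)

Z = timenet_features(X)
clf = Lasso(alpha=1e-4)   # alpha plays the role of the sparsity weight in (1)
clf.fit(Z, y)             # minimizes a squared loss + alpha*||w||_1, as in (1)
y_hat = clf.predict(Z)    # continuous score; threshold for a binary decision
```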
4.2 Obtaining Relevance Scores for Raw Features

Determining the relevance of the n raw input features for a given phenotype is useful for gaining insight into the learned classification model. The sparse weights w are easy to interpret and can reveal which features matter for a classification task (e.g. as used in [Micenková et al., 2013]). We obtain the relevance r_j of the j-th raw input feature as the sum of the absolute values of the weights w_jk assigned to the corresponding TimeNet features z^j_T, as shown in Figure 1(c):

    r_j = Σ_{k=1}^{c} |w_jk|,  j = 1...n.        (2)

Further, r_j is min-max normalized as r'_j = (r_j − r_min)/(r_max − r_min) ∈ [0, 1], where r_min is the minimum of {r_1, ..., r_n} and r_max the maximum. In practice, such relevance scores help to interpret and validate the overall model; for example, one would expect the blood glucose level to have a high relevance score in a model that detects the diabetes mellitus phenotype (we provide such insights in Section 5).
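Equation (2) and the min-max normalization translate directly into a few lines of code. This sketch assumes the `clf` fitted above and the same column ordering as `timenet_features` (feature j occupies columns j*c to (j+1)*c − 1).

```python
# Sketch of Section 4.2: per-raw-feature relevance from the LASSO weights,
# assuming `clf` from the previous snippet, n raw features, c = 180 each.
import numpy as np

def relevance_scores(clf, n, c=180):
    W = clf.coef_.reshape(n, c)      # row j holds w_jk, k = 1..c, as in Eq. (2)
    r = np.abs(W).sum(axis=1)        # r_j = sum_k |w_jk|
    # Min-max normalize to [0, 1]; epsilon guards against a constant r.
    return (r - r.min()) / (r.max() - r.min() + 1e-12)

scores = relevance_scores(clf, n=4)          # n = 4 for the toy data above
print(np.argsort(scores)[::-1])              # raw features, most relevant first
```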
5 Experimental Evaluation

5.1 Dataset Details

We use the MIMIC-III (v1.4) clinical database [Johnson et al., 2016], which covers over 60,000 ICU stays of about 40,000 critical care patients. We use the same experimental setup as [Harutyunyan et al., 2017], with the same train, validation, and test splits and the same features (https://github.com/yerevann/mimic3-benchmarks), based on 17 physiological time series (12 real-valued and 5 categorical), sampled at 1-hour intervals. The categorical variables are converted to one-hot vectors, so that the final multivariate time series has n = 76 raw input features (59 actual features and 17 masking features denoting missing values).

For the phenotyping task, the goal is to classify 25 phenotypes common in adult ICUs. For the in-hospital mortality task, the goal is to predict whether the patient will survive or not, given the time series observations of the first 48 hours. In all our experiments, we restrict the training time series to the first 48 hours of the ICU stay, such that T = 48 when training all models, to imitate the practical scenario in which early predictions are important, unlike [Harutyunyan et al., 2017; Song et al., 2017], which use the entire time series to train the phenotyping classifier.

5.2 Evaluation

With n = 76 raw input features, we obtain an m = 13,680-dimensional (m = 76 × 180) TimeNet feature vector for each admission. We use α = 0.0001 for the phenotype classifiers and α = 0.0003 for the in-hospital mortality classifier (α is chosen on a hold-out validation set). Table 1 summarizes the results and compares them with existing benchmarks; refer to Table 2 for detailed phenotype-wise results.

We consider two variants of classifier models for the phenotyping task: i) TimeNet-x, using data from the current episode only, and ii) TimeNet-x-Eps, additionally using data from the previous episode of a patient (whenever available) via an extra input feature. Each classifier is trained using up to the first 48 hours of data after ICU admission; the two variants differ in the number of hours x of data used to estimate the target class at test time. For x = 48, data up to the first 48 hours after admission is used to determine the phenotype. For x = All, the learned classifier is applied to all 48-hour windows (overlapping, with a shift of 24 hours) over the entire ICU stay of a patient, and the average phenotype probability across windows is used as the final estimate of the target class (see the sketch below). In TimeNet-x-Eps, the additional feature indicates the presence (1) or absence (0) of the phenotype during the previous episode; we use the ground-truth value of this feature at training time, and the probability of the phenotype's presence during the previous episode (as given by the LASSO-based classifier) at test time.
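One possible reading of the x = All scheme described above, as a sketch: the window length (48 hours) and shift (24 hours) are from the text, while the helper name and the use of the raw LASSO output as a per-window phenotype score (the paper averages phenotype probabilities) are our assumptions.

```python
# Sketch of the TimeNet-All scheme: score every 48-hour window (24-hour
# shift) of a full ICU stay and average the per-window estimates.
# `timenet_features` and `clf` are from the earlier snippets.
import numpy as np

def phenotype_score_all(x_full, win=48, shift=24):
    """x_full: (T_stay, n) multivariate series for one full ICU stay."""
    T_stay = x_full.shape[0]
    starts = range(0, max(T_stay - win, 0) + 1, shift)
    scores = [
        float(clf.predict(timenet_features(x_full[s:s + win][None]))[0])
        for s in starts
    ]
    return float(np.mean(scores))   # final estimate = average over windows

x_full = np.random.randn(120, 4)    # a 120-hour toy stay with n = 4 channels
print(phenotype_score_all(x_full))
```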
Table 1: Classification performance comparison. LR: logistic regression; LSTM-Multi: LSTM-based multitask model; SAnD (Simply Attend and Diagnose): fully attention-based model; SAnD-Multi: SAnD-based multitask model. (*For phenotyping, we compare TimeNet-48-Eps, rather than TimeNet-All-Eps, with existing benchmarks, as it is more applicable in practical scenarios. **Only the TimeNet-48 variant is applicable to the in-hospital mortality task.)

                [Harutyunyan et al., 2017]      [Song et al., 2017]     Proposed (features using [Malhotra et al., 2017])
Metric          LR      LSTM    LSTM-Multi      SAnD    SAnD-Multi      TimeNet-48  TimeNet-All  TimeNet-48-Eps  TimeNet-All-Eps*
Task 1: Phenotyping
Micro AUC       0.801   0.821   0.817           0.816   0.819           0.812       0.813        0.820           0.822
Macro AUC       0.741   0.770   0.766           0.766   0.771           0.761       0.764        0.772           0.775
Weighted AUC    0.732   0.757   0.753           0.754   0.759           0.751       0.754        0.765           0.768
Task 2: In-Hospital Mortality Prediction**
AUROC           0.845   0.854   0.863           0.857   0.859           0.852       -            -               -
AUPRC           0.472   0.516   0.517           0.518   0.519           0.519       -            -               -
min(Se, +P)     0.469   0.491   0.499           0.500   0.504           0.486       -            -               -

Training the linear models is fast: it took around 30 minutes to obtain any of the binary classifiers, including tuning over α ∈ [10^−5, 10^−3] (five equally spaced values), on a 32 GB RAM machine with a quad-core i7 2.7 GHz processor. We observe that LASSO leads to 96.2 ± 0.8% sparsity (i.e. percentage of weights w_jk ≈ 0) across all classifiers, leaving around 550 useful features (out of 13,680) for each phenotype classifier.

5.3 Observations

Classification Tasks. For the phenotyping task, we make the following observations from Table 1:
1. TimeNet-48 vs LR: TimeNet-based features perform significantly better than the hand-crafted features used in LR (logistic regression), while using only the first 48 hours of data, unlike the LR approach, which uses the entire episode's data. This demonstrates the effectiveness of TimeNet features for MIMIC-III data. Further, our approach requires tuning only a single hyperparameter α for LASSO, unlike approaches such as the LSTM of [Harutyunyan et al., 2017], which involve tuning the number of hidden units, the number of layers, the learning rate, etc.
2. TimeNet-x vs TimeNet-x-Eps: leveraging the previous episode's time series data for a patient significantly improves classification performance.
3. TimeNet-48-Eps performs better than existing benchmarks, while remaining practically more feasible, as it looks only at the first 48 hours of the current episode rather than the entire episode.
For the in-hospital mortality task, we observe performance comparable to existing benchmarks.

Relevance Scores for Raw Input Features. The weights assigned to the TimeNet features (refer to Equation 2) yield intuitively interpretable relevance scores for the raw input features. For example, as shown in Figure 2, we obtain the highest relevance score for Glucose Level (feature 1) for Diabetes Mellitus with Complications (Figure 2(a)), and for Systolic Blood Pressure (feature 20) for Essential Hypertension (Figure 2(b)). Refer to Figure 3 in the supplementary material for more details. We conclude that even though TimeNet was never trained on MIMIC-III data, it still provides meaningful general-purpose features from the time series of the raw input features, and LASSO helps to select the most relevant ones for the end task using labeled data. Further, extracting features with a deep recurrent network from the time series of each raw input feature independently, rather than from the multivariate time series as a whole, makes it easy to assign relevance scores to raw features in the input domain, allowing high-level model validation by domain experts.

Figure 2: Feature relevance after LASSO. x-axis: feature number; y-axis: relevance score. Here, P1: Diabetes Mellitus with Complications; P2: Essential Hypertension. [Plots omitted.]

6 Discussion and Future Work

In this work, we leverage deep learning models efficiently via TimeNet for phenotyping and mortality prediction tasks, with little hyperparameter tuning effort. TimeNet-based features can be efficiently transferred to train linear, interpretable classifiers for the end tasks considered, while still achieving classification performance similar to that of more compute-intensive deep models trained from scratch. In the future, it will be interesting to evaluate a domain-specific TimeNet-like model for clinical time series (e.g. trained only on the MIMIC-III database).
References

[Bahdanau et al., 2014] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
[Bengio, 2012] Yoshua Bengio. Deep learning of representations for unsupervised and transfer learning. In Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, pages 17–36, 2012.
[Che et al., 2016] Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent neural networks for multivariate time series with missing values. arXiv preprint arXiv:1606.01865, 2016.
[Chen et al., 2015] Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum, et al. The UCR time series classification archive, July 2015. www.cs.ucr.edu/~eamonn/time_series_data/.
[Cho et al., 2014] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[Choi et al., 2016] Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. Doctor AI: Predicting clinical events via recurrent neural networks. In Machine Learning for Healthcare Conference, pages 301–318, 2016.
[Harutyunyan et al., 2017] Hrayr Harutyunyan, Hrant Khachatrian, David C. Kale, and Aram Galstyan. Multitask learning and benchmarking with clinical time series data. arXiv preprint arXiv:1703.07771, 2017.
[Hermans and Schrauwen, 2013] Michiel Hermans and Benjamin Schrauwen. Training and analysing deep recurrent neural networks. In Advances in Neural Information Processing Systems, pages 190–198, 2013.
[Johnson et al., 2016] Alistair E. W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. MIMIC-III, a freely accessible critical care database. Scientific Data, 3:160035, 2016.
[Lipton et al., 2015] Zachary C. Lipton, David C. Kale, Charles Elkan, and Randall Wetzel. Learning to diagnose with LSTM recurrent neural networks. arXiv preprint arXiv:1511.03677, 2015.
[Malhotra et al., 2015] Pankaj Malhotra, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal. Long Short Term Memory networks for anomaly detection in time series. In ESANN, 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pages 89–94, 2015.
[Malhotra et al., 2017] Pankaj Malhotra, Vishnu TV, Lovekesh Vig, Puneet Agarwal, and Gautam Shroff. TimeNet: Pre-trained deep recurrent neural network for time series classification. In 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2017.
[Micenková et al., 2013] Barbora Micenková, Xuan-Hong Dang, Ira Assent, and Raymond T. Ng. Explaining outliers by subspace separability. In 2013 IEEE 13th International Conference on Data Mining (ICDM), pages 518–527. IEEE, 2013.
[Miotto et al., 2016] Riccardo Miotto, Li Li, Brian A. Kidd, and Joel T. Dudley. Deep Patient: An unsupervised representation to predict the future of patients from the electronic health records. Scientific Reports, 6:26094, 2016.
[Nguyen et al., 2017] Phuoc Nguyen, Truyen Tran, Nilmini Wickramasinghe, and Svetha Venkatesh. Deepr: A convolutional net for medical records. IEEE Journal of Biomedical and Health Informatics, 21(1):22–30, 2017.
[Purushotham et al., 2017] Sanjay Purushotham, Chuizheng Meng, Zhengping Che, and Yan Liu. Benchmark of deep learning models on large healthcare MIMIC datasets. arXiv preprint arXiv:1710.08531, 2017.
[Rajkomar et al., 2018] Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M. Dai, Nissan Hajaj, Peter J. Liu, Xiaobing Liu, Mimi Sun, Patrik Sundberg, Hector Yee, et al. Scalable and accurate deep learning for electronic health records. arXiv preprint arXiv:1801.07860, 2018.
[Simonyan and Zisserman, 2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[Song et al., 2017] Huan Song, Deepta Rajan, Jayaraman J. Thiagarajan, and Andreas Spanias. Attend and diagnose: Clinical time series analysis using attention models. arXiv preprint arXiv:1711.03905, 2017.
[Sutskever et al., 2014] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.
[Tibshirani, 1996] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267–288, 1996.

Table 2: Phenotype-wise classification performance in terms of AUROC.

S.No.  Phenotype                                                     LSTM-Multi  TimeNet-48  TimeNet-All  TimeNet-48-Eps  TimeNet-All-Eps
1      Acute and unspecified renal failure                           0.8035      0.7861      0.7887       0.7912          0.7941
2      Acute cerebrovascular disease                                 0.9089      0.8989      0.9031       0.8986          0.9033
3      Acute myocardial infarction                                   0.7695      0.7501      0.7478       0.7533          0.7509
4      Cardiac dysrhythmias                                          0.6840      0.6853      0.7005       0.7096          0.7239
5      Chronic kidney disease                                        0.7771      0.7764      0.7888       0.7960          0.8061
6      Chronic obstructive pulmonary disease and bronchiectasis      0.6786      0.7096      0.7236       0.7460          0.7605
7      Complications of surgical procedures or medical care          0.7176      0.7061      0.6998       0.7092          0.7029
8      Conduction disorders                                          0.7260      0.7070      0.7111       0.7286          0.7324
9      Congestive heart failure; nonhypertensive                     0.7608      0.7464      0.7541       0.7747          0.7805
10     Coronary atherosclerosis and other heart disease              0.7922      0.7764      0.7760       0.8007          0.8016
11     Diabetes mellitus with complications                          0.8738      0.8748      0.8800       0.8856          0.8887
12     Diabetes mellitus without complication                        0.7897      0.7749      0.7853       0.7904          0.8000
13     Disorders of lipid metabolism                                 0.7213      0.7055      0.7119       0.7217          0.7280
14     Essential hypertension                                        0.6779      0.6591      0.6650       0.6757          0.6825
15     Fluid and electrolyte disorders                               0.7405      0.7351      0.7301       0.7377          0.7328
16     Gastrointestinal hemorrhage                                   0.7413      0.7364      0.7309       0.7386          0.7343
17     Hypertension with complications and secondary hypertension    0.7600      0.7606      0.7700       0.7792          0.7871
18     Other liver diseases                                          0.7659      0.7358      0.7332       0.7573          0.7530
19     Other lower respiratory disease                               0.6880      0.6847      0.6897       0.6896          0.6922
20     Other upper respiratory disease                               0.7599      0.7515      0.7565       0.7595          0.7530
21     Pleurisy; pneumothorax; pulmonary collapse                    0.7027      0.6900      0.6882       0.6909          0.6997
22     Pneumonia                                                     0.8082      0.7857      0.7916       0.7890          0.7943
23     Respiratory failure; insufficiency; arrest (adult)            0.9015      0.8815      0.8856       0.8834          0.8876
24     Septicemia (except in labor)                                  0.8426      0.8276      0.8140       0.8296          0.8165
25     Shock                                                         0.8760      0.8764      0.8564       0.8763          0.8562

Figure 3: Feature relevance scores for the 25 phenotypes. Refer to Table 2 for the names of the phenotypes, and to Table 3 for the names of the raw features. [Heatmap omitted.]
Table 3: List of raw input features (feature names as in the benchmark setup).

1   Glucose
2   Glascow coma scale total → 7
3   Glascow coma scale verbal response → Incomprehensible sounds
4   Diastolic blood pressure
5   Weight
6   Glascow coma scale total → 8
7   Glascow coma scale motor response → Obeys Commands
8   Glascow coma scale eye opening → None
9   Glascow coma scale eye opening → To Pain
10  Glascow coma scale total → 6
11  Glascow coma scale verbal response → 1.0 ET/Trach
12  Glascow coma scale total → 5
13  Glascow coma scale verbal response → 5 Oriented
14  Glascow coma scale total → 3
15  Glascow coma scale verbal response → No Response
16  Glascow coma scale motor response → 3 Abnorm flexion
17  Glascow coma scale verbal response → 3 Inapprop words
18  Capillary refill rate → 1.0
19  Glascow coma scale verbal response → Inappropriate Words
20  Systolic blood pressure
21  Glascow coma scale motor response → Flex-withdraws
22  Glascow coma scale total → 10
23  Glascow coma scale motor response → Obeys Commands
24  Glascow coma scale verbal response → No Response-ETT
25  Glascow coma scale eye opening → 2 To pain
26  Heart Rate
27  Respiratory rate
28  Glascow coma scale verbal response → Oriented
29  Glascow coma scale motor response → Localizes Pain
30  Temperature
31  Glascow coma scale eye opening → 3 To speech
32  Height
33  Glascow coma scale motor response → 5 Localizes Pain
34  Glascow coma scale total → 14
35  Fraction inspired oxygen
36  Glascow coma scale total → 12
37  Glascow coma scale verbal response → Confused
38  Glascow coma scale motor response → 1 No Response
39  Mean blood pressure
40  Glascow coma scale total → 4
41  Glascow coma scale eye opening → To Speech
42  Glascow coma scale total → 15
43  Glascow coma scale motor response → 4 Flex-withdraws
44  Glascow coma scale motor response → No response
45  Glascow coma scale eye opening → Spontaneously
46  Glascow coma scale verbal response → 4 Confused
47  Capillary refill rate → 0.0
48  Glascow coma scale total → 13
49  Glascow coma scale eye opening → 1 No Response
50  Glascow coma scale motor response → Abnormal extension
51  Glascow coma scale total → 11
52  Glascow coma scale verbal response → 2 Incomp sounds
53  Glascow coma scale total → 9
54  Glascow coma scale motor response → Abnormal Flexion
55  Glascow coma scale verbal response → 1 No Response
56  Glascow coma scale motor response → 2 Abnorm extensn
57  pH
58  Glascow coma scale eye opening → 4 Spontaneously
59  Oxygen saturation