<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Features from Pre-trained TimeNet for Clinical Predictions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Priyanka Gupta</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pankaj Malhotra</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lovekesh Vig</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gautam Shroff</string-name>
        </contrib>
        <aff>TCS Research, New Delhi, India
          <email>{priyanka.g, malhotra.pankaj, lovekesh.vig, gautam.shroff}@tcs.com</email>
        </aff>
      </contrib-group>
      <abstract>
        <p>Predictive models based on Recurrent Neural Networks (RNNs) for clinical time series have been successfully used for various tasks such as phenotyping, in-hospital mortality prediction, and diagnostics. However, RNNs require large labeled datasets for training and are computationally expensive to train. Pre-training a network on some supervised or unsupervised task on a dataset, and then fine-tuning it via transfer learning for a related end-task, can be an efficient way to leverage deep models in scenarios that lack computational resources or labeled data, or both. In this work, we consider an approach that leverages a deep RNN, namely TimeNet [Malhotra et al., 2017], pre-trained on a large number of diverse publicly available time series from the UCR Repository [Chen et al., 2015]. TimeNet maps varying-length time series to fixed-dimensional feature vectors and acts as an off-the-shelf feature extractor. The TimeNet-based approach overcomes the need for hand-crafted features, and allows the use of traditional easy-to-train and interpretable linear models for the end-task, while still leveraging the features from a deep neural network. Empirical evaluation of the proposed approach on MIMIC-III data suggests a promising direction for future exploration: our results are comparable to existing benchmarks while our models require less training and hyperparameter tuning effort.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>There has been a growing interest in using deep learning models for various clinical prediction tasks from Electronic Health Records (EHRs), e.g. Doctor AI [Choi et al., 2016] for medical diagnosis, Deep Patient [Miotto et al., 2016] to predict future diseases in patients, DeepR [Nguyen et al., 2017] to predict unplanned readmission after discharge, etc. With various medical parameters being recorded over a period of time in EHR databases, Recurrent Neural Networks (RNNs) can be an effective way to model the sequential aspects of EHR data, e.g. diagnoses [Lipton et al., 2015; Che et al., 2016; Choi et al., 2016], and mortality prediction and estimation of length of stay [Harutyunyan et al., 2017; Purushotham et al., 2017; Rajkomar et al., 2018]. (TimeNet-based features for MIMIC-III time series are available on request from the authors.)</p>
      <p>However, like any other deep learning approach, training RNNs requires large labeled training data, and can be computationally inefficient because of the sequential nature of the computations. On the other hand, a deep network trained on diverse instances can provide generic features for unseen instances, e.g. VGGNet [Simonyan and Zisserman, 2014] for images. Also, fine-tuning a pre-trained network with transfer learning is often faster and easier than constructing and training a new network from scratch [Bengio, 2012]. The advantage of learning in this manner is that the pre-trained network has already learned a rich set of features that can then be applied to a wide range of other, similar tasks.</p>
      <p>Deep RNNs have been shown to perform hierarchical processing of time series, with different layers tackling different time scales [Hermans and Schrauwen, 2013; Malhotra et al., 2015]. TimeNet [Malhotra et al., 2017] is a general-purpose multi-layered RNN trained on a large number of diverse time series from the UCR Time Series Archive [Chen et al., 2015] (refer Section 3 for details) that has been shown to be useful as an off-the-shelf feature extractor for time series. TimeNet was trained on 18 different datasets simultaneously via an RNN autoencoder in an unsupervised manner on a reconstruction task. Features extracted from TimeNet have been found useful for classification on 25 datasets not seen during TimeNet's training, demonstrating its ability to provide meaningful features for unseen datasets.</p>
      <p>In this work, we provide an efficient way to learn prediction models for clinical time series by leveraging general-purpose features via TimeNet. TimeNet maps variable-length clinical time series to fixed-dimensional feature vectors that are subsequently used for patient phenotyping and in-hospital mortality prediction tasks on the MIMIC-III database [Johnson et al., 2016] via easily trainable non-temporal linear classification models. We observe that TimeNet-based features can be used to build such classification models with very little training effort while yielding performance comparable to models with hand-crafted features or carefully trained domain-specific RNNs, as benchmarked in [Harutyunyan et al., 2017; Song et al., 2017]. Further, we propose a simple mechanism to leverage the weights of the linear classification models to provide insights into the relevance of each raw input feature (physiological parameter) for a given phenotype (discussed in Section 4.2).</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>TimeNet-based features have been shown to be useful for various tasks including ECG classification [Malhotra et al., 2017]. In this work, we consider the application of TimeNet to phenotyping and in-hospital mortality tasks for multivariate clinical time series classification. Deep Patient [Miotto et al., 2016] proposes leveraging features from a pre-trained stacked autoencoder for EHR data. However, it does not leverage the temporal aspect of the data, and uses a non-temporal model based on stacked autoencoders. Our approach extracts temporal features via TimeNet, incorporating the sequential nature of EHR data. Doctor AI [Choi et al., 2016] uses discretized medical codes (e.g. diagnosis, medication, procedure) from longitudinal patient visits in a purely supervised setting, while we use real-valued time series. While approaches like Doctor AI require training a deep RNN from scratch, our approach leverages a general-purpose RNN for feature extraction.</p>
      <p>[Harutyunyan et al., 2017] consider training a deep RNN model for multiple prediction tasks simultaneously, including phenotyping and in-hospital mortality, to learn a general-purpose deep RNN for clinical time series. They show that it is possible to train a single network for multiple tasks simultaneously by capturing generic features that work across different tasks. We also consider leveraging generic features for clinical time series, but using an RNN that is pre-trained on diverse time series across domains, making our approach more efficient. Further, we provide an approach to rank the raw input features in order of their relevance, which helps validate the learned models.</p>
    </sec>
    <sec id="sec-3">
      <title>Background: TimeNet</title>
      <p>TimeNet [Malhotra et al., 2017] is a pre-trained off-the-shelf feature extractor for univariate time series, with three recurrent layers having 60 Gated Recurrent Units (GRUs) [Cho et al., 2014] each. TimeNet is an RNN trained via an autoencoder consisting of an encoder RNN and a decoder RNN trained simultaneously using the sequence-to-sequence learning framework [Sutskever et al., 2014; Bahdanau et al., 2014], as shown in Figure 1(a). The RNN autoencoder is trained to obtain the parameters W_E of the encoder RNN f_E via a reconstruction task such that for input x_{1...T} = x_1, x_2, ..., x_T (x_i ∈ ℝ), the target output time series x_{T...1} = x_T, x_{T-1}, ..., x_1 is the reverse of the input.</p>
      <p>
        The RNN encoder f_E provides a non-linear mapping of the univariate input time series to a fixed-dimensional vector representation z_T: z_T = f_E(x_{1...T}; W_E), followed by an RNN decoder f_D based non-linear mapping of z_T back to the univariate time series: x̂_{T...1} = f_D(z_T; W_D), where W_E and W_D are the parameters of the encoder and decoder, respectively. The model is trained to minimize the average squared reconstruction error. Training on 18 diverse datasets simultaneously results in robust time series features being captured in z_T: the decoder relies on z_T as the only input to reconstruct the time series, forcing the encoder to capture all the relevant information in the time series in the fixed-dimensional vector z_T. This vector z_T is used as the feature vector for input x_{1...T}. This feature vector is then used to train a simpler classifier
        <xref ref-type="bibr" rid="ref12 ref15 ref16 ref19 ref7">(e.g. SVM, as used in [Malhotra et al., 2017])</xref>
        for the end task. TimeNet maps a univariate input time series to a 180-dimensional feature vector, where each dimension corresponds to the final output of one of the 60 GRUs in each of the 3 recurrent layers.
      </p>
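      <p>As a concrete illustration of the encoder mapping z_T = f_E(x_{1...T}; W_E), the following is a minimal single-layer GRU encoder in NumPy with randomly initialized weights. This is only a sketch of the GRU recurrence, not TimeNet itself, which has three layers of 60 GRUs each and pre-trained weights that are not reproduced here.</p>
      <preformat>
```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_encode(x, params):
    """Run a single-layer GRU over a univariate series x and return the
    final hidden state z_T as the fixed-dimensional feature vector."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    h = np.zeros(Uz.shape[0])
    for x_t in x:
        z = sigmoid(Wz * x_t + Uz @ h)              # update gate
        r = sigmoid(Wr * x_t + Ur @ h)              # reset gate
        h_cand = np.tanh(Wh * x_t + Uh @ (r * h))   # candidate state
        h = (1.0 - z) * h + z * h_cand              # blend old and new state
    return h

def init_params(hidden, seed=0):
    """Random weights: input weights are vectors (univariate input),
    recurrent weights are hidden x hidden matrices."""
    rng = np.random.default_rng(seed)
    shapes = [(hidden,), (hidden, hidden)] * 3
    return tuple(rng.normal(0.0, 0.1, s) for s in shapes)

params = init_params(hidden=60)
z_T = gru_encode(np.sin(np.linspace(0.0, 6.0, 48)), params)  # 48-step series
```
      </preformat>
      <p>Regardless of the input length, the final hidden state has a fixed dimension (60 here, one per GRU), which is what makes such an encoder usable as an off-the-shelf feature extractor.</p>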
    </sec>
    <sec id="sec-4">
      <title>TimeNet Features for Clinical Time Series</title>
      <p>Consider a set D of labeled time series instances from an EHR database: D = {(x⁽ⁱ⁾, y⁽ⁱ⁾)}_{i=1}^{N}, where x⁽ⁱ⁾ is a multivariate time series, y⁽ⁱ⁾ ∈ {y_1, ..., y_C}, C is the number of classes, and N is the number of unique patients (in our experiments, we consider each episode of hospital stay for a patient as a separate data instance). In this work, we consider the presence or absence of a phenotype as a binary classification task such that C = 2. We learn an independent model for each phenotype
        <xref ref-type="bibr" rid="ref12 ref15 ref16 ref19 ref7">(unlike [Harutyunyan et al., 2017] which consider
phenotyping as a multi-label classification problem)</xref>
        . This allows us to build simple linear binary classification models, as described next in Section 4.1. In practice, the outputs of these binary classifiers can be considered together to estimate the set of phenotypes present in a patient. Similarly, mortality prediction is considered a binary classification task where the goal is to predict whether the patient will survive after admission to the ICU or not.
      </p>
      <sec id="sec-4-1">
        <title>Classification using TimeNet features</title>
      </sec>
      <sec id="sec-4-2">
        <title>Feature Extraction for Multivariate Clinical Time Series</title>
        <p>For a multivariate time series x = x_1 x_2 ... x_T, where x_t ∈ ℝⁿ, we consider the time series for each of the n raw input features (physiological parameters, e.g. glucose level, heart rate, etc.) independently, to obtain univariate time series x_j = x_{j1} x_{j2} ... x_{jT}, j = 1...n. (Note: we use x instead of x⁽ⁱ⁾ and omit the superscript (i) for ease of notation.) We obtain the vector representation z_{jT} = f_E(x_j; W_E) for x_j, where z_{jT} ∈ ℝᶜ, using TimeNet as f_E with c = 180 (as described in Section 3). In general, the time series length T also depends on i, e.g. based on the length of stay in hospital; we omit this for the sake of clarity, without loss of generality. In practice, we convert each time series to equal length T by suitable pre-/post-padding with 0s. We concatenate the TimeNet features z_{jT} for each raw input feature j to get the final feature vector z_T = [z_{1T}; z_{2T}; ...; z_{nT}] for time series x, where z_T ∈ ℝᵐ, m = n × c, as illustrated in Figure 1(b).</p>
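        <p>A minimal sketch of this per-feature extraction and concatenation follows. The `toy_encode` stand-in is hypothetical; in the actual pipeline, TimeNet's encoder f_E produces each c = 180 dimensional embedding.</p>
        <preformat>
```python
import numpy as np

def timenet_features(x, encode, T=48):
    """Map a (t, n) multivariate series to one m = n*c feature vector by
    encoding each raw feature's univariate series independently, then
    concatenating the per-feature embeddings."""
    t, n = x.shape
    if t < T:                                    # pre-pad with zeros
        x = np.vstack([np.zeros((T - t, n)), x])
    else:                                        # or truncate to first T steps
        x = x[:T]
    # z_T = [z_1T; z_2T; ...; z_nT]
    return np.concatenate([encode(x[:, j]) for j in range(n)])

def toy_encode(series, c=180):
    """Hypothetical stand-in for TimeNet's f_E: any function mapping a
    length-T series to a c-dimensional vector (here a crude segment-mean
    summary, used only to exercise the plumbing)."""
    segs = np.array_split(series, c)
    return np.array([s.mean() if s.size else 0.0 for s in segs])

z = timenet_features(np.random.rand(30, 76), toy_encode)  # m = 76 * 180
```
        </preformat>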
      </sec>
      <sec id="sec-4-3">
        <title>Using TimeNet-based Features for Classification</title>
        <p>The final concatenated feature vector z_T is used as input for the phenotyping and mortality prediction classification tasks. We note that since c = 180 is large, z_T has a large number of features m ≫ 180. We consider a linear mapping from the input TimeNet features z_T to the target label y such that the estimate is ŷ = w · z_T, where w ∈ ℝᵐ. We constrain the linear model with weights w to use only a few of this large number of features. The weights are obtained by minimizing the LASSO-regularized squared error:

L(w) = (1/N) Σ_{i=1}^{N} (y⁽ⁱ⁾ − ŷ⁽ⁱ⁾)² + α ‖w‖₁   (1)

where y⁽ⁱ⁾ ∈ {0, 1}, ‖w‖₁ = Σ_{j=1}^{n} Σ_{k=1}^{c} |w_{jk}| is the L1-norm, w_{jk} represents the weight assigned to the k-th TimeNet feature for the j-th raw feature, and α controls the extent of sparsity, with higher α implying more sparsity, i.e. fewer TimeNet features are selected for the final classifier.</p>
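        <p>For illustration, a sparse linear model of this kind can be fit by proximal gradient descent (ISTA) on the squared error with an L1 penalty. This is a sketch on synthetic data; the exact solver used for our experiments is not specified here, and any standard LASSO implementation would serve.</p>
        <preformat>
```python
import numpy as np

def lasso_ista(Z, y, alpha=0.1, lr=None, iters=500):
    """Minimize (1/N)*||y - Z w||^2 + alpha*||w||_1 via ISTA:
    a gradient step on the squared error, then soft-thresholding
    (the proximal operator of the L1 penalty)."""
    N, m = Z.shape
    if lr is None:
        lr = 1.0 / (2.0 * np.linalg.norm(Z, 2) ** 2 / N)  # 1/L step size
    w = np.zeros(m)
    for _ in range(iters):
        grad = -2.0 / N * Z.T @ (y - Z @ w)
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * alpha, 0.0)
    return w

# synthetic data with a sparse ground-truth weight vector
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:3] = [2.0, -1.5, 1.0]
y = Z @ w_true + 0.01 * rng.standard_normal(200)

w = lasso_ista(Z, y, alpha=0.05)
sparsity = np.mean(np.abs(w) < 1e-8)  # fraction of exactly-zero weights
```
        </preformat>
        <p>The soft-thresholding step is what drives most weights to exactly zero, which is the property Section 5 exploits to select a few hundred useful features out of thousands.</p>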
      </sec>
      <sec id="sec-4-4">
        <title>Obtaining Relevance Scores for Raw Features</title>
        <p>
          Determining the relevance of the n raw input features for a given phenotype is potentially useful for obtaining insights into the learned classification model. The sparse weights w are easy to interpret and can give interesting insights into the relevant features for a classification task
          <xref ref-type="bibr" rid="ref13">(e.g. as used in [Micenkova´ et
al., 2013])</xref>
          . We obtain the relevance r_j of the j-th raw input feature as the sum of the absolute values of the weights w_{jk} assigned to the corresponding TimeNet features z_{jT}, as shown in Figure 1(c):
        </p>
        <p>r_j = Σ_{k=1}^{c} |w_{jk}|,   j = 1...n.   (2)

Further, r_j is normalized using min-max normalization such that r′_j = (r_j − r_min)/(r_max − r_min) ∈ [0, 1], where r_min is the minimum and r_max is the maximum of {r_1, ..., r_n}. In practice, such relevance scores for the raw features help to interpret and validate the overall model. For example, one would expect the blood glucose level feature to have a high relevance score when learning a model to detect the diabetes mellitus phenotype (we provide such insights later in Section 5).</p>
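        <p>Equation (2) and the subsequent min-max normalization can be computed directly from the learned weight vector; a small sketch with made-up numbers:</p>
        <preformat>
```python
import numpy as np

def relevance_scores(w, n, c):
    """r_j = sum_k |w_jk| over the c TimeNet features of raw feature j
    (Equation 2), then min-max normalized to [0, 1]."""
    r = np.abs(w.reshape(n, c)).sum(axis=1)     # per-raw-feature relevance
    return (r - r.min()) / (r.max() - r.min())  # min-max normalization

# toy weights: n = 3 raw features, c = 2 TimeNet features each
w = np.array([1.0, -2.0, 0.0, 0.5, 3.0, -3.0])  # raw sums: 3.0, 0.5, 6.0
r_norm = relevance_scores(w, n=3, c=2)
```
        </preformat>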
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental Evaluation</title>
      <sec id="sec-5-1">
        <title>Dataset Details</title>
        <p>We use the MIMIC-III (v1.4) clinical database [Johnson et al., 2016], which consists of over 60,000 ICU stays across 40,000 critical care patients. We use the same experimental setup as [Harutyunyan et al., 2017], with the same train, validation and test splits and features, based on 17 physiological time series (12 real-valued and 5 categorical), sampled at 1-hour intervals. The categorical variables are converted to one-hot vectors such that the final multivariate time series has n = 76 raw input features (59 actual features and 17 masking features to denote missing values).</p>
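        <p>The one-hot conversion with masking features can be sketched for a single time step as follows. The variable values and category lists here are hypothetical; the benchmark code's exact encoding may differ in details.</p>
        <preformat>
```python
import numpy as np

def encode_step(real_vals, cat_vals, categories):
    """Encode one time step: real values pass through, each categorical
    variable becomes a one-hot sub-vector, and one mask bit per variable
    (real and categorical) records whether it was observed (None = missing)."""
    feats, masks = [], []
    for v in real_vals:
        feats.append(0.0 if v is None else v)
        masks.append(0.0 if v is None else 1.0)
    for v, cats in zip(cat_vals, categories):
        onehot = [0.0] * len(cats)
        if v is not None:
            onehot[cats.index(v)] = 1.0
        feats.extend(onehot)
        masks.append(0.0 if v is None else 1.0)
    return np.array(feats + masks)

# e.g. 2 real variables (one missing) and 1 categorical with 3 levels
# -> 2 + 3 feature dims plus 3 mask dims = 8 dims total
x = encode_step([98.6, None], ["mid"], [["low", "mid", "high"]])
```
        </preformat>
        <p>Scaled up to 12 real-valued and 5 categorical variables, this scheme yields the 59 actual features plus 17 masks (n = 76) used above.</p>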
        <p>For the phenotyping task, the goal is to classify 25 phenotypes common in adult ICUs. For the in-hospital mortality task, the goal is to predict whether the patient will survive or not given the time series observations up to 48 hours. In all our experiments, we restrict the training time series data to up to the first 48 hours of the ICU stay, such that T = 48 while training all models, to imitate the practical scenario where early predictions are important, unlike [Harutyunyan et al., 2017; Song et al., 2017], which use the entire time series for training the classifier for the phenotyping task.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Evaluation</title>
        <p>We have n = 76 raw input features, resulting in an m = 13,680-dimensional (m = 76 × 180) TimeNet feature vector for each admission. We use α = 0.0001 for the phenotype classifiers and α = 0.0003 for the in-hospital mortality classifier (α is chosen based on a hold-out validation set). Table 1 summarizes the results and provides comparison with existing benchmarks. Refer Table 2 for detailed phenotype-wise results.</p>
        <p>We consider two variants of classifier models for the phenotyping task: i) TimeNet-x, using data from the current episode only; ii) TimeNet-x-Eps, additionally using data from the previous episode of a patient (whenever available) via an extra input feature related to the presence or absence of the phenotype in the previous episode. Each classifier is trained using up to the first 48 hours of data after ICU admission. However, we consider two classifier variants depending upon the hours of data x used to estimate the target class at test time. For x = 48, data up to the first 48 hours after admission is used for determining the phenotype. For x = All, the learned classifier is applied to all 48-hour windows (overlapping, with a shift of 24 hours) over the entire ICU stay period of a patient, and the average phenotype probability across windows is used as the final estimate of the target class. In TimeNet-x-Eps, the additional feature is set to the presence (1) or absence (0) of the phenotype during the previous episode. We use the ground-truth value for this feature at training time, and the probability of presence of the phenotype during the previous episode (as given by the LASSO-based classifier) at test time. (Benchmark splits: https://github.com/yerevann/mimic3-benchmarks)</p>
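        <p>The x = All evaluation scheme, overlapping 48-hour windows shifted by 24 hours with averaged probabilities, can be sketched as follows; `predict` stands for a hypothetical classifier callable returning the phenotype probability for one window.</p>
        <preformat>
```python
def window_starts(stay_hours, width=48, shift=24):
    """Start times of the overlapping windows covering the ICU stay."""
    if stay_hours <= width:
        return [0]
    return list(range(0, stay_hours - width + 1, shift))

def episode_probability(stay_hours, predict, width=48, shift=24):
    """Average the classifier's probability over all 48-hour windows."""
    starts = window_starts(stay_hours, width, shift)
    probs = [predict(s, s + width) for s in starts]
    return sum(probs) / len(probs)

# e.g. a 120-hour stay yields windows starting at hours 0, 24, 48, 72
starts = window_starts(120)
```
        </preformat>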
      </sec>
      <sec id="sec-5-3">
        <title>Observations</title>
      </sec>
      <sec id="sec-5-4">
        <title>Classification Tasks</title>
        <p>For the phenotyping task, we make the following observations from Table 1:
1. TimeNet-48 vs LR: TimeNet-based features perform significantly better than the hand-crafted features used in LR (logistic regression), while using only the first 48 hours of data, unlike the LR approach which uses the entire episode's data. This demonstrates the effectiveness of TimeNet features for MIMIC-III data. Further, our approach requires tuning only a single hyperparameter for LASSO, unlike approaches such as LSTM [Harutyunyan et al., 2017] that involve tuning the number of hidden units, layers, learning rate, etc.
2. TimeNet-x vs TimeNet-x-Eps: Leveraging the previous episode's time series data for a patient significantly improves the classification performance.
3. TimeNet-48-Eps performs better than existing benchmarks, while being more practically feasible as it looks at only up to 48 hours of the current episode of a patient rather than the entire current episode. For the in-hospital mortality task, we observe performance comparable to existing benchmarks.</p>
        <p>Training the linear models is fast: obtaining any one of the binary classifiers took around 30 minutes, including tuning over α ∈ [10⁻⁵, 10⁻³] (five equally-spaced values), on a 32GB RAM machine with a Quad Core i7 2.7GHz processor.</p>
        <p>We observe that LASSO leads to 96.2 ± 0.8% sparsity (i.e. percentage of weights w_{jk} = 0) for all classifiers, leaving around 550 useful features (out of 13,680) for each phenotype classification task.</p>
      </sec>
      <sec id="sec-5-5">
        <title>Relevance Scores for Raw Input Features</title>
        <p>We observe intuitive interpretations for the relevance of raw input features using the weights assigned to the various TimeNet features (refer Equation 2). For example, as shown in Figure 2, we obtain the highest relevance scores for Glucose Level (feature 1) for Diabetes Mellitus with Complications (Figure 2(a)), and for Systolic Blood Pressure (feature 20) for Essential Hypertension (Figure 2(b)). Refer Supplementary Material Figure 3 for more details. We conclude that even though TimeNet was never trained on MIMIC-III data, it still provides meaningful general-purpose features from the time series of the raw input features, and LASSO helps select the most relevant ones for the end task using labeled data. Further, extracting features using a deep recurrent neural network for the time series of each raw input feature independently, rather than from the multivariate time series as a whole, allows us to easily assign relevance scores to raw features in the input domain, enabling a high-level, basic model validation by domain experts.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Discussion and Future Work</title>
      <p>In this work, we leverage deep learning models efficiently via TimeNet for phenotyping and mortality prediction tasks, with little hyperparameter tuning effort. TimeNet-based features can be efficiently transferred to train linear interpretable classifiers for the end tasks considered, while still achieving classification performance similar to more compute-intensive deep models trained from scratch. In the future, it will be interesting to evaluate a domain-specific TimeNet-like model for clinical time series (e.g. trained only on the MIMIC-III database).</p>
      <p>Figure 3: Feature relevance scores for 25 phenotypes. Refer Table 2 for names of phenotypes, and Table 3 for names of raw features.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bahdanau et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Dzmitry</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          , Kyunghyun Cho, and
          <string-name>
            <surname>Yoshua Bengio.</surname>
          </string-name>
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Bengio,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Deep learning of representations for unsupervised and transfer learning</article-title>
          .
          <source>In Proceedings of ICML Workshop on Unsupervised and Transfer Learning</source>
          , pages
          <fpage>17</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Che et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Zhengping</given-names>
            <surname>Che</surname>
          </string-name>
          , Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu.
          <article-title>Recurrent neural networks for multivariate time series with missing values</article-title>
          .
          <source>arXiv preprint arXiv:1606</source>
          .
          <year>01865</year>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Chen et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Yanping</given-names>
            <surname>Chen</surname>
          </string-name>
          , Eamonn Keogh, Bing Hu,
          <string-name>
            <given-names>Nurjahan</given-names>
            <surname>Begum</surname>
          </string-name>
          , et al.
          <article-title>The ucr time series classification archive</article-title>
          ,
          <year>July 2015</year>
          . www.cs.ucr.edu/~eamonn/time_series_data/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Cho et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Kyunghyun</given-names>
            <surname>Cho</surname>
          </string-name>
          , Bart Van Merrie¨nboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1406.1078</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Choi et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Edward</given-names>
            <surname>Choi</surname>
          </string-name>
          , Mohammad Taha Bahadori, Andy Schuetz, Walter F Stewart,
          <string-name>
            <given-names>and Jimeng</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <article-title>Doctor ai: Predicting clinical events via recurrent neural networks</article-title>
          .
          <source>In Machine Learning for Healthcare Conference</source>
          , pages
          <fpage>301</fpage>
          -
          <lpage>318</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Harutyunyan et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Hrayr</given-names>
            <surname>Harutyunyan</surname>
          </string-name>
          , Hrant Khachatrian, David C Kale, and
          <string-name>
            <given-names>Aram</given-names>
            <surname>Galstyan</surname>
          </string-name>
          .
          <article-title>Multitask learning and benchmarking with clinical time series data</article-title>
          .
          <source>arXiv preprint arXiv:1703.07771</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Hermans and Schrauwen,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Michiel</given-names>
            <surname>Hermans</surname>
          </string-name>
          and
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Schrauwen</surname>
          </string-name>
          .
          <article-title>Training and analysing deep recurrent neural networks</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>190</fpage>
          -
          <lpage>198</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Johnson et al.,
          <year>2016</year>
          ] Alistair EW Johnson
          , Tom J Pollard, Lu Shen,
          <string-name>
            <given-names>H Lehman</given-names>
            <surname>Li-wei</surname>
          </string-name>
          , Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark.
          <article-title>Mimic-iii, a freely accessible critical care database</article-title>
          .
          <source>Scientific data</source>
          ,
          <volume>3</volume>
          :
          <fpage>160035</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Lipton et al.,
          <year>2015</year>
          ] Zachary C Lipton, David C Kale, Charles Elkan
          , and
          <string-name>
            <given-names>Randall</given-names>
            <surname>Wetzel</surname>
          </string-name>
          .
          <article-title>Learning to diagnose with lstm recurrent neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1511.03677</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Malhotra et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Pankaj</given-names>
            <surname>Malhotra</surname>
          </string-name>
          , Lovekesh Vig, Gautam Shroff, and Puneet Agarwal.
          <article-title>Long Short Term Memory Networks for Anomaly Detection in Time Series</article-title>
          .
          <source>In ESANN, 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning</source>
          , pages
          <fpage>89</fpage>
          -
          <lpage>94</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Malhotra et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Pankaj</given-names>
            <surname>Malhotra</surname>
          </string-name>
          ,
          <string-name>
            <surname>Vishnu</surname>
            <given-names>TV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lovekesh</surname>
            <given-names>Vig</given-names>
          </string-name>
          , Puneet Agarwal, and Gautam Shroff.
          <article-title>TimeNet: Pre-trained deep recurrent neural network for time series classification</article-title>
          .
          <source>In 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Micenková et al.,
          <year>2013</year>
          ] Barbora Micenková,
          <string-name>
            <surname>Xuan-Hong</surname>
            <given-names>Dang</given-names>
          </string-name>
          , Ira Assent, and Raymond T Ng.
          <article-title>Explaining outliers by subspace separability</article-title>
          .
          <source>In Data Mining (ICDM), 2013 IEEE 13th International Conference on</source>
          , pages
          <fpage>518</fpage>
          -
          <lpage>527</lpage>
          . IEEE,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Miotto et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Riccardo</given-names>
            <surname>Miotto</surname>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>Li</given-names>
          </string-name>
          ,
          Brian A Kidd, and Joel T Dudley.
          <article-title>Deep patient: an unsupervised representation to predict the future of patients from the electronic health records</article-title>
          .
          <source>Scientific reports</source>
          ,
          <volume>6</volume>
          :
          <fpage>26094</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Nguyen et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Phuoc</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , Truyen Tran, Nilmini Wickramasinghe, and
          <string-name>
            <given-names>Svetha</given-names>
            <surname>Venkatesh</surname>
          </string-name>
          .
          <article-title>Deepr: A convolutional net for medical records</article-title>
          .
          <source>IEEE journal of biomedical and health informatics</source>
          ,
          <volume>21</volume>
          (
          <issue>1</issue>
          ):
          <fpage>22</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Purushotham et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Sanjay</given-names>
            <surname>Purushotham</surname>
          </string-name>
          , Chuizheng Meng, Zhengping Che, and Yan Liu.
          <article-title>Benchmark of deep learning models on large healthcare mimic datasets</article-title>
          .
          <source>arXiv preprint arXiv:1710.08531</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Rajkomar et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Alvin</given-names>
            <surname>Rajkomar</surname>
          </string-name>
          , Eyal Oren, Kai Chen, Andrew M Dai,
          <string-name>
            <given-names>Nissan</given-names>
            <surname>Hajaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter J</given-names>
            <surname>Liu</surname>
          </string-name>
          , Xiaobing Liu, Mimi Sun, Patrik Sundberg,
          <string-name>
            <given-names>Hector</given-names>
            <surname>Yee</surname>
          </string-name>
          , et al.
          <article-title>Scalable and accurate deep learning for electronic health records</article-title>
          .
          <source>arXiv preprint arXiv:1801.07860</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Simonyan and Zisserman,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Karen</given-names>
            <surname>Simonyan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          .
          <source>arXiv preprint arXiv:1409.1556</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Song et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Huan</given-names>
            <surname>Song</surname>
          </string-name>
          , Deepta Rajan,
          <string-name>
            <given-names>Jayaraman J</given-names>
            <surname>Thiagarajan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Spanias</surname>
          </string-name>
          .
          <article-title>Attend and diagnose: Clinical time series analysis using attention models</article-title>
          .
          <source>arXiv preprint arXiv:1711.03905</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Sutskever et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , Oriol Vinyals, and Quoc V Le.
          <article-title>Sequence to sequence learning with neural networks</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Tibshirani,
          <year>1996</year>
          ]
          <string-name>
            <given-names>Robert</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          .
          <article-title>Regression shrinkage and selection via the lasso</article-title>
          .
          <source>Journal of the Royal Statistical Society. Series B (Methodological)</source>
          , pages
          <fpage>267</fpage>
          -
          <lpage>288</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <article-title>Glascow coma scale motor response ! 4 Flex-withdraws Glascow coma scale motor response ! No response Glascow coma scale eye opening ! Spontaneously Glascow coma scale verbal response ! 4 Confused Capillary refill rate ! 0.0 Glascow coma scale total ! 13 Glascow coma scale eye opening ! 1 No Response Glascow coma scale motor response ! Abnormal extension Glascow coma scale total ! 11</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <article-title>Glascow coma scale verbal response ! 2 Incomp sounds Glascow coma scale total ! 9</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <article-title>Glascow coma scale motor response ! Abnormal Flexion Glascow coma scale verbal response ! 1 No Response Glascow coma scale motor response ! 2 Abnorm extensn pH Glascow coma scale eye opening ! 4 Spontaneously Oxygen saturation</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>