<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Attention-based Recurrent Neural Networks Framework for Health Data Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qiuling Suo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fenglong Ma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Canino</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jing Gao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aidong Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agostino Gnasso</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Tradigo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierangelo Veltri</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, University at Buffalo</institution>
          ,
          <addr-line>NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Surgical and Medical Sciences, Magna Graecia University</institution>
          ,
          <addr-line>Catanzaro</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Metabolic Diseases Unit, Department of Clinical and Experimental Medicine, Mater Domini Hospital, Magna Graecia University</institution>
          ,
          <addr-line>Catanzaro</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we focus on predicting the health status of patients from their historical Electronic Health Records (EHR). We propose a multi-task framework that can monitor the status of multiple diagnoses. Patients' historical records are fed into a Recurrent Neural Network (RNN), which memorizes the past visit information, and a task-specific layer is then trained to predict multiple diagnoses. Experimental results show that the prediction accuracy is reliable when compared with widely used approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Disease monitoring is often limited by physician experience, test time, economic
barriers and so on. The Electronic Health Record (EHR) is a valuable source
for exploratory analysis to monitor diseases and assist clinical decision making.
However, due to the complexity of EHR data, the efficient mining of EHRs is
not trivial.</p>
      <p>
        Recent work has made rapid progress in utilizing EHRs for predictive
modeling tasks in healthcare, including predicting unplanned readmission [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], early
prediction of chronic disease [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], adverse event detection [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and monitoring
disease progression [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The main idea here is to learn a good representation of a
patient's historical health information, in order to improve the performance of
the prediction for future risks.
      </p>
      <p>
        In order to model the dependencies of diagnoses, deep learning techniques,
such as recurrent neural networks, can be employed. Recent work [
        <xref ref-type="bibr" rid="ref1 ref10 ref3 ref8 ref9">10, 1, 8, 3, 9</xref>
        ]
shows that deep learning can significantly improve the prediction performance.
To handle the temporality of multivariate sequences, dynamically modeling the
sequential data is necessary. Recurrent neural networks (RNNs), in particular
Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), have
achieved state-of-the-art performance in handling long-term dependencies and
nonlinear dynamics.
      </p>
      <p>
        Note: An extended version of this paper has been included in the proceedings of AMIA
2017, Washington DC [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. SEBD 2018, June 24-27, 2018, Castellaneta Marina, Italy. Copyright held by the
author(s).
      </p>
      <p>In this paper, our goal is to predict the status of multiple diagnoses (or
observations), with each diagnosis having multiple severity levels. We formulate the
problem as multi-task learning, which first learns a shared representation from all the
features, and then performs task-specific predictions. We propose an
attention-based RNN model to monitor patients' longitudinal health information. First,
we use an RNN to memorize the information from historical visits, and then apply
attention mechanisms to measure visit importance. Based on the latent
representation, we train multiple classifiers, each of which focuses on the prediction of a specific
task. We evaluate our model on two applications: predicting chronic states for
bone health, and monitoring BloodTest values for cardiovascular disease.</p>
    </sec>
    <sec id="sec-2">
      <title>Method</title>
      <p>The basic component of our framework is the gated recurrent unit, a
state-of-the-art deep learning architecture for modelling long-range sequences. To
further improve its performance, we apply attention mechanisms to measure the
importance of historical sequences. To predict the status of multiple diagnoses,
we add a multi-task classification layer on top of the learned representations.</p>
      <p>
        We implement our RNN with Gated Recurrent Units (GRU) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which
have been shown to perform comparably to Long Short-Term Memory
(LSTM) while employing a simpler architecture.
      </p>
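      <p>As a rough illustration of the recurrent unit, a single GRU update can be written in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation: the parameter names (<monospace>W_z</monospace>, <monospace>U_z</monospace> and so on), the dimensions and the random initialization are assumptions made for the example, and the gate convention follows one common formulation of the GRU.</p>
      <preformat>
```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, p):
    """One GRU update: returns the new hidden state for visit vector x."""
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev + p["b_z"])   # update gate
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev + p["b_r"])   # reset gate
    h_cand = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev) + p["b_h"])
    # Interpolate between the previous state and the candidate state.
    return (1.0 - z) * h_prev + z * h_cand

def init_params(n_in, n_hid, seed=0):
    """Small random weights (hypothetical initialization for the example)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("z", "r", "h"):
        p["W_" + g] = rng.normal(scale=0.1, size=(n_hid, n_in))
        p["U_" + g] = rng.normal(scale=0.1, size=(n_hid, n_hid))
        p["b_" + g] = np.zeros(n_hid)
    return p

def run_gru(visits, p):
    """Run the GRU over a (t, n_in) array of visits; returns (t, n_hid) states."""
    h = np.zeros(p["b_z"].shape[0])
    states = []
    for x in visits:
        h = gru_step(x, h, p)
        states.append(h)
    return np.stack(states)

# Toy input: 5 visits with 4 features each, hidden size 3.
params = init_params(n_in=4, n_hid=3)
hidden_states = run_gru(np.ones((5, 4)), params)
```
      </preformat>
      <p>Each row of <monospace>hidden_states</monospace> plays the role of one visit representation; the GRU uses two gates where an LSTM uses three, which is the sense in which its architecture is simpler.</p>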
      <p>An RNN can remember past information for future prediction. However, in practice its
memory is limited to the few most recent steps, with more impact from later ones, and it may
not be able to discover major influences from earlier timestamps. Therefore, we
apply attention mechanisms, which have been successful in many tasks, to capture the effect of
long-term dependencies.</p>
      <p>Our task is to predict the status of multiple measurement results at time
$t + 1$ given the historical records from $x_1$ to $x_t$. Figure 1 shows a high-level
overview of the proposed model. Given the information from time 1 to $t$, the
$i$-th visit's health record $x_i$ is fed into an RNN, which outputs a hidden
state $h_i$ as the representation of the $i$-th visit. From the set of hidden states
$\{h_i\}_{i=1}^{t-1}$, we compute their relative importance $\alpha_t$, and then obtain a context state
$c_t$. From the context state $c_t$ and the current hidden state $h_t$, we obtain an
attentional hidden state $\tilde{h}_t$, which is used to predict the diagnoses in the
$(t + 1)$-th visit. For the prediction, we use $M$ softmax classifiers, which correspond to
the $M$ different diagnoses, to predict the severity level of each diagnosis. The
representation $h_t$ contains the visit information of all the input features, and each
task-specific classifier focuses on the prediction of one diagnosis.</p>
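      <p>The pipeline described above can be sketched as follows. This is only an illustration under stated assumptions: the bilinear scoring function with matrix <monospace>W_a</monospace>, the tanh combination layer <monospace>W_c</monospace>, and all dimensions are choices made for the example in the style of attention used in neural machine translation, and the text does not confirm them as the exact formulation of the model.</p>
      <preformat>
```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

def attend(H_prev, h_t, W_a):
    """Weights alpha_t over past states H_prev (t-1, d) and the context c_t."""
    scores = H_prev @ (W_a @ h_t)      # assumed bilinear score h_i^T W_a h_t
    alpha = softmax(scores)            # relative importance of each past visit
    c_t = alpha @ H_prev               # weighted sum of past hidden states
    return alpha, c_t

def predict_tasks(H, W_a, W_c, heads):
    """Return the attention weights and one severity distribution per task."""
    H_prev, h_t = H[:-1], H[-1]
    alpha, c_t = attend(H_prev, h_t, W_a)
    # Attentional hidden state built from the context and the current state.
    h_tilde = np.tanh(W_c @ np.concatenate([c_t, h_t]))
    # One softmax classifier (weights W_m, bias b_m) per monitored diagnosis.
    probs = [softmax(W_m @ h_tilde + b_m) for W_m, b_m in heads]
    return alpha, probs

# Toy example: t = 5 visits, hidden size d = 4, M = 3 tasks, 3 severity levels.
rng = np.random.default_rng(0)
d, M, levels = 4, 3, 3
H = rng.normal(size=(5, d))                    # stand-in for RNN hidden states
W_a = rng.normal(size=(d, d))
W_c = rng.normal(size=(d, 2 * d))
heads = [(rng.normal(size=(levels, d)), np.zeros(levels)) for _ in range(M)]
alpha, probs = predict_tasks(H, W_a, W_c, heads)
```
      </preformat>
      <p>All task heads read the same attentional state, so the shared representation is learned once while each classifier specializes in one diagnosis.</p>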
      <p>[Figure 1: Overview of the proposed model. RNN units over the visit sequence feed an attention layer, followed by softmax classifiers for Task 1 through Task M.]</p>
      <p>We conduct experiments on two real-world datasets, and evaluate the
performance of the proposed attention-based RNN models compared to other
prediction methods.</p>
      <p>
        Study of Osteoporotic Fractures Dataset. The Study of Osteoporotic Fractures
(SOF) [?] is the largest and most comprehensive study focused on bone diseases.
It includes 20 years of longitudinal data on osteoporosis for 9,704 Caucasian
women aged 65 years and older. Potential risk factors and confounders belong
to several groups such as demographics, family history, and lifestyle. We derive
each participant's bone health diagnoses for different areas by comparing the bone mineral density
(BMD) values with young healthy references [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], resulting in
three BMD levels: normal, osteopenia and osteoporosis.
      </p>
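      <p>The mapping from a BMD value to one of the three levels can be sketched as below. The cut-offs of -1.0 and -2.5 standard deviations are the widely used WHO T-score thresholds and are assumed here, since the text does not state the exact thresholds applied; the reference mean and standard deviation are hypothetical values chosen only for the example.</p>
      <preformat>
```python
def t_score(bmd, young_mean, young_sd):
    """Standard deviations between a BMD value and the young healthy reference mean."""
    return (bmd - young_mean) / young_sd

def bmd_level(t):
    """Map a T-score to a level using the assumed WHO cut-offs of -1.0 and -2.5."""
    if t >= -1.0:
        return "normal"
    if t > -2.5:
        return "osteopenia"
    return "osteoporosis"

# Hypothetical reference values for one measurement site (g/cm^2).
young_mean, young_sd = 0.94, 0.12
levels = [bmd_level(t_score(b, young_mean, young_sd)) for b in (0.95, 0.76, 0.60)]
```
      </preformat>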
      <p>
        BloodTest Dataset. This dataset [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] contains multivariate blood tests of 3,000
patients affected by cardiovascular disease from the University Hospital of
Catanzaro, Italy. For each patient, there are several blood tests during their in-hospital
stay, such as hemoglobin, triglycerides, glucose, and calcium. As suggested by
doctors, we pick 12 blood analyte variables that are important for
cardiovascular disease. Each variable has a normal range provided by doctors. Knowing variable
transitions in advance can alert doctors to take action before an abnormality
occurs, in order to reduce the risk of disease.
      </p>
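      <p>Turning a raw analyte value into one of the three states used for prediction (normal, low abnormal, high abnormal) amounts to a comparison against its normal range. The sketch below is illustrative only: the ranges shown are hypothetical placeholders, whereas the ranges used in the study were provided by doctors.</p>
      <preformat>
```python
def analyte_state(value, low, high):
    """Return 'low', 'normal' or 'high' relative to the normal range [low, high]."""
    if value > high:
        return "high"
    if value >= low:
        return "normal"
    return "low"

# Hypothetical normal ranges, for illustration only.
normal_range = {"glucose": (70.0, 100.0), "calcium": (8.5, 10.5)}
visit = {"glucose": 132.0, "calcium": 8.1}
states = {name: analyte_state(v, *normal_range[name]) for name, v in visit.items()}
```
      </preformat>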
      <p>As is common with EHR data, these datasets are irregularly sampled and sparse,
so data preprocessing is needed. For each person, we remove those visits
without any monitored variables recorded, and remove patients with fewer than
three visits. We use simple imputation to fill missing variables. For the SOF
data, we fill the missing variables with the values in the previous visit. For the
BloodTest data, we impute missing sequences (where a single variable is missing
entirely) with a clinical normal value. Statistics of the resulting datasets are shown
in Table 1.</p>
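      <p>The preprocessing steps above can be sketched with NumPy as follows. This is a simplified illustration, not the authors' code: it applies both imputation rules in one function (forward filling from the previous visit, and a normal value for variables that are missing entirely), whereas the text applies the first rule to SOF and the second to BloodTest. The data layout, function names and example values are assumptions.</p>
      <preformat>
```python
import numpy as np

def impute(X, normal_values):
    """Fill missing entries (NaN) of a (visits, variables) array in place."""
    for j in range(X.shape[1]):
        col = X[:, j]                            # view into X
        if np.isnan(col).all():
            col[:] = normal_values[j]            # variable never measured
            continue
        for i in range(1, len(col)):
            if np.isnan(col[i]):
                col[i] = col[i - 1]              # carry the previous visit forward
        # Note: a value missing in the first visit stays missing in this sketch.
    return X

def preprocess(patients, normal_values, min_visits=3):
    """patients: dict id -> (visits, variables) array-like with NaN for missing."""
    cleaned = {}
    for pid, X in patients.items():
        X = np.asarray(X, dtype=float)
        X = X[~np.isnan(X).all(axis=1)]          # drop visits with nothing recorded
        if X.shape[0] >= min_visits:             # keep patients with enough visits
            cleaned[pid] = impute(X.copy(), normal_values)
    return cleaned

nan = np.nan
patients = {
    "a": [[1.0, 5.0, nan], [nan, nan, nan], [2.0, nan, nan], [nan, 7.0, nan]],
    "b": [[1.0, 2.0, 3.0], [nan, nan, nan], [4.0, 5.0, 6.0]],
}
cleaned = preprocess(patients, normal_values=[0.0, 0.0, 9.0])
```
      </preformat>
      <p>In this toy input, patient "b" is dropped because only two non-empty visits remain, while patient "a" keeps three visits with every gap filled.</p>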
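      <p>For each patient, we want to predict the diagnosis results of each visit based
on his/her previous records. To validate the performance of the proposed models
in this diagnosis prediction task, we conduct experiments on two categories of
methods: baselines and RNN-based models.</p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Statistics of the datasets.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Dataset</th><th>SOF</th><th>BloodTest</th></tr>
          </thead>
          <tbody>
            <tr><td>Number of patients</td><td>5,318</td><td>2,055</td></tr>
            <tr><td>Number of visits</td><td>22,313</td><td>18,758</td></tr>
            <tr><td>Average number of visits per patient</td><td>4.19</td><td>9.13</td></tr>
            <tr><td>Number of normal claims</td><td>25,145</td><td>221,642</td></tr>
            <tr><td>Number of low abnormal claims</td><td>55,399</td><td>17,407</td></tr>
            <tr><td>Number of high abnormal claims</td><td>31,021</td><td>79,837</td></tr>
            <tr><td>Total number of features</td><td>42</td><td>17</td></tr>
            <tr><td>Number of monitored diagnoses</td><td>5</td><td>17</td></tr>
          </tbody>
        </table>
      </table-wrap>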
      <p>We set up two kinds of baselines. The first baseline uses the median
value of each monitored variable from $V_1$ to $V_t$ to predict $V_{t+1}$ for continuous
variables. This is based on the heuristic assumption that the most frequent state
is the most likely to occur. For each patient, we use his/her most frequent health
status as the current status, regardless of time variations. The second baseline
is a multi-task logistic regression (LR). To predict information at $V_{t+1}$, we feed
the health records at $V_t$ to a logistic regression model with multiple softmax
classifiers. This can be viewed as a simplified version of the model in Figure 1 that does not use
RNNs or the attention mechanism to learn latent states. This model only considers
the effect of the previous time step, rather than the long-term history.</p>
      <p>Diagnosis Prediction. Table 2 shows the accuracy of the proposed approaches in
comparison with baselines on the two datasets. For each patient in the testing
set, we predict the health conditions for the subsequent visits using his/her
historical health records. For the SOF dataset, we predict the probability of the BMD
states normal, osteopenia and osteoporosis for different measurements such as
hip and femoral neck. For the BloodTest dataset, we predict the probability of
each blood analyte falling into normal, low abnormal and high abnormal. The
results are averaged over 5 random trials of 5-fold cross validation. Avg. # Correct
represents the average number of correctly predicted claims over the 5 random
trials. Accuracy represents the ratio between the number of correctly predicted claims and
the total number of claims to be predicted. On both datasets, RNN$_l$, RNN$_g$ and
RNN$_c$ clearly outperform the plain RNN. Since the prediction of the RNN mostly
depends on recent visits, it may not memorize all the past information. Through the
attention mechanism, RNN$_l$, RNN$_g$ and RNN$_c$ can take all the previous
visit information into consideration, assign different attention scores to past
visits, and achieve better performance than the RNN.</p>
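      <p>The frequency-based baseline and the accuracy metric described above can be illustrated with a short sketch. It covers only the categorical case (the most frequent past state per diagnosis); the patient record is hypothetical, and the function names are chosen for the example rather than taken from the authors' code.</p>
      <preformat>
```python
from collections import Counter

def most_frequent_state(history):
    """Baseline: predict the state seen most often in visits V_1..V_t."""
    return Counter(history).most_common(1)[0][0]

def accuracy(predicted, actual):
    """Correctly predicted claims divided by the total number of claims."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical patient: rows are visits, columns are two monitored diagnoses.
visits = [
    ["normal", "low"],
    ["normal", "low"],
    ["high", "low"],
    ["normal", "normal"],   # V_4, the visit to be predicted
]
history, target = visits[:-1], visits[-1]
pred = [most_frequent_state([v[m] for v in history]) for m in range(len(target))]
acc = accuracy(pred, target)
```
      </preformat>
      <p>Here the baseline predicts "normal" and "low" for the two diagnoses, so one of the two claims is correct. Because it ignores the order of visits, it cannot anticipate the change in the second diagnosis.</p>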
      <p>Visit Interpretation. The attention mechanism can be used to understand the
importance of historical visits to the current visit. As an example, here we
analyze the concatenation-based attention mechanism on the SOF dataset. Figure 2
shows a case study for predicting the diagnoses in the sixth visit from the
previous five visits.</p>
      <p>For chronic diseases, the last visit is often the most important since patients'
health conditions change slowly. As shown in the figure, for the first, fourth and fifth
patients, the importance of visits increases over time. However, this is
not always the case due to the complexity of disease progression and the impact
of risk factors. Table 3 shows the variation of bone mineral density (BMD)
diagnoses and attention scores across the visits of the second patient. In each
visit, there are five different BMD diagnoses, and the values in the table indicate
the severity of bone density loss. Although $V_4$ and $V_5$ are closer to $V_6$ in terms
of time, $V_2$ and $V_3$ have the same condition as $V_6$. Thus the health records of $V_2$
and $V_3$ are more important to $V_6$. We can see that the attention mechanism
correctly assigns larger weights to $V_2$ and $V_3$. For the BloodTest dataset, using the
attention mechanism to memorize all the past information is also important. An
abnormal blood analyte can temporarily return to normal under medication, but it
may relapse after some time. Therefore, interpreting visit importance through
the attention mechanism can help to better monitor disease progression.</p>
      <p>In diagnosis prediction, making decisions using only the most recent record is usually
not enough, and it is important to look up long-term health information. To
understand the relationship between the length of a patient's medical history and
the prediction performance, we select 1,000 patients from the BloodTest dataset
with more than seven visits. Table 4 shows the accuracy of RNN$_l$ in predicting
the diagnoses from $V_2$ to $V_7$. We can see that as the number of visits increases,
the performance often improves. We believe that this is because the
RNN is able to learn better estimates of patient information as it memorizes
longer health records.</p>
      <p>Acknowledgement. This work was supported in part by NSF IIS-1218393 and
IIS-1514204, and by SISTABENE POR project as PIHGIS POR project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Nguyen</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wickramasinghe</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venkatesh</surname>
            <given-names>S.</given-names>
          </string-name>
          <article-title>Deepr: A convolutional net for medical records</article-title>
          .
          <source>IEEE Journal of Biomedical and Health Informatics</source>
          .
          <year>2016</year>
          Dec 1.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
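          <string-name>
            <surname>Suo</surname>
            <given-names>Q</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Canino</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veltri</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gnasso</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>A Multi-task Framework for Monitoring Health Conditions via Attention-based Recurrent Neural Networks</article-title>
          .
          <source>AMIA 2017, American Medical Informatics Association Annual Symposium</source>
          , Washington, DC, November 4-8,
          <year>2017</year>
          .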
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cheng</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Risk prediction with electronic health records: A deep learning approach</article-title>
          .
          <source>In Proceedings of the 2016 SIAM International Conference on Data Mining</source>
          .
          <year>2016</year>
          Jun 30 (pp.
          <fpage>432</fpage>
          -
          <lpage>440</lpage>
          ). Society for Industrial and Applied Mathematics.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ma</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meng</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            <given-names>H</given-names>
          </string-name>
          , et al.
          <article-title>Unsupervised Discovery of Drug Side-Effects from Heterogeneous Data Sources</article-title>
          .
          <source>In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Wang</surname>
            <given-names>X</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sontag</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>F</given-names>
          </string-name>
          .
          <article-title>Unsupervised learning of disease progression models</article-title>
          .
          <source>In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          .
          <year>2014</year>
          Aug 24 (pp.
          <fpage>85</fpage>
          -
          <lpage>94</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Zhou</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narayan</surname>
            <given-names>VA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ye</surname>
            <given-names>J</given-names>
          </string-name>
          , Alzheimer's Disease Neuroimaging Initiative.
          <article-title>Modeling disease progression via multi-task learning</article-title>
          .
          <source>NeuroImage</source>
          .
          <year>2013</year>
          Sep 30;
          <volume>78</volume>
          :
          <fpage>233</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Henriques</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antunes</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madeira</surname>
            <given-names>SC</given-names>
          </string-name>
          .
          <article-title>Generative modeling of repositories of health records for predictive tasks</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          .
          <year>2015</year>
          Jul 1;
          <volume>29</volume>
          (
          <issue>4</issue>
          ):
          <fpage>999</fpage>
          -
          <lpage>1032</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Li</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>X</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramanathan</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Prediction and informative risk factor selection of bone diseases</article-title>
          .
          <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)</source>
          .
          <year>2015</year>
          Jan 1;
          <volume>12</volume>
          (
          <issue>1</issue>
          ):
          <fpage>79</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Suo</surname>
            <given-names>Q</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xue</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Risk Factor Analysis Based on Deep Learning Models</article-title>
          .
          <source>In Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics</source>
          .
          <year>2016</year>
          Oct 2 (pp.
          <fpage>394</fpage>
          -
          <lpage>403</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Che</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kale</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahadori</surname>
            <given-names>MT</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Deep computational phenotyping</article-title>
          .
          <source>In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          .
          <year>2015</year>
          Aug 10 (pp.
          <fpage>507</fpage>
          -
          <lpage>516</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lipton</surname>
            <given-names>ZC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kale</surname>
            <given-names>DC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elkan</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wetzell</surname>
            <given-names>R</given-names>
          </string-name>
          .
          <article-title>Learning to diagnose with LSTM recurrent neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1511.03677</source>
          .
          <year>2015</year>
          Nov 11.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Choi</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahadori</surname>
            <given-names>MT</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Doctor ai: Predicting clinical events via recurrent neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1511.05942</source>
          .
          <year>2015</year>
          Nov 18.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Choi</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahadori</surname>
            <given-names>MT</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kulas</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuetz</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stewart</surname>
            <given-names>W</given-names>
          </string-name>
          .
          <article-title>RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          <year>2016</year>
          (pp.
          <fpage>3504</fpage>
          -
          <lpage>3512</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Choi</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahadori</surname>
            <given-names>MT</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stewart</surname>
            <given-names>WF</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>GRAM: Graph-based Attention Model for Healthcare Representation Learning</article-title>
          .
          <source>arXiv preprint arXiv:1611.07012</source>
          .
          <year>2016</year>
          Nov 21.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ma</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chitta</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            <given-names>J</given-names>
          </string-name>
          , et al.
          <article-title>Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks</article-title>
          .
          <source>In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Chung</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling</article-title>
          .
          <source>arXiv preprint arXiv:1412.3555</source>
          .
          <year>2014</year>
          Dec 11.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zeiler</surname>
            <given-names>MD</given-names>
          </string-name>
          .
          <article-title>ADADELTA: an adaptive learning rate method</article-title>
          .
          <source>arXiv preprint arXiv:1212.5701</source>
          .
          <year>2012</year>
          Dec 22.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Bonnick</surname>
            <given-names>SL</given-names>
          </string-name>
          .
          <article-title>Bone densitometry in clinical practice</article-title>
          . Totowa, NJ: Humana Press; 1998 Jun 24.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Luong</surname>
            <given-names>MT</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pham</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            <given-names>CD</given-names>
          </string-name>
          .
          <article-title>Effective Approaches to Attention-based Neural Machine Translation</article-title>
          .
          <source>In Empirical Methods in Natural Language Processing</source>
          .
          <year>2015</year>
          Aug.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Canino</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guzzi</surname>
            <given-names>PH</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tradigo</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veltri</surname>
            <given-names>P</given-names>
          </string-name>
          .
          <article-title>On the analysis of diseases and their related geographical data</article-title>
          .
          <source>IEEE Journal of Biomedical and Health Informatics</source>
          .
          <year>2015</year>
          Oct 30.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Bergstra</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breuleux</surname>
            <given-names>O</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastien</surname>
            <given-names>F</given-names>
          </string-name>
          , et al.
          <article-title>Theano: A CPU and GPU math compiler in Python</article-title>
          .
          <source>In Proc. 9th Python in Science Conf</source>
          .
          <year>2010</year>
          Jun
          (pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>