<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Interpretable Vital Sign Forecasting with Model Agnostic Attention Maps</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yuwei Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chen Dan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anubhav Bhatti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bingjie Shen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Divij Gupta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Suraj Parmar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>San Lee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SpassMed Inc.</institution>
          ,
          <addr-line>Toronto, Ontario</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Sepsis is a leading cause of mortality in intensive care units (ICUs), representing a substantial medical challenge. The complexity of analyzing diverse vital signs to predict sepsis further aggravates this issue. While deep learning techniques have been advanced for early sepsis prediction, their 'black-box' nature obscures the internal logic, impairing interpretability in critical settings like ICUs. This paper introduces a framework that combines a deep learning model with an attention mechanism that highlights the critical time steps in the forecasting process, thus improving model interpretability and supporting clinical decision-making. We show that the attention mechanism can be adapted to various black-box time series forecasting models such as N-HiTS and N-BEATS. Our method preserves the accuracy of conventional deep learning models while enhancing interpretability through heatmaps generated from attention weights. We evaluated our model on the eICU-CRD dataset, focusing on forecasting vital signs for sepsis patients, and assessed its performance using mean squared error (MSE) and dynamic time warping (DTW) metrics. We explored the attention maps of N-HiTS and N-BEATS, examining the differences in their performance and identifying crucial factors influencing vital sign forecasting.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Sepsis is a life-threatening condition that occurs when the immune system of the body responds
incorrectly to an infection and causes rapid organ dysfunction and failure [1]. A meta-analysis
conducted on articles published in PubMed and the Cochrane Database revealed that the average
30-day mortality rate for sepsis was 24.4%, and the average 90-day mortality rate was 32.2%
between 2009 and 2019 [2]. While sepsis has been acknowledged for a long time, its clinical
definition did not emerge until the late 20th century [3]. In 1991, a consensus conference
posited that sepsis arises from the individual’s inflammatory response to infection, marked
by systemic inflammatory response syndrome (SIRS), emphasizing the human response to
invading organisms. This syndrome is characterized by variations in temperature, heart rate
(HR), respiratory rate (RR), blood pressure (BP), and white blood cell (WBC) count [4]. In 2016,
the definition of sepsis was revised to multiple organ dysfunction syndrome (MODS) [5]. Systolic
blood pressure (SBP) and RR abnormalities indicate organ dysfunction [6]. Thus, creating precise
models for forecasting vital signs becomes essential in predicting sepsis [7]. Accurate vital sign
predictions can help clinicians identify sepsis cases and intervene promptly, potentially
saving lives and improving outcomes for intensive care unit (ICU) patients.</p>
      <p>The growth in explainable artificial intelligence (XAI) research is mainly attributed to the rapid
rise of deep learning and its widespread healthcare applications. However,
most models developed using these technologies are considered ’black-boxes’ by experts because
their intricate, non-linear structures are challenging for non-experts to understand [8]. This
work makes two contributions: (1) adding an attention mechanism
that shows the relationship between input time steps and forecasted results; and (2) providing an analysis
and interpretation of the findings derived from the attention map.</p>
      <sec id="sec-1-1">
        <title>1.1. Literature Review</title>
        <p>In recent years, the significance of model explainability has been widely recognized, leading to
the integration of an increasing number of explainable methods into data-driven models [9].
Prior research has demonstrated the development of deep learning neural networks
incorporating attention mechanisms, resulting in interpretable models with strong performance within the
medical field. Kaji et al. demonstrated that integrating an attention mechanism into the LSTM
network, trained with Electronic Health Record (EHR) data, not only improves the daily sepsis
onset prediction’s Area Under the Receiver Operating Characteristic Curve (AUROC) score
to 0.876 but also highlights critical time points for prediction [10]. Shickel et al. developed an
attention-based gated recurrent unit (GRU) that applies self-attention to focus on
significant time steps when predicting in-hospital mortality [11]. Choi et al. proposed reverse
time attention (RETAIN), which processes EHR data in reverse order and achieves an Area Under the
ROC Curve (AUC) of 0.87 in heart failure prediction while adding interpretability through a two-level
neural attention model [12].</p>
        <p>While previous XAI research integrating deep learning models with interpretable modules has
excelled in time series classification, attention mechanisms in interpretable time series
forecasting remain underexplored. Our approach aims to explore attention mechanism interpretability
in time series forecasting.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <p>In this section, we first describe the eICU Collaborative Research
Database (eICU-CRD) [13] and the composition of our input data.
We then detail the attention mechanism and the frameworks of our
forecasting models.</p>
      <sec id="sec-2-1">
        <title>2.1. Dataset Description and Data Preprocessing</title>
        <p>The eICU-CRD data is a publicly accessible repository containing data from over 200,000 ICU
admissions across 208 hospitals in the United States between 2014 and 2015 [13]. This
comprehensive dataset comprises diverse patient information, including demographics, diagnoses,
medications, and laboratory results. Our research focuses on the ’diagnosis’ and ’vitalAperiodic’
tables, from which we extract dynamic physiological data such as temperature, HR, and BP
at 5-minute intervals. The core of our study revolves around forecasting two crucial dynamic
variables: HR and mean blood pressure (MBP), derived from SBP and diastolic blood pressure
(DBP) measurements. Following [14, 15], we create one or more groups for each patient, each covering a
9-hour time window, and predict vital signs for the final 3 hours of the window from
the preceding 6 hours of data. Data preprocessing involves imputing missing values, filtering
outliers, and scaling using domain-specific knowledge. Clinically reasonable boundaries for
each critical vital sign were set using this knowledge: HR from 0 to 300 beats/min,
MBP from 0 to 190 mmHg, and RR from 0 to 100 breaths/min [16].</p>
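        <p>As a minimal sketch of the grouping described above, the following Python fragment builds 9-hour groups (72 input steps and 36 forecast steps at 5-minute resolution) and masks readings outside the stated bounds. The function names, the forward-fill imputation, and the non-overlapping stride are illustrative assumptions, not the exact pipeline used in this study; scaling is omitted.</p>
        <preformat>
```python
from typing import Dict, List, Optional, Tuple

STEP_MINUTES = 5
INPUT_STEPS = 6 * 60 // STEP_MINUTES        # 72 steps of history
HORIZON_STEPS = 3 * 60 // STEP_MINUTES      # 36 steps to forecast
WINDOW_STEPS = INPUT_STEPS + HORIZON_STEPS  # 108 steps per 9-hour group

# Bounds quoted in the text (HR in beats/min, MBP in mmHg, RR in breaths/min).
BOUNDS: Dict[str, Tuple[float, float]] = {
    "HR": (0.0, 300.0),
    "MBP": (0.0, 190.0),
    "RR": (0.0, 100.0),
}

Series = List[Optional[float]]


def filter_outliers(values: Series, vital: str) -> Series:
    """Replace readings outside the clinical bounds with None (missing)."""
    low, high = BOUNDS[vital]
    return [v if v is not None and high >= v >= low else None for v in values]


def forward_fill(values: Series) -> Series:
    """Assumed imputation: carry the last valid reading forward, then
    back-fill any leading gap with the first valid reading."""
    filled: Series = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    first = next((v for v in filled if v is not None), None)
    return [first if v is None else v for v in filled]


def make_windows(values: Series, stride: int = WINDOW_STEPS) -> List[Tuple[Series, Series]]:
    """Split one patient's series into (6-hour input, 3-hour target) pairs."""
    pairs = []
    for start in range(0, len(values) - WINDOW_STEPS + 1, stride):
        window = values[start:start + WINDOW_STEPS]
        pairs.append((window[:INPUT_STEPS], window[INPUT_STEPS:]))
    return pairs
```
        </preformat>
        <p>For example, an 18-hour HR recording (216 readings) yields two groups under these assumptions, and an implausible reading such as 400 beats/min is masked and then imputed from its neighbour.</p>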
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Experiment Setup</title>
        <p>The dataset is divided into training, validation, and test sets in an 80:10:10 ratio. Within each
9-hour window, the initial 6 hours consist of 72 time steps, while the subsequent 3 hours encompass
36 time steps. The forecasting model uses either HR alone or HR combined with RR as
covariates to forecast MBP, or vice versa. The model takes the first 72 time
steps as input and predicts the remaining 36 time steps. Model performance is
assessed using Mean Squared Error (MSE) and Dynamic Time Warping (DTW).</p>
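        <p>The two evaluation metrics can be sketched in plain Python as follows. This is an illustration only: the text does not specify the DTW variant, so the absolute-difference local cost and the unconstrained warping path used here are assumptions, and the function names are hypothetical.</p>
        <preformat>
```python
from typing import List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("sequences must have the same length")
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic-programming DTW with |x - y| as the local cost
    (assumed variant, no warping-window constraint)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```
        </preformat>
        <p>MSE penalizes point-wise error at each of the 36 forecast steps, whereas DTW tolerates small temporal shifts: a forecast that reproduces the right shape one step late has a non-zero MSE but can have a DTW distance of zero.</p>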
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Deep Learning Forecasting Model</title>
        <p>Motivated by the forecasting performance of the N-HiTS and N-BEATS models [17, 18, 19], as well as
the idea proposed by Pantiskas et al. [20], we aim to address their inherent lack of interpretability
and understand why the models perform differently. To achieve this, we implemented
an attention mechanism that can be applied to the N-HiTS and N-BEATS architectures and
may also be applied to other black-box deep learning models. The N-HiTS and N-BEATS models
consist of a series of stacks, each responsible for learning residual values from the preceding
stack.</p>
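        <p>The residual learning described above can be illustrated with a toy sketch in which each unit emits a backcast (the part of its input it explains) and a partial forecast; the backcast is subtracted before the next unit runs, and the partial forecasts are summed. The two hand-written blocks below are hypothetical stand-ins for the learned fully connected blocks, and the function names are assumptions made for illustration.</p>
        <preformat>
```python
from typing import Callable, List, Sequence, Tuple

# A block maps its input to (backcast, partial forecast).
Block = Callable[[Sequence[float]], Tuple[List[float], List[float]]]


def run_residual_stacks(
    history: Sequence[float], blocks: Sequence[Block], horizon: int
) -> Tuple[List[float], List[float]]:
    """Pass residuals from unit to unit and sum the partial forecasts."""
    residual = list(history)
    forecast = [0.0] * horizon
    for block in blocks:
        backcast, partial = block(residual)
        # Remove what this unit explained; the next unit sees the remainder.
        residual = [r - b for r, b in zip(residual, backcast)]
        forecast = [f + p for f, p in zip(forecast, partial)]
    return forecast, residual


def mean_block(horizon: int) -> Block:
    """Toy block: explains the level (mean) of its input."""
    def block(x: Sequence[float]) -> Tuple[List[float], List[float]]:
        level = sum(x) / len(x)
        return [level] * len(x), [level] * horizon
    return block


def persistence_block(horizon: int) -> Block:
    """Toy block: explains its whole input and repeats the last value."""
    def block(x: Sequence[float]) -> Tuple[List[float], List[float]]:
        return list(x), [x[-1]] * horizon
    return block


forecast, residual = run_residual_stacks(
    [1.0, 2.0, 3.0], [mean_block(2), persistence_block(2)], horizon=2
)
```
        </preformat>
        <p>In this toy run the first block contributes the level and the second contributes what remains, so the final residual is zero; in the real models the blocks are trained networks and the residual after the final stack is generally non-zero.</p>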
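        <p>Within each stack are blocks comprising several fully connected layers, which generate
backward (<inline-formula><tex-math>\theta^{b}_{\ell}</tex-math></inline-formula>) and forward (<inline-formula><tex-math>\theta^{f}_{\ell}</tex-math></inline-formula>) expansion coefficients according to Equation 1, where <inline-formula><tex-math>h_{\ell,4}</tex-math></inline-formula>
represents the output of the fourth fully connected layer in the basic block, and Linear denotes
a linear projection layer [17]:</p>
        <p><disp-formula><label>(1)</label><tex-math>\theta^{b}_{\ell} = \mathrm{Linear}^{b}(h_{\ell,4}), \qquad \theta^{f}_{\ell} = \mathrm{Linear}^{f}(h_{\ell,4})</tex-math></disp-formula></p>
        <p>Additionally, each block includes backward (<inline-formula><tex-math>g^{b}</tex-math></inline-formula>) and forward (<inline-formula><tex-math>g^{f}</tex-math></inline-formula>) basis layers that produce
backcast and forecast outputs as per Equation 2, where <inline-formula><tex-math>\hat{y}_{\ell}</tex-math></inline-formula> and <inline-formula><tex-math>\hat{x}_{\ell}</tex-math></inline-formula> denote forecast and backcast
outputs, respectively:</p>
        <p><disp-formula><label>(2)</label><tex-math>\hat{y}_{\ell} = \sum_{i} \theta^{f}_{\ell,i}\, v^{f}_{\ell,i}, \qquad \hat{x}_{\ell} = \sum_{i} \theta^{b}_{\ell,i}\, v^{b}_{\ell,i}</tex-math></disp-formula></p>
        <p>Here, <inline-formula><tex-math>v^{f}_{\ell,i}</tex-math></inline-formula> and <inline-formula><tex-math>v^{b}_{\ell,i}</tex-math></inline-formula> represent forecast and backcast basis vectors. Notably, N-HiTS has a
max-pooling layer (Equation 3) before the values are passed to the fully connected layers, which is
applied to enable multi-rate signal sampling for the <inline-formula><tex-math>\ell</tex-math></inline-formula>-th basic block [17]:</p>
        <p><disp-formula><label>(3)</label><tex-math>y^{(p)}_{t-L:t,\ell} = \mathrm{MaxPool}\left(y_{t-L:t,\ell},\, k_{\ell}\right)</tex-math></disp-formula></p>
        <p>where <inline-formula><tex-math>k_{\ell}</tex-math></inline-formula> is the kernel size of the MaxPool layer.</p>
        <p>Subsequently, inspired by the idea of Pantiskas et al. [20], we introduced an attention mechanism
to explore the relationship between learned information and original inputs after obtaining the
residuals from the final stack. The forecasted result is used to construct the Query (Q), while
the original input forms the basis for the Value (V) and Key (K) [20]. The resulting output is
computed by attention over Q, K, and V with a square-root scaling term, applied to tensors indexed by
<inline-formula><tex-math>N</tex-math></inline-formula>, the number of input time series, <inline-formula><tex-math>H</tex-math></inline-formula>, the forecasting horizon length, and <inline-formula><tex-math>L</tex-math></inline-formula>,
the length of the input history. As shown in Figure 1, after the attention layer, a normalizer is
applied, and skip connections are employed to mitigate the vanishing gradient issue. Finally, a
fully connected layer generates the forecasted results.</p>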
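        <p>To make the roles of Query, Key, and Value concrete, the sketch below computes scaled dot-product attention between a forecast (Query) and the original input window (Key and Value) for a single series. It is a simplified illustration under stated assumptions: the scalar-to-vector embeddings with fixed weights stand in for learned projections, the Value is taken to be the raw history, and the normalizer, skip connection, and final fully connected layer are omitted. It does not reproduce the exact equations of this paper.</p>
        <preformat>
```python
import math
from typing import List, Sequence, Tuple


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def embed(series: Sequence[float], weights: Sequence[float]) -> List[List[float]]:
    """Map each scalar time step to a vector (stand-in for a learned projection)."""
    return [[x * w for w in weights] for x in series]


def forecast_history_attention(
    forecast: Sequence[float],
    history: Sequence[float],
    w_q: Sequence[float],
    w_k: Sequence[float],
) -> Tuple[List[List[float]], List[float]]:
    """Return (attention weights of shape H x L, attended context of length H)."""
    dim = len(w_q)
    queries = embed(forecast, w_q)  # one query per forecast step
    keys = embed(history, w_k)      # one key per historical step
    weights = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights.append(softmax(scores))
    # Assumed Value: the raw history, so each context entry is a weighted
    # average of past readings.
    context = [sum(w * v for w, v in zip(row, history)) for row in weights]
    return weights, context


weights, context = forecast_history_attention(
    forecast=[1.0, 2.0],
    history=[0.0, 1.0, 2.0, 3.0],
    w_q=[0.5, 0.5],
    w_k=[0.5, 0.5],
)
```
        </preformat>
        <p>Each row of the returned weights sums to one and has one entry per historical time step, which is the property that lets the weights be read as a map from forecast steps to input steps.</p>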
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Interpretable Attention Map</title>
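        <p>To illustrate the attention map for a specific item, we computed the attention map
<inline-formula><tex-math>A_{N \times H \times L}</tex-math></inline-formula> following [20], with one <inline-formula><tex-math>H \times L</tex-math></inline-formula> matrix per series
(<inline-formula><tex-math>H</tex-math></inline-formula> forecast steps, <inline-formula><tex-math>L</tex-math></inline-formula> historical input steps).</p>
        <p>Here, <inline-formula><tex-math>A_{N \times H \times L}</tex-math></inline-formula> denotes the attention map, where <inline-formula><tex-math>A^{(i)}_{H \times L}</tex-math></inline-formula> represents the <inline-formula><tex-math>i</tex-math></inline-formula>-th series in the
multivariate time series. Each row <inline-formula><tex-math>j</tex-math></inline-formula> in <inline-formula><tex-math>A^{(i)}_{H \times L}</tex-math></inline-formula> signifies the relationship between the <inline-formula><tex-math>j</tex-math></inline-formula>-th forecasted
data point and the historical input of length <inline-formula><tex-math>L</tex-math></inline-formula>.</p>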
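        <p>Given one such map for a single series (rows are forecast steps, columns are historical steps), the following sketch shows two simple ways to read it: listing the most attended historical steps per forecast step, and rendering a coarse text heatmap in which denser characters stand for higher weights. The helper names and the character ramp are illustrative assumptions; the figures in this paper were not produced with this code.</p>
        <preformat>
```python
from typing import List, Sequence


def top_history_steps(attention_map: Sequence[Sequence[float]], k: int = 3) -> List[List[int]]:
    """For each forecast step (row), return the k most attended history indices."""
    top = []
    for row in attention_map:
        order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        top.append(order[:k])
    return top


def mean_attention_per_step(attention_map: Sequence[Sequence[float]]) -> List[float]:
    """Average weight each historical step receives across all forecast steps."""
    n_rows = len(attention_map)
    return [sum(column) / n_rows for column in zip(*attention_map)]


def render_heatmap(attention_map: Sequence[Sequence[float]], shades: str = " .:*#") -> str:
    """Text heatmap: characters later in `shades` mark higher attention."""
    peak = max(max(row) for row in attention_map) or 1.0
    lines = []
    for row in attention_map:
        lines.append(
            "".join(shades[int(w / peak * (len(shades) - 1))] for w in row)
        )
    return "\n".join(lines)


example_map = [
    [0.1, 0.2, 0.7],  # forecast step 0 attends mostly to history step 2
    [0.6, 0.3, 0.1],  # forecast step 1 attends mostly to history step 0
]
ranked = top_history_steps(example_map, k=2)
profile = mean_attention_per_step(example_map)
picture = render_heatmap(example_map)
```
        </preformat>
        <p>The column averages give a quick summary of which parts of the 6-hour history matter across the whole 3-hour forecast, while the per-row ranking shows whether individual forecast steps depend on different historical points.</p>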
        <p>This computation enables the visualization of how the model attends to different historical
inputs when forecasting specific data points across the multivariate time series.</p>
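        <table-wrap id="tab-1">
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>Models</th><th>Cov.</th></tr>
            </thead>
            <tbody>
              <tr><td>Persistence [19]</td><td /></tr>
              <tr><td>N-HiTS [19]</td><td>W C</td></tr>
              <tr><td>N-HiTS [19]</td><td>W/o C</td></tr>
              <tr><td>N-BEATS [19]</td><td>W C</td></tr>
              <tr><td>N-BEATS [19]</td><td>W/o C</td></tr>
              <tr><td>TFT [19]</td><td>W C</td></tr>
              <tr><td>TFT [19]</td><td>W/o C</td></tr>
              <tr><td>N-BEATS with Attention</td><td>W C</td></tr>
              <tr><td>N-BEATS with Attention</td><td>W/o C</td></tr>
              <tr><td>N-HiTS with Attention</td><td>W C</td></tr>
              <tr><td>N-HiTS with Attention</td><td>W/o C</td></tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <p>W C: with covariates; W/o C: without covariates.</p>
          </table-wrap-foot>
        </table-wrap>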
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results and Discussion</title>
      <sec id="sec-3-1">
        <title>3.1. Forecasting Benchmarks</title>
        <p>Table 1 shows the results of different deep learning time series forecasting models.
We compare against N-HiTS [17], N-BEATS [18], and the Temporal Fusion Transformer (TFT) [21], whose results were
computed by Bhatti et al. [19], using MSE and DTW as the evaluation metrics.</p>
        <p>The results indicate that the N-HiTS model, both with and without an attention mechanism,
consistently outperforms other models across MBP and HR predictions when considering MSE.
Similarly, the N-BEATS model also performs well both with and without attention mechanisms.</p>
        <p>Furthermore, the TFT model demonstrates competitive performance, especially when
considering MSE. However, in the previous paper by Bhatti et al. [19], the TFT forecasts are
relatively smooth and do not show fluctuations.</p>
        <p>In conclusion, the N-HiTS model, when augmented with an attention mechanism, emerges
as a robust choice for forecasting MBP and HR, showcasing its efficacy in capturing complex
temporal patterns. However, further exploration and experimentation are warranted to optimize
model performance, particularly regarding temporal alignment and covariate incorporation.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Interpretability Analysis</title>
        <p>In the heatmaps (Fig 2a, Fig 2b), darker colors indicate higher attention weights at
specific time points, which correspondingly have a greater influence on prediction outcomes.
Conversely, lighter colors suggest a lesser impact. The "N-HiTS + Attention" map in Fig 2a
shows that areas after the 20th time point exhibit darker shades than earlier sections.
Notably, significant changes or peaks at certain points (such as the 35th, 54th, and 63rd points)
appear progressively darker, highlighting their crucial role in shaping the prediction. This pattern
suggests that N-HiTS places a stronger emphasis on data after the 20th point, effectively capturing
both data fluctuations and overall trends. As a result, the predictions closely align with the
actual data and accurately reflect downward trends.</p>
        <fig id="fig-2">
          <label>Figure 2</label>
          <caption>
            <p>(a) N-HiTS attention distribution. (b) N-BEATS attention distribution.</p>
          </caption>
        </fig>
        <p>On the other hand, the predictions from N-BEATS do not closely follow the downward trend
of the actual data and display considerable fluctuation. The attention map in Fig 2b reveals that
N-BEATS assigns larger weights to almost every rise and fall (such as at the 3rd, 10th,
and 29th points) without distinguishing whether they reflect the overall trend, which contributes
to less effective information capture. Moreover, N-BEATS appears to prioritize data from
the initial 1-2 hours more than N-HiTS does, contributing to less stable prediction outcomes.</p>
        <p>Both models indicate that the initial 1-3 hours are crucial for prediction, suggesting that
medical staff should focus on interventions during this period. Significant changes occurring
up to three hours prior also substantially impact the predictions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this paper, we presented an interpretable time series forecasting algorithm that combines
black-box deep learning models (N-HiTS and N-BEATS) with a general attention mechanism. This
approach allows us to observe how the deep learning algorithm assigns importance to inputs
while transparently generating each step of its output. Upon applying this advanced architecture
to the eICU-CRD dataset, our findings demonstrate that the attention mechanism can enhance
interpretability in deep learning time series forecasting models with minimal reduction or even
no change in accuracy. By visualizing attention distributions, clinicians can identify which
vital signs and historical data points are most influential in predicting sepsis. Furthermore, our
model-agnostic attention mechanism is applicable to various deep learning forecasting models.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><given-names>F.</given-names> <surname>Gül</surname></string-name>, <string-name><given-names>M. K.</given-names> <surname>Arslantaş</surname></string-name>, <string-name><given-names>İ.</given-names> <surname>Cinel</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Kumar</surname></string-name>, <article-title>Changing definitions of sepsis</article-title>, <source>Turkish journal of anaesthesiology and reanimation</source> <volume>45</volume> (<year>2017</year>) <fpage>129</fpage>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>M.</given-names> <surname>Bauer</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Gerlach</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Vogelmann</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Preissing</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Stiefel</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Adam</surname></string-name>, <article-title>Mortality in sepsis and septic shock in europe, north america and australia between 2009 and 2019-results from a systematic review and meta-analysis</article-title>, <source>Critical Care</source> <volume>24</volume> (<year>2020</year>) <fpage>1</fpage>–<lpage>9</lpage>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><given-names>J. E.</given-names> <surname>Gotts</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>Matthay</surname></string-name>, <article-title>Sepsis: pathophysiology and clinical management</article-title>, <source>Bmj</source> <volume>353</volume> (<year>2016</year>).</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><given-names>J.-L.</given-names> <surname>Vincent</surname></string-name>, <string-name><given-names>S. M.</given-names> <surname>Opal</surname></string-name>, <string-name><given-names>J. C.</given-names> <surname>Marshall</surname></string-name>, <string-name><given-names>K. J.</given-names> <surname>Tracey</surname></string-name>, <article-title>Sepsis definitions: time for change</article-title>, <source>The Lancet</source> <volume>381</volume> (<year>2013</year>) <fpage>774</fpage>–<lpage>775</lpage>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><given-names>Z.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>S. T.</given-names> <surname>Abrams</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Toh</surname></string-name>, <string-name><given-names>S. S.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>C.-H.</given-names> <surname>Toh</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Wang</surname></string-name>, <article-title>The critical roles and mechanisms of immune cell death in sepsis</article-title>, <source>Frontiers in immunology</source> <volume>11</volume> (<year>2020</year>) <fpage>1918</fpage>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><given-names>M.</given-names> <surname>Singer</surname></string-name>, <string-name><given-names>C. S.</given-names> <surname>Deutschman</surname></string-name>, <string-name><given-names>C. W.</given-names> <surname>Seymour</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Shankar-Hari</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Annane</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Bauer</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Bellomo</surname></string-name>, <string-name><given-names>G. R.</given-names> <surname>Bernard</surname></string-name>, <string-name><given-names>J.-D.</given-names> <surname>Chiche</surname></string-name>, <string-name><given-names>C. M.</given-names> <surname>Coopersmith</surname></string-name>, et al., <article-title>The third international consensus definitions for sepsis and septic shock (sepsis-3)</article-title>, <source>Jama</source> <volume>315</volume> (<year>2016</year>) <fpage>801</fpage>–<lpage>810</lpage>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] B. Behinaein, A. Bhatti, D. Rodenburg, P. Hungler, A. Etemad, A transformer architecture for stress detection from ecg, in: Proceedings of the 2021 ACM International Symposium on Wearable Computers, 2021, pp. 132–134.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] G. Vilone, L. Longo, Notions of explainability and evaluation approaches for explainable artificial intelligence, Information Fusion 76 (2021) 89–106.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] L. Longo, R. Goebel, F. Lecue, P. Kieseberg, A. Holzinger, Explainable artificial intelligence: Concepts, applications, research challenges and visions, in: International cross-domain conference for machine learning and knowledge extraction, Springer, 2020, pp. 1–16.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] D. A. Kaji, J. R. Zech, J. S. Kim, S. K. Cho, N. S. Dangayach, A. B. Costa, E. K. Oermann, An attention based deep learning model of clinical events in the intensive care unit, PloS one 14 (2019) e0211057.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] B. Shickel, T. J. Loftus, L. Adhikari, T. Ozrazgat-Baslanti, A. Bihorac, P. Rashidi, Deepsofa: a continuous acuity score for critically ill patients using clinically interpretable deep learning, Scientific reports 9 (2019) 1879.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, W. Stewart, Retain: An interpretable predictive model for healthcare using reverse time attention mechanism, Advances in neural information processing systems 29 (2016).</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] T. J. Pollard, A. E. Johnson, J. D. Raffa, L. A. Celi, R. G. Mark, O. Badawi, The eicu collaborative research database, a freely available multi-center database for critical care research, Scientific data 5 (2018) 1–13.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] A. Bhatti, N. Thangavelu, M. Hassan, C. Kim, S. Lee, Y. Kim, J. Y. Kim, Interpreting forecasted vital signs using n-beats in sepsis patients, arXiv preprint arXiv:2306.14016 (2023).</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] H. M. O’Halloran, K. Kwong, R. A. Veldhoen, D. M. Maslove, Characterizing the patients, hospitals, and data quality of the eicu collaborative research database, Critical Care Medicine 48 (2020) 1737–1743.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] S. Parmar, T. Shan, S. Lee, Y. Kim, J. Y. Kim, Extending machine learning-based early sepsis detection to different demographics, in: 2024 IEEE First International Conference on Artificial Intelligence for Medicine, Health and Care (AIMHC), IEEE, 2024, pp. 70–71.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] C. Challu, K. G. Olivares, B. N. Oreshkin, F. G. Ramirez, M. M. Canseco, A. Dubrawski, Nhits: Neural hierarchical interpolation for time series forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 2023, pp. 6989–6997.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] B. N. Oreshkin, D. Carpov, N. Chapados, Y. Bengio, N-beats: Neural basis expansion analysis for interpretable time series forecasting, 2020. arXiv:1905.10437.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] A. Bhatti, Y. Liu, C. Dan, B. Shen, S. Lee, Y. Kim, J. Y. Kim, Vital sign forecasting for sepsis patients in icus, arXiv preprint arXiv:2311.04770 (2023).</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] L. Pantiskas, K. Verstoep, H. Bal, Interpretable multivariate time series forecasting with temporal attention convolutional neural networks, in: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, 2020, pp. 1687–1694.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] B. Lim, S. Ö. Arık, N. Loeff, T. Pfister, Temporal fusion transformers for interpretable multi-horizon time series forecasting, International Journal of Forecasting 37 (2021) 1748–1764.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>