<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Decision Making for Sepsis Detection using Reinforcement Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lakshita Singh</string-name>
          <email>lakshitasingh1806@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lakshay Kamra</string-name>
          <email>kaylakshay@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muskan Agarwal</string-name>
          <email>muskanagarwal47@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anjana Gupta</string-name>
          <email>anjanagupta@dce.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>H.C. Taneja</string-name>
          <email>hctaneja@dce.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Applied Mathematics, Delhi Technological University</institution>
          ,
          <addr-line>Delhi</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Sepsis is a life-threatening medical condition that develops when the body's defense against an infection damages its own tissues and causes organ dysfunction. Administering intravenous fluids and antibiotics promptly can increase the patient's chances of survival. To determine the best treatment plans for septic patients, this study investigates the application of deep reinforcement learning over continuous state spaces to learn clinically comprehensible policies that could assist doctors in intensive care, empowering medical professionals to make informed decisions that ultimately enhance the prospects of patient survival.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Sepsis is a clinical syndrome caused by the invasion of bacteria and/or toxins that triggers a harmful
reaction in the body, leading to severe morbidity and mortality [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Failure to detect and manage this
condition early can result in organ failure, septic shock, and death. To improve patient outcomes, it is
crucial to detect sepsis as soon as possible, as each hour of delayed treatment after hypotension increases
the risk of dying from septic shock by 7.6%. Recent studies have shown that administering a 3-hour
bundle of care for sepsis patients, including a blood culture, broad-spectrum antibiotics, and lactate
measurement, can significantly reduce in-hospital mortality [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Therefore, timely and aggressive
treatment is essential in managing sepsis. Even experienced professionals face difficulties in diagnosing
sepsis early and accurately, as its symptoms can be easily confused with those of other medical
conditions. However, the electronic health record (EHR) already captures data that could aid in
predicting sepsis, despite the challenges that come with the diagnosis of this condition [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Hence, early
warning scores that rely on data from the EHR hold great promise in detecting early clinical
deterioration in real-time. The National Early Warning Score (NEWS) was developed, validated, and
implemented by the Royal College of Physicians to detect patients who are acutely decompensating.
NEWS employs six physiological variables and compares them to their expected ranges to produce a
single composite score. In addition to antibiotics, intravenous fluids and vasopressors are used in severe
cases. However, patients' mortality rates vary considerably depending on the fluid and vasopressor
therapy methods used, highlighting the importance of making the right choices. In the realm of sepsis
management, the absence of tailored real-time decision support has posed significant challenges for
healthcare providers despite international efforts to provide general guidelines. In response to this
pressing issue, we present a pioneering data-driven approach that leverages advanced deep
reinforcement learning (RL) algorithms to optimize sepsis treatment strategies. This study builds upon
previous research and seeks to enhance the likelihood of septic patients' survival in the ICU by utilizing
continuous state-space models and shaped reward functions to identify the most effective course of
action. Our findings represent a crucial contribution to the field of sepsis treatment, as they pave the
way for personalized and real-time decision-making strategies that have the potential to transform
patient outcomes and reduce mortality rates.
      </p>
      <p>2023 Copyright for this paper by its authors. Published in CEUR Workshop Proceedings (ceur-ws.org).</p>
      <p>
        We chose RL over supervised learning because there is a
lack of consensus in the medical literature on what constitutes an effective treatment approach. It is
worth noting that RL algorithms enable us to derive optimal strategies from training samples that do
not correspond to optimal behavior. Our primary emphasis lies in the development of continuous
state-space modeling, a methodology that utilizes a patient's physiological data from the ICU
to represent their current physiological state as a continuous vector at any given instant. We use
Deep Q-learning to determine the appropriate responses. Our study presents contributions in the
realm of patient care, including the generation of treatment plans that have the potential to augment
patient outcomes and significantly decrease patient mortality rates. We achieved this by implementing
advanced deep reinforcement learning models that incorporate continuous-state spaces and precisely
designed reward functions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. BACKGROUND &amp; MOTIVATION</title>
      <p>
        The initial diagnosis of sepsis poses a daunting challenge owing to its inconspicuous presentation,
characterized by clinical manifestations resembling those of less severe ailments [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. While
international initiatives try to offer generic recommendations for managing sepsis, doctors at the
bedside still lack effective technologies to offer tailored real-time decision support. Developing and
validating early warning scores to forecast clinical deterioration and other related outcomes has been
the subject of a significant amount of research. For instance, two of the most popular scores used to
gauge overall clinical deterioration are the MEWS score and NEWS score. Additionally, the systemic
inflammatory response syndrome (SIRS) score (Fig 1) was a component of the initial clinical definition
of sepsis; more recently, however, other sepsis-specific scores have gained popularity, including SOFA
and qSOFA. The Rothman Index, a more complex regression-based method, is also often used to
identify general deterioration. In numerous related investigations, multitask Gaussian processes were
also used to simulate multivariate physiological time series. Several studies utilized a model that was
comparable to ours but placed more emphasis on forecasting vital signs to predict clinical instability.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.1 SERA Algorithm</title>
      <p>The SERA algorithm is a risk assessment tool designed to identify patients who may be at risk for
sepsis. The algorithm uses both structured and unstructured data from patient consultations to make a
prediction. The algorithm is designed to operate on a patient-by-patient basis, with each consultation
serving as an analytical unit. It is composed of two interrelated algorithms: the diagnosis algorithm and
the early prediction algorithm. When a patient is examined, the diagnosis algorithm determines if the
patient is presently suffering from sepsis.</p>
      <p>On the other hand, the early prediction algorithm ascertains whether sepsis is likely to manifest within
the next four hours if the patient does not already have the condition. The algorithm incorporates both
structured and unstructured data in its processing. While structured data entails vital signs, investigation
results, and treatment details, the unstructured data encompasses clinical notes. Developed to operate
in a typical clinical setting, where physicians utilize both types of data to analyze and diagnose patients,
the algorithm's construction procedures are illustrated in the elaborate flow diagram presented in Fig 2.
Supervised learning, particularly in medical applications, has been hindered by the challenge of
frequently missing labels per time point in time series datasets. This issue also affects early diagnosis
of sepsis. Prior research has addressed the problem of defining resolved sepsis labels by utilizing
ad-hoc approaches. These studies have relied on readily available ad-hoc criteria to predict the onset of
sepsis and have used a global time series label, such as an ICD illness number designed for billing
purposes, to define resolved sepsis labels.</p>
    </sec>
    <sec id="sec-5">
      <title>2.2 Algorithms for the Early Detection of Sepsis</title>
      <p>Over the past 10 years, several data-driven approaches for detecting sepsis in the ICU have been
proposed. Numerous methods compare only certain clinical scores, including SIRS, NEWS, or MEWS.
None of these ratings, meanwhile, are meant to serve as precise, ongoing sepsis risk scores. Doctors
now view the SIRS criteria as being non-specific and out of date for the definition of sepsis. A targeted
real-time warning score (TREWScore) was presented as an alternative to these scores to predict septic
shock, which is a common consequence after sepsis. Notably, even though numerous machine learning
techniques have outperformed general-purpose or oversimplified clinical schemes, almost no articles
have actually made a direct comparison to other machine learning techniques in the literature. It has
been demonstrated that using LSTMs is better than using the InSight model. Reported sepsis
prevalence numbers range from 6.6% to 21.4%, and real-world datasets with these prevalence values
are typically used to build sepsis detection techniques.</p>
    </sec>
    <sec id="sec-6">
      <title>2.3 Reinforcement Learning in Medicine</title>
      <p>Reinforcement Learning, an intricate framework for optimizing sequential decision-making, has
emerged as a game-changing paradigm. In this sophisticated framework, a Markov Decision Process
(MDP), which constitutes a 5-tuple (S, A, r, γ, p), serves as the foundation for its seamless operation.
Different applications in the field of medicine have used reinforcement learning. References and
surveys offer thorough analyses of applications in critical care and healthcare, respectively. Earlier work
employed dynamic programming-based approaches to construct the best treatment plans for sepsis,
using a discrete state representation crafted by clustering patient physiological readouts together with a
25-dimensional discrete action space. Others have thought about partial observability and
continuous state representations. Our suggested decision support system makes decisions, based on a
preference score (Fig 3).</p>
    </sec>
    <sec id="sec-7">
      <title>2.3.1 Gaussian Process Adapters</title>
      <p>
        It was demonstrated that optimizing a time-series GP [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] imputation end-to-end using the gradients of
a subsequent classifier outperforms improving the classifier and the GP individually. This technique,
also known as GP adapters, is not just for imputing missing data. GP adapters have recently been shown
to be a suitable framework for handling the unevenly spaced time series in early sepsis detection.
They specifically supported earlier findings that GP adapters outperform traditional GP imputation
approaches in time series classification, which call for a separate optimization step unrelated to the
classification objective.
      </p>
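      <p>A minimal sketch of the classical two-step baseline that GP adapters improve upon, i.e. fitting a GP to an unevenly sampled vital-sign series and imputing it onto a regular grid before any separate classification step. The kernel choice and the synthetic series here are illustrative assumptions, not the pipeline used in the works cited above:</p>

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Unevenly spaced observations of one vital sign (synthetic, illustrative).
t_obs = np.array([0.0, 0.7, 1.1, 2.9, 4.2, 6.5])[:, None]
y_obs = np.sin(t_obs).ravel()

# Fit a GP to the irregular samples, then impute on a regular grid
# (the separate-imputation step that end-to-end GP adapters avoid).
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(1e-4), normalize_y=True)
gp.fit(t_obs, y_obs)
t_grid = np.linspace(0.0, 6.5, 14)[:, None]
y_imputed, y_std = gp.predict(t_grid, return_std=True)
```

      <p>The end-to-end variant would instead backpropagate a downstream classifier's loss through the GP posterior rather than fixing the imputation first.</p>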
    </sec>
    <sec id="sec-8">
      <title>2.3.2 Markov Decision Process</title>
      <p>
        Typically, mathematical models for sequential decision problems are formulated as Markov decision
processes (MDPs), which consist of a tuple M = (S, A, P, r). In this context, S refers to the possible
states of the system, A represents the feasible actions that can be taken, P represents the probability
distribution for the next state given the current state and action, and r denotes the reward function that
assigns a scalar reward to each state-action pair [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] .
      </p>
      <p>Fig 3 above shows the use of a Markov Decision Process (MDP) to model time-varying state spaces in
reinforcement learning. The agent observes the state of the environment at each timestep, denoted by
s(t), and executes an action a(t), earning a reward r(t) and transitioning to a new state s(t+1). The goal
of the agent is to maximize the expected discounted future reward, known as the "return", by choosing
the most suitable actions. Previous studies have applied reinforcement learning in healthcare contexts,
including treating septic patients using models with discretized state and action spaces. In this study,
we used value-iteration procedures to discover an ideal policy, determined by contrasting the Q-values
obtained under it with those of a doctor's policy. We improved upon this by utilizing continuous
state-space models, deep reinforcement learning, and a clinically oriented reward function. We also
evaluated how the learned policies serve patients of varying severity levels.</p>
    </sec>
    <sec id="sec-9">
      <title>3. DATASET</title>
      <p>
        We have used the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC-III v1.4) database
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to conduct this research. It is a freely available dataset which provides comprehensive clinical data.
Measurements are generally taken every four hours and are recorded whenever several data points are
present, yielding a feature vector of dimensions 48x1 at a given time ‘t’; the state at this time is
called St. The data drawn from MIMIC-III is focused on patients with Sepsis-3 symptoms. Since we
must apply multiple queries to the dataset to filter out relevant data, it is first pre-processed using
PostgreSQL [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and relations and tables are built [reference]; the processed data is then analyzed to
obtain a MIMIC table, which is used directly as the data source for our RL algorithm.
      </p>
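      <p>A hedged sketch of the kind of aggregation described above: irregular chart events collapsed into 4-hour state rows with pandas. The column names and values are invented for illustration; the actual pipeline builds these tables with SQL queries against MIMIC-III in PostgreSQL:</p>

```python
import numpy as np
import pandas as pd

# Synthetic chart events for one ICU stay (illustrative columns only;
# MIMIC-III date-shifts stays into the 2100s).
events = pd.DataFrame({
    "charttime": pd.to_datetime(
        ["2130-01-01 00:30", "2130-01-01 01:45", "2130-01-01 05:10",
         "2130-01-01 06:20", "2130-01-01 09:00"]),
    "heart_rate": [88, 92, 101, 97, 110],
    "lactate": [1.2, np.nan, 2.4, 2.1, 3.0],
})

# Aggregate into 4-hour bins: each resulting row is one state S_t.
states = (events.set_index("charttime")
                .resample("4h").mean()
                .ffill())  # carry the last value forward when a bin is empty
```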
    </sec>
    <sec id="sec-10">
      <title>4. HARDWARE/SOFTWARE REQUIREMENTS</title>
      <p>Our project on Data Analytics was implemented on a Windows operating system with Jupyter
Notebook. The experiments have partly been conducted with Ryzen 7 CPU, 16GB RAM, and Nvidia
RTX 3050 GPU with 4GB memory. These requirements were sufficient to run Python and any desired
ML algorithm.</p>
      <p>The project mainly uses Python; important libraries include Matplotlib, NumPy, and Pandas, and the
work is carried out in Jupyter Notebook.</p>
      <p>• Python: It is designed to be easy to read and write, with a clean syntax and an emphasis on
readability and simplicity. Its popularity stems from its ease of use, powerful standard library,
and large number of third-party modules and packages. Python's community development
model and open-source license have also contributed to its widespread adoption and continued
growth.
• NumPy: A distinguished Python package that caters to the realm of numerical analysis and
scientific computing. It endows an unparalleled N-dimensional array object and a vast array of
mathematical operations that can be performed effortlessly on these arrays. The versatility of
NumPy makes it a quintessential tool for researchers, scientists, and analysts across various
fields such as physics, engineering, economics, machine learning, and more.
• Matplotlib: A plotting library for Python that allows users to create high-quality, publishable
graphs and visualizations. It provides a range of visualization tools, from simple line charts to
3D charts.
• Pandas: A library for data manipulation and analysis. It provides data structures for efficiently
storing and querying large datasets, as well as powerful tools for data cleaning, aggregation,
and visualization. Pandas is widely used in data science, finance, and other fields dealing with
large amounts of data.
• Scikit-learn: A formidable machine learning library for Python, presents an array of powerful
tools beyond just model selection and evaluation. Its rich repertoire boasts of an exquisite set
of methods for classification, regression, clustering, and dimensionality reduction, all crafted
to elevate the art of machine learning to the next level.
• PostgreSQL: A powerful open-source relational database management system. It is widely used in
web applications, data science, and other fields that require robust and
scalable data storage.
• TQDM: A library for creating progress bars in Python. It is often used in long-running
processes, such as data processing or model training, to provide users with feedback on the
progress of an operation.</p>
    </sec>
    <sec id="sec-11">
      <title>5. RESEARCH METHODOLOGY</title>
    </sec>
    <sec id="sec-12">
      <title>5.1. Action Space</title>
      <p>
        We work with a discrete action space for this research. The action space [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is defined as a 5x5
matrix covering the maximum vasopressor dose and the total intravenous fluid dose over a period of four
hours. The action space is defined such that it covers all the non-zero dosages of VP and IV fluids,
each converted into an integer value by concatenating the dosage, the drug, and the
timestamp. All zero-dosage entries are represented by bin value 0. Because medical records are
very uncertain when it comes to finding the appropriate tuples for the action space, we
adopt the standard choice, i.e. total IV fluid dosage and maximum vasopressor dosage, as the key tuple
for the action space.
      </p>
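      <p>The 5x5 discretization above can be sketched as follows. The bin edges are illustrative assumptions, not the study's actual cut-points; only the structure (zero doses map to bin 0, non-zero doses to four further bins per drug, 25 actions in total) follows the text:</p>

```python
import numpy as np

def discretize_action(iv_dose, vp_dose, iv_edges, vp_edges):
    """Map a (total IV fluid, max vasopressor) pair over a 4-hour window
    to one of 5*5 = 25 discrete actions. Zero dose -> bin 0; non-zero
    doses fall into one of four bins (edges are assumed)."""
    iv_bin = 0 if iv_dose == 0 else int(np.digitize(iv_dose, iv_edges)) + 1
    vp_bin = 0 if vp_dose == 0 else int(np.digitize(vp_dose, vp_edges)) + 1
    return iv_bin * 5 + vp_bin  # single integer in [0, 24]

# Illustrative non-zero bin edges (3 interior edges -> 4 bins each); assumed units.
iv_edges = [50.0, 180.0, 530.0]   # mL per 4 h (assumed)
vp_edges = [0.08, 0.22, 0.45]     # mcg/kg/min (assumed)

a = discretize_action(iv_dose=200.0, vp_dose=0.0,
                      iv_edges=iv_edges, vp_edges=vp_edges)
```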
    </sec>
    <sec id="sec-13">
      <title>5.2 Reward Function</title>
      <p>
        We need a working reward function [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to map each state-action pair to a numeric
value that intrinsically defines the value of that state, which finally helps our model reach
conclusions and not just predictions. We measure the lactate level, reflecting cell hypoxia,
and the SOFA score, which gives a numeric measure of organ failure in sepsis patients, to define the
overall health of a Sepsis-3 patient. These two measures are the key features of our reward function,
where an increase in SOFA score or lactate level results in a negative reward. At the ending state of a
terminal patient, the state is rewarded positively if the patient survives, and negatively otherwise.
      </p>
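      <p>A hedged sketch of such a reward. The weights and terminal magnitude are invented for illustration; the text above specifies only that increases in SOFA and lactate are penalized and that survival/death yield a positive/negative terminal reward:</p>

```python
def reward(sofa_prev, sofa_next, lactate_prev, lactate_next,
           terminal=False, survived=False,
           w_sofa=0.5, w_lactate=0.5, terminal_magnitude=15.0):
    """Intermediate reward penalizes worsening SOFA score and lactate;
    the terminal reward depends only on survival. Weights are assumed."""
    if terminal:
        return terminal_magnitude if survived else -terminal_magnitude
    return -(w_sofa * (sofa_next - sofa_prev)
             + w_lactate * (lactate_next - lactate_prev))
```

      <p>For example, a SOFA increase of 2 with unchanged lactate yields a reward of -1.0 under these assumed weights.</p>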
    </sec>
    <sec id="sec-14">
      <title>5.3 Model Used</title>
      <p>
        The model used in this research is Dueling Double Deep Q Learning Networks (DDDQN) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which
is a variant of DQN, as can be seen in Fig 4.
      </p>
      <p>DDDQN minimizes the error between the target and the output. We use a neural approximation of the
optimal action-value function Q*(s,a). Parameterizing this function by θ, the output
of the network is Q(s, a; θ).</p>
      <p>
        The desired output of the model is Qtarget = r + γ·maxa′ Q(s′, a′; θ), where we have transitions of the
form &lt;s, a, r, s′&gt;. To minimize the expected loss between Qtarget and Qoutput we introduce stochastic
batch gradient descent in our model. Moreover, since the target values are highly volatile, the addition
of an extra network, which is periodically updated, helps to improve the overall yield. Basic Deep
Q Networks are not very efficient due to the problem of overestimation, which is very persistent in
these networks and generally leads to incorrect predictions and large error ranges. This overestimation
is due to the presence of the max of the Q value for the next state in the Q-learning update
equation. It is solved through a better variant of DQN, i.e. double deep Q networks (DDQN), where
the next action is selected by a feed-forward pass on the main network while its value is taken from the
target network. Another problem arises when we find optimal
treatments: we have to ignore the influence of the previous state if it has a positive reward, and the
correct action is to be taken at the present timestamp. For this we turn to the dueling deep Q network,
in which the value Q(s,a) of a state-action tuple is divided into two parts: an estimate of the advantage,
representing the quality of the chosen action, and an estimate of the value, representing the quality of
the chosen state. This yields a fully formed Dueling Double-Deep Q
Network, shown in Fig 5, having 2 hidden layers of size 128, combining the above ideas [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Training the
model with this methodology gives the optimal policy as π*(s) = argmaxa Q(s,a).
We start by initializing two Q networks, the D3QN-A network defined as QA(s,a; θA) and the D3QN-B
network defined as QB(s,a; θB), where all the parameters of the networks are given by θA and θB.
Secondly, the approach involves initializing the Experience Replay with (st, st+1, at, rt). st has two
components, the first being a comprehensive feature comprising a basic feature and a salience map, and
the second being a historical experience vector that preserves previously used action indexes. Our
approach initializes a 20*13d vector to represent historical experience, with a maximum exploring step
of 20 and 13 action numbers.
The art of state representation lies in its ability to incorporate valuable historical experiences and
emulate the human decision-making process, thus facilitating informed decision-making in the present.
The feature extraction part generates s0 as the initial state. Subsequently, states are sent to the Agent,
which uses the ε-greedy algorithm to select the current cropping action, followed by execution of the
chosen cropping action and obtaining the cropped image. This process is repeated, and each one-step
cropping operation (st, st+1, at, rt) is recorded in the experience replay pool. The maximum number of
cropping steps is 20, and in the training process, N_episodes is set to 160,000, with a group of records
randomly selected for learning each time. See Fig 6 for the algorithm steps.
      </p>
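      <p>The two ideas combined above, the dueling decomposition Q(s,a) = V(s) + A(s,a) - mean over a of A(s,a), and the double-DQN target that selects the next action with the main network but evaluates it with the target network, can be sketched in NumPy. The "networks" here are stand-in random linear maps for illustration, not the paper's two-hidden-layer architecture:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions, gamma = 48, 25, 0.99

def dueling_q(s, W_v, W_a):
    """Dueling head: combine a scalar value stream and an advantage
    stream into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    v = s @ W_v              # (1,) value stream
    a = s @ W_a              # (n_actions,) advantage stream
    return v + a - a.mean()

# Stand-in parameters for the main (theta_A) and target (theta_B) networks.
W_v_main = rng.standard_normal((state_dim, 1))
W_a_main = rng.standard_normal((state_dim, n_actions))
W_v_tgt = rng.standard_normal((state_dim, 1))
W_a_tgt = rng.standard_normal((state_dim, n_actions))

s_next, r = rng.standard_normal(state_dim), 1.0

# Double-DQN target: argmax with the main network, value from the target network.
a_star = int(np.argmax(dueling_q(s_next, W_v_main, W_a_main)))
q_target = r + gamma * dueling_q(s_next, W_v_tgt, W_a_tgt)[a_star]
```

      <p>Subtracting the mean advantage makes the V/A split identifiable: the mean of the resulting Q row equals the value stream alone.</p>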
    </sec>
    <sec id="sec-15">
      <title>6. RESULTS</title>
      <p>On a held-out test set, evaluated over 50 epochs, the y-axis of the graph depicts observed mortality rates,
which vary with the difference between the dosages recommended by the learned optimal policy
and those administered by healthcare providers. This difference was computed for each timestep and
correlated with whether the patient lived or died in the hospital, as shown in Fig 7, enabling the
computation of observed mortality. In Fig 8, with a 95% confidence level, this bound would always be
higher than the clinicians' guideline if enough models were produced. The statistical safety of the novel
artificial intelligence (AI) policy, a topic of current theoretical discourse, is maximized by this model
selection method.</p>
    </sec>
    <sec id="sec-16">
      <title>7. CONCLUSION</title>
      <p>Employing deep learning, this research addresses the problem of treating sepsis patients in
a practical manner. The study investigated fully continuous state-space/discrete action-space models to
discover the most efficient treatment options, learning an estimate of the best action-value function
Q*(s, a) using Dueling Double-Deep Q networks. The resulting continuous state-space
model was found to generate interpretable policies that might enhance sepsis treatment. The learned
policies will be put through a patient evaluation and contrasted with other investigative algorithms in
future studies. The results of this study may significantly influence medical practice for sepsis
identification. Because sepsis is a condition that is poorly understood and challenging for practitioners
to diagnose, a model like the one described here can anticipate the onset of sepsis, lessen the vexing
issue of alarm fatigue that plagues existing clinical scoring systems, improve patient outcomes, and
reduce the burden on the healthcare system. Although the focus of this work was on early sepsis
identification, it would be simple to adapt the techniques to other clinical events of relevance, such as
cardiac arrests, code blue occurrences, ICU admissions, and cardiogenic shock. This will enable
practitioners to employ the techniques in a real-world clinical context, and the model's usefulness can
be objectively demonstrated by gathering information on the reliability of the warnings it raises and
how it is applied on the actual wards.</p>
    </sec>
    <sec id="sec-17">
      <title>8. REFERENCES</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Futoma</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>“Gaussian process-based models for clinical time series in healthcare” (Doctoral dissertation</article-title>
          , Duke University).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Raghu</surname>
          </string-name>
          ,
          <string-name>
            <surname>Aniruddh</surname>
          </string-name>
          , et al. “
          <article-title>Deep reinforcement learning for sepsis treatment</article-title>
          .
          <source>” arXiv preprint arXiv:1711.09602</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Tardini</surname>
          </string-name>
          , Elisa et al. “
          <article-title>Optimal Treatment Selection in Sequential Systemic and Locoregional Therapy of Oropharyngeal Squamous Carcinomas: Deep Q-Learning With a PatientPhysician Digital Twin Dyad</article-title>
          .
          <source>” Journal of medical Internet research</source>
          vol.
          <volume>24</volume>
          ,
          <issue>4</issue>
          e29455. 20 Apr.
          <year>2022</year>
          , doi:10.2196/29455.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Goh</surname>
            ,
            <given-names>Kim</given-names>
          </string-name>
          &amp; Wang,
          <string-name>
            <surname>Le</surname>
          </string-name>
          &amp; Yeow, Adrian &amp; Poh,
          <string-name>
            <surname>Hermione</surname>
          </string-name>
          &amp; Li,
          <string-name>
            <surname>Ke</surname>
          </string-name>
          &amp; Yeow, Joannas &amp; Tan,
          <string-name>
            <surname>Gamaliel.</surname>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>“Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare</article-title>
          .
          <source>Nature Communications”, doi:10.1038/s41467-021- 20910-4.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Jonsson</surname>
            <given-names>A.</given-names>
          </string-name>
          “
          <article-title>Deep Reinforcement Learning in Medicine”</article-title>
          ,
          <source>Kidney Dis</source>
          <year>2019</year>
          ;
          <volume>5</volume>
          :
          <fpage>18</fpage>
          -
          <lpage>22</lpage>
          . doi: 10.1159/000492670.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><surname>Littman</surname>, <given-names>Michael L.</given-names></string-name>
          “<article-title>A tutorial on partially observable Markov decision processes</article-title>.”
          <source>Journal of Mathematical Psychology</source>
          <volume>53</volume>.<issue>3</issue>
          (<year>2009</year>):
          <fpage>119</fpage>-<lpage>125</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><surname>Johnson</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Pollard</surname>, <given-names>T.</given-names></string-name>
          &amp;
          <string-name><surname>Mark</surname>, <given-names>R.</given-names></string-name>
          (<year>2019</year>).
          <article-title>MIMIC-III Clinical Database Demo (version 1.4)</article-title>.
          <source>PhysioNet</source>. https://doi.org/10.13026/C2HM2Q.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><surname>Cao</surname>, <given-names>LiChun</given-names></string-name>
          &amp; ZhiMin (<year>2019</year>),
          <article-title>“An Overview of Deep Reinforcement Learning”</article-title>,
          <source>CACRE 2019: Proceedings of the 2019 4th International Conference on Automation, Control and Robotics Engineering</source>,
          1-9. doi:10.1145/3351917.3351989.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>A. E. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pollard</surname>
            ,
            <given-names>T. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehman</surname>
            ,
            <given-names>L. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghassemi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moody</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szolovits</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Celi</surname>
            ,
            <given-names>L. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mark</surname>
            ,
            <given-names>R. G.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>MIMIC-III, a freely accessible critical care database</article-title>
          .
          <source>Scientific Data</source>
          ,
          <volume>3</volume>
          ,
          <fpage>160035</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><surname>Carew</surname>, <given-names>Joseph M.</given-names></string-name>,
          “<article-title>Tech Target blog</article-title>,”
          <source>Tech Target Enterprise AI</source>.
          [Online]. URL: techtarget.com/definition/reinforcement-learning.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><surname>Mammen</surname>, <given-names>Priyanka Mary</given-names></string-name>
          &amp;
          <string-name><surname>Kumar</surname>, <given-names>Hareesh</given-names></string-name>
          (<year>2019</year>),
          “<article-title>Explainable AI: Deep Reinforcement Learning Agents for Residential Demand Side Cost Savings in Smart Grids</article-title>.”
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>