<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Modelling Temporal Relationships in Pseudomonas Aeruginosa Antimicrobial Resistance Prediction in Intensive Care Unit</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Àlvar Hernàndez-Carnerero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristina Soguero-Ruiz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miquel Sànchez-Marrè</string-name>
          <email>miquel@cs.upc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergio Martínez-Agüero</string-name>
          <email>sergio.martinez@urjc.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Campus del Poblenou, Universitat Pompeu Fabra</institution>
          ,
          <addr-line>08018 Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Signal Theory and Communications</institution>
          ,
          <addr-line>Telematics and Com-</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Inmaculada Mora-Jiménez</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Intensive Care Department, University Hospital of Fuenlabrada, Madrid Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This volume is published and copyrighted by its editors. Advances in Artificial Intelligence for Healthcare</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Knowledge Engineering and Machine Learning Group (KEMLG-UPC), Intelligent Data Science and Artificial Intelligence Research Centre (IDEAI-UPC), Dept. of Computer Science, Universitat Politècnica de Catalunya</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper addresses the prediction of antimicrobial resistance of Pseudomonas aeruginosa in nosocomial infections in the Intensive Care Unit (ICU). A Logistic Regression model was trained using patients' health record information together with the history of past sensitivity tests (antibiograms). To predict antimicrobial resistance for a given patient, this study proposes modelling temporal relationships using bacterial information from the other patients who are in the ICU at the same time. Furthermore, a training window of incremental size is used so that the training set is always as close in time as possible to the test instances to be predicted. With these contributions, experiments show promising results for predicting antimicrobial resistance even when little training data is available. These results further suggest that resistant bacteria may be spreading among patients in the ICU and that their populations mutate rapidly, changing the underlying data distribution over time.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Antimicrobial resistance has been increasing for decades, and new
antibiotics are not being synthesized fast enough to counter this
trend [
        <xref ref-type="bibr" rid="ref17 ref7">7, 17</xref>
        ]. A large proportion of
infections caused by resistant bacteria occur during hospital stays,
especially in the Intensive Care Unit (ICU) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. There, infection rates are
much higher than in other hospital divisions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This is due to its
severely vulnerable population and to the high risk of infection
from multiple procedures and from invasive devices that breach the
protective anatomical barriers of patients
(intubation, mechanical ventilation, vascular access, etc.) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Several kinds of bacteria that can become multidrug-resistant are
frequently found in the ICU, among them Acinetobacter spp.,
Enterococcus faecalis and Enterococcus faecium, Escherichia coli,
Klebsiella pneumoniae, Pseudomonas aeruginosa and Staphylococcus
aureus. In this paper, we focus on Pseudomonas
aeruginosa due to its prevalence and virulence. It is naturally
resistant to many antibiotics and has a remarkable capacity for acquiring
new resistance mechanisms, creating therapeutic problems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Pseudomonas aeruginosa is considered to be multi-drug resistant (MDR)
when reduced in vitro susceptibility to three or more
antimicrobial families is observed [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>
        It is known that infections due to MDR microorganisms are a
major problem. This has a significant impact in the ICU, where they can
cause additional morbidity, mortality, and health care costs [
        <xref ref-type="bibr" rid="ref11 ref3">3, 11</xref>
        ].
Inappropriate initial antimicrobial treatment of P. aeruginosa is
statistically linked to higher mortality than initial treatment
with an appropriate antimicrobial. The growing MDR rate of P.
aeruginosa also increases the chance of inappropriate initial antimicrobial
treatment [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        To tackle MDR in the hospital, a culture or microbiological
analysis is usually performed to test whether the bacterium is
resistant or susceptible to a set of antibiotics. For this purpose, the
germ (bacterium) is first isolated and an antibiogram is carried out. The
antibiogram represents the in vitro resistance of the bacterium to a series
of antibiotics. The set of antibiotics used in the antibiogram can be
selected for the specific type of bacteria being tested. The result of
the antibiogram is a vector of antibiotic/susceptibility pairs [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Antibiograms are often used by clinicians to assess bacteria susceptibility
rates, as an aid in selecting empiric antibiotic therapy [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Hence, the
antibiogram result can vary between bacterial species, depending
on the resistance of a particular bacterium to different antibiotics.
However, groups of antibiotics quite often show similar
sensitivity when tested on a given bacterial species, regardless of the strain
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        In addition, the result of the antibiogram can help to reduce the
spread of bacteria by taking special measures such as isolating
the patient. One of the most relevant factors in the spread of
bacterial resistance is the so-called cross-transmission [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], which may
facilitate the spread of resistant bacteria from one patient to another.
Also, some measures such as hand hygiene, skin cleansing, and
contact precautions can help to prevent cross-transmission [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. To know
how and where to take extreme precautions, information about the kinds of
bacteria in the ICU and their resistance plays a key role. Since ICU
patients have a critical health status and the antibiogram result can
take from 24h to 48h [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], it is of major importance to develop tools
which can help to anticipate this result. This would contribute not
only to saving patients' lives but also to preventing the spread of
resistant bacteria.
      </p>
      <p>
        Because of the aforementioned reasons, in the current study we
propose to use a Data Mining (DM) technique, and specific
temporal data processing to get a quick estimation of the antibiogram
result. In this sense, many of the state-of-the-art studies regarding the
use of DM methods to predict antimicrobial susceptibility are using
whole genome sequencing [
        <xref ref-type="bibr" rid="ref1 ref14 ref15 ref8">8, 15, 14, 1</xref>
        ]. Although it is a very
promising technique, it involves significant costs. As an alternative, we
propose to use information from the ICU health records and
demographic data of patients, along with historic antibiogram results, to
train a DM model aimed at predicting resistant bacteria in new
cultures. As opposed to whole genome sequencing, our approach
uses data which are already available in the vast majority
of hospitals, in order to speed up the process of identifying positive
cases. Similar strategies have been analyzed in the past [
        <xref ref-type="bibr" rid="ref12 ref19 ref20">12, 20, 19</xref>
        ].
      </p>
      <p>The remainder of this paper is structured as follows. Section 2
describes the dataset and the procedure to create new features. The
experimental setup is established in Section 3, and results in Section 4.
Finally, conclusions, limitations of the study, and suggested future
research are presented in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>DATA DESCRIPTION AND PREPROCESSING</title>
      <p>Data considered in this work is a unified and anonymized dataset
specifically collected for the study of antimicrobial resistance in the
ICU of the University Hospital of Fuenlabrada (UHF) in Spain. The
data set covers the years from 2004 to 2013. During this time interval,
1914 patients were admitted to the ICU, and 22142 cultures were
carried out from 2186 admissions. The dataset contains 257 different
types of bacteria and 26 antimicrobial families.</p>
      <p>The dataset contains the results of antibiograms carried out on
patients in the ICU, that is, the results of the sensitivity tests
(susceptible (s) or resistant (r)) for given pairs of bacterium and
antibiotic family. It also includes demographic data of the
patients and information about their ICU admission: age, gender, date
of ICU admission, clinical origin of the patient before ICU
admission, reason for admission, patient category, comorbidities and
pluripathology (which indicates whether a patient has more than two
comorbidities).</p>
      <p>As already mentioned, in this study we focused on just one type
of bacteria among the multiple available in the data set: P.
aeruginosa. This bacterium is considered MDR if it is resistant to three or
more of the following antimicrobial families within the same culture:
Aminoglycosides, Carbapenems, 4th Generation Cephalosporins,
Extended-spectrum penicillins, Polymyxins and Quinolones.</p>
      <p>In this work, we analyze all instances containing the bacterium
and antimicrobial families of interest. Then, each instance is
represented by the features described in Table 1. Features c&amp;car and c&amp;cf4
represent the targets to be predicted, that is, the results of the susceptibility
test for P. aeruginosa against the antimicrobial families of Carbapenems
and 4th Generation Cephalosporins, respectively. We consider only
these two antimicrobial families since this is a first approach to
the problem and it allows us to limit the scope of the study.</p>
    </sec>
    <sec id="sec-3">
      <title>Generation of new features</title>
      <p>In addition to the selected features, we propose to generate a new
kind of feature based on the temporal information of the cultures
recorded in the data set.</p>
      <p>The purpose of these features is to capture the presence of resistant
bacteria in the ICU over time and the "intensity" of that presence.
By "intensity", we mean the number of patients infected and the
number of days since the resistant bacterium was detected. For a specific
instance of the data set, which represents a culture C<sub>p</sub> of patient P<sub>p</sub>,
the cultures containing P. aeruginosa collected from the other patients P<sub>i</sub>
(C<sub>i</sub> = {C<sub>ij</sub>}) between 21 days and 48 hours before the date of
culture C<sub>p</sub> are gathered. These cultures exclude those associated with the patient P<sub>p</sub>
under analysis. Note that, since the results of the test usually take
48 hours to be provided, it is not possible to use cultures taken, for
instance, one hour earlier. In addition, from a clinical viewpoint,
if a culture result is positive, it is kept as positive for the next 21
days. For this reason, we consider cultures collected up to 21 days before
the date D<sub>p</sub> of the current culture C<sub>p</sub> of patient P<sub>p</sub>.</p>
      <p>
        In total, six features are created using the information of the past
cultures, one for each of the antimicrobial families mentioned
above: r&amp;amg, r&amp;car, r&amp;cf4, r&amp;pap, r&amp;pol and r&amp;qui. For
instance, feature r&amp;amg considers only cultures tested for the
Aminoglycosides family. The value of this feature is obtained taking into
account the set of past cultures for all other patients (P<sub>i</sub>),
excluding the patient under analysis (P<sub>p</sub>). Each culture C<sub>ij</sub> on date
d<sub>ij</sub> for patient P<sub>i</sub> has the susceptibility test result r<sub>ij</sub>, which is 0 or 1
depending on whether the bacterium is susceptible or resistant to a
specific family of antibiotics. In order to give more emphasis to the
most recent cultures, we use a negative exponential function [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to
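weight the culture results associated with each patient P<sub>i</sub> as follows:
<disp-formula><label>(1)</label><tex-math><![CDATA[f_{C_{ij}}(D_p) = \begin{cases} 0 & \text{if } r_{ij} = 0 \\ n^{-(D_p - d_{ij})} & \text{if } r_{ij} = 1 \end{cases}]]></tex-math></disp-formula>
where n is a real number experimentally set to 1.1. Then, to compute
the value of each of the six features linked to patient P<sub>p</sub> on date D<sub>p</sub>,
the maximum outcome of Eq. (1) is determined for each patient other than P<sub>p</sub>,
and these maxima are added up according to Eq. (2).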
      </p>
      <p><disp-formula><label>(2)</label><tex-math><![CDATA[FV_{C_p}(D_p) = \sum_{\forall P_i \neq P_p} \max_{C_i} f_{C_{ij}}(D_p)]]></tex-math></disp-formula></p>
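      <p>As an illustration only, the computation of one r&amp;* feature from Eqs. (1) and (2) could be sketched in Python as follows. The function names, the tuple layout of a culture record, and the use of whole days for the 48-hour and 21-day limits are assumptions made for this sketch and are not taken from the implementation used in the study. An empty window returns None to mark a missing value.</p>
      <preformat><![CDATA[
```python
from datetime import date

N = 1.1        # base of the negative exponential, as reported in the text
MIN_DAYS = 2   # 48 hours, expressed in whole days (assumption of this sketch)
MAX_DAYS = 21  # cultures older than 21 days are ignored


def culture_weight(resistant, days_before, n=N):
    """Eq. (1): 0 for a susceptible result, n**(-days) for a resistant one."""
    return n ** (-days_before) if resistant else 0.0


def resistance_feature(target_patient, target_date, cultures, n=N):
    """Eq. (2): sum, over the other patients, of their maximum weighted culture.

    cultures: iterable of (patient_id, culture_date, resistant) tuples for a
    single antimicrobial family (hypothetical record layout).
    """
    best = {}
    for patient, day, resistant in cultures:
        if patient == target_patient:
            continue  # cultures of the patient under analysis are excluded
        days_before = (target_date - day).days
        if not (MIN_DAYS <= days_before <= MAX_DAYS):
            continue  # outside the 48 h to 21 days window
        weight = culture_weight(resistant, days_before, n)
        best[patient] = max(best.get(patient, 0.0), weight)
    if not best:
        return None  # no culture in the window: missing value
    return sum(best.values())
```
]]></preformat>
      <p>For example, if the only other patient with cultures in the window has resistant results 5 and 10 days before D<sub>p</sub>, the sketch keeps the more recent one and returns 1.1<sup>-5</sup>, roughly 0.62.</p>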
    </sec>
    <sec id="sec-4">
      <title>Data preprocessing</title>
      <p>To proceed with the model design for c&amp;car and c&amp;cf4, we created
two data subsets, one associated with each target. These subsets are
used to train a Logistic Regression (LR) model for each target.
Training two different classifiers instead of, for instance, a multi-class
classifier allows each classifier to specialize in predicting its
particular target, so that the weights of each classifier are tuned individually. In order
to limit the study to MDR acquired in the ICU, only instances of
patients admitted to the ICU for more than 48h are considered.</p>
      <p>The final data set for c&amp;car was composed of 450 cultures
and 34 features including the target, and the final data set for
c&amp;cf4 was composed of 556 cultures and the same 34 features
including the target. In both data sets, only six features have
missing values. The percentages of missing values in those features
are given in Table 2.</p>
      <p>
        Before training the models, we deal with the missing data associated
with the proposed features r&amp;amg, r&amp;car, r&amp;cf4, r&amp;pap, r&amp;pol
and r&amp;qui. Note that there may not be any culture in the ICU for P.
aeruginosa and a particular antimicrobial family during some time
intervals, in which case the feature has a missing value. This
can be addressed following different approaches, such as
deleting instances with missing data or imputing missing values [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. In
this study we propose a strategy based on the clinical meaning of
the generated features. The smaller the value of these features, the
fewer patients have been infected with resistant P. aeruginosa
bacteria and the longer the time since they were infected.
As a result, if the r&amp;* features do not have any value, it suggests that no
P. aeruginosa has been recently detected in the ICU. Therefore, very
likely no patients have been recently infected with resistant
bacteria, and the value provided by Eq. (2) should be very small. In
this case, missing values are replaced by 0.
      </p>
      <p>Afterwards, all categorical features are converted to binary
following a one-hot encoding strategy, except for the features representing
dates. Since dates have an intrinsic ordering, smaller numerical
values are assigned to dates further in the past, and greater values
correspond to more recent dates.</p>
      <p>
        Finally, the Pearson correlation between features (excluding
the targets) is calculated in order to discard highly correlated
features, since they provide very similar information. The
methodology is as follows. When two features have a correlation
coefficient higher than 0.9 or lower than -0.9, just one of them is
randomly selected and the rest are discarded. A visual representation of
the Pearson correlation between features for the Carbapenems and 4th
Generation Cephalosporins subsets is shown in Fig. 1. Similar
correlation patterns are found for both subsets. In
both, there is just one group of features with correlations above
0.9: date culture, year culture,
start date and year admission. Of these four correlated
features, date culture is kept because it is the most
representative among them, and the rest are discarded. After this
step, both data sets had 31 features.
      </p>
      <p>
        The first experiment, once the data is processed, evaluates the
relevance of the selected and proposed features with respect to the two
target features to be predicted. This is analyzed
using Mutual Information (MI) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It is a quantity that measures
the mutual dependence of two variables, that is, it quantifies the
amount of information that one random variable provides about another
random variable. In terms of probabilities, the MI of two jointly
discrete random variables X and Y is calculated as:
      </p>
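      <p><disp-formula><label>(3)</label><tex-math><![CDATA[I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p_{(X,Y)}(x,y) \log \frac{p_{(X,Y)}(x,y)}{p_X(x)\, p_Y(y)}]]></tex-math></disp-formula>
where p<sub>(X,Y)</sub> is the joint probability mass function of X and Y,
and p<sub>X</sub> and p<sub>Y</sub> are the marginal probability mass functions of X and
Y respectively.</p>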
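      <p>To make Eq. (3) concrete, a minimal plug-in estimate of MI from paired discrete samples could look like the following sketch. The function name is hypothetical, the natural logarithm is assumed (the text does not state the base), and continuous features such as dates would first need to be discretized in some way that the text does not specify.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from two equally long sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of X
    count_y = Counter(ys)         # marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * log(p_xy / (p_x * p_y))
    return mi
```
]]></preformat>
      <p>Two identical balanced binary variables give log 2 (about 0.69 nats), while two independent ones give 0.</p>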
      <p>Turning now to the prediction of the target, the problem
has a series of special characteristics that have
to be considered in order to address it properly.</p>
      <p>First, health records have an inherent temporal
ordering. This forces us to train only on instances that precede in time
the test antibiograms to be predicted. In addition, a
time margin has to be respected between the training and test windows,
since results of antibiograms are not immediately available after they
are carried out. As before, in this particular case a time margin of
48h needs to be considered.</p>
      <p>
        The second particularity encountered when making predictions is
concept drift, that is, the fact that the concept of interest may depend on
some hidden context that is not given explicitly in the form of predictive
features. Changes in the hidden context over time can induce more
or less radical changes in the target concept. Changes in the hidden
context may not only result in a change of the target concept, but may
also cause a change in the underlying data distribution [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. In the
particular domain of this study, antimicrobial resistance, the hidden
context that changes over time is the mutation of bacteria, which
allows them to become more resistant to antibiotics as time passes. In [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]
it is proposed to use instance selection to handle concept drift as it
is the technique that is most commonly used and has been found to
offer good results. More specifically it is proposed to use a technique
based on instance selection that consists in generalizing from a
window that moves over recently arrived instances and uses the learnt
concepts for prediction only in the immediate future. This represents
a very good approach to apply to the resolution of the problem
analyzed in this study except for the third particularity of the data set.
      </p>
      <p>This third particularity is data scarcity, which makes it difficult to learn from temporal
windows even when they span several months or years. In the first data set
considered, the one with target feature c&amp;car, there are a total of 450
instances, and in the second data set, with target feature c&amp;cf4, there
are 556. Taking into account that those data sets represent cultures from
10 years (from 2004 to 2013), the average number of cultures per year is 45
and 56 respectively, which is a relatively small number of instances
considering how fast ICU bacteria are able to mutate and change their
sensitivity patterns. For this reason, we propose to use an incremental
window for training, which increases its size as the test window moves
towards more recent instances. That is, the training window is
anchored at the earliest (oldest) instances, and it
gradually grows, containing more instances, as more
recent test instances are predicted. The test window, on the other
hand, has a fixed size and progressively slides to select
more recent instances.</p>
      <p>The DM technique used in the experiments is Logistic Regression
(LR). It is chosen for its simplicity, to be used as a baseline. A
baseline is a model that is both simple to set up and has a reasonable
chance of providing acceptable results. LR is a classification
technique, and it is used in this study to evaluate the
feasibility of learning from the data. The classifier has to decide whether
the target is sensitive or resistant, so it is a binary decision.</p>
      <p>To assess the performance of the proposed incremental window
framework, experiments are carried out with different configurations.
The characteristics of defined training and test windows are the
following:</p>
      <list list-type="bullet">
        <list-item><p>For each experiment, a set of LR classifiers is trained, one for each
training-test window pair.</p></list-item>
        <list-item><p>The size of the test windows is fixed at 3 months, which is
a relatively short time, close to the training instances, while still containing
enough test instances.</p></list-item>
        <list-item><p>Different test windows do not overlap. That is,
each test window contains different instances
belonging to a different time interval.</p></list-item>
        <list-item><p>Instances in the training window do not contain antibiograms
belonging to patients that are also present in the test window. For
instance, if the result of an antibiogram of a particular patient is
to be predicted in the test set, there are no past antibiograms of
the same patient in the training set. In this way, it is ensured that
patients in the training and test windows are different.</p></list-item>
      </list>
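      <p>The window scheme described above could be sketched as follows. This is only an illustrative reading of the setup: the function names and the record layout are hypothetical, test windows are assumed to start on the first day of a month, and the 48-hour margin is applied by keeping only training instances dated more than two days before the start of the test window.</p>
      <preformat><![CDATA[
```python
from datetime import date, timedelta


def add_months(d, months):
    """Shift a first-of-month date forward by a number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def incremental_splits(instances, first_test_start, end, test_months=3,
                       margin=timedelta(hours=48)):
    """Yield (test_start, train, test) for each non-overlapping test window.

    instances: list of tuples whose first two fields are
    (patient_id, culture_date); any further fields are carried along.
    """
    start = first_test_start
    while start < end:
        stop = add_months(start, test_months)
        test = [r for r in instances if start <= r[1] < stop]
        test_patients = {r[0] for r in test}
        cutoff = start - margin  # respect the delay of antibiogram results
        # The training window always starts at the oldest instance and grows.
        train = [r for r in instances
                 if r[1] < cutoff and r[0] not in test_patients]
        yield start, train, test
        start = stop  # slide the fixed-size test window
```
]]></preformat>
      <p>Each yielded pair would then be used to fit one classifier on the training part and to score it on the test part. Because the training list is rebuilt from the oldest instance at every step, it grows as the test window slides forward.</p>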
      <p>With each pair of training and test windows, a simple validation is
performed so that the temporal order is maintained. Because of the class imbalance
inherent to the problem, a test window may contain
more instances of one class than of the other. To get a
realistic approximation of the performance of the algorithm, the success rates on
sensitive instances (true negatives) and on resistant instances (true positives)
are calculated, together with the average
of these two values and the general accuracy. For a particular test
window with n<sub>s</sub> sensitive instances and n<sub>r</sub> resistant instances, if the
method correctly predicts p<sub>s</sub> sensitive instances and p<sub>r</sub> resistant
instances, these values are calculated as:
<disp-formula><label>(4)</label><tex-math><![CDATA[\text{general accuracy} = \frac{p_s + p_r}{n_s + n_r}]]></tex-math></disp-formula>
<disp-formula><label>(5)</label><tex-math><![CDATA[\text{resistant success} = \frac{p_r}{n_r}]]></tex-math></disp-formula>
<disp-formula><label>(6)</label><tex-math><![CDATA[\text{sensitive success} = \frac{p_s}{n_s}]]></tex-math></disp-formula>
<disp-formula><label>(7)</label><tex-math><![CDATA[\text{average success} = \frac{\frac{p_r}{n_r} + \frac{p_s}{n_s}}{2}]]></tex-math></disp-formula>
These metrics offer a better approximation of the performance
because they allow tracking the success rate on the minority class, which
in many real problems is the most important one. For instance, if
the test set contains 8 sensitive and 2 resistant instances and the
DM technique predicts all instances as sensitive, the general accuracy
would have the fairly high value of 80%, while the technique would be
performing poorly at identifying resistant antibiograms, which are the ones
that most need to be detected.</p>
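      <p>The four success metrics can be computed directly from the counts n<sub>s</sub>, n<sub>r</sub>, p<sub>s</sub> and p<sub>r</sub>. The following sketch uses a hypothetical function name and returns NaN for a class that is absent from the test window, a convention assumed here because the text does not say how empty classes are handled.</p>
      <preformat><![CDATA[
```python
def success_metrics(n_s, n_r, p_s, p_r):
    """Eqs. (4)-(7) from instance counts and correctly predicted counts."""
    # NaN marks a class with no instances in the window (assumed convention).
    sensitive = p_s / n_s if n_s else float("nan")
    resistant = p_r / n_r if n_r else float("nan")
    return {
        "general_accuracy": (p_s + p_r) / (n_s + n_r),
        "resistant_success": resistant,
        "sensitive_success": sensitive,
        "average_success": (resistant + sensitive) / 2,
    }
```
]]></preformat>
      <p>Applied to the example in the text (8 sensitive and 2 resistant instances, all predicted as sensitive), the sketch gives a general accuracy of 0.8 but a resistant success of 0 and an average success of 0.5, which exposes the failure on the minority class.</p>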
      <p>Finally, to calculate the mean accuracy over several windows,
the success counts are accumulated and the performance is
evaluated using Equations 4, 5, 6 and 7. In other words, the
performance over several windows is not calculated by averaging the
individual performance values, but by accumulating the number of
successfully predicted instances in each window and using the totals to calculate the accuracy.
This is done because test windows may have different numbers of
instances, since not all 3-month time intervals contain
the same number of antibiograms. Averaging
their accuracy values would therefore not be adequate, since some
instances would carry more weight than others depending on the
number of instances in their test window.</p>
      <p>This section presents the results obtained after carrying out the
experiments described in Section 3. In Section 4.1, the mutual dependence
among features is analyzed by using MI. Sections 4.2 and 4.3
evaluate the impact of the features date culture and r&amp;*, respectively,
on the prediction of bacterial resistance. The improvement in
prediction obtained by using the incremental training window scheme is
assessed in Section 4.4.</p>
    </sec>
    <sec id="sec-5">
      <title>Features relevance using mutual information</title>
      <p>The results of feature relevance using MI for the Carbapenems and 4th
Generation Cephalosporins targets are shown in the left and right
columns of Table 3, respectively. MI computes the
relevance or weight of a feature according to the co-occurrence of
this feature and the target feature, as described in Equation 3. It
does not take into account the possible interaction of
this feature with other ones regarding the target feature. In both
cases, the feature date culture is by far the variable containing
the most information about the feature to be predicted. For
Carbapenems, date culture has a value of 0.53, while the second
most important feature, age, receives a relevance value
of only 0.18. For 4th Generation Cephalosporins, date culture gets a
value of 0.44 and r&amp;qui, the second one, 0.16. The importance of
date culture suggests that results of antibiograms are highly
dependent on the time at which they were performed.</p>
      <p>In addition, it is notable that five of the six proposed r&amp;* features,
which consider antibiograms of other patients in the ICU, are
among the eight most relevant features for both Carbapenems and
4th Generation Cephalosporins. Therefore, we can infer that they
contain a great amount of information for predicting antimicrobial
resistance. The proposed feature with considerably lower importance
is r&amp;pol. This is probably because the data set contains
fewer antibiograms for the Polymyxins antimicrobial
family (296) than for Aminoglycosides (564),
Carbapenems (450), 4th Generation Cephalosporins (556),
Extended-spectrum penicillins (558) and Quinolones (558).</p>
      <p>Hence, one can infer that the result of the antibiogram for a
particular patient depends on the past results of antibiograms of other
patients in the ICU. To explain this fact, we hypothesize
that bacteria have been spreading from one patient to another in
the ICU by cross-transmission. Therefore, the fact that a patient is
infected with resistant bacteria may increase the odds of another
patient becoming infected as well.</p>
    </sec>
    <sec id="sec-6">
      <title>Prediction contribution of date culture feature</title>
      <p>After observing that date culture is the most important feature
according to MI, the behavior of this variable is evaluated
when predicting the result of antibiograms. To do so, the target
feature is predicted in two modes: one considering all features
including date culture, and a second one discarding date culture
from the feature set. These predictions are made using an
incremental window for training and a 3-month sliding window for testing,
as described in the experimental setup section. The training window
initially contains the 2004 and 2005 instances and the test window the
first three months of 2006. After that, the training window grows by
three months and the test window slides three months, until the end of the
database is reached. Results of the accumulated accuracy over all
windows are shown in the two left columns of Table 4 for Carbapenems
and in the two right columns of the same table for 4th Generation
Cephalosporins.</p>
      <p>It is remarkable that in both cases the success percentage slightly
increases when date culture is not used. We believe that this may
be due precisely to the high influence this feature has on the prediction
of antimicrobial resistance. For time intervals where the
majority of instances belong to a particular label, this feature might be
forcing the DM method to learn that, in this particular interval of
values of date culture, it is highly probable that the predicted
instance belongs to the majority label. That is, it may be introducing
a bias towards the majority label of the time interval. This
is reflected in the way resistant success and sensitive success vary
when date culture is removed. For the Carbapenems target, it can
be seen that when using this feature, success on resistant instances
increases and success on sensitive instances decreases. In the
Carbapenems data set there are more resistant instances (238)
than sensitive instances (212). This seems to indicate that the
accuracy on the majority class improves when using date culture, while
that on the minority class worsens. The same effect is observed for 4th
Generation Cephalosporins, with the classes reversed:
when date culture is considered,
resistant success decreases and sensitive success increases, and here
the majority class is the sensitive instances (350) as opposed to the resistant
instances (206). This apparently shows the bias that the date culture
feature introduces towards classifying instances as the majority class,
because of its high influence on the prediction.</p>
      <p>In the following experiments, date culture is discarded since it slightly
reduces the accuracy of the algorithm.</p>
      <p>Table 4: Comparison with and without the date culture feature for the
prediction of years 2006 to 2013, for both the c&amp;car and c&amp;cf4
target features.</p>
      <p>Prediction contribution of r&amp;* features
A similar experiment is carried out for the proposed r&amp;* features.
Accuracy metrics are calculated with and without them, for test years
between 2006 and 2013. Table 5 shows the results of this experiment.
The accuracy remains almost the same whether or not the r&amp;*
features are used for prediction, for both antimicrobial families.
The difference is observed in resistant success and sensitive success.
When the r&amp;* features are taken into account, these two metrics are
more balanced, which means that the majority class and the
minority class reach similar success rates, a desirable effect.
Moreover, in both antimicrobial families resistant
success increases, which suggests that cultures from other patients
help to discriminate the resistant instances. Sensitive
success decreases, probably because the LR decision
boundary, after moving to better predict resistant instances,
becomes worse at recognizing sensitive instances.
</p>
      <p>Finally, the usefulness of the incremental training window and
3-month test window scheme is evaluated, to assess whether it improves the
success metrics with respect to using training windows further from
the test set together with the same 3-month test windows. In this experiment,
years 2012 and 2013 are predicted. Table 6 reports the success metrics for the
Carbapenems data set for 2012 and 2013, and Figure 2
shows the number of test instances in each 3-month test window
during those years. In the two left columns of Table 6, accuracy
improves from 59%, using a fixed training window from 2004 to
2011, to 68% when using the incremental window to predict the 2012
instances. The three columns on the right of Table 6 show
that, when predicting 2013, accuracy rises from 64% when
using training data until 2011 to 71% when year 2012 is also included
in the training data. With the incremental window, the accuracy
remains at 71%. The reason why the incremental window does not
increase the accuracy in this case, with respect to
training data until 2012, may be that the number of instances from the
first months of 2013 is small, as can be seen in Figure 2, so
including them in the incremental training window may add little
knowledge of the problem.</p>
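      <p>The two training schemes compared above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: records are (culture date, label) pairs, test windows are calendar quarters, and the helper names add_months, quarterly_windows and split are hypothetical. The toy records are invented for the example.</p>
      <preformat>
```python
from datetime import date


def add_months(day, months):
    """Shift a first-of-month date forward by a number of months."""
    index = day.year * 12 + (day.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def quarterly_windows(year):
    """Yield (start, end) for the four 3-month test windows of a year.

    The end date is exclusive.
    """
    start = date(year, 1, 1)
    for _ in range(4):
        end = add_months(start, 3)
        yield start, end
        start = end


def split(records, test_start, test_end, train_end=None):
    """Split (culture_date, label) records into train and test lists.

    Incremental window: train_end is None, so training uses every record
    dated before the test window starts.
    Fixed window: training stops at train_end (exclusive), however far
    that is from the test window.
    """
    cutoff = test_start if train_end is None else min(train_end, test_start)
    train = [r for r in records if cutoff > r[0]]
    test = [r for r in records if r[0] >= test_start and test_end > r[0]]
    return train, test


# Invented toy records, only to show how the two schemes differ.
records = [
    (date(2011, 5, 10), "R"),
    (date(2012, 2, 3), "S"),
    (date(2012, 8, 21), "R"),
    (date(2012, 11, 2), "S"),
]
for start, end in quarterly_windows(2012):
    incremental, test = split(records, start, end)
    fixed, _ = split(records, start, end, train_end=date(2012, 1, 1))
    print(start, len(incremental), len(fixed), len(test))
```
      </preformat>
      <p>With these toy records, the incremental training set grows from one to three records over the four windows of 2012, while the fixed training set stays at a single record, which is the difference in temporal proximity that the experiment evaluates.</p>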
      <p>Table 7 and Figure 3 show the results of the same experiment
for the 4th generation Cephalosporins data. In the two left columns of
Table 7, the accuracy for predicting 2012 is the same whether training
on instances until 2011 or using the incremental window. As before,
this may be because the instances from the first months of 2012 do not provide
enough knowledge to improve prediction accuracy, although in
this case there are more of them, as shown in Figure 3.
The three columns on the right of Table 7 show that the accuracy for
predicting 2013 stays at 71% whether training data
until 2011 or until 2012 are used, which, as just seen,
may be caused by the 2012 instances not providing much information.
With the incremental training window, the accuracy
increases to 80%.</p>
      <p>These last experiments reveal that using training data that are temporally
as close as possible to the test set either maintains the
success rates or improves them, compared with using more distant data. Therefore,
we conclude that the incremental window is a good scheme for building the
training set in this particular problem. Also, in
these last experiments, in which the test years are 2012 and 2013,
the accuracy values achieved are higher than those in the previous
experiments, where accuracy was accumulated from 2006 to
2013. The fact that accuracy improves as the training set grows
suggests that the algorithm is able to learn the model and generalize
from the training data.</p>
      <p>[Table: In the two left columns, comparison of the year 2012 prediction
using a fixed training window with instances from 2004 to 2011 and
an incremental training window. In the three columns on the right,
comparison of the 2013 prediction using a fixed training window with
instances from 2004 to 2011, a fixed training window from 2004 to
2012 and an incremental training window. The target feature predicted
in all cases is c&amp;cf4.]</p>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>In this paper we suggest using health records and past antibiogram
data to predict antimicrobial resistance in the ICU. We propose using
information about resistant germs recently detected in the ICU as
features of the data set. To handle changes in the data distribution over
time caused by the progressive mutation of bacteria, we suggest using
an incremental window for the training set and a test window of fixed
size, so that training and test instances are temporally as close as
possible.</p>
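      <p>To make the idea of a feature built from recently detected resistant germs concrete, the sketch below counts resistant cultures from other patients in a look-back period before a given culture. It is only an illustration: the record layout, the function name recent_resistant_count and the 30-day look-back are assumptions made for this example and may differ from the definition of the r&amp;* features used in the study.</p>
      <preformat>
```python
from datetime import date, timedelta


def recent_resistant_count(cultures, patient_id, culture_date, lookback_days=30):
    """Count resistant cultures from other patients shortly before a culture.

    cultures: iterable of (patient_id, culture_date, is_resistant) tuples.
    lookback_days: hypothetical window length, not a value from the paper.
    """
    earliest = culture_date - timedelta(days=lookback_days)
    count = 0
    for other_id, other_date, is_resistant in cultures:
        if other_id == patient_id or not is_resistant:
            continue
        # Only strictly earlier cultures count, so the label being
        # predicted can never leak into its own feature.
        if other_date >= earliest and culture_date > other_date:
            count += 1
    return count


# Invented toy cultures, only to exercise the function.
cultures = [
    ("p1", date(2012, 3, 1), True),
    ("p2", date(2012, 3, 10), True),
    ("p3", date(2012, 1, 5), True),
    ("p2", date(2012, 3, 12), False),
    ("p4", date(2012, 3, 20), True),
]
print(recent_resistant_count(cultures, "p4", date(2012, 3, 20)))
```
      </preformat>
      <p>Restricting the count to strictly earlier cultures keeps the feature compatible with the temporal ordering that the incremental training window relies on.</p>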
      <p>Experiments show that information on resistant bacteria recently
detected in ICU patients is relatively
informative for predicting bacterial resistance in other patients, which could
indicate that bacteria are spreading among ICU patients. It has also
been observed that features providing the specific temporal ordering
between all instances in the data set tend to decrease prediction
accuracy. Lastly, experiments indicate that using an incremental window
for training either maintains the success rates or improves them. Therefore, we
conclude that it is a beneficial scheme for the algorithm's performance.</p>
      <p>As future work, we consider including further details about each patient's
admission, such as the antibiotics administered, whether intubation was
required and whether mechanical ventilation was needed, among others, which
are indicators that can
have an impact on the appearance of resistant bacteria. Including
past antibiogram information for each particular patient whose
susceptibility test has to be predicted would also be an interesting approach
to evaluate. To extend this study, we propose using DM
algorithms other than LR, to assess whether they can improve the
success rates seen in this study. In addition, it would be advantageous to
predict the resistance to the six antimicrobial families relevant to P.
aeruginosa mentioned in the current study, so that it would be possible to
detect when a bacterium may become multidrug resistant.</p>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGEMENTS</title>
      <p>We are thankful to University Hospital of Fuenlabrada, Madrid,
Spain for providing the database used in this research.</p>
      <p>This work has been partly supported by the Spanish Thematic
Network “Learning Machines for Singular Problems and
Applications (MAPAS)” (TIN2017-90567-REDT, MINECO/FEDER EU),
by the IDEAI-UPC Consolidated Research Group Grant from
Catalan Agency of University and Research Grants (AGAUR, Generalitat
de Catalunya) (2017 SGR 574), by the Spanish Ministry of Economy,
Industry and Competitiveness under the Research Project Klinilycs
(TEC2016-75361-R), by the Science and Innovation Ministry Grants
AAVis-BMR (PID2019-107768RA-I00) and BigTheory
(PID2019-106623RB-C41), by the Spanish Institute of Health Carlos III (grant
DTS 17/00158), by Project Ref. F656 financed by Rey Juan Carlos
University, by the Young Researchers R&amp;D Project Ref. 2020-661,
financed by Rey Juan Carlos University and Community of Madrid
(Spain), and by the Youth Employment Initiative (YEI) R&amp;D Project
Ref. TIC-11649 financed by the Community of Madrid (Spain).</p>
    </sec>
  </body>
  <back>
  </back>
</article>