<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Intrusion Detection System for Healthcare Applications using Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Miloud Khaldi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nadir Mahammed</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammed Abdrrahim Lahmar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fadela Djelloul Daouadji</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LabRI-SBA Lab, Ecole Superieure en Informatique Sidi Bel Abbes</institution>
          ,
          <country country="DZ">Algeria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Internet of Things (IoT) is an extension of the current Internet to all objects that can communicate, directly or indirectly, with electronic equipment that is connected to the Internet. IoT offers services in many areas related to human life such as health, transport, home, smart cities, etc. The security of these components and data transfers is a major issue. In the field of healthcare, medical staff submit requests to the Internet of Medical Things (IoMT) for task execution. Intruders can submit false requests to disrupt the operation of these devices. Detecting these attacks requires a reliable security system capable of detecting any intrusion during all phases of the execution process. In this work we propose a supervised Machine Learning based Intrusion Detection System (IDS) for the Internet of Medical Things (IoMT), in which we adopt a Features Selection approach to improve the performance of the proposed system. This ML-based IDS has been designed to detect suspicious or malicious activities in IoMT, thereby contributing to preventing privacy breaches and security attacks.</p>
      </abstract>
      <kwd-group>
        <kwd>Intrusion Detection System (IDS)</kwd>
        <kwd>Machine Learning (ML)</kwd>
        <kwd>Features selection</kwd>
        <kwd>Internet of Medical Things (IoMT)</kwd>
        <kwd>Healthcare</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The Internet of Medical Things (IoMT) [<xref ref-type="bibr" rid="ref1">1</xref>] is a technology that connects medical devices to the Internet, enabling them to communicate with each other. It has various applications in healthcare, such as remote patient monitoring, medical device management, healthcare analytics, and remote surgery. The security of IoMT devices and data transfers is a significant concern. Malicious individuals can exploit vulnerabilities in the system to send false requests, leading to system disruptions and compromising network security.</p>
      <p>The problem statement of this project is the security challenges associated with IoT components and data transfers in a Cloud-IoT system specifically designed for healthcare applications. In the realm of healthcare, the Internet of Things enables seamless interaction and communication between various objects, facilitating the delivery of diverse services. However, ensuring the security of these components and data transfers is a critical issue. The system is vulnerable to potential threats, as malicious actors can exploit vulnerabilities by submitting false requests, leading to disruptions in system operations and compromising the overall network security.</p>
      <p>The purpose of this project is to address the security concerns associated with the Internet of Medical Things (IoMT) in a healthcare environment. The goal is to develop a reliable Intrusion Detection System (IDS) [<xref ref-type="bibr" rid="ref2">2</xref>] that can detect any intrusion during all phases of the execution process. This system will use Machine Learning techniques to detect any false requests submitted by intruders that could potentially compromise the security of the network and devices. The ultimate objective is to provide a secure and reliable environment for the users to submit their requests without any threat of intrusion. To reach this goal, we introduced a classification model using a combination of two features selection methods and hyperparameter tuning. These methods include Pearson’s Correlation Coefficient (PCC), which is one of the filtering methods, and Backward Elimination (BE), which is one of the wrapper methods. We sought to find the optimal threshold to obtain an efficient classification model with high accuracy and a low false alarm rate. In our experiments, we used the NSL-KDD datasets to train and evaluate our model.</p>
      <p>The rest of this paper is structured as follows. Section 2 provides an overview of the related work. Section 3 introduces the proposed model. Section 4 describes the dataset used. The preprocessing of the dataset is presented in Section 5. Section 6 details the features selection method applied. In Section 7, the experimental results of our proposed model are presented and discussed. Finally, Section 8 concludes the paper and gives some research perspectives.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <p>Many works are being carried out in the context of Intrusion Detection Systems using Machine Learning to find the best parameters and results in terms of performance in various environments, but only a few studies focus specifically on IoMT and healthcare systems.</p>
        <p>Pandey and Badal [<xref ref-type="bibr" rid="ref3">3</xref>] proposed a Machine Learning-based Intrusion Detection System for Denial of Service (DoS) attacks. In this work, the training and test datasets were preprocessed by removing irrelevant attack classes such as Probe, User to Root (U2R), and Remote to Local (R2L). Then, 14 new datasets were generated for each combination of features group. Next, Random Tree was used as a binary classifier to train and test the models with the datasets. The instances were classified as either normal or attack. The experimental results of 15 models were compared based on performance metrics. Then, the “best class model” for features selection was chosen based on superior performance compared to the other models. Finally, Correlation-based Feature Selection (CFS), Information Gain (IG), and Gain Ratio (GR) algorithms were applied to the datasets of the “best class model” to perform features selection.</p>
        <p>The work done by Saheed et al. [<xref ref-type="bibr" rid="ref4">4</xref>] presents a Machine Learning-based intrusion detection approach for detecting Internet of Things network attacks. The proposed approach utilizes a combination of feature reduction techniques and ensemble learning algorithms to effectively identify attacks. The researchers employed the Principal Component Analysis (PCA) feature selection method. The proposed model was evaluated on the UNSW-NB15 dataset. Several machine learning algorithms were trained on the dataset, such as Extreme Gradient Boosting (XGBoost), Cat Boost, K Nearest Neighbor (KNN), Support Vector Machine (SVM), Quadratic Discriminant Analysis and Naïve Bayes. The experimental results showed that the proposed model outperformed other models in terms of accuracy, precision, recall, and F1 score. XGBoost gave outstanding accuracy, reaching 99.99%, as well as precision, F1 score, and MCC, compared to the other proposed models.</p>
        <p>In [<xref ref-type="bibr" rid="ref5">5</xref>] the authors build an Intrusion Detection System (IDS) model based on optimized Machine Learning algorithms. The machine learning algorithms used in this research are KNN, SVM and RF. To improve the classification accuracy of these algorithms, some of their parameters are optimized using Particle Swarm Optimization (PSO) and Artificial Bee Colony (ABC) optimization techniques, while other parameters are used with default values. The result of this experiment shows that optimized KNN, SVM and RF perform better than these algorithms with their default parameter values. Furthermore, the results of the experiment show that KNN is the most suitable algorithm for network anomaly detection regarding detection of known and unknown network attacks. The NSL-KDD standard dataset is used for the experiments of this research.</p>
        <p>Ten popular Machine Learning (ML) algorithms were evaluated using the NSL-KDD dataset in [<xref ref-type="bibr" rid="ref6">6</xref>]. These algorithms were ranked based on their performance on various parameters, including specificity, sensitivity, and accuracy. After analyzing the top four performing algorithms, it was found that they consumed a significant amount of time during model building. As a result, feature selection techniques were applied to reduce the time required for intrusion detection without sacrificing accuracy. The experimental results clearly demonstrated the effectiveness of various algorithms with or without feature selection in achieving high accuracy while minimizing the time taken for model building.</p>
        <p>A study was carried out by the authors of [7] to explore the potential of Machine Learning classification algorithms in safeguarding IoT against DoS attacks. The researchers conduct a thorough examination of classifiers that can enhance the development of anomaly-based Intrusion Detection Systems (IDSs). To evaluate the performance of the classifiers, the study employs prominent metrics and validation methods and utilizes well-known datasets, such as CIDDS-001, UNSW-NB15, and NSL-KDD, for benchmarking. Furthermore, the study proposes a methodology for selecting the best classifier based on specific application requirements. The primary objectives of the research are to inspire IoT security researchers to develop IDSs using ensemble learning and to suggest appropriate approaches for statistically assessing the classifier’s performance. The performance of single classifiers, including CART and MLP, and classifier ensembles, namely Random Forest (RF), AdaBoost (AB), Extreme Gradient Boosting (XGB), Gradient Boosted Machine (GBM), and Extremely Randomized Trees (ETC), is measured in terms of prominent metrics, i.e., accuracy, specificity, sensitivity, false positive rate, and area under the receiver operating characteristic curve. Hyper-tuning of all the classifiers is done using random search. The significant differences between classifiers are statistically assessed using a well-known statistical test. Random Forest outperforms other classifiers in terms of accuracy (94.94%) and specificity (91.6%).</p>
        <p>Pande et al. [8] provide a novel deep learning framework for the detection of attacks. A comparison of machine learning and deep learning algorithms is also provided. The obtained results exceed 99% for the NSL-KDD dataset.</p>
      </sec>
    </sec>
    <sec id="sec-proposed-model">
      <title>3. Proposed Model</title>
      <p>In this section, we present our IDS architecture, describing in detail the datasets used for this work and giving a comprehensive overview of the entire Machine Learning (ML) process, from the initial step of dataset preprocessing, including feature engineering, through to Cross-Validation. Our Intrusion Detection System is a Network-based IDS (N-IDS) of the anomaly-based detection type; its role is to detect whether an attack is attempting to damage our devices (medical things) or perturb their operation (see Figure 1).</p>
    </sec>
    <sec id="sec-3">
      <title>4. Data-Set</title>
      <sec id="sec-3-1">
        <title>4.1. Description of dataset features</title>
        <sec id="sec-3-1-1">
          <p>The NSL-KDD dataset includes a total of 41 features, shown in Table 1. These features are divided into three different types:
• Basic features.
• Content features.
• Traffic features.</p>
          <p>Each connection record is labeled with one of the following classes:
• Normal class: Represents normal network connections that are considered legitimate and non-threatening.
• DoS class: Denotes Denial-of-Service attacks, where the goal is to disrupt or disable the targeted system or network [11].
• Probe class: Represents probing attacks, where an attacker attempts to gather information about the target system or network for potential vulnerabilities [11].
• R2L class: Stands for Remote-to-Local attacks, where an unauthorized user tries to gain access to a local system from a remote location [12].
• U2R class: Represents User-to-Root attacks, where a local user with limited privileges attempts to escalate their privileges to gain root access [13].</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Data Preprocessing</title>
      <sec id="sec-4-1">
        <p>In this section, we discuss the approach and methods used to preprocess this data.</p>
        <sec id="sec-4-1-1">
          <title>5.1. Data transformation</title>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <p>The train set and the test set each include three categorical features: “protocol_type”, “service” and “flag”.</p>
        <p>To transform these categorical features into numerical
features, we used the “1-N encoding” method.</p>
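        <p>As an illustration of the “1-N encoding” step, the following minimal sketch replaces each categorical column by one 0/1 column per category value. The helper names (fit_categories, one_hot_encode) and the sample records are hypothetical, and learning the category values on the train set only, so that unseen test values encode as all zeros, is an assumption rather than a detail stated in this paper.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Sketch of "1-N" (one-hot) encoding for categorical columns, stdlib only.
# Helper names and sample records are illustrative assumptions.

def fit_categories(rows, columns):
    """Collect the sorted set of values seen in each categorical column."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def one_hot_encode(rows, categories):
    """Replace each categorical column by one 0/1 column per known value."""
    encoded = []
    for row in rows:
        # Keep the numerical columns unchanged.
        out = {k: v for k, v in row.items() if k not in categories}
        for col, values in categories.items():
            for value in values:
                out[f"{col}_{value}"] = 1 if row[col] == value else 0
        encoded.append(out)
    return encoded


train_rows = [
    {"duration": 0, "protocol_type": "tcp", "service": "http", "flag": "SF"},
    {"duration": 2, "protocol_type": "udp", "service": "domain_u", "flag": "SF"},
    {"duration": 0, "protocol_type": "icmp", "service": "ecr_i", "flag": "S0"},
]
test_rows = [
    {"duration": 5, "protocol_type": "tcp", "service": "ftp", "flag": "SF"},
]

categorical = ["protocol_type", "service", "flag"]
# Assumption: categories are learned on the train set and reused for the test
# set, so both sets end up with the same columns.
categories = fit_categories(train_rows, categorical)
encoded_train = one_hot_encode(train_rows, categories)
encoded_test = one_hot_encode(test_rows, categories)
```
]]></preformat>
        <p>Reusing the same category table for both sets keeps the column layout identical, which most classifiers require.</p>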
        <sec id="sec-4-2-1">
          <title>5.2. Data normalization</title>
          <p>In our study, we opted for “MinMaxScaler” normalization, which scales the feature values to a range between 0 and 1 for both the train and test datasets using the following equation:</p>
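          <p><disp-formula><tex-math><![CDATA[x' = \frac{x - \min(x)}{\max(x) - \min(x)}]]></tex-math></disp-formula>
where x is the original value of a feature, min(x) and max(x) are the minimum and maximum values of that feature, and x' is the normalized value.</p>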
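          <p>Min-max scaling maps each value x of a feature to (x - min) / (max - min). The sketch below applies it to a single made-up column. Computing min and max on the train column and reusing them for the test column is an assumption (the paper only states that both sets are scaled); under that assumption, test values outside the train range fall outside [0, 1].</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Sketch of min-max normalization for one feature column, stdlib only.
# Function names and sample values are illustrative assumptions.

def fit_min_max(column):
    """Return the (min, max) pair used to scale a column."""
    return min(column), max(column)


def min_max_scale(column, lo, hi):
    """Apply x' = (x - lo) / (hi - lo) to every value of the column."""
    span = hi - lo
    if span == 0:
        # Constant column: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


train_src_bytes = [0, 181, 239, 1032]
test_src_bytes = [50, 516]

lo, hi = fit_min_max(train_src_bytes)  # assumption: fitted on train only
scaled_train = min_max_scale(train_src_bytes, lo, hi)
scaled_test = min_max_scale(test_src_bytes, lo, hi)
```
]]></preformat>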
        </sec>
        <sec id="sec-4-2-2">
          <title>5.3. Binary classification</title>
          <p>Binary classification in the context of the NSL-KDD
dataset refers to the task of classifying network
connections into two distinct categories: “normal” and
“attack”, for which we have assigned:
• The value “0” to the normal class.
• The value “1” to the attack class (DoS, U2R, R2L
and Probe).</p>
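          <p>The label assignment above can be sketched as a simple mapping. The function name to_binary_label is hypothetical, and the sketch assumes each record already carries one of the five class names rather than a specific attack name.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Sketch of the binary labeling step: "normal" -> 0, any attack class -> 1.
# Assumes records are already tagged with one of the five class names.

ATTACK_CLASSES = {"DoS", "U2R", "R2L", "Probe"}


def to_binary_label(class_name):
    """Map a class name to 0 (normal) or 1 (attack)."""
    if class_name == "Normal":
        return 0
    if class_name in ATTACK_CLASSES:
        return 1
    raise ValueError(f"unknown class: {class_name!r}")


labels = [to_binary_label(c) for c in ["Normal", "DoS", "Probe", "R2L", "U2R"]]
```
]]></preformat>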
        </sec>
      </sec>
      <sec id="sec-4-3">
        <p>After the data preprocessing stage, we obtain the new dataset structure shown in Table 2.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Features selection</title>
      <sec id="sec-5-1">
        <p>The features selection procedure that we followed is based on the hybridization of filter methods (Pearson’s Correlation Coefficient) and wrapper methods (Backward Elimination).</p>
        <sec id="sec-5-1-1">
          <title>6.1. Filter method: Pearson’s Correlation Coefficient (PCC)</title>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <p>As part of the filter method, several criteria can be used to evaluate the relevance of features. The features selection technique used is Pearson’s Correlation Coefficient (PCC). After performing multiple correlation tests, we found that the best result was achieved with a correlation coefficient threshold of 0.6, which yields a set of 27 features.</p>
        <p>The Pearson Correlation Coefficient ρ between two random variables X and Y is given by the following equation:
<disp-formula><tex-math><![CDATA[\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \, \sigma_Y}]]></tex-math></disp-formula>
where cov is the covariance and σ is the standard deviation (the square root of the variance). The value of ρ lies between -1 and 1: ρ is close to the extreme values -1 and 1 if X and Y are strongly correlated, and ρ = 0 if X and Y are totally uncorrelated. Thus, a feature which is strongly correlated with some other features is a redundant one.</p>
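        <p>The Pearson coefficient is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it and uses it as a redundancy filter with the 0.6 threshold reported above. The exact filtering rule is not spelled out in the paper, so the rule used here (drop a feature when its absolute correlation with an already kept feature exceeds the threshold) is an assumption, as are the function names and the toy columns.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Sketch of a correlation-based filter, stdlib only.
# The drop rule and the toy data are illustrative assumptions.
import math


def pearson(x, y):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    # The 1/n factors of covariance and variances cancel out.
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(features, threshold=0.6):
    """Keep a feature only if |rho| with every kept feature is <= threshold."""
    kept = []
    for name in features:
        column = features[name]
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 10.0],   # f2 = 2 * f1, fully redundant
    "f3": [1.0, -1.0, 0.0, -1.0, 1.0],  # uncorrelated with f1
}
selected = drop_redundant(features)
```
]]></preformat>
        <p>In this toy example, f2 is dropped because it is a linear copy of f1, while f3 is kept.</p>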
        <sec id="sec-5-2-1">
          <title>6.2. Wrapper method: Backward Elimination (BE)</title>
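          <p>As a wrapper method, backward elimination starts from the full candidate set and repeatedly removes a feature as long as the model score does not get worse. The following minimal sketch shows one common greedy variant. It is not the authors' pseudo code: the stopping rule, the function names and the toy scoring function (a stand-in for training and evaluating a classifier on a feature subset) are all assumptions made for illustration.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Sketch of greedy backward elimination, stdlib only.
# `score` stands in for "train a classifier on this subset and return its
# accuracy"; the toy scorer below is a hypothetical replacement.

def backward_elimination(features, score, min_features=1):
    """Drop one feature at a time while the score does not decrease."""
    selected = list(features)
    best = score(selected)
    while len(selected) > min_features:
        # Score every subset obtained by dropping exactly one feature.
        trials = [(score([f for f in selected if f != c]), c) for c in selected]
        trial_score, candidate = max(trials, key=lambda t: t[0])
        if trial_score < best:
            break  # every possible removal hurts the score: stop
        selected.remove(candidate)
        best = trial_score
    return selected, best


# Toy scorer: useful features add to the score, noisy ones subtract from it.
USEFUL = {"src_bytes": 0.30, "count": 0.20}
NOISY = {"noise_a": 0.05, "noise_b": 0.02}


def toy_score(subset):
    gain = sum(USEFUL.get(f, 0.0) for f in subset)
    penalty = sum(NOISY.get(f, 0.0) for f in subset)
    return 0.5 + gain - penalty


selected, best = backward_elimination(
    ["src_bytes", "count", "noise_a", "noise_b"], toy_score
)
```
]]></preformat>
          <p>With a real classifier, each call to the scoring function retrains the model, so the cost grows quickly with the number of candidate features. Running a cheap filter first, as done here with PCC, reduces that cost.</p>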
          <p>After conducting the correlation analysis, we applied the Backward Elimination (BE) technique to the initial set of 27 features. Through this iterative process, we refined the feature selection to a final subset of 20 features, resulting in optimal accuracy performance (see Table 3).</p>
          <p>Figure 2 shows the pseudo code for the Backward Elimination (BE) method.</p>
          <p>These features are considered to be the most important, significant, informative and relevant for the attack prediction model.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>7. Results and discussion</title>
      <sec id="sec-6-1">
        <p>In this section, we evaluate and improve the chosen methods. Six Machine Learning algorithms were tested on our dataset: Decision Tree (DT), Random Forest (RF), K-Nearest Neighbor (KNN), Extreme Gradient Boosting (XGBoost), Logistic Regression (LR) and Support Vector Machine (SVM). The performance evaluation metrics are the Confusion Matrix, Accuracy, Precision, Recall, F1-Score and False Alarm.</p>
        <sec id="sec-6-1-1">
          <title>7.1. Initial state results</title>
          <p>Table 4 presents the training results of the initial state, without normalization and without features selection, for each classifier.</p>
        </sec>
      </sec>
      <sec id="sec-6-3">
        <title>7.2. Improvements</title>
        <p>In this section, we present the performance improvement obtained at each stage for each algorithm, using accuracy as the performance evaluation metric, from the initial state to the final state: after normalization, after features selection and after cross-validation.</p>
      </sec>
      <sec id="sec-6-4">
        <title>7.2.1. Normalization improvements</title>
        <p>After applying normalization, we obtained the improvements shown in Table 5.</p>
      </sec>
      <sec id="sec-6-5">
        <title>7.2.2. Features selection improvements</title>
        <p>After selecting the best features, we obtained the improvements shown in Table 6.</p>
      </sec>
      <sec id="sec-6-6">
        <title>7.2.3. Cross-Validation improvements</title>
        <p>After performing 10-fold Cross-Validation, we obtained the improvements shown in Table 7.</p>
      </sec>
      <sec id="sec-6-7">
        <title>7.2.4. Total improvements (final state results)</title>
        <p>The overall improvement in terms of accuracy from the first state to the last state (after Normalization, FS and CV) is shown in Table 8, Figure 3, Figure 4 and Figure 5.</p>
      </sec>
      <sec id="sec-6-8">
        <title>7.2.5. Results discussion</title>
        <p>Through the application of learning techniques including Normalization, Features Selection and Cross-Validation, significant improvements in performance (Accuracy, Precision, Recall, F1-Score and False Alarm) were observed compared to the initial state.</p>
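        <p>The metrics named above can all be derived from the confusion matrix of a binary classifier. The sketch below uses the usual definitions, with label 1 for attack and 0 for normal. The paper does not give its formulas, so taking False Alarm as the false positive rate FP / (FP + TN) is an assumption, and the label vectors are made up.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Sketch of the evaluation metrics computed from a binary confusion matrix.
# Labels: 1 = attack, 0 = normal. Sample predictions are made up.

def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, TN, FN) counts for 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred):
    tp, fp, tn, fn = confusion_matrix(y_true, y_pred)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        # Assumption: "False Alarm" is the false positive rate.
        "false_alarm": ratio(fp, fp + tn),
    }


y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
metrics = evaluate(y_true, y_pred)
```
]]></preformat>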
        <p>First, data transformation was performed by encoding the categorical features using the “1-N encoding” method, and the “MinMaxScaler” function was applied to normalize the data. At this stage, the Support Vector Machine (SVM) model showed the most substantial improvement, 34.2308%.</p>
        <p>Next, features selection was carried out using a hybrid approach. The filter method, which involved correlation (PCC) analysis, was combined with the wrapper method utilizing Backward Elimination (BE). Through this process, 20 features were selected based on their ability to improve accuracy. The Logistic Regression (LR) model demonstrated notable progress, with an improvement of 8.4767%.</p>
        <p>Finally, Cross-Validation was employed to determine the optimal hyperparameters. This involved dividing the dataset into 10 folds and iteratively training and evaluating the model using different hyperparameter settings. The aim was to identify the hyperparameters that yielded the best performance. The Decision Tree (DT) model exhibited a modest improvement, with an increase of 4.9503%.</p>
        <p>In terms of overall improvements, the Support Vector Machine (SVM) model showed the most robust improvement, with accuracy rising from 43.08% in the first state to 85.48% in the last state, representing an improvement of 42.4015%.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>8. Conclusion</title>
      <p>In this paper, we explored the use of Machine Learning techniques for intrusion detection in the Internet of Medical Things (IoMT). To gain an in-depth understanding of the concepts and mechanisms used in our project, we began by conducting a global study of IoMT, its security issues and the various solutions available in the literature. We have developed an Intrusion Detection System (IDS) intended for IoMT, which relies on a learning method with features selection based on a hybrid approach combining a filtering method (PCC) and a wrapper method (BE). During our experiments, we applied our own features selection technique to the NSL-KDD dataset after an encoding and normalization phase. Our technique has proved extremely effective, and is independent of the classification model used. Using the correlation method (PCC), we identified the features most closely related to the target class. Subsequently, the use of the BE method allowed us to select the most important features among those previously selected by the correlation</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Lutkevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Vecchio</surname>
          </string-name>
          ,
          <article-title>What is the internet of medical things (iomt)?</article-title>
          ,
          <year>2023</year>
          . URL: https://www.techtarget.com/iotagenda/definition/IoMT-Internet-of-Medical-Things.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>F. M.</surname>
          </string-name>
          et al,
          <article-title>Machine learning for classification analysis of intrusion detection on nsl-kdd dataset</article-title>
          ,
          <source>Turkish Journal of Computer and Mathematics Education (TURCOMAT)</source>
          <volume>12</volume>
          (
          <year>2021</year>
          )
          <fpage>2286</fpage>
          -
          <lpage>2293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Badal</surname>
          </string-name>
          ,
          <article-title>Machine learning based intrusion detection system for denial of service attack</article-title>
          ,
          <source>Computational Methodologies for Electrical and Electronics Engineers</source>
          (
          <year>2021</year>
          )
          <fpage>29</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Y. K. S.</surname>
          </string-name>
          et al,
          <article-title>A machine learning-based intrusion detection for detecting internet of things network attacks</article-title>
          ,
          <source>Alexandria Engineering Journal</source>
          <volume>61</volume>
          (
          <year>2022</year>
          )
          <fpage>9395</fpage>
          -
          <lpage>9409</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Khorram</surname>
          </string-name>
          ,
          <article-title>Network intrusion detection using optimized machine learning algorithms</article-title>
          ,
          <source>European Journal of Science and Technology</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Malhotra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , Intrusion detection using
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>