A Deep Learning Approach for Intrusion Detection System in Industry Network

Ahmad HIJAZI, Univ. Grenoble Alpes, G-SCOP, F-38000 Grenoble, France, ahd.hjz@gmail.com
EL Abed EL SAFADI, Univ. Grenoble Alpes, G-SCOP, F-38000 Grenoble, France, Abed.safadi@grenoble-inp.fr
Jean-Marie FLAUS, Univ. Grenoble Alpes, G-SCOP, F-38000 Grenoble, France, Jean-marie.Flaus@grenoble-inp.fr


Abstract— Networking has brought convenience to the world by allowing flexible transfer of data, but it also exposes a high number of vulnerabilities. A Network Intrusion Detection System (NIDS) helps system and network administrators to detect network security breaches in their organizations. Identifying unknown and new attacks is one of the main challenges in IDS research. Deep learning (2010s), a subfield of machine learning (1980s), is concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks. Progress in such learning algorithms may improve the functionality of IDSs, especially in Industrial Control Systems, by increasing their detection rate on unknown attacks. In this work, we propose a deep learning approach to implement an effective and enhanced IDS for securing industrial networks.

Keywords—Intrusion Detection System, Deep Learning, SCADA, Modbus, Industrial Control Systems, Artificial Neural Networks.

I. INTRODUCTION
Targeted attacks on industrial control systems are the biggest threat to critical national infrastructure, according to Kaspersky Lab. Today's industrial control systems (ICS) face an array of digital threats. Two in particular stand out. On the one hand, digital attackers are increasingly targeting industrial organizations and succeeding in gaining unauthorized access to them. Some actors use malware, while others resort to spear-phishing (or whaling) and other social engineering techniques [1]. The main challenge stems from the fact that these systems typically control physical processes related to power, transport, water, gas and other critical infrastructure. Because the output of ICS relates to physical processes, the effects of any downtime, such as a power outage, can affect millions of people [2].

Signature-based and anomaly-based Intrusion Detection Systems are one aspect of an effective network security monitoring strategy. Very few asset owners have an IDS/IPS deployed and configured appropriately at the boundary between the Enterprise IT and ICS networks. However, network intrusion detection has been criticized for its propensity to generate a perceived large number of false positives and false negatives. Signature-based IDSs cannot detect new forms of attacks that they have not seen before, and anomaly-based IDSs produce high false-positive rates; in addition, it is difficult to select the normal behavior of the traffic dataset in the network.

Various machine learning techniques have been used to develop NIDSs, such as Artificial Neural Networks (ANN), Support Vector Machines (SVM), Naive Bayes (NB), Random Forests (RF), Self-Organizing Maps (SOM), etc. These NIDSs are developed as classifiers that differentiate normal traffic from anomalous traffic [3].

In this paper, an intrusion detection system based on deep learning is proposed to secure the ICS network. The proposed technique uses a multi-layer perceptron with binary classification. It is trained on high-dimensional Modbus packet data captured after a network simulation and labeled as normal or malicious, so that the neural network learns the underlying structure of the normal and anomalous behavior of the network.

II. ICS AND IDS

A. ICS overview
Industrial control system (ICS) is a general term that encompasses several types of control systems, including supervisory control and data acquisition (SCADA) systems, distributed control systems (DCS), and other control system configurations such as Programmable Logic Controllers (PLC), often found in industrial sectors and critical infrastructures.

ICS have different performance and reliability requirements, and also use operating systems and applications that may be considered unconventional in a typical IT network environment. Security protections must be implemented in a way that maintains system integrity during normal operations as well as during times of cyber-attack.

A typical ICS contains numerous control loops, human interfaces, and remote diagnostics and maintenance tools built using an array of network protocols on layered network architectures. A control loop utilizes sensors, actuators, and controllers (e.g., PLCs) to manipulate some controlled process. A sensor is a device that produces a measurement of some physical property and then sends this information as controlled variables to the controller. The controller interprets
the signals and generates corresponding manipulated variables, based on a control algorithm and target set points, which it transmits to the actuators.

Industrial control systems underpin the critical national infrastructure and are essential for the success of industries such as:

• Electricity production and distribution
• Water supply and treatment
• Food production
• Oil and gas production and supply
• Chemical and pharmaceutical production
• Telecommunications
• Manufacturing of components and finished products
• Paper and pulp production [5].

SCADA and industrial protocols, such as Modbus/TCP, are critical for communicating with most control devices. Unfortunately, many of these protocols were designed without built-in security and do not typically require any authentication to remotely execute commands on a control device.

B. IDS for ICS
For a long time, ICS/SCADA relied on various embedded devices and clear-text communications such as Modbus/TCP, without taking security into consideration. This made these systems vulnerable to different types of attacks and turned them into a target of cyber threats, which resulted in a new focus on the security issues related to industrial control systems.

Intrusion Detection Systems can provide visibility into, and detection of, breaches on the network; an IDS can raise an alarm in response to network security or endpoint security events. IDSs for ICT networks have become very popular: some identify the signatures of many pieces of known malicious code (e.g., SNORT rules), while others use model-based anomaly detectors. Modern ICS equipment does not normally fall into the same category as computer systems in modern-day ICT networks. ICS equipment is not typically designed with security logging and processing in mind, and it does not usually run the standard operating systems used in ICT desktops and servers. A network-based IDS is a network device that collects traffic directly from the network, often from a central point such as a router or switch. Data from multiple network sensors can be aggregated into a central processing engine, or processing may occur on the collection machine itself. The network traffic can also be analyzed for unsatisfactory traffic or behavior patterns: either patterns that are anomalous with respect to a previously established traffic or behavior model, or specific traffic patterns that do not conform to standards, e.g., violations of specific communication protocols.

C. Deep learning and IDS
Signature-based IDS is effective at detecting known attacks, with high detection accuracy and low false-alarm rates. However, its performance suffers when detecting unknown or new attacks, because it is limited to the rules installed beforehand in the IDS. Anomaly-based IDS, on the other hand, is well suited to detecting unknown and new attacks. Although anomaly-detection IDSs produce high false-positive rates, their theoretical potential for identifying new attacks has led to their wide acceptance in the research community. Two main challenges arise when developing an effective and flexible NIDS for unknown future attacks. First, proper feature selection from the network traffic dataset for anomaly detection is difficult: as attack scenarios continuously change and evolve, the features selected for one class of attack may not work well for other classes. Second, labeled traffic datasets from real networks are rarely available for developing an NIDS.

Deep learning belongs to a class of machine learning methods that employ consecutive layers of information-processing stages, arranged hierarchically, for pattern classification and feature or representation learning. Deep learning plays an important role in image classification, and it is also commonly used for language and graphical modeling, pattern recognition, and speech, audio, image, video, natural language and signal processing. There are many deep learning methods, such as the Deep Belief Network (DBN), Restricted Boltzmann Machine (RBM), Deep Boltzmann Machine (DBM), Deep Neural Network (DNN), Autoencoder, and Deep/Stacked Autoencoder [6].

Advances in learning algorithms might help IDSs reach a higher detection rate and a lower false-alarm rate. It is envisioned that deep-learning-based approaches can help overcome the challenges of developing an effective NIDS.

In this work, we use Multi-layer Perceptrons with binary classification, which we found to be the most useful type of neural network here; the only two output classes are normal and malicious. A perceptron is a single-neuron model that was a precursor to larger neural networks.

The power of neural networks comes from their ability to learn a representation of the training data and how best to relate it to the output variable to be predicted. In this sense, neural networks learn a mapping. Mathematically, a network with enough hidden units can approximate any continuous mapping to arbitrary accuracy, which is known as the universal approximation property. The layered structure can pick out (learn to represent) features at different scales or resolutions and combine them into higher-order features, for example from lines, to collections of lines, to shapes.

III. APPLICATION OF DEEP LEARNING ALGORITHM TO NETWORK TRAFFIC
The steps for building a deep learning approach consist of preparing the data, defining and compiling the model, fitting the model, and evaluating the model (prediction). We will start with a brief overview of the deep learning structure.
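As a rough illustration of these four steps, the following pure-Python sketch prepares a tiny, made-up dataset (two already-scaled features per packet, label 0 for normal and 1 for malicious), defines a single sigmoid neuron, fits it with per-example stochastic gradient descent, and evaluates it with a 0.5 threshold. It is a simplified stand-in, not the authors' multi-layer perceptron; the data, learning rate, epoch count and function names are assumptions chosen for illustration.

```python
import math
import random

# Step 1: prepare the data (hypothetical, already-scaled features; label 0 = normal, 1 = malicious)
def prepare_data():
    return [
        ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.15, 0.25], 0),
        ([0.8, 0.9], 1), ([0.9, 0.8], 1), ([0.85, 0.95], 1),
    ]

# Step 2: define the model: one weight per input plus a bias weight (last entry)
def define_model(n_inputs, seed=0):
    rng = random.Random(seed)
    return [rng.uniform(0.0, 0.3) for _ in range(n_inputs + 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    # Forward pass: weighted sum of the inputs plus the bias, squashed into (0, 1)
    z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    return sigmoid(z)

# Step 3: fit with stochastic gradient descent, one row at a time
def fit(weights, rows, learning_rate=0.5, epochs=200, seed=0):
    rng = random.Random(seed)
    rows = list(rows)                     # copy so the caller's list is not reordered
    for _ in range(epochs):               # one epoch = one pass over the training data
        rng.shuffle(rows)
        for x, y in rows:
            error = predict_proba(weights, x) - y   # d(log loss)/dz for a sigmoid output
            for i, xi in enumerate(x):
                weights[i] -= learning_rate * error * xi
            weights[-1] -= learning_rate * error    # the bias input is always 1.0
    return weights

# Step 4: evaluate (predict) with a 0.5 decision threshold
def evaluate(weights, rows):
    correct = sum(int(predict_proba(weights, x) >= 0.5) == y for x, y in rows)
    return correct / len(rows)

data = prepare_data()
model = fit(define_model(n_inputs=2), data)
print(evaluate(model, data))
```

Replacing the single neuron with a multi-layer network would change only the define and fit steps; preparation and evaluation keep the same shape.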

A. Overview of deep neural networks
    1) Neurons
The building blocks of neural networks are artificial neurons. These are simple computational units that have weighted input signals and produce an output signal using an activation function.
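A single artificial neuron, as described here and in the following subsections on weights and activation, can be sketched as a weighted sum of its inputs plus a bias weight (whose input is fixed at 1.0), passed through an activation. The sketch below uses the historical step activation with the 0.5 threshold mentioned under 3) Activation; the function names, weights and inputs are made-up values for illustration.

```python
def step(z, threshold=0.5):
    """Historical step activation: 1.0 if the summed input exceeds the threshold, else 0.0."""
    return 1.0 if z > threshold else 0.0

def neuron_output(inputs, weights, activation=step):
    """weights holds one entry per input followed by the bias weight (bias input is always 1.0)."""
    if len(weights) != len(inputs) + 1:
        raise ValueError("need one weight per input plus one bias weight")
    summed = sum(w * x for w, x in zip(weights, inputs)) + weights[-1] * 1.0
    return activation(summed)

# Two inputs therefore require three weights: w1, w2 and the bias weight.
weights = [0.2, 0.3, 0.1]
print(neuron_output([1.0, 1.0], weights))  # prints 1.0 (0.2 + 0.3 + 0.1 = 0.6 > 0.5)
print(neuron_output([0.5, 0.5], weights))  # prints 0.0 (0.1 + 0.15 + 0.1 = 0.35)
```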


Fig. 1. Model of a Simple Neuron

2) Neuron Weights
Each neuron has a bias, which can be thought of as an input that always has the value 1.0 and must also be weighted. For example, a neuron with two inputs requires three weights: one for each input and one for the bias. Weights are often initialized to small random values, such as values in the range 0 to 0.3, although more complex initialization schemes can be used. As in linear regression, larger weights indicate increased complexity and fragility of the model. It is desirable to keep the weights in the network small, and regularization techniques can be used for this purpose.

3) Activation
The weighted inputs are summed and passed through an activation function, sometimes called a transfer function. An activation function is a simple mapping of the summed weighted input to the output of the neuron. It is called an activation function because it governs the threshold at which the neuron is activated and the strength of the output signal. Historically, simple step activation functions were used: if the summed input was above a threshold, for example 0.5, the neuron would output a value of 1.0; otherwise it would output 0.0.

4) Network of Neurons
Deep learning (DL) involves building very large and deep (i.e., many layers of neurons) neural networks to solve specific problems, as shown in Fig. 2. Similar to how neurons are organized in layers in the human brain, neurons in neural networks are often organized in layers as well. An algorithm is thus called deep if the input is passed through several non-linearities before being output.

Fig. 2. An example of deep neural network with five layers

a) Input Layer
The first layer, which takes input from the dataset, is called the input or visible layer, because it is the exposed part of the neural network. A neural network often has an input layer with one neuron per input value in the dataset.

b) Hidden Layer
After the input layer come the hidden layers; they are called hidden because they are not directly exposed to the input. The simplest neural network has a single neuron in the hidden layer that directly outputs a value. With increased computing power and very efficient libraries, very deep neural networks with many hidden layers can be built.

c) Output Layer
The last layer is called the output layer; it is responsible for outputting a value or vector of values in the format required by the problem.

B. Training the Network
a) Data Classification
To use binary classification, we need to capture two types of data, in our case normal and malicious packets, to train the neural network. As neural networks can only work with numerical data, we label the network packets with 0 for normal and 1 for malicious packets.
We captured a large dataset composed of normal network traffic, i.e., the normal behavior of the ICS devices. To obtain malicious packets, we prepared a table of functions and values opposite to the normal ones, that is, different IP sources, IP destinations, port numbers, protocol numbers, Modbus functions, values, registers, coils, etc., and then captured almost the same number of packets. After that, we combined the normal and malicious packets into one dataset and added a column labeling each packet 0 for normal and 1 for malicious.

b) Data Values
Data must be numerical, for example real values. If we have categorical data, such as a sex attribute with the values male and female, we can convert it to a real-valued representation
called a one-hot encoding: one new column is added for each class value (two columns in the case of sex, male and female), and a 0 or 1 is placed in each row depending on the class value for that row.
Neural networks require the input to be scaled in a consistent way. We can rescale it to the range between 0 and 1, which is called normalization. Another popular technique is standardization, which rescales each column so that its distribution has a mean of 0 and a standard deviation of 1. Scaling also applies to image pixel data. In our case, the data is a captured PCAP file whose fields consist of IP addresses, port numbers, and hexadecimal Modbus values, as shown in Fig. 3.

Fig. 3. Modbus Frame

Thus, the data must be well prepared before the neural network is trained on it: we convert the IP addresses, hexadecimal values, and all other non-decimal attributes into decimal values, preferably scaled between 0 and 1.

c) Stochastic Gradient Descent
The classical and still preferred training algorithm for neural networks is stochastic gradient descent, in which one row of data at a time is exposed to the network as input. The network processes the input layer by layer, activating neurons as it goes, to finally produce an output value. This is called a forward pass; the same type of pass is used after the network is trained to make predictions on new data.
The output of the network is compared to the expected output and an error is calculated. This error is then propagated back through the network, one layer at a time, and the weights are updated according to how much they contributed to the error. Computing each weight's contribution in this way is called the backpropagation algorithm. The process is repeated for all examples in the training data. One round of updating the network over the entire training dataset is called an epoch. A network may be trained for tens, hundreds, or thousands of epochs; an example of an epoch round is shown in Fig. 4.

d) Prediction
Once a neural network has been trained, it can be used to make predictions. Predictions on test or validation data estimate the skill of the model on unseen data. The model can also be deployed operationally to make predictions continuously. The network topology and the final set of weights are all that need to be saved from the model. Predictions are made by providing the input to the network and performing a forward pass, which generates an output that can be used as a prediction [7].

C. Model Approach
a) Preparing the Neural Network
As a deep learning structure is defined as a sequence of layers, we create a sequential model and add layers one at a time until we are satisfied with the network topology. The first thing to get right is to ensure that the input layer has the right number of inputs. In our case, the number of inputs is the number of fields extracted from the network packets, as shown in Fig. 5; in addition, the dataset contains a last field indicating whether the packet is normal or malicious, which serves as the training label.

Fig. 5. Input parameters of the neural network

As shown in Fig. 5, we have 12 inputs, including different types of fields (IP, TCP, and Modbus). The neural network is trained on these attributes.
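Before such attributes reach the input layer, they must be numerical and consistently scaled, as discussed under b) Data Values. The sketch below shows one-hot encoding (using the sex example from the text) and min-max normalization to the range [0, 1]; the function names and the short list of hypothetical Modbus function codes are illustrative assumptions, not values from the authors' dataset.

```python
def one_hot(values):
    """Return (categories, rows): one new 0/1 column per distinct class value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def min_max_normalize(column):
    """Rescale a numeric column to the range [0, 1] (normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: map every value to 0.0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

categories, encoded = one_hot(["male", "female", "female"])
print(categories)   # ['female', 'male']
print(encoded)      # [[0, 1], [1, 0], [1, 0]]

# Hypothetical Modbus function codes observed in captured packets
print(min_max_normalize([1, 3, 5]))  # [0.0, 0.5, 1.0]
```

Min-max scaling keeps values in [0, 1] as suggested above, while the authors ultimately standardize their fields (see b) Encoding); which one works better is a judgment call for the dataset at hand.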
How do we choose the number of hidden layers and their types? This is a hard question. There are heuristics we can use, but the best network structure is often found through trial-and-error experimentation. Generally, the network needs to be large enough to capture the structure of the problem. In our case, we use a fully connected network structure with three layers, as shown in Fig. 6.
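A forward pass through a fully connected three-layer network like the one in Fig. 6 can be sketched in plain Python. The default layer sizes follow Fig. 6 (18 inputs, 8 hidden neurons, 1 output); since the text elsewhere mentions 12 inputs, the input size is left as a parameter. The ReLU hidden activation follows the choice discussed in this section, while the sigmoid output unit, the random initialization and all function names are assumptions made here to turn the single output into a normal/malicious probability. The weights are untrained, so the output only becomes meaningful after fitting (for example with stochastic gradient descent).

```python
import math
import random

def relu(z):
    """Rectified Linear Unit: f(z) = max(0, z)."""
    return max(0.0, z)

def sigmoid(z):
    """Squash a logit into (0, 1); assumed output activation for the binary decision."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """Fully connected layer: every neuron sees every input."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights in the 0-0.3 range mentioned earlier; biases start at 0."""
    weights = [[rng.uniform(0.0, 0.3) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_mlp(n_inputs=18, n_hidden=8, seed=0):
    """Input -> hidden (ReLU) -> single output (sigmoid)."""
    rng = random.Random(seed)
    return [init_layer(n_inputs, n_hidden, rng), init_layer(n_hidden, 1, rng)]

def forward(mlp, features):
    """Forward pass; returns the (untrained) probability that a packet is malicious."""
    (w1, b1), (w2, b2) = mlp
    if len(features) != len(w1[0]):
        raise ValueError(f"expected {len(w1[0])} input features, got {len(features)}")
    hidden = dense(features, w1, b1, relu)
    return dense(hidden, w2, b2, sigmoid)[0]

mlp = build_mlp()
print(forward(mlp, [0.0] * 18))   # 0.5: an all-zero input gives a zero logit
```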

Fig. 4. Epoch example during network training

Next, consider the structure of the layers: we have an input layer, some hidden layers and an output layer. As stated previously, a type of network that performs well on binary classification problems is the multi-layer perceptron. This
type of neural network is often fully connected, which means that we build a fairly simple stack of fully connected layers to solve this problem. As for the activation function, it is best to use one of the most common ones, the ReLU activation function [8]. The Rectified Linear Unit has become very popular in the last few years, particularly in hidden layers. It computes the function

$$f(x) = \max(0, x)$$

One way ReLUs improve neural networks is by speeding up training: the gradient computation is very simple (either 0 or 1, depending on the sign of x).

When building the model, it is important to take into account that the first layer needs to make the input shape explicit. The model needs to know what input shape to expect, which is why input shape, input dimension and input length arguments appear in the documentation of the layers and in practical examples of those layers (Fig. 6).

Fig. 6. Visualization of Neural Network Structure (input layer: 18 inputs; hidden layer: 8 neurons; output layer: 1 output)

b) Encoding
However, training must use numerical fields only. An IP address in the format xxx.xxx.xxx.xxx cannot be understood by the network, and neither can hexadecimal Modbus data such as FF00. To solve this problem, the data must be converted into decimal values: we used Excel plugins to convert IP addresses and hexadecimal values into numbers, so that all fields became decimal values.

As the scales of the different fields differ widely, this may have a knock-on effect on the network's ability to learn. To overcome this, we used data standardization. Standardization is a scaling technique that assumes the data conforms to a normal distribution; if a given data attribute is normal or close to normal, this is probably the scaling method to use.

The result of standardization is that the features are rescaled to have the properties of a standard normal distribution, with a mean of 0 and a standard deviation of 1. This can be thought of as subtracting the mean value (centering the data) and then dividing by the standard deviation. Standardization can be useful, and is even required by some machine learning algorithms, when the input data values are of different scales. The table below shows the network input conversion for a normal packet:

Table-1
Network packet conversion stages

Attribute           Normal Value    Decimalized Value    Encoded Value
IP Source           192.168.1.5     3232235781            0.53640178
IP Destination      192.168.1.3     3232235779            0
Protocol            6               6                     0
TTL                 128             128                   0.71646104
TCP Window Size     524288          524288               -1.06582338
Destination Port    56783           56783                 1.0261182
Source Port         502             502                  -1.01072698
TCP Length          0               0                    -0.99563837
Modbus Data         FF:00           65280                -0.01348645
Modbus Code         5               5                    -0.88003806
Modbus Register     0               0                    -0.05902683
Modbus Reference    100             100                  -0.13751838

c) Computation Time
The machine used to run the algorithm has an Intel® Core™ i7-3630QM processor @ 2.4 GHz (x64-based, 4 cores, 8 logical processors) with 8 GB of installed memory (RAM). The total learning time (training + testing) was 3228 seconds, i.e., about 54 minutes (Fig. 7).

Fig. 7. Training computation time
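The authors performed the decimal conversion with Excel plugins; the same conversions and the standardization step described in b) Encoding can be sketched in Python as follows. ip_to_int reproduces the "Decimalized Value" column of Table-1 for the IP fields (192.168.1.5 becomes 3232235781), and hex_field_to_int does the same for the Modbus data field (FF:00 becomes 65280). The helper names are hypothetical, and the use of the population standard deviation in standardize is an assumption: the encoded values in Table-1 depend on statistics of the full captured dataset, which are not given, so this sketch does not try to reproduce that column.

```python
import ipaddress
import statistics

def ip_to_int(ip):
    """Dotted-quad IPv4 address -> its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))

def hex_field_to_int(value):
    """Hexadecimal Modbus data such as 'FF:00' or 'FF00' -> decimal integer."""
    return int(value.replace(":", ""), 16)

def standardize(column):
    """z-score: subtract the column mean, then divide by the (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:                      # constant column: every value maps to 0.0
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

print(ip_to_int("192.168.1.5"))      # 3232235781
print(hex_field_to_int("FF:00"))     # 65280
print(standardize([2.0, 4.0, 6.0]))  # mean 0, standard deviation 1
```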
                 IV.RESULTS AND DISCUSSION                        containing a PLC, a local network, a SCADA control and a
                                                                  virtual mockup built of electronic-designed parts, and a IHM
                                                                  for operator interaction. Fig.9 presents the generic schema of
A. Description of the Network                                     the                                                     system.
Our ICS network is composed of the SCADA, PLC, and a
simulated heater process which triggers the network with a
large amount of traffic for gathering and analyzing a real time
data to be shown on the SCADA screen, the reactor diagram is
shown in Fig.9.
         Fig. 8. Reactor diagram with inputs/outputs label

The following table summarizes the system inputs/outputs shown in
Fig. 8.

                             Table-2
             Reactor system inputs and outputs values

     Variable                      Value
     X1                            Opened/Closed
     X2                            Opened/Closed
     Xout                          Opened/Closed
     Coolant Qc                    [0; 500]
     Liquid Height H               [0; 200]
     Liquid Temperature T          [coolant temperature; undefined]
     Reactant concentration        [0; undefined]
     Explosion Notifier            True/False

Using existing approaches of a HIL system and a local network, a
hybrid approach was designed, subject to some constraints, in order
to simulate an industrial environment.

         Fig. 9. General ICS architecture

The PLC controls the virtual mockup. It receives data from the
digital mockup as though they came from a sensor capturing live
measurements of a physical process, such as a fluid heater. It then
uses the received data to calculate a control signal, which is sent
back to the mockup through an analog output.
The SCADA displays the main information about the industrial process
to a supervisor. This information comes from the PLC, which reads the
sensor data and updates the system status. The supervisor uses a PC
to control some functions of the system, such as the water
temperature and the liquid height.
The real network is built around a switch.

B. Proposed Approach
The proposed intrusion detection system considers a general attack
scenario in which malicious packets are injected into a SCADA network
composed of a heater and a PLC. The IDS monitors incoming packets and
decides whether each of them belongs to an attack.
In this work, we consider one of the most widely used industrial
protocols, namely the Modbus protocol.
Our IDS design consists of two main phases: a training phase and a
detection phase. The training phase is performed offline because it
is relatively time-consuming. In the training phase, each Modbus
packet is processed to extract features that represent the normal
behavior of the network. Each training packet carries a label
indicating whether it is normal or malicious, which makes this a
supervised learning problem. We adopt a neural network to learn from
these features. The detection phase works in almost the same way: the
same features are extracted from each incoming packet, and the neural
network uses the trained parameters to make a binary decision, normal
or malicious.
To perform the training phase, we simulated network traffic composed
of real process values on which the neural network could be trained.

    a) Preparing the simulation
The simulation is composed of three virtual machines. The first runs
the process, which is executed every 0.1 s in order to generate high
network traffic. The second runs the SCADA-HMI screen, which displays
the results and can change the temperature. The third runs the PLC
controller, which
is responsible for reading from and writing to the registers and
coils it holds, as shown in Fig. 10. The PLC controls the cooling
flow rate.

    Fig. 10. Simulation of Modbus traffic using virtual machines

The process sends and receives multiple input/output variables. Each
variable corresponds to a Modbus address, and each message carries
that address together with the value sent for the variable. The
addresses and their corresponding variables are shown in Fig. 11.

    Fig. 11. Process input/output values

    b) Capturing the traffic
When the PLC, the process, and the SCADA are running, a high volume
of network packets can be captured using Wireshark. The capture is
then filtered to keep only the Modbus/TCP traffic exchanged between
the machines. An example of these packets is shown in Fig. 12.

    Fig. 12. Modbus/TCP packets capture using Wireshark

The captured PCAP file can be converted into a CSV file using TShark
(a command-line tool installed together with Wireshark). TShark lets
us select only the fields to be saved (IP source, IP destination,
ports, protocols, Modbus data, etc.). After obtaining suitable traffic
and converting it into a CSV file, we can adjust and transform any
field before training the neural network.

C. Results
We split the data into 70% for training and 30% for testing, then
trained the neural network on the prepared dataset using TensorFlow
and Keras. We can then evaluate the network on the same dataset to
obtain the training accuracy and loss. These evaluations show how
well the network performs on the data it is trained on; training
accuracy usually keeps increasing throughout training. Using
TensorFlow visualization on the training and testing datasets, we can
view the accuracy of our approach, which is shown in Fig. 13.

    Fig. 13. Model accuracy during the training of the network

As we can see, the training accuracy increases with the number of
epochs until it reaches approximately 99.89%. This means that about
99.89% of the packets are correctly classified as normal or
malicious. The neural network also learned quickly: within the first
40 epochs, the accuracy reached approximately 100%. This highlights
the importance of converting the data to decimal values and reshaping
it before training the network.
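The paper does not detail its preprocessing. The block below is therefore only a minimal sketch, under stated assumptions, of what "decimalizing and reshaping" and the 70/30 split could look like for a labelled CSV exported with TShark. The following are all assumptions:

- the column names and sample rows;
- the label convention (1 = normal, 0 = malicious, consistent with a rounded output of 1 meaning a normal packet);
- the conversion of IP addresses and hex values to integers;
- the min-max scaling.

The sketch uses only the Python standard library. The resulting lists could be converted to a NumPy array before being passed to a Keras model.

```python
import csv
import io
import ipaddress
import random

# Hypothetical labelled export: one row per Modbus/TCP request,
# label 1 = normal, 0 = malicious (assumed convention).
SAMPLE_CSV = """ip_src,ip_dst,dst_port,func_code,register,value_hex,label
192.168.1.10,192.168.1.20,502,6,1,0x015e,1
192.168.1.10,192.168.1.20,502,3,2,0x0064,1
192.168.1.10,192.168.1.20,502,6,3,0x00c8,1
192.168.1.30,192.168.1.20,502,3,1,0x0150,1
192.168.1.30,192.168.1.20,502,3,2,0x0060,1
192.168.1.10,192.168.1.20,502,6,1,0x0158,1
192.168.1.30,192.168.1.20,502,3,3,0x00c0,1
10.0.0.99,192.168.1.20,502,16,1,0xffff,0
10.0.0.99,192.168.1.20,502,5,7,0xff00,0
192.168.1.10,192.168.1.20,1502,6,9,0x0000,0
"""


def decimalize(row):
    """Convert one CSV row into numeric features (IPs and hex values become integers)."""
    return [
        int(ipaddress.IPv4Address(row["ip_src"])),
        int(ipaddress.IPv4Address(row["ip_dst"])),
        int(row["dst_port"]),
        int(row["func_code"]),
        int(row["register"]),
        int(row["value_hex"], 16),  # base 16 accepts the "0x" prefix
    ]


def split_70_30(features, labels, seed=0):
    """Shuffle the samples and keep 70% for training and 30% for testing."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    cut = round(0.7 * len(order))
    train, test = order[:cut], order[cut:]
    return ([features[i] for i in train], [labels[i] for i in train],
            [features[i] for i in test], [labels[i] for i in test])


def fit_min_max(rows):
    """Per-feature minimum and maximum, computed on the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(rows, lows, highs):
    """Min-max scale with training statistics (training rows land in [0, 1];
    test rows may fall slightly outside). Constant features map to 0.0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
features = [decimalize(r) for r in rows]
labels = [int(r["label"]) for r in rows]

X_train, y_train, X_test, y_test = split_70_30(features, labels)
lows, highs = fit_min_max(X_train)
X_train, X_test = scale(X_train, lows, highs), scale(X_test, lows, highs)
# Each sample is now a flat list of 6 floats, i.e. a (samples, features) matrix.
print(len(X_train), len(X_test))  # 7 3
```

The minimum and maximum are fitted on the training split only. This keeps information from the 30% test split out of the preprocessing; scaling before splitting would leak it slightly.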


Moreover, after each epoch, the model is tested against a validation
set. Keras can set aside a portion of the training data as a
validation dataset and evaluate the model on it after each epoch. The
lower the loss, the better the model. Unlike accuracy, loss is not
expressed as a percentage; it aggregates the errors made on each
example in the training or validation set. Fig. 14 shows the loss
during the training of the network.

       Fig. 14. Model loss during the training of the network

Similar to accuracy, the loss decreases as the number of epochs
increases, until it reaches a value of about 0.005 at the end of
training, which is an almost negligible loss.

To test the neural network on malicious packets, we prepared a large
number of anomalous packets with different combinations of IP
addresses, ports, functions, and values, and injected them into the
IDS. The IDS detected these packets with an accuracy of 99.9%. As an
example of the output Keras produces, injecting a normal packet gives
0.99987454, which rounds to 1, i.e., the packet is classified as
normal.

Compared with self-taught learning (STL) and soft-max regression
(SMR) [9], this result shows a higher performance: the accuracy
reached 97% with SMR and 98.4% with STL, whereas our approach reached
99.9%.

          V. CONCLUSION AND FUTURE WORK

We proposed a deep learning based approach to build an effective and
flexible IDS. A binary classification IDS based on a multi-layer
perceptron was implemented. We used a network dataset that we
simulated to evaluate anomaly detection accuracy, and we observed a
very high anomaly detection accuracy. The performance can be further
enhanced by adding the ability to detect Denial of Service attacks
and by adding time stamps to the fields in order to learn the
intervals at which packets usually arrive.

                   VI. REFERENCES

[1] D. Bisson. (2016, Nov. 13). How to Approach Cyber Security for
Industrial Control Systems. [Online]. Available:
https://www.tripwire.com/state-of-security/ics-security/approach-cyber-security-industrial-control-systems/

[2] W. Ashford. (2014, Oct. 15). Industrial control systems: What are
the security challenges? [Online]. Available:
http://www.computerweekly.com/news/2240232680/Industrial-control-systems-What-are-the-security-challenges

[3] C.-F. Tsai, Y.-F. Hsu, C.-Y. Lin, and W.-Y. Lin, "Intrusion
Detection by Machine Learning: A Review," Expert Systems with
Applications, vol. 36, no. 10, pp. 11994-12000, 2009.

[4] K. Stouffer, V. Pillitteri, S. Lightman, M. Abrams, and A. Hahn,
"Guide to Industrial Control Systems (ICS) Security," Rev. 2,
National Institute of Standards and Technology (NIST), U.S.
Department of Commerce, May 2015.

[5] Characteristics of Industrial Control Systems. [Online].
Available: https://www.citicus.com/Characteristics-of-Industrial-Control-Systems

[6] M. E. Aminanto and K. Kim, "Deep Learning in Intrusion Detection
System: An Overview," School of Computing, KAIST, Korea.

[7] J. Brownlee, Deep Learning With Python: Develop Deep Learning
Models on Theano and TensorFlow Using Keras, v1.7.

[8] K. Willems. (2017, May 2). Keras Tutorial: Deep Learning in
Python. [Online]. Available:
https://www.datacamp.com/community/tutorials/deep-learning-python

[9] Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A Deep Learning
Approach for Network Intrusion Detection System," College of
Engineering, The University of Toledo, USA.

