=Paper= {{Paper |id=Vol-3762/582 |storemode=property |title=Real-Time Intrusion Detection via Machine Learning Approaches |pdfUrl=https://ceur-ws.org/Vol-3762/582.pdf |volume=Vol-3762 |authors=Erik Murtaj,Michela Quadrini,Fausto Marcantoni,Michele Loreti,Hans-Friedrich Witschel |dblpUrl=https://dblp.org/rec/conf/ital-ia/MurtajQMLW24 }} ==Real-Time Intrusion Detection via Machine Learning Approaches== https://ceur-ws.org/Vol-3762/582.pdf
                                Real-Time Intrusion Detection via Machine Learning
                                Approaches
                                Erik Murtaj1 , Fausto Marcantoni1 , Michele Loreti1 , Michela Quadrini1,* and
                                Hans-Friedrich Witschel2
                                1
                                    School of Science and Technology, University of Camerino, Via Madonna delle Carceri, 9, Camerino, 62032, Italy
                                2
                                    FHNW University of Applied Sciences and Arts Northwestern Switzerland, Riggenbachstrasse 16, CH-4600 Olten


                                                  Abstract
                                                  In many cybersecurity contexts, the real-time detection of hostile actions plays a fundamental role in protecting network
                                                  infrastructures. In this scenario, Intrusion Detection Systems (IDS), based on signature-based or anomaly-based detection, are
                                                  widely used to analyze network traffic. Signature-based detection relies on databases of known attack signatures, while
                                                  anomaly detection is mainly based on Artificial Intelligence (AI) techniques. The latter is promising for detecting new kinds of
                                                  cyberattacks in real time.
                                                       In this work, we propose ReTiNA-IDS, a framework that integrates the CICFlowMeter tool with Machine Learning
                                                  techniques to analyze real-time network traffic patterns and detect abnormalities that may suggest a possible intrusion. The
                                                  considered machine learning techniques, random forest and multi-layer network, use selected features to enhance
                                                  efficiency and scalability. To select the features and train the models, we use a version of the public dataset CSE-CIC-IDS2018.
                                                  The framework's effectiveness has been tested in real-case scenarios by identifying different forms of intrusion. Analyzing
                                                  the results, we conclude that the proposed solution shows valuable features.

                                                  Keywords
                                                  Random Forest, Feature Selection, analysis of Real-Time network traffic, Intrusion Detection Systems



1. Introduction

Intrusion Detection Systems (IDS) are relevant tools employed in cybersecurity to protect networks from possible cyber attacks.
   In recent years, the world of cyber security has become more turbulent, with a rise in the number of cyber-attacks that target businesses worldwide. For this reason, new methodologies are continually needed to shield vital assets from hostile actors in response to this expanding danger.
   Recently, there has been an increasing focus on the use of Artificial Intelligence (AI) in cyber security. As a subset of artificial intelligence, machine learning algorithms can improve threat detection and automate procedures. By utilizing machine learning algorithms, organizations may examine massive volumes of data in real time, spot patterns suggestive of malicious behaviour, and take preemptive measures to reduce risks.
   In this work, we propose ReTiNA-IDS, a framework that integrates the CICFlowMeter tool with Machine Learning techniques to analyze real-time network traffic patterns and detect abnormalities that may suggest a possible intrusion. The integrated methodology, which relies on random forest and multi-layer networks, uses selected features to enhance efficiency and scalability. To select the features and train the models, the public dataset CSE-CIC-IDS2018 has been used. The framework's effectiveness has been tested in real-case scenarios by identifying different forms of intrusion. Analyzing the results, we conclude that the proposed solution shows valuable features.
   The paper is structured as follows. In Section 2 related works are discussed, while in Section 3 some basic background is introduced. In Section 4 the tool ReTiNA-IDS is presented, while in Section 5 some evaluation experiments are described. Section 6 concludes the paper.
Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Michele Loreti
† These authors contributed equally.
Email: erik.murtaj@studenti.unicam.it (E. Murtaj); fausto.marcantoni@unicam.it (F. Marcantoni); michele.loreti@unicam.it (M. Loreti); michela.quadrini@unicam.it (M. Quadrini); hansfriedrich.witschel@fhnw.ch (H. Witschel)
ORCID: 0000-0002-7779-203X (F. Marcantoni); 0000-0003-3061-863X (M. Loreti); 0000-0003-0539-0290 (M. Quadrini); 0000-0002-8608-9039 (H. Witschel)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

2. Related Works

The use of machine learning approaches in intrusion detection systems to obtain real-time analysis has been explored by many researchers. Many of them take advantage of Deep Learning (DL) approaches. ARCADE, proposed by Lunardi et al. [1], is an unsupervised DL-based approach for early anomaly detection using 1D Convolutional Neural Networks (CNNs). The approach builds a profile of normal traffic based on raw packet bytes. Kathareios et al. designed and tested a real-time network anomaly detection (AD) system, able to operate on encrypted and non-encrypted network packets, based on two learning stages: an autoencoder for adaptive unsupervised AD and a custom nearest-neighbour classifier to filter false positives [2]. Shuai proposed a prototype that combines big data processing frameworks like Apache Hadoop, Apache Kafka, and Apache Storm, along with ML techniques, i.e., Naïve Bayesian (NB), Support Vector Machine (SVM), and Decision Tree (DT). The proposed approach considers six features related to the IP addresses of the sender, receiver, and corresponding port without taking into account flow measurements. Ho et al. suggested an Intrusion Detection System (IDS) based on CNN that classifies all packet traffic as benign or malicious, detecting network intrusions [3]. Atefinia and Ahmadi proposed a modular deep neural network model that consists of four complete architectures that are combined with an aggregator module, each generating distinct outputs [4]. The four architectures are a Deep Feed-Forward Module (DFFM), a Stacked Restricted Boltzmann Machine Module (SRBMM), and two recurrent modules, one utilizing gated recurrent units (GRUM) and the other utilizing long short-term memory (LSTMM). Catillo et al. [5] proposed an approach based on Deep Autoencoder, and Fitni and Ramli [6] proposed a model based on decision trees that takes into account 23 features selected by Spearman's rank correlation coefficient [7]. Gamage and Samarabandu considered four DL architectures, i.e., feed-forward neural network, autoencoder, deep belief network, and LSTM [8]. Karatas et al. in [9] reviewed the implementation of a Synthetic Minority Oversampling Technique (SMOTE) [10] to balance the data by exploiting six models. Kanimozhi and Jacob presented a two-layer MLP to detect only botnet attacks that exploits a grid search for hyper-parameter optimization and a 10-fold cross-validation for mitigating the overfitting problems [11]. Huancayo Ramos et al. extended this approach by considering botnet data and Random Forests. Kim et al. also designed a model that exploits CNN for training on a single type of attack, specifically Denial of Service (DoS) attacks [12].

3. Background

In this section, we present CICFlowMeter, an Ethernet traffic Bi-flow generator and analyzer for anomaly detection, and the Random Forest, a machine learning method used for classifying flow data and evaluating the importance of features. This classifier will then be integrated into CICFlowMeter for classifying network flows.

3.1. CICFlowMeter

CICFlowMeter is a network traffic flow generator and analyser [13, 14]. It generates bidirectional flows, where the first packet determines the forward (source to destination) and backward (destination to source) directions. The tool enables the extraction of more than 80 statistical network traffic features such as Duration, Number of packets, Number of bytes, Length of packets, etc. Such features can be calculated independently for both directions. The tool is developed in Java and provides a useful Graphical User Interface (GUI), shown in Figure 1, to monitor network flows in real time. TCP flows are usually terminated upon connection teardown (by FIN packet), while a flow timeout terminates UDP flows [15].

Figure 1: Example of the CICFlowmeter's GUI

3.2. Machine Learning Approaches and Feature Selection

The Random Forest is an ML ensemble model used for both classification and regression tasks. During training, the model creates numerous decision trees and determines the output by either the mode of the classes predicted by the individual trees (for classification) or the mean of their predictions (for regression). Introduced by Breiman in [16], this approach combines the bagging technique with the random selection of features. Such a random selection reduces the correlation among the decision trees within the forest. In the bagging phase, decision trees are constructed from bootstrap samples of the training dataset, where each sample is drawn with replacement, allowing for the possibility of repeated samples. These replicated datasets are then used to train decision trees, so that each tree sees a different portion of the original dataset during training. This bagging approach is coupled with random feature selection, which involves using distinct random subsets of the entire feature space to train each tree in the random forest. Usually, around √n features are employed in each split for a classification task that considers n features.

3.3. Dataset: CSE-CIC-IDS2018

The data used in this study is the CSE-CIC-IDS2018, a benchmark dataset for the evaluation of IDSs. Such data was collected by the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC). The recorded data consists of ten days of traffic and includes seven types of attacks. Liu et al. identified some issues in this dataset related to its creation lifecycle, including attack orchestration, feature generation, documentation, and labelling, and proposed to reconstruct the dataset by deleting artefacts and correcting the labelling logic, including corrected implementations of existing features and new features that capture valuable flow state information [17]. Table 1 reports the amount of corrupted data.

Table 1
Corruption Rate of Different Attacks on the CSE-CIC-IDS2018 dataset [17]
         Attack Type          Corruption Rate (%)
         Bot                  50.06
         Web - Brute Force    53.85
         Web Attack - XSS     50.43
         DoS Attacks          >50
         DDoS Attacks         >50
         FTP-Patator          100.00
         Infiltration         76.84
         SQL Injection        54.02
         SSH-Patator          49.97

3.4. Metrics

We evaluate the performance and effectiveness of the approaches by using Precision (P), Recall (R) and F1-score (F_1), defined as follows

$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$F_1 = 2 \cdot \frac{P \cdot R}{P + R}$$

where TP represents the number of true positives, FN denotes the number of false negatives, FP represents the number of false positives, and TN denotes the number of true negatives.

4. ReTiNA-IDS Approach

ReTiNA-IDS (Real-Time anomaly Detection IDS Approach) integrates an ML model mainly based on Random Forest into the CICFlowMeter tool to detect cyber-attacks in real time and act as a simple IDS. The Random Forest classifier considers only 13 of the 80 features calculated by the CICFlowMeter tool. The list of features, selected by another Random Forest model, is reported with the relative descriptions in Table 2. After being trained, the model has been exported in PMML format with the use of the "sklearn-pmml-model" library for Sklearn [18]. The exported model is then imported into CICFlowMeter, which is developed in Java.

Table 2
The first 13 attributes ordered by importance
    Id  Attribute               Description
    1   FWD Init Win Bytes      The total number of bytes sent in initial window in the forward direction
    2   Packet Length Std       Standard deviation length of a packet
    3   Packet Length Mean      Mean length of a packet
    4   Bwd Packet Length Std   Standard deviation size of packet in backward direction
    5   Bwd Packet Length Max   Maximum size of packet in backward direction
    6   Bwd PSH Flags           Number of times the PSH flag was set in packets travelling in the backward direction
    7   ACK Flag Count          Number of packets with ACK
    8   Fwd Seg Size Min        Minimum segment size observed in the forward direction
    9   Fwd PSH Flags           Number of times the PSH flag was set in packets travelling in the forward direction
    10  CWR Flag Count          Number of packets with CWR
    11  Packet Length Variance  Variance length of a packet
    12  Fwd Packet Length Max   Maximum size of packet in forward direction
    13  Bwd Packet Length Mean  Mean size of packet in backward direction

4.1. ML Pipeline

The proposed approach is based on Random Forest, described in Section 3.2, and its scheme is shown in Figure 2.

Figure 2: Pipeline of our Approach

4.1.1. Data Preprocessing

In this study, the dataset used is a revised version of CSE-CIC-IDS2018, as introduced in Section 3.3. The dataset consists of the network traffic captured over ten days, stored in 10 distinct files according to the day of data capture, as shown in Table 3.
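
As a rough illustration of how the per-day files might be combined and cleaned (rows with missing or infinite values removed, identifier-like columns dropped), consider the Pandas sketch below. It is not the authors' code. It assumes the daily files are CSV flow exports, and the column names in `ID_COLUMNS`, the `Label` column, and the helpers `clean_flows` and `load_days` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical identifier-like columns; the real CSV headers may differ.
ID_COLUMNS = ["Src IP", "Dst IP", "Timestamp", "Protocol", "Src Port", "Dst Port"]


def clean_flows(df, label_col="Label"):
    """Drop identifier columns and rows with missing or infinite values."""
    df = df.drop(columns=[c for c in ID_COLUMNS if c in df.columns])
    features = df.columns.drop(label_col)
    # Coerce feature columns to numbers; unparsable entries become NaN.
    df[features] = df[features].apply(pd.to_numeric, errors="coerce")
    # Treat +/- infinity as missing, then drop every incomplete row.
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    return df.reset_index(drop=True)


def load_days(paths, label_col="Label"):
    """Concatenate the per-day CSV files into one cleaned DataFrame."""
    frames = [pd.read_csv(p, low_memory=False) for p in paths]
    return clean_flows(pd.concat(frames, ignore_index=True), label_col)
```

A call such as `load_days(["day01.csv", "day02.csv"])` (file names are placeholders) would return a single frame containing only numeric feature columns and the label.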
Table 3
CSE-CIC-IDS2018 files
         Id  File Name               Size
         1   Wednesday-14-02-2018    3.03 GB
         2   Thursday-15-02-2018     2.18 GB
         3   Friday-16-02-2018       3.92 GB
         4   Tuesday-20-02-2018      3.19 GB
         5   Wednesday-21-02-2018    3.68 GB
         6   Thursday-22-02-2018     3.23 GB
         7   Friday-23-02-2018       3.17 GB
         8   Wednesday-28-02-2018    3.54 GB
         9   Thursday-01-03-2018     3.54 GB
         10  Friday-02-03-2018       3.43 GB

   The first step of the preprocessing consists of data cleaning, i.e., removing rows with missing values (incomplete rows) and rows containing invalid (or infinite) numerical values. Moreover, many features that are not relevant for spotting cyber-attacks have been eliminated, such as the IP address of the sender and receiver, the connection timestamp, the protocol type, and the destination/sender port. Furthermore, the traffic data related to Web Attacks is deleted since its volume is insufficient.

4.1.2. Data Balancing and Data Augmentation

The collected data related to network traffic is substantially unbalanced: benign traffic is more prevalent than malicious traffic. To balance the data, we have used one step of the bootstrapping procedure, implemented in the resample function of Sklearn. Due to the corrupted data in the original dataset, the revised dataset does not contain data related to FTP Brute Force attacks. Therefore, we have added this kind of data by collecting it during a simulation of brute force attacks via FTP (File Transfer Protocol). The simulation involved the use of a Windows host (victim machine) and a Kali Linux host (attacker machine), both in the same local area network (connected to the same router). The victim machine runs a FileZilla server, an open-source software utility that facilitates the transmission of files using FTP. It enables users to establish their own FTP servers or connect to existing FTP servers to exchange data. The victim machine accepts connections on port 21, which is the target of the attack. When the FileZilla server on the victim machine is running, the Kali Linux host performs a brute-force attack using Patator, a multi-purpose brute-forcer tool [19]. Table 4 shows the amount of data for each kind of traffic after the cleaning and balancing phases.

Table 4
Amount of data per network traffic class
         Class                Count
         BENIGN               145904
         DoS Attack           145904
         BruteForce Attack    99147
         PortScan Attack      49740
         BotNet Attack        142921
         Total                583616

4.1.3. Feature Selection and Classifier

To select the features, a Random Forest has been considered and implemented by setting the depth of each decision tree and the number of estimators to 16 and 20, respectively. To avoid possible issues related to overfitting, we use 5-fold cross-validation. Figure 3 shows the obtained confusion matrix.

Figure 3: Confusion Matrix of the Random Forest Classifier

   The performance of the model, evaluated in terms of Precision, Recall and F1-score, is shown in Table 5.

Table 5
Classification Performance Metrics Random Forest
         Class                           Precision   Recall   F1-score
         BENIGN                          1.00        1.00     1.00
         Botnet Ares                     1.00        1.00     1.00
         BruteForce Attack               1.00        1.00     1.00
         DoS Attack                      1.00        1.00     1.00
         Infiltration - NMAP Portscan    0.99        1.00     1.00
         Accuracy                        1.00

5. Experimental Setup

The ML models have been implemented in a Google Colab document with Python 3. The default CPU in the environment is an Intel Xeon CPU equipped with 2 virtual CPUs (vCPUs) and 13GB of RAM [20]. For this study, the configuration involved the utilization of extra RAM, resulting in a total memory capacity of 50GB (included with Google Colab Pro [20]).
   For data handling, preprocessing, analysis, training, and evaluation metrics, the proposed model was built and evaluated using Numpy [21], Pandas [22], and Scikit Learn [23]. Matplotlib [24] was used to visualize the data. The testing phase for this study used a Windows machine with the following specifications: an Intel Core i5-4670 CPU at 3.40GHz, 16 GB of DDR4 memory and a Nvidia GTX 1050 Ti GPU.
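
To make the training workflow more concrete, the sketch below strings together class balancing with Sklearn's `resample`, Random Forest feature ranking (20 estimators, depth 16), and 5-fold cross-validated Precision, Recall and F1. It is a minimal illustration under assumptions, not the authors' notebook. The data is synthetic, the per-class target size is arbitrary, and the helper names `balance`, `top_k_features` and `evaluate` are hypothetical. How `resample` was applied per class in the original study is not stated, so one draw per class is assumed here.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_predict
from sklearn.utils import resample


def balance(df, label_col, n_per_class, seed=0):
    """Resample every class to n_per_class rows (one bootstrap draw per class)."""
    parts = []
    for _, group in df.groupby(label_col):
        # Sample with replacement only when the class is too small.
        parts.append(resample(group, replace=len(group) < n_per_class,
                              n_samples=n_per_class, random_state=seed))
    return pd.concat(parts, ignore_index=True)


def top_k_features(X, y, k=13, seed=0):
    """Rank features with a Random Forest and keep the k most important."""
    forest = RandomForestClassifier(n_estimators=20, max_depth=16, random_state=seed)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]
    return list(X.columns[order[:k]])


def evaluate(X, y, seed=0):
    """Return the 5-fold cross-validated per-class report as a dict."""
    forest = RandomForestClassifier(n_estimators=20, max_depth=16, random_state=seed)
    y_pred = cross_val_predict(forest, X, y, cv=5)
    return classification_report(y, y_pred, output_dict=True)


# Synthetic, imbalanced stand-in for the flow features (not the real dataset).
X_raw, y_raw = make_classification(n_samples=3000, n_features=30, n_informative=10,
                                   n_classes=3, weights=[0.8, 0.1, 0.1], random_state=0)
data = pd.DataFrame(X_raw, columns=[f"f{i}" for i in range(30)])
data["Label"] = y_raw

balanced = balance(data, "Label", n_per_class=500)
selected = top_k_features(balanced.drop(columns="Label"), balanced["Label"])
report = evaluate(balanced[selected], balanced["Label"])
print(selected)
print({k: round(v["f1-score"], 2) for k, v in report.items() if k != "accuracy"})
```

Note that oversampling with replacement duplicates rows, so copies of the same row can fall into both training and validation folds. Scores obtained from a sketch like this may therefore be optimistic.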
5.1. Testing

ReTiNA-IDS, a tool that integrates an ML model into CICFlowMeter, analyzes data patterns and distinguishes benign traffic from malicious traffic. The testing phase of ReTiNA-IDS intends to assess the efficiency and efficacy of the machine learning model in real-world network situations. To perform the simulations, we take advantage of the Graphical Network Simulator-3 (GNS3) software, an open-source network simulation tool used for creating, modelling, and testing virtual and real networks [25]. To this end, we create a simple network composed of a Cisco router [26] and two generic switches, outlining two different areas of a hypothetical Local Area Network (LAN), a Windows machine and a Kali Linux machine. Figure 4 shows the network infrastructure.

Figure 4: Network structure in GNS3 for testing simulations

   The Windows machine represents the hypothetical victim running the ReTiNA-IDS tool, acting as an IDS, while the Kali Linux machine plays the role of attacker. The victim machine is a Windows 10 host, while the Kali Linux version used is Kali 2023.4.
   Different attack simulations were performed, each one resulting in a positive detection by the tool:

     • DoS attacks
     • File Transfer Protocol (FTP) and Secure SHell (SSH) Bruteforce attacks
     • Portscan attacks

   Additionally, more tests were performed with the tool running in a normal traffic situation (without performing any cyberattack) in a local network and in the University of Camerino's network, for a total of around 5 hours of workload. The purpose of letting the tool run for hours on end was to see whether any crashes occurred during execution and to spot any false positive results. During the experiments, zero false positives were identified.

6. Conclusion and Future Work

In this work, we have presented ReTiNA-IDS, a tool that integrates an ML model into CICFlowMeter, which analyzes data patterns and distinguishes benign traffic from malicious traffic in real time. The ML model is based on a Random Forest, used to select features and to classify the data. The testing phase, performed by running the tool in a normal traffic situation (without performing any cyberattack) in a local network and the University of Camerino's network, shows that the tool did not raise false positives.
   In the near future, we intend to test the approach on botnet traffic to investigate the performance of ReTiNA-IDS. To this end, we intend to create a central server to control potentially infected hosts. Moreover, we plan to consider other machine learning models, both supervised and unsupervised. Finally, motivated by the results obtained for modelling and verifying properties of Collective Adaptive Systems [27, 28, 29], we intend to define formal approaches to specify and verify properties of the data traffic, in order to monitor the traffic and identify anomalous patterns.

Acknowledgements. This work has been funded by the European Union - NextGenerationEU under the Italian Ministry of University and Research (MUR) National Innovation Ecosystem grant ECS00000041 - VITALITY - CUP J13C22000430001

References

 [1] W. T. Lunardi, M. A. Lopez, J.-P. Giacalone, Arcade: Adversarially regularized convolutional autoencoder for network anomaly detection, IEEE Transactions on Network and Service Management (2022).
 [2] G. Kathareios, A. Anghel, A. Mate, R. Clauberg, M. Gusat, Catch it if you can: Real-time network anomaly detection with low false alarm rates, in: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, 2017, pp. 924–929.
 [3] S. Ho, S. Al Jufout, K. Dajani, M. Mozumdar, A novel intrusion detection model for detecting known and innovative cyberattacks using convolutional neural network, IEEE Open Journal of the Computer Society 2 (2021) 14–25.
 [4] R. Atefinia, M. Ahmadi, Network intrusion detection using multi-architectural modular deep neural network, The Journal of Supercomputing 3571–3593 (2020).
 [5] M. Catillo, M. Rak, U. Villano, 2l-zed-ids: A two-level anomaly detector for multiple attack classes, in: Web, Artificial Intelligence and Network Applications: Proceedings of the Workshops of the 34th International Conference on Advanced Information Networking and Applications (WAINA-2020), Springer, 2020, pp. 687–696.
 [6] Q. R. S. Fitni, K. Ramli, Implementation of ensemble learning and feature selection for performance improvements in anomaly-based intrusion detection systems, in: 2020 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), IEEE, 2020, pp. 118–124.
 [7] W. W. Daniel, The spearman rank correlation coefficient, Biostatistics: A Foundation for Analysis in the Health Sciences (1987).
 [8] S. Gamage, J. Samarabandu, Deep learning methods in network intrusion detection: A survey and an objective comparison, Journal of Network and Computer Applications 169 (2020) 102767. doi:10.1016/j.jnca.2020.102767.
 [9] G. Karatas Baydogmus, O. Demir, O. Sahingoz, Increasing the performance of machine learning-based idss on an imbalanced and up-to-date dataset, IEEE Access PP (2020) 1–1. doi:10.1109/ACCESS.2020.2973219.
[10] B. Jason, Smote for imbalanced classification with python, 2021.
[11] V. Kanimozhi, T. P. Jacob, Artificial intelligence based network intrusion detection with hyper-parameter optimization tuning on the realistic cyber dataset cse-cic-ids2018 using cloud computing, in: 2019 international conference on communication and signal processing (ICCSP), IEEE, 2019, pp. 0033–0036.
[12] J. Kim, J. Kim, H. Kim, M. Shim, E. Choi, Cnn-based network intrusion detection against denial-of-service attacks, Electronics 9 (2020) 916.
[13] A. H. Lashkari, G. D. Gil, M. S. I. Mamun, A. A. Ghorbani, Characterization of tor traffic using time based features, in: International Conference on Information Systems Security and Privacy, volume 2, SciTePress, 2017, pp. 253–262.
[14] G. Draper-Gil, A. H. Lashkari, M. S. I. Mamun, A. A. Ghorbani, Characterization of encrypted and vpn traffic using time-related, in: Proceedings of the 2nd international conference on information systems security and privacy (ICISSP), 2016, pp. 407–414.
[15] U. of New Brunswick | UNB, Applications | research | canadian institute for cybersecurity | unb, 2017. URL: https://www.unb.ca/cic/research/applications.html.
[16] L. Breiman, Random forests, Machine learning 45 (2001) 5–32.
[17] L. Liu, G. Engelen, T. Lynar, D. Essam, W. Joosen, Error prevalence in nids datasets: A case study on cic-ids-2017 and cse-cic-ids-2018, in: 2022 IEEE Conference on Communications and Network Security (CNS), IEEE, 2022, pp. 254–262.
[18] scikit-learn: machine learning in python — scikit-learn 1.4.1 documentation, 2024. URL: https://scikit-learn.org/stable/index.html.
[19] Kali linux tools, patator, 2024. URL: https://www.kali.org/tools/patator/.
[20] Google, Google colab, 2024. URL: https://research.google.com/colaboratory/faq.html.
[21] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, T. E. Oliphant, Array programming with NumPy, Nature 585 (2020) 357–362. URL: https://doi.org/10.1038/s41586-020-2649-2. doi:10.1038/s41586-020-2649-2.
[22] T. pandas development team, pandas-dev/pandas: Pandas, 2020. URL: https://doi.org/10.5281/zenodo.3509134. doi:10.5281/zenodo.3509134.
[23] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[24] J. D. Hunter, Matplotlib: A 2d graphics environment, Computing in Science & Engineering 9 (2007) 90–95. doi:10.1109/MCSE.2007.55.
[25] S. Worldwide, Gns3 documentation, 2024. URL: https://docs.gns3.com/docs/.
[26] Cisco 3600 series - cisco, 2015. URL: https://www.cisco.com/c/en/us/td/docs/ios/12_2/12_2x/12_2xa/release/notes/rn3600xa.html.
[27] M. Loreti, M. Quadrini, A spatial logic for simplicial models, Log. Methods Comput. Sci. 19 (2023).
[28] N. Del Giudice, L. Matteucci, M. Quadrini, A. Rehman, M. Loreti, Sibilla: A tool for reasoning about collective systems, Science of Computer Programming (2024) 103095.
[29] N. D. Giudice, L. Matteucci, M. Quadrini, A. Rehman, M. Loreti, Sibilla: A tool for reasoning about collective systems, in: Coordination Models and Languages - 24th IFIP WG 6.1 International Conference, COORDINATION 2022, Held as Part of the 17th International Federated Conference on Distributed Computing Techniques, DisCoTec 2022, Lucca, Italy, June 13-17, 2022, Proceedings, 2022, pp. 92–98. doi:10.1007/978-3-031-08143-9_6.