=Paper=
{{Paper
|id=Vol-3762/582
|storemode=property
|title=Real-Time Intrusion Detection via Machine Learning Approaches
|pdfUrl=https://ceur-ws.org/Vol-3762/582.pdf
|volume=Vol-3762
|authors=Erik Murtaj,Michela Quadrini,Fausto Marcantoni,Michele Loreti,Hans-Friedrich Witschel
|dblpUrl=https://dblp.org/rec/conf/ital-ia/MurtajQMLW24
}}
==Real-Time Intrusion Detection via Machine Learning Approaches==
Erik Murtaj (1), Fausto Marcantoni (1), Michele Loreti (1), Michela Quadrini (1,*) and Hans-Friedrich Witschel (2)
(1) School of Science and Technology, University of Camerino, Via Madonna delle Carceri, 9, Camerino, 62032, Italy
(2) FHNW University of Applied Sciences and Arts Northwestern Switzerland, Riggenbachstrasse 16, CH-4600 Olten
Abstract
In many cybersecurity contexts, the real-time detection of hostile actions plays a fundamental role in protecting network infrastructures. In this scenario, Intrusion Detection Systems (IDS), based on signature-based or anomaly detection, are widely used to analyze network traffic. Signature-based detection relies on databases of known attack signatures, while anomaly detection is mainly based on Artificial Intelligence (AI) techniques. The latter is promising for detecting new kinds of cyberattacks in real time.
In this work, we propose ReTiNA-IDS, a framework that integrates the CICFlowMeter tool with Machine Learning techniques to analyze real-time network traffic patterns and detect abnormalities that may suggest a possible intrusion. The considered machine learning techniques, random forest and multi-layer network, are based on selected features to enhance efficiency and scalability. To select the features and train the models, we use a version of the public dataset CSE-CIC-IDS2018. The framework's effectiveness has been tested in real-case scenarios by identifying different forms of intrusion. Analyzing the results, we conclude that the proposed solution shows valuable features.
Keywords
Random Forest, Feature Selection, analysis of Real-Time network traffic, Intrusion Detection Systems
1. Introduction
Intrusion Detection Systems (IDS) are relevant tools employed in cybersecurity to protect networks from possible cyber attacks.
In recent years, the world of cyber security has become more turbulent, with a rise in the number of cyber-attacks that target businesses worldwide. For this reason, new methodologies are continually needed to shield vital assets from hostile actors in reaction to this expanding danger.
Recently, there has been an increasing focus on the use of Artificial Intelligence (AI) in cyber security. As a subset of artificial intelligence, machine learning algorithms can improve threat detection and automate procedures. By utilizing machine learning algorithms, organizations may examine massive volumes of data in real time, spot patterns suggestive of malicious behaviour, and take preemptive measures to reduce risks.
In this work, we propose ReTiNA-IDS, a framework that integrates the CICFlowMeter tool with Machine Learning techniques to analyze real-time network traffic patterns and detect abnormalities that may suggest a possible intrusion. The integrated methodology, which relies on random forest and multi-layer networks, uses selected features to enhance efficiency and scalability. To select the features and train the models, the public dataset CSE-CIC-IDS2018 has been used. The framework's effectiveness has been tested in real-case scenarios by identifying different forms of intrusion. Analyzing the results, we conclude that the proposed solution shows valuable features.
The paper is structured as follows. In Section 2 related works are discussed, while in Section 3 some basic background is introduced. In Section 4 the tool ReTiNA-IDS is presented, while in Section 5 some evaluation experiments are proposed. Section 6 concludes the paper.
Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Michele Loreti
† These authors contributed equally.
Email: erik.murtaj@studenti.unicam.it (E. Murtaj); fausto.marcantoni@unicam.it (F. Marcantoni); michele.loreti@unicam.it (M. Loreti); michela.quadrini@unicam.it (M. Quadrini); hansfriedrich.witschel@fhnw.ch (H. Witschel)
ORCID: 0000-0002-7779-203X (F. Marcantoni); 0000-0003-3061-863X (M. Loreti); 0000-0003-0539-0290 (M. Quadrini); 0000-0002-8608-9039 (H. Witschel)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

2. Related Works
The use of machine learning approaches in intrusion detection systems to obtain real-time analysis has been explored by many researchers. Many of them take advantage of Deep Learning (DL) approaches. ARCADE, proposed by Lunardi et al. [1], is an unsupervised DL-based approach for early anomaly detection using 1D Convolutional Neural Networks (CNNs). The approach builds a profile of normal traffic based on raw packet bytes. Kathareios et al. designed and tested a real-time network anomaly detection (AD) system, able to operate on encrypted and non-encrypted network packets, based on two learning stages: an autoencoder for adaptive unsupervised AD and a custom nearest-neighbour classifier to filter false positives [2]. Shuai proposed a prototype that combines big data processing frameworks like Apache Hadoop, Apache Kafka, and Apache Storm, along with ML techniques, i.e., Naïve Bayesian (NB), Support Vector Machine (SVM), and Decision Tree (DT). The proposed approach considers six features related to the IP addresses of the sender, receiver, and correspondent port without taking into account flow measurements. Ho et al. suggested an Intrusion Detection System (IDS) based on CNN that classifies all packet traffic as benign or malicious, detecting network intrusions [3]. Atefinia and Ahmadi proposed a modular deep neural network model that consists of four complete architectures, each generating distinct outputs, that are combined with an aggregator module [4]. The four architectures are a Deep Feed-Forward Module (DFFM), a Stacked Restricted Boltzmann Machine Module (SRBMM), and two recurrent modules, one utilizing gated recurrent units (GRUM) and the other utilizing long short-term memory (LSTMM). Catillo et al. [5] proposed an approach based on Deep Autoencoder, and Fitni and Ramli [6] proposed a model based on decision trees that takes into account 23 features selected by Spearman's rank correlation coefficient [7]. Gamage and Samarabandu considered four DL architectures, i.e., feed-forward neural network, autoencoder, deep belief network, and LSTM [8]. Karatas et al. [9] reviewed the implementation of a Synthetic Minority Oversampling Technique (SMOTE) [10] to balance the data by exploiting six models. Kanimozhi and Jacob presented a two-layer MLP to detect only botnet attacks that exploits a grid search for hyper-parameter optimization and a 10-fold cross-validation for mitigating the overfitting problems [11]. Huancayo Ramos et al. extended this approach by considering botnet data and Random Forests. Kim et al. also designed a model that exploits CNN for training on a single type of attack, specifically Denial of Service (DoS) attacks [12].

3. Background
In this section, we present the CICFlowMeter, an Ethernet traffic Bi-flow generator and analyzer for anomaly detection, and the Random Forest, a machine learning method used for classifying flow data and evaluating the importance of features. This classifier will then be integrated into CICFlowMeter for classifying network flows.

3.1. CICFlowMeter
CICFlowMeter is a network traffic flow generator and analyser [13, 14]. It generates bidirectional flows, where the first packet determines the forward (source to destination) and backward (destination to source) directions. The tool enables the extraction of more than 80 statistical network traffic features such as Duration, Number of packets, Number of bytes, Length of packets, etc. Such features can be calculated independently for both directions. The tool is developed in Java and provides a useful Graphical User Interface (GUI), shown in Figure 1, to monitor network flows in real time. TCP flows are usually terminated upon connection teardown (by FIN packet), while a flow timeout terminates UDP flows [15].
Figure 1: Example of the CICFlowMeter's GUI

3.2. Machine Learning Approaches and Feature Selection
The Random Forest is an ML ensemble model used for both classification and regression tasks. During training, the model creates numerous decision trees and determines the output by either the mode (for classification) or the mean prediction (for regression) of the outputs predicted by individual trees. Introduced by Breiman in [16], this approach combines the bagging technique with the random selection of features. Such a random selection helps to decorrelate the decision trees within the forest. In the bagging phase, decision trees are constructed from bootstrap samples of the training dataset, where each sample is drawn with replacement, allowing for the possibility of repeated samples. These replicated datasets are then used to train decision trees, ensuring that each tree sees a different portion of the original dataset during training. This bagging approach is coupled with random feature selection,
which involves using distinct random subsets of the entire feature space to train each tree in the random forest. Usually, around <math>\sqrt{n}</math> features are employed in each split for a classification task that considers <math>n</math> features.

3.3. Dataset: CSE-CIC-IDS2018
The data used in this study is the CSE-CIC-IDS2018, a benchmark dataset for the evaluation of IDSs. Such data was collected by the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC). The recorded data consists of ten days of traffic and includes seven types of attacks. Liu et al. identified some issues in this dataset related to the creation lifecycle, including attack orchestration, feature generation, documentation, and labelling, and proposed to reconstruct the datasets by deleting artefacts and correcting the labelling logic, including corrected implementations of existing features and new features that capture valuable flow state information [17]. Table 1 reports the amount of corrupt data.

Table 1
Corruption Rate of Different Attacks on the CSE-CIC-IDS2018 dataset [17]
Attack Type | Corruption Rate (%)
Bot | 50.06
Web - Brute Force | 53.85
Web Attack - XSS | 50.43
DoS Attacks | >50
DDoS Attacks | >50
FTP-Patator | 100.00
Infiltration | 76.84
SQL Injection | 54.02
SSH-Patator | 49.97

3.4. Metrics
We evaluate the performance and effectiveness of the approaches by using Precision (<math>P</math>), Recall (<math>R</math>) and F1-score (<math>F_1</math>), defined as follows
<math>P = \frac{TP}{TP + FP}</math>
<math>R = \frac{TP}{TP + FN}</math>
<math>F_1 = 2 \cdot \frac{P \cdot R}{P + R}</math>
where <math>TP</math> represents the number of true positives, <math>FN</math> denotes the number of false negatives, <math>FP</math> represents the number of false positives, and <math>TN</math> denotes the number of true negatives.

4. ReTiNA-IDS Approach
ReTiNA-IDS (Real-Time anomaly Detection IDS Approach) integrates an ML model mainly based on Random Forest into the CICFlowMeter tool to detect cyber-attacks in real time and act as a simple IDS. The Random Forest classifier considers only 13 of the 80 features calculated by the CICFlowMeter tool. The list of features with the relative description, selected by another Random Forest model, is in Table 2. After being trained, the model has been exported in PMML format with the use of the "sklearn-pmml-model" library from Sklearn [18]. The exported model is then imported into CICFlowMeter, which is developed in Java.

Table 2
The first 13 attributes ordered by importance
Id | Attribute | Description
1 | FWD Init Win Bytes | The total number of bytes sent in initial window in the forward direction
2 | Packet Length Std | Standard deviation length of a packet
3 | Packet Length Mean | Mean length of a packet
4 | Bwd Packet Length Std | Standard deviation size of packet in backward direction
5 | Bwd Packet Length Max | Maximum size of packet in backward direction
6 | Bwd PSH Flags | Number of times the PSH flag was set in packets travelling in the backward direction
7 | ACK Flag Count | Number of packets with ACK
8 | Fwd Seg Size Min | Minimum segment size observed in the forward direction
9 | Fwd PSH Flags | Number of times the PSH flag was set in packets travelling in the forward direction
10 | CWR Flag Count | Number of packets with CWR
11 | Packet Length Variance | Variance length of a packet
12 | Fwd Packet Length Max | Maximum size of packet in forward direction
13 | Bwd Packet Length Mean | Mean size of packet in backward direction

4.1. ML Pipeline
The proposed approach is based on Random Forest, described in Section 3.2, and its scheme is shown in Figure 2.
Figure 2: Pipeline of our Approach

4.1.1. Data Preprocessing
In this study, the dataset used is a revised version of the CSE-CIC-IDS2018, as introduced in Section 3.3. The dataset consists of the network traffic captured over ten days, stored in 10 distinct files according to the day of data capture, as shown in Table 3.
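Working with one file per capture day usually starts by stacking the files into a single table. The following is a minimal sketch of that step, not the authors' code: it assumes the files are CSV exports, and the helper name `load_daily_captures`, the column names `Flow Duration` and `Label`, and the extra `source_file` column are all hypothetical. Two tiny stand-in files are written to a temporary directory so the sketch runs without the real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_daily_captures(data_dir, pattern="*.csv"):
    """Read every per-day file in data_dir and stack them into one DataFrame."""
    files = sorted(Path(data_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern} in {data_dir}")
    frames = []
    for path in files:
        frame = pd.read_csv(path)
        frame["source_file"] = path.stem  # remember which capture day a row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Stand-in files (hypothetical content) named after two of the capture days.
demo_dir = Path(tempfile.mkdtemp())
pd.DataFrame(
    {"Flow Duration": [10, 20], "Label": ["BENIGN", "DoS"]}
).to_csv(demo_dir / "Wednesday-14-02-2018.csv", index=False)
pd.DataFrame(
    {"Flow Duration": [30], "Label": ["BENIGN"]}
).to_csv(demo_dir / "Thursday-15-02-2018.csv", index=False)

flows = load_daily_captures(demo_dir)
label_counts = flows["Label"].value_counts()  # quick look at the class distribution
print(label_counts)
```

Keeping the originating file name in a column is a design choice of this sketch: it makes it easy to inspect the class distribution per day before any rows are dropped.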
Table 3
CSE-CIC-IDS2018 files
Id | File Name | Size
1 | Wednesday-14-02-2018 | 3.03 GB
2 | Thursday-15-02-2018 | 2.18 GB
3 | Friday-16-02-2018 | 3.92 GB
4 | Tuesday-20-02-2018 | 3.19 GB
5 | Wednesday-21-02-2018 | 3.68 GB
6 | Thursday-22-02-2018 | 3.23 GB
7 | Friday-23-02-2018 | 3.17 GB
8 | Wednesday-28-02-2018 | 3.54 GB
9 | Thursday-01-03-2018 | 3.54 GB
10 | Friday-02-03-2018 | 3.43 GB

The first step of the preprocessing consists of data cleaning, i.e., removing rows with missing values (incomplete rows) and rows containing invalid (or infinite) numerical values. Moreover, many features that are not relevant for spotting cyber-attacks have been eliminated, such as the IP address of the sender and receiver, the connection timestamp, the protocol type, and the destination/sender port. Furthermore, the traffic data related to Web Attacks is deleted since its volume is insufficient.

4.1.2. Data Balancing and Data Augmentation
The collected data related to network traffic is substantially unbalanced: benign traffic is more prevalent than malicious traffic. To balance the data, we have used one step of the bootstrapping procedure, implemented in the resample function of Sklearn. Due to the corrupted data in the original dataset, the dataset used does not contain data related to FTP Brute Force attacks. Therefore, we have added this kind of data by collecting it during a simulation of brute force attacks via FTP (File Transfer Protocol). The simulation involved the use of a Windows host (victim machine) and a Kali-Linux host (attacker machine), both in the same local area network (connected to the same router). The victim machine runs a FileZilla server, an open-source software utility that facilitates the transmission of files using FTP. It enables users to establish their own FTP servers or connect to existing FTP servers to exchange data, and the victim machine accepts connections on port 21, which is the target of the attack. When the FileZilla server on the victim machine is running, the Kali Linux host performs a brute-force attack using Patator, a multi-purpose brute-forcer tool [19]. Table 4 shows the amount of data for each kind of attack after the cleaning and balancing phases.

Table 4
Amount of data per network traffic class
Class | Count
BENIGN | 145904
DoS Attack | 145904
BruteForce Attack | 99147
PortScan Attack | 49740
BotNet Attack | 142921
Total | 583616

4.1.3. Feature Selection and Classifier
To select the features, a Random Forest has been considered and implemented by setting the depth of each decision tree and the number of estimators to 16 and 20, respectively. To avoid possible issues related to overfitting, we use 5-fold cross-validation. Figure 3 shows the obtained confusion matrix.
Figure 3: Confusion Matrix of the Random Forest Classifier
The performance of the model, evaluated in terms of Precision, Recall and <math>F_1</math>-score, is shown in Table 5.

Table 5
Classification Performance Metrics Random Forest
Class | Precision | Recall | F1-score
BENIGN | 1.00 | 1.00 | 1.00
Botnet Ares | 1.00 | 1.00 | 1.00
BruteForce Attack | 1.00 | 1.00 | 1.00
DoS Attack | 1.00 | 1.00 | 1.00
Infiltration - NMAP Portscan | 0.99 | 1.00 | 1.00
Accuracy | 1.00

5. Experimental Setup
The ML models have been implemented in a Google Colab document with Python 3. The default CPU in the environment is an Intel Xeon CPU equipped with 2 virtual CPUs (vCPUs) and 13GB of RAM [20]. For this study, the configuration involved the utilization of extra RAM, resulting in a total memory capacity of 50GB (included with Google Colab Pro [20]).
For data handling, preprocessing, analysis, training, and evaluation metrics, the model was built and evaluated using Numpy [21], Pandas [22], and Scikit Learn [23]. Matplotlib [24] was used to visualize the data. The testing phase for this study used a Windows operating system with the following specifications: an Intel Core i5-4670 CPU at 3.40GHz, 16 GB of DDR4 memory and a Nvidia GTX 1050 Ti GPU.
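With the libraries listed above, the steps of Section 4.1 (cleaning, balancing, feature selection, cross-validated classification) and the metrics of Section 3.4 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' notebook: the data is synthetic (generated with `make_classification`), the column names `f0`, `f1`, ... and `Label` are hypothetical, and the balancing policy (every class resampled to the size of the smallest class) is an assumption. Only the tree depth (16), the number of estimators (20), the 13 selected features, and the 5 folds come from the text.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import cross_val_predict
from sklearn.utils import resample

# Synthetic, unbalanced stand-in for the flow features (assumption, not the real data).
X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)
flows = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
flows["Label"] = y
flows.loc[0, "f0"] = np.inf  # inject an invalid (infinite) value
flows.loc[1, "f1"] = np.nan  # inject a missing value

# 1. Cleaning: drop rows with missing or infinite values.
clean = flows.replace([np.inf, -np.inf], np.nan).dropna()

# 2. Balancing: one bootstrap draw per class (assumed policy: smallest class size).
target = clean["Label"].value_counts().min()
balanced = pd.concat(
    [resample(group, replace=True, n_samples=target, random_state=0)
     for _, group in clean.groupby("Label")],
    ignore_index=True,
)
features = balanced.drop(columns="Label")
labels = balanced["Label"]

# 3. Feature selection: rank features with a first Random Forest, keep the top 13.
ranker = RandomForestClassifier(n_estimators=20, max_depth=16, random_state=0)
ranker.fit(features, labels)
ranking = pd.Series(ranker.feature_importances_, index=features.columns)
top13 = ranking.sort_values(ascending=False).head(13).index.tolist()

# 4. Classification on the selected features with 5-fold cross-validation.
classifier = RandomForestClassifier(n_estimators=20, max_depth=16, random_state=0)
predicted = cross_val_predict(classifier, features[top13], labels, cv=5)

# 5. Per-class Precision, Recall and F1-score.
precision, recall, f1, _ = precision_recall_fscore_support(
    labels, predicted, average=None, zero_division=0
)
print(pd.DataFrame({"precision": precision, "recall": recall, "f1": f1}).round(2))
```

One caveat of this sketch: because bootstrap resampling duplicates rows, copies of the same flow can fall into both the training and the validation folds, so cross-validated scores obtained this way may be optimistic.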
5.1. Testing
ReTiNA-IDS, a tool that integrates an ML model into CICFlowMeter, analyzes data patterns and distinguishes benign traffic from malicious traffic. The testing phase of ReTiNA-IDS intends to assess the efficiency and efficacy of the machine learning model in real-world network situations. To perform the simulations, we take advantage of the Graphical Network Simulator-3 (GNS3) software, an open-source network simulation tool used for creating, modelling, and testing virtual and real networks [25]. To this end, we create a simple network composed of a Cisco router [26] and two generic switches, outlining two different areas of a hypothetical Local Area Network (LAN), a Windows machine and a Kali Linux machine. Figure 4 shows the network infrastructure.
Figure 4: Network structure in GNS3 for testing simulations
The Windows machine represents the hypothetical victim running the ReTiNA-IDS tool, acting as an IDS, while the Kali Linux machine plays the role of attacker. The victim machine is a Windows 10 host, while the Kali Linux version used is Kali 2023.4.
Different attack simulations were performed, each one resulting in a positive detection by the tool:
• DoS attacks
• File Transfer Protocol (FTP) and Secure SHell (SSH) Bruteforce attacks
• Portscan attacks
Additionally, more tests were performed with the tool running in a normal traffic situation (without performing any cyberattack) in a local network and in the University of Camerino's network, for a total of around 5 hours of workload. The purpose of letting the tool run for hours on end was to see whether any crashes occurred during execution and to spot any false positive results. During the experiments, zero false positives were identified.

6. Conclusion and Future Work
In this work, we have presented ReTiNA-IDS, a tool that integrates an ML model into CICFlowMeter, which analyzes data patterns and distinguishes benign traffic from malicious traffic in real time. The ML model is based on a Random Forest, used to select features and to classify the data. The testing phase, performed by running the tool in a normal traffic situation (without performing any cyberattack) in a local network and the University of Camerino's network, shows that the tool does not raise false positives.
In the near future, we intend to test the approach on botnet traffic to investigate the performance of ReTiNA-IDS. To this end, we intend to create a central server to control potentially infected hosts. Moreover, we plan to consider other machine learning models, both supervised and unsupervised. Finally, motivated by the results obtained for modelling and verifying properties of Collective Adaptive Systems [27, 28, 29], we intend to define formal approaches to specify and verify properties of the data traffic, in order to monitor it and identify anomalous patterns.

Acknowledgements. This work has been funded by the European Union - NextGenerationEU under the Italian Ministry of University and Research (MUR) National Innovation Ecosystem grant ECS00000041 - VITALITY - CUP J13C22000430001

References
[1] W. T. Lunardi, M. A. Lopez, J.-P. Giacalone, Arcade: Adversarially regularized convolutional autoencoder for network anomaly detection, IEEE Transactions on Network and Service Management (2022).
[2] G. Kathareios, A. Anghel, A. Mate, R. Clauberg, M. Gusat, Catch it if you can: Real-time network anomaly detection with low false alarm rates, in: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, 2017, pp. 924–929.
[3] S. Ho, S. Al Jufout, K. Dajani, M. Mozumdar, A novel intrusion detection model for detecting known and innovative cyberattacks using convolutional neural network, IEEE Open Journal of the Computer Society 2 (2021) 14–25.
[4] R. Atefinia, M. Ahmadi, Network intrusion detection using multi-architectural modular deep neural network, The Journal of Supercomputing 3571–3593 (2020).
[5] M. Catillo, M. Rak, U. Villano, 2l-zed-ids: A two-level anomaly detector for multiple attack classes,
in: Web, Artificial Intelligence and Network Applications: Proceedings of the Workshops of the 34th International Conference on Advanced Information Networking and Applications (WAINA-2020), Springer, 2020, pp. 687–696.
[6] Q. R. S. Fitni, K. Ramli, Implementation of ensemble learning and feature selection for performance improvements in anomaly-based intrusion detection systems, in: 2020 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), IEEE, 2020, pp. 118–124.
[7] W. W. Daniel, The spearman rank correlation coefficient, Biostatistics: A Foundation for Analysis in the Health Sciences (1987).
[8] S. Gamage, J. Samarabandu, Deep learning methods in network intrusion detection: A survey and an objective comparison, Journal of Network and Computer Applications 169 (2020) 102767. doi:10.1016/j.jnca.2020.102767.
[9] G. Karatas Baydogmus, O. Demir, O. Sahingoz, Increasing the performance of machine learning-based idss on an imbalanced and up-to-date dataset, IEEE Access PP (2020) 1–1. doi:10.1109/ACCESS.2020.2973219.
[10] B. Jason, Smote for imbalanced classification with python, 2021.
[11] V. Kanimozhi, T. P. Jacob, Artificial intelligence based network intrusion detection with hyper-parameter optimization tuning on the realistic cyber dataset cse-cic-ids2018 using cloud computing, in: 2019 international conference on communication and signal processing (ICCSP), IEEE, 2019, pp. 0033–0036.
[12] J. Kim, J. Kim, H. Kim, M. Shim, E. Choi, Cnn-based network intrusion detection against denial-of-service attacks, Electronics 9 (2020) 916.
[13] A. H. Lashkari, G. D. Gil, M. S. I. Mamun, A. A. Ghorbani, Characterization of tor traffic using time based features, in: International Conference on Information Systems Security and Privacy, volume 2, SciTePress, 2017, pp. 253–262.
[14] G. Draper-Gil, A. H. Lashkari, M. S. I. Mamun, A. A. Ghorbani, Characterization of encrypted and vpn traffic using time-related, in: Proceedings of the 2nd international conference on information systems security and privacy (ICISSP), 2016, pp. 407–414.
[15] U. of New Brunswick | UNB, Applications | research | canadian institute for cybersecurity | unb, 2017. URL: https://www.unb.ca/cic/research/applications.html.
[16] L. Breiman, Random forests, Machine learning 45 (2001) 5–32.
[17] L. Liu, G. Engelen, T. Lynar, D. Essam, W. Joosen, Error prevalence in nids datasets: A case study on cic-ids-2017 and cse-cic-ids-2018, in: 2022 IEEE Conference on Communications and Network Security (CNS), IEEE, 2022, pp. 254–262.
[18] scikit-learn: machine learning in python - scikit-learn 1.4.1 documentation, 2024. URL: https://scikit-learn.org/stable/index.html.
[19] Kali linux tools, patator, 2024. URL: https://www.kali.org/tools/patator/.
[20] Google, Google colab, 2024. URL: https://research.google.com/colaboratory/faq.html.
[21] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, T. E. Oliphant, Array programming with NumPy, Nature 585 (2020) 357–362. URL: https://doi.org/10.1038/s41586-020-2649-2. doi:10.1038/s41586-020-2649-2.
[22] T. pandas development team, pandas-dev/pandas: Pandas, 2020. URL: https://doi.org/10.5281/zenodo.3509134. doi:10.5281/zenodo.3509134.
[23] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[24] J. D. Hunter, Matplotlib: A 2d graphics environment, Computing in Science & Engineering 9 (2007) 90–95. doi:10.1109/MCSE.2007.55.
[25] S. Worldwide, Gns3 documentation, 2024. URL: https://docs.gns3.com/docs/.
[26] Cisco 3600 series - cisco, 2015. URL: https://www.cisco.com/c/en/us/td/docs/ios/12_2/12_2x/12_2xa/release/notes/rn3600xa.html.
[27] M. Loreti, M. Quadrini, A spatial logic for simplicial models, Log. Methods Comput. Sci. 19 (2023).
[28] N. Del Giudice, L. Matteucci, M. Quadrini, A. Rehman, M. Loreti, Sibilla: A tool for reasoning about collective systems, Science of Computer Programming (2024) 103095.
[29] N. D. Giudice, L. Matteucci, M. Quadrini, A. Rehman, M. Loreti, Sibilla: A tool for reasoning about collective systems, in: Coordination Models and Languages - 24th IFIP WG 6.1 International Conference, COORDINATION 2022, Held as Part of the 17th International Federated Conference on Distributed Computing Techniques, DisCoTec 2022, Lucca, Italy, June 13-17, 2022, Proceedings, 2022, pp. 92–98. doi:10.1007/978-3-031-08143-9_6.