Automated Penetration Testing: A Machine Learning Approach

Jay Saini, Ankita Bansal∗
Department of Information Technology, Netaji Subhas University of Technology, 110078 Delhi, India

Abstract

In our study, we used an improved version of the KDD-99 dataset, known as the corrected dataset. The original KDD-99 dataset is often used for studying cybersecurity in real time, but it has known problems, so we picked the improved version to make our tests more realistic. This dataset helped us imitate real cyber threats more accurately when testing computer systems and networks. We wanted to create challenges for artificial intelligence (AI) systems trying to tell the difference between real and simulated attacks. By using the corrected dataset, we made our tests resemble real cybersecurity situations, making it harder for the AI to determine what was happening. Our approach, combining different tools and methods, builds a complete system for security testing. We always ensure our tests are ethical and authorized, and we perform them regularly to keep up with new cyber threats. In this way, we can better protect organizations from potential risks.

Keywords: Artificial Intelligence, Machine Learning, Intrusion Detection, KDD-99

1. Introduction

Communication systems act as indispensable aides in our daily routines, seamlessly facilitating work, learning, teamwork, data sharing, and entertainment. Yet the intricate computer networks orchestrating these activities face potential risks. Safeguarding them requires the vigilant oversight of an intrusion detection system (IDS), functioning as a steadfast guardian for our computer systems. Consider the bustling activity on a popular website: numerous visitors mean a wealth of incoming information. To manage this influx, computers leverage machine learning, a process wherein they glean insights from data.
Subsequently, data mining comes into play, extracting pertinent details from the vast pool of information. Now, envision possessing insights into the diverse methods that individuals might employ to compromise a network.

Symposium on Computing & Intelligent Systems (SCI), May 10, 2024, New Delhi, India
∗ Corresponding author. † These authors contributed equally.
Jaysaini1899@gmail.com (J. Saini); ankita.bansal06@gmail.com (A. Bansal)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Enter Nmap, a network scanning tool adept at organizing and comprehending this information, akin to categorizing items into groups. This strategic approach aids in deciphering ongoing activities and identifying potential threats.

This study underscores the paramount importance of communication systems and the concerted efforts invested in ensuring their security. Leveraging specialized tools and computing techniques, we navigate the intricacies of data within these systems, particularly concerning potential cyber threats. The research delves into computer data, reserving a portion (approximately 20%) for testing purposes. The original dataset, however, suffered from many problems; to address these limitations, Tavallaee et al. [10] created a cleaned dataset that retained entries from the KDD-CUP 99 dataset while excluding redundant and duplicated values. Aggarwal and Sharma [18] interpreted the data attributes, classified into traffic, basic, host, and content categories, within the KDD-CUP 99 dataset. Their intrusion detection experiments demonstrated an increased detection rate coupled with a reduction in false-alarm rates.
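For concreteness, the detection rate (TPR) and false-alarm rate (FPR) referenced throughout these studies reduce to simple ratios over confusion-matrix counts. A minimal sketch follows; the counts are purely illustrative and do not come from any cited experiment:

```python
def detection_metrics(tp, fp, tn, fn):
    """Detection rate, false-alarm rate, and F1 from confusion-matrix counts."""
    dr = tp / (tp + fn)               # detection rate (TPR): attacks correctly flagged
    far = fp / (fp + tn)              # false-alarm rate (FPR): normal traffic wrongly flagged
    f1 = 2 * tp / (2 * tp + fp + fn)  # F1 combines precision and recall
    return dr, far, f1

# Hypothetical counts, for illustration only:
dr, far, f1 = detection_metrics(tp=980, fp=5, tn=995, fn=20)
print(f"DR={dr:.3f}  FAR={far:.3f}  F1={f1:.3f}")
```

A good detector pushes DR toward 1 while keeping FAR near 0, which is exactly the trade-off the surveyed papers evaluate.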
Gaffney and Ulvila [19] introduced methods for distinguishing the performance of intrusion detectors and, for a given environment, identified the optimal configuration for an intrusion detector. To establish an expected-cost metric, this approach employed a decision analysis that integrated receiver operating characteristics (ROC) with a cost-analysis method.

The primary objective is to pinpoint vulnerable sections of the network, discerning which areas are most susceptible to attack by adversaries. This multifaceted exploration combines practical testing and strategic analysis to fortify our understanding of, and defenses against, evolving cyber threats.

The remainder of the paper is organized as follows: Section 2 explains the motivation, followed by the literature survey in Section 3. Section 4 describes the dataset and techniques used. The results are illustrated in Section 5. Finally, the work is concluded in Section 6.

2. Motivation

This study endeavors to thoroughly assess the existing landscape of network penetration testing while also outlining potential directions for future research. In light of the ever-increasing frequency and sophistication of cyber-attacks in our contemporary digital landscape, we underscore the paramount significance of network security. Penetration testing emerges as a vital pillar in fortifying network security, systematically uncovering vulnerabilities and weaknesses before they can be exploited by malicious entities.

Penetration testing, or pen testing, is a vital cybersecurity process that simulates cyberattacks to uncover and address vulnerabilities in systems. It involves key phases such as reconnaissance, scanning, vulnerability analysis, exploitation, and reporting, utilizing tools such as network scanners and exploit frameworks. Aspiring penetration testers must grasp these concepts to enhance organizational security.
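To make the scanning phase mentioned above concrete, a TCP connect scan can be sketched with Python's standard library. This is a deliberately simplified stand-in for a full scanner such as Nmap, and should be run only against hosts one is authorized to test:

```python
import socket
from contextlib import closing

def connect_scan(host, ports, timeout=0.5):
    """Minimal TCP connect scan: return the subset of ports that accept a connection."""
    open_ports = []
    for port in ports:
        with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 when the TCP handshake succeeds (port open)
            if s.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports

# Example: probe a few well-known ports on the local machine.
print(connect_scan("127.0.0.1", [22, 80, 443]))
```

Real scanners add SYN scanning, service fingerprinting, and timing controls; the point here is only the core open/closed decision that drives the later analysis of vulnerable ports.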
Ethical hacking, requiring expertise and authorization, is an ongoing process crucial for regularly fortifying cybersecurity measures. Pen testing serves as a proactive defense, identifying and addressing vulnerabilities before real threats exploit them, bolstering overall organizational security. Traditional methodologies for penetration testing are recognized for their labor-intensive nature, substantial financial commitments, and the demand for a high level of expertise. In response to these challenges, our approach introduces an automated framework for penetration testing, aimed not only at streamlining the process but also at supporting initiatives related to defense training. The overarching objective is to demonstrate the effectiveness of this automated framework, showcasing its potential to instigate transformative advancements in the dynamic field of cybersecurity. This solution aligns with the imperative need for proactive defense measures and strategic preparedness in the face of evolving cyber threats.

3. Literature Survey

In this paper, a thorough examination of existing literature has been conducted to appraise the ongoing research. Various papers, articles, and books have been scrutinized to assess the current state of knowledge and identify areas where information is lacking. This process aids in comprehending the existing landscape, discerning gaps in knowledge, and understanding the evolution of thought in the field. The survey establishes a foundational understanding for subsequent phases by summarizing critical concepts, highlighting gaps, and illustrating the progression of ideas in the subject area. Analogous to consulting a map before embarking on a journey, this investigation serves as a strategic guide, assisting in determining the current position and potential areas for exploration in the field of machine learning. The findings of the previous contributors are shown in Table 1.
Table 1. Literature survey

Wang et al. [1]: Built an intrusion detection system using a Support Vector Machine (SVM) with a special focus on feature enhancement. The technique improves the quality of the data used to train the SVM classifiers, making them more precise and concise. The proposed system not only boosts intrusion detection capability but also reduces training time. The authors tested it on the NSL-KDD dataset, and the results show superior performance, especially in false-alarm rate, accuracy, and detection rate.

Jabbar et al. [2]: Proposed a system called RFAODE (Random Forest Average One-Dependence Estimator) for detecting intrusions. RFAODE combines two algorithms, random forest and the average one-dependence estimator, to improve accuracy and reduce errors. Random forest helps with accuracy, while the average one-dependence estimator tackles attribute-dependency issues in Naive Bayes classifiers. The authors tested RFAODE on the Kyoto 2006+ dataset and achieved an accuracy of 90.51% with a low false-alarm rate of 0.14. These algorithms effectively distinguished between normal and malicious network traffic.

Dahiya and Srivastava [3]: Crafted a framework aimed at precise intrusion prediction in network records using Spark. A feature-reduction algorithm was integrated to discard less significant features, after which a supervised data-mining technique was applied to the UNSW-NB 15 dataset. The outcomes were assessed using two feature-reduction algorithms, Linear Discriminant Analysis (LDA) and Canonical Correlation Analysis (CCA), in conjunction with seven classification algorithms.

Belouch et al. [4]: Assessed the effectiveness of four machine learning algorithms (random forest, Naive Bayes, SVM, and decision tree) using Apache Spark. Performance metrics including prediction time, accuracy, and building time were calculated. The experiments, conducted on the UNSW-NB 15 dataset, indicated that the random forest classifier outperformed the others in prediction time, accuracy, and building time.

Aziz et al. [5]: Compared various classifiers to enhance detection accuracy and gain more insight into detected anomalies. The study revealed distinct classifier rates, emphasizing that a one-size-fits-all approach is not suitable for all types of attacks. Notably, 90% of anomalies were successfully identified during the detection phase; however, in the classification phase, 88% of false positives were mistakenly labeled as normal traffic connections. The NB, NBTree, and BFTree classifiers demonstrated an accuracy of 79% in correctly labeling DoS and Probe attacks.

Ambusaidi and Nanda [6]: Developed an algorithm grounded in mutual information to address dependent features in the data. The resulting Least Square Support Vector Machine-based Intrusion Detection System (LSSVM-IDS) was evaluated on the Kyoto 2006+, KDD-CUP 99, and NSL-KDD datasets. The approach achieved higher accuracy and reduced computational cost through the feature-selection-based algorithm.

Sultana and Jabbar [7]: Introduced an intelligent network intrusion detection system employing the Average One-Dependence Estimator (AODE) algorithm. The results, assessed on the NSL-KDD dataset, demonstrated a low false-alarm rate (FAR) and a high detection rate (DR) for the proposed AODE-based model.

An and Liang [9]: Introduced a novel algorithm incorporating Fisher Discriminant Analysis, integrating within-class scatter with the traditional Support Vector Machine (SVM) classifier. Tested on the KDD-CUP 99 dataset, the implemented algorithm (WCS-SVM) demonstrated superior discriminatory power compared with Fisher Discriminant Analysis and the conventional SVM, along with enhanced detection rates and reduced false-positive rates.

Tavallaee et al. [10]: Created a new dataset free from imperfections, curated by retaining records from the KDD-CUP 99 dataset while eliminating redundant and duplicated values, addressing the shortcomings of the original dataset.

Fawagreh et al. [15]: Surveyed the evolution of Random Forest (RF) from its early development to recent advancements, comprehensively representing the research conducted to date and analyzing the potential and future developments of the field.

Aggarwal and Sharma [18]: Analyzed the data attributes, classified into traffic, basic, host, and content categories, within the KDD-CUP 99 dataset.

Gaffney and Ulvila [19]: Introduced methodologies for discerning the efficacy of intrusion detectors and identifying optimal configurations for a given environment, employing a decision analysis that integrated receiver operating characteristics (ROC) with a cost-analysis method to establish an expected-cost metric.

4. Materials

This section delves into the research methodology employed, elaborating on how the ML techniques were utilized. Additionally, the effectiveness of incorporating these ML techniques is thoroughly discussed.

4.1 Dataset

Our experimental work utilized the KDD-CUP 99 dataset on a machine with a 2 GHz processor, 4 GB RAM, and a 64-bit Windows operating system. This dataset, obtained from Lincoln Labs, mimics the U.S. Air Force Local Area Network (LAN) and comprises seven weeks of raw TCP dump data.
It contains various attacks and focuses on the sequence of TCP packets within fixed time intervals, along with specific source and target IP addresses. Initially, the dataset consisted of approximately five million records, which was too large for our research purposes, so we generated a 10% subset for the initial model implementation. With 41 features and 22 attack types categorized into four classes, the dataset provided a solid foundation for our research. However, due to errors in the KDD-99 dataset, we utilized the KDD-99_corrected dataset, which rectifies these mistakes.

Stolfo et al. [8] introduced advanced features to differentiate between normal connections and potential attacks. These include "same host" and "same service" features, which analyze connections with identical destinations or services within specific time frames. Some attacks, such as probing attacks, operate over extended scanning intervals and require a different approach: connection records were sorted by destination host to generate host-based traffic features, using a window of 100 connections to the same host. Unlike DOS and probing attacks, R2L and U2R attacks do not exhibit frequent sequential patterns; DOS and probing attacks involve numerous connections to specific hosts in a short time, while R2L and U2R attacks often involve a single connection.

Effectively mining the unstructured data portions of packets remains a challenge. Stolfo et al. [8] addressed this by introducing "content" features that identify suspicious behavior in data portions, such as tracking failed login attempts. These content features add an extra layer of scrutiny to the analysis.

The attack classes present in KDD-99_corrected are as follows:

• DOS: Attackers exhaust a target's resources, rendering it incapable of handling valid requests. Relevant features include "source bytes" and "percentage of packets with errors".
• Probing: Surveillance and other probing attacks aim to acquire information about a distant victim. Relevant features include "duration of connection" and "source bytes".

• U2R: Attackers gain unauthorized access to local superuser (root) privileges. Relevant features include "number of file creations" and "number of shell prompts invoked".

• R2L: Attackers gain unauthorized access from a remote machine. Relevant features include network-level features such as "duration of connection" and "service requested", as well as host-level features such as "number of failed login attempts".

4.2 Techniques

In our exploration of classification algorithms, Naive Bayes stands as a resilient contender. Rooted in Bayes' theorem, Naive Bayes excels at swiftly discerning patterns within data, particularly in domains such as natural language processing. Its strength lies in its ability to probabilistically infer class membership, navigating intricate feature spaces with remarkable agility.

Logistic Regression, while named for its resemblance to linear regression, holds distinct prowess in binary classification. With a keen eye for probabilities, it paints a nuanced picture of class likelihoods, shedding light on the subtle interplay of variables underlying classification decisions. Its interpretability and adaptability make it a cornerstone of the classification toolkit.

Support Vector Machines (SVMs) emerge as formidable allies in effective classification. Able to carve out optimal hyperplanes in complex feature spaces, SVMs navigate classification challenges with poise and precision. Their adaptability to both linear and non-linear scenarios makes them indispensable for accurate prediction.

Ensemble methods, epitomized by Random Forest, usher in a new era of predictive power.
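The three baseline classifiers just described can be exercised with scikit-learn. The sketch below uses synthetic data rather than the paper's KDD-99 pipeline, so the resulting scores are purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic stand-in for network-connection features (not the real KDD-99 data).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

scores = {}
for name, clf in [("NB", GaussianNB()),
                  ("LR", LogisticRegression(max_iter=1000)),
                  ("SVM", SVC())]:
    clf.fit(X_tr, y_tr)
    scores[name] = clf.score(X_te, y_te)  # accuracy on the held-out 20%
print(scores)
```

The same fit/score loop extends directly to the ensemble methods discussed next (Random Forest, XGBoost, AdaBoost, Extra Trees).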
By orchestrating a symphony of decision trees during training, Random Forest fortifies accuracy while guarding against overfitting. Insights gleaned from feature importance further deepen our understanding of the underlying data dynamics, empowering us to make informed decisions amidst the complexity of real-world datasets.

XGBoost fuses the strengths of gradient boosting with the versatility of tree-based models. Through iterative refinement, XGBoost raises predictive accuracy while remaining computationally efficient and interpretable. Its prowess extends across a spectrum of applications, from financial forecasting to medical diagnosis, where precision is paramount.

AdaBoost, with its adaptive learning framework, embodies resilience in the face of uncertainty. By iteratively refining its models based on misclassified instances, AdaBoost builds a robust framework capable of navigating difficult classification landscapes. Its adaptability to imbalanced datasets and its steadfast pursuit of accuracy make it a stalwart ally.

Rounding off our ensemble, the Extra Trees classifier is a testament to the power of randomness. By embracing uncertainty and exploring the feature space broadly, Extra Trees unlocks further gains in predictive accuracy and robustness.

Each algorithm in our arsenal embodies a unique blend of art and science, offering a rich set of possibilities across our dataset. As we chart our course through classification, we do so with a reverence for the complexity of the task at hand and a steadfast commitment to unlocking the insights hidden within the data.
5. Results and Analysis

The application of various classifiers, including Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), XGBoost, AdaBoost, and the Extra Trees classifier, to the dataset yielded valuable insights into their performance in distinguishing between normal and bad connections in a network. Each classifier exhibited strengths and limitations in accurately classifying instances from different classes. The key findings are summarized below:

• Logistic Regression: This model demonstrates a commendable true positive rate (TPR) of 99.82%, signifying its ability to correctly identify nearly all positive instances. However, its false positive rate (FPR) of 0.0276 indicates a small proportion of negative instances being incorrectly classified as positive. While it excels at capturing positive instances, the occurrence of false alarms suggests the need for cautious interpretation, especially in applications sensitive to such errors.

• Support Vector Machine (SVM): With an impressively low FPR of 0.0043, the SVM model showcases its proficiency in minimizing false alarms. Simultaneously, its TPR of 99.87% underscores its effectiveness in identifying positive instances accurately. This balanced performance suggests SVM as a reliable choice across various classification scenarios.

• Random Forest: Random Forest achieves a very low FPR of 0.0013, demonstrating exceptional vigilance in avoiding false alarms. Its high TPR of 99.98% further solidifies its capability to accurately identify positive instances. This blend of low false alarms and high identification rates positions Random Forest as a robust contender in classification tasks.

• XGBoost: Like Random Forest, XGBoost exhibits a remarkably low FPR (0.00083), indicating superior precision in avoiding false alarms. Its TPR remains high at 99.98%, comparable to that of Random Forest.
XGBoost's stellar performance in minimizing false alarms makes it a compelling choice for applications prioritizing precision.

• Extra Trees: Despite a marginally higher FPR of 0.00147 compared with XGBoost and Random Forest, Extra Trees boasts the highest TPR at 99.99%. This implies unparalleled efficacy in accurately identifying positive instances. While its FPR is slightly elevated, its exceptional TPR underscores its reliability in capturing positive instances, making it a potent tool in classification tasks.

• AdaBoost: The AdaBoost model shows a concerning FPR of 0.099, indicating a higher propensity for false alarms than the other models. Though its TPR remains respectable at 99.55%, the elevated false-alarm rate warrants cautious consideration, particularly in applications sensitive to such errors.

A comparison of the different algorithms is shown in Table 2.

Table 2. Performance of classifiers

Classifier                 F1 score    FPR        TPR
Naive Bayes                0.9670      0.049      99.34%
Logistic Regression        0.9819      0.0276     99.81%
Support Vector Machine     0.9966      0.0043     99.85%
Random Forest              0.9999      0.0013     99.98%
XGBoost                    0.9994      0.00083    99.98%
AdaBoost                   0.931       0.0993     99.56%
Extra Trees                0.9991      0.00147    99.9886%

Our evaluation of the classification models reveals nuanced performance characteristics across various metrics. While each model demonstrates strengths in specific areas, overall suitability depends on the requirements of the application.

For Precision-Centric Applications:
• XGBoost and Random Forest emerge as top contenders, showcasing exceptional precision by minimizing false alarms while maintaining high rates of positive instance identification. These models are well suited to applications where precision is paramount, such as fraud detection or medical diagnosis.
For High Positive Identification Rates:
• Extra Trees stands out with the highest true positive rate (TPR), indicating an unparalleled ability to accurately identify positive instances. Despite a slightly elevated false-alarm rate, its superior performance in positive instance identification makes it an ideal choice for applications prioritizing comprehensive detection, such as network intrusion detection systems.

For Balanced Performance:
• Support Vector Machine (SVM) demonstrates balanced performance with a low false positive rate (FPR) and a high TPR, making it a versatile option suitable for a wide range of classification tasks. Its ability to maintain precision while effectively capturing positive instances makes it a reliable choice across various applications.

Considerations for Specific Applications:
• Logistic Regression exhibits commendable performance in positive instance identification but may require careful consideration in applications sensitive to false alarms. Similarly, AdaBoost is effective at identifying positive instances but carries a higher risk of false alarms, necessitating cautious application in precision-critical scenarios.

In summary, the choice of classification model should align closely with the specific objectives and requirements of the application. While XGBoost and Random Forest excel in precision-centric tasks, Extra Trees offers unparalleled positive identification rates. SVM provides balanced performance suitable for diverse applications, while Logistic Regression and AdaBoost may require careful consideration where false alarms are costly.

During the study, we also observed that numerous ports exhibit vulnerabilities that may be exploited when detected during the scanning phase of penetration testing. This is particularly noteworthy for commonly exploited open ports identified in prior studies.
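One way to approximate the per-port (or per-service) attack likelihood discussed here is to tally attack-labeled connection records by service. The sketch below uses hypothetical records and an illustrative field layout, not the actual KDD-99 schema:

```python
from collections import Counter

# Hypothetical (service, label) pairs standing in for parsed connection records.
records = [("http", "attack"), ("ftp", "attack"), ("http", "normal"),
           ("ftp", "attack"), ("smtp", "normal"), ("http", "attack")]

# Count only the records labeled as attacks, grouped by service.
attacked = Counter(svc for svc, label in records if label == "attack")
total = sum(attacked.values())

# Relative frequency of each service among attack records.
probability = {svc: n / total for svc, n in attacked.most_common()}
print(probability)
```

Applied to the full dataset, this kind of tally yields the ranking of most-targeted ports that a chart like Figure 1 visualizes.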
Transmission Control Protocol (TCP), being the predominant network protocol, and File Transfer Protocol (FTP) have been recurrently highlighted in earlier research. This not only emphasizes the significance of identifying open ports but also points to the specific protocols, such as TCP and FTP, that past studies have implicated in vulnerabilities. These findings provide a nuanced understanding of the potential exploitability associated with open ports. Figure 1 illustrates which ports are most vulnerable, i.e., the probability that a given port will be attacked during an intrusion.

Figure 1: Open-port probability

6. Conclusion

In conclusion, considering the trade-off between minimizing false alarms and maximizing positive instance identification, XGBoost and Random Forest emerge as the top performers, excelling in both aspects. Extra Trees, despite a slightly elevated false-alarm rate, shines with its unmatched ability to capture positive instances accurately. Conversely, AdaBoost, while effective at identifying positive instances, poses a higher risk of false alarms, warranting careful consideration in practical applications.

Looking ahead, the study advocates for future research to focus on implementing the identified techniques in real-time applications, addressing a crucial aspect of intrusion detection. Moreover, we recognize the promising prospects of integrating advanced methodologies such as deep learning and reinforcement learning. This augmentation could potentially elevate detection capabilities, presenting a formidable challenge to conventional AI tools and enhancing our ability to thwart malicious activities. This underscores an exciting and fertile direction for further exploration within the field of intrusion detection.
REFERENCES

[1] H. Wang, J. Gu, S. Wang, An effective intrusion detection framework based on SVM with feature augmentation, Elsevier B.V., 2017.
[2] M. A. Jabbar, R. Aluvalu, S. S. Reddy, RFAODE: A novel ensemble intrusion detection system, 7th International Conference on Advances in Computing & Communications (ICACC-2017), Cochin, India, 2017.
[3] P. Dahiya, D. K. Srivastava, Network intrusion detection in big dataset using Spark, International Conference on Computational Intelligence and Data Science, 2018.
[4] M. Belouch, S. El Hadaj, M. Idhammad, Performance evaluation of intrusion detection based on machine learning approach using Apache Spark, The First International Conference on Intelligent Computing in Data Sciences, Procedia Computer Science 127, 2018.
[5] A. S. A. Aziz, S. EL-Ola Hanafi, A. E. Hassanien, Comparison of classification techniques applied for network intrusion detection and classification, Journal of Applied Logic 24, 2017. http://dx.doi.org/10.1016/j.jal.2016.11.018
[6] M. A. Ambusaidi, P. Nanda, Building an intrusion detection system using a filter-based feature selection algorithm, IEEE Transactions on Computers, 2014.
[7] A. Sultana, M. A. Jabbar, Intelligent network intrusion detection system using data mining techniques, IEEE, 2016.
[8] S. J. Stolfo, W. Fan, W. Lee, A. Prodromidis, P. K. Chan, Cost-based modeling and evaluation for data mining with application to fraud and intrusion detection: results from the JAM project, 2000, pp. 1-15.
[9] W. An, M. Liang, A new intrusion detection method based on SVM with minimum within-class scatter, Security and Communication Networks, 2012.
[10] M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani, A detailed analysis of the KDD-CUP 99 dataset, Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Security and Defense Applications, 2009.
[11] N. Das, T. Sarkar, Survey on host and network based intrusion detection system, Int. J. Advanced Networking and Applications 6(2), 2014, ISSN 0975-0290.
[12] K. Rasane, L. Bewoor, V. Meshram, A comparative analysis of intrusion detection techniques: machine learning approach, Proceedings of International Conference on Communication and Information Processing (ICCIP), 2019.
[13] V. Motghare, A. Kasturi, A. Kokare, A. Sankhe, Securezy: a penetration testing toolbox, Int. Res. J. Eng. Technol. 9, 2022, pp. 2375-2378.
[14] S. Niculae, D. Dichiu, K. Yang, T. Bäck, Automating penetration testing using reinforcement learning, Experimental Research Unit, Bitdefender, Bucharest, Romania, 2020.
[15] K. Fawagreh, M. M. Gaber, E. Elyan, Random forests: from early developments to recent advancements, Systems Science & Control Engineering: An Open Access Journal 2(1), 2014. DOI:10.1080/21642583.2014.956265
[16] N. Moustafa, J. Slay, The evaluation of network anomaly detection systems: statistical analysis of the UNSW-NB 15 dataset and the comparison with the KDD99 dataset, Information Security Journal: A Global Perspective, 2016. DOI:10.1080/19393555.2015.1125974
[17] N. Moustafa, J. Slay, UNSW-NB 15: a comprehensive data set for network intrusion detection systems, 2015. http://www.cybersecurity.unsw.adfa.edu.au/ADFA%20NB15%20Datasets
[18] P. Aggarwal, S. K. Sharma, Analysis of KDD dataset attributes, class wise, for intrusion detection, 3rd International Conference on Recent Trends in Computing, Procedia Computer Science 57, 2015.
[19] J. E. Gaffney, J. W. Ulvila, Evaluation of intrusion detectors: a decision theory approach, IEEE, 2001.