<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automated Penetration Testing: Machine Learning Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jay Saini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ankita Bansal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Technology, Netaji Subhas University of Technology</institution>
          ,
          <addr-line>110078 Delhi</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In our study, we used an improved version of the KDD-99 dataset, known as the corrected dataset. The original KDD-99 dataset is widely used in cybersecurity research, including real-time intrusion detection studies, but it has known problems, so we chose the corrected version to make our tests more realistic. This dataset helped us imitate real cyber threats more accurately when testing computer systems and networks. Our goal was to create challenges for artificial intelligence (AI) systems that try to tell the difference between real and fake attacks. Because we used the corrected dataset, our tests were closer to real cybersecurity situations, which made it harder for AI to figure out what was happening. Our approach combines different tools and methods into a complete system for security testing. All of our tests are ethical and authorized, and we repeat them regularly to keep up with new cyber threats. In this way, organizations can be better protected from potential risks.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Intrusion Detection</kwd>
        <kwd>KDD_99</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Communication systems act as indispensable aides in our daily routines, seamlessly
facilitating work, learning, teamwork, data sharing, and entertainment. Yet the
intricate computer networks that support these activities are exposed to attack. Safeguarding
them requires the vigilant oversight of an intrusion detection system (IDS), which acts as
a steadfast guardian for our computer systems.</p>
      <p>Consider the bustling activity on a popular website: numerous visitors mean a wealth of
incoming information. To manage this influx, computers use machine learning, a
process in which they learn patterns from data. Data mining then extracts pertinent
details from this vast pool of information. Now, envision having insight into the
diverse methods that individuals might employ to compromise a network.
Enter Nmap, a network scanner that organizes what it discovers, for example by classifying
each probed port as open, closed, or filtered, much like sorting items into groups. This
strategic approach helps decipher ongoing activity and identify potential threats.</p>
      <p>This study underscores the importance of communication systems and the efforts
invested in securing them. Using specialized tools and computing techniques, we analyze
the data within these systems, particularly data related to potential cyber threats. The
data is split so that a portion (approximately 20%) is held out for testing.</p>
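      <p>Holding out roughly 20% of the records can be done with a seeded shuffle, so that the split is reproducible. The sketch below is a minimal stdlib-only illustration under that assumption. The helper name <monospace>train_test_split_records</monospace> and the seed value are hypothetical and do not come from the experiments reported here.</p>
      <preformat preformat-type="code">
```python
import random


def train_test_split_records(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of `records` and hold out `test_fraction` of them for testing.

    Hypothetical helper for illustration; the fixed seed makes the split reproducible.
    Returns (train, test).
    """
    if not 1.0 > test_fraction > 0.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Usage: 10 toy record IDs are split into 8 training and 2 test records.
train, test = train_test_split_records(range(10))
print(len(train), len(test))  # 8 2
```
      </preformat>
      <p>Because attack classes such as U2R are very rare, a stratified split, which shuffles each class separately, may be preferable to this plain random split.</p>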
      <p>
        The KDD-CUP 99 dataset, however, suffers from many problems. To address these limitations,
Tavallaee et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] created a refined dataset that retained entries from the KDD-CUP 99 dataset
while excluding redundant and duplicated records.
      </p>
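      <p>The duplicate-removal step mentioned above can be sketched as follows: keep the first occurrence of each identical record and drop the rest. This illustration assumes that records are tuples of feature values followed by a label. It covers only exact-duplicate removal and makes no claim about any other step in Tavallaee et al.'s procedure.</p>
      <preformat preformat-type="code">
```python
def drop_duplicate_records(records):
    """Return records with exact duplicates removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record)  # lists are unhashable, so normalise to a tuple
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Toy connection records: (duration, protocol_type, service, label)
records = [
    (0, "tcp", "http", "normal"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "tcp", "http", "normal"),
    (2, "tcp", "ftp", "normal"),
]
print(len(drop_duplicate_records(records)))  # 3
```
      </preformat>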
      <p>
        Aggarwal and Sharma [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] analyzed the attributes of the KDD-CUP 99 dataset, which are classified into
traffic, basic, host, and content categories. Their intrusion detection experiments
showed an increased detection rate together with a reduced false alarm rate. Gaffney and Ulvila [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
introduced methods for comparing the performance of intrusion detectors and, for a
given environment, identified the optimal configuration for an intrusion detector. To
establish an expected cost metric, their approach used a decision analysis that
combined receiver operating characteristics (ROC) with a cost analysis method.
      </p>
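      <p>A simplified form of the expected-cost idea works as follows. Each ROC operating point (FPR, TPR) receives an expected cost per event that combines the attack base rate with the cost of a missed attack and the cost of a false alarm, and the configuration with the lowest expected cost is chosen. The cost formula below is an assumed textbook form and may omit factors that Gaffney and Ulvila's analysis includes. The ROC points, costs, and base rate are made-up values.</p>
      <preformat preformat-type="code">
```python
def expected_cost(tpr, fpr, p_attack, cost_miss, cost_false_alarm):
    """Expected cost per event at one ROC operating point (simplified decision analysis)."""
    miss_rate = 1.0 - tpr
    return p_attack * miss_rate * cost_miss + (1.0 - p_attack) * fpr * cost_false_alarm


def best_operating_point(roc_points, p_attack, cost_miss, cost_false_alarm):
    """Return the (fpr, tpr) pair with the lowest expected cost."""
    return min(
        roc_points,
        key=lambda pt: expected_cost(pt[1], pt[0], p_attack, cost_miss, cost_false_alarm),
    )


# Hypothetical ROC points (fpr, tpr) for three detector thresholds.
roc = [(0.01, 0.70), (0.05, 0.90), (0.20, 0.99)]
print(best_operating_point(roc, p_attack=0.01, cost_miss=100.0, cost_false_alarm=1.0))
# (0.05, 0.9)
print(best_operating_point(roc, p_attack=0.01, cost_miss=1000.0, cost_false_alarm=1.0))
# (0.2, 0.99)
```
      </preformat>
      <p>The usage lines show the point of the method: when missed attacks become ten times more expensive, the optimal configuration moves to a more sensitive threshold, even though that threshold raises more false alarms.</p>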
      <p>The primary objective is to pinpoint vulnerable sections of the network, discerning
which areas are most susceptible to potential attacks by adversaries. This multifaceted
exploration combines practical testing and strategic analysis to fortify our understanding
and defenses against evolving cyber threats.</p>
      <p>The remainder of the paper is organized as follows: Section 2 explains the motivation,
followed by the literature survey in Section 3. Section 4 explains the dataset and techniques
used. The results are presented in Section 5. Finally, the work is concluded in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Motivation</title>
      <p>This study assesses the existing landscape of network penetration testing and outlines
potential directions for future research. Given the ever-increasing frequency and
sophistication of cyber-attacks in today's digital landscape, we underscore the significance
of network security. Penetration testing is a vital pillar of network security, systematically
uncovering vulnerabilities and weaknesses before malicious actors can exploit them.</p>
      <p>Penetration testing, or pen testing, is a vital cybersecurity process that simulates
cyberattacks to uncover and address vulnerabilities in systems. It involves key phases like
reconnaissance, scanning, vulnerability analysis, exploitation, and reporting, utilizing tools
such as network scanners and exploit frameworks. Aspiring penetration testers must grasp
these concepts to enhance organizational security. Ethical hacking, requiring expertise and
authorization, is an ongoing process crucial for regularly fortifying cybersecurity measures.
Pen testing serves as a proactive defense, identifying and addressing vulnerabilities before
real threats exploit them, bolstering overall organizational security.</p>
      <p>Traditional penetration testing methodologies are labor-intensive and costly, and they
demand a high level of expertise. In response to these challenges, our approach introduces an
automated framework for penetration testing that aims not only to streamline the process
but also to support defense training. The overarching objective is to demonstrate the
effectiveness of this automated framework and its potential to advance the dynamic field
of cybersecurity. The solution addresses the need for proactive defense measures and
strategic preparedness in the face of evolving cyber threats.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Literature Survey</title>
      <p>In this paper, existing literature has been reviewed to appraise ongoing research.
Papers, articles, and books were examined to assess the current state of knowledge, identify
gaps, and trace how ideas in the field have evolved. Much like consulting a map before a
journey, this survey establishes the current position of the field of machine learning and
highlights areas for further exploration. The findings of previous contributors are summarized
in Table 1.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Findings of previous contributors</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Author(s)</th>
              <th>Findings</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td />
              <td>… 2006+ dataset and achieved an accuracy of 90.51% with a low false alarm rate of 0.14. These algorithms effectively distinguished between normal and malicious network traffic.</td>
            </tr>
            <tr>
              <td>Dahiya and Srivastava [<xref ref-type="bibr" rid="ref3">3</xref>]</td>
              <td>The authors built a Spark-based framework for precise intrusion prediction in network records. A feature-reduction algorithm was integrated to discard less significant features, and a supervised data mining technique was then applied to the UNSW-NB 15 dataset. The outcomes were assessed using two feature reduction algorithms, Linear Discriminant Analysis (LDA) and Canonical Correlation Analysis (CCA), in conjunction with seven classification algorithms.</td>
            </tr>
            <tr>
              <td>Belouch et al. [<xref ref-type="bibr" rid="ref4">4</xref>]</td>
              <td>The authors assessed the effectiveness of four machine learning algorithms (random forest, Naive Bayes, SVM, and decision tree) using Apache Spark. Prediction time, accuracy, and model building time were measured on the UNSW-NB 15 dataset. The random forest classifier outperformed the others on all three metrics.</td>
            </tr>
            <tr>
              <td>Aziz et al. [<xref ref-type="bibr" rid="ref5">5</xref>]</td>
              <td>The authors compared various classifiers to improve detection accuracy and gain more insight into detected anomalies. Classifier performance differed by attack type, showing that no single approach suits all types of attacks. In the detection phase, 90% of anomalies were identified. In the classification phase, however, 88% of false positives were mistakenly labeled as normal traffic connections. The NB, NBTree, and BFTree classifiers labeled DoS and Probe attacks correctly with an accuracy of 79%.</td>
            </tr>
            <tr>
              <td>Ambusaidi and Nanda [<xref ref-type="bibr" rid="ref6">6</xref>]</td>
              <td>The authors developed a mutual-information-based algorithm to handle dependent features in the data. The resulting Intrusion Detection System based on a Least Square Support Vector Machine (LSSVM-IDS) was evaluated on the Kyoto 2006+, KDD CUP-99, and NSL-KDD datasets. Thanks to its feature selection step, the approach achieved higher accuracy at lower computational cost.</td>
            </tr>
            <tr>
              <td>Sultana and Jabbar [<xref ref-type="bibr" rid="ref7">7</xref>]</td>
              <td>The authors introduced an intelligent network intrusion detection system based on the Average One Dependence Estimator (AODE) algorithm. Evaluated on the NSL-KDD dataset, the AODE-based model achieved a low False Alarm Rate (FAR) and a high Detection Rate (DR).</td>
            </tr>
            <tr>
              <td />
              <td>The authors introduced a novel algorithm (WCS-SVM) that incorporates the within-class scatter of Fisher Discriminant Analysis into the traditional Support Vector Machine (SVM) classifier. Tested on the KDD-Cup 99 dataset, WCS-SVM showed greater discriminatory power than both Fisher Discriminant Analysis and the conventional SVM, along with higher detection rates and lower false positive rates.</td>
            </tr>
            <tr>
              <td>Tavallaee et al.</td>
              <td>The authors created a new dataset by retaining records from the KDD-CUP 99 dataset while eliminating redundant and duplicated values, which addressed these shortcomings of the original dataset.</td>
            </tr>
            <tr>
              <td />
              <td>The authors traced the evolution of Random Forest (RF) from its early development to recent advancements. The primary objective was to represent the research conducted to date comprehensively and to analyze the potential and future developments of the field.</td>
            </tr>
            <tr>
              <td>Aggarwal and Sharma [<xref ref-type="bibr" rid="ref17">17</xref>]</td>
              <td>The attributes of the KDD-CUP 99 dataset, classified into traffic, basic, host, and content categories, were analyzed.</td>
            </tr>
            <tr>
              <td>Gaffney and Ulvila [<xref ref-type="bibr" rid="ref18">18</xref>]</td>
              <td>The authors introduced methodologies for assessing the efficacy of intrusion detectors and for identifying the optimal detector configuration in a given environment. Their decision analysis combined receiver operating characteristics (ROC) with a cost analysis method to establish an expected cost metric.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-4">
      <title>4. Materials</title>
      <p>This section describes the research methodology: the dataset used in our experiments
and the machine learning techniques applied to it. The effectiveness of these techniques is
evaluated in Section 5.</p>
      <sec id="sec-4-1">
        <title>4.1 Dataset</title>
        <p>Our experimental work used the KDD-CUP 99 dataset on a machine with a 2 GHz
processor, 4 GB RAM, and a 64-bit Windows operating system. The dataset, obtained from
Lincoln Labs, simulates a U.S. Air Force Local Area Network (LAN) and was built from seven
weeks of raw TCP dump data containing a variety of attacks. Each connection record
summarizes a sequence of TCP packets exchanged within a fixed time interval between a
specific source IP address and a specific target IP address.</p>
        <p>Initially, the dataset consisted of approximately five million records, which was too large
for our research purposes, so we generated a 10% subset for our initial model
implementation. Each record has 41 features and a label that is either normal or one of
22 attack types, and these attack types fall into four classes. Together, they provided a solid
foundation for our research.</p>
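        <p>The grouping of the 22 training attack types into four classes can be written as a lookup table. The mapping below follows the commonly used KDD-CUP 99 categorization. The helper name <monospace>attack_class</monospace> is hypothetical. The sketch assumes raw labels carry the trailing period used in the KDD files (for example <monospace>smurf.</monospace>), and it returns "unknown" for attack types outside this table, such as those that appear only in the test data.</p>
        <preformat preformat-type="code">
```python
# Commonly used grouping of the 22 KDD-CUP 99 training attack types into four classes.
ATTACK_CLASSES = {
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "Probe": ["ipsweep", "nmap", "portsweep", "satan"],
    "R2L": ["ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}

# Invert to a label -> class lookup.
LABEL_TO_CLASS = {label: cls for cls, labels in ATTACK_CLASSES.items() for label in labels}


def attack_class(raw_label):
    """Map a raw label such as 'smurf.' to 'normal' or one of the four attack classes."""
    label = raw_label.strip().rstrip(".")
    if label == "normal":
        return "normal"
    return LABEL_TO_CLASS.get(label, "unknown")


print(len(LABEL_TO_CLASS))            # 22
print(attack_class("smurf."))         # DoS
print(attack_class("guess_passwd."))  # R2L
```
        </preformat>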
        <p>
          However, because of errors in the KDD-99 dataset, we used the KDD-99_corrected dataset,
which rectifies these mistakes. Stolfo et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] introduced higher-level features
to help differentiate normal connections from potential attacks. These include
"same host" features, which examine recent connections to the same destination host as the
current connection, and "same service" features, which examine recent connections to the
same service. Both kinds of feature are computed within a fixed time window.
        </p>
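        <p>The "same host" and "same service" counts can be sketched as a backward scan over connections sorted by start time. This sketch assumes a two-second window, as in the KDD-CUP 99 feature definitions. The <monospace>Connection</monospace> class and the helper name are hypothetical, and the counts include only earlier connections, which is a simplifying choice.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass
class Connection:
    start: float   # start time in seconds
    dst_host: str
    service: str


def time_window_counts(connections, window=2.0):
    """For each connection, count earlier connections within `window` seconds that share
    its destination host (same-host count) or its service (same-service count).

    `connections` must be sorted by start time.
    """
    features = []
    for i, current in enumerate(connections):
        same_host = same_srv = 0
        for j in range(i - 1, -1, -1):   # walk backwards through earlier connections
            previous = connections[j]
            if current.start - previous.start > window:
                break                    # sorted input: everything further back is outside the window
            if previous.dst_host == current.dst_host:
                same_host += 1
            if previous.service == current.service:
                same_srv += 1
        features.append((same_host, same_srv))
    return features


conns = [
    Connection(0.0, "10.0.0.5", "http"),
    Connection(0.5, "10.0.0.5", "http"),
    Connection(1.0, "10.0.0.7", "http"),
    Connection(3.5, "10.0.0.5", "ftp"),
]
print(time_window_counts(conns))  # [(0, 0), (1, 1), (0, 2), (0, 0)]
```
        </preformat>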
        <p>Some attacks, such as slow probing attacks, scan hosts or ports at intervals much longer
than this time window, so they require a different approach. Connection records were
therefore sorted by destination host, and host-based traffic features were computed over a
window of 100 connections to the same host rather than over a time window.</p>
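        <p>The connection-count window can be sketched with one bounded queue per destination host. Grouping connections by host gives the same result as sorting the records by destination host. The helper name and the returned pair (number of past connections to the host, and the fraction of them that used the same service) are illustrative assumptions that only loosely follow the KDD host-based features.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict, deque


def host_window_features(connections, window=100):
    """For each (dst_host, service) connection, look at up to `window` previous
    connections to the same destination host and return
    (number of those connections, fraction that used the same service).

    `connections` is a time-ordered list of (dst_host, service) tuples.
    """
    history = defaultdict(lambda: deque(maxlen=window))  # per-host sliding window
    features = []
    for dst_host, service in connections:
        past = history[dst_host]
        n = len(past)
        same_srv = sum(1 for s in past if s == service)
        rate = same_srv / n if n else 0.0
        features.append((n, rate))
        past.append(service)  # the deque drops the oldest entry once `window` is reached
    return features


# A slow scan: one host probed on 150 different services, far apart in time.
conns = [("10.0.0.9", f"port{k}") for k in range(150)]
print(host_window_features(conns)[-1])  # (100, 0.0)
```
        </preformat>
        <p>A time window would miss such a slow scan, whereas this per-host window still records that the host has received 100 recent connections, each to a different service.</p>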
        <p>Unlike DoS and probing attacks, R2L and U2R attacks do not exhibit frequent sequential
patterns. DoS and probing attacks involve numerous connections to specific hosts in a
short time, whereas R2L and U2R attacks often involve a single connection.</p>
        <p>
          Effectively mining unstructured data portions of packets remains a challenge. Stolfo et
al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] addressed this by introducing "content" features that identify suspicious behavior in
data portions, such as tracking failed login attempts. These content features add an extra
layer of scrutiny to the analysis.
        </p>
        <p>The attack classes present in KDD-99_corrected are as follows:
• DoS: Attackers exhaust a target's resources, rendering it incapable of handling valid
requests. Relevant features include "source bytes" and "percentage of packets with
errors."
• Probing: Surveillance and other probing attacks aim to acquire information about a
distant victim. Relevant features include "duration of connection" and "source
bytes."
• U2R: Attackers gain unauthorized access to local superuser (root) privileges. Relevant
features include "number of file creations" and "number of shell prompts invoked."
• R2L: Attackers gain unauthorized access from a remote machine. Relevant features
include network-level features like "duration of connection" and "service
requested," as well as host-level features like "number of failed login attempts."</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Techniques</title>
        <p>In our exploration of classification algorithms, Naive Bayes stands as a resilient
contender. Rooted in Bayes' theorem, it assumes that features are conditionally independent
given the class. This assumption lets it estimate class probabilities quickly even with many
features, which is one reason it is popular in domains such as natural language processing. Its
strength lies in inferring class membership probabilistically from simple per-feature statistics.</p>
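        <p>For categorical connection features such as protocol type and connection flag, Naive Bayes reduces to a few lines: the predicted class maximizes log P(class) plus the sum of log P(feature value | class). This is a minimal sketch. The class name, the add-one (Laplace) smoothing, and the toy records are illustrative assumptions, not the implementation used in our experiments.</p>
        <preformat preformat-type="code">
```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Minimal Naive Bayes for categorical features with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        # feature_counts[class][feature_index][value] = count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen for each feature
        for row, label in zip(X, y):
            for i, value in enumerate(row):
                self.feature_counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        best_class, best_score = None, -math.inf
        for cls, count in self.class_counts.items():
            # log P(class) + sum_i log P(x_i | class), features independent given the class
            score = math.log(count / self.n)
            for i, value in enumerate(row):
                numerator = self.feature_counts[cls][i][value] + 1
                denominator = count + len(self.values[i])
                score += math.log(numerator / denominator)
            if score > best_score:
                best_class, best_score = cls, score
        return best_class


# Toy records: (protocol_type, flag)
X = [("icmp", "SF"), ("icmp", "SF"), ("icmp", "SF"),
     ("tcp", "SF"), ("tcp", "SF"), ("tcp", "REJ")]
y = ["attack", "attack", "attack", "normal", "normal", "attack"]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict(("tcp", "SF")))   # normal
print(model.predict(("icmp", "SF")))  # attack
```
        </preformat>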
        <p>Logistic Regression, despite a name that recalls linear regression, is a classification
method used mainly for binary classification. It models the probability of the positive class
as the logistic (sigmoid) function of a weighted sum of the features, so each learned weight
shows how a variable shifts the odds of a class. This interpretability and adaptability make
it a cornerstone of the classification practitioner's toolkit.</p>
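        <p>The model described above can be sketched in pure Python as batch gradient descent on the mean log-loss. The gradient with respect to each weight is the prediction error (p minus y) times the feature value. The learning rate, the number of epochs, and the single toy feature (think of an error-rate feature scaled to [0, 1]) are illustrative assumptions.</p>
        <preformat preformat-type="code">
```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=1.0, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            error = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(X, y)
print(predict_proba(w, b, [0.9]) > 0.5)  # True
```
        </preformat>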
        <p>Support Vector Machines (SVM) are strong allies in our quest for effective
classification. An SVM finds the hyperplane that separates the classes with the largest
margin. Kernel functions extend the same idea from linear to non-linear decision
boundaries, and this adaptability makes SVMs a dependable choice for accurate
predictions.</p>
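        <p>A linear SVM can be trained with Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. Pegasos is a standard solver chosen here for brevity, not necessarily the solver used in our experiments. The sketch covers only the linear case (no kernels) and appends a constant feature so that the last weight acts as a bias term. The regularization strength, the iteration count, and the toy points are assumptions.</p>
        <preformat preformat-type="code">
```python
import random


def train_linear_svm(X, y, lam=0.01, iterations=1000, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the regularized hinge loss.

    X: list of feature lists, y: labels in {-1, +1}. A constant 1.0 feature is
    appended so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, label = rng.choice(data)
        eta = 1.0 / (lam * t)                        # Pegasos step size
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]     # shrink: gradient of the L2 term
        if 1.0 > margin:                             # hinge loss is active
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -1], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```
        </preformat>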
        <p>Ensemble methods, epitomized by Random Forest, combine many models for greater
predictive power. Random Forest trains many decision trees, each on a bootstrap sample of
the data and each considering a random subset of features at every split, and then combines their
votes. This averaging improves accuracy and guards against overfitting. Its feature
importance scores further deepen our understanding of the data and support informed
decisions on complex real-world datasets.</p>
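        <p>The two sources of randomness in Random Forest, bootstrap samples and random feature subsets, can be sketched together with a majority vote. To keep the sketch short, each "tree" is a depth-1 stump scored by misclassification count. Real Random Forests grow deep trees and usually split on Gini impurity or entropy, so this is a simplified illustration with hypothetical helper names.</p>
        <preformat preformat-type="code">
```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Best single-threshold split over `features`, scored by misclassification count.

    Thresholds are midpoints between consecutive distinct values. Returns
    (feature, threshold, left_label, right_label); feature is None if no split exists.
    """
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for a, b in zip(values, values[1:]):
            threshold = (a + b) / 2
            left = [label for row, label in zip(X, y) if threshold >= row[f]]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = sum(l != left_label for l in left) + sum(r != right_label for r in right)
            if best is None or best[0] > errors:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # the sampled feature is constant in this bootstrap sample
        return (None, None, majority(y), majority(y))
    return best[1:]


def stump_predict(stump, row):
    f, threshold, left_label, right_label = stump
    if f is None:
        return left_label
    return left_label if threshold >= row[f] else right_label


def fit_random_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]               # bootstrap sample
        features = rng.sample(range(n_features), max_features)   # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def forest_predict(forest, row):
    return majority(stump_predict(s, row) for s in forest)      # majority vote


X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_random_forest(X, y)
print([forest_predict(forest, row) for row in X])  # [0, 0, 0, 1, 1, 1]
```
        </preformat>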
        <p>XGBoost combines gradient boosting with tree-based models. It adds trees one at a
time, and each new tree is fitted to correct the errors of the current ensemble using the first-
and second-order gradients of the loss, with regularization to limit model complexity.
It is known for high accuracy and computational efficiency, and it is applied in areas
ranging from financial forecasting to medical diagnosis, where precision is paramount.</p>
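        <p>The role of the second-order gradients can be shown numerically. With gradient sums G and Hessian sums H over the instances in a leaf, XGBoost's regularized objective gives the optimal leaf value -G / (H + lambda). The gain of a split is half the sum of the children's G squared over (H + lambda) scores, minus the parent's score, minus gamma. The sketch applies these formulas to the logistic loss, starting from raw scores of zero. The lambda and gamma values and the four-instance example are assumptions.</p>
        <preformat preformat-type="code">
```python
import math


def logistic_grad_hess(y_true, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw (log-odds) score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y_true, p * (1.0 - p)


def leaf_weight(G, H, lam=1.0):
    """Optimal leaf value -G / (H + lambda) from the second-order objective."""
    return -G / (H + lam)


def split_gain(G_left, H_left, G_right, H_right, lam=1.0, gamma=0.0):
    """Loss reduction from splitting one leaf into two (XGBoost-style structure score)."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right)
                  - score(G_left + G_right, H_left + H_right)) - gamma


# Four instances, all starting from raw score 0; the candidate split separates labels perfectly.
labels = [0, 0, 1, 1]
grads, hessians = zip(*(logistic_grad_hess(y, 0.0) for y in labels))
G_left, H_left = sum(grads[:2]), sum(hessians[:2])
G_right, H_right = sum(grads[2:]), sum(hessians[2:])
print(round(split_gain(G_left, H_left, G_right, H_right), 4))  # 0.6667
print(round(leaf_weight(G_left, H_left), 4))                    # -0.6667
```
        </preformat>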
        <p>AdaBoost (adaptive boosting) trains a sequence of weak learners. After each round it
increases the weight of misclassified instances so that later learners concentrate on them,
and the final prediction is a weighted vote in which more accurate learners count more.
This focus on hard instances can help with underrepresented classes in imbalanced
datasets, although the same mechanism makes AdaBoost sensitive to noisy labels and outliers.</p>
        <p>Rounding off our ensemble, the Extra Trees (extremely randomized trees) classifier
makes even more use of randomness than Random Forest. At each split it draws a threshold at
random for each candidate feature and keeps the best of these random candidates, rather than
searching for the optimal cut point. In its standard formulation, each tree is trained on the
whole dataset rather than on a bootstrap sample. The extra randomness reduces variance and
speeds up training, often with accuracy and robustness comparable to Random Forest.</p>
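        <p>The difference from Random Forest lies in how one node is split, and the sketch below isolates that rule. It draws one uniform random threshold between the minimum and maximum of each randomly chosen feature and keeps the candidate with the lowest weighted Gini impurity. Growing full trees from this rule and averaging them is omitted, and the function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def extra_trees_split(X, y, n_candidate_features, rng):
    """Extra-Trees node split: one random threshold per randomly chosen feature,
    keeping the candidate with the lowest weighted Gini impurity.

    Returns (feature, threshold, impurity), or None if every candidate is constant.
    """
    n_features = len(X[0])
    best = None
    for f in rng.sample(range(n_features), n_candidate_features):
        values = [row[f] for row in X]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue                        # constant feature: nothing to split on
        threshold = rng.uniform(lo, hi)     # random cut point, not an optimized one
        left = [label for row, label in zip(X, y) if threshold >= row[f]]
        right = [label for row, label in zip(X, y) if row[f] > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or best[2] > impurity:
            best = (f, threshold, impurity)
    return best


# Feature 1 is constant, so only feature 0 can be used.
X = [[0.1, 5.0], [0.2, 5.0], [0.8, 5.0], [0.9, 5.0]]
y = [0, 0, 1, 1]
f, threshold, impurity = extra_trees_split(X, y, 2, random.Random(0))
print(f)  # 0
```
        </preformat>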
        <p>Each algorithm makes different assumptions about the data and therefore has different
strengths. We applied all of them to the same dataset to compare how well each one
distinguishes normal connections from malicious ones.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Result and analysis</title>
      <p>We applied several classifiers to the dataset: Naive Bayes (NB), Logistic
Regression (LR), Support Vector Machine (SVM), Random Forest (RF), XGBoost, AdaBoost,
and Extra Trees. The results give insight into how well each one distinguishes between
normal and bad connections in a network, and each classifier showed strengths and
limitations in classifying instances from the different classes. The key findings are
summarized below.</p>
      <p>Logistic Regression: This model achieves a commendable True Positive Rate
(TPR) of 99.82%, meaning that it correctly identifies nearly all positive
instances. However, its False Positive Rate (FPR) of 0.0276 means that a small
proportion of negative instances are incorrectly classified as positive. While it
excels at capturing positive instances, these false alarms call for cautious
interpretation, especially in applications sensitive to such errors.</p>
      <p>Support Vector Machine (SVM): With a low FPR of 0.0043, the SVM
model is effective at minimizing false alarms, and its TPR of 99.87% shows that it
identifies positive instances accurately. This balanced performance makes SVM a
reliable choice across various classification scenarios.</p>
      <p>Random Forest: Random Forest achieves a low FPR of 0.0013, second only to
XGBoost among the models, demonstrating strong vigilance against false alarms. Its high TPR
of 99.98% further confirms its ability to identify positive instances accurately. This
combination of few false alarms and a high identification rate makes Random Forest a
robust contender in classification tasks.</p>
      <p>XGBoost: XGBoost has the lowest FPR of all the models (0.00083), indicating
superior precision in avoiding false alarms. Its TPR of 99.98% is on par with that of
Random Forest. XGBoost's performance in minimizing false alarms therefore makes it a
compelling choice for applications that prioritize precision.</p>
      <p>Extra Trees: Despite a marginally higher FPR (0.00147) than XGBoost and
Random Forest, Extra Trees achieves the highest TPR, at 99.99%, making it the most
effective of the models at identifying positive instances. Its exceptional TPR outweighs its
slightly elevated FPR in applications where capturing positive instances matters most,
making it a potent tool in classification tasks.</p>
      <p>AdaBoost: The AdaBoost model has a concerning FPR of 0.099, the highest of the
models, indicating a greater propensity for false alarms. Although its TPR remains
respectable at 99.55%, the elevated false alarm rate warrants caution, particularly in
applications sensitive to such errors.</p>
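      <p>The TPR and FPR values above, and the F1 scores in Table 2, are all derived from the binary confusion matrix. The minimal sketch below assumes that bad (attack) connections are the positive class. The counts in the usage line are made up for illustration and are not results from this study.</p>
      <preformat preformat-type="code">
```python
def rates_from_confusion(tp, fp, tn, fn):
    """TPR (recall), FPR, precision and F1 from binary confusion-matrix counts,
    treating bad (attack) connections as the positive class."""
    tpr = tp / (tp + fn)        # share of attacks that were detected
    fpr = fp / (fp + tn)        # share of normal connections flagged as attacks
    precision = tp / (tp + fp)  # share of alarms that were real attacks
    f1 = 2 * precision * tpr / (precision + tpr)
    return tpr, fpr, precision, f1


# Hypothetical counts for illustration only.
tpr, fpr, precision, f1 = rates_from_confusion(tp=990, fp=5, tn=995, fn=10)
print(f"TPR={tpr:.4f} FPR={fpr:.4f} F1={f1:.4f}")  # TPR=0.9900 FPR=0.0050 F1=0.9925
```
      </preformat>
      <p>Because F1 depends on precision rather than directly on FPR, a model with a very low FPR can still have a modest F1 if attacks are rare compared with normal traffic.</p>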
      <sec id="sec-5-1">
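        <title>Comparison of classifiers</title>
        <p>A comparison of the different algorithms is shown in Table 2.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Performance of classifiers</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Classifier</th>
                <th>F1 score</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Naive Bayes</td>
                <td>0.9670</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-5-2">
        <title>Discussion</title>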
        <p>Our evaluation of classification models reveals nuanced performance characteristics
across various metrics. While each model demonstrates strengths in specific areas, their
overall suitability depends on the specific requirements of the application.</p>
        <p>For Precision-Centric Applications:
• XGBoost and Random Forest emerge as top contenders, showcasing exceptional
precision by minimizing false alarms while maintaining high rates of positive
instance identification. These models are well-suited for applications where
precision is paramount, such as fraud detection or medical diagnosis.</p>
        <p>For High Positive Identification Rates:
• Extra Trees stands out with the highest True Positive Rate (TPR), indicating its
unparalleled ability to accurately identify positive instances. Despite a slightly
elevated false alarm rate, its superior performance in positive instance
identification makes it an ideal choice for applications prioritizing comprehensive
detection, such as network intrusion detection systems.</p>
        <p>For Balanced Performance:
• Support Vector Machine (SVM) demonstrates balanced performance, with a low
False Positive Rate (FPR) and a high TPR, making it a versatile option for a
wide range of classification tasks. Its ability to keep false alarms low while effectively
capturing positive instances makes it a reliable choice across various applications.</p>
        <p>Considerations for Specific Applications:
• Logistic Regression performs well at identifying positive instances
but requires care in applications sensitive to false
alarms. Similarly, AdaBoost is effective at identifying positive
instances but carries a higher risk of false alarms, so it should be applied cautiously
in precision-critical scenarios.</p>
        <p>In summary, the choice of classification model should align closely with the specific
objectives and requirements of the application. XGBoost and Random Forest excel in
precision-centric tasks, while Extra Trees offers the highest positive identification rate. SVM
provides balanced performance suitable for diverse applications, whereas Logistic
Regression and AdaBoost require careful consideration depending on how sensitive the
given context is to false alarms.</p>
        <p>During the study we also observed that numerous ports have vulnerabilities that may
be exploited once the ports are detected during the scanning phase of penetration testing. This
applies particularly to the commonly exploited open ports identified in prior studies.
Earlier research has repeatedly highlighted the Transmission Control Protocol (TCP), the
predominant transport protocol, and the File Transfer Protocol (FTP). These findings emphasize
the importance of identifying open ports and point to specific protocols, such as TCP and FTP,
that past studies have implicated in vulnerabilities. Figure 1 illustrates which ports are most
vulnerable, that is, the probability that a given port will be attacked during an intrusion.</p>
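        <p>One simple way to estimate a per-port attack probability of the kind plotted in Figure 1 is the relative frequency of each destination port among attack connections, P(port | attack). This is a sketch under that assumption only. The function name and the log entries are hypothetical, and the sketch is not necessarily how Figure 1 was produced.</p>
        <preformat preformat-type="code">
```python
from collections import Counter


def port_attack_probability(connections):
    """Relative frequency of each destination port among attack connections.

    `connections` is an iterable of (dst_port, is_attack) pairs. The result estimates
    P(port | attack) and sums to 1 over the ports that appear in attacks.
    """
    attacked = Counter(port for port, is_attack in connections if is_attack)
    total = sum(attacked.values())
    if total == 0:
        return {}
    return {port: count / total for port, count in attacked.most_common()}


# Hypothetical log entries: (destination port, whether the connection was an attack).
log = [(21, True), (21, True), (22, False), (23, True), (80, False), (80, True)]
print(port_attack_probability(log))  # {21: 0.5, 23: 0.25, 80: 0.25}
```
        </preformat>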
      </sec>
      <sec id="sec-5-3">
        <title>Figure 1: Open port probability</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In conclusion, considering the trade-off between minimizing false alarms and maximizing
positive instance identification, XGBoost and Random Forest emerge as the top performers,
excelling in both respects. Extra Trees, despite a slightly elevated false alarm rate, stands out
for capturing positive instances most accurately. Conversely, AdaBoost, while effective in
identifying positive instances, poses a higher risk of false alarms, which warrants careful
consideration in practical applications.</p>
      <p>Looking ahead, the study advocates for future research endeavors to focus on
implementing the identified technique for real-time applications, addressing a crucial
aspect of intrusion detection. Moreover, we recognize the promising prospects of
integrating advanced methodologies, such as deep learning and reinforcement learning.
This augmentation could potentially elevate detection capabilities, presenting a formidable
challenge to conventional AI tools and enhancing our ability to thwart malicious activities.
This underscores an exciting and fertile direction for further exploration within the field of
intrusion detection.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Huiwen</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jie</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shanshan</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <year>2017</year>
          .
          <article-title>An effective intrusion detection framework based on SVM with feature augmentation</article-title>
          , ISSN 0950-7051, © 2017 Elsevier B.V.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Jabbar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rajanikanth</given-names>
            <surname>Aluvalu</surname>
          </string-name>
          ,
          <string-name>Sai Satyanarayana Reddy S</string-name>
          ,
          <year>2017</year>
          .
          <article-title>RFAODE: A Novel Ensemble Intrusion Detection System</article-title>
          ,
          <source>7th International Conference on Advances in Computing &amp; Communications, ICACC-2017</source>
          , 22-24, Cochin, India.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Priyanka</given-names>
            <surname>Dahiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Devesh Kumar</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Network Intrusion Detection in Big Dataset Using Spark</article-title>
          ,
          <source>International Conference on Computational Intelligence and Data Science</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Mustapha</given-names>
            <surname>Belouch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Salah</given-names>
            <surname>El Hadaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mohamed</given-names>
            <surname>Idhammad</surname>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Performance Evaluation of Intrusion Detection based on Machine learning approach using Apache Spark</article-title>
          ,
          <source>The First International Conference on Intelligent Computing in Data Sciences, Procedia Computer Science</source>
          <volume>127</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Amira Sayed A.</given-names>
            <surname>Aziz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sanaa EL-Ola</given-names>
            <surname>Hanafi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Aboul Ella</given-names>
            <surname>Hassanien</surname>
          </string-name>
          ,
          <year>2017</year>
          .
          <article-title>Comparison of Classification Technique applied for Network Intrusion Detection and Classification</article-title>
          ,
          <source>Journal of Applied Logic</source>
          <volume>24</volume>
          , http://dx.doi.org/10.1016/j.jal.2016.11.018
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Mohammed A.</given-names>
            <surname>Ambusaidi</surname>
          </string-name>
          , Priyadarsi Nanda,
          <year>2014</year>
          .
          <article-title>Building an Intrusion Detection System using a Filter-based Feature Selection Algorithm</article-title>
          ,
          <source>IEEE Transactions on Computers</source>
          , November 2014.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Amreen</given-names>
            <surname>Sultana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Jabbar</surname>
          </string-name>
          ,
          <year>2016</year>
          .
          <article-title>Intelligent Network Intrusion Detection System using Data Mining Technique</article-title>
          , 978-1-5090-2399-8/16, IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Salvatore J.</given-names>
            <surname>Stolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prodromidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <year>2000</year>
          .
          <article-title>Cost-based modeling and evaluation for data mining with application to fraud and intrusion detection: Results from the JAM Project</article-title>
          ,
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Wenjuan</given-names>
            <surname>An</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mangui</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <year>2012</year>
          .
          <article-title>A New Intrusion Detection Method based on SVM with minimum within-class scatter</article-title>
          ,
          <source>Security and Communication Networks</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Mahbod</given-names>
            <surname>Tavallaee</surname>
          </string-name>
          , Ebrahim Bagheri, Wei Lu, and
          <string-name>
            <given-names>Ali A.</given-names>
            <surname>Ghorbani</surname>
          </string-name>
          ,
          <year>2009</year>
          .
          <article-title>A Detailed Analysis of the KDD-CUP 99 Dataset</article-title>
          ,
          <source>Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Security and Defense Applications.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Niva</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tanmoy</given-names>
            <surname>Sarkar</surname>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Survey on Host and Network Based Intrusion Detection System</article-title>
          ,
          <source>Int. J. Advanced Networking and Applications</source>
          , Vol.
          <volume>6</volume>
          , Issue 2, ISSN: 0975-0290.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Komal</given-names>
            <surname>Rasane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Laxmi</given-names>
            <surname>Bewoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Vishal</given-names>
            <surname>Meshram</surname>
          </string-name>
          ,
          <year>2019</year>
          .
          <article-title>A Comparative Analysis of Intrusion Detection Techniques: Machine Learning Approach</article-title>
          ,
          <source>Proceedings of International Conference on Communication and Information Processing (ICCIP) 2019</source>
          , May 18, 2019.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Motghare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kasturi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kokare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sankhe</surname>
          </string-name>
          ,
          <year>2022</year>
          .
          <article-title>Securezy-A Penetration Testing Toolbox</article-title>
          ,
          <source>Int. Res. J. Eng. Technol.</source>
          <volume>9</volume>
          ,
          <fpage>2375</fpage>
          -
          <lpage>2378</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Niculae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dichiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bäck</surname>
          </string-name>
          ,
          <year>2020</year>
          .
          <article-title>Automating Penetration Testing Using Reinforcement Learning</article-title>
          , Experimental Research Unit, Bitdefender, Bucharest, Romania.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Khaled</given-names>
            <surname>Fawagreh</surname>
          </string-name>
          , Mohamed Medhat Gaber, Eyad Elyan,
          <year>2014</year>
          .
          <article-title>Random forests: from early developments to recent advancements</article-title>
          ,
          <source>Systems Science &amp; Control Engineering: An Open Access Journal</source>
          ,
          <volume>2</volume>
          :1, DOI: 10.1080/21642583.2014.956265.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Nour</given-names>
            <surname>Moustafa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jill</given-names>
            <surname>Slay</surname>
          </string-name>
          ,
          <year>2016</year>
          .
          <article-title>The Evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 dataset and the comparison with the KDD99 dataset</article-title>
          ,
          <source>Information Security Journal: A Global Perspective</source>
          , DOI: 10.1080/19393555.2015.1125974.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Nour</given-names>
            <surname>Moustafa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jill</given-names>
            <surname>Slay</surname>
          </string-name>
          ,
          <year>2015</year>
          .
          <article-title>UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection System</article-title>
          , http://www.cybersecurity.unsw.adfa.edu.au/ADFA%20NB15%20Datasets.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Preeti</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          , Sudhir Kumar Sharma,
          <year>2015</year>
          .
          <article-title>Analysis of KDD Dataset Attributes - Class wise For Intrusion Detection</article-title>
          ,
          <source>3rd International Conference on Recent Trends in Computing, Procedia Computer Science</source>
          <volume>57</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>John E.</given-names>
            <surname>Gaffney</surname>
          </string-name>
          , Jacob W. Ulvila,
          <year>2001</year>
          .
          <article-title>Evaluation of Intrusion Detectors: A Decision Theory Approach</article-title>
          , 1081-6011/01, 2001 IEEE.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>