<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing ML model efficiency for DDoS attack detection on web servers in resource-constrained environments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qais Saif Qassim</string-name>
          <email>qqassim@acm.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nessibeli Askarbekova</string-name>
          <email>n.askarbekova@iitu.edu.kz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Computing and Information Sciences, University of Technology and Applied Sciences-Ibri</institution>
          ,
          <addr-line>Ibri</addr-line>
          ,
          <country country="OM">Oman</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>International Information Technology University</institution>
          ,
          <addr-line>34/1 Manas St., 050000, Almaty</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Distributed Denial of Service (DDoS) attacks, particularly High Orbit Ion Cannon (HOIC) attacks, present a significant threat to web servers. This study focuses on enhancing feature selection and model efficiency to ensure accurate detection while minimizing computational resources. The study aims to demonstrate that effective DDoS detection can be achieved with fewer instances and a limited number of key features. This approach not only optimizes the computational resources required for model training but also highlights the potential for deploying efficient and scalable detection systems in real-world environments where data processing capacity may be constrained. The selected features and resampled datasets form the basis for developing and evaluating machine learning models capable of accurately identifying HOIC attacks across varying dataset sizes.</p>
      </abstract>
      <kwd-group>
        <kwd>Feature Selection</kwd>
        <kwd>Classification</kwd>
        <kwd>DDoS</kwd>
        <kwd>Intrusion Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In an era where the internet has become integral to virtually every aspect of modern life, the security of
web applications is paramount. As businesses, governments, and individuals increasingly rely on these
applications for critical services, the threat landscape has evolved, with attackers developing
sophisticated methods to disrupt and compromise systems. Among the large number of today’s cyber
threats, Distributed Denial of Service (DDoS) attacks have emerged as one of the most disruptive, capable
of overwhelming web applications and rendering them inaccessible to legitimate users [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. DDoS attacks
stand out from other cyber threats due to their ability to cause widespread disruption with relatively
minimal effort on the part of the attacker. By leveraging a large number of compromised devices,
attackers can generate vast volumes of malicious traffic that can saturate a target's network
infrastructure, leading to significant downtime, financial losses, and damage to an organization's
reputation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Within the spectrum of DDoS attacks, the High Orbit Ion Cannon (HOIC) represents a
particularly potent threat, characterized by its ability to generate large-scale traffic surges that can evade
traditional security measures [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        The HOIC is a powerful tool used by cyber criminals, specifically for executing Distributed Denial of
Service attacks. It is designed to overwhelm multiple targets simultaneously [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This capability makes it
a more sophisticated and versatile weapon in the arsenal of cyber attackers. HOIC operates by flooding a
target server with a massive volume of HTTP 'GET' and 'POST' requests. These requests are designed to
drain the server's capabilities and render it unable to respond to legitimate users [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The tool's capacity
to send several requests concurrently, paired with its multi-threading capabilities, enables it to produce a
large quantity of traffic in a short period of time.
      </p>
      <p>
        One of the distinguishing characteristics of HOIC is the usage of "boosters." These are add-ons that
effectiveness the tool's efficacy by customizing the attack to exploit specific vulnerabilities on the target
server [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Boosters can also obfuscate the attack traffic, making it difficult for defense mechanisms to
detect and mitigate the threat [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. HOIC's versatility and capacity to enhance the intensity of the attack
make it an intimidating tool for carrying out large-scale DDoS attacks. Despite its destructive power,
HOIC is rather user-friendly, having an interface that enables even those with limited technical expertise
to launch an attack. This accessibility has increased its popularity among cyber attackers. However, it
should be noted that employing HOIC for harmful reasons is both unlawful and unethical.
Understanding its mechanics is crucial for cybersecurity professionals to develop effective
countermeasures and protect digital infrastructure from such threats [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>This paper aims to underscore the significance of DDoS attacks in the context of web application
security, drawing comparisons with other forms of cyber threats. Furthermore, by analyzing a dataset
sourced from Kaggle, this research seeks to identify key network traffic parameters that are indicative of
HOIC attacks. By understanding these parameters, it becomes possible to develop more effective
detection mechanisms, thereby enhancing the resilience of web applications against these pervasive
threats.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Modifications</title>
      <p>This section demonstrates some of the efforts that have been made to address the attacks on web servers
and web applications. One of the key areas of focus has been the improvement of Intrusion Detection
Systems (IDS), which play an important role in detecting malicious activity in real time. These systems
use machine learning algorithms, anomaly detection techniques, and signature-based methods to
analyze network data, identify patterns that indicate an attack, and respond quickly to reduce its damage.</p>
      <p>
        Ramezany et. al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have presented a machine learning-based malicious payload detection and
classification mechanism based on TF-IDF feature vector construction. The study has proposed a new
HTTP attack dataset composed of eight attack categories (including RCE, Injection, LFI- lFD, XSS, XXE,
Open-Redirect, CRLF, and Deserialize) and clean category for non-malicious payload. The instances of
the dataset have been collected using honeypot machines and classified by a domain expert. To verify the
proposed model, the authors have performed a comparative study on three algorithms applied to the
framework: Support Vector Machine, Random Forest, and Stochastic Gradient Descent. The
experimental results showed that Random Forest outperforms other methods, given that sufficient
samples are available.
      </p>
      <p>Vartouni et. al. [7] proposes a machine learning-based firewall for web applications that relies on
anomaly detection. The proposed approach analyzes HTTP traffic and uses an n-gram model based on
character to create features from HTTP data. To minimize the problem's dimensionality, a stacked
autoencoder (SAE) with various configurations is used to extract relevant features from data. Finally, to
discover anomalies, an isolation forest method is used. The proposed model was tested using the CSIC
2010 dataset. The authors concluded that deep learning algorithms based on sigmoid have good
generalization based on detection rate and specificity. Hashim et. al. [8] have proposed a machine
learning-based detection algorithm to identify phishing websites. The proposed system also recommends
a mitigation technique for different web application attacks. The proposed method utilizes three machine
learning algorithms (Support Vector Machine, Random Forest and Logistic regression) and deep learning
using Long Short Term Memory (LSTM). 98% detection accuracy was achieved using LSTM.</p>
      <p>Protecting against SQL attacks, Gogoi et. al. [9] proposed an SQL injection detection method using
NLP and Machine Learning. The ML algorithms are trained using normal inputs and SQLi payloads. The
normal input is generated using the probabilistic method. On the other hand, the SQLi Attack payloads
are obtained from tools and manual settings. The experimental results showed that using ML and NLP
techniques can improve the traditional detection approaches. Moreover, SVM classification has
outperformed other machine learning algorithms. Another SQL-related attack detection system has been
proposed by Joshi and Geetha [10]. The study proposed a detection method based on Naïve Bayes
Machine Learning Algorithm combined with Role Based Access control mechanism. During the learning
phase, the application receives the training dataset from text files and applies each data to the classifier's
learning mechanism. The classifier learns using machine learning feature vectors generated by the blank
separation and tokenizing method from data collected. The function of the user is also included in the
feature vector, which is utilized for classification using the Role-Based Access Control mechanism.
During the classification process, the application reads the test dataset from text files and applies each
data to the classifier's classification procedure. Classification is performed using the generated feature
vector. Precision and Recall are used in the application to examine the classification results of the Nave
Bayes machine learning approach.</p>
      <p>To detect web application injection threats, a one-class Support Vector Machine (SVM) was proposed
by Zhou et. al. [11]. The authors proposed that detection of injection attacks be approached as an
anomaly detection challenge. Several genuine HTTP requests are used to train a one-class SVM model
during the training phase. The trained one-class SVM is used in the testing stage to determine if an HTTP
request is authentic or malicious. To extract features from HTTP requests, we use the 2u-gram technique
(a version of n-gram). The experimental results reveal that a one-class SVM detects web application
injection attacks with a 94.04% detection rate and a 1.62% false positive rate. Another machine
learningbased detection system is presented by Banerjee et. al. [12]. The authors have considered Cross-site
scripting detection by looking at the URL and the JavaScript. The study proposed employing four
machine learning techniques to estimate the severity of XSS threat (SVM, KNN, Random Forest, and
Logistic Regression). After analyzing and evaluating the dataset features, it was determined that the
Random Forest Classifier performed the best, with a False Positive Rate of 0.34.</p>
      <p>Blind cross-site scripting detection was proposed by G. Kaur et al. [13] using machine learning. The
proposed detection method investigates the presence of dangerous payloads that were likely to be stored
in databases via online applications. The Linear Support Vector Machine classifier has been utilized to
detect the malicious scripts. The system extracts JavaScript events which attackers usually use to insert
vulnerable payloads. The extracted events were utilized as training vectors to construct the hyperplane
of the SVM. The same has been used for testing purposes as well. The experimental results showed that
the proposed approach could detect blind XSS attacks with 95.4% accuracy. Sharma, Zavarsky and
Butakov [14] claimed that the selection of the features plays a vital role in improving the detection
system's accuracy. Therefore, the study suggested a fine-tuned feature set extracted from the generic
CSIC 2010 HTTP dataset. A specialized software tool has been developed to extract the required features/.
The experimental results with three machine learning algorithms (J48, Naïve Bayes, OneR) demonstrate
the reliability in detecting web-based attacks. The J48 decision tree algorithm was depicted to be the
bestperforming algorithm, with the best attack detection rate of 94.5%.</p>
      <p>The application of machine learning in web security was not only through proposing a detection
system nor improving the performance of WFA. Vulnerability identification has its share, such that
several researchers offered to implement web applications' vulnerability scanners using machine
learning. For example, Tommy, Sundeep and Jose [15] proposed a system which helps to automatically
find and fix the vulnerabilities present in web applications using machine learning. Machine learning
helps to improve the system's performance using the statistics of the previous results. The proposed
method uses web crawls to reach every web application page and utilizes pre-defined payloads to test
several vulnerabilities, including SQL injection and Cross-Site Scripting (XSS). Moreover, the proposed
system can correct the identified vulnerabilities through a pre-defined set of instructions. The Support
vector algorithm is used to improve the efficiency of the vulnerability identification process. In contrast,
the scan result of one system shall be utilized to determine the vulnerabilities of similar systems faster.</p>
    </sec>
    <sec id="sec-3">
      <title>3. IDS 2018 intrusion dataset</title>
      <p>The IDS 2018 intrusion dataset, developed by the University of New Brunswick [16], is a comprehensive
and widely recognized resource in the field of cybersecurity research. This dataset was meticulously
crafted to simulate a realistic and diverse range of network traffic, encompassing both normal and
malicious activities. It serves as an invaluable tool for researchers and practitioners aiming to advance
the development and evaluation of intrusion detection systems (IDS), including those designed to
identify DDoS attacks. The dataset is structured to reflect contemporary network environments,
incorporating various attack scenarios that mirror real-world threats. Among these, DDoS attacks are
prominently featured, providing a robust foundation for analyzing the characteristics and behaviors
associated with these disruptive activities. The IDS 2018 dataset includes detailed records of network
traffic, such as IP addresses, port numbers, and protocol types, alongside annotations that distinguish
between benign and malicious traffic. For this study, the focus will be on leveraging the IDS 2018 dataset
to identify network traffic parameters that are critical for detecting HOIC attacks. By systematically
analyzing the dataset, this research aims to uncover patterns and indicators that can enhance the
detection capabilities of security systems, thereby contributing to the broader effort to safeguard web
applications from DDoS threats.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>The analysis of the IDS 2018 intrusion dataset, particularly concerning the detection of HOIC attacks,
was conducted using the Weka workbench, a powerful suite of machine learning software for data
analysis and predictive modeling. The research methodology, as illustrated in Figure 1, followed a
structured approach to ensure that the dataset was unbiased and that the most significant attributes for
detecting HOIC attacks were accurately identified.</p>
      <sec id="sec-4-1">
        <title>4.1. Data preprocessing</title>
        <p>Initially, the IDS 2018 dataset was filtered to isolate the network traffic instances relevant to HOIC
attacks. To prevent any skewed results or biases in the analysis, the dataset was balanced by
incorporating an equal number of benign instances. This balancing process was crucial to ensure that the
learning algorithms could distinguish between normal and malicious traffic without being influenced by
an overwhelming majority of one class. The final dataset consisted of 523,422 instances, evenly split
between malicious and benign traffic, with each instance characterized by 80 attributes. Figure 2 depicts
the correlation among all the 80 attributes of the dataset. As illustrated, several attributes can be
eliminated due to their zero correlation with other attributes.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Attribute selection</title>
        <p>To identify the most significant network traffic parameters for detecting HOIC attacks, the Information
Gain (IG) was employed. Information Gain is an attribute selection technique used to reduce the number
of attributes by measuring the contribution of each attribute in predicting the target variable. It is
calculated by first determining the entropy, which measures the uncertainty in the dataset, and then
calculating how much this uncertainty is reduced when the data is split based on a specific feature. The
reduction in entropy is the Information Gain. Features are ranked by their IG scores, and those with
higher scores are deemed more relevant. By setting a threshold or selecting the top-ranked features,
irrelevant or redundant features are discarded, resulting in a smaller feature set that retains predictive
power.</p>
        <p>Information Gain measures the relevance of each attribute by evaluating its contribution to the
prediction of the class label—in this case, whether the traffic is associated with a HOIC attack or not. This
method was chosen for its effectiveness in reducing the dimensionality of the dataset while retaining the
most informative features. Using Weka’s built-in Information Gain attribute evaluator, each of the 80
attributes was ranked according to its importance. This ranking enabled the selection of a subset of
attributes that are most indicative of HOIC attack patterns. These selected attributes form the basis for
further analysis and the development of detection models. Figure 3 illustrates the score of each attribute
based on the calculated Information Gain, highlighting the importance of each feature in distinguishing
between HOIC attacks and benign traffic.</p>
        <sec id="sec-4-2-1">
          <title>Total bytes used for headers in the forward direction</title>
        </sec>
        <sec id="sec-4-2-2">
          <title>Total packets in the forward direction</title>
        </sec>
        <sec id="sec-4-2-3">
          <title>The average number of packets in a sub flow in the forward direction</title>
        </sec>
        <sec id="sec-4-2-4">
          <title>Maximum size of packet in forward direction</title>
        </sec>
        <sec id="sec-4-2-5">
          <title>Mean size of packet in forward direction</title>
        </sec>
        <sec id="sec-4-2-6">
          <title>Average size observed in the forward direction</title>
        </sec>
        <sec id="sec-4-2-7">
          <title>The average number of bytes in a sub flow in the forward direction</title>
        </sec>
        <sec id="sec-4-2-8">
          <title>Total size of packet in forward direction</title>
          <p>By applying Information Gain attribute evaluator, we obtained a ranked list of attributes, with the
highest-ranked features being those most strongly correlated with the occurrence of HOIC attacks. The
top-ranked features based on Information Gain were selected for further analysis, as listed in Table 1.
These features demonstrated a 99% correlation score with the class label, indicating their strong
predictive power in identifying HOIC attack patterns. Among the 80 attributes, only 9 were chosen based
on their correlation score, as these attributes provided the most significant insights while minimizing
redundancy in the data. By focusing on this reduced set of highly relevant attributes, we enhance the
efficiency and accuracy of the machine learning models, ensuring that they are trained on the most
informative aspects of the network traffic data. This approach not only streamlines the computational
process but also improves the overall performance of the detection mechanisms, making them more
effective in real-world applications.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Skewed dataset</title>
        <p>The selected dataset exhibits a significant class imbalance, with 360,833 instances labeled as Benign
and only 686,012 instances labeled as Malicious. This imbalanced class distribution can lead to poor
performance in machine learning algorithms, as they may struggle to accurately identify and
differentiate between the minority and majority classes. When one class has a higher representation than
another, the algorithm becomes biased in favor of the majority. This bias develops when the model is
exposed to more instances from the majority class during training, causing it to prioritize accuracy on the
dominant class over the minority class. As a result, the model may fail to accurately recognize and
classify instances of the minority class, resulting in high rates of false negatives or false positives,
depending on the context.</p>
        <p>To address the challenge of class imbalance in machine learning, several techniques can be employed
to improve model performance. Resampling methods, such as oversampling the minority class and
undersampling the majority class, are commonly used to balance the dataset. Oversampling techniques,
like SMOTE, generate synthetic instances of the minority class, while undersampling reduces the
number of instances in the majority class to achieve balance. In our study, we have employed
undersampling of the majority class to balance our dataset, enabling the machine learning algorithm to
learn equally from both classes and improving its overall effectiveness. Figure 4 illustrates the number of
instances in the original and balanced datasets.</p>
        <p>To achieve the study’s objective of reducing the number of features and instances required to train
machine learning models for detecting DDoS attacks while maintaining detection accuracy, the overall
number of instances in the balanced dataset was further resampled as illustrated in Figure 4. This process
involved generating three subsets of the balanced dataset to evaluate the effectiveness of the selected
features under different sample sizes. The first subset (Dataset A) consisted of 60% of the balanced
dataset, providing a substantial amount of data to train the ML models while still reducing the overall
computational load. The second subset (Dataset B) included 40% of the balanced dataset, allowing for a
comparison of model performance as the dataset size decreased. The third subset (Dataset C), comprising
20% of the dataset, was created to test the models' ability to maintain detection accuracy with a
significantly reduced amount of data.</p>
        <p>800000
700000
600000
500000
400000
300000
200000
100000
0
3
3
8
0
6
3
2
1
0
6
8
6</p>
        <p>By resampling the dataset in this manner, the study aims to demonstrate that effective DDoS
detection can be achieved with fewer instances and a limited number of key features. This approach not
only optimizes the computational resources required for model training but also highlights the potential
for deploying efficient and scalable detection systems in real-world environments where data processing
capacity may be constrained. The selected features and resampled datasets form the basis for developing
and evaluating machine learning models capable of accurately identifying HOIC attacks across varying
dataset sizes.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Model development and evaluation</title>
        <p>Following the identification of significant attributes, these features will be utilized to train and test
various machine learning models within Weka. The performance of these models will be evaluated based
on their detection accuracy and the time required to build the machine learning model, which directly
impacts the feasibility of deploying such models in real-time or resource-constrained environments. The
models are built based on 66% of the provided data and have been tested based on the remaining portion.
allowing for an unbiased evaluation of their performance. This approach provides a clear measure of how
well the models generalize to new, unseen data, ensuring that the results accurately reflect the models'
real-world applicability in detecting HOIC attacks.</p>
        <p>J48 - The Weka workbench implementation of the C4.5 decision tree method that often used in
categorization due to their simplicity and interpretability. The J48 algorithm iteratively divides the
dataset by the attribute with the greatest Information Gain at each node to create a decision tree. The
method continues until it reaches a stopping condition, such as a minimum leaf occurrence or tree depth.</p>
        <p>JRip - It is Weka's variant of the RIPPER algorithm, which is a rule-based learner that use repeated
incremental pruning to produce error reduction. JRip derives a collection of if-then rules from the
training data, which are employed for the purpose of categorizing fresh occurrences. The algorithm
incrementally incorporates rules to accurately represent positive instances while reducing errors on the
negative instances. After the generation of the initial set of rules, JRip proceeds to selectively eliminate
any superfluous complexity, hence improving the model's capacity to generalize. Rule-based classifiers
such as JRip are highly appreciated for their interpretability and remarkable performance in dealing with
datasets that contain noise.</p>
        <p>Multilayer Perceptron - A type of artificial neural network that consists of an input layer, one or more
hidden layers, and an output layer. Each layer contains neurons (nodes) that are interconnected, with
each connection assigned a weight. MLPs are powerful classifiers capable of capturing complex patterns
in data through the backpropagation learning algorithm, which adjusts the weights based on the error
between the predicted and actual outputs. MLPs are particularly well-suited for problems where the
relationship between the input features and the output is non-linear. Despite their computational
complexity, MLPs are highly effective for a wide range of classification tasks, including those involving
large and intricate datasets.</p>
        <p>Random Forest - It constructs many decision trees during training and calculates the mode of the
classes for classification or the mean prediction for regression. Each tree in the forest uses a randomly
selected portion of the training data and divides nodes using a random selection of characteristics. This
method reduces overfitting and improves generalization. The model is more resilient and precise due to
the large range of tree variances. The Random Forest algorithm is known for its ability to analyze large
datasets with many features and resist overfitting.</p>
        <p>SMO (Sequential Minimal Optimization) – It is Weka's implementation of Support Vector Machine
(SVM), used for classification. SVM maximizes the margin between class nearest points (support vectors)
by selecting the hyperplane that best classifies data. SMO optimizes this process by dividing down SVM's
massive quadratic programming problem into smaller, analytically solvable ones. SMO is efficient and
scalable for huge datasets. SVM and SMO are ideal for difficult classification jobs due to their accuracy
and efficiency in high-dimensional domains.</p>
        <p>The following table presents the performance evaluation of the machine learning models used in this
study. The table includes metrics for detection accuracy, Ture Positive (TP) rate, False Positive (FP) rate,
Precision, Recall and F-Measure. The table provides insights into both the effectiveness and efficiency of
the models in identifying HOIC attacks.</p>
        <p>Figure 5 illustrates the comparative analysis of accuracy performance across the different machine
learning models used in this study. The figure presents a bar chart showing the detection accuracy of
each model, providing a visual representation of how effectively each algorithm identifies HOIC attacks.</p>
        <p>The experimental results, as depicted in Table 2 and illustrated in Figure 5, indicate that the machine
learning models were able to achieve similar accuracy even after reducing the number of instances. This
outcome demonstrates the effectiveness of the selected features and the robustness of the models in
maintaining high detection accuracy, despite the decrease in the amount of training data. The ability to
achieve comparable accuracy with fewer instances not only validates the feature selection process but
also highlights the potential for more efficient model training and deployment in resource-constrained
environments. On the other hand, Table 3 depicts the time required to train the model based on 66% of the
provided dataset.</p>
        <p>Figure 6 visually highlights the varying computational demands of different algorithms by showing
the amount of time measured in seconds, each model takes to complete the training process. This
comparison is essential in understanding the trade-offs between model complexity and the amount of the
training data samples, especially when deploying these models in real-world scenarios where resources
such as processing power and time are limited.</p>
        <p>250
200
150
100
50
0</p>
        <p>Dataset A</p>
        <p>Dataset B</p>
        <p>Dataset C</p>
        <p>As illustrated in Figure 6, reducing the amount of the dataset has significantly decreased the time
required to build the machine learning models, while maintaining high detection performance. This
reduction in model training time is crucial, particularly for future applications where intrusion detection
systems need to be implemented in environments where time and computational power are critical
constraints. The ability to efficiently train models with fewer instances without compromising accuracy
opens up opportunities for deploying these systems in real-time scenarios, such as in edge computing or
embedded systems, where resources are limited but rapid detection is essential. This efficiency ensures
that robust security measures can be maintained even in environments with stringent computational
requirements.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This study set out to identify the optimal features and machine learning models for detecting High
Orbit Ion Cannon (HOIC) attacks within the IDS 2018 dataset, with a focus on balancing detection
accuracy and computational efficiency. Through the use of various machine learning algorithms—
including J48, JRip, Multilayer Perceptron, Random Forest, and SMO—models were trained and evaluated
to determine their effectiveness in recognizing HOIC attacks while minimizing the time required for
model building.</p>
      <p>The experimental findings showed that it is feasible to sustain excellent detection accuracy even after
much fewer cases in the dataset are used. This result is important since it implies that less data may be
used to build efficient intrusion detection systems, therefore saving computational resources required for
training without compromising performance. Furthermore, the study underlined how significantly less
time was needed to create machine learning models by shrinking the dataset size. Practical applications
like real-time network monitoring, edge computing, and other resource-limited contexts where
processing power and time are critical restrictions depend notably on this decrease in training time. The
ability to train models quickly while retaining accuracy ensures that intrusion detection systems can be
deployed efficiently in these scenarios.</p>
      <p>In conclusion, the study successfully identified a set of features and models that not only achieve high
accuracy in detecting HOIC attacks but also do so in a computationally efficient manner. These findings
pave the way for the development of robust and scalable intrusion detection systems that can operate
effectively in environments where rapid detection and resource optimization are essential.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GAI tools in order to: Proofread the text. After
using these tools, the authors reviewed and edited the content as needed and takes full responsibility for
the publication’s content.
[7] A. M. Vartouni, S. S. Kashi, and M. Teshnehlab, “An anomaly detection method to detect web
attacks using Stacked Auto-Encoder,” in 2018 6th Iranian Joint Congress on Fuzzy and Intelligent
Systems (CFIS), IEEE, Feb. 2018, pp. 131–134. doi: 10.1109/CFIS.2018.8336654.
[8] A. Hashim, R. Medani, and T. A. Attia, “Defences Against web Application Attacks and Detecting
Phishing Links Using Machine Learning,” in 2020 International Conference on Computer, Control,
Electrical, and Electronics Engineering (ICCCEEE), IEEE, Feb. 2021, pp. 1–6. doi:
10.1109/ICCCEEE49695.2021.9429609.
[9] B. Gogoi, T. Ahmed, and A. Dutta, “Defending against SQL Injection Attacks in Web Applications
using Machine Learning and Natural Language Processing,” in 2021 IEEE 18th India Council
International Conference (INDICON), IEEE, Dec. 2021, pp. 1–6. doi:
10.1109/INDICON52576.2021.9691740.
[10] A. Joshi and V. Geetha, “SQL Injection detection using machine learning,” in 2014 International
Conference on Control, Instrumentation, Communication and Computational Technologies
(ICCICCT), IEEE, Jul. 2014, pp. 1111–1115. doi: 10.1109/ICCICCT.2014.6993127.
[11] L. Zhou, T. Lu, and X. Hu, “Detecting Web Application Injection Attacks Using One-Class SVM,”
in 2022 IEEE 5th International Conference on Computer and Communication Engineering
Technology (CCET), IEEE, Aug. 2022, pp. 275–279. doi: 10.1109/CCET55412.2022.9906382.
[12] R. Banerjee, A. Baksi, N. Singh, and S. K. Bishnu, “Detection of XSS in web applications using
Machine Learning Classifiers,” in 2020 4th International Conference on Electronics, Materials
Engineering &amp; Nano-Technology (IEMENTech), IEEE, Oct. 2020, pp. 1–5. doi:
10.1109/IEMENTech51367.2020.9270052.
[13] G. Kaur, Y. Malik, H. Samuel, and F. Jaafar, “Detecting Blind Cross-Site Scripting Attacks Using
Machine Learning,” in Proceedings of the 2018 International Conference on Signal Processing and
Machine Learning - SPML ’18, New York, New York, USA: ACM Press, 2018, pp. 22–25. doi:
10.1145/3297067.3297096.
[14] S. Sharma, P. Zavarsky, and S. Butakov, “Machine Learning based Intrusion Detection System for
Web-Based Attacks,” in 2020 IEEE 6th Intl Conference on Big Data Security on Cloud
(BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and
IEEE Intl Conference on Intelligent Data and Security (IDS), IEEE, May 2020, pp. 227–230. doi:
10.1109/BigDataSecurity-HPSC-IDS49724.2020.00048.
[15] R. Tommy, G. Sundeep, and H. Jose, “Automatic Detection and Correction of Vulnerabilities using
Machine Learning,” in 2017 International Conference on Current Trends in Computer, Electrical,
Electronics and Communication (CTCEEC), IEEE, Sep. 2017, pp. 1062–1065. doi:
10.1109/CTCEEC.2017.8454995.
[16] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, “Toward Generating a New Intrusion
Detection Dataset and Intrusion Traffic Characterization,” in Proceedings of the 4th International
Conference on Information Systems Security and Privacy, SCITEPRESS - Science and Technology
Publications, 2018, pp. 108–116. doi: 10.5220/0006639801080116.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Najafimehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zarifzadeh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Mostafavi</surname>
          </string-name>
          , “
          <article-title>DDoS attacks and machine-learning-based detection methods: A survey and taxonomy</article-title>
          ,” Engineering Reports.
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .1002/eng2.
          <fpage>12697</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Songma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sathuphan</surname>
          </string-name>
          , and T. Pamutha, “
          <article-title>Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-</article-title>
          <string-name>
            <surname>IDS-2018 Dataset</surname>
          </string-name>
          ,” Computers, vol.
          <volume>12</volume>
          , no.
          <issue>12</issue>
          , p.
          <fpage>245</fpage>
          ,
          <string-name>
            <surname>Nov</surname>
          </string-name>
          .
          <year>2023</year>
          , doi: 10.3390/computers12120245.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nallanathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Fu</surname>
          </string-name>
          , “
          <article-title>Towards Interpretable MachineLearning-Based DDoS Detection,” SN Comput</article-title>
          . Sci.,
          <year>2024</year>
          , doi: 10.1007/s42979-023-02383-y.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Black</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          , “
          <article-title>An Overview on Detection and Prevention of Application Layer DDoS Attacks,” in 2022 IEEE 12th Annual Computing</article-title>
          and Communication Workshop and Conference,
          <string-name>
            <surname>CCWC</surname>
          </string-name>
          <year>2022</year>
          ,
          <year>2022</year>
          . doi:
          <volume>10</volume>
          .1109/CCWC54503.
          <year>2022</year>
          .
          <volume>9720741</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          , “
          <string-name>
            <surname>Identifying</surname>
          </string-name>
          Application-Layer
          <source>DDoS Attacks Based on Request Rhythm Matrices,” IEEE Access</source>
          ,
          <year>2019</year>
          , doi: 10.1109/ACCESS.
          <year>2019</year>
          .
          <volume>2950820</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramezany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Setthawong</surname>
          </string-name>
          , and T. Tanprasert,
          <string-name>
            <surname>“</surname>
          </string-name>
          <article-title>A Machine Learning-based Malicious Payload Detection and Classification Framework for New Web Attacks</article-title>
          ,” in 2022 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information
          <string-name>
            <surname>Technology (ECTI-CON</surname>
            <given-names>)</given-names>
          </string-name>
          , IEEE, May
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . doi:
          <volume>10</volume>
          .1109/ECTI-CON54298.
          <year>2022</year>
          .
          <volume>9795455</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>