<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Machine Learning-Driven Anomaly Detection for Enhanced IoT Networks Security</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vijayakumar Ponnusamy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Siddharth Tiwari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abhinav Sinha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emilija Kisić</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electronics and Communication Engineering, SRM Institute of Science and Technology</institution>
          ,
          <addr-line>Kattankulathur, Chennai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Information Technology, Belgrade Metropolitan University</institution>
          ,
          <addr-line>Tadeuša Košćuška 63, 11000 Belgrade</addr-line>
          ,
          <country country="RS">Serbia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Although the rapid proliferation of IoT devices has greatly improved operational efficiency and connectivity across businesses, it has also introduced serious security flaws. Prior studies on IoT network anomaly detection in smart homes have mostly favoured methods such as Artificial Neural Networks (ANN), Random Forests, and Decision Trees, largely overlooking the Isolation Forest model. Our work addresses this gap by developing and evaluating an Isolation Forest model to identify anomalies in IoT networks. The objective is to improve the efficiency and dependability of IoT networks by fortifying them against evolving cyber threats.</p>
      </abstract>
      <kwd-group>
        <kwd>Anomaly Detection</kwd>
        <kwd>IoT network</kwd>
        <kwd>Cybersecurity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The Internet of Things (IoT) has rapidly become a cornerstone of modern technology, connecting a
vast array of devices across various industries. These devices, ranging from household appliances
to industrial sensors, form networks through which data flows continuously, enabling real-time monitoring,
automation, and decision-making. While this interconnection brings great benefits in terms of efficiency,
productivity, and convenience, it also exposes IoT devices to major security vulnerabilities. As the number
of IoT devices grows, so does the potential for cyberattacks, data breaches, and operational failures.
IoT networks are especially vulnerable to threats such as malware infiltration, unauthorised access,
Distributed Denial of Service (DDoS) attacks, and data manipulation. These threats can compromise
sensitive data, interrupt services, and even cause physical harm in industrial settings.</p>
      <p>
        Anomaly detection has emerged as a vital security mechanism for protecting IoT networks against
these constantly changing threats [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In contrast to conventional methods,
which depend on predetermined signatures of known attacks, anomaly detection techniques concentrate on spotting deviations from
typical network behaviour that may point to unknown or emerging threats [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This strategy is
especially crucial in dynamic environments such as the IoT, where communication
patterns and devices change constantly and static security rules become less effective. Anomaly
detection models monitor network activity and flag behaviour that diverges from normal operating patterns,
such as sudden bursts of requests, unusual device interactions, or spikes in data transfer.
      </p>
      <p>
        Artificial Neural Networks (ANN), Decision Trees, Random Forests, and other machine learning
methods have been used for anomaly detection in IoT networks [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. These models have
proven fairly effective at identifying abnormalities, but they frequently demand substantial processing
power and may struggle with the large, high-dimensional datasets typical of IoT
environments. Furthermore, many of these models are prone to false positives, in which
normal behaviour is mistakenly classified as anomalous, reducing the efficiency of security
responses and network monitoring.
      </p>
      <p>
        The Isolation Tree algorithm, the building block of the Isolation Forest model, is a viable alternative for
anomaly detection [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. Each isolation tree recursively partitions the dataset with random splits; because anomalies are few and
different, they tend to be separated from the rest of the data after fewer splits than normal points. In contrast
to density- or distance-based techniques, isolation trees concentrate on how easily a point can be isolated, which makes
them especially useful for identifying outliers in complex, high-dimensional datasets such as those
produced by IoT networks [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. Because of its scalability and lightweight nature, the Isolation Tree
technique is a good fit for real-time anomaly detection in IoT systems: it can handle
large volumes of data with little computing overhead [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
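      <p>The isolation principle can be illustrated with a minimal one-dimensional sketch. It is not the full Isolation Forest algorithm: it repeatedly draws a split value uniformly between the current minimum and maximum, keeps only the side that contains the query point, and counts the splits until that point is alone. The synthetic data (Gaussian values around 50 plus a single value at 120) and the helper names <monospace>isolation_path_length</monospace> and <monospace>mean_path_length</monospace> are hypothetical, chosen only for illustration.</p>
      <preformat preformat-type="code">
```python
import random


def isolation_path_length(point, data, rng, depth=0, max_depth=50):
    """Count random splits needed to isolate `point` among the values in `data`.

    One-dimensional illustration: each step draws a split uniformly between
    the current minimum and maximum and keeps only the side holding `point`.
    """
    if len(data) == 1 or depth == max_depth or min(data) == max(data):
        return depth
    split = rng.uniform(min(data), max(data))
    if point > split:
        side = [v for v in data if v > split]
    else:
        side = [v for v in data if not v > split]
    return isolation_path_length(point, side, rng, depth + 1, max_depth)


def mean_path_length(point, data, rng, trials=200):
    """Average the path length over many independently randomised partitions."""
    total = sum(isolation_path_length(point, data, rng) for _ in range(trials))
    return total / trials


rng = random.Random(0)
normal = [rng.gauss(50.0, 5.0) for _ in range(255)]  # hypothetical "normal" values
outlier = 120.0                                      # hypothetical anomalous value
data = normal + [outlier]
typical = min(normal, key=lambda v: abs(v - 50.0))   # value closest to the centre

print("outlier:", mean_path_length(outlier, data, rng))
print("typical:", mean_path_length(typical, data, rng))
```
</preformat>
      <p>The outlier is typically separated after one or two splits, whereas the central value needs several more; this gap in average path length is what the anomaly score formalises.</p>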
      <p>Using the Isolation Tree technique to detect anomalies in IoT
networks offers several benefits. Reducing false positives improves detection accuracy and strengthens the
overall security and dependability of IoT networks. By efficiently finding anomalies that may indicate
cyber threats or operational issues, the approach supports proactive threat mitigation. Because attackers
continually devise new ways to exploit vulnerabilities in IoT environments, the capacity
to identify abnormalities in real time is essential for preserving network integrity and ensuring the
smooth operation of connected devices. Overall,
the Isolation Tree technique is a promising development for IoT network security. By addressing
the shortcomings of conventional anomaly detection techniques, it offers a more effective and efficient
way to spot anomalous patterns, strengthening the ability of IoT systems to fend off both
known and unknown threats. As IoT technology develops further, these anomaly detection methods will need
continuous refinement so that the advantages of IoT are
not outweighed by the rising risks of cyberattacks and system failures.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Ease of Use</title>
      <p>The Isolation Tree algorithm is easy to use for anomaly detection in IoT networks,
making it an attractive choice for both researchers and industry practitioners. One of its primary
advantages is its simplicity in both design and implementation. Unlike more computationally intensive
algorithms such as Artificial Neural Networks (ANN) or Support Vector Machines (SVM), the Isolation
Tree algorithm operates through recursive random partitioning, a process that is easy to understand and apply
without advanced machine learning expertise. It isolates
anomalous data points according to how easily they can be separated from the rest of the dataset, which
suits IoT data that often contain rare, irregular patterns.</p>
      <p>In terms of implementation, the Isolation Tree algorithm does not require extensive
hyperparameter tuning, which can be a significant challenge with more complex models. This reduces the
time and effort needed for model configuration, allowing users to deploy the algorithm quickly in
real-world IoT environments. Furthermore, implementations are available in widely used machine
learning libraries, making it accessible to those with basic programming knowledge.</p>
      <p>Another key aspect of its ease of use is its computational efficiency. Because IoT networks
typically generate massive streams of data, scalability is essential for real-time anomaly detection. The
Isolation Tree algorithm handles large, high-dimensional datasets efficiently and with low
computational overhead, since each tree is built from a small random sub-sample of the data. This allows it
to process large amounts of IoT data without
placing significant strain on system resources, which is especially important in environments with
constrained computational power, such as embedded systems or low-power IoT devices.</p>
      <p>The interpretability of the Isolation Tree algorithm further enhances its ease of use. The algorithm
generates clear, understandable results by isolating anomalous points, allowing users to readily identify
and investigate potential threats or system anomalies. Unlike black-box models like deep neural
networks, which often provide little insight into how decisions are made, the Isolation Tree algorithm’s
decision-making process is transparent. This makes it easier for users to validate and trust the results,
which is crucial in critical IoT applications such as healthcare, industrial automation, and smart cities,
where timely and accurate detection of anomalies can prevent serious disruptions.</p>
      <p>Additionally, the algorithm’s minimal need for specialized hardware contributes to its broad
applicability across IoT platforms. Many advanced machine learning models require high-performance
GPUs or specialized processing units to function effectively, especially when dealing with large datasets.
In contrast, the Isolation Tree algorithm can run on standard computing infrastructure, reducing the
cost and complexity of deployment.</p>
      <p>Finally, the algorithm is flexible enough to be integrated into various IoT security frameworks
with ease. It can be combined with other machine learning techniques or traditional security protocols to
provide a layered defense against cyber threats, enhancing its practical value in diverse IoT applications.
Its adaptability allows it to be applied across different industries, from smart homes and healthcare
systems to industrial IoT and connected vehicles, without extensive modification or
customization. In summary, the Isolation Tree algorithm’s ease of use stems from its simplicity,
computational efficiency, scalability, interpretability, and flexibility. These features make it a powerful
yet user-friendly tool for improving anomaly detection in IoT networks, enabling enhanced security
with minimal technical barriers to adoption.</p>
      <sec id="sec-2-1">
        <title>2.1. Equations</title>
        <p>In the context of this study, the Isolation Forest algorithm is applied to detect anomalies within IoT
network data, leveraging the fundamental concept that anomalies are more easily isolated than normal
points. The following mathematical formulations and principles are used to understand and quantify
the behavior of the Isolation Forest model.</p>
        <sec id="sec-2-1-1">
          <title>1. Path Length Calculation</title>
          <p>For a data point <italic>x</italic>, the path length <italic>h</italic>(<italic>x</italic>) is defined as the number of edges traversed from the
root of an isolation tree to the leaf node where <italic>x</italic> is located. Since anomalies are isolated more
quickly than normal points, the path length provides an indication of how anomalous a point is.
The average path length of a point in an isolation tree built from a sample of <italic>n</italic> points can be
approximated by:
<disp-formula id="eq1"><label>(1)</label><tex-math>c(n) = 2H(n-1) - \frac{2(n-1)}{n},</tex-math></disp-formula></p>
          <p>where <italic>H</italic>(<italic>i</italic>) is the harmonic number, which approximates the average path length in a completely
random tree. It can be computed as <italic>H</italic>(<italic>i</italic>) ≈ ln(<italic>i</italic>) + γ, where γ ≈ 0.577 is Euler’s constant, and
<italic>n</italic> is the number of data points in the tree.</p>
          <p><bold>2. Anomaly Score Calculation</bold></p>
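          <p>The anomaly score <italic>s</italic>(<italic>x</italic>, <italic>n</italic>) for a point <italic>x</italic> is computed from its path length: the shorter the
path length, the higher the likelihood that the point is an anomaly. The anomaly score is given
by:</p>
          <p><disp-formula id="eq2"><label>(2)</label><tex-math>s(x, n) = 2^{-\frac{E(h(x))}{c(n)}},</tex-math></disp-formula>
where <italic>E</italic>(<italic>h</italic>(<italic>x</italic>)) is the average path length of <italic>x</italic> across all isolation trees in the forest, and <italic>c</italic>(<italic>n</italic>) is
the average path length of a point in a binary tree built from <italic>n</italic> samples, given by
<disp-formula id="eq3"><label>(3)</label><tex-math>c(n) = 2H(n-1) - \frac{2(n-1)}{n}.</tex-math></disp-formula></p>
          <p>The anomaly score <italic>s</italic>(<italic>x</italic>, <italic>n</italic>) ranges between 0 and 1, with higher values indicating that the point
is more likely to be an anomaly:
<italic>s</italic> → 1 implies a high likelihood of the point being an anomaly;
<italic>s</italic> ≈ 0.5 for all points indicates that the sample contains no distinct anomalies;
<italic>s</italic> much smaller than 0.5 implies the point is likely normal.</p>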
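          <p>The normaliser <italic>c</italic>(<italic>n</italic>) and the anomaly score <italic>s</italic>(<italic>x</italic>, <italic>n</italic>) defined above can be evaluated numerically with a short sketch. It uses the approximation <italic>H</italic>(<italic>i</italic>) ≈ ln(<italic>i</italic>) + γ from the text. The special cases <italic>c</italic>(2) = 1 and <italic>c</italic>(<italic>n</italic>) = 0 for a single point are not stated in this section; they follow the convention of common implementations such as scikit-learn. The mean path length of 2.0 in the example is a hypothetical value.</p>
          <preformat preformat-type="code">
```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def harmonic(i):
    """Approximate the harmonic number H(i) as ln(i) + Euler's constant."""
    return math.log(i) + EULER_GAMMA


def c(n):
    """Average path length of an unsuccessful search in a binary tree of n points."""
    if n > 2:
        return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0  # convention used by common implementations
    return 0.0


def anomaly_score(mean_path, n):
    """s(x, n) = 2 ** (-E(h(x)) / c(n)) for a sub-sample of size n."""
    return 2.0 ** (-mean_path / c(n))


n = 256
print(round(c(n), 2))                    # 10.24
print(anomaly_score(c(n), n))            # 0.5: path length equal to the average
print(round(anomaly_score(2.0, n), 2))   # 0.87: a short path pushes s towards 1
```
</preformat>
          <p>A point whose average path length equals <italic>c</italic>(<italic>n</italic>) always receives exactly 0.5, which is why 0.5 acts as the neutral reference value.</p>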
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Related Works</title>
      <p>
        Recent advances in anomaly detection for IoT networks have produced a variety of methodologies
aimed at enhancing security, operational efficiency, and system reliability. One significant approach
uses Sparse Autoencoders (SAEs) for dimensionality reduction, followed by
Convolutional Neural Networks (CNNs) for anomaly detection. This SAE-CNN framework
has shown promising capabilities in identifying unusual patterns within network traffic. However, the
model’s inherent complexity makes real-time detection challenging, particularly in
resource-constrained IoT environments where computational power and memory are limited. Additionally, the
model was validated on a single dataset, Bot-IoT, which raises concerns about
its generalizability to the diverse and dynamic nature of real-world IoT network traffic. This limitation
highlights the need for further validation across multiple datasets to ensure robustness [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Another noteworthy study conducted a thorough analysis of a different IoT dataset using several
machine learning classifiers, including Random Forest, Decision Tree, AdaBoost, and Artificial Neural
Networks (ANN) [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. The models were evaluated with metrics such as weighted
precision, recall, and F1 score, providing a nuanced understanding of their performance. Despite
this comprehensive evaluation, the models offered limited interpretability when deployed in varied
environments. This lack of transparency could hinder practical implementations, especially in critical
applications where understanding model decisions is vital. Moreover, the study identified scalability
issues when incorporating additional datasets, which can complicate the integration of new data sources.
There is also a pressing need for unsupervised techniques that could improve model generalization to
new and unlabeled data, thereby improving the adaptability of these models in real-world settings [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        In addition to these methodologies, a systematic review of studies on anomaly detection
in industrial machinery using IoT devices offered valuable insights into several aspects, including
the types of machinery studied, the sensors used for data collection, and the preprocessing methods
applied [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. This review synthesized findings from numerous studies, highlighting the importance
of understanding the interplay between the factors affecting anomaly detection performance.
However, its narrow focus on recent literature limited the exploration of foundational research,
potentially overlooking insights that could inform current practice. The variability in sensors and
machinery across studies resulted in inconsistent preprocessing techniques, complicating comparative
analyses. Furthermore, the diversity of machine learning algorithms applied in different studies makes it
difficult to draw clear conclusions and establish best practices in anomaly detection.
      </p>
      <p>
        Moreover, several studies have underscored the need for interdisciplinary approaches that integrate
domain knowledge from fields such as cybersecurity, data science, and engineering to improve anomaly
detection systems. Such integration could facilitate the development of more sophisticated models
capable of adapting to evolving threats in IoT environments. The overall quality of the studies reviewed
varies significantly, which raises questions about the reliability and applicability of their findings in
practical scenarios. Standardized evaluation frameworks and benchmarks for
IoT anomaly detection are clearly needed, as they would improve the comparability of research outcomes and
facilitate the development of more effective detection systems [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>In conclusion, while research on anomaly detection in IoT networks is expanding
rapidly, challenges remain in model complexity, interpretability, and scalability, and a
deeper understanding is needed of the diverse operational environments in which these models will be deployed.
Addressing these challenges will be crucial for advancing the effectiveness of anomaly detection systems
and ensuring their reliability in real-world applications.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>This study employs the Isolation Forest model to detect anomalies in IoT networks, leveraging the
BOT-IoT dataset obtained from Kaggle. The methodology comprises several crucial steps: dataset
acquisition, data preprocessing, feature engineering, model training, and evaluation.</p>
      <sec id="sec-4-1">
        <title>4.1. Dataset Acquisition</title>
        <p>The BOT-IoT dataset was sourced from Kaggle and comprises a comprehensive collection of IoT network
traffic data. It contains records of network interactions covering both normal behavior
and several types of attacks, which makes it a suitable benchmark for evaluating anomaly detection algorithms
in IoT environments.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Data Inspection</title>
        <p>Upon loading the dataset, we conducted an initial exploratory data analysis (EDA) to understand its
structure, dimensions, and characteristics. Key attributes such as destination port (dport) and bytes
transferred were examined to identify unique values and potential data anomalies. This step also
involved assessing the distribution of various features to recognize patterns that might influence the
model’s performance.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Data Cleaning</title>
        <p>Data cleaning was an essential phase to ensure the quality and reliability of the dataset. We undertook
the following actions:
• Data Type Conversion: Relevant columns, particularly ’dport’ and ’bytes’, were converted to
numeric formats. This conversion is necessary because machine learning algorithms require numerical
inputs.
• Handling Missing Values: After conversion, we checked for NaN (Not a Number) entries within
critical columns. Rows containing NaN values were dropped to avoid complications during model
training. Alternatively, other strategies, such as filling NaN values with a default constant, were
considered but ultimately not implemented in this instance.
• Removing Unnecessary Columns: Columns that provided little value to the model, such as MAC
addresses and unnecessary identifiers, were removed to streamline the dataset and focus on the
most relevant features for anomaly detection.</p>
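        <p>A minimal pure-Python sketch of these cleaning steps follows, assuming records arrive as dictionaries of strings (for example from <monospace>csv.DictReader</monospace>). Values that cannot be parsed as numbers are treated as missing, and rows missing any critical value are dropped. The example rows, including <monospace>smac</monospace> as a stand-in for a MAC-address column, are hypothetical. With pandas, the same steps are commonly written with <monospace>pd.to_numeric(..., errors="coerce")</monospace>, <monospace>DataFrame.dropna(subset=...)</monospace>, and <monospace>DataFrame.drop(columns=...)</monospace>.</p>
        <preformat preformat-type="code">
```python
import math


def to_number(value):
    """Parse `value` as a float; return None (treated as missing) on failure or NaN."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def clean_rows(rows, numeric_cols, drop_cols):
    """Drop unneeded columns, convert numeric columns, and discard rows with missing values."""
    cleaned = []
    for row in rows:
        kept = {key: val for key, val in row.items() if key not in drop_cols}
        for col in numeric_cols:
            kept[col] = to_number(kept.get(col))
        if all(kept[col] is not None for col in numeric_cols):
            cleaned.append(kept)
    return cleaned


rows = [
    {"smac": "aa:bb", "dport": "80", "bytes": "1200"},
    {"smac": "cc:dd", "dport": "not-a-port", "bytes": "600"},  # unparseable port
    {"smac": "ee:ff", "dport": "443", "bytes": ""},            # missing byte count
]
print(clean_rows(rows, numeric_cols=("dport", "bytes"), drop_cols=("smac",)))
# [{'dport': 80.0, 'bytes': 1200.0}]
```
</preformat>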
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Categorical Encoding</title>
        <p>Because the dataset includes categorical features, these variables had to be converted into a numerical
format before training. Label encoding was applied to categorical attributes such as
’flgs’ (flags), ’proto’ (protocol), ’saddr’ (source address), ’daddr’ (destination address), ’state’, ’category’,
and ’subcategory’. Each category was mapped to a unique integer, allowing the Isolation Forest
algorithm to process these features.</p>
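        <p>Label encoding amounts to building a lookup table from category to integer. The sketch below sorts the distinct categories before numbering them, which is also how scikit-learn’s <monospace>LabelEncoder</monospace> orders its classes; the protocol values and the function name <monospace>label_encode</monospace> are hypothetical.</p>
        <preformat preformat-type="code">
```python
def label_encode(values):
    """Assign each distinct category an integer code, in sorted order of the categories."""
    classes = sorted(set(values))
    mapping = {category: code for code, category in enumerate(classes)}
    return [mapping[v] for v in values], mapping


codes, mapping = label_encode(["tcp", "udp", "tcp", "icmp"])
print(codes)    # [1, 2, 1, 0]
print(mapping)  # {'icmp': 0, 'tcp': 1, 'udp': 2}
```
</preformat>
        <p>The integer codes impose an arbitrary order on the categories, and a value seen only in the test data (for example a previously unseen address in ’saddr’ or ’daddr’) has no entry in the fitted mapping, so looking it up would raise a <monospace>KeyError</monospace>.</p>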
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Feature Definition</title>
        <p>The model’s performance depends heavily on the features selected for training. We defined our feature
set by excluding the label columns, namely ’attack’ and ’category’, which serve as ground truth
rather than as model inputs. The final feature set comprised numerical and encoded categorical attributes that describe
network behavior.</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Feature Engineering</title>
        <p>To enhance the model’s discriminative power, we created a new feature termed "bytes per packet." This
feature was calculated by dividing the total bytes transferred by the number of packets for each record.
It offers a more nuanced view of network activity, since it distinguishes, for example, many small packets
from a few large ones, helping the model separate normal from anomalous traffic patterns.</p>
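        <p>The derived feature is a per-record ratio. The text does not say how records with zero packets are handled; the sketch below assumes they receive 0.0 rather than raising a division error. The packet-count column name <monospace>pkts</monospace> and the helper names are assumptions for illustration.</p>
        <preformat preformat-type="code">
```python
def bytes_per_packet(total_bytes, packets):
    """Average bytes per packet; 0.0 when no packets were recorded (assumed convention)."""
    if packets == 0:
        return 0.0
    return total_bytes / packets


def add_bytes_per_packet(rows, bytes_col="bytes", pkts_col="pkts"):
    """Add a 'bytes_per_packet' field to each record in place and return the records."""
    for row in rows:
        row["bytes_per_packet"] = bytes_per_packet(row[bytes_col], row[pkts_col])
    return rows


records = [{"bytes": 1200.0, "pkts": 4.0}, {"bytes": 0.0, "pkts": 0.0}]
for record in add_bytes_per_packet(records):
    print(record["bytes_per_packet"])  # 300.0, then 0.0
```
</preformat>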
      </sec>
      <sec id="sec-4-7">
        <title>4.7. Data Normalization</title>
        <p>Given the diverse ranges of the features, standardization was applied, rescaling each feature
to a mean of zero and a standard deviation of one. Isolation Forest itself is not distance-based: each split
value is drawn uniformly between a feature’s minimum and maximum, so rescaling a feature does not change
how easily points are isolated. Standardization is therefore not strictly required by the model, but it
keeps the features on a common scale.</p>
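        <p>Standardization subtracts a feature’s mean and divides by its standard deviation. The sketch below uses the population standard deviation, as scikit-learn’s <monospace>StandardScaler</monospace> does, and computes the statistics on training values only so that test data do not leak into preprocessing; that fit-on-train practice is a common convention, not a detail stated in the text. The values and helper names are hypothetical.</p>
        <preformat preformat-type="code">
```python
import statistics


def fit_standardizer(column):
    """Return the mean and population standard deviation of a training column."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return mean, (std if std > 0 else 1.0)  # guard against constant columns


def standardize(column, mean, std):
    """Rescale values to zero mean and unit variance using training statistics."""
    return [(value - mean) / std for value in column]


train_bytes = [2.0, 4.0, 6.0, 8.0]           # hypothetical training values
mean, std = fit_standardizer(train_bytes)
print(mean, round(std, 3))                   # 5.0 2.236
print([round(z, 3) for z in standardize(train_bytes, mean, std)])
# [-1.342, -0.447, 0.447, 1.342]
```
</preformat>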
      </sec>
      <sec id="sec-4-8">
        <title>4.8. Train-Test Split</title>
        <p>To evaluate the model’s performance, the dataset was split into training and testing sets
using an 80-20 ratio: the model learned from 80% of the data,
while the remaining 20% served as the test set. Evaluating on held-out data reveals
overfitting and indicates how well the model generalizes to unseen data.</p>
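        <p>A shuffled 80-20 hold-out split can be sketched in a few lines. The seed value and the helper name <monospace>holdout_split</monospace> are hypothetical. When one class is rare, a stratified split (for example scikit-learn’s <monospace>train_test_split</monospace> with its <monospace>stratify</monospace> argument) keeps the class proportions equal in both parts.</p>
        <preformat preformat-type="code">
```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and return (train, test) partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(100))
print(len(train), len(test))  # 80 20
```
</preformat>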
      </sec>
      <sec id="sec-4-9">
        <title>4.9. Model Initialization and Training</title>
        <p>The Isolation Forest model was initialized with specific hyperparameters, including the number of
estimators (n_estimators), the maximum samples to be drawn (max_samples), and the contamination
rate (contamination) to define the expected proportion of anomalies in the data. After
initialization, the model was trained using the training dataset. The training process involves constructing an
ensemble of isolation trees, where each tree partitions the feature space recursively until it isolates
observations, thereby identifying anomalies based on their average path lengths in the trees.</p>
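        <p>To make the roles of the three hyperparameters concrete, the following is a compact pure-Python sketch of an isolation forest; it is not the implementation used in this study and not scikit-learn’s code. Here <monospace>n_estimators</monospace> sets the number of trees, <monospace>max_samples</monospace> the sub-sample size ψ drawn for each tree, and <monospace>contamination</monospace> only sets the score threshold used to label points, so the trees themselves do not depend on it. The depth limit ⌈log₂ ψ⌉ and the <italic>c</italic>(size) correction for unresolved leaves follow the original Isolation Forest formulation. The class name <monospace>TinyIsolationForest</monospace> and the synthetic two-dimensional data are hypothetical.</p>
        <preformat preformat-type="code">
```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average unsuccessful-search path length in a binary tree of n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_tree(points, rng, depth, max_depth):
    """Grow one isolation tree with random feature choices and random split values."""
    if len(points) in (0, 1) or depth == max_depth:
        return {"size": len(points)}
    varying = [d for d in range(len(points[0]))
               if min(p[d] for p in points) != max(p[d] for p in points)]
    if not varying:  # all remaining points are identical
        return {"size": len(points)}
    dim = rng.choice(varying)
    split = rng.uniform(min(p[dim] for p in points), max(p[dim] for p in points))
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if not p[dim] > split], rng, depth + 1, max_depth),
        "right": build_tree([p for p in points if p[dim] > split], rng, depth + 1, max_depth),
    }


def path_length(x, node, depth=0):
    """Edges from the root to x's leaf, plus c(size) for points left unresolved there."""
    if "size" in node:
        return depth + c(node["size"])
    branch = "right" if x[node["dim"]] > node["split"] else "left"
    return path_length(x, node[branch], depth + 1)


class TinyIsolationForest:
    def __init__(self, n_estimators=100, max_samples=256, contamination=0.05, seed=0):
        self.n_estimators = n_estimators
        self.max_samples = max_samples
        self.contamination = contamination
        self.seed = seed

    def fit(self, X):
        rng = random.Random(self.seed)
        self.psi = min(self.max_samples, len(X))  # needs at least 2 samples
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [build_tree(rng.sample(X, self.psi), rng, 0, max_depth)
                      for _ in range(self.n_estimators)]
        # Threshold: roughly a `contamination` share of training scores lie above it.
        scores = sorted(self.score_samples(X))
        n_flagged = int(len(scores) * self.contamination)
        self.threshold_ = scores[len(scores) - 1 - n_flagged]
        return self

    def score_samples(self, X):
        """Anomaly score s(x, psi) = 2 ** (-mean path length / c(psi)) for each point."""
        norm = c(self.psi)
        return [2.0 ** (-sum(path_length(x, t) for t in self.trees) / len(self.trees) / norm)
                for x in X]

    def predict(self, X):
        """Return -1 for anomalies and 1 for normal points."""
        return [-1 if s > self.threshold_ else 1 for s in self.score_samples(X)]


rng = random.Random(7)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
outliers = [(8.0, 8.0), (-8.0, 8.0), (8.0, -8.0), (-8.0, -8.0), (0.0, 9.0)]
forest = TinyIsolationForest(n_estimators=100, max_samples=256, contamination=0.025)
forest.fit(normal + outliers)
print(forest.predict(outliers))
print(forest.predict([(0.0, 0.0)]))
```
</preformat>
        <p>In this sketch the distant synthetic points receive clearly higher scores than the central point; changing <monospace>contamination</monospace> moves only <monospace>threshold_</monospace>, not the scores.</p>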
      </sec>
      <sec id="sec-4-10">
        <title>4.10. Prediction and Evaluation</title>
        <p>Once trained, the model was used to make predictions on the test dataset. The output of the model
includes a binary classification: -1 for anomalies and 1 for normal observations. To facilitate
interpretation, the predictions were converted to a binary format, where 1 indicates an anomalous instance and 0
denotes a normal instance.</p>
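        <p>The relabelling is a simple mapping; the -1/1 output convention is the one used by scikit-learn’s <monospace>IsolationForest.predict</monospace>. Using a dictionary lookup, rather than a comparison, makes any unexpected value fail loudly. The function name is hypothetical.</p>
        <preformat preformat-type="code">
```python
def to_binary_labels(predictions):
    """Map Isolation Forest output (-1 = anomaly, 1 = normal) to 1 = anomaly, 0 = normal."""
    mapping = {-1: 1, 1: 0}
    return [mapping[p] for p in predictions]  # any other value raises KeyError


print(to_binary_labels([1, -1, 1, -1]))  # [0, 1, 0, 1]
```
</preformat>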
        <p>The model’s performance was evaluated using precision, recall, F1-score,
and confusion matrix analysis. Together these metrics show how accurately the model identifies
anomalies and how it balances false positives against false negatives.</p>
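        <p>All of these metrics start from the four confusion-matrix counts. A minimal sketch of tallying them from true and predicted binary labels (1 = anomaly, 0 = normal) follows; the (TN, FP, FN, TP) order matches what scikit-learn’s <monospace>confusion_matrix(...).ravel()</monospace> returns for labels 0 and 1. The example labels are hypothetical.</p>
        <preformat preformat-type="code">
```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 = anomaly and 0 = normal."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # anomaly correctly flagged
        elif truth == 0 and pred == 1:
            fp += 1  # normal record flagged as anomalous
        elif truth == 1:
            fn += 1  # anomaly missed
        else:
            tn += 1  # normal record correctly passed
    return tn, fp, fn, tp


print(confusion_counts([0, 0, 1, 1, 1], [0, 1, 1, 1, 0]))  # (1, 1, 1, 2)
```
</preformat>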
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>The performance of the Isolation Forest model in detecting anomalies in IoT networks was
evaluated using accuracy, precision, recall, F1 score, and a detailed analysis
of the confusion matrix. Each metric offers a different view of the model’s ability to
classify network traffic and to identify potential threats.</p>
      <sec id="sec-5-1">
        <title>1. Accuracy</title>
        <p>The Isolation Forest model achieved an overall accuracy of 82.5%, meaning that it
correctly classified 82.5% of the evaluated instances as normal or anomalous. Accuracy matters in IoT
environments where network reliability and security are paramount, and it suggests that the model learned
the underlying patterns in the data and can generalize to unseen instances. Because normal and anomalous
instances are rarely balanced, however, accuracy should be read together with the class-specific metrics.</p>
        <p><bold>2. Precision</bold></p>
        <p>Precision, calculated at 97.27%, is the proportion of true anomalies
among the instances the model predicted as anomalous. A high precision value means that when
the model flags an instance as anomalous, it is very likely to be correct, which limits the
impact of false positives. In practice this is crucial, because false alarms lead to
unnecessary alerts, wasted resources, and potential disruption of normal operations. This high
precision suggests that the model suits environments where the cost of false alarms
is significant, such as critical infrastructure and industrial settings.</p>
        <p><bold>3. Recall</bold></p>
        <p>The recall of 87.4% is the proportion of actual anomalies correctly
identified by the model out of all anomalies present in the dataset. While an 87.4% recall is
commendable, it also shows that the model misses some anomalies (false negatives),
which could represent undetected security threats or system failures. High recall is vital for a robust
security framework, ensuring that most potential threats are identified and addressed promptly, so recall
remains a priority for further improvement.</p>
        <p><bold>4. F1 Score</bold></p>
        <p>The F1 score of 94.4% is the harmonic mean of precision and recall. This
score is particularly relevant for imbalanced datasets, which are common in anomaly detection
tasks where normal instances typically outnumber anomalous ones. A high F1 score indicates
that the model combines strong precision with considerable recall in identifying anomalous traffic.
This balance is essential for the
model’s reliability in operational environments, where both types of misclassification (false
positives and false negatives) can have significant consequences.</p>
        <p><bold>5. Confusion Matrix</bold> The confusion matrix provides a granular view of the model’s classification
performance and reveals the following:
• True Negatives (TN): The model correctly identified 855 instances as normal, reflecting its
ability to recognize benign traffic patterns.
• False Positives (FP): 225 normal instances were incorrectly classified as anomalies, indicating
the potential for unnecessary alerts that could lead to operational inefficiencies. This
suggests that further tuning is needed to refine the model’s sensitivity to normal traffic.
• False Negatives (FN): A substantial 153,397 anomalous instances were misclassified as normal,
highlighting a significant area for improvement. This calls for further
investigation into the features contributing to these misclassifications and for potential enhancements
in feature selection or model architecture.
• True Positives (TP): The model successfully detected 8,005 actual anomalies, confirming its
utility in identifying genuine threats and its capacity to safeguard against various attack vectors.</p>
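        <p>The summary metrics follow directly from the four counts. The sketch below applies the standard definitions of accuracy, precision, recall, and F1 (with anomalies as the positive class) to the confusion-matrix entries reported above; the function name <monospace>metrics_from_counts</monospace> is hypothetical.</p>
        <preformat preformat-type="code">
```python
def metrics_from_counts(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 for the anomaly (positive) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Confusion-matrix counts as reported in this section.
m = metrics_from_counts(tn=855, fp=225, fn=153_397, tp=8_005)
print(round(m["precision"], 4))  # 0.9727
print({name: round(value, 4) for name, value in m.items()})
```
</preformat>
        <p>The precision obtained this way matches the 97.27% reported above. Accuracy, recall, and F1 additionally depend on the TN and FN entries, so evaluating them with the same function is a quick consistency check between the confusion matrix and the summary metrics.</p>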
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussions</title>
      <p>The evaluation of the Isolation Forest model reveals significant insights into
its effectiveness and limitations for anomaly detection in IoT networks. The high accuracy, precision,
recall, and F1 score indicate that the model can discern between normal and anomalous traffic.
However, a closer examination of the confusion matrix and of the factors underlying these
results gives a more nuanced picture of the model’s performance.</p>
      <sec id="sec-6-1">
        <title>1. Implications of High Precision and F1 Score</title>
        <p>The precision of 97.27% indicates that the model identifies true
anomalies without overwhelming the system with false alarms. In practical terms, this is
particularly valuable in operational settings where false positives can cause unnecessary
downtime, misallocated resources, and disruptions. The F1 score of 94.4% further
indicates that the model balances precision and recall.
This balance is critical in real-time applications, as it shows that the model can
handle the imbalanced nature of the dataset, where anomalous instances are often significantly
fewer than normal ones.</p>
        <p><bold>2. Addressing the False Negatives</bold></p>
        <p>Despite the model’s strengths, the 153,397 false negatives are a concerning aspect
of the results. Each false negative is a missed opportunity to detect an actual anomaly,
which could result in undetected attacks or system failures. In IoT environments, where devices
may be interconnected and autonomous, the consequences of such oversights can be severe. This
highlights the need for strategies to improve recall. Potential approaches include:
• Feature Selection and Engineering: Investigating additional features that capture more
nuanced aspects of network behavior can enhance the model’s sensitivity. For instance,
analyzing temporal patterns or incorporating environmental context could improve the
detection of anomalies.
• Model Ensemble Techniques: Combining multiple algorithms
might improve overall detection capabilities. For instance, integrating the Isolation Forest
with supervised learning methods could exploit labeled data for more accurate
predictions.
• Threshold Adjustment: Modifying the score threshold above which an instance is classified as an anomaly
is a straightforward yet effective approach. Lowering this decision threshold
may allow the model to capture more anomalies while keeping false positives at an acceptable level.</p>
        <p><bold>3. Generalizability and Robustness</bold></p>
        <p>The evaluation metrics reflect the model’s ability to generalize to the unseen test
dataset. However, the training was conducted solely on the BoT-IoT dataset, which raises
questions about the model’s robustness in diverse operational environments. Real-world IoT
networks often exhibit a variety of traffic patterns shaped by different devices, applications,
and network configurations. Therefore, training the model on a more diverse set of datasets
that incorporate varying types of attacks, protocols, and normal behaviors is essential. Such
an approach could enhance the model’s adaptability to various IoT scenarios, increasing its
effectiveness in live environments.
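</p>
        <p>One way to quantify this robustness concern, assuming several labelled IoT traffic datasets are available, is a leave-one-dataset-out protocol: train on every dataset except one, evaluate on the held-out dataset, and repeat for each. The Python sketch below shows only the evaluation loop; fit and evaluate are hypothetical hooks that could wrap, for example, an Isolation Forest and a recall computation, and the toy data in the usage block is invented.</p>
        <preformat preformat-type="code">
```python
from typing import Callable, Dict, List, TypeVar

R = TypeVar("R")  # one record, e.g. a (features, label) pair
M = TypeVar("M")  # a fitted model


def leave_one_dataset_out(
    datasets: Dict[str, List[R]],
    fit: Callable[[List[R]], M],
    evaluate: Callable[[M, List[R]], float],
) -> Dict[str, float]:
    """Train on all datasets but one, score on the held-out one, for each dataset."""
    if 2 > len(datasets):
        raise ValueError("need at least two datasets")
    results: Dict[str, float] = {}
    for held_out, test_records in datasets.items():
        # Pool every other dataset into a single training set.
        train_records = [
            record
            for name, records in datasets.items()
            if name != held_out
            for record in records
        ]
        model = fit(train_records)
        results[held_out] = evaluate(model, test_records)
    return results


if __name__ == "__main__":
    # Toy stand-in: the "model" is the mean of the training values and the
    # "evaluation" just returns it; real use would fit a detector and compute recall.
    toy = {"a": [1.0, 2.0], "b": [3.0], "c": [5.0, 7.0]}
    print(leave_one_dataset_out(toy, fit=lambda rs: sum(rs) / len(rs), evaluate=lambda m, rs: m))
```
</preformat>
        <p>A large gap between the scores on a held-out dataset and those on an in-distribution test split would indicate that performance measured on a single dataset does not transfer to other networks.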
</p>
        <p>4. Real-World Applications and Considerations</p>
        <p>The promising results of the Isolation Forest model suggest its practical applicability in securing
IoT networks. Organizations aiming to implement anomaly detection systems can consider
this model as a foundational component of their cybersecurity strategy. However, deploying
such models in real-world settings requires a comprehensive understanding of the specific IoT
environment. Key considerations include:
• Resource Constraints: IoT devices often operate with limited computational power and
energy resources. The model’s complexity must be balanced against the available resources
to ensure real-time detection without overwhelming the devices.
• Adaptability to Evolving Threats: As cyber threats continue to evolve, the model should be
periodically retrained and validated with new data to maintain its effectiveness. Continuously
monitoring and updating the detection model will be essential to address new attack vectors
and tactics.
• Integration with Existing Security Frameworks: The anomaly detection model should be
integrated into a broader security framework that includes other defensive measures, such
as firewalls and intrusion prevention systems. This layered security approach can enhance
overall network resilience.
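</p>
        <p>For the resource-constraint point, the size and per-record cost of an Isolation Forest can be bounded directly from two hyperparameters. In the construction of Liu et al. [5], each of t trees is grown on a subsample of ψ records with height limit ⌈log2 ψ⌉, so a tree has at most 2ψ - 1 nodes and scoring one record performs at most ⌈log2 ψ⌉ split tests per tree. The Python sketch below assumes t = 100 and ψ = 256, the defaults suggested in [5] and used by scikit-learn; whether this study used these values is an assumption.</p>
        <preformat preformat-type="code">
```python
import math


def iforest_cost(n_trees: int, subsample_size: int):
    """Upper bounds for an Isolation Forest built as in Liu et al. [5].

    Returns (height_limit, max_total_nodes, max_split_tests_per_record).
    """
    if 2 > subsample_size or 1 > n_trees:
        raise ValueError("need subsample_size >= 2 and n_trees >= 1")
    # Trees stop growing at depth ceil(log2(psi)).
    height_limit = math.ceil(math.log2(subsample_size))
    # A proper binary tree with at most psi leaves has at most 2*psi - 1 nodes.
    max_total_nodes = n_trees * (2 * subsample_size - 1)
    # Scoring walks one root-to-leaf path per tree, testing at most height_limit splits.
    max_split_tests = n_trees * height_limit
    return height_limit, max_total_nodes, max_split_tests


if __name__ == "__main__":
    for t, psi in [(100, 256), (50, 64)]:
        h, nodes, tests = iforest_cost(t, psi)
        print(f"t={t}, psi={psi}: height limit {h}, at most {nodes} nodes, "
              f"at most {tests} split tests per scored record")
```
</preformat>
        <p>For t = 100 and ψ = 256 this gives a height limit of 8, at most 51,100 stored nodes and at most 800 split tests per scored record, independent of the size of the training set. Shrinking t or ψ lowers both bounds, which is one way to trade detection quality for footprint on constrained hardware.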
</p>
        <p>5. Future Directions for Research</p>
        <p>The results of this study provide a solid foundation for future
research in the domain of anomaly detection for IoT networks. Areas for further investigation
include:
• Investigating Hybrid Models: Exploring the combination of supervised and unsupervised
learning techniques could lead to improved anomaly detection capabilities, allowing for a
more comprehensive understanding of normal and anomalous behavior.
• Deep Learning Approaches: While the Isolation Forest model demonstrates significant
promise, investigating deep learning techniques such as recurrent neural networks (RNNs)
or convolutional neural networks (CNNs) for anomaly detection could provide additional
insights and improvements.
• Real-Time Data Processing: Developing frameworks for real-time data processing and
anomaly detection will be crucial for the deployment of such models in dynamic IoT
environments. This will involve optimizing the model’s efficiency to ensure timely responses to
detected anomalies.</p>
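        <p>The real-time processing and periodic retraining discussed above can be sketched in Python as follows: records are scored one at a time, and the model is refit every refit_every records on a bounded window of recent records that were not flagged. The score_fn and refit_fn hooks are hypothetical (they could wrap an Isolation Forest, for example), and the window and refit sizes are placeholder assumptions rather than tuned values.</p>
        <preformat preformat-type="code">
```python
from collections import deque


class StreamingDetector:
    """Score records one at a time and periodically refit on a sliding window.

    score_fn(model, record) -> float and refit_fn(records) -> model are
    hypothetical hooks supplied by the caller.
    """

    def __init__(self, model, score_fn, refit_fn, threshold,
                 window_size=10_000, refit_every=5_000):
        self.model = model
        self.score_fn = score_fn
        self.refit_fn = refit_fn
        self.threshold = threshold
        self.window = deque(maxlen=window_size)  # oldest records drop out automatically
        self.refit_every = refit_every
        self.seen = 0

    def process(self, record):
        """Return True if the record is flagged as anomalous."""
        score = self.score_fn(self.model, record)
        flagged = score >= self.threshold
        if not flagged:
            # Only records that look normal feed the retraining window,
            # so detected attacks do not become the new baseline.
            self.window.append(record)
        self.seen += 1
        if self.seen % self.refit_every == 0 and self.window:
            self.model = self.refit_fn(list(self.window))
        return flagged


if __name__ == "__main__":
    # Toy 1-D stand-in: the "model" is a mean, the score is the distance from it.
    detector = StreamingDetector(
        model=0.0,
        score_fn=lambda m, x: abs(x - m),
        refit_fn=lambda rs: sum(rs) / len(rs),
        threshold=5.0,
        window_size=3,
        refit_every=2,
    )
    for value in [1.0, 2.0, 9.0, 10.0, 3.0]:
        print(value, detector.process(value))
```
</preformat>
        <p>Keeping flagged records out of the retraining window stops detected attacks from becoming the new baseline, although missed attacks (false negatives) can still leak into it. In a deployment, the refit would more likely run in a background worker so that per-record scoring latency stays bounded.</p>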
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>In summary, while the Isolation Forest model shows considerable promise in detecting anomalies within
IoT network traffic, the challenges presented by false negatives necessitate ongoing refinement and
enhancement. The insights gained from this study not only highlight the model’s effectiveness but also
underscore the need for continued research and development to bolster its performance and adaptability
in real-world scenarios. Continued refinement of these models is essential to ensure the security
and reliability of IoT networks, particularly as they continue to expand and evolve in complexity.</p>
      <sec id="sec-7-1">
        <title>The authors have not employed any Generative AI tools.</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sarwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Bajwa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ibrahim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Saleem</surname>
          </string-name>
          ,
          <article-title>IoT network anomaly detection in smart homes using machine learning</article-title>
          ,
          <source>IEEE Access</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Koetsier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fiosina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Gremmel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Woisetschläger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sester</surname>
          </string-name>
          ,
          <article-title>Detection of anomalous vehicle trajectories using federated learning</article-title>
          ,
          <source>ISPRS Open Journal of Photogrammetry and Remote Sensing</source>
          <volume>4</volume>
          (
          <year>2022</year>
          )
          <fpage>100013</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Chevtchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. D. S.</given-names>
            <surname>Rocha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C. M.</given-names>
            <surname>Dos Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Mota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Vieira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. C.</given-names>
            <surname>De Andrade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R. B.</given-names>
            <surname>De Araújo</surname>
          </string-name>
          ,
          <article-title>Anomaly detection in industrial machinery using iot devices and machine learning: A systematic mapping</article-title>
          ,
          <source>IEEE Access</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>128288</fpage>
          -
          <lpage>128305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Sahu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Talwalkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Federated learning: Challenges, methods, and future directions</article-title>
          ,
          <source>IEEE Signal Processing Magazine</source>
          <volume>37</volume>
          (
          <year>2020</year>
          )
          <fpage>50</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F. T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Ting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Isolation forest</article-title>
          , in:
          <source>2008 Eighth IEEE International Conference on Data Mining</source>
          , IEEE,
          <year>2008</year>
          , pp.
          <fpage>413</fpage>
          -
          <lpage>422</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <article-title>FLForest: Byzantine-robust federated learning through isolated forest</article-title>
          ,
          <source>in: 2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>296</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Rousseeuw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hubert</surname>
          </string-name>
          ,
          <article-title>Robust statistics for outlier detection</article-title>
          ,
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          <volume>1</volume>
          (
          <year>2011</year>
          )
          <fpage>73</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Smiti</surname>
          </string-name>
          ,
          <article-title>A critical overview of outlier detection methods</article-title>
          ,
          <source>Computer Science Review</source>
          <volume>38</volume>
          (
          <year>2020</year>
          )
          <fpage>100306</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Cook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mısırlı</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <article-title>Anomaly detection for IoT time-series data: A survey</article-title>
          ,
          <source>IEEE Internet of Things Journal</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>6481</fpage>
          -
          <lpage>6494</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Haider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abbas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Zaidi</surname>
          </string-name>
          ,
          <article-title>A multi-technique approach for user identification through keystroke dynamics</article-title>
          ,
          <source>in: SMC 2000 Conference Proceedings. 2000 IEEE International Conference on Systems, Man and Cybernetics. 'Cybernetics Evolving to Systems, Humans, Organizations, and Their Complex Interactions' (Cat. No. 0</source>
          , volume
          <volume>2</volume>
          , IEEE,
          <year>2000</year>
          , pp.
          <fpage>1336</fpage>
          -
          <lpage>1341</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Chalapathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <article-title>Deep learning for anomaly detection: A survey</article-title>
          , arXiv preprint arXiv:1901.03407
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Rigatti</surname>
          </string-name>
          ,
          <article-title>Random forest</article-title>
          ,
          <source>Journal of Insurance Medicine</source>
          <volume>47</volume>
          (
          <year>2017</year>
          )
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zulkernine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Haque</surname>
          </string-name>
          ,
          <article-title>Random-forests-based network intrusion detection systems</article-title>
          ,
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)</source>
          <volume>38</volume>
          (
          <year>2008</year>
          )
          <fpage>649</fpage>
          -
          <lpage>659</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V. D.</given-names>
            <surname>Hengel</surname>
          </string-name>
          ,
          <article-title>Deep learning for anomaly detection: A review</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>N. K.</given-names>
            <surname>Sahu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <article-title>Machine learning based anomaly detection for IoT network: (anomaly detection in IoT network)</article-title>
          ,
          <source>in: 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI)</source>
          (48184), IEEE,
          <year>2020</year>
          , pp.
          <fpage>787</fpage>
          -
          <lpage>794</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>