=Paper=
{{Paper
|id=Vol-3736/paper25
|storemode=property
|title=Subsystem of anomaly detection in the Smart House system based on machine learning
|pdfUrl=https://ceur-ws.org/Vol-3736/paper25.pdf
|volume=Vol-3736
|authors=Maxim Prodeus,Andrii Nicheporuk,Andrzej Kwiecien,Dmytro Martiniyuk,Oleksii Lyhun
|dblpUrl=https://dblp.org/rec/conf/icyberphys/ProdeusNKML24
}}
==Subsystem of anomaly detection in the Smart House system based on machine learning==
Maxim Prodeus1,*,†, Andrii Nicheporuk1,†, Andrzej Kwiecien2,†, Dmytro Martiniyuk1,†, Oleksii Lyhun2,†

1 Khmelnytskyi National University, Institutska str., 11, Khmelnytskyi, 29016, Ukraine

2 Silesian University of Technology, Akademicka str., 2A, Gliwice, Poland

Abstract

With the deepening implementation of Smart Home systems, the role of anomaly detection subsystems becomes increasingly important for ensuring the security and stability of these complex environments. This paper proposes a new anomaly detection subsystem for Smart Home systems, based on advanced machine learning technologies. The architecture of this subsystem is designed to process various data streams generated by Internet of Things (IoT) devices, utilizing packet preprocessing to optimize data before further analysis. The application of the Random Forest algorithm allows for the construction of a machine learning model for effective anomaly detection in the system. To evaluate the effectiveness of the proposed subsystem, the CICIDS2017 dataset is utilized, which is divided into training and validation sets. Comparative analysis is conducted with the J48 tree algorithm in detecting various types of cyberattacks, such as Denial of Service (DoS), Probe, Remote to Local (R2L), and User to Root (U2R). The proposed subsystem aims to enhance the security and reliability of Smart Home systems by facilitating timely detection and response to potentially dangerous anomalies. This work represents a significant contribution to the field of smart systems as it addresses the security issue within the smart home environment, where a large number of connected devices are typically characterized by limited resources and increased requirements for confidentiality and availability.
The application of machine learning methods for anomaly detection enables the identification of unusual and potentially hazardous interactions between devices and the network, indicating attacks or security breaches. Particular attention should be paid to the experiment results, which demonstrated the high effectiveness of the proposed system compared to traditional methods. Anomaly detection using the Random Forest algorithm proved to be effective in various attack scenarios, providing high accuracy and a low error rate. This suggests the potential use of such approaches for protecting smart systems in the future.

Keywords: Smart house, network security, anomaly detection, threat detection, cyber defense, random forest

ICyberPhyS-2024: 1st International Workshop on Intelligent & CyberPhysical Systems, June 28, 2024, Khmelnytskyi, Ukraine. * Corresponding author. † These authors contributed equally. mprodeus99@ukr.net (M. Prodeus); andrey.nicheporuk@gmail.com (A. Nicheporuk); andrzej.kwiecien@polsl.pl (A. Kwiecien); martiniyuk.dim14@gmail.com (D. Martiniyuk); andrzej.kwiecien@polsl.pl (O. Lugyn). ORCID: 0009-0002-2968-4648 (M. Prodeus); 0000-0002-7230-9475 (A. Nicheporuk); 0000-0003-1447-3303 (A. Kwiecien); 0009-0002-3524-872X (D. Martiniyuk); 0000-0003-1447-3303 (O. Lugyn). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

===1. Introduction===

Smart Home systems have seen rapid adoption in recent years, offering homeowners convenience, energy efficiency, and increased security. However, the proliferation of interconnected Internet of Things devices in these environments also creates vulnerabilities and potential security threats. Anomaly detection subsystems play a crucial role in identifying and mitigating these threats, allowing for proactive responses to malicious activities or system anomalies [1].
This document presents a comprehensive approach to anomaly detection in smart homes using machine learning methods for network traffic analysis and detection of suspicious patterns [2]. The architecture of the proposed subsystem, along with the utilization of the Random Forest algorithm, is detailed. Additionally, experimental methodologies and evaluation metrics are described to assess the system's effectiveness in detecting various types of cyberattacks [3]. With the increasing number of devices connected to the network and the volume of generated data, there is a significant need for reliable and efficient anomaly detection methods [4, 5]. Traditional methods such as signature analyzers and rules often struggle to effectively cope with complex and evolving threats, necessitating enhanced approaches. The developed subsystem serves as a response to this challenge, enabling cybersecurity systems to adapt to changing conditions and detect anomalous behavior patterns that may indicate potential cyber threats. In the context of cybersecurity, anomalies can take various forms: from unjustified user activity and attacks on infrastructure to suspicious transactions and leaks of confidential information. Detecting these anomalies is a challenging task that is difficult to address without the use of automated and intelligent tools. The main advantages of using this method in cybersecurity are its ability to operate in real-time, identify new forms of threats, and adapt to changes in the cyber landscape [6]. The core principle of the subsystem's operation is based on the idea of using machine learning algorithms to recognize and learn the normal behavior pattern of the system or user. Instead of the traditional approach, where it is assumed that the form of a malicious attack is known, the subsystem adopts an intuitive approach: it learns the normal state and then detects anomalies that deviate from it. 
A subsystem specifically designed for deployment in the Internet of Things (IoT) environment is also introduced. IoT is a concept that involves connecting various physical objects to the network for data exchange and process automation. This technology finds applications in various sectors, from household devices to industrial systems [7]. As the number of connected devices grows, the issue of cybersecurity in IoT becomes increasingly relevant. One of the most common threats is Distributed Denial of Service (DDoS) attacks, aimed at overloading network resources [8]. Mitigating DDoS in IoT is extremely important because a large number of connected devices can be used to create a botnet, which in turn can easily initiate DDoS attacks. This can lead to service disruptions in critical systems such as medical devices, power systems, or automotive transportation networks. Securing IoT involves using measures to detect and suppress DDoS attacks. Utilizing anomaly detection systems, traffic restriction from suspicious sources, and implementing authentication mechanisms are key components of DDoS protection in IoT [9]. Coordinated security measures in IoT not only help maintain the functionality of systems but also prevent potential consequences of attacks on critical infrastructure [10]. Thus, effective DDoS mitigation in IoT is a necessary component to ensure stability and security in the modern connected world. Despite the significant potential of the subsystem, there are certain challenges. One of them is the need for a large amount of data for effective algorithm training. Collecting and processing this data can be resource-intensive [11]. Additionally, algorithms' high sensitivity to changes in input data may lead to false positives. In the future, the development of the subsystem will be associated with improving algorithms for automatic anomaly detection in real-time, developing methods to reduce false positives, and expanding its application in other areas [12]. 
Enhancing measures to ensure confidentiality and ethics in the use of the subsystem, especially in processing personal data, is also important. In a world of constantly evolving threats and rapidly changing technologies, machine learning for anomaly detection proves to be an indispensable tool for cybersecurity and identifying unknown threats. Its capabilities in detection and adaptation, application in various sectors, and potential to address complex issues make this subsystem a key element in modern security and protection strategies. Engaging intelligent systems based on machine learning is a crucial step in ensuring the resilience and efficiency of information systems in response to constantly growing cybersecurity threats [13].

The objective of this work is to develop and present a comprehensive approach to anomaly detection in smart home environments and the Internet of Things (IoT) using machine learning methods. Specifically, the focus is on designing a subsystem capable of identifying and mitigating potential cybersecurity threats, including Distributed Denial of Service (DDoS) attacks, within these interconnected systems. The aim is to enhance the security and resilience of smart homes and IoT devices by implementing intelligent anomaly detection mechanisms.

===2. Related works===

The study and application of machine learning methods for anomaly detection in modern cybersecurity and other fields of scientific research have garnered significant interest and are aimed at addressing pertinent challenges in security, medicine, finance, and beyond [14]. Below is a review and analysis of some key works investigating the application of machine learning for anomaly detection and their contributions to the development of respective fields [15].
In "Detection of DNS DDoS Attacks with Random Forest Algorithm on Spark" by Liguo Chen, Yuedong Zhang, Qi Zhao, Guanggang Geng, and ZhiWei Yan, the authors address anomaly detection for cybersecurity [16]. They explore various approaches and algorithms, such as the Random Forest method, statistical approaches, and deep learning methods. The work is an important contribution to research on detecting DDoS attacks. Its aim is not only to indicate whether a DNS server is under attack, but also to distinguish normal requests from abnormal ones. Additional features are added to distinguish unusual domain names from regular ones. The authors selected the features on which recognition of anomalous activity in the data stream is based. The experiment then proceeded in three main steps. First, the data were preprocessed and normalized so that the subsystem could process the values. Second, the subsystem was trained to differentiate between normal and anomalous data. Third, the authors characterized the threat assessment system, which is based on the classification made by Random Forest. The work presents a new method for reducing DDoS traffic on TLD servers, in which traffic filtering based on machine learning algorithms is applied at the core recursive DNS servers on the Internet. Their classification model is built on Spark and operates with a 0.0% FPR and 4.36% FNR, meaning that practical requirements for accuracy and performance are met. In future work, the authors plan to apply a streaming approach, which is more suitable for real-time rule creation by the firewall.

In "Transport in the IP-based Internet of Things: status report" by J. Antonio Garcia-Macias, the focus is on IoT networks using IP at the network level [17]. A simplified IP-based IoT stack is depicted in comparison to the traditional Internet protocol stack.
By design, IP is network-agnostic: it makes no assumptions about the underlying physical and data link layers. The development of IP-based IoT networks has focused on IEEE 802.15.4 at the physical and data link layers, as well as on Bluetooth Low Energy (BLE). Other technologies such as ITU-T G.9959, DECT ULE, MS/TP, NFC, IEEE 1901.2, and IEEE 802.11ah have been studied by the author for use in IPv6-based IoT networks. Many application protocols have been developed with IoT scenarios in mind. For instance, the Constrained Application Protocol (CoAP) has been proposed as an alternative to the Hypertext Transfer Protocol (HTTP); Message Queuing Telemetry Transport (MQTT) and Advanced Message Queuing Protocol (AMQP) are used for message queue-based applications; and Lightweight Machine-to-Machine (LwM2M) has been proposed as a solution for device management and service provisioning.

The work "A Survey of Anomaly Detection in Internet of Things" by N. Moustafa and J. Slay investigates the use of anomaly detection methods in Internet of Things (IoT) networks [18]. The authors examine the characteristics of IoT that pose challenges for anomaly detection and systematically review a wide range of methods, including statistical methods, machine learning-based methods, and deep learning, that can be applied for effective anomaly detection in complex IoT systems. The work begins by defining the concept of IoT and its importance in the modern world. The authors emphasize the diversity and scope of interpretations of IoT, which encompass various types of devices, communication protocols, and applications ranging from smart homes to industrial systems.
In their further investigation, the authors delve into the challenges and issues associated with ensuring security in IoT, particularly anomaly detection. They analyze the characteristics of IoT data such as large volume, diversity, and dynamism, which complicate the task of anomaly detection.

===3. Architecture of the anomaly detection subsystem in the Smart House system based on machine learning. Preprocessing of IoT traffic packets===

In modern Smart Home systems, where a large number of various sensors and devices provide a continuous flow of data, anomaly detection becomes a critically important task to ensure the security and efficiency of system operation [19]. Therefore, an integrated anomaly detection subsystem based on machine learning is proposed for the Smart Home system (see Fig. 1). Its main function is to analyze network traffic and data received from sensors in the smart home and to apply machine learning algorithms to detect abnormal behavioral patterns. Control actions are implemented through the microcontroller system of the Smart Home. Because threats to IoT may also include metamorphic viruses [20], one subsystem alone cannot cope with all aspects of protection. Therefore, interaction with other subsystems working with different protocols and at different levels [21] is not excluded. Only in such a case can protection against harmful influences be maximized. Along with the anomaly detection subsystem, the proposed architecture includes a notification subsystem and a data recording and logging subsystem. The data recording and logging subsystem provides the ability to recover event histories for further analysis and anomaly detection. These data can be useful for comparison when detecting subsequent anomalies and for preserving the behavioral patterns that have been formed. To alert the user about potential cyber attacks, a notification subsystem is included, which sends email and SMS notifications.
A local data storage ensures the storage of copies of important data to ensure availability and access speed. In the proposed architecture, it is used for data analysis and recovery after detecting anomalies or cases of loss of communication with the cloud service. The interaction between components of the smart home will be facilitated by the TCP/IP protocol, which is well-suited for IoT operations. The critical aspect in protocol selection was the universality and flexible protocol set supporting various types of IoT devices and networks, offering advantages such as compatibility, scalability, and security. Initially, the CoAP protocol was considered, but due to its incompatibility with the chosen dataset, the TCP/IP protocol was chosen [22]. Interoperability allows IoT devices to communicate with each other and with the cloud, regardless of their hardware, software, or network architecture. This will enable future implementation of enhanced security during user connection through cloud access [23]. Moreover, TCP/IP can handle a large number of IoT devices and data traffic using methods such as subnetting, routing, and addressing. Additionally, TCP/IP can provide security features for IoT devices such as encryption, authentication, and firewall. It also supports protocols like TLS (Transport Layer Security) and IPSec (Internet Protocol Security), which can enhance data transmission security and network access. Finally, TCP/IP can support various application protocols such as HTTP, MQTT, and CoAP, catering to different IoT scenarios and requirements.

Figure 1: Anomaly detection subsystem in a smart home.

The functioning of the anomaly detection subsystem in a smart home can be represented as a sequence of stages (see Fig. 2), generally including model training and utilization [24]. Initially, data is collected from various sources such as server logs, network packets, and sensor data.
These data are then normalized and standardized for uniform representation, ensuring their consistency and interoperability. Data processing involves considerations such as IP addresses, packet sizes, port checks, DNS protocols, and sensor data. Min-max normalization is applied to unify the data, as all input data come in different formats. IP addresses, DNS protocols, and ports are categorical, while packet sizes and sensor data are numerical [25]. During the preprocessing stage, features are extracted, and missing or incorrect values are handled by comparing them with the average normal values. An important parameter for anomaly detection is the identification of features indicating data irregularities. For IP address analysis, the source and destination IP (Source IP/Destination IP) are checked. Packet size is determined by parameters such as the total length of forward packets (Fwd), total length of backward packets (Bwd), minimum packet length, maximum packet length, packet length mean, and packet length standard deviation. DNS protocol parameters include query repetition, name length, and domain names with invalid characters. Ports are checked for source and destination packet ports. For sensors, critical anomaly detection parameters include the absence of measurements for a long period, sudden changes in parameters, and large discrepancies with average values. The Random Forest algorithm is used as the machine learning algorithm. For model training, in addition to collected data (in the form of features), the target indicator (label) was included. During the cyber attack detection stage, the trained model is used to classify new data as "normal" or "anomalous".

Figure 2: Architecture of anomaly detection subsystem in a smart house.

====3.1 Data processing module====

Data preprocessing is a critical stage in the system of data collection and cyber attack detection using Random Forest.
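As an illustration of the preprocessing described in Section 3, the following minimal Python sketch shows mean imputation of missing values, min-max normalization of numerical features, and integer encoding of categorical fields such as ports. It is not the authors' implementation: the function names, the use of `None` to mark missing values, and the sample values are assumptions made for this example only.

```python
from statistics import mean


def impute_with_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_normalize(column):
    """Rescale a numeric column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def encode_categorical(column):
    """Map each distinct category (e.g. a port or an IP address) to an integer code."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in column]


# Hypothetical packet lengths in bytes, with one missing measurement.
packet_lengths = [5, 20, None, 50, 25]
filled = impute_with_mean(packet_lengths)   # the gap is filled with the mean, 25
scaled = min_max_normalize(filled)          # smallest value -> 0.0, largest -> 1.0
ports = encode_categorical([443, 53, 443, 8080, 53])
```

Min-max scaling keeps every numerical feature in the same range, so that a feature measured in large units (for example, total bytes) does not dominate one measured in small units.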
This stage involves optimizing input data before using it in the model to prevent excessive energy consumption [26]. Initially, it is important to identify key features for the model, considering their influence. As mentioned earlier, fundamental features that significantly impact anomaly detection or cyber attacks have been selected. During preprocessing, decisions are made on how to handle missing data, where the system compares them with regular data flow and their average values. It is also essential to normalize the data, transforming them into a uniform range of values to avoid the influence of voluminous parameters. Data transformations are also used to improve their distribution and highlight important characteristics. For communication of messages in IoT networks, the TCP/IP protocol is used. TCP/IP provides mechanisms for communication between different devices in the IoT network. It enables devices to exchange data, including sensor information, device management, and other data. TCP/IP provides some basic security features such as message integrity, confidentiality, and endpoint identity protection using SSL/TLS (Secure Sockets Layer/Transport Layer Security). However, it is important to note that the use of the TCP/IP protocol alone is insufficient to ensure communication security in IoT. Therefore, additional security measures such as authentication and access control need to be implemented. In this regard, the protected endpoint interacting with IoT devices should be determined by business logic, not by transport protocols and endpoint availability, as depicted in Figure 3. This is achieved through message protection at different network layers, even in the case of low-power radio devices, while maintaining system performance [27]. Devices with limited resources require a specialized protocol for secure communication that minimizes performance impact while flexibly supporting various trust models. 
The gateway used to support cross-device communication with the cloud can perform essential functions, but it cannot be fully trusted with access to application-level data.

Figure 3: Interaction of the TCP/IP protocol with the IoT architecture.

The Random Forest classifier was chosen because it does not assume linear interactions or even linear functions. Random Forest is also a bagging method, which scales easily because each decision tree can be trained independently on a separate worker node of the cluster. In the case of DDoS Water Torture attacks, analytical functions may interact linearly, as the names of the attacked subdomains are generated randomly and queried by a large number of IP addresses. In this case, QR, SS, and SIS will interact linearly. However, during DDoS AMP attacks, capturing the name record will result in a small SS and a large QR. Therefore, tree-based models are better suited than linear models. In large datasets, sampling may be needed to reduce the volume of data without losing representativeness. Sampling can be random, based on certain criteria, or use another method that yields an appropriate amount of data for model training [28]. Some additional features may be created from existing ones to improve data representation or detect additional relationships. For example, combining or extracting certain characteristics can help the model better capture complex interactions in the data. These preprocessing steps are important for the quality and effectiveness of the Random Forest model in detecting cyber attacks and for making the best use of the training data.

====3.2 Model training====

The machine learning model is a key component in the proposed anomaly detection subsystem in the Smart Home system. The model is trained on pre-processed and normalized training data. Using the trained Random Forest model, the system classifies the input data, considering each input as either "normal" or "anomalous".
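To make the bagging idea concrete, the sketch below trains several simple classifiers on bootstrap samples and combines them by majority vote into a "normal" or "anomalous" decision. It is a simplified stand-in, not the Weka Random Forest used in the experiments: each "tree" is replaced by a one-feature nearest-class-mean rule, and the packet lengths, function names, and number of trees are hypothetical values chosen for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows):
    """Stand-in for a decision tree: predict the label whose mean
    feature value is closest to the input value."""
    means = {}
    for label in {lab for _, lab in rows}:
        values = [x for x, lab in rows if lab == label]
        means[label] = sum(values) / len(values)
    return lambda x: min(means, key=lambda lab: abs(x - means[lab]))


def train_forest(rows, n_trees, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def forest_predict(forest, x):
    """Return the label chosen by the majority of the trees."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (total packet length in bytes, label).
training_rows = [
    (10, "normal"), (20, "normal"), (30, "normal"),
    (900, "anomalous"), (1000, "anomalous"), (1100, "anomalous"),
]
forest = train_forest(training_rows, n_trees=25)
```

Because every stump is trained only on its own bootstrap sample, the stumps can be trained in parallel, which is the scalability property mentioned above.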
The main idea is that attacks often manifest as anomalous deviations from typical system behavior. The Random Forest algorithm was trained in the following steps:

# Loading the dataset.
# Applying preprocessing techniques (discretization).
# Partitioning the dataset into four datasets.
# Splitting the dataset into training and testing sets.
# Selecting the best feature set using a feature subset selection measure.
# Passing the training dataset to the Random Forest for training.
# Passing the testing dataset to the Random Forest for classification.
# Calculating accuracy, detection rate, and false alarm rate.

Additionally, when evaluating data packets, the Random Forest detects anomalies based on the ensemble of trees and the decisions each of them makes. Let the first tree (see Fig. 4) decide whether the received data packets match normal values. Initially, the packet is compared with the total length of all packets sent forward from the source to the destination during network monitoring (Total Length of Fwd Packets) and, vice versa, of packets sent in the reverse direction (Total Length of Bwd Packets), which is 100 bytes. Then it is checked against a maximum allowable value (Max Packet Length), for example 50 bytes, and a minimum allowable value (Min Packet Length) of 5 bytes. The packet length mean (Packet Length Mean) of 20 bytes and the standard deviation of packet length (Packet Length Std) of 10 bytes are also taken into account. With these parameters, the first tree in the Random Forest can be constructed.

===4. Experimental studies===

For the experiments, all test data was split into two sets: a training set (70%) and a validation set (30%). The training set is provided to the Random Forest classifier for training, while the validation set is used to assess the classifier's performance. All experiments were conducted using the Weka tool. The CICIDS2017 dataset was used for the analysis.
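The 70/30 split into training and validation sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the experiments themselves were run in Weka, and the function name, the fixed seed, and the use of a simple random (non-stratified) shuffle are choices made for this example.

```python
import random


def train_validation_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 100 records, identified here only by their index.
records = list(range(100))
train_set, validation_set = train_validation_split(records)
```

Every record ends up in exactly one of the two sets, so the validation set measures performance on data the classifier has not seen during training.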
The CICIDS2017 dataset consists of 42 attributes, with the last attribute comprising the class label [29]. Various numbers of Random Forest trees were tested, and 10-fold cross-validation was employed for classification [30]. The following performance metrics were used to evaluate the classifier.

Accuracy is defined as the ratio of correctly classified samples to the total number of samples, equation 1:

<math>\text{Accuracy} = \frac{\text{Samples correctly classified in test data}}{\text{Number of samples in test data}} \quad (1)</math>

The detection rate is the ratio of the total number of attacks detected by the system to the total number of attacks present in the data set, equation 2:

<math>DR = \frac{TP}{TP + FN} \quad (2)</math>

where DR (detection rate) is the proportion of attacks detected, TP is the number of correctly detected attacks (true positives), and FN is the number of attacks that were not detected (false negatives).

The false alarm rate is defined by equation 3:

<math>FAR = \frac{FP}{FP + TN} \quad (3)</math>

where FAR (false alarm rate) is the frequency of false alarms, FP is the number of false positive detections, and TN is the number of correct negative detections.

The Matthews Correlation Coefficient (MCC) is the correlation coefficient between observed and predicted binary classifications, equation 4:

<math>MCC = \frac{TN \times TP - FN \times FP}{\sqrt{(FP + TP)(FN + TP)(TN + FP)(TN + FN)}} \quad (4)</math>

where TN is the number of true negatives, FN the number of false negatives, FP the number of false positives, and TP the number of true positives.

For the experimental analysis, the CICIDS2017 dataset was used in ARFF format. The following preprocessing steps were conducted:

# Missing values imputation. A Weka filter was applied to replace all missing values in the CICIDS2017 dataset. The filter uses the mean (for numerical attributes) and the mode (for nominal attributes) from the training data to replace missing values.
# Discretization. A discretization filter with 10 bins was used for numerical attributes.
To integrate the cyber-attack detection system into a smart home, the system can be considered part of a comprehensive infrastructure solution that interacts with other elements of the smart home. The primary task is to position the system at a level that ensures the necessary security and efficiency of its operation. The architecture of the cyber-attack detection system can employ distributed components for threat analysis and detection. This may include modules for data analysis and machine learning, as well as an interface for interaction with other systems in the smart home. Thanks to the use of Weka and the corresponding architecture, it is possible to model attacks and analyze their impact on the smart home, providing a reliable system of integration and protection against cyber threats. For a thorough analysis and evaluation of the effectiveness of cyber-attack detection methods, their performance was compared on different types of attacks [31]. In this study, four main classes of cyber threats were considered: DOS (Denial of Service), Prob (Probe), R2L (Remote to Local, unauthorized access to a remote system), and U2R (User to Root, unauthorized access to privileged-user rights). The tables below display the results for each type of attack, including accuracy, sensitivity (DR), false alarm rate (FAR), and Matthews correlation coefficient (MCC). The analysis and comparison were carried out in Weka, which provides a wide range of tools for machine learning and data analysis. To understand the performance of the Random Forest approach, it was compared with the J48 tree method. The performance of the proposed approach is presented in Table 1. It can be seen from Tables 1, 2 and 3 that the proposed model achieved high DR and low FAR for attack classification. For DOS attacks, the proposed model achieved an accuracy of 94.98%, which is 0.07 percentage points higher than the J48 algorithm (94.91%). The FAR recorded for J48 is higher than that of the proposed model.
A good classifier for detecting attacks should have a high DR and a low FAR (Figure 5).

Figure 4: Anomaly search tree in Random Forest based on selected features.

After applying symmetric uncertainty (SU) feature selection, accuracy improved for the Prob, R2L, and U2R classes, DR remained at about the same level, and the FAR was reduced or unchanged.

{| class="wikitable"
|+ Table 1: Performance index for a random forest (number of trees = 100)
! No. !! Attack type !! Accuracy, % !! DR, % !! FAR !! MCC
|-
| 1 || Dos || 94.98 || 95.83 || 0.00519 || 0.94
|-
| 2 || Prob || 95.34 || 95.81 || 0.00501 || 0.93
|-
| 3 || R2L || 95.27 || 95.81 || 0.00501 || 0.94
|-
| 4 || U2R || 94.93 || 95.83 || 0.00548 || 0.92
|}

{| class="wikitable"
|+ Table 2: Performance indicator for the J48 tree
! No. !! Attack type !! Accuracy, % !! DR, % !! FAR !! MCC
|-
| 1 || Dos || 94.91 || 95.3 || 0.00828 || 0.933
|-
| 2 || Prob || 95.19 || 95.4 || 0.0093 || 0.933
|-
| 3 || R2L || 95.13 || 95.3 || 0.010 || 0.948
|-
| 4 || U2R || 94.98 || 95.3 || 0.0075 || 0.948
|}

{| class="wikitable"
|+ Table 3: Random Forest after applying FSS with symmetric uncertainty
! No. !! Attack type !! Accuracy, % !! DR, % !! FAR !! MCC
|-
| 1 || Dos || 94.88 || 95.82 || 0.00467 || 0.95
|-
| 2 || Prob || 95.43 || 95.73 || 0.00467 || 0.95
|-
| 3 || R2L || 95.39 || 95.86 || 0.00501 || 0.94
|-
| 4 || U2R || 94.97 || 95.82 || 0.00467 || 0.93
|}

For the Probe attack after applying feature selection, the DR is recorded as 95.83%. For R2L and U2R, the MCC without feature selection was recorded as 0.94 and 0.92, respectively, indicating the effectiveness of this approach for classifying attacks in IDS. The average accuracy obtained by the proposed approach without feature selection is 95.13%, compared with 95.05% for J48. After feature selection, the MCC of this model (0.93 to 0.95) is comparable to that of the J48 classifier (0.933 to 0.948). Experimental results show that this approach can achieve high accuracy and a high DR with a low FAR. For metamorphic viruses, which are capable of disguising themselves and changing their features, this method would be less effective. For this, a more complex algorithm is needed, which can maintain efficiency at 85%, which is quite high for this type of threat [32]. Alternatively, employing feature obfuscation analysis may raise efficiency to 94% [33].
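The symmetric uncertainty measure used for feature subset selection can be written as SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)), where H is the entropy and I is the mutual information between a discretized feature X and the class label Y. The Python sketch below illustrates this standard definition; it is not the Weka implementation used in the experiments, and the function names and sample values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(feature, labels):
    """SU = 2 * I(X; Y) / (H(X) + H(Y)), ranging from 0 (independent)
    to 1 (the feature fully determines the label and vice versa)."""
    h_x, h_y = entropy(feature), entropy(labels)
    if h_x + h_y == 0:  # both sequences are constant
        return 0.0
    h_xy = entropy(list(zip(feature, labels)))  # joint entropy H(X, Y)
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


# Hypothetical discretized feature bins and class labels.
labels = ["normal", "normal", "attack", "attack"]
informative = ["low", "low", "high", "high"]    # perfectly aligned with the labels
uninformative = ["low", "high", "low", "high"]  # independent of the labels

su_informative = symmetric_uncertainty(informative, labels)
su_uninformative = symmetric_uncertainty(uninformative, labels)
```

Features are ranked by their SU score against the class label, and those with the highest scores are kept; the measure requires discrete values, which is why discretization precedes feature selection.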
Figure 5: Comparison of DR and FAR threat detection parameters for Random Forest and J48 (two panels, DR and FAR, for the Dos, Prob, R2L, and U2R attack types).

Additionally, some research suggests using API Call Tracing for threat detection [34, 35]. The effectiveness of this method, according to studies, is 96.56%, but the high computational overhead would increase energy costs, which would be unacceptable and contradict the priorities of this system.

===5. Conclusions===

In summary, the development of anomaly detection subsystems based on machine learning algorithms represents significant progress in enhancing the security of Smart Home systems. The utilization of the Random Forest algorithm demonstrates promising results in effectively detecting anomalies in IoT device-generated network traffic. Experimental evaluations conducted on the CICIDS2017 dataset underscore the effectiveness of the proposed subsystem in various attack scenarios. By providing early detection and response capabilities, the subsystem contributes to safeguarding smart homes against potential cyber threats, ensuring the integrity and reliability of these interconnected environments. This subsystem represents an important and timely direction in the field of cybersecurity and beyond. With the continuous advancement of technologies and the increasing number of network-connected devices, cybersecurity issues are becoming increasingly relevant, and this subsystem is an integral part of strategies to combat these threats. Such subsystems can be applied in various domains such as cybersecurity, finance, medicine, and others, indicating their versatility and significant potential. One of the main advantages is the ability to detect anomalies in real-time and adapt to new forms of threats.
The subsystem enables the detection of patterns that may go unnoticed by traditional methods and makes it possible to preempt attacks or other malicious activities. A review of related works highlights the diversity of methods and approaches in the field of cybersecurity, from statistical methods to deep learning based on various algorithms. Given the constant evolution and refinement of methods, such subsystems remain a key tool for ensuring cybersecurity and detecting anomalies and cyber-attacks in the modern world.

6 References

[1] G. K. Mehrotra, K. M. Chilukuri, H. Huang, Anomaly Detection Principles and Algorithms (Terrorism, Security, and Computation), 1st ed. (2017).
[2] C. Zhou, P. Zhang, J. Li, H. Luo, Y. Wang, Deep Learning for Anomaly Detection: A Review, Mathematical Problems in Engineering (2021).
[3] C. S. Smith, M. Koning, Decision Trees and Random Forests: A Visual Introduction For Beginners: A Simple Guide to Machine Learning with Decision Trees (2017).
[4] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2022).
[5] I. Obeidat, M. AlZubi, Developing a faster pattern matching algorithms for intrusion detection system, International Journal of Computing 18(3) (2019) 278-284. doi:10.47839/ijc.18.3.1520
[6] J. Howard, Introduction to Machine Learning for Coders (2018).
[7] S. L. Cheruvu, A. Kumar, et al., Demystifying Internet of Things Security: Successful IoT Device/Edge and Platform Security Deployment (2019).
[8] R. Wason, Internet of Distributed denial-of-service (DDoS) attack: Learn DDos Attack Methods (2014).
[9] S. Ciuta, Securing the Internet of Things (IoT): Cybersecurity of Connected Devices (2023).
[10] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (2017).
[11] S. Shalev-Shwartz, S. Ben-David, Understanding Machine Learning: From Theory to Algorithms (2015).
[12] S. Raschka, V. Mirjalili, Python Machine Learning (2022).
[13] M. P. Deisenroth, Mathematics for Machine Learning (2020).
[14] Z. Zhi-Hua, Machine Learning (2021).
[15] A. Ng, Machine Learning on Coursera. URL: https://www.coursera.org/learn/machine-learning.
[16] L. Chen, Y. Zhang, Q. Zhao, G. Geng, Z. Yan, Detection of DNS DDoS Attacks with Random Forest Algorithm on Spark, 134 (2018) 310-315.
[17] J. Antonio Garcia-Macias, Transport in the IP-based Internet of Things: status report (2023).
[18] N. Moustafa, J. Slay, A Survey of Anomaly Detection in Internet of Things (2019).
[19] S. Kiranyaz, O. Avci, O. Abdeljaber, E. Incecik, Machine Learning for Anomaly Detection and Diagnosis in Aeronautics (2017).
[20] O. Pomorova, O. Savenko, S. Lysenko, A. Kryshchuk, A. Nicheporuk, A technique for detection of bots which are using polymorphic code, Communications in Computer and Information Science 431 (2014) 265-276.
[21] O. Savenko, S. Lysenko, A. Nicheporuk, et al., Metamorphic Viruses' Detection Technique Based on the Equivalent Functional Block Search, CEUR Workshop Proceedings 1844 (2017) 555-569.
[22] R. Achary, C. J. Shelke, K. Marx, Aishwarya Rajesh, Security Implementation on IoT using CoAP and Elliptical Curve Cryptography (2023). doi:10.1016/j.procs.2023.12.105
[23] I. R. Chiadighikaobi, N. Katuk, B. Osman, DMUAS-IoT: A Decentralised Multi-Factor User Authentication Scheme for IoT Systems, International Journal of Computing 21(4) (2022) 424-434. doi:10.47839/ijc.21.4.2777
[24] S. Garcia, M. Grill, J. Stiborek, A. Zunino, E. Cambiaso, Machine Learning in Cybersecurity: A Comprehensive Survey, 19 (2019).
[25] J. Antonio Garcia-Macias, Transport in the IP-based Internet of Things: status report (2023). doi:10.1016/j.procs.2023.09.006
[26] O. Yakubu, B. C. Narendra, C. O. Adjei, A Novel IoT Based Smart Energy Meter with Backup Battery, International Journal of Computing 20(3) (2021) 357-364. doi:10.47839/ijc.20.3.2281
[27] V. Chandola, A. Banerjee, V. Kumar, Anomaly Detection: A Survey (2014).
[28] C. Zhou, P. Zhang, J. Li, H. Luo, Y. Wang, Deep Learning for Anomaly Detection: A Review (2018).
[29] H. Lindstedt, Methods for network intrusion detection. Evaluating rule-based methods and machine learning models on the CIC-IDS2017 dataset (2022).
[30] N. A. Farnaaz, J. Akhil, Random Forest Modeling for Network Intrusion Detection System (2016). doi:10.1016/j.procs.2016.06.047
[31] A. Feijoo-Añazco, D. Garcia-Carrillo, Jesús Sanchez-Gomez, Rafael Marin-Perez, Innovative security and compression for constrained IoT networks (2023). doi:10.1016/j.iot.2023.100899
[32] O. Pomorova, O. Savenko, S. Lysenko, A. Nicheporuk, Metamorphic Viruses Detection Technique based on the Modified Emulators, CEUR Workshop Proceedings 1614 (2016) 375-383.
[33] A. Kashtalian, S. Lysenko, O. Savenko, A. Nicheporuk, T. Sochor, V. Avsiyevych, Multi-computer malware detection systems with metamorphic functionality, Radioelectronic and Computer Systems 2024(1) 152-175. doi:10.32620/reks.2024.1.13
[34] O. Savenko, A. Nicheporuk, S. Lysenko, et al., Dynamic signature-based malware detection technique based on API call tracing, CEUR Workshop Proceedings 2393 (2019) 633-643.
[35] V. Khoroshko, V. Kudinov, M. Kapustian, Evaluation of quality indicators of functioning cyber protection management systems of information systems, Computer Systems and Information Technologies 2 (2022) 47-56. doi:10.31891/csit-2022-2-6