Development of the social engineering attack models ⋆ Oleksandr Bokhonko1,†, Sergii Lysenko1,∗,† and Piotr Gaj2,† 1 Khmelnitsky National University, Khmelnitsky, Instytutska street 11, 29016, Ukraine 2 Silesian University of Technology, ul. Akademicka 2A, 44-100 Gliwice, Poland Abstract This study developed specialized models for detecting social engineering attacks, with a focus on spam emails, spear phishing, and trojan emails. Each model captures distinct features of these attacks using machine learning-based detection processes. Utilizing the BotGRABBER framework, which incorporates algorithms such as random forest, decision tree, K-nearest neighbor, and XGBoost, the models analyze characteristics like email metadata, user interaction patterns, attachment behaviors, and network anomalies to differentiate between malicious and legitimate communications. The targeted approach of each model enables tailored detection strategies that address specific social engineering tactics, whether they involve spam, personalized deceptive emails, or malware-infected attachments. For example, the trojan email model concentrates on identifying embedded malware within email attachments, utilizing sandbox environments for controlled testing and analysis. In contrast, the spear phishing model focuses on detecting personalized attack methods by analyzing sender details and links for suspicious patterns. The spam email model, on the other hand, prioritizes content filtering and tracking calls-to-action to distinguish between legitimate emails and mass-distributed spam. Empirical results demonstrate the models’ effectiveness, achieving approximately 99% detection accuracy with a 6% false positive rate. This strong performance highlights the potential of these models to contribute to proactive defense strategies against evolving social engineering threats. By leveraging targeted feature sets and adaptive machine learning algorithms, these models can be effectively deployed in real-world environments to safeguard networks and systems from a wide array of social engineering attacks. Keywords social engineering attack, cybersecurity, models; cyberattacks; detection; network host1 1. Introduction A social engineering attack is a type of cyberattack that relies on human interaction and psychological manipulation to trick individuals into revealing confidential information or performing actions that compromise security. Unlike technical hacking methods, social engineering exploits human behavior and trust to gain unauthorized access, bypass security measures, or install malicious software [1]. Attackers often pose as trusted figures, such as employees, IT support, or even friends, to lower a person’s guard and gain access to sensitive information. Such attacks exploit emotions such as curiosity, fear, urgency, or helpfulness. Attackers might, for example, create a sense of urgency to prompt immediate action without verification [2]. Social engineering does not rely on code manipulation or exploiting software vulnerabilities. Instead, it leverages human psychology as the “weakest link” in security [3]. Social engineering attacks are particularly effective because they exploit human psychology rather than technology [4]. Since people are typically more inclined to trust or respond to authority and act under pressure, these attacks often bypass traditional security defenses. This is why training AdvAIT-2024: 1st International Workshop on Advanced Applied Information Technologies, December 5, 2024, Khmelnytskyi, Ukraine - Zilina, Slovakia ∗ Corresponding author. † These authors contributed equally. booweb24@gmail.com (O. Bokhonko); sirogyk@ukr.net (S. Lysenko); piotr.gaj@polsl.pl (P. Gaj) 0000-0002-7228-9195 (O. Bokhonko); 0000-0001-7243-8747 (S. Lysenko); 0000-0002-2291-7341 (P. Gaj) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings and awareness programs are critical in defending against social engineering. Developing models to detect and respond to these attacks is a crucial area of cybersecurity [5]. Social engineering attacks have unique strategies to exploit human psychology and trust for unauthorized access or data extraction [6]. As these attacks continuously evolve in complexity and sophistication, there is an urgent need to model and analyze these threats to devise effective detection techniques [7]. Social engineering attacks exploit human factors, often bypassing traditional security controls; thus, enhancing detection strategies tailored to identify these behaviors is crucial. Each social engineering attack represents a unique set of tactics that adversaries can adapt over time [8, 9]. The frequency and evolving nature of phishing and spear phishing, for instance, indicate attackers’ ability to customize messages to specific individuals or roles. Detection techniques must account for these evolutions to stay ahead of attackers [10]. Since social engineering primarily manipulates psychological factors, traditional security mechanisms often fail to detect it [11]. So, building comprehensive models for each attack type can provide a foundation for more effective detection. With effective models, it is possible to move towards predictive security, identifying potential attack patterns before they fully develop. By structuring detection techniques around a clear understanding of each social engineering attack vector, organizations can build resilient defenses against both traditional and emerging social engineering strategies. This would provide an essential layer of security, strengthening overall cybersecurity frameworks. 2. State-of the-art There is a huge number of researches devoted to the problem of the social engineering attack detection. Thus, in article [12] by the author suggested phishing detection method called Freeze-Phish, which uses Python to create a web crawler to collect information such as hyperlinks from a website. In addition, the author created a database of brand words and suspicious words by editing distance algorithms such as Levenshtein distance and Hamming distance to compare the difference between the words in the web page URL and the suspicious word. The developer used a neural network to train this model and exported the code as an executable file (.exe) so that users can more easily use the code to detect suspicious web pages. Compared to other methods, the accuracy of the Freeze- Phish model is about 97% true positives, and the average execution time is 21.3 seconds. The work [13] presents a new phishing detection model that uses feature selection to select highly correlated features with a class label. The feature selection step uses a library of independent significant features from MATLAB and a heatmap from Python to find highly correlated features. The model uses an adaptive boosting approach that consists of multiple classifiers to improve the accuracy of the model. The model proposed by the authors provides extremely high predictive accuracy of approximately 99%. A malicious URL (or) a malicious website is a common and serious cyber security threat. Therefore, the search engine becomes the basis of information management. Most existing systems for detecting malicious websites focus on specific attacks. Meanwhile, blacklist-based browser extensions are powerless against numerous websites. Therefore, it is important that any data coming from the client side is effectively obfuscated so that the server cannot interpret any valuable information from the obfuscated data. In paper [4], the first PPSB service is proposed. It provides strong security guarantees that are lacking in existing SB services. In particular, it inherits the ability to detect dangerous URLs while protecting both the user's privacy (browsing history) and the proprietary assets of the blacklist provider (the list of dangerous URLs). The authors propose a model that encrypts sensitive user data to prevent interference by external analysts and service providers. It also fully supports selective aggregate functions for analyzing user behavior online and guarantees differential privacy. The RSA homomorphic algorithm is used to encrypt user behavior data online. The implementation is complete and its performance is evaluated against a real-time behavioral data set. In this study [5], the authors proposed an adaptive framework that combines deep learning and Randon Forest for image reading, speech synthesis from deeply faked videos, and natural language processing at different prediction levels to significantly improve the performance of machine learning models for detecting phishing attacks. To validate both the effectiveness and adaptability of our proposed framework to overcome the limitations of current approaches and its ability to detect sophisticated phishing sites, the researchers created 4 categories of phishing sites and uploaded them to a secure server with compromised DNS at a friendly URL; the first was a text-only phishing site, an image-only phishing site, a video-only phishing site, and a combination phishing site. The authors used SEO-friendly URLs and hacked the legitimate DNS on the text-only phishing site so that they could avoid detection at the 1st level to the 4th level of the framework where they were detected. Also, the developers created phishing sites where the text contains only image format, text-only format and video-only format using fake videos to test the adaptability of the proposed structure to different scenarios of a complex or complex phishing site, the proposed structure successfully overcomes the limitations of existing approaches, greatly improves the detection of phishing attacks and successfully detect sophisticated phishing web pages with multi-dimensional fake videos, images and texts. This study [6] addresses limitations in existing research, such as reliance on proprietary datasets and lack of real-world application, by proposing a highly efficient machine learning model for email classification. Using the most complete and largest publicly available data set, the model achieves an f1 of 0.99 and is designed to be deployed in appropriate applications. Additionally, Explainable AI (XAI) is integrated to increase user trust. This research offers a practical and highly accurate solution that helps fight phishing by providing users with a real-time web application to detect phishing emails. The work presented in source [7] aims to protect users' e-mail structure and settings to prevent attackers from using the account when it is hacked or hijacked and to prevent them from setting up forwarding in the victim's e-mail account to another account, which automatically stops the user from receiving emails. Secure code is applied to the submit button of the composition to reduce insider impersonation attack. In addition, to protect open applications on public and private devices. Article [8] provides an overview of the revolutionary technology often referred to as the "Guardian of Artificial Intelligence". It is a technologically advanced strategy to combat social engineering attacks using artificial intelligence (AI). The method uses machine intelligence technologies such as behavioral pattern analysis, anomaly detection, and social engineering deception to perform real-time monitoring actions. Using artificial intelligence functions in cyber defense, this method emphasizes a proactive and adaptive methodology to increase the level of security and immunity from social engineering attacks. While conventional social engineering defenses have shown some success, they rely heavily on static rules and signatures, making it difficult for them to keep up with the rapidly evolving tricks of cybercriminals. Social engineering attacks have become more sophisticated and targeted, requiring organizations to go beyond layered defenses and equip themselves with more advanced and adaptive security tools such as machine learning-based detection and behavioral analytics tools to effectively address such challenges. However, the use of machine learning mechanisms in cyber security brings challenges such as data reliability, model readability, and aggressive attacks. Ensuring the integrity and reliability of the training data is critical to avoid data bias and enable the development of reliable ML models. Furthermore, making sense of the findings made by highly nested neural networks is a challenging task, leading to debates in the area of transparency and accountability. In this study [9], the authors consider the potential of hybrid approaches that combine several models to increase both the reliability and effectiveness of phishing detection. The researchers highlight the limitations of existing hybrid models that focus primarily on efficiency while ignoring broader applicability. To address these gaps, the authors present a new framework explicitly designed for real-world applications that lays the foundation for practical and robust phishing detection architectures. The authors performed a proof-of-concept to evaluate its effectiveness, reliability, and detection speed. The authors also present an innovative methodology for simulating bypass attacks on basic models with one analysis. These experiments demonstrate that the proposed hybrid framework outperforms individual models, exhibiting higher efficiency, resistance to circumvention attempts, and real-time detection capabilities. The proof-of-concept method achieves an accuracy of 97.44%, thus outperforming the current state-of-the-art approach while requiring less computational time. The results provide insight into the multifaceted factors behind hybrid models beyond simple performance and highlight the importance of holistic applicability of hybrid approaches to address the critical need for robust phishing protection. In [20] system aims to enhance user security by detecting phishing websites, ensuring safe browsing and transactions while protecting sensitive information was proposed. It provides users with a browser extension that helps identify whether a website is legitimate or not. The system combines heuristic features, visual features, and various approaches to feed machine learning algorithms, ensuring effective detection. A key challenge is adapting to new phishing tactics, which requires algorithms that continually learn and evolve. To achieve high accuracy, the system uses online learning algorithms and multiple approaches to improve precision. However, the system may occasionally produce minor false positives and false negatives, which can be minimized by incorporating more advanced features for the machine learning model, leading to better accuracy. Identification and labeling of fake news is a difficult problem due to the huge amount of heterogeneous content. Essentially, the functions of machine learning (ML) and natural language processing (NLP) are to improve, accelerate and automate the analytical process. In this paper [21], a combination of ML and NLP is implemented to classify fake news based on an open, large, and labeled corpus on Twitter. In this case, the authors compare several state-of-the-art machine learning and neural network models based on content-only features. In order to improve the classification performance, inverse document frequency functions (TF-IDF) were applied before the training process in ML training, while word embedding was used in neural network training. Due to the application of ML and NLP methods, all traditional models have an accuracy of more than 85%. All neural network models have over 90% accuracy. In their experiments, the authors found that neural network models outperformed traditional ML models by an average of about 6% accuracy, with all neural network models achieving up to 90% accuracy. This research [22] presents a new method for detecting phishing attacks on websites, avoiding the problems associated with the shortcomings of knowledge-based representation and binary solution. The proposed detection method was performed using Fuzzy Rule Interpolation (FRI). FRI reasoning methods have added the advantage of increasing the robustness of fuzzy systems and effectively reducing system complexity. These benefits help the intrusion detection system (IDS) generate more realistic and comprehensive alerts in the event of phishing attacks. The proposed method was applied to a dataset of an open-source phishing website. The results show that the proposed detection method achieved a detection rate of 97.58% and effectively reduced the number of false alarms. Additionally, it effectively blurs the line between normal and phishing traffic due to its fuzzy nature. It has the ability to generate the necessary security alert in case of deficiencies in the knowledge-based representation. In addition, the results obtained using the proposed detection method were compared with other literature results. The results showed that the accuracy rate of this work is competitive with other methods. In addition, the proposed detection method can generate the necessary anti-phishing alerts even if one of the sparse anti-phishing rules does not cover some input parameters (observations). Article [23] presents an in-depth exploration of the current landscape of social engineering attacks, detailing their classifications and outlining a range of mitigation strategies organizations can implement to protect their most valuable assets against these persistent and rapidly evolving threats. In study [24], the authors proposed a new scheme called Routing Protocol for Energy-Efficient Networks (RPEEN) for clone attack detection in an IoT-based intelligent healthcare application. The main advantage of this scheme is the improvement of energy efficiency, since energy efficiency is the most important constraint in WSN systems. The performance of the proposed scheme is highlighted using parameters such as time delay, residual energy, throughput, energy efficiency, and error rate. In addition, to show the effectiveness of the proposed algorithm, this algorithm is compared with the existing hybrid multilevel clustering (HMLC) algorithm. It is found that the proposed RPEEN scheme achieves a time delay of 0.63 and 0.6ms with 0 dead nodes and by avoiding the clone attack, respectively. In addition, the proposed scheme achieves the highest residual energy of 49.5 J for 2500 shots. In addition, the proposed algorithm achieves the highest throughput of 99.2% for 50 nodes. The emergence of large language models (LLMs), including ChatGPT, has had a significant impact on a wide range of fields. Although LLMs have been widely investigated for tasks such as code generation and text synthesis, their application to detect malicious web content, particularly phishing sites, has been little studied. To counter the growing wave of cyberattacks due to misuse of LLM, it is important to automate detection using advanced LLM capabilities. In paper [25], the authors propose a new system called ChatPhishDetector that uses LLM to detect phishing sites. This system involves using a web crawler to gather information from websites, generate hints for LLM based on the crawled data, and then derive detection results from the responses generated by LLM. The system enables the detection of multilingual phishing sites with high accuracy, identifying fake brands and social engineering techniques in the context of the entire website without the need to train machine learning models. To evaluate the performance of the system, the authors performed experiments on their own dataset and compared it with the baseline systems and several LLMs. Experimental results using GPT-4V have demonstrated outstanding performance with a precision of 98.7% and a recall of 99.6%, outperforming the detection results of other LLMs and existing systems. These findings highlight the potential of LLM to protect users from online fraud and have important implications for strengthening cybersecurity measures. Nevertheless, there are disadvantages for each of the provided detection approaches: complexity in deployment, when the model relies on a neural network and custom algorithms, making it potentially harder for users to maintain; high execution time; limited scalability; overfitting risk; huge feature dependence; high computational cost; resource intensive; high complexity; over- reliance on datasets; limited applicability; transparency challenges; false alarms. Such situation requires the development of new approaches and new model, that take into account all aspects of social engineering attack functioning. 3. Development of the social engineering attacks models Let us define the set S as the social engineering attacks set, Ξ={α,β,γ,δ,ε,ϵ,ζ,η,θ,ϑ,ι,κ,λ,μ,ν}, where α – the vishing attack; β – phishing attack; γ – profile cloning; δ – grooming; ε – dumpster diving attacks; ϵ – tailgating; ζ – file masquerade; η – baiting; θ – scareware or pop-up windows; ϑ – water-holing; ι – trojan mail; κ – spear phishing; λ – spam mail; μ – interesting software; ν – hoaxing. 3.1. Trojan mail attack model In order to develop Trojan mail attack model, let us focus on the key components of the attack: 1. Email crafting. Hackers design emails that appear to be from trusted sources, such as a known colleague, a reputable company, or even government entities. The content of the email typically contains a sense of urgency or relevance to prompt the recipient to interact with the links or attachments. For example, the email might reference an overdue invoice, a shipment confirmation, or a required update. 2. Spoofing and deceptive tactics. Hackers may use email spoofing to mask their true identity, making the email appear as if it’s coming from a legitimate sender. They often replicate the visual style and tone of official communication to reduce suspicion, using logos, familiar phrases, or signatures. 3. Malicious link or attachment. The email includes a malicious link or attachment, often in the form of a document (e.g., PDF, Word, Excel) or a ZIP file. Clicking on the link directs the user to a compromised site or triggers a malware download. Opening the attachment similarly executes the malware, installing it on the user’s system. 4. Trojan execution. Once activated, the malware (often a trojan) installs itself silently on the user’s device. The trojan may open a backdoor for remote access, allowing hackers to control the system, capture keystrokes or take screenshots to steal login credentials and sensitive information, or spread laterally across the network, infecting other systems. 5. Unauthorized access and data theft. After the trojan is successfully installed, hackers can gain unauthorized access to the infected system. The hackers may use this access to steal confidential information such as passwords, financial details, or personal data, monitor network activity and gather intelligence for further attacks, or encrypt the system or files for ransomware attacks. 6. Continued Exploitation. The trojan remains hidden and continues to operate without the user's knowledge, enabling ongoing surveillance or exploitation. Hackers can use the compromised system to launch additional attacks, either within the organization or against external targets. Trojan mail attack model has to include the set of countermeasures. Thus, to protect against Trojan mail attacks, individuals and organizations should implement several defensive strategies: 1. Email Filtering and Security. We are to use advanced email filtering solutions to block suspicious emails or detect common signs of phishing and malware delivery, and implement email authentication protocols, such as SPF, DKIM, and DMARC, to prevent email spoofing. 2. User Awareness and Training. We are to educate users to recognize phishing emails, especially those containing suspicious attachments or unexpected requests for action, and train users to avoid clicking on unfamiliar links or downloading attachments from unverified sources. 3. Antivirus and malware protection. We are to ensure that antivirus and anti-malware solutions are updated regularly to detect and block trojans and other types of malware, and enable real-time scanning of email attachments and downloads. 4. Network Segmentation and Access Controls. We are to limit the lateral movement of malware by segmenting networks and implementing access controls. This helps to contain an infection if it does occur, and employ least privilege policies, ensuring users have access only to the resources they need. 5. Backup and Recovery. We are to regularly back up critical data and maintain a recovery plan in case of infection. This can help mitigate the damage caused by ransomware or data theft following a trojan mail attack. Let us present the model of the trojan mail attack as the tuple: 𝑀𝑀𝛿𝛿 = ⟨𝐴𝐴𝛿𝛿 , 𝐸𝐸𝛿𝛿 , 𝑈𝑈𝛿𝛿 , 𝑀𝑀𝛿𝛿 , 𝑆𝑆𝛿𝛿 , 𝐷𝐷𝛿𝛿 ⟩, (1) where 𝐴𝐴𝛿𝛿 = {𝑎𝑎𝛿𝛿1 , 𝑎𝑎𝛿𝛿2 , … , 𝑎𝑎𝛿𝛿𝛿𝛿𝐴𝐴 } is the set that represents the hackers responsible for crafting 𝛿𝛿 and distributing trojan mail, 𝑁𝑁𝐴𝐴𝛿𝛿 - the number of individuals conducting the trojan mail attacks; 𝐸𝐸𝛿𝛿 = {𝑒𝑒𝛿𝛿1 , 𝑒𝑒𝛿𝛿2 , … , 𝑒𝑒𝛿𝛿𝛿𝛿𝐸𝐸 } is the set that represents the emails that are sent to potential victims, 𝛿𝛿 which contain malicious links or attachments, 𝑁𝑁𝐸𝐸𝛿𝛿 - the number of emails; 𝑈𝑈𝛿𝛿 = �𝑢𝑢𝛿𝛿1 , 𝑢𝑢𝛿𝛿2 , … , 𝑢𝑢𝛿𝛿𝛿𝛿𝑈𝑈 � is the set that represents the users who receive and interact with the 𝛿𝛿 malicious emails, 𝑁𝑁𝑈𝑈𝛿𝛿 - number of users; 𝑀𝑀𝛿𝛿 = �𝑚𝑚𝛿𝛿1 , 𝑚𝑚𝛿𝛿2 , … , 𝑢𝑢𝛿𝛿𝛿𝛿𝑀𝑀 � is the set that represents the trojan malware that is embedded in the 𝛿𝛿 links or attachments, 𝑁𝑁𝑀𝑀𝛿𝛿 – number of trojan malware; 𝑆𝑆𝛿𝛿 = �𝑠𝑠𝛿𝛿1 , 𝑠𝑠𝛿𝛿2 , … , 𝑠𝑠𝛿𝛿𝛿𝛿𝑆𝑆 � is the set that represents the systems that are compromised once the 𝛿𝛿 malware is activated, 𝑁𝑁𝑆𝑆𝛿𝛿 – number of compromised systems; 𝐷𝐷𝛿𝛿 = �𝑑𝑑𝛿𝛿1 , 𝑑𝑑𝛿𝛿2 , … , 𝑑𝑑𝛿𝛿𝛿𝛿𝐷𝐷 � is the set that represents the confidential data that hackers target for 𝛿𝛿 theft or unauthorized access, 𝑁𝑁𝐷𝐷𝛿𝛿 – number of confidential data. Let us define the email crafting function 𝑓𝑓𝛿𝛿𝐸𝐸𝐶𝐶 that describes how the attacker crafts emails and sends them to users, as: 𝑓𝑓𝛿𝛿𝐸𝐸𝐶𝐶 : 𝐴𝐴𝛿𝛿 × 𝑈𝑈𝛿𝛿 → 𝐸𝐸𝛿𝛿 , 𝑓𝑓𝛿𝛿𝐸𝐸𝐶𝐶 (𝑎𝑎𝛿𝛿𝛿𝛿 , 𝑢𝑢𝛿𝛿𝛿𝛿 ) = 𝑒𝑒𝛿𝛿𝛿𝛿 . where the attacker 𝑎𝑎𝛿𝛿𝛿𝛿 sends an email 𝑒𝑒𝛿𝛿𝛿𝛿 to the user 𝑢𝑢𝛿𝛿𝛿𝛿 . Let us define the Malware Delivery Function 𝑓𝑓𝛿𝛿𝑀𝑀𝐷𝐷 that describes how users interact with the email and trigger the malware, as: 𝑓𝑓𝛿𝛿𝑀𝑀𝐷𝐷 : 𝐸𝐸𝛿𝛿 × 𝑈𝑈𝛿𝛿 → 𝑀𝑀𝛿𝛿 , 𝑓𝑓𝛿𝛿𝑀𝑀𝐷𝐷 (𝑒𝑒𝛿𝛿𝛿𝛿 , 𝑢𝑢𝛿𝛿𝛿𝛿 ) = 𝑚𝑚𝛿𝛿𝛿𝛿 , where the user 𝑢𝑢𝛿𝛿𝛿𝛿 opens the email 𝑒𝑒𝛿𝛿𝛿𝛿 and activates the malware 𝑚𝑚𝛿𝛿𝛿𝛿 . Let us define the System Compromise Function 𝑓𝑓𝛿𝛿𝑆𝑆 𝐶𝐶 that describes how the malware infects the user’s system, as: 𝑓𝑓𝛿𝛿𝑆𝑆 𝐶𝐶 : 𝑀𝑀𝛿𝛿 × 𝑈𝑈𝛿𝛿 → 𝑆𝑆𝛿𝛿 , 𝑓𝑓𝛿𝛿𝑆𝑆 𝐶𝐶 (𝑚𝑚𝛿𝛿𝛿𝛿 , 𝑢𝑢𝛿𝛿𝛿𝛿 ) = 𝑠𝑠𝛿𝛿𝛿𝛿 , where the malware 𝑚𝑚𝛿𝛿𝛿𝛿 compromises the user’s system 𝑠𝑠𝛿𝛿𝛿𝛿 . Let us define the Unauthorized Access Function 𝑓𝑓𝛿𝛿𝑈𝑈 𝐴𝐴 that describes how hackers gain unauthorized access to systems through the trojan, as: 𝑓𝑓𝛿𝛿𝑈𝑈 𝐴𝐴 : 𝐴𝐴𝛿𝛿 × 𝑆𝑆𝛿𝛿 → 𝑆𝑆𝛿𝛿 , 𝑓𝑓𝛿𝛿𝑈𝑈 𝐴𝐴 (𝑎𝑎𝛿𝛿𝛿𝛿 , 𝑠𝑠𝛿𝛿𝛿𝛿 ) = 𝑆𝑆𝛿𝛿 𝑥𝑥𝑎𝑎𝛿𝛿 , 𝛿𝛿 where the hacker 𝑎𝑎𝛿𝛿𝛿𝛿 gains control of the system 𝑠𝑠𝛿𝛿𝛿𝛿 , creating 𝑆𝑆𝛿𝛿 𝑥𝑥𝑎𝑎𝛿𝛿 (the compromised system). 𝛿𝛿 Let us define the data theft function 𝑓𝑓𝛿𝛿𝐷𝐷𝑇𝑇 that describes how hackers steal data from the compromised systems, as: 𝑓𝑓𝛿𝛿𝐷𝐷𝑇𝑇 : 𝐴𝐴𝛿𝛿 × 𝑆𝑆𝛿𝛿 → 𝐷𝐷𝛿𝛿 , 𝑓𝑓𝛿𝛿𝐷𝐷𝑇𝑇 (𝑎𝑎𝛿𝛿𝛿𝛿 , 𝑠𝑠𝛿𝛿𝛿𝛿 ) = 𝑑𝑑𝛿𝛿𝛿𝛿 . where the hacker 𝑎𝑎𝛿𝛿𝛿𝛿 steals data 𝑑𝑑𝛿𝛿𝛿𝛿 from the compromised system 𝑠𝑠𝛿𝛿𝛿𝛿 . The overall impact of the trojan mail attack can be measured by the number of infected systems, the amount of stolen data, and the extent of unauthorized access. Thus, impact function 𝑔𝑔𝛿𝛿 can be presented as: 𝑔𝑔𝛿𝛿 : 𝐴𝐴𝛿𝛿 × 𝐸𝐸𝛿𝛿 × 𝑈𝑈𝛿𝛿 × 𝑀𝑀𝛿𝛿 × 𝑆𝑆𝛿𝛿 × 𝐷𝐷𝛿𝛿 → ℝ, 𝑔𝑔𝛿𝛿 �𝑎𝑎𝛿𝛿𝛿𝛿 , 𝑒𝑒𝛿𝛿𝛿𝛿 , 𝑢𝑢𝛿𝛿𝛿𝛿 , 𝑚𝑚𝛿𝛿𝛿𝛿 , 𝑠𝑠𝛿𝛿𝛿𝛿 , 𝑑𝑑𝛿𝛿𝛿𝛿 � = 𝐼𝐼𝜀𝜀𝑇𝑇 , where 𝐼𝐼𝜀𝜀𝑇𝑇 represents the impact of the trojan mail attack, considering factors such as the number of compromised systems, the severity of the data theft or unauthorized access, the spread of the malware across users and systems. Thus, for a specific victim 𝑢𝑢𝛿𝛿𝛿𝛿 targeted by attacker 𝑎𝑎𝛿𝛿𝛿𝛿 : 𝑓𝑓𝛿𝛿𝐸𝐸𝐶𝐶 (𝑎𝑎𝛿𝛿𝛿𝛿 , 𝑢𝑢𝛿𝛿𝛿𝛿 ) = 𝑒𝑒𝛿𝛿𝛿𝛿 the hacker sends a malicious email to the user; 𝑓𝑓𝛿𝛿𝑀𝑀𝐷𝐷 (𝑒𝑒𝛿𝛿𝛿𝛿 , 𝑢𝑢𝛿𝛿𝛿𝛿 ) = 𝑚𝑚𝛿𝛿𝛿𝛿 the user opens the email, activating the malware; 𝑓𝑓𝛿𝛿𝑆𝑆 𝐶𝐶 (𝑚𝑚𝛿𝛿𝛿𝛿 , 𝑢𝑢𝛿𝛿𝛿𝛿 ) = 𝑠𝑠𝛿𝛿𝛿𝛿 the malware compromises the user’s system; 𝑓𝑓𝛿𝛿𝑈𝑈 𝐴𝐴 (𝑎𝑎𝛿𝛿𝛿𝛿 , 𝑠𝑠𝛿𝛿𝛿𝛿 ) = 𝑆𝑆𝛿𝛿 𝑥𝑥𝑎𝑎𝛿𝛿 the hacker gains unauthorized access to the system; 𝛿𝛿 𝑓𝑓𝛿𝛿𝐷𝐷𝑇𝑇 (𝑎𝑎𝛿𝛿𝛿𝛿 , 𝑠𝑠𝛿𝛿𝛿𝛿 ) = 𝑑𝑑𝛿𝛿𝛿𝛿 the hacker steals data from the compromised system. 3.2. Spear phishing attack model In order to develop spear phishing attack model, let us focus on the key components of the attack: 1. Information Gathering. Attackers begin by researching their target extensively. This may involve collecting data from social media profiles, professional networking sites (like LinkedIn), or publicly available information. The goal is to create a detailed profile of the victim, including their job role, interests, contacts, and recent activities. 2. Message Crafting. With the gathered information, attackers craft a convincing email or message that is highly personalized and relevant to the victim. The message often includes familiar references, such as the names of colleagues, recent projects, or organizations the victim is associated with. This familiarity is intended to lower the victim's defenses. 3. Deceptive Links or Attachments. The crafted message typically contains links to malicious websites or attachments with embedded malware. These links might mimic legitimate URLs or point to fake websites designed to harvest credentials. 4. Execution of the Attack. When the victim clicks the link or opens the attachment, they may be directed to a fake login page where they are prompted to enter their credentials, unknowingly providing them to the attacker. If the attack involves malware, it may be downloaded onto the victim’s system, allowing the attacker to gain access to sensitive data or further infiltrate the network. 5. Account Compromise. Once the attacker obtains the victim’s login credentials or malware is installed, they can access the victim's accounts, whether personal or organizational. This access may lead to unauthorized transactions, data theft, or further attacks against the victim’s contacts. 6. Exploitation of Access. Attackers may use the compromised account to send additional spear phishing emails to the victim's contacts, thereby expanding the attack. They may also exploit the access to steal sensitive data, conduct fraud, or manipulate transactions. Attack model has to include the set of countermeasures. Thus, to protect against attacks, individuals and organizations should implement several defensive strategies: 1. User Education and Awareness. Train users to recognize the signs of spear phishing, including suspicious emails, unexpected requests for sensitive information, and links to unfamiliar sites. Encourage users to verify the authenticity of messages before clicking links or providing information. 2. Email Security Measures. Implement email filtering solutions to detect and block suspicious messages. Use email authentication methods (e.g., SPF, DKIM, DMARC) to reduce the likelihood of spoofed emails. 3. Multi-Factor Authentication (MFA). Enable MFA on sensitive accounts to add an additional layer of security. This makes it more difficult for attackers to gain access even if they have stolen login credentials. 4. Regular Monitoring and Incident Response. Monitor accounts and systems for unusual activity that may indicate a successful attack. Establish an incident response plan to quickly address any security breaches. 5. Limit Information Sharing. Be cautious about the amount of personal and professional information shared on social media and other online platforms. Review privacy settings to control who can see information. Let us present the model of the spear phishing attack as the tuple: 𝑀𝑀𝜀𝜀 = ⟨𝐴𝐴𝜀𝜀 , 𝑇𝑇𝜀𝜀 , 𝐼𝐼𝜀𝜀 , 𝑀𝑀𝜀𝜀 , 𝑅𝑅𝜀𝜀 , 𝑆𝑆𝜀𝜀 , 𝐷𝐷𝜀𝜀 ⟩, (2) where 𝐴𝐴𝜀𝜀 = {𝑎𝑎𝜀𝜀1 , 𝑎𝑎𝜀𝜀2 , … , 𝑎𝑎𝜀𝜀𝜀𝜀𝐴𝐴𝜀𝜀 } is the set that represents the attackers involved in spear phishing, 𝑁𝑁𝐴𝐴𝜀𝜀 – number of attackers; 𝑇𝑇𝜀𝜀 = {𝑡𝑡𝜀𝜀1 , 𝑡𝑡𝜀𝜀2 , … , 𝑡𝑡𝜀𝜀𝜀𝜀𝑇𝑇𝜀𝜀 } is the set that represents the specific individuals or organizations targeted by the attack, 𝑁𝑁𝑇𝑇𝜀𝜀 – number of individuals; 𝐼𝐼𝜀𝜀 = {𝑖𝑖𝜀𝜀1 , 𝑖𝑖𝜀𝜀2 , … , 𝑖𝑖𝜀𝜀𝜀𝜀𝐼𝐼𝜀𝜀 } is the set that represents the collected information about the targets, such as personal details and professional affiliations, 𝑁𝑁𝐼𝐼𝜀𝜀 – number of collected information; 𝑀𝑀𝜀𝜀 = {𝑚𝑚𝜀𝜀1 , 𝑚𝑚𝜀𝜀2 , … , 𝑚𝑚𝜀𝜀𝜀𝜀𝑀𝑀𝜀𝜀 } is the set that represents the crafted messages sent to targets, which may contain malicious links or attachments, 𝑁𝑁𝑀𝑀𝜀𝜀 – number of crafted messages; 𝑅𝑅𝜀𝜀 = �𝑟𝑟𝜀𝜀1 , 𝑟𝑟𝜀𝜀2 , … , 𝑟𝑟𝜀𝜀𝜀𝜀𝑅𝑅𝜀𝜀 � is the set that represents the malicious software that may be delivered through the attack, 𝑁𝑁𝑅𝑅𝜀𝜀 – number of malicious software; 𝑆𝑆𝜀𝜀 = �𝑠𝑠𝜀𝜀1 , 𝑠𝑠𝜀𝜀2 , … , 𝑠𝑠𝜀𝜀𝜀𝜀𝑆𝑆𝜀𝜀 � is the set that represents the systems compromised as a result of the attack, 𝑁𝑁𝑆𝑆𝜀𝜀 – number of compromised systems; 𝐷𝐷𝜀𝜀 = �𝑑𝑑𝜀𝜀1 , 𝑑𝑑𝜀𝜀2 , … , 𝑑𝑑𝜀𝜀𝜀𝜀𝐷𝐷𝜀𝜀 � is the set that represents the confidential information targeted for theft, 𝑁𝑁𝐷𝐷𝜀𝜀 – number of confidential information; Let us define the information collection function 𝑓𝑓𝜀𝜀𝐼𝐼𝐶𝐶 that describes how attackers gather information about their targets, as: 𝑓𝑓𝜀𝜀𝐼𝐼𝐶𝐶 : 𝐴𝐴𝜀𝜀 × 𝑇𝑇𝜀𝜀 → 𝐼𝐼𝜀𝜀 , 𝑓𝑓𝜀𝜀𝐼𝐼𝐶𝐶 (𝑎𝑎𝜀𝜀𝜀𝜀 , 𝑡𝑡𝜀𝜀𝜀𝜀 ) = 𝑖𝑖𝜀𝜀𝜀𝜀 . The attacker 𝑎𝑎𝜀𝜀𝜀𝜀 collects information 𝑖𝑖𝜀𝜀𝜀𝜀 about the target 𝑡𝑡𝜀𝜀𝜀𝜀 . Let us define the message crafting function 𝑓𝑓𝜀𝜀𝑀𝑀𝐶𝐶 that describes how attackers create personalized messages based on the collected information, as: 𝑓𝑓𝜀𝜀𝑀𝑀𝐶𝐶 : 𝐼𝐼𝜀𝜀 × 𝑇𝑇𝜀𝜀 → 𝑀𝑀𝜀𝜀 , 𝑓𝑓𝜀𝜀𝑀𝑀𝐶𝐶 (𝑖𝑖𝜀𝜀𝜀𝜀 , 𝑡𝑡𝜀𝜀𝜀𝜀 ) = 𝑚𝑚𝜀𝜀𝜀𝜀 , where the attacker 𝑎𝑎𝜀𝜀𝜀𝜀 crafts a message 𝑚𝑚𝜀𝜀𝜀𝜀 for the target 𝑡𝑡𝜀𝜀𝜀𝜀 . Let us define the Message Sending Function 𝑓𝑓𝜀𝜀𝑀𝑀𝑆𝑆 that describes how the crafted message is sent to the target, as: 𝑓𝑓𝜀𝜀𝑀𝑀𝑆𝑆 : 𝑀𝑀𝜀𝜀 × 𝑇𝑇𝜀𝜀 → 𝑇𝑇𝜀𝜀 , 𝑚𝑚 𝑓𝑓𝜀𝜀𝑀𝑀𝑆𝑆 (𝑚𝑚𝜀𝜀𝜀𝜀 , 𝑡𝑡𝜀𝜀𝜀𝜀 ) = 𝑡𝑡𝑟𝑟𝜀𝜀 𝜀𝜀 , where the message 𝑚𝑚𝜀𝜀𝜀𝜀 is sent to the target 𝑡𝑡𝜀𝜀𝜀𝜀 . Let us define the Malware Delivery Function 𝑓𝑓𝜀𝜀𝑀𝑀𝐷𝐷 that describes how the target interacts with the message, potentially activating malware, as: 𝑓𝑓𝜀𝜀𝑀𝑀𝐷𝐷 : 𝑀𝑀𝜀𝜀 × 𝑇𝑇𝜀𝜀 → 𝑅𝑅𝜀𝜀 𝑓𝑓𝜀𝜀𝑀𝑀𝐷𝐷 (𝑚𝑚𝜀𝜀𝜀𝜀 , 𝑡𝑡𝜀𝜀𝜀𝜀 ) = 𝑟𝑟𝜀𝜀𝜀𝜀 where the target 𝑡𝑡𝜀𝜀𝜀𝜀 activates the malware 𝑟𝑟𝜀𝜀𝜀𝜀 by interacting with the message. Let us define the System Compromise Function 𝑓𝑓𝜀𝜀𝑆𝑆 𝐶𝐶 that describes how the malware compromises the target's system, as: 𝑓𝑓𝜀𝜀𝑆𝑆 𝐶𝐶 : 𝑅𝑅𝜀𝜀 × 𝑇𝑇𝜀𝜀 → 𝑆𝑆𝜀𝜀 , 𝑓𝑓𝜀𝜀𝑆𝑆 𝐶𝐶 (𝑟𝑟𝜀𝜀𝜀𝜀 , 𝑡𝑡𝜀𝜀𝜀𝜀 ) = 𝑠𝑠𝜀𝜀𝜀𝜀 , where the malware 𝑟𝑟𝜀𝜀𝜀𝜀 compromises the system 𝑠𝑠𝜀𝜀𝜀𝜀 of the target 𝑡𝑡𝜀𝜀𝜀𝜀 . Let us define the Data Theft Function 𝑓𝑓𝜀𝜀𝐷𝐷𝑇𝑇 that describes how attackers gain access to confidential data once the system is compromised, as: 𝑓𝑓𝜀𝜀𝐷𝐷𝑇𝑇 : 𝐴𝐴𝜀𝜀 × 𝑆𝑆𝜀𝜀 → 𝐷𝐷𝜀𝜀 , 𝑓𝑓𝜀𝜀𝐷𝐷𝑇𝑇 (𝑎𝑎𝜀𝜀𝜀𝜀 , 𝑠𝑠𝜀𝜀𝜀𝜀 ) = 𝑑𝑑𝜀𝜀𝜀𝜀 , where the attacker 𝑎𝑎𝜀𝜀𝜀𝜀 steals data 𝑑𝑑𝜀𝜀𝜀𝜀 from the compromised system 𝑠𝑠𝜀𝜀𝜀𝜀 . The overall impact of a spear phishing attack can be quantified based on the number of systems compromised, the volume of data stolen, and the extent of unauthorized access achieved. Thus, let us present the impact function 𝑔𝑔𝜀𝜀 as: 𝑔𝑔𝜀𝜀 : 𝐴𝐴𝜀𝜀 × 𝑇𝑇𝜀𝜀 × 𝑀𝑀𝜀𝜀 × 𝑅𝑅𝜀𝜀 × 𝑆𝑆𝜀𝜀 × 𝐷𝐷𝜀𝜀 → ℝ, 𝑔𝑔𝜀𝜀 �𝑎𝑎𝜀𝜀𝜀𝜀 , 𝑡𝑡𝜀𝜀𝜀𝜀 , 𝑚𝑚𝜀𝜀𝜀𝜀 , 𝑟𝑟𝜀𝜀𝜀𝜀 , 𝑠𝑠𝜀𝜀𝜀𝜀 , 𝑑𝑑𝜀𝜀𝜀𝜀 � = 𝐼𝐼𝜀𝜀𝑆𝑆 , where 𝐼𝐼𝜀𝜀𝑆𝑆 represents the impact of the spear phishing attack, considering factors such as -the number of compromised systems, the value of stolen data or unauthorized access obtained, the potential damage to the victim’s reputation and finances. Thus, for a specific victim 𝑡𝑡𝜀𝜀𝜀𝜀 targeted by attacker 𝑎𝑎𝜀𝜀𝜀𝜀 : 𝑓𝑓𝜀𝜀𝐼𝐼𝐶𝐶 (𝑎𝑎𝜀𝜀𝜀𝜀 , 𝑡𝑡𝜀𝜀𝜀𝜀 ) = 𝑖𝑖𝜀𝜀𝜀𝜀 the attacker gathers information about the target; 𝑓𝑓𝜀𝜀𝑀𝑀𝐶𝐶 (𝑖𝑖𝜀𝜀𝜀𝜀 , 𝑡𝑡𝜀𝜀𝜀𝜀 ) = 𝑚𝑚𝜀𝜀𝜀𝜀 the attacker crafts a personalized message for the target; 𝑓𝑓𝜀𝜀𝑀𝑀𝑆𝑆 (𝑚𝑚𝜀𝜀𝜀𝜀 , 𝑡𝑡𝜀𝜀𝜀𝜀 ) = 𝑡𝑡𝜀𝜀 𝑚𝑚 𝜀𝜀 the message is sent to the target; 𝜀𝜀 𝑟𝑟 𝑓𝑓𝜀𝜀𝑀𝑀𝐷𝐷 (𝑚𝑚𝜀𝜀𝜀𝜀 , 𝑡𝑡𝜀𝜀𝜀𝜀 ) = 𝑟𝑟𝜀𝜀𝜀𝜀 the target activates the malware from the message; 𝑓𝑓𝜀𝜀𝑆𝑆 𝐶𝐶 (𝑟𝑟𝜀𝜀𝜀𝜀 , 𝑡𝑡𝜀𝜀𝜀𝜀 ) = 𝑠𝑠𝜀𝜀𝜀𝜀 the malware compromises the target’s system. 𝑓𝑓𝜀𝜀𝐷𝐷𝑇𝑇 (𝑎𝑎𝜀𝜀𝜀𝜀 , 𝑠𝑠𝜀𝜀𝜀𝜀 ) = 𝑑𝑑𝜀𝜀𝜀𝜀 the attacker steals data from the compromised system. 3.3. Spam mail attack model In order to develop spam mail attack model, let us focus on the key components of the attack: 1. Spam mail attack model email list acquisition. Attackers often obtain lists of email addresses through various means, including data breaches, purchasing lists from underground markets, or using web scrapers to collect publicly available addresses. This list serves as the target pool for the spam campaign. 2. Message crafting. Spam emails can vary widely in content, from promotional offers and phishing attempts to scams and malicious links. Attackers may create enticing subject lines to increase open rates, often using urgency or enticing offers (e.g., "Limited Time Offer!" or "You've Won a Prize!") to lure victims. 3. Distribution methods. Emails can be sent using various methods, including botnets, bulk email services, or compromised accounts. Botnets, which are networks of infected computers, are often employed to distribute spam more efficiently and evade detection. 4. Call to action. The emails typically include a call to action, encouraging recipients to click on a link, enter personal information, or download attachments. Links may lead to phishing sites designed to capture sensitive information or malicious downloads that infect the user’s system with malware. 5. Malware delivery. Some spam emails contain attachments that, when opened, install malware on the recipient's device. This can include ransomware, spyware, or adware, leading to further exploitation of the victim's data. Infected systems may be used for further spam distribution, creating a cycle of infection. 6. Tracking and analytics. Attackers often implement tracking mechanisms to measure the effectiveness of their campaigns, such as monitoring open rates, click-through rates, and conversions. This information helps refine future spam campaigns and target more effectively. Consequences of spam mail attacks are to be added to the modes as well: 1. Information Theft. Users who fall for phishing scams may inadvertently provide personal data, leading to identity theft or unauthorized financial transactions. 2. Malware Infection. Clicking on links or downloading attachments can lead to malware infections, compromising the victim’s system and possibly leading to network breaches in organizational settings. 3. Resource Drain. The sheer volume of spam can overwhelm email systems, causing legitimate emails to be lost or delayed. This can lead to reduced productivity for individuals and organizations alike. 4. Reputation Damage. If a user's account is compromised due to spam, it may be used to send further spam, damaging the sender's reputation and leading to blacklisting. Spam mail attacks model has to include the set of countermeasures. To mitigate the risks associated with spam mail, individuals and organizations can adopt the following strategies: 1. Email filtering. Implement spam filters and email security solutions to block unwanted emails before they reach users' inboxes. 2. User education. Educate users about recognizing spam and phishing attempts, including common signs like poor grammar, generic greetings, and suspicious links. 3. Avoiding unsubscribe links. Encourage users not to click unsubscribe links in spam emails, as they may confirm to the sender that the email address is active, leading to more spam. 4. Use of strong security practices. Utilize strong passwords and enable two-factor authentication to protect email accounts from being compromised. 5. Regular software updates. Keep operating systems, antivirus software, and applications updated to protect against vulnerabilities that could be exploited by malware delivered via spam. Let us present the model of the spear spam mail attack as the tuple: 𝑀𝑀𝜖𝜖 = ⟨𝐴𝐴𝜖𝜖 , 𝑇𝑇𝜖𝜖 , 𝐼𝐼𝜖𝜖 , 𝑀𝑀𝜖𝜖 , 𝑅𝑅𝜖𝜖 , 𝑆𝑆𝜖𝜖 , 𝐷𝐷𝜖𝜖 ⟩, (3) where 𝐴𝐴𝜖𝜖 = {𝑎𝑎𝜖𝜖1 , 𝑎𝑎𝜖𝜖2 , … , 𝑎𝑎𝜖𝜖𝜖𝜖𝐴𝐴𝜖𝜖 } is the set that represents the individuals or groups sending spam emails, 𝑁𝑁𝐴𝐴𝜖𝜖 – number of individuals; 𝑅𝑅𝜖𝜖 = {𝑟𝑟𝜖𝜖1 , 𝑟𝑟𝜖𝜖2 , … , 𝑟𝑟𝜖𝜖𝜖𝜖𝑅𝑅𝜖𝜖 } is the set that represents the potential victims who receive spam emails, – number of potential victims; 𝐸𝐸𝜖𝜖 = {𝑒𝑒𝜖𝜖1 , 𝑒𝑒𝜖𝜖2 , … , 𝑒𝑒𝜖𝜖𝜖𝜖𝐸𝐸𝜖𝜖 } is the set that represents the spam emails sent out by attackers, 𝑁𝑁𝐸𝐸𝜖𝜖 – number of spam emails; 𝑀𝑀𝜖𝜖 = {𝑚𝑚𝜖𝜖1 , 𝑚𝑚𝜖𝜖2 , … , 𝑚𝑚𝜖𝜖𝜖𝜖𝐸𝐸𝜖𝜖 } is the set that represents the malicious software that may be included in the spam emails, 𝑁𝑁𝐸𝐸𝜖𝜖 – number of malicious software; 𝑇𝑇𝜖𝜖 = {𝑡𝑡𝜖𝜖1 , 𝑡𝑡𝜖𝜖2 , … , 𝑡𝑡𝜖𝜖𝜖𝜖𝑇𝑇𝜖𝜖 } is the set that represents the tracking data collected by attackers to measure the success of their spam campaigns, 𝑁𝑁𝑇𝑇𝜖𝜖 – number of data collected by attackers; 𝐶𝐶𝜖𝜖 = {𝑐𝑐𝜖𝜖1 , 𝑐𝑐𝜖𝜖2 , … , 𝑐𝑐𝜖𝜖𝜖𝜖𝐶𝐶𝜖𝜖 } is the set that represents the potential consequences for the recipients of spam emails, 𝑁𝑁𝐶𝐶𝜖𝜖 - number of potential consequences. Let us define the email distribution function 𝑓𝑓𝜖𝜖𝐸𝐸𝐷𝐷 that describes how attackers send spam emails to recipients, as: 𝑓𝑓𝜖𝜖𝐸𝐸𝐷𝐷 : 𝐴𝐴𝜖𝜖 × 𝐸𝐸𝜖𝜖 → 𝑅𝑅𝜖𝜖 , 𝑓𝑓𝜖𝜖𝐸𝐸𝐷𝐷 (𝑎𝑎𝜖𝜖𝜖𝜖 , 𝑒𝑒𝜖𝜖𝜖𝜖 ) = 𝑟𝑟𝜖𝜖𝜖𝜖 , where the attacker 𝑎𝑎𝜖𝜖𝜖𝜖 sends email 𝑒𝑒𝜖𝜖𝜖𝜖 to recipient 𝑟𝑟𝜖𝜖𝜖𝜖 . Let us define the click function 𝑓𝑓𝜖𝜖 𝐶𝐶that describes how recipients may interact with the spam emails, as: 𝑓𝑓𝜖𝜖 𝐶𝐶: 𝑅𝑅𝜖𝜖 × 𝐸𝐸𝜖𝜖 → 𝑀𝑀𝜖𝜖 , 𝑓𝑓𝜖𝜖 𝐶𝐶 (𝑟𝑟𝜖𝜖𝜖𝜖 , 𝑒𝑒𝜖𝜖𝜖𝜖 ) = 𝑚𝑚𝜖𝜖𝜖𝜖 , where the recipient 𝑟𝑟𝜖𝜖𝜖𝜖 clicks on a link or downloads malware 𝑚𝑚𝜖𝜖𝜖𝜖 from email 𝑒𝑒𝜖𝜖𝜖𝜖 . Let us define the infection function 𝑓𝑓𝜖𝜖 𝐼𝐼 that describes the process of a recipient's system being infected by malware, as: 𝑓𝑓𝜖𝜖 𝐼𝐼: 𝑀𝑀𝜖𝜖 × 𝑅𝑅𝜖𝜖 → 𝐶𝐶𝜖𝜖 ,, 𝑓𝑓𝜖𝜖 𝐼𝐼 (𝑚𝑚𝜖𝜖𝜖𝜖 , 𝑟𝑟𝜖𝜖𝜖𝜖 ) = 𝑐𝑐𝜖𝜖𝜖𝜖 , where the malware 𝑚𝑚𝜖𝜖𝜖𝜖 infects the recipient's system, resulting in consequence 𝑐𝑐𝜖𝜖𝜖𝜖 . Let us define the Tracking Function 𝑓𝑓𝜖𝜖 𝑇𝑇 that describes how attackers track the success of their spam campaign, as: 𝑓𝑓𝜖𝜖 𝑇𝑇: 𝐴𝐴𝜖𝜖 × 𝑅𝑅𝜖𝜖 → 𝑇𝑇𝜖𝜖 , 𝑓𝑓𝜖𝜖 𝑇𝑇 (𝑎𝑎𝜖𝜖𝜖𝜖 , 𝑟𝑟𝜖𝜖𝜖𝜖 ) = 𝑡𝑡𝜖𝜖𝜖𝜖 , where the attacker 𝑎𝑎𝜖𝜖𝜖𝜖 collects tracking data 𝑡𝑡𝜖𝜖𝜖𝜖 based on recipient’s interaction 𝑟𝑟𝜖𝜖𝜖𝜖 with spam. The overall impact of a spam mail attack can be quantified based on the number of recipients affected, the amount of malware delivered, and the potential damage caused. Thus, the impact function 𝑔𝑔𝜖𝜖 can be presented as: 𝑔𝑔𝜖𝜖 : 𝐴𝐴𝜖𝜖 × 𝑅𝑅𝜖𝜖 × 𝐸𝐸𝜖𝜖 × 𝑀𝑀𝜖𝜖 → ℝ, 𝑔𝑔𝜖𝜖 �𝑎𝑎𝜖𝜖𝜖𝜖 , 𝑟𝑟𝜖𝜖𝜖𝜖 , 𝑒𝑒𝜖𝜖𝜖𝜖 , 𝑚𝑚𝜖𝜖𝜖𝜖 � = 𝐼𝐼𝜀𝜀𝑆𝑆 , where 𝐼𝐼𝜀𝜀𝑆𝑆 represents the impact of the spam mail attack, considering factors such as the number of systems infected, the volume of personal data compromised, the cost associated with the attack, including system recovery and reputation damage. Thus, for a specific victim 𝑢𝑢𝜀𝜀𝜀𝜀 targeted by attacker 𝑎𝑎𝜀𝜀𝜀𝜀 : 𝑓𝑓𝜀𝜀𝐸𝐸𝐷𝐷 (𝑎𝑎𝜀𝜀𝜀𝜀 , 𝑒𝑒𝜀𝜀𝜀𝜀 ) = 𝑟𝑟𝜀𝜀𝜀𝜀 the attacker sends spam email 𝑒𝑒𝜀𝜀𝜀𝜀 to recipient 𝑟𝑟𝜀𝜀𝜀𝜀 . 𝑓𝑓𝐶𝐶 (𝑟𝑟𝜀𝜀𝜀𝜀 , 𝑒𝑒𝜀𝜀𝜀𝜀 ) = 𝑚𝑚𝜀𝜀𝜀𝜀 the recipient clicks on a link or downloads malware 𝑚𝑚𝜀𝜀𝑞𝑞 from email 𝑒𝑒𝜀𝜀𝜀𝜀 . 𝑓𝑓𝐼𝐼 (𝑚𝑚𝜀𝜀𝜀𝜀 , 𝑟𝑟𝜀𝜀𝜀𝜀 ) = 𝑐𝑐𝜀𝜀𝜀𝜀 : the malware 𝑚𝑚𝜀𝜀𝜀𝜀 infects the recipient's system, leading to consequence 𝑐𝑐𝜀𝜀𝜀𝜀 . 𝑓𝑓𝑇𝑇 (𝑎𝑎𝜀𝜀𝜀𝜀 , 𝑟𝑟𝜀𝜀𝜀𝜀 ) = 𝑡𝑡𝜀𝜀𝜀𝜀 the attacker collects tracking data 𝑡𝑡𝜀𝜀𝜀𝜀 based on recipient 𝑟𝑟𝜀𝜀𝜀𝜀 interaction with the spam. 4. Experiments 4.1. Title information To assess the effectiveness of the developed models the BotGRABBER framework was employed. It is a security tool designed to enhance network resilience against cyberattacks. The system leverages machine learning. One of the key aspects of BotGRABBER is its ability to perform self-adaptive security actions. Additionally, framework is designed to integrate seamlessly with various machine learning algorithms to refine its detection capabilities, ensuring efficient performance in complex network environments [8]. To conduct an experiment for detecting social engineering attacks (such as spam mail, spear phishing, and trojan mail attacks), the setup involves specific hardware, network configurations, and security settings. A controlled email server with logging enabled, such as Microsoft Exchange, to capture emails and analyze metadata, as well as the simulated mailboxes to receive test spam, phishing, and trojan emails were used. For detection automation, libraries such as scikit-learn or TensorFlow to process email features like sender, subject line, body text, and links for patterns indicative of social engineering were employed. An isolated network to avoid real-world impact if malware is executed during testing. Use virtual machines or a controlled subnet within a Virtual Private Cloud (VPC) was set up. Snort tool as IDS/IPS to monitor network traffic for anomalies that align with phishing and trojan attacks was used. For trojan attack detection, a sandbox environment (Cuckoo Sandbox) that safely opens and monitors email attachments to identify potentially malicious behavior without compromising real systems was incorporated. A labeled dataset with examples of spam, spear-phishing, and trojan mails was created. To do this an open-source datasets [26, 27, 28] for training and testing were used. Detailed logging on email servers and network devices to capture metadata (e.g., headers, sender IPs, attachment details) were enable. Traffic logs were stored in a centralized logging system for analysis. Features such as the frequency of certain keywords, unusual sender addresses, mismatched domain names, attachment types, and user interaction patterns were extracted. Machine learning models (random forest, decision tree, K-nearest neighbor, and XGBoost) on both legitimate and malicious email datasets to classify emails based on phishing indicators were trained [29]. Results for three types of attacks are presented in Figures 1-3. The experiment tested algorithms including random forest, decision tree, K-nearest neighbor, and XGBoost, analyzing host network data that can signal a potential social engineering attack. The empirical findings showed a high detection accuracy of about 99%, alongside a false positive rate near 6%. Thus, the implementation of the developed models for social engineering attacks detection has demonstrated high detection potential. spear phishing attack detection 1 0,9999 0,9998 Accuracy 0,9997 Precision 0,9996 Recall 0,9995 F1 score AUC 0,9994 0,9993 0,9992 RF DT kNN XGBoost Figure 1: Spear phishing attack detection results. Trojan mail detection 1 0,9998 Accuracy 0,9996 Precision Recall 0,9994 F1 score AUC 0,9992 0,999 0,9988 RF DT kNN XGBoost Figure 2: Trojan mail attack detection results. Spam mail attack detection 0,9999 0,9997 0,9995 Accuracy 0,9993 Precision Recall 0,9991 F1 score 0,9989 AUC 0,9987 0,9985 RF DT kNN XGBoost Figure 3: Spam mail attack detection results. 5. Conclusions The research developed specialized models for detecting social engineering attacks, focusing on spam mail, spear phishing, and trojan mail. Each model captures unique characteristics of these attacks through a series of machine learning-based detection processes. Developed models analyze features such as email metadata, user interaction patterns, attachment behaviors, and network anomalies to distinguish malicious activity from legitimate communication. The trojan mail model emphasizes the identification of embedded malware within email attachments, employing sandbox environments for controlled testing and analysis of attachment behaviors. The spear phishing model, in contrast, focuses on personalization tactics, using sender recognition and link analysis to detect contextually suspicious patterns. The spam mail model prioritizes content filtering and call-to-action tracking to differentiate legitimate communication from mass-distributed spam. The empirical findings validate the models' robustness, achieving approximately 99% accuracy in detection while maintaining a 6% false positive rate. This high detection performance illustrates the potential of our models to support a proactive defense framework against evolving social engineering threats. By leveraging specific feature sets and adaptive machine learning algorithms, these models can be effectively implemented in real-world scenarios to protect networks and systems from a wide range of social engineering attacks. Future work may explore hybrid models, advanced behavioral analytics, and real-time detection capabilities to further enhance resilience against increasingly sophisticated attacks. The future development of these models may explore the combining these individual models to create a more unified system capable of detecting multiple attack types simultaneously, improving the adaptability of the detection framework; integrating behavioral profiling to understand normal user behavior and identify deviations that may signal an attack. Declaration on Generative AI During the preparation of this work, the authors used Grammarly in order to: grammar and spelling check; DeepL Translate in order to: some phrases translation into English. After using these tools/services, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content. References [1] F. Huseynov, B. Ozdenizci Kose. Using machine learning algorithms to predict individuals’ tendency to be victim of social engineering attacks. Information Development, 2024, 40.2: 298- 318. [2] V. Kolluri. Revolutionary research on the ai sentry: an approach to overcome social engineering attacks using machine intelligence. International Journal of Creative Research Thoughts (IJCRT), ISSN, 2320-2882. [3] T. Rathod et al. A comprehensive survey on social engineering attacks, countermeasures, case study, and research challenges. Information Processing & Management, 2025, 62.1: 103928. [4] A. Sharma. Natural Language Processing for Cybersecurity: Detecting and Mitigating Social Engineering Attacks. International Meridian Journal, 2024, 6.6. [5] O. Pomorova, O. Savenko, S. Lysenko, A. Kryshchuk, A. Nicheporuk. A Technique for detection of bots which are using polymorphic code. Communications in Computer and Information Science, 2014, vol. 431. PP.265-276. [6] O.Savenko, S. Lysenko, A. Kryschuk. Multi-agent based approach of botnet detection in computer systems. In: Kwiecień, A., Gaj, P., Stera, P. (eds.) CN 2012. CCIS, vol. 291, pp. 171–180. Springer, Heidelberg (2012). Doi:10.1007/978-3-642-31217-5_19. [7] S. Lysenko, O. Pomorova, O. Savenko, A. Kryshchuk, K. Bobrovnikova. DNS-based Anti-evasion Technique for Botnets Detection. Proceedings of the 8-th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, Warsaw (Poland), September 24–26, 2015. Warsaw, 2015. Pp. 453–458. [8] O. Pomorova, O. Savenko, S. Lysenko, A. Kryshchuk, K. Bobrovnikova Anti-evasion Technique for the Botnets Detection Based on the Passive DNS Monitoring and Active DNS Probing. Communications in Computer and Information Science. (2016) 608. 83-95. [9] A. Naz, M.Sarwar, M. Kaleem, M. A. Mushtaq, S. Rashid, (2024). A comprehensive survey on social engineering-based attacks on social networks. [10] S. Gupta et al. A Comprehensive Analysis of Social Engineering Attacks: From Phishing to Prevention-Tools, Techniques and Strategies. In: 2024 Second International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI). IEEE, 2024. p. 1-8. [11] N. Akyeşilmen, A. Alhosban. Non-Technical Cyber-Attacks and International Cybersecurity: The Case of Social Engineering. Gaziantep University Journal of Social Sciences, (2024) 23.1 342- 360. [12] Cheng-Ying Yang, Chun-Yi Shih, Chou-Chen Yang, Min-Shiang Hwang, Freeze-Phish: An ANN Based Phishing Detection System, in International Journal of Network Security, 2023/09 893- 898. doi:10.6633/IJNS.202309_25(5).19. [13] A. Odeh, I. Keshta, E. Abdelfattah, PhiBoost- A novel phishing detection model Using Adaptive Boosting approach, Jordanian Journal of Computers and Information Technology (2021). doi: 10.5455/jjcit.71-1600061738. [14] A.K.S. Sekar, S.S. Kumar, S. Sampath, U.T. Kumar, V. Vignesh, Phishing website clone detection using machine learning rules with cryptography technique, International Journal of Gender, Science and Technology, 13 (2024). [15] I. Tosin, C. Kiekintveld, Aritran Piplai, Deep Learning-Based Speech and Vision Synthesis to Improve Phishing Attack Detection through a Multi-layer Adaptive Framework, Computer Science (2024). doi: 10.48550/arXiv.2402.17249. [16] Abdulla Al-Subaiey, Mohammed Al-Thani, Naser Abdullah Alam, Kaniz Fatema Antora, Amith Khandakar, Novel interpretable and robust web-based AI platform for phishing email detection, in Computers and Electrical Engineering, 2024, Volume 120, Part A, doi:10.1016/j.compeleceng.2024.109625. [17] N. W. Peace, A framework for securing email entrances and mitigating phishing impersonation attacks, Computer Science (2023). doi: 10.5121/ijnsa.2023.15602. [18] V. Kolluri. Revolutionary research on the ai sentry: an approach to overcome social engineering attacks using machine intelligence. International Journal of Creative Research Thoughts (IJCRT), ISSN, 2320-2882. [19] R.J. van Geest , Cascavilla G., Hulstijn J., Zannone N., “The applicability of a hybrid framework for automated phishing detection”, Computers & Security, 2024, doi:1016/j.cose.2024.103736. [20] S. R. Janani, et al. Detection of Phishing Page Using Machine Learning and Response HTML. In: International Conference on Communications and Cyber Physical Engineering 2018. Singapore: Springer Nature Singapore, 2024. p. 499-508. [21] C.M. Lai, M.H. Chen, E. Kristiani, V.K. Verma, C.T. Yang, Fake news classification based on content level features, Applied Sciences, (2022) 12(3) 1116. [22] A. Maen, Al-S. Jamil, Cyber-Phishing Website Detection Using Fuzzy Rule Interpolation in Cryptography (2022) 6(2) 4. doi:10.3390/cryptography6020024. [23] V. Bharath, , et al. Introduction to Social Engineering: The Human Element of Hacking. In: Social Engineering in Cybersecurity. CRC Press, 2024. p. 1-25. [24] S.Vaishnavi, T. Sethukarasi. Detection and Avoidance of Clone Attack in IoT Based Smart Health Application, in Intelligent Automation & Soft Computing, 2022, 31(3):1919-1937 [25] K. Takashi, F. Naoki, N. Hiroki, C. Daiki, Detecting Phishing Sites Using ChatGPT in Computer Science, 2023. v1 , doi: 10.48550/arXiv.2306.05816 [26] M. Lansley, F. Mouton, S. Kapetanakis, N. Polatidis, SEADer++: social engineering attack detection in online environments using machine learning, J. Inf. Telecommun., 4(3) (2020) 346– 362. doi: 10.1080/24751839.2020.1747001. [27] A. A. Akinyelu, A. O. Adewumi, Classification of phishing email using rand om forest machine learning technique, J. Appl. Math., vol. 2014, 2014, doi: 10.1155/2014/425731. [28] A.El Aassal, L. Moraes, S. Baki, A. Das, R. Verma, Anti -Phishing Pilot, in ACM IWSPA 2018 Evaluating Performance with New Metrics for Unbalanced Datasets, pp. 21–24, 2018. [29] O. Revniuk, A. Postoliuk, Research on the application of adaptive risk assessment methods for web applications, Computer Systems and Information Technologies, 2024 (3), 34–43. https://doi.org/10.31891/csit-2024-3-5.