User-agent as a Cyber Intrusion Artifact: Detection of APT Activity using minimal Anomalies on the User-agent String Traffic Badr-Eddine Bouhlal1,* , Tim Sonnekalb1 , Bernd Gruner1 and Clemens-Alexander Brust1 1 German Aerospace Center (DLR), Institute of Data Science Jena, Germany Abstract The detection of attacks, especially persistent intrusions, relies on a combination of various artifacts. Despite being manipulable, the user-agent string, a component of HTTP headers, has proven to be a tool for triggering alerts, thereby enhancing detection capabilities. In this paper, we perform a review and analysis of existing malicious user agent strings. We gather relevant data from different sources of threat intelligence and present a dataset of user-agent strings associated with malicious activities gathered from real incident reports. We also propose a categorization of existing user-agent string anomalies with respect to their type (e.g., syntax) and their complexity degree. Keywords User-agent string (UAS), Advanced persistent threat (APT), Intrusion detection, Machine learning 1. Introduction Intrusion detection is based on a multi-factor identification method that relies on multiple indicators. One of these indicators can be the HTTP User-Agent String (UAS) component. The initial function of the UAS is to allow the server to identify the request sender and disclose technical information such as the operating system and browser version. However, the UAS presents a considerable vulnerability in the web world because it is easily manipulated, making it a tool used to carry out code injection attacks. Nevertheless, in the context of a private and highly secure corporate network, the UAS can be used in addition to identifying the request sender as one of the intrusion detection factors. The Cybersecurity and Infrastructure Agency (CISA) released a report in 2022 [1] containing recommendations that aim at helping organizations mitigating against advanced persistent threats (APT). The report recommend checking for anomalies that may be observed in UAS. These anomalies may directly affect the syntax of the UAS or may be related to other anomalies * Corresponding author. $ badr-eddine.bouhlal@dlr.de (B. Bouhlal); Tim.Sonnekalb@dlr.de (T. Sonnekalb); Bernd.Gruner@dlr.de (B. Gruner); clemens-alexander.brust@dlr.de (C. Brust)  0009-0002-2860-3068 (B. Bouhlal); 0000-0002-0067-1790 (T. Sonnekalb); 0000-0002-4177-2993 (B. Gruner); 0000-0001-5419-1998 (C. Brust) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) S. Böhm and D. Lübke (Eds.): 16th ZEUS Workshop, ZEUS 2024, Ulm, Germany, 29 February–1 March 2024, published at http://ceur-ws.org CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings Bouhlal et al.: Detection of APT Activity using minimal Anomalies on the User-agent String Traffic that can be detected by using multiple identification factors, such as multiple authentication attempts with different UASs from the same IP addresses [1]. An anomaly may appear in several forms, and a malformed UAS can have multiple explana- tions: indicate the use of a vulnerability scanning tool (T1595 - Active Scanning) [2, 3], or be the sign of data exfiltration in the form of a legitimate string. It can also serve as a communication channel between malware and a command and control server (command and control attack). Communication via the application layer protocol is associated with several techniques used by APT groups attempting to exploit vulnerabilities in the HTTP/HTTPS protocol [4]. Challenges. The UAS does not follow a general format. While RFC 7231 [5] defines a general format for the presentation of the UAS many benign applications do not adhere to it [6]. This variability in UAS representation makes it difficult to conceive universal classification rules that remain effective over time. This situation presents significant challenges in differentiating legitimate UAS from malicious ones, as both representations initially consist of sequences of characters, numbers, and special characters. Another issue that arises when considering automation is determining the impact of irregular and sometimes anomalous user-agent that may not necessarily represent a threat which may increase the number of false alarms. To the best of our knowledge, no dataset containing user-agent strings associated with malicious activity and presenting a detection artifact has been published. In addition, all existing studies concerning the detection of malicious traffic from user agents are mainly based on analysis of the network traffic, due to the absence of a malicious user-agent set. We summarize the key contributions of this paper as follows:(1) We introduce a dataset that consists of a collection of 1063 malicious UASs, which is publicly available under a CC-BY 4.0 License [7], (2) we perform a review and analysis of existing malicious UASs, and (3) we propose a categorization of the different UAS anomalies. 2. Related Work In this section, we will present a series of studies that have focused on the user-agent string. Almost all of these studies aim to develop methods for distinguishing regular user-agent strings from malicious ones. Their approaches focus on developing parsing methods to make this dis- tinction. However, we have noticed that they use data that cannot be directly linked to malicious traffic (a public database for this purpose does not exist, as far as we know). Furthermore, they focus mainly on syntax errors and do not consider cases of rarity. This rarity can hide minimal anomalies that are difficult to distinguish using a syntax parser, such as fake information, e.g., the pattern of the user-agent is correct, but the version of the browser used is fake. Zhang et al. [6] examined the UASs in malicious traffic, specifically malware. The authors found that one out of every eight instances of malware traffic contained suspicious user-agent in at least one of their HTTP requests. Currently, user-agent are still being analyzed manually. However, the analysis showed that there are multiple patterns that could be used to automatically classify user-agent anomalies. They also propose an automated technique for extracting user-agent anomalies and creating signatures for malware detection. Kheir [8] applied a rule-based methodology using regular expressions for distinguishing abnormal and normal user-agent based on the fact that user-agent have a general and fixed 64 Bouhlal et al.: Detection of APT Activity using minimal Anomalies on the User-agent String Traffic structure that enables their validation using regular expressions. The data used during this research was collected over two months from real traffic and contained 150 billion user-agent. Zhang et al. [6] stated that the causes of anomalies could be due to two factors, namely a malfunction during the encoding decoding of the user-agent or a malicious activity. Zhang et al. [9] used a method that combines several steps to classify user-agent: firstly, it uses a parser based on a context-free grammar to classify them based on their representation, a standard UAS, a non-standard representation for non-standard UAS, and finally, for unrecognized representations. Then, the authors propose using an anomaly detection algorithm to separate benign UAS from malicious ones. Their study compared the Context-Free Grammar (CFG) with the User Agent parser based on regular expressions. They showed that the CFG is better suited for analyzing user-agent traffic due to its ease of adaptation and simplicity of comprehension. Nandakumar et al. [10] presented a novel method of parsing the user-agent strings based on Multi-Headed Attention mechanism using transformer, the method is divided into two-step, first parse the UAS to gather the information related to the device and software, then correlate the extracted information with known related Common Vulnerabilities and Exposures (CVE). 3. Illustrative Incidents: Malicious User-Agent Case Studies Cyber-attacks, especially those carried out for espionage purposes, are designed to persist without being detected in the network. Performed by well-trained teams (APTs) using complex methods and sometimes over several well-planned stages, these attacks are not necessarily easy to analyze and sometimes difficult to determine their consequences at first glance. However, every potential indicator of malicious activity on the network must be carefully analysed and considered. These indicators, also known as network artifacts, can vary from an IP address, an URI pattern or an UAS that has not previously been observed in a defined network environment, or one that appears to be out of the ordinary [11]. In this section, we provide a concise overview of some incident reports from cyber-attack campaigns, specifically focusing on cases where the UAS field deviates from the norm. This divergence serves as a crucial factor in uncovering potential threats within the network. We perform this review of real incident to be able to analyse the type of anomalies that can be considered as artifacts of detection within the UAS (cf. Subsection 3.2 ), and also to perform a categorization of them (cf. Subsection 4.2 ). 3.1. Real-life Incidents Targeted Phishing Exploits Impacting Japanese and Taiwanese Organizations [12, 13]. APT groups have launched a mail fishing campaign with malicious word attachments targeting governmental organizations, finance, media, and high-tech sectors in Japan and Taiwan. The attack exploits a Microsoft Office EPS vulnerability CVE-2015-1701 [12], the exploit payload releases a binary, that includes an embedded sample of the IRONHALO malware. IRONHALO uses the HTTP protocol to fetch the payload from a command-and-control (C&C) server with hard-coded settings and a specific Uniform Resource Locator (URL) path [13]. This malware variant sends an HTTP request to a legitimate Japanese site with a malformed UAS with a syntax error: missing space between the different components of the UAS as shown in Figure 1a. 65 Bouhlal et al.: Detection of APT Activity using minimal Anomalies on the User-agent String Traffic Registry Keys: GET /syougyou/images/index.php HTTP/1.1 HKLM\System\CurrentControlSet\Services\bmwappushservice Accept: */* URLs: User-Agent: Mozilla/4.0(compatible;MSIE 8.0;Windows NT 6.1) https//is-cdn.edge.g18.dyn.usr-e12-as.akamaitechnology[.Jcom/deploy/assets/css/main/style.min.css http://a17-h16.911.iad17.as.pht-external.c15.qoldenlines]. Jnet/deploy/assets/css/main/style.min.css Connection: Keep-Alive HTTP artifacts: Host: www.[.]com “User-Agent : XXXXXXXXXXXXXXXXX/5.0 (Windows NT 6.1 WOW64; Trident/7.0; AS; rv:11.0) like Gecko" Cache-Control: no-cache "Proxy-Authorization : Basic [Data]" ~ [Data] Will contain the TDTESS encrypted data to send (a) IRONHALO HTTP GET request [13] (b) CoppyKittens TDTESS indicators of compromise [14] Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0 ‘Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36 ‘Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.1 Safari/605.1.15 Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7162; Pro’ Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7166; Pro) Microsoft Office/14.0 (Windows NT 6.1; Microsoft Outlook 14.0.7143; Pro)’ Microsoft Office/15.0 (Windows NT 6.1; Microsoft Outlook 15.0.4605; Pro) (c) Fancy Bear (APT28) user-agent strings [15] Figure 1: Anomalies remarked on the UASs based on real incident related to APT Traffic CopyKittens [14]. A cyber espionage group that mainly targets strategic organizations such as governmental organizations (defense companies, research institutions, Ministry of defense, and large IT companies), using self-developed tools that are not necessarily publicly reported. The methods of attack are complex and varied. The report [14] describes the intrusion methodology and also a set of used malware’s, for example, the TDTESS which is a 64-bit . NET binary backdoor that communicates regularly with the command and control server, using basic authentication to receive new instructions. The incident analysis report presents various Indicators of compromise, among which is a malformed UAS [14], as shown in Figure 1b. Russian GRU Conducting Global Brute Force Campaign to Compromise Enterprise and Cloud Environments [15]. Between mid-2019 and 2021, the Russian General Staff Main Intelligence Directorate (GRU) 85th Main Special Service Center (GTsSS), also known as Fancy Bear or APT28, used the Kubernetes cluster to launch large-scale anonymous brute force access attacks against government and private sector organizations, exploiting the CVE 2020-0688 and CVE 2020-17144 vulnerabilities in Microsoft Exchange, they used several protocols including HTTP. The campaign report publicly released by the NSA encompasses a detailed description of the techniques and tactics used as well as mitigation and detection methods including IP address lists and UASs, which "are crafted to appear consistent with those sent by legitimate client software. Some of the UASs delivered in the authentication requests are incomplete or truncated versions of legitimate UASs, offering the following unique detection opportunities [15]" (cf. Figure 1c) . Bumblebee loader [16, 17]. A malware, employed in various campaigns by numerous threat actors, uses the Windows Management Instrumentation (WMI) framework to extract system details. Then, it establishes a connection with the C&C server at intervals of 25 seconds to receive commands to be execued. Logpoint [16] proposes two detection techniques within the proxy log files. Malware detection in the proxy log might be done either using the user agent and/or the URI. Hence, any existence of a user agent that matches the string bumblebee is 66 Bouhlal et al.: Detection of APT Activity using minimal Anomalies on the User-agent String Traffic a strong sign of the persistence of this malware. Other versions of the malware uses different evasion techniques and might change the UAS. An undefined UAS with the same number of digits should also be suspicious and might refer to the persistence of bumblebee [16, 17]. LYCEUM middle east campaign [18]. An APT group that targets organizations in strategic sectors, including oil and gas. Campaigns were reported in South Africa in 2018 and in the Middle East in 2019. The group primarily uses password-sparing or brute-force attacks to gain access to an organization to obtain credentials. Then, using compromised accounts, they send spearphishing emails containing malicious attached Excel files containing the Danbot malware. Danbot malware is a first-stage access trojan (RAT) that uses DNS and the HTTP protocol for communication. The Danbot HTTP request contains two anomalies: An ampersand (&) after operating system values in the UAS (Mozilla/5.0 (Windows NT 10.0; &) Gecko/20100101 Firefox/64.0), and a misspelling of ’Encoding’ in the accept- encoding header [18]. Quasar: Open-Source Remote Administration Tool [19]. Is a RAT open source used for Windows Operating systems, hosted publicly on GitHub, specially dedicated for being used for legitimate purposes. In addition, various APT threat groups are using Quasar to conduct cyber espionage campaigns. Quasar enables remote control, keylogging, file transfer and enables the user to collect information about the host system. During the client connection’s setup, the client tries to determine its geolocation, including its Wide Area Network (WAN) IP address. This is achieved by sending an HTTP GET request to the Uniform Resource Locator (URL) ip-api[.]com/json/ with the following User-Agent string: Mozilla/5.0 (Windows NT 6.3; rv:48.0) Gecko/20100101 Firefox/48.0. "This User-Agent string mimics a Mozilla Firefox 48 browser running on Windows 8.1. This User-Agent string would likely stand out as unique in a corporate network environment, and its presence could be a high-confidence indication of Quasar activity" as stated by the Cybersecurity and Agency [19]. 3.2. Analysis of the Incidents By analyzing the various real incidents presented in the previous section, we note that the irregularities in the HTTP traffic (in our case, UASs), represents especially syntax errors or rarity of occurrence, represent a serious sign of a potential threat. Indeed, UAS patterns vary significantly and do not necessarily follow a general structure that could be generalized to all device types (software, hardware). However, some syntax anomalies could represent a "red flag" and should be carefully examined. These syntax errors vary from grave errors, where the user-agent does not match any pattern of a correct UAS, to non-defined strings, as in the case of Bumblebee, or the TeamTNt group, where they use the UAS field to execute an injection code operation curl –referer $REFERER –user-agent TNTcurl $CURLPARA $GETFROM -o $PUTITTO [20]. Another example of using the UAS as a tool of code injection is the recent exploit of the Log4j vulnerability by the APT35 [21]. In addition we mention also syntax errors such as misspellings, absences, or even the existence of strings that should not exist in the correct UAS (as in the cases of CopyKittens and LYCEUM). Another category of anomalies concerns not the syntax of the UAS, where it may be completely accurate, but the existence of a rare UAS, that can be suspicious especially in corporate networks. The detection of such anomalies depends on the analysis of the overall traffic, whereas a simple 67 Bouhlal et al.: Detection of APT Activity using minimal Anomalies on the User-agent String Traffic analysis of the syntax would not allow the detection, as in the case of Quasar malware and Fancy Bear (APT28). A rule-based detection method based on syntax would not be useful for the detection of fake UASs. The latter may take the form of a syntactically correct UAS. However, with a fake browser version, for example, the Metamorfo [22] malware uses a UAS with Mozilla/3. 0, the existence of such a version 3.0 must be suspicious. 4. Dataset Construction In this section we present the construction process of our dataset of UASs associated with malicious traffic, focusing on attacks related to APTs. An essential part of building this dataset is categorizing anomalies in UASs. 4.1. Data Sources and Data Collection For creating the dataset, we relied on different sources. Firstly, security reports on real incidents (cf. Subsection 3.1) which are collected by reputable organizations such as MITRE ATT@CK [23] and the Cybersecurity and Infrastructure Security Agency (CISA) [24]. These reports permit valuable insights into the latest cyber threats, offering detailed information on tactics, techniques, and procedures (TTPs) employed by malicious actors. The second source of our data consisted of contributions published by security professionals and experts in security blogs and community forums, for example, Microsoft [25] and Cisco Talos [26]. Subsequently, the UASs identified as artifacts of malicious activity were obtained from the Sigma open-source rules. These entries are regularly identified and logged as part of blacklists 1 . In addition, the dataset includes data gathered from the open-source project Apache Bad Bot Blocker 2 . 4.2. User-Agent String Anomaly Categorization Our categorization encompasses a wide range of anomalies observed in UASs, based on analyzing anomalies related to real-life incidents, we define the following two main categories. 4.2.1. The type of anomalies The anomaly type, can be used as a detection hint and for this we define three different cases: Syntax errors. These are the syntax anomalies that can be distinctive in a user-agent, for example, the missing space in the case of IRONHALO or the existence of an invalid string part, as is the case with CopyKittens [14] (XXXXXXXXXXXXXXXX/5.0 instead of Mozilla/5.0). In this case, the UAS has a correct pattern (format) but contains syntax errors. Unknown string. In this category, we will classify all UASs that do not contain any pattern or part of a pattern of a correct UAS. These UASs are a collection of random strings (characters, digit or special characters) and may contain the name of the malware itself or any other string like the bumblebee for example. 1 Sigma - Generic Signature Format for SIEM Systems:https://github.com/SigmaHQ/sigma# sigma---generic-signature-format-for-siem-systems 2 Apache ultimate bad bot blocker https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker/tree/master 68 Bouhlal et al.: Detection of APT Activity using minimal Anomalies on the User-agent String Traffic Other. This category contains UASs without syntax anomaly, but are associated with a malware, APTs, or other malicious activities. Intrusion detection from these user agents may be due to their rarity in the traffic. They may stand out as anomalous in network traffic because of their rarity, or simply because their signatures should not exist in the specific network traffic. 4.2.2. The anomaly complexity degree The second classification criterion distinguishes between high anomalies and low anomalies. This criterion is designed to evaluate the expected ability of a machine learning model to detect anomalous UASs at two levels of difficulty: the detection of a user-agent that differs totally syntactically from a normal pattern, and between the detection of a simple syntax anomaly. The two categories are defined as follows: Low anomaly. Low anomaly formats represent user agent strings with minor or subtle deviations from normal (standard) formats. These may be a small syntax error, missing compo- nents, or the presence of unknown elements in an otherwise normal structure, as in the case of IRONHALO, where the anomaly is a missing space. In this category, we also include the UASs previously classified as Other. High anomaly. Formats or patterns involving irregular, unusual, or entirely new structures in user agent strings that deviate significantly from regular formats. These anomalies can include random unknown strings, special characters, or unique identifiers with no correspondence with known, legitimate user agents, like the example of the bumblebee, where the string bumblebee do not represent a legitimate (or a part of) UAS. Another example is: sample (unknown version) CFNetwork/596.5 Darwin/12.5.0 (x86_64) (iMac8%2C1) [27], which includes many syntax errors and deviates systematically from a normal UAS. This UAS exhibits several anomalous characteristics, including elements like sample (unknown version). Additionally, the presence of the identifier (iMac8%2C1) does not correspond to known UAS or a part of a legitimate UAS and the percentage sign is not typically part of a standard UAS. 5. Dataset Description and Usage Description. During data collection, we systematically inspect malicious UASs and collect additional information: The type of anomaly, the degree of complexity, the APT associated with the malicious UAS, if the information is available, the sources (e.g., citations of the resources from where the UAS was gathered), and the string pattern that distinguishes between two cases: if the abnormal UAS matches the entire string (match_string) or if it involves a regular expression (regx). The Regular expression describes a string part that might be included in a syntactically correct UAS but might be a sign of malicious activities. The data is provided in a CSV format, each row represents a UAS with corresponding information. Table 1 depicts an example of the dataset structure and Table 2 shows the distribution of the malicious UASs per category. Usage. The dataset contains two types of UAS entries that must be treated differently: match_string, which represents malicious UASs that can be used directly, and regx, which presents only string parts which must be applied to correct UASs to generate malicious ones, these regular expressions define the potential location of these parts within a correct UAS. In addition to the set of malicious UASs (abnormal), we provide a set of 1000 of the most frequently 69 Bouhlal et al.: Detection of APT Activity using minimal Anomalies on the User-agent String Traffic UAS Anomaly Type Anomaly Complexity Related Apt Ressources String Pattern UAS-1 syntax error low APT34 URL of document/online resources match_string Table 1 Description of the dataset structure Type of anomaly Complexity degree Unknown string Syntax errors Other High anomaly 738 77 - Low anomaly - 34 214 Table 2 Distribution of the collected malicious UASs per category seen user agents during November 2023, parsed by the Whatismybrowser API parser 3 . These 1000 entries will be considered as a reference for normal user agents, containing no syntax anomalies. This allows using the dataset to train/test machine learning models on the detection of malicious UASs. We performed a similarity check between the two sets (normal and abnormal), and we noticed that there are eight common entries. An example entry is the malicious UAS representing the pattern used by the Google’s bot crawler Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html), normally legitimately used for indexing purposes. The explanation for this finding is the fact that some malwares use complicated evasion techniques and they might use the most common and widely used UAS patterns. Moreover, some malwares mimics the UAS of the system on which it is installed to blend into the network traffic and avoid to be detected, as in the case of the FatDuke malware used by APT29, also known as Cozy Bear [28]. Such UASs fall under the category Other, where the UAS does not exhibit any syntax errors. However, the detection artifacts might be due to their rarity of existence within a traffic of legitimate UASs. Another example is the Quasar malware that uses a legitimate UAS that mimics a Mozilla Firefox 48 browser running on Windows 8.1, which could be suspicious, especially in corporate networks [19]. 6. Conclusion and Future Work This study highlights the significant role of the UAS as an indicator of attack detection, and especially persistent intrusions. We present a dataset of 1063 UAS associated with malicious activities, the majority of which were collected manually from real incident reports. These UASs exhibit anomalies in syntax or other aspects, providing a basis for detecting persistent activities within network traffic. As future work, we plan to further extend the dataset and use it as training and test set to evaluate different machine learning models on their capabilities to automatically detect anomalies related to UASs. 3 Whatismybrowser https://www.whatismybrowser.com/ 70 Bouhlal et al.: Detection of APT Activity using minimal Anomalies on the User-agent String Traffic Acknowledgments We extend our sincere gratitude to Whatismybrowser, Raape, Ulrich and Möbius, Max for their invaluable assistance in collecting UASs for this study. References [1] Cybersecurity and Infrastructure Security Agency (CISA), Cybersecurity advisory: Im- packet and exfiltration tool used to steal sensitive information from defense industrial base organization, 2022. URL: https://www.cisa.gov/news-events/cybersecurity-advisories/ aa22-277a, accessed: April 15, 2023. [2] Center for Threat-Informed Defense, Microsoft azure security control mappings to mitre att&ck®, 2023. URL: https://center-for-threat-informed-defense.github.io/ security-stack-mappings/Azure/README.html, accessed: April 20, 2023. [3] M. Attck, Active scanning, 2022. URL: https://attack.mitre.org/techniques/T1595/, accessed: April 20, 2023. [4] M. Attck, Application layer protocol: Web protocols, 2020. URL: https://attack.mitre.org/ versions/v7/techniques/T1071/001/, accessed: April 21, 2023. [5] R. T. Fielding, J. F. Reschke, Hypertext transfer protocol (http/1.1): Semantics and content, 2014. URL: https://www.rfc-editor.org/rfc/rfc7231, accessed: April 11, 2023. [6] Y. Zhang, H. Mekky, Z.-L. Zhang, R. Torres, S. ju Lee, A. Tongaonkar, M. Mellia, Detecting malicious activities with user-agent-based profiles, International Journal of Network Management 25 (2015) 306 – 319. URL: https://api.semanticscholar.org/CorpusID:13959125. [7] B.-E. Bouhlal, Dataset of malicious user-agent strings, 2024. URL: https://doi.org/10.5281/ zenodo.10700806. doi:10.5281/zenodo.10700806. [8] N. Kheir, Behavioral classification and detection of malware through http user agent anomalies, Journal of Information Security and Applications 18 (2013) 2–13. URL: https: //www.sciencedirect.com/science/article/pii/S2214212613000331. doi:https://doi.org/ 10.1016/j.jisa.2013.07.006, sETOP’2012 and FPS’2012 Special Issue. [9] Y. Zhang, H. Mekky, Z.-L. Zhang, R. Torres, S.-J. Lee, A. Tongaonkar, M. Mellia, Detecting malicious activities with user-agent-based profiles, International Journal of Network Management 25 (2015) 306–319. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/nem. 1900. doi:https://doi.org/10.1002/nem.1900. [10] D. Nandakumar, S. Murli, A. Khosla, K. Choi, A. Rahman, D. Walsh, S. Riede, E. Dull, E. Bowen, A novel approach to user agent string parsing for vulnerability analysis using mutli-headed attention, 2023. arXiv:2306.03733. [11] CSNP, Tryhackme - pyramid of pain room, Dec 5, 2022. URL: https://www.csnp.org/post/ tryhackme-pyramid-of-pain-room, accessed: on: [02.01.2024]. [12] R. W. Genwei Jiang, Dan Caselden, The eps awakens, 2015. URL: https: //web.archive.org/web/20170613151054/https://www.fireeye.com/blog/threat-research/ 2015/12/the_eps_awakens.html, accessed: [15.12.2023]. [13] F. T. I. Ryann Winters, The eps awakens - part 2, 2015. URL: https://web.archive. 71 Bouhlal et al.: Detection of APT Activity using minimal Anomalies on the User-agent String Traffic org/web/20151226205946/https://www.fireeye.com/blog/threat-research/2015/12/ the-eps-awakens-part-two.html, accessed: [15.12.2023]. [14] C. C. Security, Operation wilted tulip, 2017. URL: https://www.clearskysec.com/ wp-content/uploads/2017/07/Operation_Wilted_Tulip.pdf, accessed: [15.12.2023]. [15] Cybersecurity, I. S. Agency, Russian gru conducting global brute force cam- paign compromise enterprise and cloud environments, 2021. URL: https: //media.defense.gov/2021/Jul/01/2002753896/-1/-1/1/CSA_GRU_GLOBAL_BRUTE_ FORCE_CAMPAIGN_UOO158036-21.PDF, accessed on [18.12.2023]. [16] Logpoint, Buzz of the bumblebee – a new malicious loader, https://www.logpoint.com/wp- content/uploads/2022/05/buzz-of-the-bumblebee-a-new-malicious-loader-threat-report- no-3.pdf, 2022. Accessed on [18.12.2023]. [17] K. Merriman, P. Trouerbach, This isn’t optimus prime’s bumblebee but it’s still transforming, April 28, 2022. URL: https://www.proofpoint.com/us/blog/threat-insight/ bumblebee-is-still-transforming, accessed on [18.12.2023]. [18] Secureworks, Lyceum takes center stage in middle east campaign, 2019. URL: https://www. secureworks.com/blog/lyceum-takes-center-stage-in-middle-east-campaign, accessed: on: [02.01.2024]. [19] Cybersecurity, I. S. Agency, Quasar open-source remote administration tool, 2019. URL: https://www.cisa.gov/news-events/analysis-reports/ar18-352a, accessed on [18.12.2023]. [20] David Fiser and Alfredo Oliveira, Tracking the Activities of TeamTNT: A Closer Look at Cloud-Focused Malicious Actor Group, . URL: https://documents.trendmicro.com/assets/ white_papers/wp-tracking-the-activities-of-teamTNT.pdf, Accessed on: [18.12.2023]. [21] C. point, Apt35 exploits log4j vulnerability to distribute new modular power- shell toolkit, https://research.checkpoint.com/2022/apt35-exploits-log4j-vulnerability-to- distribute-new-modular-powershell-toolkit/, 2022. Accessed on: [18.12.2023]. [22] Xiaopeng Zhang, Another Metamorfo Variant Targeting Customers of Financial In- stitutions, https://www.fortinet.com/blog/threat-research/another-metamorfo-variant- targeting-customers-of-financial-institutions, 2020. Accessed on: [18.12.2023]. [23] M. Corporation, MITRE ATTCK, Accessed: on: [02.01.2024]. URL: https://attack.mitre.org/. [24] Cybersecurity and Infrastructure Security Agency (CISA), CISA, Accessed: on: [02.01.2024]. URL: https://www.cisa.gov/. [25] Microsoft Corporation, Microsoft Security Blog, Accessed February 2024. URL: https: //www.microsoft.com/en-us/security/blog/. [26] Cisco Talos, Cisco Talos Blog, Accessed February 2024. URL: https://blog.talosintelligence. com/. [27] Unit 42 Palo Alto Networks, Unit 42: Xagentosx – sofacy’s xagent macos tool, 2017. URL: https://unit42.paloaltonetworks.com/unit42-xagentosx-sofacys-xagent-macos-tool/, accessed: on: [19.02.2024]. [28] T. D. ESET: Matthieu Faou, Mathieu Tartare, Operation ghost the dukes aren’t back they never left, 2019. URL: https://web-assets.esetstatic.com/wls/2019/10/ESET_Operation_ Ghost_Dukes.pdf, accessed: on: [05.01.2024]. 72