Anomaly Detection of Command Shell Sessions based on DistilBERT: Unsupervised and Supervised Approaches

Zefang Liu (zefang.liu@jpmchase.com), JPMorgan Chase, 3223 Hanover St, Palo Alto, CA 94304, USA

John F. Buford, JPMorgan Chase, 3223 Hanover St, Palo Alto, CA 94304, USA

ISSN 1613-0073

Keywords: anomaly detection, keystroke data, command line, Unix shell, DistilBERT

Anomaly detection in command shell sessions is a critical aspect of computer security. Recent advances in deep learning and natural language processing, particularly transformer-based models, have shown great promise for addressing complex security challenges. In this paper, we implement a comprehensive approach to detect anomalies in Unix shell sessions using a pretrained DistilBERT model, leveraging both unsupervised and supervised learning techniques to identify anomalous activity while minimizing data labeling. The unsupervised method captures the underlying structure and syntax of Unix shell commands, enabling the detection of sessions that deviate from normal behavior. Experiments on a large-scale enterprise dataset collected from production systems demonstrate the effectiveness of our approach in detecting anomalous behavior in Unix shell sessions. This work highlights the potential of leveraging recent advances in transformers to address important computer security challenges.

Introduction

The complexity of modern computer systems and networks has led to an increasing demand for efficient and reliable security solutions. Interactive command shells, especially Unix shells, which provide a powerful interface for system administration, development, and maintenance tasks, are an essential aspect of many computing environments. However, they can also be exploited by attackers to gain unauthorized access, escalate privileges, avoid defense detection, collect sensitive data, and manipulate systems. As a result, anomaly detection in command shells has become a crucial component of computer security.

Previous studies have utilized various techniques for anomaly detection in command shell sessions, ranging from simple rule-based methods to more complex machine learning algorithms. However, most of these approaches rely heavily on predefined features or labeled data from security experts for training supervised models. Assembling a large, well-labeled dataset can be time-consuming and labor-intensive, often resulting in a limited scope of detection capabilities due to the inherent biases in the labeling process.

Recent advances in deep learning and natural language processing (NLP) have enabled new opportunities for addressing complex security challenges. In particular, transformer-based models, such as BERT (Bidirectional Encoder Representations from Transformers) [1] and GPT (Generative Pretrained Transformer) [2], have achieved state-of-the-art performance across various NLP tasks. These models have the potential to enhance computer security by enabling more effective and adaptable anomaly detection systems that can learn from large-scale, diverse data sources.

In enterprise production environments, access to command shells is treated as a privileged activity because of the potential for misuse of system commands. Commands with the potential for misuse are well known. Specific commands may be a priori disabled. Attack techniques have been compiled, for example, in the MITRE ATT&CK ® framework. Enterprises can implement rule-based detection using these resources. Consequently, the benefit of the anomaly detection model is to automatically identify command patterns that are outliers with respect to the overall set of sessions that would not be detected by the rule-based approach. Due to the volume, length, and complexity of shell sessions, manual detection of outliers is impractical. An automatic process is needed to assign anomaly scores to sessions, where sessions with high anomaly scores can be prioritized for further investigation. In this paper, we apply a transformer-based model for anomaly detection in Unix shell sessions with a pretrained DistilBERT model. Our method employs both unsupervised and supervised learning techniques, aiming to deliver a robust and flexible solution for identifying anomalous activity while reducing the burden on manual labels from experts.

DistilBERT [3], a lighter and more efficient version of BERT [1], has demonstrated strong performance across a wide range of NLP tasks. By pretraining a DistilBERT model on a large dataset of Unix shell sessions, we capture the underlying structure and syntax of Unix shell commands and allow the model to identify deviations of shell sessions from normal activity. The unsupervised method uses an ensemble model to calculate anomaly scores, detecting potential security threats without requiring labeled data. We further experimented with applying the unsupervised model to specific command subshells, such as HDFS, SQL, Spark, and Python, which are notable for having their own command syntaxes. To further enhance the precision of our anomaly detection system, we implement a supervised approach by fine-tuning the pretrained DistilBERT model on a small set of Unix shell sessions labeled with suspicious keywords, which allows the model to learn from session labels and distinguish normal from anomalous activity more effectively. The overall pipeline for both the unsupervised and supervised methods is shown in Figure 1.

The main contributions of this paper are as follows:

1. We apply a comprehensive anomaly detection framework for Unix shell sessions based on the pretrained DistilBERT model and ensemble anomaly detectors, addressing an important problem in computer security.

2. We conduct experiments and demonstrate the effectiveness of the unsupervised approach, which uses an ensemble method to compute anomaly scores for a large-scale enterprise dataset, enabling the identification of suspicious activities without extensive manual labeling.

3. We evaluate the performance of supervised fine-tuned models on a few-shot set of labeled sessions, highlighting the adaptability and accuracy of our supervised approach.

The remainder of this paper is organized as follows: Section 2 reviews related work in command shell anomaly detection; Section 3 presents the data, including the dataset description, differences from previous datasets, data quality issues, and data cleaning procedures; Section 4 details our methodology, including the unsupervised and supervised approaches; Section 5 presents the experimental results and examples of suspicious activities; and Section 6 concludes the paper and outlines possible future work.

Related Work

In this section, we discuss the existing literature related to detecting anomalies in Unix shell commands. We first review research in log anomaly detection and then masquerade detection. We also highlight the gaps in previous research that our proposed approach aims to address.

Masquerade Detection

Masquerade detection [12] is a specific type of anomaly detection that focuses on identifying unauthorized users who have gained access to legitimate users' accounts or privileges and are attempting to impersonate them. The goal is to detect differences in user behavior between sessions that may indicate the presence of an attacker. In the context of Unix shell sessions, masquerade detection aims to distinguish between the normal activities of the genuine user and the suspicious actions of the masquerader. Early approaches to masquerade detection relied on traditional machine learning techniques, such as Naive Bayes [13,14,15], Support Vector Machines (SVMs) [15,16], and Hidden Markov Models (HMMs) [17]. Deep learning techniques [18,19], including Convolutional Neural Networks [20], Temporal Convolutional Networks [21], and LSTM [20], have also been applied to masquerade detection, leading to improved detection accuracy.

However, these masquerade detection methods are not well suited for detecting suspicious activities in Unix shell sessions. The goal of masquerade detection is to find impersonators, whereas command shell anomaly detection searches for suspicious or exploitable command patterns. In addition, the supervised methods used in previous research can only detect anomalous sessions based on predefined rules and features from experts, which limits their flexibility and adaptability and makes it challenging to identify new or unknown threats in command shell sessions.

Data

In this section, we describe the data used for our study, including the data description and data preprocessing. Important steps for extracting and cleaning commands from the raw keystroke data are highlighted. We also discuss the characteristics of the data that make it different from previous Unix shell datasets.

Data Description

Previous datasets for Unix shell commands include the SEA dataset [22], the Greenberg dataset [23], the PU dataset [24], and NL2Bash [25]. The SEA dataset, introduced by Schonlau et al. [22], is a widely recognized benchmark consisting of Unix commands from 50 users, with potential masquerade attacks seeded. The Greenberg dataset, collected by Greenberg et al. [23], contains Unix commands from 168 different users of the Unix C shell and has been used to study user behavior and evaluate masquerade detection models. The PU dataset, developed by Lane et al. [24], contains 9 sets of sanitized user data collected from the Purdue University command histories of 8 users over 2 years. The NL2Bash dataset, collected by Lin et al. [25], contains around 10,000 pairs of English sentences and bash commands. These datasets have contributed significantly to the development and evaluation of various Unix shell anomaly detection techniques, especially in the masquerade detection area. While each dataset offers unique insights, they also have limitations: they are outdated, contain only truncated commands without command options and subshells, lack diversity of command usage, or do not provide sufficient data for certain types of real exploits or attacks. Consequently, our study leverages a large-scale, unlabeled dataset of Unix shell commands from real operating system users to explore novel anomaly detection approaches and address the limitations of previous datasets.

The raw data used in this research include 90 days of Unix keystroke sessions from over 15,000 users, comprising about 3 million activity objects. Among these activities, around 2.4 million objects are non-empty interactive sessions. However, the raw data have several characteristics that complicate analysis: shell prompts, command inputs, and command outputs are mixed together; shell prompts vary across sessions and within a session; long command lines are truncated at varying line lengths; command aliases vary across sessions; background process outputs are interleaved with prompts and inputs; and backspaces and tab keys are missing. In order to prepare this dataset for anomaly detection in the next step, we developed heuristics to extract and clean commands from the raw data.

Data Preprocessing

The anomaly detector for shell commands needs clean command sessions to avoid introducing much noise into the model. However, the raw keystroke log dataset is a mixture of commands inputted by users and also responses outputted from systems. In order to increase the anomaly detection accuracies and also decrease the computing time, we extract user command inputs from the raw data and clean these commands. A heuristic algorithm is developed for this data preprocessing function, which is introduced briefly as follows.

In order to extract commands from the raw data, we need to find the shell prompts first. One conventional way is to use regular expressions. However, in practice, different sessions can have different shell prompts, and even within one session, the shell prompt can vary with the current working directory or subshell. Handcrafting regular expressions for each session is tedious and does not adapt to new prompts. To overcome these drawbacks, we create a list of 140 common Unix commands and a list of prompt terminal symbols ($, #, >). More terminal symbols were tested, but the probability of mismatching increased. For each input line, the first occurring prompt terminal symbol is located, and the following word is tested against the common command set. If this word is a known common command, the prompt is saved; otherwise it is skipped. To avoid mismatching prompts, several rules are applied to fix corner cases, such as removing time prefixes, checking for balanced brackets in each prompt, and excluding environment variables.

After extracting session prompts, we then extract commands from the raw data, where we search for known prompts from this session and then extract the command line after the prompt. Additional steps are applied for handling several special cases, such as removing text editor buffers and concatenating wrapped multiple-line commands. Some meta data are also collected for down-stream use, including numbers of output lines and error messages. After extracting commands and dropping duplicates, we obtain 1.15 million sessions.
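The two-pass heuristic described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the paper: the command set is a hypothetical, abbreviated stand-in for the list of 140 common commands, the function names are invented for the example, and the corner-case rules (time prefixes, balanced brackets, environment variables, wrapped lines) are omitted.

```python
# Hypothetical, abbreviated stand-in for the paper's list of 140 common commands.
COMMON_COMMANDS = {"ls", "cd", "cat", "grep", "ps", "ssh", "sudo", "vi", "exit"}
# Prompt terminal symbols named in the text.
PROMPT_TERMINALS = "$#>"


def find_prompt(line):
    """Return the prompt prefix of `line`, or None if no prompt is recognized."""
    for index, char in enumerate(line):
        if char in PROMPT_TERMINALS:
            words = line[index + 1:].split()
            # Accept the prompt only if the next word is a known common command.
            if words and words[0] in COMMON_COMMANDS:
                return line[:index + 1]
            return None  # only the first terminal symbol is tested
    return None


def extract_commands(lines):
    """Pass 1 collects the session's prompts; pass 2 strips them from lines."""
    prompts = set()
    for line in lines:
        prompt = find_prompt(line)
        if prompt is not None:
            prompts.add(prompt)
    commands = []
    for line in lines:
        for prompt in prompts:
            if line.startswith(prompt):
                commands.append(line[len(prompt):].strip())
                break
    return commands


session = [
    "[alice@host ~]$ ls -la",
    "total 48",
    "[alice@host ~]$ ./deploy.sh --all",
    "echo done > out.txt",
]
commands = extract_commands(session)
```

In this toy session the prompt is learned from the `ls` line, which lets the second pass recover `./deploy.sh --all` even though that command is not in the common command set, while the output line and the redirection line are ignored.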

The last step is the command cleaning process. The main goal of this step is to reduce the data noise, so the anomaly detection model can give more precise results. We apply several filters for cleaning the extracted shell commands, including removing command lines with error messages, dropping command editing buffers and shell completions, deleting long consecutive spaces and over-repeated characters, filtering command names with regular expressions, masking numbers and special words, and cleaning cyclic commands usually generated by loops from shell scripts.
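A few of these filters can be illustrated with regular expressions. The sketch below is an assumption-laden example rather than the authors' implementation: the function name, the repeat limit, and the choice of `#` as the number mask are all hypothetical, and the remaining filters (error messages, editing buffers, cyclic commands) are not covered.

```python
import re


def clean_command(command, max_repeat=3):
    """Apply three illustrative cleaning filters to one command line."""
    # Collapse long runs of whitespace into a single space.
    command = re.sub(r"\s+", " ", command).strip()
    # Shorten any character repeated more than `max_repeat` times in a row.
    command = re.sub(
        r"(.)\1{%d,}" % max_repeat,
        lambda match: match.group(1) * max_repeat,
        command,
    )
    # Mask digit runs with '#' (assumed placeholder) to shrink the vocabulary.
    command = re.sub(r"\d+", "#", command)
    return command


cleaned = [
    clean_command("ls    -la   /tmp"),
    clean_command("sleep 300"),
    clean_command("echo ========="),
]
```

Masking numbers in this way trades some detail for a smaller token vocabulary, which is one plausible reason for such a filter before language model pretraining.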

The cleaned command shell sessions are then used in the next stage for both the unsupervised and supervised approaches.

Methodology

In this section, we outline the methodology of our proposed anomaly detection approach for Unix shell sessions. Our approach employs both unsupervised and supervised learning techniques. We describe the unsupervised ensemble anomaly detector based on the pretrained DistilBERT model, followed by the supervised fine-tuning of the DistilBERT model using a small amount of labeled data.

Unsupervised Approach

The unsupervised approach of our research involves pretraining a DistilBERT [3] model from Hugging Face [26] on Unix shell commands and constructing an ensemble anomaly detector based on the session embeddings from the pretrained DistilBERT. This method was first proposed by CrowdStrike [27,28] for command lines from various platforms. The unsupervised model discovers new anomaly patterns for manual review.

Since Unix shell commands differ from human languages, we pretrain a language model from scratch on Unix shell commands instead of using an existing pretrained model. BERT [1] and its lighter-weight variant DistilBERT [3] are state-of-the-art encoder-based transformer models that have shown remarkable performance in various natural language processing tasks, especially in understanding context and capturing complex language patterns. DistilBERT [3] is selected in this research for its balance of performance and efficiency. WordPiece [1], the default sub-word tokenizer for DistilBERT, is trained with a dictionary size of 30,000 for tokenizing the Unix sessions; several other dictionary sizes were also tried. The tokens are then fed into the DistilBERT model, which is pretrained on the masked language modeling (MLM) task to capture the inherent structure and dependencies within command sequences. The cased DistilBERT model is selected since the Unix shell is case-sensitive. This unsupervised pretraining allows the model to learn general representations of command sequences without relying on labeled data. Once the DistilBERT model has been pretrained, the last hidden states are used as the embeddings of the Unix shell sessions. At the end of the pretraining process, we have one contextual embedding for each command session, which represents the higher-level features of the command sequences.
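To make the MLM objective concrete, the following sketch corrupts a tokenized command the way BERT-style pretraining does: a fraction of tokens is selected, and each selected token is replaced by a mask token, replaced by a random token, or left unchanged. The 15% selection rate and the 80/10/10 split are the conventions of the original BERT recipe and are assumptions here, since the paper does not state its masking settings; the toy vocabulary and function name are likewise hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"
# Hypothetical toy vocabulary; the paper trains a 30,000-entry WordPiece one.
TOY_VOCAB = ["ls", "cd", "grep", "cat", "-la", "/tmp", "|"]


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (inputs, labels); labels are None where no prediction is made."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)  # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK_TOKEN)             # 80%: mask token
            elif roll < 0.9:
                inputs.append(rng.choice(TOY_VOCAB))  # 10%: random token
            else:
                inputs.append(token)                  # 10%: unchanged
        else:
            inputs.append(token)
            labels.append(None)
    return inputs, labels


tokens = ["ps", "aux", "|", "grep", "python", "|", "xargs", "kill"]
inputs, labels = mask_tokens(tokens)
```

Because the model must predict a hidden token from both its left and right context, it is pushed to learn which options, arguments, and pipeline stages typically accompany each command.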

To detect anomalous Unix sessions without fine-tuning a classification layer, we follow CrowdStrike's framework [27,28] and apply four outlier detectors from PyOD [29]: principal component analysis (PCA) [30,31], isolation forest (IF) [32,33], copula-based outlier detection (COPOD) [34], and autoencoders (AE) [30]. These four outlier detection models are trained on the session embeddings, and the decision scores of each detector are normalized. For each session, the four normalized decision scores are averaged to obtain the final anomaly score of that session. The anomaly score represents how far a command session deviates from the overall collection of sessions. Sessions with high anomaly scores are considered outliers, which may contain unusual command syntax or patterns.
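The score combination step can be sketched as below. The example assumes that "normalized" means z-score standardization per detector, which is one common choice and not confirmed by the paper; the detector outputs are made-up numbers standing in for PyOD decision scores, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Z-score one detector's decision scores (assumed normalization)."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(score - mu) / sigma for score in scores]


def ensemble_scores(detector_scores):
    """Average the standardized scores of all detectors for each session."""
    normalized = [standardize(scores) for scores in detector_scores.values()]
    return [mean(per_session) for per_session in zip(*normalized)]


# Made-up decision scores for four sessions; the last one is an outlier.
raw = {
    "pca": [1.0, 2.0, 3.0, 10.0],
    "iforest": [0.1, 0.2, 0.3, 0.9],
    "copod": [5.0, 6.0, 7.0, 20.0],
    "autoencoder": [0.5, 0.4, 0.6, 3.0],
}
scores = ensemble_scores(raw)
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Standardizing before averaging keeps a detector with a wide score range from dominating the ensemble, and `ranked` gives the order in which sessions would be handed to reviewers.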

Supervised Approach

The supervised part of our approach involves fine-tuning the pretrained DistilBERT model with labeled data to improve its performance in distinguishing between normal and suspicious command sequences as a binary classifier. We fine-tune the pretrained DistilBERT with SetFit (Sentence Transformer Fine-tuning) [35], which is an efficient and prompt-free framework for few-shot fine-tuning of sentence transformers. In SetFit, the transformer can be fine-tuned on a small number of text pairs in a contrastive Siamese manner with high accuracy. The results of the model fine-tuned by SetFit are compared with the original fine-tuned DistilBERT and a trained logistic regressor with fixed session embedding.

In order to fine-tune the pretrained model, examples of labeled sessions are required. Instead of labeling sessions manually, we create a table of suspicious keywords, developed from Uptycs's work [36], to cover MITRE ATT&CK ® techniques [37,38] commonly used by attackers. These suspicious keywords are presented in Table 1 with their corresponding technique IDs and names. Each Unix shell session is searched for the suspicious keywords, and sessions in which the number of unique suspicious keywords exceeds a threshold are considered anomalies. The setting of the labeled dataset is discussed further in the experimental results. Besides the suspicious keywords, we also created regular expressions to tag sessions with more ATT&CK techniques [37,38]. These tags are used for session annotation and analysis.
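The keyword-based labeling rule can be sketched as follows, using the thresholds reported in the experimental results (at least three unique keywords for an abnormal session, zero for a normal one). The keyword set is a small subset of Table 1, and matching on whole tokens is an assumption of this example; the paper does not specify how keywords are matched.

```python
import re

# Small subset of the suspicious keywords in Table 1, for illustration only.
SUSPICIOUS_KEYWORDS = {
    "wget", "curl", "chmod", "kill", "whoami", "tcpdump", "crontab", "sudo",
}


def count_unique_keywords(session_commands):
    """Count distinct suspicious keywords appearing as whole tokens."""
    found = set()
    for command in session_commands:
        tokens = re.findall(r"[A-Za-z0-9_.]+", command)
        found.update(SUSPICIOUS_KEYWORDS.intersection(tokens))
    return len(found)


def label_session(session_commands, abnormal_min=3):
    """Label a session as 'abnormal', 'normal', or 'abstained'."""
    count = count_unique_keywords(session_commands)
    if count >= abnormal_min:
        return "abnormal"
    if count == 0:
        return "normal"
    return "abstained"


labels = [
    label_session(["ls -la", "cd /tmp"]),
    label_session(["curl -O file.tar.gz"]),
    label_session(["wget http://host/a.sh", "chmod +x a.sh", "sudo ./a.sh"]),
]
```

Counting unique keywords rather than total occurrences means a session that repeats one common command many times is not flagged, whereas a session combining several distinct techniques is.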

Upon completing the supervised fine-tuning phase, we evaluate the performance of our anomaly detection approach using the testing data. We assess the model's effectiveness in detecting normal and suspicious command sequences by calculating various performance metrics, including precision, recall, and F1 score. The evaluations are discussed in the next section.
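For reference, the three metrics can be computed from predicted and true labels as in the sketch below, where the abnormal class is treated as the positive class. The function name and the toy labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive="abnormal"):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


y_true = ["abnormal", "abnormal", "normal", "normal", "normal"]
y_pred = ["abnormal", "normal", "abnormal", "normal", "normal"]
metrics = precision_recall_f1(y_true, y_pred)
```

When abnormal sessions are rare in the test set, even a small false positive rate on the normal class can produce many false positives relative to true positives, so precision can stay low while recall is high.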

Experimental Results

In this section, we present the experimental results for both unsupervised and supervised anomaly detection methods applied to Unix shell commands. We first evaluate the unsupervised model with the pretrained DistilBERT embedding and the ensemble anomaly detector on the unlabeled data and then evaluate performance of the supervised model with labeled sessions.

Unsupervised Approach Results

In order to evaluate the unsupervised model and understand its performance, several analyses are done, including visualizing distributions of anomaly scores and embedding vectors, investigating relations between the anomaly scores and numbers of tokens and command lines, and also comparing anomaly scores of the common shell commands.

The distribution of anomaly scores is shown in Figure 2a. Since the anomaly scores have already been standardized, the mean and standard deviation of the distribution are 0 and 1, respectively. The distribution is close to a normal distribution: most sessions lie around the mean, while some outliers have much higher anomaly scores than the majority. The anomaly scores from the four anomaly detectors for the top 100 anomalies are shown in Figure 2b, where COPOD usually gives the highest anomaly scores, while IF tends to give lower scores with a higher variance. For most of these sessions, the four anomaly detectors behave consistently and all assign high anomaly scores.

To further understand the behavior of the unsupervised model, the anomaly scores are plotted against the number of tokens and the number of command lines in Figure 3a and Figure 3b. In general, a session with more tokens and more command lines tends to have a higher anomaly score. Shorter sessions usually contain only simple syntax for straightforward and repetitive daily usage, while longer sessions can contain long command sequences that perform complicated and uncommon tasks; the unsupervised model scores these higher because of their unusual command structure and syntax.

To conclude the analysis of the unsupervised model, we show the anomaly scores for the top 50 common commands in Figure 4. Each command's score is a weighted average of the anomaly scores of the sessions in which the command appears. Most common commands, such as "ls", "exit", "bash", and so on, have lower anomaly scores, while "alias" and "l" have higher anomaly scores. In most cases, there is no clear explanation for the relation between command names and their anomaly scores, since these scores are averaged over sessions and can be affected by the session structures. In general, however, infrequent commands have higher anomaly scores.
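One way to compute such a per-command score is sketched below. The weighting is an assumption: each session's anomaly score is weighted by how many times the command occurs in that session, which the paper does not specify. The function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def command_scores(sessions):
    """Occurrence-weighted mean session anomaly score per command name.

    `sessions` is an iterable of (command_names, anomaly_score) pairs.
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(int)
    for command_names, score in sessions:
        for name, count in Counter(command_names).items():
            weighted_sum[name] += count * score  # assumed weighting scheme
            total_weight[name] += count
    return {name: weighted_sum[name] / total_weight[name] for name in weighted_sum}


toy_sessions = [
    (["ls", "ls", "cd"], 1.0),
    (["ls"], -2.0),
    (["alias", "cd"], 3.0),
]
per_command = command_scores(toy_sessions)
```

Because a command inherits the scores of entire sessions, a harmless command that happens to co-occur with unusual ones can receive a high value, which is consistent with the caveat above about session structure.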

In summary, a high anomaly score does not always mean that a session contains suspicious activity. However, anomaly scores can be used to prioritize command sessions for expert analysis and to help monitoring experts discover new suspicious patterns. The unclear relations and uncertainties in the unsupervised model results motivate us to build and evaluate supervised models, which are discussed next. The relations between anomaly scores and suspicious activities, as well as the language structure of shell commands, can be investigated further in future research.

In addition to the Unix shell, similar analyses are also done for subshell commands. During the command cleaning, we removed subshells which have different prompts than the Unix shells, such as HDFS, Spark, SQL, and Python. Those subshells are extracted separately, where an unsupervised anomaly detector in the same structure is applied to each subshell. The anomaly scores are assigned to subshell sessions, where specific exploits are also scanned through them. Analyzing the experiment results from subshell anomaly detection is beyond the scope of this paper.

Supervised Approach Results

To evaluate the supervised models, we label the command sessions by the number of suspicious keywords as described in the methodology. If a session has at least three unique suspicious keywords, it is considered an abnormal session. If a session has zero suspicious keywords, it is labeled as a normal session. All other sessions are labeled as abstained and are removed from model evaluation, since there is no strong criterion for assigning them to either class. The labeled dataset is split into training and testing sets at a ratio of 90:10, and the number of sessions in each class is shown in Table 2. During experiments, we draw the same number of normal and abnormal sessions from the training data and combine them into a few-shot training set. Since the evaluation results from a small training set are unstable, we run 5 experiments for each model and each number of samples per class. For models fine-tuned with SetFit [35], we use a batch size of 16, a learning rate of 1e-5, 20 iterations (number of text pairs), and train each model for 1 epoch. For fine-tuned DistilBERT models, we use a learning rate of 1e-5 and train each model for 5 epochs. The averaged precision, recall, and F1 scores are reported in Figure 5 and Table 3. The SetFit model fine-tuned with 2048 samples per class gives the best F1 score, higher than that of the fine-tuned DistilBERT with the same training data size. The fixed DistilBERT embedding with logistic regression gives the lowest F1 score. This observation shows the advantage of SetFit for fine-tuning pretrained models when labeled data are limited. Model performance also increases as the number of samples per class increases. The experimental results of the supervised models show the feasibility of creating a small set of manually labeled command sessions, fine-tuning a pretrained model with SetFit, and then using it to classify more sessions automatically.
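The construction of a balanced few-shot training set can be sketched as follows. The function name, the use of a per-run random seed, and the toy data are assumptions for illustration; the actual fine-tuning with SetFit or DistilBERT is not shown.

```python
import random


def few_shot_sample(labeled_sessions, n_per_class, seed):
    """Draw `n_per_class` sessions from each class and shuffle them."""
    rng = random.Random(seed)
    by_class = {"normal": [], "abnormal": []}
    for text, label in labeled_sessions:
        if label in by_class:  # abstained sessions are ignored
            by_class[label].append(text)
    sample = []
    for label, texts in by_class.items():
        for text in rng.sample(texts, n_per_class):
            sample.append((text, label))
    rng.shuffle(sample)
    return sample


# Toy training pool: 10 normal, 5 abnormal, and 3 abstained sessions.
pool = (
    [("normal session %d" % i, "normal") for i in range(10)]
    + [("abnormal session %d" % i, "abnormal") for i in range(5)]
    + [("unclear session %d" % i, "abstained") for i in range(3)]
)
# One balanced training set per repeated experiment (5 runs in the paper).
runs = [few_shot_sample(pool, n_per_class=4, seed=run) for run in range(5)]
```

Balancing the classes in the training set counteracts the strong imbalance between normal and abnormal sessions in Table 2, while the testing set keeps its original class ratio.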

Session Annotations and Examples

Besides experiments and evaluations of unsupervised and supervised models, we also annotated sessions with MITRE ATT&CK ® techniques in addition to previously mentioned suspicious keywords and anomaly scores. These annotations can help cybersecurity experts recognize and analyze suspicious activity.

During the annotation process, Unix shell sessions are labeled by searching for 58 MITRE ATT&CK ® techniques with corresponding regular expressions. For each technique, we search for specific command usages and file accesses. The distribution of techniques is shown in Figure 6, and the distribution of tactics is shown in Figure 7. The most common techniques are T1057 Process Discovery, T1082 System Information Discovery, and T1105 Ingress Tool Transfer, although sessions with less common techniques are more interesting to analyze for anomaly detection. Three session examples with high anomaly scores are selected and presented in Tables 4-6, where ATT&CK techniques are highlighted in blue and suspicious keywords in red. The first example, in Table 4, shows remote command execution of a transient web server with potential for data exfiltration. The second example, in Table 5, shows potential data exfiltration and credential exposure subject to discovery via process discovery. The last example, in Table 6, illustrates a disk clear and boot loader configuration changes.
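Regular-expression tagging of this kind can be sketched as below. The technique IDs and names are taken from the paper, but the patterns are hypothetical simplifications written for this example; the paper's 58 expressions are not published here.

```python
import re

# Hypothetical patterns for four of the techniques named in the paper.
TECHNIQUE_PATTERNS = {
    "T1057: Process Discovery": re.compile(r"\bps\b"),
    "T1082: System Information Discovery": re.compile(
        r"\b(uname|hostname|lscpu)\b|/etc/redhat-release"
    ),
    "T1105: Ingress Tool Transfer": re.compile(r"\b(wget|curl|scp)\b"),
    "T1489: Service Stop": re.compile(r"\b(kill|pkill)\b"),
}


def tag_command(command):
    """Return the techniques whose pattern matches the command line."""
    return [
        technique
        for technique, pattern in TECHNIQUE_PATTERNS.items()
        if pattern.search(command)
    ]


tags = tag_command(
    "ps aux | grep '[S]impleHTTPServer #' | awk '{print $#}' |xargs kill -9"
)
```

For the process-termination line of the first example session, these illustrative patterns yield the same two tags as the annotation shown in that example, Process Discovery and Service Stop.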

Table 5: An example of potential data exfiltration and credential exposure subject to discovery via process discovery.

Conclusions

Anomaly detection for interactive command shells is a complex problem. Detection of anomalies is needed as a cybersecurity safeguard because privileged access at the shell level provides the opportunity for a range of attacks that threaten critical enterprise infrastructure, data, and services. On the other hand, prevention of such threats by locking system access prevents important operations activities like upgrades, change management, and outage investigation and remediation.

Prior research has been limited by available datasets. We presented the first published results on keystroke anomaly detection using an enterprise-scale dataset captured from production systems over a 90-day period. The extent of the dataset, 1.15 million sessions captured from over 15,000 users, demonstrates the need for automated anomaly detection. The dataset came with important data extraction and cleaning issues but provides a rich cross-section of enterprise operations activities. Notably, the monitored infrastructure in the dataset excludes network appliances and specialized embedded systems and is otherwise representative of widely used information technology.

Past research has also been limited by available models. We presented the first experimental results of using a machine-learning transformer model, specifically DistilBERT, for keystroke log anomaly detection of Unix shells, in both unsupervised and supervised approaches. Although the dataset is unlabeled, we tagged each session using two existing schemes: the MITRE ATT&CK ® techniques and suspicious keywords. Unix shell sessions with high anomaly scores were then cross-checked against the tags as part of validating the utility of the anomaly model for operations use. Model output was also compared with rule-based log analysis scripts used by operations teams. The results of the cross-check show that the outliers found by the model contain significant cases not found by either the tagging or the existing analysis scripts. Future research could design tokenizers specific to shell commands, examine the implicit relations between anomaly scores and suspicious activities, and analyze subshell command anomalies.

Figure 1: Pipeline of the command shell session anomaly detection with both unsupervised and supervised methods.

Figure 2: Distributions of averaged anomaly scores and all anomaly scores from four anomaly detection models.

Figure 4: Averaged anomaly score for common command names.

Figure 5: F1 scores of three supervised models with different training sizes.

Table 4 (session listing)
Activity id = *1e1BD9. Anomaly score = 1.8919. Suspicious keywords = [kill: 3, wget: 21]
1 <lines removed>
2 salt "WH" cmd.run "python -m SimpleHTTPServer # --directory /sqldata/ms_backups/" bg=true
3 salt "WH" cmd.run "ps aux | grep '[S]impleHTTPServer #' | awk '{print $#}' |xargs kill -9 "
4 -> [T1057: Process Discovery, T1489: Service Stop]
5 salt "WH" cmd.run "cd /sqldata/dbmigration;wget http://<host:port>//sqldata/ms_backups/WH_test_db_FU
Details: Line 2: launch transient web server on remote host. Line 3: terminate the server. Line 4 and 6: ATT&CK tags inserted by processing pipeline. Line 5: transfer data from web server using wget.

Table 5 (session listing)
Activity id = *1c01C8. Anomaly score = 1.9754. Suspicious keywords = [curl: 12]
1 <lines removed>
2 curl -T server_support.tar.gz -u<username>:<plaintext_credentials> <externalhost> /dropzone/uploads

Table 6 (session listing)
Activity id = *b41A0E. Anomaly score = 3.1271. Suspicious keywords = [chmod: 2, df: 1, wget: 1]
1 <lines removed>
2 ansible all -i <INVENTORY> -m shell -a "uptime;grep Start /etc/INSTALL_CLASS;cat /etc/redhat-release" -o
3 -> [T1082: System Information Discovery]
4 ansible all -i <INVENTORY> -m shell -a "cd /root;chmod HFF diskwipe.sh;./diskwipe.sh" -b
5 -> [T1222.002: File and Directory Permissions Modification -Linux and Mac File and Directory Permissions Mod]
6 ansible all -i <INVENTORY> -m shell -a "/sbin/service ambari-agent restart" -become -b
7 <lines removed>
8 ansible all -i <INVENTORY> -m shell -a "cd /boot/grub#;cp -p grub.cfg grub.cfg.bkp" -b
9 ansible all -i <INVENTORY> -m shell -a "/sbin/grubby --args=transparent_hugepage=never --update-kernel=ALL " -b
10 <lines removed>
Details: Lines 1, 7, 10 omitted for brevity. Line 3 and 5 are automatic annotations added by pipeline. Line 2: remote command to check system details. Line 4: remote command to clear disk prior to install. Line 6: restart Hadoop monitoring agent. Line 8, 9: modify boot loader.

Table 1: Suspicious keywords and MITRE ATT&CK ® techniques.

ATT&CK Technique ID | ATT&CK Technique Name | Suspicious Keywords
T1018 | Remote System Discovery | arp, ping, ip, hosts
T1033 | System Owner/User Discovery | whoami, who, w, users, USER
T1049 | System Network Connections Discovery | netstat, lsof, who, w
T1016 | System Network Configuration Discovery | arp, ipconfig, ifconfig, nbtstat, netstat, route, ping, ip
T1082 | System Information Discovery | df, uname, hostname, env, lspci, lscpu, lsmod, dmidecode, systeminfo
T1087 | Account Discovery | id, groups, lastlog, ldapsearch
T1069 | Permission Groups Discovery | groups, id, ldapsearch
T1040 | Network Sniffing | tcpdump, tshark
T1574.006 | Hijack Execution Flow: Dynamic Linker Hijacking | ld.so.preload, LD_PRELOAD
T1547.006 | Boot or Logon Autostart Execution: Kernel Modules and Extensions | modprobe, insmod, lsmod, rmmod, modinfo
T1136 | Create Account | useradd, adduser
T1053.003 | Scheduled Task/Job: Cron | crontab, cron
T1489 | Service Stop | kill, pkill
T1562.001 | Impair Defenses: Disable or Modify Tools | systemctl
T1105 | Ingress Tool Transfer | curl, scp, sftp, tftp, rsync, finger, wget
T1222.002 | File and Directory Permissions Modification: Linux and Mac File and Directory Permissions Modification | chown, chmod, chgrp, chattr
T1003.008 | OS Credential Dumping: /etc/passwd and /etc/shadow | passwd, shadow
T1070.003 | Indicator Removal: Clear Command History | .bash_history, HISTFILE, HISTFILESIZE
T1548.003 | Abuse Elevation Control Mechanism: Sudo and Sudo Caching | sudo, sudoers
T1546.004 | Event Triggered Execution: Unix Shell Configuration Modification | profile, profile.d, .profile, .bash_profile, .bash_login, .bashrc, .bash_logout

Table 2: Number of sessions in the normal, abnormal, and abstained classes.

| Class | Number of Unique Suspicious Keywords | Number of Samples | Training Set (90%) | Testing Set (10%) |
|---|---|---|---|---|
| Normal | = 0 | 790,363 | 711,327 | 79,036 |
| Abnormal | >= 3 | 28,413 | 25,571 | 2,842 |
| Abstained (no label) | In between | 335,322 | - | - |
| Total | - | 1,154,098 | 736,898 | 81,878 |
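The class thresholds in Table 2 can be written as a small labeling rule. This is a sketch under stated assumptions: the function name `weak_label`, the use of `None` for abstained sessions, and the dictionary input format are illustrative choices, not details taken from this work.

```python
from typing import Mapping, Optional

NORMAL_MAX = 0    # "= 0" row of Table 2
ABNORMAL_MIN = 3  # ">= 3" row of Table 2


def weak_label(keyword_counts: Mapping[str, int]) -> Optional[str]:
    """Label a session from its per-keyword occurrence counts.

    Only the number of unique keywords matters, not how often each occurs.
    """
    unique = sum(1 for count in keyword_counts.values() if count > 0)
    if unique <= NORMAL_MAX:
        return "normal"
    if unique >= ABNORMAL_MIN:
        return "abnormal"
    return None  # abstained: one or two unique keywords, no label assigned


# Keyword counts as reported for the disk-wipe session example.
print(weak_label({"chmod": 2, "df": 1, "wget": 1}))  # abnormal
```

Under this rule, a session that repeats one keyword many times is still abstained, because the threshold is on unique keywords.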

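Table 3: Evaluation results of three supervised models with different training sizes. LR = Logistic Regression, DB = Fine-tuned DistilBERT, SF = Fine-tuned DistilBERT with SetFit.

| Number of Samples per Class | LR Precision | LR Recall | LR F1 Score | DB Precision | DB Recall | DB F1 Score | SF Precision | SF Recall | SF F1 Score |
|---|---|---|---|---|---|---|---|---|---|
| 16 | 0.1464 | 0.7860 | 0.2454 | 0.1632 | 0.5578 | 0.2513 | 0.1569 | 0.8287 | 0.2622 |
| 32 | 0.1625 | 0.8248 | 0.2711 | 0.1995 | 0.6977 | 0.3036 | 0.2059 | 0.8930 | 0.3331 |
| 64 | 0.1713 | 0.8754 | 0.2862 | 0.1625 | 0.8418 | 0.2716 | 0.2712 | 0.9484 | 0.4210 |
| 128 | 0.1922 | 0.8849 | 0.3155 | 0.1703 | 0.9098 | 0.2864 | 0.3909 | 0.9758 | 0.5563 |
| 256 | 0.2070 | 0.8890 | 0.3356 | 0.3230 | 0.9663 | 0.4840 | 0.4819 | 0.9850 | 0.6459 |
| 512 | 0.2308 | 0.9027 | 0.3676 | 0.4900 | 0.9774 | 0.6524 | 0.5845 | 0.9866 | 0.7337 |
| 1024 | 0.2631 | 0.9188 | 0.4090 | 0.6483 | 0.9854 | 0.7819 | 0.7134 | 0.9900 | 0.8290 |
| 2048 | 0.2944 | 0.9267 | 0.4467 | 0.7534 | 0.9899 | 0.8555 | 0.7934 | 0.9894 | 0.8802 |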
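As a reading aid for Table 3, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from two tabulated precision and recall pairs; the helper name `f1_score` is illustrative. The recomputed values land close to, but not exactly on, the tabulated F1 scores, and the reason for the small gap is not stated in this section.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs copied from Table 3 (2048 samples per class).
rows = {
    "Logistic Regression": (0.2944, 0.9267, 0.4467),
    "Fine-tuned DistilBERT with SetFit": (0.7934, 0.9894, 0.8802),
}
for model, (precision, recall, reported_f1) in rows.items():
    recomputed = f1_score(precision, recall)
    print(f"{model}: recomputed F1 = {recomputed:.4f}, reported = {reported_f1}")
```

The low precision combined with high recall across all models at small training sizes explains why F1 stays well below recall: the harmonic mean is pulled toward the smaller of the two values.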

Figure 6: Number of sessions for different MITRE ATT&CK® techniques. (Bar chart; axes: Technique, Number of Sessions.)

Figure 7: Number of sessions for different MITRE ATT&CK® tactics. (Bar chart; axes: Tactic, Count.)

Table 4: An example of remote command execution of a transient web server with potential for data exfiltration.

Table 6: An example of disk clear and boot loader configuration changes.
