<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Anomaly Detection for System Logs Literature Overview</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Justas</forename><surname>Juknys</surname></persName>
							<email>juknys@vdu.lt</email>
							<affiliation key="aff0">
								<orgName type="institution">Vytautas Magnus University</orgName>
								<address>
									<addrLine>Universiteto str. 10-202</addrLine>
									<postCode>53361</postCode>
									<settlement>Akademija, Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">IVUS2024: Information Society</orgName>
								<orgName type="institution">University Studies</orgName>
								<address>
									<addrLine>2024, May 17</addrLine>
									<settlement>Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Anomaly Detection for System Logs Literature Overview</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">9CC4E0385FF15B8336AB09AFDD073406</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:27+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Deep learning</term>
					<term>Neural networks</term>
					<term>Machine Learning</term>
					<term>Log messages</term>
					<term>Literature Review</term>
					<term>Anomaly Detection</term>
					<term>Cyber Security</term>
					<term>Classification</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents the results of an analysis of system log anomaly detection literature from the 2018-2023 period. The literature was found using the keywords "log anomaly", "machine learning", and "neural network". A total of 80 different scientific papers have been analyzed. It has been determined that the most popular neural networks are LSTM/BiLSTM; the most common datasets are HDFS, BGL and Thunderbird; and the most popular evaluation metrics include F1 score, precision and accuracy. Most of the research sought to improve model detection accuracy, lower system resource use, and make models more suitable for real-time detection.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>As time goes by, the complexity and scale of software systems are rapidly increasing, which creates the need for new anomaly detection methods <ref type="bibr" target="#b0">[1]</ref>. As the amount of data that needs to be analyzed increases, so does the need to fully automate the detection process <ref type="bibr" target="#b1">[2]</ref>. As threats to system security become more sophisticated, the number of data points that need to be analyzed keeps increasing as well, which makes it harder to use a supervised training approach and to properly interpret the received data <ref type="bibr" target="#b2">[3]</ref>. Another major issue is the prevalence of zero-day exploits, which are usually impossible to predict in advance <ref type="bibr" target="#b3">[4]</ref>. All of the aforementioned issues are normally addressed through the use of anomaly detection methods.</p><p>To limit the scope of the research, it was chosen to focus on the specific keywords "log", "anomaly detection", "machine learning", and "neural networks". The arxiv.org [5] and sciencedirect.com <ref type="bibr">[6]</ref> databases have been used for research paper collection. A total of 117 different research papers have been analyzed. All papers were written during the 2000-2023 period, with 78 of them coming from the 2018-2023 period. Only research and conference papers have been analyzed.</p><p>For the purposes of this paper, the following questions were chosen for analysis:</p><p>1. Which neural network and machine learning approaches are being used? 2. What metrics have been used to evaluate the suggested approaches, and how do different approaches compare to each other? 3. Which datasets are being used to train models? 4. What problems in anomaly detection have been identified? 5. What findings/conclusions have been made?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Key definitions</head><p>1) Anomaly detection: an approach that seeks to identify unusual events by comparing them to a standard situation. An anomalous event is something that cannot be fully anticipated in advance and, as a result, cannot be detected via traditional pattern-based detection methods. To declare an anomaly, an outlier needs to be found. This outlier can appear in various contexts, such as a statistical outlier, a situation/sequence outlier, or a timing outlier. It is usually assumed that anomalous data is much less numerous than normal data. The most popular approach to solving anomaly detection problems is semi-supervised training, where models are trained exclusively on normal data <ref type="bibr" target="#b2">[3]</ref>.</p><p>2) Log data: information gathered in sequential order and presented in lines. Each log entry contains all the information necessary to identify various system states at given moments in time. Data is usually saved as string or numerical values in easily readable text files. By following log entries it should be possible to reconstruct how the system functioned in the past, so if the system deviates from expected behavior, log analysis should identify the moment of the malfunction.</p><p>Log data can be used to determine in advance whether there are any risks of system failure, and also to detect possible intrusions. To achieve this, multiple data entries need to be analyzed at once in order to identify any abnormal patterns <ref type="bibr" target="#b2">[3]</ref>.</p><p>3) Neural networks are a subset of Artificial Intelligence (AI) research. They are algorithms inspired by neuroscience that seek to replicate the function of the human brain. These networks consist of many input units arranged in sets of layers. Initially, preprocessed data is fed to the first layer and, after the initial data transformations are performed, the layer results are passed to subsequent layers <ref type="bibr" target="#b4">[7]</ref>. Over time, a neural network discovers patterns within its data and can then use them to classify data into various categories.</p><p>4) Machine Learning (ML) is a subset of AI research that seeks to imitate human intellect through self-learning algorithms. First it is provided with preprocessed data; then a chosen model is applied to discover any meaningful patterns within the given data <ref type="bibr" target="#b5">[8]</ref>. The given data can be labeled to enhance model accuracy, which is called supervised learning. In the case of unsupervised training, the provided data is unlabeled and patterns need to be discovered using statistical methods.</p><p>Compared to neural networks, classical, or "non-deep", machine learning is more dependent on human intervention to learn. Human experts determine the set of features used to understand the differences between data inputs, usually requiring more structured data to learn <ref type="bibr" target="#b6">[9]</ref>. Traditional machine learning methods include Isolation Forest, SVM, kNN, Naive Bayes, polynomial/linear regression, PCA and other methods.</p></div>
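The semi-supervised setting described above (training on normal data only) can be illustrated with a deliberately simple statistical sketch: learn the mean and standard deviation of an event count from normal log windows, then flag any window whose z-score exceeds a threshold. This is an illustrative toy, not a method from the surveyed papers; the per-window counts and the threshold of 3 are invented for the example.

```python
from statistics import mean, stdev

def fit_profile(normal_counts):
    """Learn a profile from NORMAL data only (the semi-supervised setting)."""
    return mean(normal_counts), stdev(normal_counts)

def is_anomalous(count, profile, threshold=3.0):
    """Flag a window whose event count is a statistical outlier."""
    mu, sigma = profile
    return abs(count - mu) / sigma > threshold

# Events-per-minute observed while the system behaved normally (made-up numbers).
normal = [98, 102, 101, 97, 100, 103, 99, 101, 100, 99]
profile = fit_profile(normal)

print(is_anomalous(100, profile))  # typical window -> False
print(is_anomalous(250, profile))  # burst of log events -> True
```

Real systems replace the single count with richer features (event sequences, template frequencies), but the principle is the same: only normal behavior is modeled, and anything sufficiently far from it is declared anomalous.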
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Survey Results</head><p>Table <ref type="table" target="#tab_0">1</ref> shows the number of publications released during recent years. The publication amount is the exact number of research papers released during that year. Papers which also include research into neural network use are counted as well. It can be said that during the last five years the anomaly detection field has received increased attention from the research community. Over the last three years the majority of the literature covers neural network methods, while standard machine learning methods (such as kNN, decision trees and SVM) are becoming less popular. Table <ref type="table" target="#tab_3">3</ref> provides the list of all commonly used machine learning methods. Any method which has been used only once within the reviewed literature is included in the "other" category. It has been determined that SVM is the most frequently used machine learning method. Its primary advantage over neural networks is its significantly faster computational speed, which is important when it is necessary to detect system anomalies as soon as possible. Other notable advantages include the ability to handle high-dimensional data and a low risk of overfitting <ref type="bibr" target="#b8">[11]</ref>. HDFS is a key component of Hadoop; it offers reliable storage through data replication, integrates with big data frameworks and supports batch processing <ref type="bibr" target="#b9">[12]</ref>. Within the reviewed literature it appeared most frequently and was often used together with BGL and Thunderbird <ref type="bibr" target="#b10">[13]</ref>, both of which are popular supercomputer log datasets. Table <ref type="table">5</ref> shows all frequently used evaluation metrics. Any metric which has been used only once across all research papers is included in the "other" category. It has been determined that the F1 score (Formula 1) was the most commonly used evaluation metric. This metric is calculated from the Recall (Formula 2) and Precision (Formula 3) metrics, so in most research papers all three metrics have been used simultaneously: F1 = 2 × Precision × Recall / (Precision + Recall) (1), Recall = True Positive / (True Positive + False Negative) (2). Within these formulas, True Positive stands for all correctly identified elements, False Negative stands for all elements which have been incorrectly labeled as false, and False Positive stands for all elements incorrectly labeled as true.</p><p>Precision is a good way of determining the reliability of individual results, which helps to minimize the risk of spending unnecessary resources on handling false alarms. Recall is useful for determining how much of an impact false negatives might have, which is very important, as a single missed anomaly is enough to cause massive system damage. As both Precision and Recall are important, the F1 score ensures that both can be represented by a single metric <ref type="bibr" target="#b11">[14]</ref>.</p></div>
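The three metrics discussed above follow directly from the confusion counts. The short sketch below computes them as in Formulas 1-3; the counts (8 detected anomalies, 2 false alarms, 4 misses) are hypothetical values chosen only for the illustration.

```python
def precision(tp, fp):
    """Share of raised alarms that were real anomalies (Formula 3)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of real anomalies that were actually caught (Formula 2)."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (Formula 1)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Hypothetical confusion counts: 8 anomalies found, 2 false alarms, 4 missed.
print(round(precision(8, 2), 3))    # 0.8
print(round(recall(8, 4), 3))       # 0.667
print(round(f1_score(8, 2, 4), 3))  # 0.727
```

Note how the F1 score sits between the two component metrics and is pulled toward the lower one, which is why it is preferred when a model must balance false alarms against missed anomalies.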
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Precision</head><p>Precision = True Positive / (True Positive + False Positive) (3)</p><p>Table 6 lists the most common anomaly detection problems described within the research papers. Any problem which has been mentioned only once has been assigned to the "other" category. The largest concern specified in the research literature is that, due to the increase in the amount of data, the extent to which log data analysis is automated should increase as well <ref type="bibr" target="#b12">[15]</ref>[16] <ref type="bibr" target="#b14">[17]</ref>. Another major issue being brought up is that log data by itself does not include a sufficient amount of information to effectively detect new threats <ref type="bibr" target="#b15">[18]</ref>. Often, when relying on log data, only the time context is established and additional data context is ignored <ref type="bibr">[19]</ref>. Further issues can also be introduced while parsing log data, which can further degrade anomaly detection accuracy <ref type="bibr" target="#b16">[20]</ref>. One of the main requirements for successful anomaly detection is the timely discovery of new threats. In order to comply with it and provide near real-time detection, some compromises need to be made; for example, this often means relying only on the simplest log data analysis and ignoring additional system analysis tools <ref type="bibr">[21][22]</ref>. Furthermore, state-of-the-art anomaly detection methods with the highest detection accuracy are usually unfit for time-sensitive issue detection <ref type="bibr" target="#b19">[23]</ref>. Another concern is that, due to the amount of information that needs to be processed, cloud computing becomes necessary, which introduces issues of data transfer speeds <ref type="bibr" target="#b20">[24]</ref>[25]. On top of that, due to software updates, models designed for previous software versions might severely degrade in accuracy <ref type="bibr" target="#b22">[26]</ref>.</p><p>Additional issues brought up in the literature include: difficulty performing simultaneous parallel analysis when each input is part of a time series and requires proper understanding of its context <ref type="bibr" target="#b23">[27]</ref>; not all problems might be reflected within logs, and issues of the software program itself might be overlooked <ref type="bibr" target="#b24">[28]</ref>; anomaly detection methods are not sufficiently compared to each other <ref type="bibr" target="#b25">[29]</ref>; traditional machine learning methods such as SVM are unable to perform sufficiently accurate analysis of the temporal information of discrete log messages <ref type="bibr" target="#b27">[30]</ref>; certain anomaly detection models have not been sufficiently tested in real-life applications <ref type="bibr" target="#b28">[31]</ref>; and models based on statistical methods might be insensitive to the importance of log entry order <ref type="bibr" target="#b29">[32]</ref>.</p></div>
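The log-parsing concern raised above ([20]) can be made concrete with a naive template extractor that masks variable fields before detection. The regex rules and the sample HDFS-style line below are assumptions chosen for illustration; real parsers such as Drain are considerably more sophisticated.

```python
import re

def to_template(line):
    """Naively mask variable fields (IPs, hex IDs, numbers) to recover a log template."""
    line = re.sub(r"\b\d{1,3}(\.\d{1,3}){3}\b", "<IP>", line)  # IPv4 addresses first
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)            # hex identifiers
    line = re.sub(r"\d+", "<NUM>", line)                       # remaining numbers
    return line

print(to_template("Received block blk_3587508140051953248 of size 67108864 from 10.251.42.84"))
# Received block blk_<NUM> of size <NUM> from <IP>
```

Two lines differing only in block ID, size, or source address map to the same template; but a mask that is too aggressive (merging distinct events) or too lax (splitting one event into many templates) distorts the event vocabulary, which is exactly how parsing errors propagate into detection accuracy.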
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Primary Findings</head><p>The following were the main findings of the analyzed literature:</p><p>1. Embedding multi-core point-by-point convolution and global average pooling achieves significant advantages in terms of arithmetic power, memory and high availability, while ensuring detection accuracy <ref type="bibr" target="#b19">[23]</ref>.</p><p>2. The Gumbel Noise Score Matching model demonstrated the capability of score matching for anomaly detection on categorical data in both tabular and image datasets. It also provided a unified framework for modeling mixed data types via score matching <ref type="bibr" target="#b30">[33]</ref>.</p><p>3. In transformer-based models, adapter-based tuning consistently outperforms training and fine-tuning models <ref type="bibr" target="#b13">[16]</ref>.</p><p>4. Dividing log events into dependent and independent types is an effective way to boost model accuracy <ref type="bibr" target="#b14">[17]</ref>.</p><p>5. Taking a character-based approach to processing log events (lines) contributes to higher performance, as the model may take advantage of characters deleted in word-based approaches, such as numbers and punctuation. Merging the parser, vectorizer, and classifier components into one deep neural network allows the model to learn log data at the language level <ref type="bibr" target="#b31">[34]</ref>.</p><p>6. Models trained on multi-project datasets are not only more accurate in standard tests but also more robust to sequence evolution and more accurate in ahead-of-time anomaly predictions <ref type="bibr" target="#b31">[34]</ref>.</p><p>7. Though the presence of critical logs often indicates problems, their absence does not necessarily imply a healthy system status. An important reason is that it is sometimes difficult to determine where and how to place an informative log statement. In some cases, faults do not affect metrics, while in other cases metrics exhibit unusual patterns (e.g., jitter) even when the system is experiencing minor performance fluctuations rather than faults. Hence, simply identifying anomalous metric patterns is insufficient <ref type="bibr" target="#b0">[1]</ref>.</p><p>8. Faults can cause unexpected behaviors involving logs, metrics, or both, so the two data sources should be analyzed comprehensively to reveal the actual anomalies <ref type="bibr" target="#b0">[1]</ref>.</p><p>9. The intrinsic structure of host-based logs, as captured by persistence images and the spectra of graph and hypergraph Laplacians, contains discriminative information about whether or not the logs are anomalous <ref type="bibr" target="#b32">[35]</ref>.</p><p>10. Data augmentation can simulate deviations in log data that occur from service updates over time, which contributes to successful anomaly detection <ref type="bibr" target="#b21">[25]</ref>.</p><p>11. A multimodal approach can improve anomaly detection scores for multiple modalities in comparison to the single modalities of logs and traces <ref type="bibr" target="#b33">[36]</ref>.</p><p>12. Filtering out common log entries can noticeably improve anomaly detection accuracy <ref type="bibr" target="#b34">[37]</ref>.</p></div>
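Finding 12 (filtering out common log entries [37]) can be sketched as a simple frequency filter over event types: entries whose type dominates the stream are dropped before detection. The 30% threshold and the toy event stream below are assumptions for this illustration, not values from the cited paper.

```python
from collections import Counter

def filter_common(events, max_share=0.3):
    """Drop event types that make up more than max_share of the stream,
    keeping the rarer entries that carry most of the anomaly signal."""
    counts = Counter(events)
    total = len(events)
    keep = {e for e, c in counts.items() if c / total <= max_share}
    return [e for e in events if e in keep]

# Routine heartbeats dominate this made-up stream and are filtered away.
stream = ["heartbeat"] * 8 + ["block_received", "checksum_error", "block_received"]
print(filter_common(stream))
# ['block_received', 'checksum_error', 'block_received']
```

Removing high-frequency routine events shrinks the input and keeps rare entries, at the risk of hiding anomalies that manifest as an unusual *rate* of a common event, so the threshold needs tuning per system.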
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>During this survey it has been determined that the popularity of this topic has been increasing over recent years. The problems identified within the research papers still need to be addressed, and no universal solution has been discovered that would allow anomaly detection methods to keep up with the ever-increasing amount of generated log data and the growing complexity of system software. It has also been determined that neural networks are continuously increasing in popularity, while traditional machine learning methods are becoming less popular. The most popular neural network model is LSTM/BiLSTM, the most commonly used dataset is HDFS, and the most frequently used evaluation metric is the F1 score.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 Publications per year</head><label>1</label><figDesc></figDesc><table><row><cell>Year</cell><cell>Total Publication Amount</cell><cell>Neural Network Papers</cell></row><row><cell>2023 (first half)</cell><cell>6</cell><cell>5</cell></row><row><cell>2022</cell><cell>15</cell><cell>10</cell></row><row><cell>2021</cell><cell>26</cell><cell>17</cell></row><row><cell>2020</cell><cell>13</cell><cell>10</cell></row><row><cell>2019</cell><cell>10</cell><cell>4</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>lists all the different neural networks which have been mentioned in at least 2 separate research papers. All the remaining methods are included in the "other" category. By far the most popular neural network models were LSTM and BiLSTM. The primary reason for this is that log data is normally represented as a time series, where previous log entries usually have influence over later entries <ref type="bibr" target="#b7">[10]</ref>.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc></figDesc><table><row><cell>Most common Neural Network approaches</cell><cell></cell></row><row><cell>Neural Network</cell><cell>Amount</cell></row><row><cell>LSTM/BiLSTM</cell><cell>11</cell></row><row><cell>Autoencoder</cell><cell>7</cell></row><row><cell>CNN/TCN</cell><cell>6</cell></row><row><cell>Deeplog/LogAnomaly/LogRobust</cell><cell>6</cell></row><row><cell>Transformer</cell><cell>6</cell></row><row><cell>GNN/eGNN/eGFC</cell><cell>5</cell></row><row><cell>RNN</cell><cell>4</cell></row><row><cell>MLP</cell><cell>3</cell></row><row><cell>Siamese Neural Network</cell><cell>2</cell></row><row><cell>Other</cell><cell>13</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc></figDesc><table><row><cell>Most Common Machine Learning Approaches</cell><cell></cell></row><row><cell>Method Name</cell><cell>Amount</cell></row><row><cell>SVM</cell><cell>14</cell></row><row><cell>Isolation Forest</cell><cell>10</cell></row><row><cell>Logistic/Linear Regression</cell><cell>6</cell></row><row><cell>PCA</cell><cell>6</cell></row><row><cell>Word2Vec</cell><cell>6</cell></row><row><cell>Bayesian</cell><cell>5</cell></row><row><cell>kNN</cell><cell>4</cell></row><row><cell>Decision Tree</cell><cell>3</cell></row><row><cell>Drain Algorithm</cell><cell>2</cell></row><row><cell>Other</cell><cell>14</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4</head><label>4</label><figDesc>contains the counts of the most commonly used datasets. The Industrial category refers to unnamed datasets which used specific industrial process log data. The Private category includes all datasets which cannot be disclosed due to a non-disclosure agreement. The Generated category includes all synthetic datasets which were generated specifically for the research study. Any dataset which did not fall into the previous three categories and was mentioned only once within all research papers has been included in the "other" category.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 4</head><label>4</label><figDesc></figDesc><table><row><cell>Most Common Datasets</cell><cell></cell></row><row><cell>Dataset Name</cell><cell>Amount</cell></row><row><cell>HDFS</cell><cell>20</cell></row><row><cell>BGL</cell><cell>17</cell></row><row><cell>Thunderbird</cell><cell>10</cell></row><row><cell>OpenStack</cell><cell>8</cell></row><row><cell>Spirit</cell><cell>6</cell></row><row><cell>NSL-KDD</cell><cell>4</cell></row><row><cell>DARPA</cell><cell>3</cell></row><row><cell>Hadoop</cell><cell>3</cell></row><row><cell>LANL</cell><cell>3</cell></row><row><cell>MNIST</cell><cell>3</cell></row><row><cell>CIFAR</cell><cell>2</cell></row><row><cell>Huawei Cloud</cell><cell>2</cell></row><row><cell>KDD CUP 99</cell><cell>2</cell></row><row><cell>Industrial</cell><cell>4</cell></row><row><cell>Private</cell><cell>11</cell></row><row><cell>Generated</cell><cell>4</cell></row><row><cell>Other</cell><cell>79</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Heterogeneous Anomaly Detection for Software Systems via Semi-supervised Cross-modal Attention</title>
		<author>
			<persName><forename type="first">Cheryl</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tianyi</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhuangbin</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yuxin</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yongqiang</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><forename type="middle">R</forename><surname>Lyu</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2302.06914.pdf" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">PULL: Reactive Log Anomaly Detection Based on Iterative PU Learning</title>
		<author>
			<persName><forename type="first">Thorsten</forename><surname>Wittkopp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dominik</forename><surname>Scheinert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Philipp</forename><surname>Wiesner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexander</forename><surname>Acker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Odej</forename><surname>Kao</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2301.10681.pdf" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Deep Learning for Anomaly Detection in Log Data: A Survey</title>
		<author>
			<persName><forename type="first">Max</forename><surname>Landauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Onder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Florian</forename><surname>Skopik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Markus</forename><surname>Wurzenberger</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2207.03820.pdf" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Zero-day attack detection: a systematic literature review</title>
		<author>
			<persName><forename type="first">Rasheed</forename><surname>Ahmad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Izzat</forename><surname>Alsmadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wasim</forename><surname>Alhamdani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lo'ai</forename><surname>Tawalbeh</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10462-023-10437-z</idno>
		<ptr target="https://link.springer.com/article/10.1007/s10462-023-10437-z" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Neural networks</title>
		<author>
			<persName><forename type="first">Chris</forename><surname>Woodford</surname></persName>
		</author>
		<ptr target="https://www.explainthatstuff.com/introduction-to-neural-networks.html" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Machine learning, explained</title>
		<author>
			<persName><forename type="first">Sara</forename><surname>Brown</surname></persName>
		</author>
		<ptr target="https://mitsloan.mit.edu/ideas-made-to-matter/machine-learning-explained" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">What is machine learning?</title>
		<ptr target="https://www.ibm.com/topics/machine-learning" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
		<respStmt>
			<orgName>IBM</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Log Message Anomaly Detection and Classification Using Auto-B/LSTM and Auto-GRU</title>
		<author>
			<persName><forename type="first">Amir</forename><surname>Farzad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">Aaron</forename><surname>Gulliver</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/1911.08744.pdf" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">Suthar</forename><surname>Mudra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bhavikkmuar</forename></persName>
		</author>
		<ptr target="https://iq.opengenus.org/advantages-of-svm/" />
		<title level="m">Advantages of Support Vector Machines (SVM)</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">The Ultimate Guide to HDFS for Big Data Processing</title>
		<author>
			<persName><forename type="first">Donal</forename><surname>Tobin</surname></persName>
		</author>
		<ptr target="https://www.integrate.io/blog/guide-to-hdfs-for-big-data-processing/" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">What Supercomputers Say: A Study of Five System Logs</title>
		<author>
			<persName><forename type="first">Adam</forename><surname>Oliner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jon</forename><surname>Stearley</surname></persName>
		</author>
		<ptr target="https://ieeexplore.ieee.org/document/4273008/" />
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">F1 Score in Machine Learning</title>
		<author>
			<persName><forename type="first">Nikolaj</forename><surname>Buhl</surname></persName>
		</author>
		<ptr target="https://encord.com/blog/f1-score-in-machine-learning/" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Leveraging Log Instructions in Log-based Anomaly Detection</title>
		<author>
			<persName><forename type="first">Jasmin</forename><surname>Bogatinovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gjorgji</forename><surname>Madjarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sasho</forename><surname>Nedelkoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jorge</forename><surname>Cardoso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Odej</forename><surname>Kao</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2207.03206.pdf" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">A Unified Transformer-based Framework for Log Anomaly Detection</title>
		<author>
			<persName><forename type="first">Hongcheng</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xingyu</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jian</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yi</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jiaqi</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Liangfan</forename><surname>Tieqiaozheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Weichao</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bo</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhoujun</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><surname>Li</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2201.00016.pdf" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>TRANSLOG</note>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">LogDP: Combining Dependency and Proximity for Log-based Anomaly Detection</title>
		<author>
			<persName><forename type="first">Yongzheng</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hongyu</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bo</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Muhammad</forename><surname>Ali Babar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sha</forename><surname>Lu</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2110.01927.pdf" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Fine-grained Anomaly Detection in Sequential Data via Counterfactual Explanations</title>
		<author>
			<persName><forename type="first">He</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Depeng</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shuhan</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xintao</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Saswati</forename><surname>Ray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sana</forename><surname>Lakdawala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mononito</forename><surname>Goswami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chufan</forename><surname>Gao</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2111.08082v1.pdf" />
	</analytic>
	<monogr>
		<title level="m">Learning Probabilistic Graph Neural Networks for Multivariate Time Series Anomaly Detection</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Log-based Anomaly Detection Without Log Parsing</title>
		<author>
			<persName><forename type="first">Van-Hoang</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hongyu</forename><surname>Zhang</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2108.01955.pdf" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Learning What to Monitor for Efficient Anomaly Detection</title>
		<author>
			<persName><forename type="first">Davide</forename><surname>Sanvito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giuseppe</forename><surname>Siracusano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sharan</forename><surname>Santhanam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roberto</forename><surname>Gonzalez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roberto</forename><surname>Bifulco</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2203.15324.pdf" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>syslrn</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">An Anomaly Event Detection Method Based on GNN Algorithm for Multi-data Sources</title>
		<author>
			<persName><forename type="first">Yipeng</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jingyi</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shaoning</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yangyang</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shenwen</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiong</forename><surname>Li</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2104.08761.pdf" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">LightLog: A lightweight temporal convolutional network for log anomaly detection on the edge</title>
		<author>
			<persName><forename type="first">Zumin</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jiyu</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hui</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Liming</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jing</forename><surname>Qin</surname></persName>
		</author>
		<ptr target="https://www.sciencedirect.com/science/article/abs/pii/S1389128621005119" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">DeCorus: Hierarchical Multivariate Anomaly Detection at Cloud-Scale</title>
		<author>
			<persName><forename type="first">Bruno</forename><surname>Wassermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Ohana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ronen</forename><surname>Schaffer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robert</forename><surname>Shahla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Elliot</forename><forename type="middle">K</forename><surname>Kolodner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eran</forename><surname>Raichstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michal</forename><surname>Malka</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2202.06892.pdf" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">A2Log: Attentive Augmented Log Anomaly Detection</title>
		<author>
			<persName><forename type="first">Thorsten</forename><surname>Wittkopp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexander</forename><surname>Acker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sasho</forename><surname>Nedelkoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jasmin</forename><surname>Bogatinovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dominik</forename><surname>Scheinert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wu</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Odej</forename><surname>Kao</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2109.09537.pdf" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models</title>
		<author>
			<persName><forename type="first">Harold</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jasmin</forename><surname>Bogatinovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexander</forename><surname>Acker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sasho</forename><surname>Nedelkoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Odej</forename><surname>Kao</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2102.11570.pdf" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Distributed Anomaly Detection in Edge Streams using Frequency based Sketch Datastructures</title>
		<author>
			<persName><forename type="first">Prateek</forename><surname>Chanda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malay</forename><surname>Bhattacharya</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2111.13949.pdf" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">System Log Anomaly Detection based on BERT Masked Language Model</title>
		<author>
			<persName><forename type="first">Yukyung</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jina</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pilsung</forename><surname>Kang</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2111.09564.pdf" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>LAnoBERT</note>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Zhuangbin</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jinyang</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wenwei</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yuxin</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jieming</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yongqiang</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><forename type="middle">R</forename><surname>Lyu</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">Experience Report: Deep Learning-based System Log Analysis for Anomaly Detection</title>
		<author>
			<persName><forename type="first">Michael</forename><forename type="middle">R</forename><surname>Lyu</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2107.05908.pdf" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Log Anomaly Detection via BERT</title>
		<author>
			<persName><forename type="first">Haixuan</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shuhan</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xintao</forename><surname>Wu</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2103.04475.pdf" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note>LogBERT</note>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">Online anomaly detection using statistical leverage for streaming business process events</title>
		<author>
			<persName><forename type="first">Jonghyeon</forename><surname>Ko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marco</forename><surname>Comuzzi</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2103.00831.pdf" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">Yicheng</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yujin</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Congwei</forename><surname>Jian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yixin</forename><surname>Lian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yi</forename><surname>Wan</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2101.02392.pdf" />
		<title level="m">Detecting Log Anomalies with Multi-Head Attention</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note>LAMA</note>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title level="m" type="main">Anomaly Detection via Gumbel Noise Score Matching</title>
		<author>
			<persName><forename type="first">Ahsan</forename><surname>Mahmood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Junier</forename><surname>Oliva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Styner</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2304.03220.pdf" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<title level="m" type="main">OneLog: Towards End-to-End Training in Software Log Anomaly Detection</title>
		<author>
			<persName><forename type="first">Shayan</forename><surname>Hashemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mika</forename><surname>Mäntylä</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2104.07324v1.pdf" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">Topological Data Analysis for Anomaly Detection in Host-Based Logs</title>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Davies</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2204.12919.pdf" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<title level="m" type="main">Multi-Source Anomaly Detection in Distributed IT Systems</title>
		<author>
			<persName><forename type="first">Jasmin</forename><surname>Bogatinovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sasho</forename><surname>Nedelkoski</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2101.04977.pdf" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<author>
			<persName><forename type="first">Siavash</forename><surname>Ghiasvand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Florina</forename><forename type="middle">M</forename><surname>Ciorba</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/1906.04550.pdf" />
		<title level="m">Anomaly Detection in High Performance Computers: A Vicinity Perspective</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
