Anomaly Detection for System Logs Literature Overview* Justas Juknys1,∗,† 1 Vytautas Magnus University, Universiteto str. 10–202, 53361 Akademija, Kaunas, Lithuania Abstract This paper describes analysis results of system log Anomaly Detection literature from time period of 2018 to 2023. The literature was found using keywords “log anomaly”, “machine learning”, “neural network”. A total of 80 different scientific papers have been analyzed. It has been determined that most popular neural networks are LSTM/BiLSTM; most common datasets are HDFS, BGL and Thunderbird; Most popular evaluation metrics include F1, precision and accuracy. Most of research sought to address issues of improving model detection accuracy, lowering system resource use and making model more suitable real time detection. Keywords Deep learning; Neural networks; Machine Learning; Log messages; Literature Review; Anomaly Detection; Cyber Security; Classification 1. Introduction As time goes by complexity and scale of software systems is rapidly increasing, which forces the necessity for new anomaly detection methods to be developed[1]. As the amount of data needed to be analyzed increases, the need to fully automate detection process increases[2]. As treats to system security become more sophisticated, the amount of needed to be analyzed data points keeps increasing as well, which at the same time makes it harder to use supervised training approach and properly interpret received data[3]. Another major issue is prevalence of 0 day exploits which are usually impossible to predict in advance[4]. All of the aforementioned issues are normally addressed through use of anomaly detection methods. To limit the scope of research it was chosen to focus on the specific keywords: “Log”, “Anomaly detection”, “machine learning”, “neural networks”. arxiv.org[5] and sciencedirect.com[6] databases have been used for research paper collection. A total of 117 different research papers have been analyzed. All papers have been written during 2000-2023 time period with 78 of the papers being from 2018-2023 time period. Only research and conference papers have been analyzed. For the purposes of this paper the following has been chosen to analyze: 1. Which neural network and machine learning approaches are being used? 2. What metrics have been used to evaluate suggested approaches and how do different approaches compare to each other? 3. Which data sets are being used to train models? 4. What problems in anomaly detection have been identified? 5. What findings/conclusions have been made? 2. Key definitions * IVUS2024: Information Society and University Studies 2024, May 17, Kaunas, Lithuania 1,∗ Corresponding author † These author contributed equally. justas.juknys@vdu.lt (J. Juknys) 0009-0005-7913-3934 (J. Juknys) ©️ 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings 1) Anomaly detection: It is an approach seeking to identify unusual events based on comparisons to standard situation. The anomalous event is something which cannot be fully anticipated in advance and as result cannot be detected via traditional pattern based detection methods. To declare an anomaly an outlier needs to be found. This outlier could appear through various contexts like statistical outlier, situation/sequence outlier, timing outlier and so on… It is usually assumed that the amount of anomalous data is much less numerous than normal data. Most popular approach to solving anomaly detection problems is use of semi supervised training, where models are trained exclusively on normal data[3]. 2) Log data: this is information gathered in sequential order and presented in lines. Each log entry contains all the necessary information to identify various system states at given time moments. Data is usually saved in either string or numerical values and is saved in easily readable text files. By following log entries it should be possible to reconstruct how system continuously functioned in the past, so if system deviates from expected behavior, log analysis should identify the moment of system malfunction. Log data can be used to determine in advance if there are any risks for system failure and also can be used to detect possible intrusions. In order to achieve this, multiple data entries need to be analyzed at once in order to identify any abnormal patterns[3]. 3) Neural Networks are subset of Artificial Intelligence (AI) research. They are algorithms based on neuroscience seeking to replicate function of human brain. These networks consist of many input units, which are arranged in sets of layers. Initially preproccessed data is fed to initial layer and after performing initial data transformations, layer results are passed to subsequent layers7. Over time Neural Network discovers patterns within its data and then can use it to classify data into various categories. 4) Machine Learning (ML) is a subset of AI research, seeking to imitate human intellect through self learning algorithms. Firstly it is provided with preprocessed data, then a chosen model is applied to discover any meaningful patterns within given data[8]. The given data can either be labeled to enhance model accuracy, which is called “Supervised Learning”. In case of Unsupervised training provided data is unlabeled and patterns need to be discovered using statistical methods. When compared to neural networks, classical, or "non-deep", machine learning is more dependent on human intervention to learn. Human experts determine the set of features to understand the differences between data inputs, usually requiring more structured data to learn[9]. Traditional machine learning methods include Isolation Forest, SVM, kNN, Naive Bayes, Polynomial/Linear Regression, PCA and other methods. 3. Survey Results Table 1 showcases the amount of publications released during recent years. Publication amount is the exact number of research papers released during that year. Any papers which also include research into neural network use are counted as well. Table 1 Publications per year Year Total Publication Amount Neural Network Papers 2023 (first half) 6 5 2022 15 10 2021 26 17 2020 13 10 2019 10 4 2018 8 3 2017 9 3 2016 3 0 2015 5 3 It can be said that during recent 5 years the anomaly detection field has received an increased amount of attention from the research community. During last 3 year period majority of written literature covers Neural Network methods and standard machine learning methods (like Knn, decision trees, SVM…) are becoming less popular. Table 2 lists all the different neural networks which have been mentioned in at least at least 2 separate research papers. All the remaining methods are included in "other" category. By far the most popular neural network models were LSTM or BiLSTM. The primary reason for this is that log data is normally represented in time series where usually previous log entries have influence over later entries[10]. Table 2 Most common Neural Network approaches Neural Network Amount LSTM/BiLSTM 11 Autoencoder 7 CNN/TCN 6 Deeplog/LogAnomaly/LogRobust 6 Transformer 6 GNN/eGNN/eGFC 5 RNN 4 MLP 3 Siamese Neural Network 2 Other 13 Table 3 provides the list of all commonly used machine learning methods. Any method which only has been used once within researched literature has been included in "other" category. It has been determined that SVM is the most frequently used machine learning method. Its primary advantage over Neural Networks is its significantly faster computational speed, which is important when it’s necessary to detect system anomalies as soon as possible. Some other notable advantages include ability to handle high dimensional data and low risk of overfitting[11]. Table 3 Most Common Machine Learning Approaches Method Name Amount SVM 14 Isolation Forest 10 Logistic/Linear Regression 6 PCA 6 Word2Vec 6 Bayesen 5 kNN 4 Decision Tree 3 Drain Algorithm 2 Other 14 Table 4 contains amounts of all most commonly used datasets. Industrial category refers to unnamed datasets which used specific industrial process log data. Private category includes all datasets, which cannot be disclosed due to a non disclosure agreement. Generated category includes all synthetic datasets which were generated specifically for the research study. Any dataset which didn't fall into previous 3 categories and was only mentioned once within all research papers, has been included in "other" category. HDFS is a key component of Hadoop, offering reliable storage through data replication, integrates with big data frameworks and supports batch processing[12]. Within reviewed literature it appeared the most frequently and often was simultaneously used with BGL and Thundebird[13], both of which are popular supercomputer log datasets. Table 4 Most Common Datasets Dataset Name Amount HDFS 20 BGL 17 Thunderbird 10 Openstack 8 Spirit 6 NSL-KDD 4 DARPA 3 Hadoop 3 Lanl 3 Mnist 3 CIFAR 2 Huawei Cloud 2 KDD CUP 99 2 Industrial 4 Private 11 Generated 4 Other 79 Table 5 showcases all frequently used evaluation metrics. Any research metric which has only been used once within all research papers is included in "other" category. It has been determined that F1 Score (Formula 1) was the most commonly used evaluation metric. This metric is calculated using Recall (Formula 2) and Precision (Formula 3) metrics, so in most of research papers all 3 metrics have been used simultaneously. Within these formulas True Positive stands for all correctly identified elements, False Negative stands for all elements which have been incorrectly labeled as false, False Positive are all elements incorrectly labeled as true. Precision is a good way of determining reliability of individual results which helps to minimize the risk of spending unnecessary resources on managing false alarms. Recall is useful for determining how much of an impact false negatives might have which is very important as all it takes is one missed anomaly to cause massive system damage. As both Precision and Recall are important, F1 ensures that both of them can be represented using a single metric[14]. Table 5 Most Common Evaluation Metrics Metric Amount F1 40 Precision 25 Recall 24 Accuracy 20 AUC 11 Computation Time/Resource Reduction 5 Error Rate 2 Standard Deviation 2 Other 11 2 ⋅ Precision ⋅ Recall F 1= (1) Precision+ Recall True Positive Recall= (2) True Positive+ False Negative True Postive Precision= (3) True Positive+ False Positive Table 6 lists most common anomaly detection problems described within research papers. Any problem which has only been mentioned once has been assigned to "other" category. The largest concern specified by research literature is that due to increase in data amount, the extent to which log data analysis should be automated should increase as well[15][16][17]. Another major issue being brought up is that by itself log data does not include a sufficient amount of data to effectively determine new treats[18]. Often while relying on log data, only time context is established and additional data context is ignored[19]. Further issues could also be introduced while parsing log data, which could further degrade anomaly detection accuracy[20]. Table 6 Most common problems Problem type Count Need for better data processing 19 Need better context extracting 18 Excess computing resource use 12 Excess information amount 8 Changing environment/software updates 8 Cloud computing optimization 7 Unfit for novel anomaly detection 6 Need more data points 5 Insufficiently tested models 4 Flawed datasets 4 Insufficient detection rate 2 Log data by itself is insufficient 2 Other 5 One of the main requirements for successful anomaly detection is timely discovery of new treats. In order to comply with it and provide near real time detection, some necessary compromises need to be done. For example often this means only relying on most simple log data analysis and ignoring additional system analysis tools[21][22]. Furthermore state of the art anomaly detection methods with highest detection accuracy are usually unfit for time sensitive issue detection[23]. Another concern is that due to amount of information needed to be processed, cloud computing becomes necessary, which introduces issues of data transfer speeds[24][25]. To add on top of that due to software updates, models designed for previous software versions might severely degrade in accuracy[26]. Some additional issues being brought up in literature included having difficulty to perform simultaneous parallel analysis when each input is part of time series and requires proper understanding of its context[27]; not all problems might be reflected within logs and the issues of software program itself might be overlooked[28]; anomaly detection methods do not get sufficiently compared to each other[29]; traditional machine learning methods such as SVM are unable to perform sufficiently accurate analysis of temporal information of discrete log messages[30]; Certain anomaly detection models have not been sufficiently tested in real life application[31]; models based on statistical methods might be insensitive to importance of log entry order sequences[32]. 3.1. Primary Findings The following were the main findings of analyzed literature: 1.Embedding multi-core point-by-point convolution and global average pooling achieves significant advantages in terms of arithmetic power, memory and high availability, while ensuring detection accuracy [23]. 2.Gumbel Noise Score Matching model demonstrated the capability of score matching for anomaly detection on categorical types in both tabular and image datasets. It also provided a unified framework for modeling mixed data types via score matching [33]. 3.In transformer based models adapter-based tuning consistently outperforms training and fine-tuning models[16]. 4.Dividing log events into dependent and independent types is an effective way to boost model accuracy [17]. 5.Taking a character-based approach to process log events (lines) contributes to higher performance as the model may take advantage of characters deleted in word-based approaches, such as numbers and punctuation. Merging the parser, vectorizer, and classifier components into one deep neural network, allows model to learn log data at the language level [34]. 6.Models trained on multi-project datasets are not only more accurate in standard tests but also more robust to sequence evolutions and more accurate in ahead of time anomaly predictions [34]. 7.Though the presence of critical logs often indicates problems, their absence does not necessarily imply a healthy system status. An important reason is that sometimes determining where and how to place an informative log statement is difficult. In some cases, faults do not affect metrics, while in other cases, metrics exhibit unusual patterns (e.g., jitters) even if the system is experiencing minor performance fluctuations instead of faults. Hence, simply identifying anomalous metric patterns is insufficient [1]. 8.Faults can cause unexpected behaviors involving either logs or metrics, or both of them. So the two data sources should be analyzed comprehensively to reveal the actual anomalies [1]. 9. Intrinsic structure of host-based logs, as captured by persistence images and the spectrum of graph and hypergraph Laplacians, contains discriminative information about whether or not the logs are anomalous[35]. 10. Data augmentation can simulate deviations in log data that occur from service updates over time which contribute to successful anomaly detection[25]. 11. Multimodal approach can improve the scores for anomaly detection for multiple modalities in comparison to the single modalities of logs and traces [36]. 12. Filtering out common log entries can noticeably improve anomaly detection accuracy [37]. 4. Conclusions During this survey it has been determined that over recent years the popularity of this topic has been increasing. The problems identified within research papers still need to be addressed and no universal solution has been discovered which would allow anomaly detection methods to keep up with ever increasing amount of generated log data and general increasing complexity of system software. It has also been determined that neural networks are continuously increasing in popularity, while traditional machine learning methods are becoming less popular. It has been determined that the most popular neural network model is LSTM/BiLSTM, most commonly used dataset is HDFS and most frequently used evaluation metric is F1 score. Bibliography [1] Cheryl Lee, Tianyi Yang, Zhuangbin Chen, Yuxin Su, Yongqiang Yang and Michael R. Lyu, Heterogeneous Anomaly Detection for SoftwareSystems via Semi-supervised Cross-modal Attention, 2023. URL: https://arxiv.org/pdf/2302.06914.pdf [2] Thorsten Wittkopp, Dominik Scheinert, Philipp Wiesner, Alexander Acker, and Odej Kao, PULL: Reactive Log Anomaly DetectionBased On Iterative PU Learning, 2023. URL: https://arxiv.org/pdf/2301.10681.pdf [3] Max Landauer, Sebastian Onder, Florian Skopik, and Markus Wurzenberger, Deep Learning for Anomaly Detection in Log Data:A Survey, 2023. URL: https://arxiv.org/pdf/2207.03820.pdf [4] Rasheed Ahmad, Izzat Alsmadi, Wasim Alhamdani, Lo’ai Tawalbeh, Zero-day attack detection: a systematic literature review, 2023. URL: https://link.springer.com/article/10.1007/s10462-023-10437-z [5] , About arXiv, 2024. URL: https://info.arxiv.org/about/index.html [6] , About Science Direct, 2024. URL: https://www.elsevier.com/products/sciencedirect [7] Chris Woodford, Neural networks, 2023. URL: https://www.explainthatstuff.com/introduction-to- neural-networks.html [8] Sara Brown, Machine learning, explained , 2021. URL: https://mitsloan.mit.edu/ideas-made-to- matter/machine-learning-explained [9] IBM, What is machine learning?, 2024. URL: https://www.ibm.com/topics/machine-learning [10] Amir Farzada,, T. Aaron Gullivera, Log Message Anomaly Detection and Classification UsingAuto- B/LSTM and Auto-GRU, 2021. URL: https://arxiv.org/pdf/1911.08744.pdf [11] Suthar Mudra Bhavikkmuar, Advantages of Support Vector Machines (SVM), 2023. URL: https://iq.opengenus.org/advantages-of-svm/ [12] Donal Tobin, The Ultimate Guide to HDFS for Big Data Processing, 2023. URL: https://www.integrate.io/blog/guide-to-hdfs-for-big-data-processing/ [13] Adam Oliner, Jon Stearley, What Supercomputers Say: A Study of Five System Logs, 2007. URL: https://ieeexplore.ieee.org/document/4273008/ [14] Nikolaj Buhl, F1 Score in Machine Learning, 2023. URL: https://encord.com/blog/f1-score-in- machine-learning/ [15] Jasmin Bogatinovski, Gjorgji Madjarov, Sasho Nedelkoski, Jorge Cardoso, Odej Kao, Leveraging Log Instructions in Log-based AnomalyDetection, 2022. URL: https://arxiv.org/pdf/2207.03206.pdf [16] Hongcheng Guo, Xingyu Lin, Jian Yang, Yi Zhuang, Jiaqi Bai, TieqiaoZheng, Liangfan Zheng, Weichao Hou, Bo Zhang, Zhoujun Li, TRANSLOG: A Unified Transformer-based Framework forLog Anomaly Detection, 2022. URL: https://arxiv.org/pdf/2201.00016.pdf [17] Yongzheng Xie, Hongyu Zhang, Bo Zhang, Muhammad Ali Babar, Sha Lu, LogDP: Combining Dependency and Proximityfor Log-based Anomaly Detection, 2021. URL: https://arxiv.org/pdf/2110.01927.pdf [18] He Cheng, Depeng Xu, Shuhan Yuan, Xintao Wu, Fine-grained Anomaly Detection in Sequential Datavia Counterfactual Explanations, 2022. URL: https://arxiv.org/pdf/2210.04145.pdf [19] Saswati Ray, Sana Lakdawala, Mononito Goswami, Chufan Gao, Learning Probabalistic Graph Neural Networks forMultivariate Time Series Anomaly Detection, 2021. URL: https://arxiv.org/pdf/2111.08082v1.pdf [20] Van-Hoang Le, Hongyu Zhang, Log-based Anomaly Detection Without Log Parsing, 2021. URL: https://arxiv.org/pdf/2108.01955.pdf [21] Davide Sanvito, Giuseppe Siracusano, Sharan Santhanam, Roberto Gonzalez, Roberto Bifulco, syslrn: Learning What to Monitor for EfficientAnomaly Detection, 2022. URL: https://arxiv.org/pdf/2203.15324.pdf [22] Yipeng Ji, Jingyi Wang, Shaoning Li, Yangyang Li, Shenwen Lin, Xiong Li, An Anomaly Event Detection Method Based on GNN Algorithmfor Multi-data Sources, 2021. URL: https://arxiv.org/pdf/2104.08761.pdf [23] Zumin Wang, Jiyu Tian, Hui Fang, Liming Chen, Jing Qin, LightLog: A lightweight temporal convolutional network for log anomaly detection on the edge, 2022. URL: https://www.sciencedirect.com/science/article/abs/pii/S1389128621005119 [24] Bruno Wassermann, David Ohana, Ronen Schaffer, Robert Shahla, Elliot K. Kolodner, Eran Raichstein, Michal Malka, DeCorus: Hierarchical Multivariate Anomaly Detection atCloud-Scale, 2022. URL: https://arxiv.org/pdf/2202.06892.pdf [25] Thorsten Wittkopp, Alexander Acker, Sasho Nedelkoski, Jasmin Bogatinovski, Dominik Scheinert, Wu Fan, Odej Kao, A2Log: Attentive Augmented Log Anomaly Detection, 2021. URL: https://arxiv.org/pdf/2109.09537.pdf [26] Harold Ott, Jasmin Bogatinovski, Alexander Acker, Sasho Nedelkoski, Odej Kao, Robust and Transferable Anomaly Detection in LogData using Pre-Trained Language Models, 2021. URL: https://arxiv.org/pdf/2102.11570.pdf [27] Prateek Chanda, Malay Bhattacharya, Distributed Anomaly Detection in Edge Streams usingFrequency based Sketch Datastructures, 2021. URL: https://arxiv.org/pdf/2111.13949.pdf [28] Yukyung Lee, Jina Kim, Pilsung Kang , LAnoBERT : System Log Anomaly Detectionbased on BERT Masked Language Model, 2023. URL: https://arxiv.org/pdf/2111.09564.pdf [29] Zhuangbin Chen, Jinyang Liu, Wenwei Gu, Yuxin Su, Jieming Zhu, Yongqiang Yang, Michael R. Lyu, Experience Report: Deep Learning-based System Log Analysisfor Anomaly Detection, 2022. URL: https://arxiv.org/pdf/2107.05908.pdf [30] Haixuan Guo, Shuhan Yuan, Xintao Wu, LogBERT: Log Anomaly Detection via BERT, 2021. URL: https://arxiv.org/pdf/2103.04475.pdf [31] Jonghyeon Ko, Marco Comuzz, Online anomaly detection using statisticalleverage for streaming business process events, 2021. URL: https://arxiv.org/pdf/2103.00831.pdf [32] Yicheng Guo, Yujin Wen, Congwei Jian, Yixin Lian, Yi Wan, Detecting Log Anomalies with Multi- Head Attention (LAMA), 2021. URL: https://arxiv.org/pdf/2101.02392.pdf [33] Ahsan Mahmood, Junier Oliva, Martin Styner, Anomaly Detection via Gumbel Noise Score Matching, 2023. URL: https://arxiv.org/pdf/2304.03220.pdf [34] Shayan Hashemi, Mika Mäntylä, OneLog: Towards End-to-End Training in Software Log Anomaly Detection , 2021. URL: https://arxiv.org/pdf/2104.07324v1.pdf [35] Thomas Davies, Topological Data Analysis for Anomaly Detection in Host-Based Logs , 2022. URL: https://arxiv.org/pdf/2204.12919.pdf [36] Jasmin Bogatinovski, Sasho Nedelkoski, Multi-Source Anomaly Detection in Distributed IT Systems, 2021. URL: https://arxiv.org/pdf/2101.04977.pdf [37] Siavash Ghiasvand, Florina M. Ciorba, Anomaly Detection in High Performance Computers: A Vicinity Perspective, 2019. URL: https://arxiv.org/pdf/1906.04550.pdf