AI in Cybersecurity: Activities of the CINI-AIIS Lab at University of Naples Federico II

Antonino Ferraro1, Antonio Galli1,*, Valerio La Gatta1,2, Lidia Marassi1, Stefano Marrone1, Vincenzo Moscato1, Marco Postiglione1,2, Carlo Sansone1 and Giancarlo Sperli1

1 University of Naples Federico II, Via Claudio 21, Naples, 80125, Italy
2 Northwestern University, Department of Computer Science, McCormick School of Engineering and Applied Science, 2233 Tech Dr, Evanston, IL 60208, United States

Abstract
Artificial intelligence (AI) is revolutionizing various industries, including cybersecurity, by emulating human intelligence to address complex threats. In the cybersecurity domain, AI offers significant potential, bolstering defense mechanisms, optimizing threat detection, and advancing incident response capabilities. AI-powered systems can analyze vast datasets to identify anomalies, predict cyberattacks, and enhance overall security posture. Machine Learning (ML), a subset of AI, enables systems to learn from data and make informed decisions, such as predicting optimal security measures based on threat intelligence and operational context. Deep Learning (DL), another ML subset, harnesses Artificial Neural Networks (ANNs) to process intricate data patterns and provide accurate threat assessments. DL, especially through Convolutional Neural Networks (CNNs), is transforming cybersecurity by extracting meaningful features from network traffic and log data for anomaly detection and threat hunting. Moreover, DL integrated with Natural Language Processing (NLP) streamlines tasks like threat intelligence analysis and incident response coordination. The versatility of AI underscores its pivotal role in cybersecurity, driving resilience enhancements and fostering proactive defense strategies.
In this paper, we highlight AI projects in the cybersecurity sector from the University of Naples Federico II node of the CINI-AIIS Lab, showcasing their innovative contributions to cyber defense.

Keywords
Artificial Intelligence, Cybersecurity, Deep Learning, Machine Learning

1. Introduction

Artificial intelligence (AI) is a transformative force across various industries, providing a paradigm shift in cybersecurity practices. Within the cybersecurity domain, AI is heralding significant advancements, redefining defensive strategies, amplifying threat detection capabilities, and refining incident response mechanisms. By harnessing AI technologies, organizations can fortify their defensive postures, anticipate and mitigate cyber threats proactively, and elevate overall security resilience.

At the core of AI's impact on cybersecurity lies its capacity to analyze vast and diverse datasets, enabling the identification of anomalies, prediction of emerging threats, and optimization of security measures. Machine Learning (ML), a pivotal subset of AI, equips systems with the ability to learn from data, thereby enhancing decision-making processes based on evolving threat landscapes and operational contexts. Deep Learning (DL), another cornerstone of AI, leverages sophisticated Artificial Neural Networks (ANNs) to discern intricate patterns within data, furnishing precise threat assessments and actionable insights. Particularly through Convolutional Neural Networks (CNNs), DL revolutionizes cybersecurity by extracting salient features from network traffic and log data, facilitating anomaly detection, threat prediction, and forensic analysis.

Moreover, the fusion of DL with Natural Language Processing (NLP) streamlines critical cybersecurity tasks, such as threat intelligence analysis, malware detection, and incident response coordination. By comprehensively analyzing textual data, NLP-powered systems augment analysts' capabilities, enabling rapid threat identification and proactive response measures.

The adaptable and multifaceted nature of AI positions it as a cornerstone of cybersecurity, driving innovation, resilience, and agility in the face of evolving threats. In this paper, we present a comprehensive overview of AI initiatives in cybersecurity, drawing from projects conducted at the University of Naples Federico II node of the CINI-AIIS Lab. Through these endeavors, we showcase the transformative potential of AI in bolstering cyber defense strategies and safeguarding digital ecosystems against emerging threats.

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
antonio.galli@unina.it (A. Galli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

2. Interpreting AI Models for Behavioral Malware Detection

In the past decade, the landscape of cyber threats to Information Systems has undergone a remarkable transformation, driven largely by the widespread adoption of Internet of Things (IoT) devices and Cloud Computing technologies. This proliferation has provided cybercriminals with fertile ground for launching a multitude of attacks, ranging from the insertion of unwanted advertisements into websites to the clandestine exfiltration of sensitive data for illicit financial gain. At the forefront of these attacks are various forms of malicious software, collectively referred to as malware, which pose significant challenges to the security and integrity of digital systems. Examples of such malware include trojans, backdoors, spyware, and worms, each designed with the explicit purpose of exploiting vulnerabilities in target systems ([1]).

The detection of malware represents a formidable research endeavor, compounded by the ever-evolving sophistication of cyber threats. As Cyber Security (CS) researchers develop new detection techniques, malware authors respond in kind, continually refining their strategies to evade detection ([2, 3]). In this perpetual arms race, traditional antivirus programs, reliant on signature-based detection mechanisms, have struggled to keep pace with the rapidly evolving threat landscape. Signature-based detection relies on matching code against a database of known malicious patterns or signatures, which leads to a cat-and-mouse game where malware authors employ advanced evasion techniques such as code obfuscation to circumvent detection ([4, 5]).

To address the shortcomings of signature-based detection, researchers have explored alternative approaches that analyze the properties and behavior of malware rather than matching known signatures. These approaches can be broadly categorized into Static Malware Detection (SMD) and Behavioral Malware Detection (BMD). SMD techniques analyze the static characteristics of malware, such as its byte-code structure, while BMD approaches monitor the dynamic behavior of malware at runtime, particularly the sequence of Application Programming Interface (API) calls made by the software to the underlying operating system ([6]). This behavioral analysis provides valuable insights into the actions performed by malware, offering a more comprehensive understanding of its capabilities and intentions.

However, the complexity and variability of modern malware present significant challenges to both SMD and BMD approaches. Static analysis techniques are vulnerable to evasion tactics such as dynamic code linking and encryption, while behavioral analysis can be computationally intensive and time-consuming ([7, 8]). In response to these challenges, researchers have turned to advanced Machine Learning (ML) and Deep Learning (DL) techniques to enhance the effectiveness of malware detection systems ([9, 10, 7]). These approaches leverage the power of neural networks to automatically learn complex patterns and features from raw data, offering promising avenues for improving detection accuracy and efficiency.

Despite their impressive performance, ML- and DL-based detection systems often lack transparency and interpretability, raising concerns about their trustworthiness and reliability in real-world applications. To address these concerns, researchers have begun exploring the field of eXplainable Artificial Intelligence (XAI), which focuses on developing models and techniques that can provide human-understandable explanations for AI-driven decisions ([11]). In the context of malware detection, XAI methodologies aim to elucidate the underlying reasoning behind classification decisions, offering valuable insights into the features and patterns driving the detection process.

While XAI approaches have shown promise in enhancing the explainability of malware detection systems, their application to BMD remains relatively unexplored, particularly in the context of deep sequential neural networks. This gap in research underscores the need for comprehensive investigations into the explainability of BMD systems, especially as they become increasingly reliant on advanced DL techniques.

In our research, we present a novel XAI framework for BMD, leveraging a range of state-of-the-art techniques to provide transparent and interpretable explanations for classification decisions. Through extensive experimentation on publicly available datasets, we evaluate the effectiveness and robustness of our framework, shedding light on its utility and potential limitations in real-world cybersecurity applications.

In more detail, our methodology builds upon a pipeline composed of three steps: the sequence pre-processing module standardizes the data format, the model is a classification learner that exploits the sequence structure of the input data to perform the classification, and the explainer generates the explanation supporting the model's prediction. Our methodological workflow is summarized in Fig. 1.

To sum up, we introduced an XAI framework for behavioral malware detection. We aimed to assess the effectiveness of four XAI methods within a sequence-based deep learning model and their relevance in contemporary cybersecurity applications.

Our experiments demonstrated the feasibility of various XAI techniques in explaining the decisions of LSTM-based classifiers, considering both explanation quality and computational efficiency. Our focus was on local explanations for individual samples; global explanations were not addressed.

However, limitations exist, particularly regarding the lack of qualitative metrics to directly evaluate XAI effectiveness and the potential influence of domain-specific factors on our findings.

[Figure 1 (diagram). Stages: Input, Pre-processing, Model, XAI, Evaluation. Diagram labels: API call sequence; decreasing redundancy; padding; embedding; LSTM / GRU; dense; softmax; sequence-level representations; LIME; SHAP; LRP; attention; classification performance; efficiency; compatibility; perturbation; sufficiency; stability.]

Figure 1: Methodological workflow. The pre-processing step aims to standardize the data format. The model classifies the input sequence as malware/goodware, and the explainer generates the explanation. The models are then evaluated in terms of classification performance, efficiency and explanation quality.
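As a rough illustration of the first and last steps of the workflow in Fig. 1, the Python sketch below encodes an API call trace into a fixed-length id sequence and then scores each position by masking it. It is a sketch under stated assumptions, not the implementation evaluated in this work: collapsing consecutive duplicate calls is only one assumed way of decreasing redundancy, the ids reserved for padding and unknown calls are arbitrary, and the leave-one-out occlusion score is a simple perturbation-style stand-in, not LIME, SHAP, LRP or attention. All function names and the toy `toy_predict` callable are hypothetical; in practice the callable would wrap the trained sequence classifier.

```python
from itertools import groupby
from typing import Callable, Dict, List, Sequence

PAD_ID = 0  # assumption: id 0 is reserved for padding
UNK_ID = 1  # assumption: id 1 is reserved for API calls unseen at training time


def collapse_repeats(calls: Sequence[str]) -> List[str]:
    """Drop consecutive duplicates, e.g. A A A B -> A B (assumed redundancy step)."""
    return [name for name, _ in groupby(calls)]


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign integer ids to API names in order of first appearance."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for name in seq:
            vocab.setdefault(name, len(vocab) + 2)  # ids 0 and 1 are reserved
    return vocab


def encode(calls: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Collapse repeats, map names to ids, then truncate or right-pad to max_len."""
    ids = [vocab.get(name, UNK_ID) for name in collapse_repeats(calls)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def occlusion_scores(
    ids: Sequence[int], predict: Callable[[Sequence[int]], float]
) -> List[float]:
    """Leave-one-out relevance: drop in the malware score when one position is masked."""
    base = predict(ids)
    scores = []
    for pos, token in enumerate(ids):
        if token == PAD_ID:
            scores.append(0.0)  # padding carries no information
            continue
        masked = list(ids)
        masked[pos] = PAD_ID
        scores.append(base - predict(masked))
    return scores


# Toy usage with a hypothetical trace and a hand-written stand-in for the classifier.
trace = ["LoadLibraryA", "LoadLibraryA", "VirtualAlloc",
         "WriteProcessMemory", "WriteProcessMemory", "CreateRemoteThread"]
vocab = build_vocab([collapse_repeats(trace)])
ids = encode(trace, vocab, max_len=6)


def toy_predict(seq: Sequence[int]) -> float:
    """Pretend malware score: 1.0 only if both injection-related calls are present."""
    needed = {vocab["WriteProcessMemory"], vocab["CreateRemoteThread"]}
    return 1.0 if needed.issubset(seq) else 0.0


print(ids)                                  # [2, 3, 4, 5, 0, 0]
print(occlusion_scores(ids, toy_predict))   # [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
```

Masking with the padding id keeps the sequence length fixed, so the same callable can be reused unchanged; the cost is one extra model evaluation per non-padded position, which is the kind of efficiency trade-off considered when comparing explainers.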
Future research will explore additional XAI methods and assess the robustness of our framework against adversarial attacks. We also plan to investigate whether explanations can enhance classification performance and assist in identifying systematic errors in predictive models. Real-world scenarios will be considered to evaluate the practical utility of explanations in aiding expert analysts.

3. Autoencoder-Based Deep Learning Pipeline for Network Anomaly Detection

In recent years, the rapid expansion of interconnected devices, like those found in IoT and Cloud networks, has highlighted the urgent need for strong network security assessments. One crucial aspect of addressing this challenge is detecting network anomalies, which serve as important indicators of network intrusions, privacy breaches, system damage, and fraudulent activities. Deep neural networks, known for their ability to learn intricate anomaly patterns from data, have become increasingly popular in this field. However, their effectiveness can be hampered by the unique characteristics of network traffic data, which is sparse, noisy, and often imbalanced due to the multitude of devices and internet applications generating it. Anomalies typically occur in only a small fraction of instances, ranging from 0.001% to 1%. In our research, we tackle these challenges with a two-stage approach. First, we use an autoencoder (AE) to identify instances of anomalous behavior. Then, these anomalies are classified by an attack classifier based on their specific type. We have tested our framework on a large-scale dataset consisting of real-world network traffic data, yielding promising results.

Our proposed framework, as depicted in Figure 2, operates at a high level by processing session description attributes $s_i$ (such as port number and bytes transferred) and determining whether the input is benign or represents an attack. In cases of an attack, the output $y_i$ identifies the specific type of attack (e.g., DDoS, sweep).

[Figure 2 (diagram)]
Figure 2: Overview of proposed NAD pipeline.

Denoising Autoencoder (DAE): The DAE module processes the $i$-th session $s_i \in \mathbb{R}^n$ and outputs its latent representation $\tilde{x}_i \in \mathbb{R}^k$ and the reconstructed instance $\tilde{s}_i \in \mathbb{R}^n$. The latent representation can be considered as the DAE features, while the reconstructed instance represents how the input session might be generated from the latent space.

Reconstruction Error (RE) Module: The RE module uses the output of the DAE, $\tilde{s}_i$, to calculate the reconstruction error $e_i \in \mathbb{R}$. This error indicates how well the autoencoder represents the input session: a higher error suggests a poorer representation. The RE module assesses the similarity between $s_i$ and $\tilde{s}_i$ using a metric $m()$, such as cosine similarity or dot product, with empirical evidence favoring the former.

Threshold Module (TRH): The TRH module concatenates the reconstruction error $e_i$ with the latent representation $\tilde{x}_i$, forming a comprehensive feature vector for the input instance. It functions as a binary classifier within a multilayer perceptron architecture, discerning whether the DAE has recognized $s_i$ as akin to the benign instances it was trained on:

$$f : \tilde{x}_i \in \mathbb{R}^k \rightarrow \{0, 1\} \qquad (1)$$

Here, an output of 0 indicates a benign session, while an output of 1 signals an attack, the specifics of which are determined by the AC module.

Attack Classifier (AC): In tandem with the TRH computation, the AC module also receives the concatenated vector of $e_i$ and $\tilde{x}_i$. The AC module employs a multi-class tabular classifier (such as a random forest or support vector machine) that can be trained using standard supervised machine learning methods. It assigns the attack typology to the input instance, with the choice of classification algorithm impacting overall performance, as detailed in the experimental results below.

The final decision of the framework is derived by considering the outputs of both the TRH and AC modules. If the TRH output is zero, indicating successful reconstruction by the DAE, the input instance is classified as benign. If not, the input instance is classified according to the attack type predicted by the AC module. This approach leverages the DAE's ability to recognize benign sessions, a capability honed through extensive training on numerous instances, while the AC module provides the specificity in attack typology classification when an attack is presumed.

Our dataset was provided as part of the NAD2021 challenge [12], in which participants receive traffic records from three specific dates, labeled as either normal traffic or a specific type of network attack. The challenge focuses on two primary types of attacks: (1) probing attacks, which involve attempts to extract data from a targeted network, and (2) DDoS-Smurf attacks, which are characterized by the use of numerous ICMP flows aimed at overwhelming and halting traffic to a specific destination IP address.

The DAE module was trained using an early stopping mechanism, halting after three epochs without MSE improvement on the validation set. Figure 3 shows that training stops at 69 epochs and the model easily learns to reconstruct input samples. The final MSE scores were 1.2944e-5 for training and 1.2402e-5 for validation. Additionally, further training for five epochs using both training and validation data reduced the training MSE to 1.1759e-5.

[Figure 3 (plot)]
Figure 3: DAE reconstruction error on training and validation splits. On the x axis we report the increasing number of epochs, while MSE values are reported on the y axis.

The TRH model, integrating latent features from the DAE and its reconstruction error, was trained to classify samples as Normal (0) or Anomalous (1), using a similar early stopping strategy with a patience of 10 epochs. Figure 4 shows that training stops at epoch 202 with a training accuracy $Acc_{train} = 0.9697$ and validation accuracy $Acc_{val} = 0.9698$. These results indicate the model's proficiency in differentiating between anomalous and normal samples.

[Figure 4 (plot)]
Figure 4: TRH accuracy on training and validation splits. On the x axis we report the increasing number of epochs, while accuracy values are reported on the y axis.

The AC module, tasked with classifying attack samples identified by the TRH, was trained using a RandomForest classifier. We report its validation results in Table 1 (Precision, Recall and F1 scores) and Table 2 (confusion matrix), which gives further insight into the classifier's performance across the different attack types.

Table 1: Attack classifier, validation performance

Anomaly      Precision  Recall  F1
DDoS         0.99       1.00    0.99
IP sweep     1.00       1.00    1.00
Nmap sweep   0.98       0.87    0.92
Port sweep   0.99       0.99    0.99

Table 2: Attack classifier, validation confusion matrix (rows: true class; columns: predicted class, consistent with the per-class scores in Table 1)

             DDoS   IP sweep  Nmap sweep  Port sweep
DDoS         374    1         0           0
IP sweep     2      38310     0           172
Nmap sweep   1      4         116         12
Port sweep   2      109       2           12253

The final test assessed the combined performance of the DAE, TRH, and AC modules on the test set. Given the unbalanced nature of the data, Precision and Recall were key metrics for evaluating the ability of the DAE+TRH modules to distinguish between normal and anomalous samples (Table 3). While these modules demonstrated high quality in differentiating negatives from positives, there were limitations in identifying all anomalies. The cumulative errors from the DAE+TRH and AC modules are reflected in the overall system performance (Table 4). The aggregated $F_{\alpha\beta}$ score, evaluating the system across all classes, was 0.577, indicating room for improvement in the pipeline's ability to accurately classify the various types of network activity.

Table 3: Test performance of DAE+TRH modules distinguishing anomalous and normal samples

Class     Precision  Recall  F1
Normal    1.00       0.96    0.98
Anomaly   0.47       0.98    0.63

Table 4: Test performance of the entire DAE+TRH+AC pipeline

Class        Precision  Recall  F1
DDoS         0.11       0.52    0.19
Normal       1.00       0.96    0.98
IP sweep     0.53       0.99    0.69
Nmap sweep   0.96       0.83    0.89
Port sweep   0.34       0.95    0.50

In conclusion, we introduced a streamlined and effective framework for Network Anomaly Detection (NAD). Our approach involves two main phases: (1) identifying anomalies using latent features generated by a Deep Denoising Autoencoder, and (2) classifying these anomalies with a multi-class classifier. Despite potential error propagation within the pipeline, our approach has shown promising results. However, we observed a limitation in the performance of the Threshold module (TRH), particularly in detecting attack samples, due to dataset imbalance. Future research will focus on implementing class-balancing techniques to improve the TRH module's effectiveness and enhance the overall system performance.

4. AI Act and Biometrics

As AI becomes more integrated into daily life, cybersecurity emerges as a critical concern. The AI Act, the first global law on AI usage, serves as a key regulatory framework within the European Union, emphasizing ethical considerations in cybersecurity. This law seeks a balance between technological innovation and the protection of core ethical values, ensuring AI is used responsibly. Particularly important within the AI Act is the role of cybersecurity for high-risk AI systems, which requires a comprehensive security approach. One significant challenge addressed by the AI Act is the management of biometrics, acknowledging their sensitive nature and the privacy and security implications for individuals. The act is particularly concerned with the ethical use of biometric data, such as fingerprints and facial and vocal recognition, because of the personal data protection it requires. To regulate the deployment of facial and biometric recognition technologies in public spaces, the AI Act sets strict rules, allowing exceptions only in well-defined scenarios like locating missing persons or preventing serious crimes [13].

While the AI Act represents a significant step forward in balancing the benefits of artificial intelligence with the protection of fundamental rights, it also further complicates the landscape of remaining challenges. On one hand, stringent regulations are essential for managing the risks associated with AI technologies and ensuring they adhere to ethical standards. On the other hand, continuous research in the field of AI and biometrics is critical. The need for advancing research in biometrics is recognized globally, to the extent that numerous international competitions have been established to challenge researchers in identifying fake biometrics.

Over the years, the Naples node of the CINI-AIIS Lab has made significant contributions to the field of fake fingerprint detection. It has actively participated in several editions of LIVDET (1), an international competition that challenges researchers with the task of distinguishing between live and fake fingerprints created through diverse techniques and spoofing materials. Our team has achieved notable success in the last two editions, securing first place in one and second place in the other. These accomplishments were made possible through our use of adversarial learning techniques, which allowed us to perform a synthetic data augmentation that improves the overall performance of a liveness detector [14], achieving an accuracy over 90% on two datasets. More recently, building on the experience gained over the years, we also developed a new fake fingerprint crafting strategy that can be used to physically cast a fake fingerprint able to bypass AI-based liveness detectors [15].

(1) https://sites.unica.it/livdet/

These results not only anticipate future cybersecurity threats but also aid in formulating effective defence mechanisms. To address this need while also protecting people from unwanted misuse, we argue that one of the major challenges in the field of AI is education: promoting a deeper understanding of the risks and ethical implications of AI and enabling people to participate in an informed and conscious manner in public debate and decision-making regarding the use and regulation of these technologies. In pursuing a balance between technological innovation and the protection of fundamental rights, it seems necessary to promote an open and inclusive dialogue involving both developers and civil society stakeholders [16].

Acknowledgments

This work was supported in part by the Piano Nazionale Ripresa Resilienza (PNRR) Ministero dell'Università e della Ricerca (MUR) Project under Grant PE0000013-FAIR.

References

[1] S. Yan, J. Ren, W. Wang, L. Sun, W. Zhang, Q. Yu, A survey of adversarial attack and defense methods for malware classification in cyber security, IEEE Communications Surveys & Tutorials 25 (2023) 467–496. doi:10.1109/COMST.2022.3225137.
[2] N. Galloro, M. Polino, M. Carminati, A. Continella, S. Zanero, A Systematical and longitudinal study of evasive behaviors in windows malware, Computers & Security 113 (2022) 102550. doi:10.1016/j.cose.2021.102550.
[3] F. Zhong, X. Cheng, D. Yu, B. Gong, S. Song, J. Yu, MalFox: Camouflaged Adversarial Malware Example Generation Based on Conv-GANs Against Black-Box Detectors, IEEE Transactions on Computers (2023) 1–14. doi:10.1109/TC.2023.3236901.
[4] Z. Bazrafshan, H. Hashemi, S. M. H. Fard, A. Hamzeh, A survey on heuristic malware detection techniques, in: The 5th Conference on Information and Knowledge Technology, IEEE, 2013, pp. 113–120. doi:10.1109/IKT.2013.6620049.
[5] B. Cheng, J. Ming, E. A. Leal, H. Zhang, J. Fu, G. Peng, J.-Y. Marion, Obfuscation-Resilient executable payload extraction from packed malware, in: 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 3451–3468.
[6] M. Alazab, R. Layton, S. Venkatraman, P. Watters, Malware detection based on structural and behavioural features of api calls, in: International cyber resilience conference (1st: 2010), Edith Cowan University, 2010, pp. 1–10.
[7] M. G. Gaber, M. Ahmed, H. Janicke, Malware detection with artificial intelligence: A systematic literature review, ACM Computing Surveys (2023). doi:10.1145/3638552.
[8] A. Damodaran, F. Di Troia, C. A. Visaggio, T. H. Austin, M. Stamp, A comparison of static, dynamic, and hybrid analysis for malware detection, Journal of Computer Virology and Hacking Techniques 13 (2017) 1–12. doi:10.1007/s11416-015-0261-z.
[9] F. O. Catak, A. F. Yazı, O. Elezaj, J. Ahmed, Deep learning based sequential model for malware analysis using windows exe api calls, PeerJ Computer Science 6 (2020) e285. URL: https://doi.org/10.7717/peerj-cs.285. doi:10.7717/peerj-cs.285.
[10] G. M., S. C. Sethuraman, A comprehensive survey on deep learning based malware detection techniques, Computer Science Review 47 (2023) 100529. doi:10.1016/j.cosrev.2022.100529.
[11] S. Ali, T. Abuhmed, S. El-Sappagh, K. Muhammad, J. M. Alonso-Moral, R. Confalonieri, R. Guidotti, J. Del Ser, N. Díaz-Rodríguez, F. Herrera, Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence, Information Fusion 99 (2023) 101805. doi:10.1016/j.inffus.2023.101805.
[12] L. Chen, S.-E. Weng, C.-J. Peng, H.-H. Shuai, W.-H. Cheng, Zyell-nctu nettraffic-1.0: A large-scale dataset for real-world network anomaly detection, 2021. URL: https://arxiv.org/abs/2103.05767. doi:10.48550/ARXIV.2103.05767.
[13] T. Madiega, Artificial intelligence act, European Parliament: European Parliamentary Research Service (2021).
[14] A. Galli, M. Gravina, S. Marrone, D. Mattiello, C. Sansone, Adversarial liveness detector: Leveraging adversarial perturbations in fingerprint liveness detection, IET Biometrics 12 (2023) 102–111.
[15] R. Casula, G. Orrù, S. Marrone, U. Gagliardini, G. L. Marcialis, C. Sansone, Realistic fingerprint presentation attacks based on an adversarial approach, IEEE Transactions on Information Forensics and Security (2023).
[16] J. Borenstein, A. Howard, Emerging challenges in ai and the need for ai ethics education, AI and Ethics 1 (2021) 61–65.