Data/Process Analysis for Advanced Interoperable Cyber Ranges

Giuseppe Salerno1,*

1 University of Calabria, Italy

Abstract
Cyber Ranges (CRs) are strategic cybersecurity assets that serve a wide range of users and purposes, including cybersecurity education, testing, and research. My research focuses on three directions: exploring new domains and cross-domain scenarios by studying assets, potential weaknesses and vulnerabilities, and specific attack and defense techniques; investigating new enabling technologies and paradigms by leveraging the Digital Twin paradigm; and studying a new Attack Graph model within the context of Cyber Ranges.

Keywords
Cyber Range, Digital Twin, Knowledge Graph, Attack Graph, Kill Chain

1. Introduction

Recent global security incidents highlight a significant increase in the complexity and impact of cybersecurity threats. Attackers are becoming increasingly sophisticated, using advanced and automated methods in their operations. This situation calls for enhanced protection of assets and comprehensive training of personnel to counter these evolving threats. Cyber Ranges (CRs), as described by the National Institute of Standards and Technology (NIST), are "interactive, simulated representations of an organization’s local network, systems, tools, and applications." These environments serve as secure spaces for training Information and Communication Technology (ICT) professionals, preparing them to address a wide array of cyber-attacks and scenarios. A critical factor in the success of Cyber Ranges is their ability to accurately simulate complex, real-world environments. To this end, my research explores the integration of the Digital Twin (DT) paradigm to augment the functionality and scope of Cyber Ranges. In more detail, Section 2 introduces a novel approach that uses Digital Twin technology to enhance the capabilities and effectiveness of Cyber Ranges.
This method facilitates ongoing monitoring of physical systems during the operation of Digital Twins and addresses the absence of specialized passthrough connectors in conventional Cyber Ranges. Moreover, it enables the accurate emulation of complex environmental behaviors. This is accomplished either through executable models that abstract real-world assets or, for software systems, by generating virtual replicas. In the context of virtualized environments such as Cyber Ranges, particularly those applied to Industrial Control Systems (ICS) and Internet of Things (IoT) devices, my research has extended into the areas of Penetration Testing and Vulnerability Assessment methodologies. This exploration led to the investigation of Attack Graph-based approaches. These methodologies model the process by which an external attacker sequentially executes attacks to progressively gain privileges in a computer system until achieving the ultimate goal of system compromise. In Section 3, I discuss a novel Attack Graph model and provide an example scenario to demonstrate its practical application.

SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy
* Corresponding author.
giuseppe.salerno@dimes.unical.it (G. Salerno)
ORCID: 0009-0004-2502-9667 (G. Salerno)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

2. Enhancing Cyber Ranges with Digital Twin Integration via Knowledge Graph

2.1. Knowledge graph-based digital twin

Digital Twins are defined as dynamic, virtual replicas of physical systems, continuously synchronized to mirror the real system’s performance and health status throughout its lifecycle [1]. A Knowledge Graph (KG) is a graph-based data structure designed to enhance contextual understanding by interconnecting metadata [2].
It is particularly well suited to scenarios that demand the integration, management, and extraction of value from diverse sources on a large scale. Knowledge graphs offer numerous advantages over traditional data models, facilitating the modeling, structuring, management, and analysis of heterogeneous and complex data with dynamic relationships. A dynamic knowledge graph is an ideal basis for digital twins [3]. Such a knowledge graph integrates ontologies and autonomous agents that continuously interact with it. Ontologies standardize data use, promoting reuse and interoperability [4]. The multi-domain aspect enables the incorporation of new ontologies and the establishment of relationships between related terms, thereby enhancing connectivity. The interlinking of concepts and instances in knowledge graphs, complemented by dynamic updates from computational agents and real-time data feeds, facilitates numerous interactions among participants within a given digital twin.

Representing an entire environment as a Knowledge Graph is advantageous from a security standpoint, because a unified graph view makes it possible to analyze every possible entry point and study the entire attack surface effectively. A robust and well-defined knowledge base is therefore a fundamental starting point for subsequent studies and analyses. Hence, the first step in this research project will be to finalize the definition of a model for representing Knowledge Graph-Based Digital Twins. Each individual digital twin can be interconnected with other entities. The relationships between DTs will culminate in a Knowledge Graph, which will serve as the representation of the ultimate Cyber Range.

2.2. Digital Twin for ICS and Security

Historically, the idea behind the development of digital twins was to monitor and manage the performance of physical systems in the context of Industry 4.0 and smart manufacturing. The construction of a digital twin for a physical item involves three key aspects: (1) identifying the components and parameters of the physical product in its real environment, (2) establishing a link between the physical and virtual versions of the product, and (3) integrating data and information to bridge the virtual and real worlds [5].

In the context of cybersecurity, digital twin applications have become increasingly significant [6]. For instance, they can be integrated with cyber ranges to analyze system behavior under different cyber attack scenarios. Indeed, digital twins can be used in attack emulations and simulations to evaluate resilience metrics, ultimately aiding in the design of security and safety mechanisms for cyber (physical) systems. Notably, digital twins can also act effectively as honeypots, offering a proactive strategy for uncovering attack vectors within a network [7]. The advantage of using a digital twin as a honeypot is its ability to enhance both the level of interaction and attraction of the "twin" [8].

In this context, my research concentrates on addressing cybersecurity challenges in ICSs, with the goal of developing knowledge-based attack graphs that are also physics-aware. This approach aims to enable the orchestration of targeted attacks that extend beyond mere denial of service [9]. Furthermore, I will explore the development of remediation tactics, defensive mechanisms, and strategies to protect intelligent infrastructures against such targeted threats [10, 11, 12]. To this end, I will study the synergy between process mining techniques for detecting physics-aware attacks and the application of game theory models [13, 14, 15].
Moreover, to build models of malware behavior that are not only precise but also fast to discover and interpretable by humans, I intend to investigate effective log encoding [16] for advanced process mining methods [17, 18] paired with explainable AI [19] exploiting efficient computation schemes [20, 21].

An additional, promising avenue for future research involves addressing the challenges of maintaining anonymity in industrial IoT communication networks. In particular, the principles of sender and relationship anonymity, similar to those applied in the Tor network, could substantially enhance the security of industrial communications [22]. By adapting protocols that offer sender anonymity against global passive adversaries, ICSs can be safeguarded against sophisticated adversaries monitoring critical network points [23]. Furthermore, incorporating privacy-preserving techniques from social networks and IoT, such as those for short communications and MQTT-anonymous protocols, could enhance the robustness of ICSs against advanced persistent threats [24].

3. Exploring and Developing an Attack Graph Approach in Cyber Range Environments

The ever-evolving capabilities of cyber attackers force security administrators to prioritize the early detection of emerging threats. Targeted cyber attacks commonly progress through multiple stages, spanning from the initial reconnaissance of the network environment to the eventual impact on objectives. Such multi-step attacks can be conceptualized using the military kill chain concept: the cyber kill chain models an attack as a sequence of steps. It assumes that the attacker initially identifies suitable targets, then prepares the necessary deliverables, and subsequently transmits them into the environment. Another threat model is provided by attack graphs, which illustrate the paths taken by attackers through the network.
Typically, attackers carry out a series of attack steps, where each step grants them certain privileges on protected assets. During my research, I investigated approaches related to Kill Chain Attack Graphs. I studied the approach proposed by Sadlek et al. [25], which combines the kill chain and the attack graph concepts. It represents chains of attacker actions divided into kill chain phases. According to their definition, a Kill Chain Attack Graph (KCAG) is an ordered triple (𝒢, 𝑃, 𝑓), where 𝒢 = (𝑉, 𝐸) denotes a directed graph with vertices 𝑉 and edges 𝐸, 𝑃 is a set of kill chain phases, and 𝑓 is a function that assigns kill chain phases to attack techniques. Sheyner et al., in contrast, defined an attack graph as a tuple of states, transitions between the states, an initial state, and success states [26]. Ou et al. introduced the concept of a logical attack graph, which is a bipartite directed graph consisting of fact and derivation nodes. Each fact node is labeled with a logical statement represented as a predicate applied to its arguments, whereas each derivation node is labeled with an interaction rule used in the derivation step. The edges within a logical attack graph denote a "depends on" relation [27].

In the initial phase of my research, I defined an attack graph as a directed graph 𝒢 = (𝑉, 𝐸), whose vertices 𝑉 are entities and whose edges 𝐸 denote specific relationships or actions. The vertices in this graph are classified into four categories:
• Attacker: This vertex represents the knowledge and control an attacker possesses over an asset, underlining the capabilities and potential strategies at their disposal.
• Asset: This refers to any component, be it a system, network, or resource, susceptible to cyber threats.
• Vulnerabilities and Properties: This category includes the exploitable weaknesses or characteristics of an asset.
• Attack Goals: This specifies the final aims or targets the attacker seeks to accomplish, which could range from compromising data integrity to system disruption or control.

Specifically, an attacker’s control over assets is differentiated into three levels. Level zero indicates unawareness of the asset’s existence. At level one, the attacker is aware of the asset but lacks any control or capability to breach its security. The highest level identifies the attacker’s ability to violate the asset’s security protocols. Assets, the second vertex type, fall into five categories: hosts, processes, individuals, technologies, and data. Properties/Vulnerabilities of assets constitute the third type of vertices. They include information about network services, vulnerable applications, user accounts, etc. A vulnerability can be a known Common Vulnerabilities and Exposures (CVE) entry or a custom vulnerability/bug present in a host, application code, service misconfiguration, and so on. Attack goals, the fourth vertex type, denote the attacker’s end targets and feature only incoming edges, indicating their terminal nature within the graph.

Furthermore, we have the following types of edges:
1. Edges connecting the first and second vertex types represent steps in the attack progression.
2. Edges linking the second vertex type back to the first (or to an attack goal) illustrate the control level an attacker acquires over an asset post-attack, as part of the attack sequence.
3. Edges from the third to the second vertex type, labeled "hasProperty," associate an asset with its properties or vulnerabilities.

Figure 1: Attack Graph Example. Vertices correspond to: attacker and their level of knowledge/control over an asset (circle), asset (rectangle), property and/or vulnerability of an asset (rhombus), attack goal (double-line circle).

An instance of a possible attack graph is depicted in Figure 1. In this example scenario, a server exposes a web application on port 80.
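Before walking through the steps, the vertex and edge typing rules defined above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation behind the model: the names Kind, Level, Vertex, EDGE_TYPES, AttackGraph, and the edge-type labels "attackStep" and "controlGained" are hypothetical; only "hasProperty" comes from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    ATTACKER = 1   # attacker and their knowledge/control over an asset
    ASSET = 2      # host, process, individual, technology, or data
    PROPERTY = 3   # property or vulnerability of an asset
    GOAL = 4       # attack goal (only incoming edges)


class Level(Enum):
    UNAWARE = 0      # attacker does not know the asset exists
    AWARE = 1        # attacker knows the asset but cannot breach it
    CAN_VIOLATE = 2  # attacker can violate the asset's security


@dataclass(frozen=True)
class Vertex:
    name: str
    kind: Kind
    level: Optional[Level] = None  # meaningful for attacker vertices only


# (source kind, target kind) -> edge type, following the three edge types
# listed in the text. Labels other than "hasProperty" are hypothetical.
EDGE_TYPES = {
    (Kind.ATTACKER, Kind.ASSET): "attackStep",
    (Kind.ASSET, Kind.ATTACKER): "controlGained",
    (Kind.ASSET, Kind.GOAL): "controlGained",
    (Kind.PROPERTY, Kind.ASSET): "hasProperty",  # direction as stated in the text
}


class AttackGraph:
    def __init__(self):
        self.vertices = set()
        self.edges = []  # list of (source, target, edge type) triples

    def add_edge(self, src, dst):
        """Add src -> dst if the two vertex kinds form an allowed edge type."""
        edge_type = EDGE_TYPES.get((src.kind, dst.kind))
        if edge_type is None:
            raise ValueError(f"no edge type for {src.kind.name} -> {dst.kind.name}")
        self.vertices.update((src, dst))
        self.edges.append((src, dst, edge_type))
        return edge_type


# Hypothetical fragment of the web-server scenario.
unaware = Vertex("attacker@server", Kind.ATTACKER, Level.UNAWARE)
aware = Vertex("attacker@server", Kind.ATTACKER, Level.AWARE)
server = Vertex("server", Kind.ASSET)
web_app = Vertex("web application on port 80", Kind.PROPERTY)

graph = AttackGraph()
graph.add_edge(web_app, server)  # hasProperty
graph.add_edge(unaware, server)  # attack step, e.g. reconnaissance
graph.add_edge(server, aware)    # attacker now knows the asset
```

Because no entry in EDGE_TYPES has an attack goal as its source kind, any outgoing edge from a goal vertex is rejected, which mirrors the terminal role of goals in the model.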
At the first step (0), the attacker does not have knowledge of the services the server is exposing. After a reconnaissance phase, the attacker identifies the presence of a web server and initiates an analysis and testing phase on it (1). At the end of this phase, the attacker has gained further knowledge and discovered that there is a "PDF Converter" functionality on the web server with a known CVE, which allows "remote command injection" and thus a reverse shell on the server. Consequently, by exploiting the CVE (2), the attacker gains access to a reverse shell and achieves Arbitrary Code Execution on the remote server (G). The model introduced aims to provide a comprehensive perspective on the specific attacker’s strategies and processes over a scenario described by means of a knowledge graph.

Conclusion

The domain of Cyber Ranges within the cybersecurity landscape covers a vast range of challenges that can be approached from various perspectives. Currently, there is a lack of comprehensive Knowledge Graph-based models capable of representing any cyber-physical system or object as a Digital Twin. Many existing solutions are not suitable for specific devices such as Industrial Control Systems and IoT devices. Hence, the primary foundational step of this research involves finalizing the definition of the general model for representing Digital Twins. Additionally, by proposing a new Attack Graph model within the CR context, this research contributes to advancing the efficacy and versatility of cyber defense strategies. My definition of the attack graph has made it possible to represent threats intuitively and to clearly outline the potential phases of a cyber attack.

Acknowledgements

This work was partially supported by project SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union – NextGenerationEU.

References

[1] A. M. Madni, C. C. Madni, S. D. Lucero, Leveraging digital twin technology in model-based systems engineering, Syst. 7 (2019) 7. URL: https://api.semanticscholar.org/CorpusID:86548244.
[2] A. Hogan, E. Blomqvist, M. Cochez, C. d’Amato, G. D. Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo, R. Navigli, S. Neumaier, et al., Knowledge graphs, ACM Computing Surveys (Csur) 54 (2021) 1–37.
[3] C. Ramonell, R. Chacón, H. Posada, Knowledge graph-based data integration system for digital twins of built assets, Automation in Construction 156 (2023) 105109. URL: https://www.sciencedirect.com/science/article/pii/S0926580523003692. doi:https://doi.org/10.1016/j.autcon.2023.105109.
[4] J. Akroyd, S. Mosbach, A. Bhave, M. Kraft, Universal digital twin - a dynamic knowledge graph, Data-Centric Engineering 2 (2021) e14. doi:10.1017/dce.2021.10.
[5] G. Epiphaniou, M. Hammoudeh, H. Yuan, C. Maple, U. Ani, Digital twins in cyber effects modelling of iot/cps points of low resilience, Simulation Modelling Practice and Theory 125 (2023) 102744.
[6] E. Russo, G. Costa, G. Longo, A. Armando, A. Merlo, Lidite: A full-fledged and featherweight digital twin framework, IEEE Transactions on Dependable and Secure Computing 20 (2023) 4899–4912. doi:10.1109/TDSC.2023.3236798.
[7] M. Lucchese, F. Lupia, M. Merro, F. Paci, N. Zannone, A. Furfaro, HoneyICS: A High-interaction Physics-aware Honeynet for Industrial Control Systems, in: Proceedings of the 18th International Conference on Availability, Reliability and Security, ARES ’23, Association for Computing Machinery, New York, NY, USA, 2023. URL: https://doi.org/10.1145/3600160.3604984. doi:10.1145/3600160.3604984.
[8] F. Lupia, M. Lucchese, M. Merro, N. Zannone, ICS Honeypot Interactions: A Latitudinal Study, in: 2023 IEEE International Conference on Big Data (BigData), 2023, pp. 3025–3034. doi:10.1109/BigData59044.2023.10386497.
[9] G. Longo, F. Lupia, A. Pugliese, E. Russo, Physics-aware targeted attacks against maritime industrial control systems, Journal of Information Security and Applications 82 (2024) 103724. doi:https://doi.org/10.1016/j.jisa.2024.103724.
[10] G. Longo, A. Orlich, A. Merlo, E. Russo, Enabling real-time remote monitoring of ships by lossless protocol transformations, IEEE Transactions on Intelligent Transportation Systems 24 (2023) 7285–7295. doi:10.1109/TITS.2023.3258365.
[11] G. Fortino, C. Greco, A. Guzzo, M. Ianni, Neural network based temporal point processes for attack detection in industrial control systems, in: 2022 IEEE International Conference on Cyber Security and Resilience (CSR), 2022, pp. 221–226. doi:10.1109/CSR54599.2022.9850333.
[12] G. Fortino, C. Greco, A. Guzzo, M. Ianni, Identification and prediction of attacks to industrial control systems using temporal point processes, Journal of Ambient Intelligence and Humanized Computing (2022). URL: https://doi.org/10.1007/s12652-022-04416-5. doi:10.1007/s12652-022-04416-5.
[13] G. Greco, F. Lupia, F. Scarcello, Coalitional games induced by matching problems: Complexity and islands of tractability for the Shapley value, Artif. Intell. 278 (2020).
[14] G. Greco, F. Lupia, F. Scarcello, Structural tractability of shapley and banzhaf values in allocation games, in: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, AAAI Press, 2015, pp. 547–553.
[15] S. Saraeian, B. Shirazi, Process mining-based anomaly detection of additive manufacturing process activities using a game theory modeling approach, Computers & Industrial Engineering 146 (2020) 106584.
[16] M. Ianni, E. Masciari, Scout: Security by computing outliers on activity logs, Computers & Security 132 (2023) 103355. URL: https://www.sciencedirect.com/science/article/pii/S0167404823002651. doi:https://doi.org/10.1016/j.cose.2023.103355.
[17] G. Greco, A. Guzzo, F. Lupia, L. Pontieri, Process Discovery under Precedence Constraints, ACM Trans. Knowl. Discov. Data 9 (2015) 32:1–32:39.
[18] M. L. Bernardi, M. Cimitile, F. M. Maggi, Data-aware process discovery for malware detection: an empirical study, Mach. Learn. 112 (2023) 1171–1199.
[19] C. Greco, M. Ianni, A. Guzzo, G. Fortino, Explaining binary obfuscation, in: 2023 IEEE International Conference on Cyber Security and Resilience (CSR), 2023, pp. 22–27. doi:10.1109/CSR57506.2023.10224825.
[20] F. Lupia, A. Mendicelli, A. Ribichini, F. Scarcello, M. Schaerf, Computing the Shapley value in allocation problems: approximations and bounds, with an application to the Italian VQR research assessment program, J. Exp. Theor. Artif. Intell. 30 (2018) 505–524.
[21] G. Greco, F. Lupia, F. Scarcello, The tractability of the shapley value over bounded treewidth matching games, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, ijcai.org, 2017, pp. 1046–1052. URL: https://doi.org/10.24963/ijcai.2017/145. doi:10.24963/IJCAI.2017/145.
[22] F. Buccafurri, V. De Angelis, M. F. Idone, C. Labrini, A protocol for anonymous short communications in social networks and its application to proximity-based services, Online Social Networks and Media 31 (2022) 100221. URL: https://www.sciencedirect.com/science/article/pii/S2468696422000258. doi:https://doi.org/10.1016/j.osnem.2022.100221.
[23] F. Buccafurri, V. De Angelis, M. F. Idone, C. Labrini, S. Lazzaro, Achieving sender anonymity in tor against the global passive adversary, Applied Sciences 12 (2022). URL: https://www.mdpi.com/2076-3417/12/1/137. doi:10.3390/app12010137.
[24] F. Buccafurri, V. de Angelis, S. Lazzaro, Mqtt-a: A broker-bridging p2p architecture to achieve anonymity in mqtt, IEEE Internet of Things Journal 10 (2023) 15443–15463. doi:10.1109/JIOT.2023.3264019.
[25] L. Sadlek, P. Celeda, D. Tovarnak, Identification of attack paths using kill chain and attack graphs, in: NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, IEEE, 2022.
[26] O. Sheyner, J. Haines, S. Jha, R. Lippmann, J. M. Wing, Automated generation and analysis of attack graphs, in: Proceedings 2002 IEEE Symposium on Security and Privacy, IEEE, 2002, pp. 273–284.
[27] X. Ou, W. F. Boyer, M. A. McQueen, A scalable approach to attack graph generation, in: Proceedings of the 13th ACM Conference on Computer and Communications Security, CCS ’06, Association for Computing Machinery, New York, NY, USA, 2006, pp. 336–345.