Independent Verification and Diversity: Two Echelons of Cyber Physical Systems Safety and Security Assurance

Vyacheslav Kharchenko

KhAI - National Aerospace University “Kharkiv Aviation Institute”, Chkalov st. 17, Kharkiv, Ukraine

Abstract
Concepts of safety and security for cyber physical systems (CPSs) are analysed in the context of their interaction with the environment. Models of safety and security, their interconnection and their attributes (functional safety, Internet safety, labor and occupational safety; cyber security, confidentiality, integrity, accessibility and physical security) are discussed for CPSs operating in an information and physical environment, taking into account the issue of common cause and common time failures. Independent verification and validation (IV&V) and the D3 (Defence-in-Depth and Diversity) approach are two echelons of CPS protection against cyber and physical attacks and against failures caused by physical and design faults. The IV&V techniques (XMECA, XBD, XTA, XIT, etc.) are analysed from the point of view of different safety and security attributes. The multi-FIT technique is described as an example for CPS safety assessment. The application of diversity for safety and security assurance is discussed.

Keywords
Cyber physical systems, safety, security, independent verification and validation, diversity, common cause failure

1. Introduction
Cyber physical systems (CPSs) for critical applications, such as NPP reactor trip systems, aerospace on-board and launch-abort control systems, and railway interlocking and block signal systems, are an important factor in the safety of any country. Assurance of CPS safety and security is one of the key problems researched and advanced by scientists and engineers. A high level of CPS safety and security can be achieved by enhancing and implementing methods, techniques and technologies for the regulation, assessment and improvement of dependability and its attributes. There are two main approaches to assuring safety and security.
The first is rigorous verification and validation, which minimizes the number of design faults and vulnerabilities or, theoretically, excludes them. This is the process-based echelon of protection for a CPS under development. The second echelon of protection is based on redundant structures, especially version redundancy, which tolerate component failures caused by physical, design and interaction faults [1]. Diversity is one of the general principles of designing fault- and intrusion-tolerant computer-based CPSs; it increases dependability, decreases the risk of common cause failure (CCF) and optimizes costs considering the consequences of severe accidents [2-4].

The objectives of the paper are the following:
• to analyse the concepts of safety and security for CPSs in the context of interaction with the physical and information environment;
• to discuss the problem of CPS safety and security assessment and assurance considering the CCF issue;
• to research the independent verification and validation techniques (XMECA, XBD, XTA, XIT, etc.) and the D3 (Defence-in-Depth and Diversity) approach as two barriers against attacks and failures caused by physical and design faults.

ICT&ES-2020: Information-Communication Technologies & Embedded Systems, November 12, 2020, Mykolaiv, Ukraine
EMAIL: v.kharchenko@csn.khai.edu (V. Kharchenko)
ORCID: 0000-0001-5352-077X (V. Kharchenko)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

The structure of the paper corresponds to the objectives. Section 2 describes the concepts of CPS safety and security in the context of CCF. Sections 3 and 4 discuss the two echelons of CPS protection: independent verification and validation and the D3 principle. Section 5 concludes and formulates future research directions.

2. CPS safety and security in context of common cause failure
2.1. Safety and security model
The interconnection between functional safety and information (cyber) security as attributes of “big safety” is described by Figure 1. According to [4,5], safety is an attribute defining how a CPS, directly or via the controlled object, impacts the physical environment (PE) and the information environment (IE) (Figure 1,a) and decreases the risks of accidents. Failures of safety critical I&C systems increase such risks. Information (cyber) and physical security define the degree of influence of the IE and PE on the system (blue and brown arrows, Figures 1,b-d). An insecure influence of the IE on a safety critical system can cause failures and an unsafe influence of the system on the environment (dotted blue arrow, Figure 1,c). A more detailed analysis of the influence of the IE and PE on a safety critical system, and of its influence on the environment, is illustrated by Figure 1,d; the elements of the notation are described by Figure 1,e.

There are two types of attacks on a CPS: attacks on integrity (and accessibility, or availability) and attacks on confidentiality. The first type causes failures and can be the reason for an unsafe impact of the CPS on the IE and PE. The second type leads to the disclosure of confidential data and can enable more successful attacks on integrity and accessibility. The influence of the PE can cause fatal failures and a corresponding influence of the CPS on the PE and IE. If CPS safety depends on cyber security as a part of information security, using the concept of “cyber safety” as a part of safety is justified [6].

Figure 1: Models of CPS safety and security: model of safety (a), models of security (b) and of the influence of security on safety (c), model of interaction of CPS and environment (d), notation of the models (e)

A more detailed analysis of the different attributes of safety (functional, Internet and labor safety) and of security, including information security (confidentiality, integrity and accessibility) and physical security, is given in Table 1.
It describes the influence of the physical and information environment on all types of safety and security, and the influence of the safety and security attributes on the physical and information environment. In addition, the level of potential effects is analysed (local, for the controlled object only, and global, similar to NPP accidents). The influence of an attribute is marked by “+”.

Table 1
Influence of safety and security attributes on environment
Columns: Safety & Security | Types | Influence of environment (PE, IE) | Influence on environment (PE, IE) | Effects (Local, Global)
Safety | Functional safety | + + + (via controlled object) + +
Safety | Internet safety | + + + +
Safety | Labor safety | + + (via controlled object) +
Security | Information (cyber) security: Confidentiality | + + (via integrity, accessibility) + +
Security | Information (cyber) security: Integrity | + + (via functional safety) + + +
Security | Information (cyber) security: Accessibility | + + (via functional safety) + + +
Security | Physical security | + + (via human) + +

2.2. Common cause and common time failures
One of the key problems of CPS safety (and security) assurance is minimizing, or excluding altogether, the risk of common cause failure (CCF). A CCF is an event in which e_f (two or more) channels (versions) of a redundant e-channel (e-version) system fail one by one or simultaneously and there is a common reason causing this event. In any case, a CCF is a multiple failure (MF) of the CPS, unlike a single failure (SF) of one of the redundant channels. It should be emphasized that an MF does not occur only as the result of one (common) cause. It may be caused by several different reasons that coincide, or whose failure times are spread over an interval that does not exceed the response time of on-line testing and reconfiguration. Such a multiple failure is called a common time failure (CTF); like a CCF, it is a common event failure (CEF) [6]. The classification of common cause and common time failures is shown in Figure 2.
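The distinction drawn above between single, common cause and common time failures can be sketched in a few lines of Python. This is only an illustration under assumptions: the `ChannelFailure` record, the `classify` function, the cause labels and the extra outcome "separate SFs" are hypothetical and are not part of the classification in [6]. The only rules taken from the text are that a CCF requires a common reason, and that a CTF requires the spread of failure times not to exceed the response time of on-line testing and reconfiguration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelFailure:
    channel: str   # identifier of the failed channel (version)
    time_h: float  # failure time, hours
    cause: str     # label of the fault or attack behind the failure (hypothetical)


def classify(failures, response_time_h):
    """Classify a group of channel failures using the illustrative rules above."""
    channels = {f.channel for f in failures}
    if not channels:
        return "none"
    if len(channels) == 1:
        return "SF"  # single failure of one redundant channel
    causes = {f.cause for f in failures}
    if len(causes) == 1:
        return "CCF"  # several channels fail for one common reason
    times = [f.time_h for f in failures]
    if max(times) - min(times) <= response_time_h:
        return "CTF"  # different reasons, but within the response window
    return "separate SFs"  # assumed outcome: reconfiguration could react in between


if __name__ == "__main__":
    group = [
        ChannelFailure("A", 10.0, "wear-out"),
        ChannelFailure("B", 10.2, "attack"),
    ]
    print(classify(group, response_time_h=0.5))
```

The sketch checks the common-reason rule first, so channels that fail one by one for the same reason are still reported as a CCF regardless of the time between them, in line with the definition above.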
Figure 2: Classification of common cause and common time failure types and their reasons

In addition to the concepts considered above, three attributes should be specified:
- reasons (physical faults, design faults and vulnerabilities of hardware (HW) and software (SW));
- number of failed channels (versions) (partial and full CCFs, i.e. PCCFs and FCCFs, and partial and full CTFs, i.e. PCTFs and FCTFs);
- matching of channel output data in case of failures, i.e. matching (MCCFs, MCTFs) and different (DCCFs, DCTFs) failures.

Two preliminary conclusions are important for safety critical CPSs. Firstly, CTFs are an important subject of research because there are examples of serial failures caused by attacks on vulnerabilities of redundant channels and by combined reasons. Secondly, a very important task is to analyse and, if possible, assure the distinguishability of failure effects (output data of failed channels) in order to establish the fact of a partial or full common cause or common time failure.

2.3. IV&V-D3: two echelons of common failures protection
The problem of decreasing CCF risks can be solved by two approaches (Figure 3):
• minimizing latent faults, first of all design faults and vulnerabilities. For this, techniques of verification and validation (V&V) of developed or modernized CPSs (hardware, software, FPGA components, platforms, etc.) have to be applied. There are rigorous requirements for V&V of safety critical CPSs such as NPP I&C systems, including a requirement for independence of the verification and validation teams, processes, techniques and tools. V&V performed by an organizationally and/or financially independent team is called independent V&V (IV&V). Implementing IV&V allows detecting faults that have not been detected by the developers or QA specialists of the company;
• application of diversity as a part of the more general principle D3 (Defense-in-Depth & Diversity) [7] to provide trusted fault-, vulnerability- and intrusion-tolerance during CPS operation.
D3 is a horizontal/vertical defense echelon consisting of n subechelons e_i and m version redundancy types vr_j (Figure 3) [6, 8, 9]. Diversity, and multi-diversity in which several types of process and product version redundancy are applied, decrease the risks of common cause failure and common time failure during the operation stage of CPSs.

Figure 3: Two echelons of CCF protection: independent verification and validation and the D3 (defence-in-depth and diversity) approach; e_i – echelons of protection in depth, vr_j – types of version redundancy

These approaches are two echelons of CCF/CTF protection implementing the DET principle: “to detect – to eliminate (faults detected during V&V) – to tolerate (residual/undetected faults during operation)”.

3. Independent verification and validation techniques: the first echelon
3.1. Methods of safety and security assessment and V&V techniques
There are many methods of CPS safety and security assessment and V&V techniques used both by developers and QA engineers of companies and by independent verifiers, such as [4, 9, 11]:
• XME(C/D)A, X (Failure, Software failure, Intrusion, …) Modes and Effects C/D (Criticality/Diagnostics) Analysis;
• XBD, X (Reliability, Safety, Security, Trustworthiness, …) Block Diagrams;
• XTA, X (Failure, Attack, Non-availability, …) Tree Analysis;
• XIT, X (Fault, Software fault, Vulnerability, …) Injection Testing;
• HAZOP(X), Hazard and Operability Analysis (X – for safety, security);
• MM(X), Markov Models (X – availability, dependability, safety, security);
• other techniques based on CCF analyses, model checking, formal methods and so on.

The V&V techniques also include more software- and documentation-based procedures, such as review of documents (static analysis, review of verification and validation plans and reports, check-list based analysis and so on).
Table 2 summarizes the results of the analysis of the applicability of these techniques for assessing different safety and security attributes. The following marks are used:
• applicable technique, +;
• can be applicable, (+);
• can’t be applicable, x.

Table 2
Analysis of applicability of assessment methods and V&V techniques for safety and security analysis
Safety & Security | Types | XME(C/D)A | XBD | XTA | XIT | HAZOP(X) | MM(X) | Others
Safety | Functional safety | + | + | + | + | + | + | +
Safety | Internet safety | (+) | x | (+) | x | (+) | x | +
Safety | Labor safety | + | x | + | x | + | (+) | +
Security | Information (cyber) security: Confidentiality | + | (+) | + | + | (+) | + | +
Security | Information (cyber) security: Integrity | + | (+) | + | + | (+) | + | +
Security | Information (cyber) security: Accessibility | + | (+) | + | + | (+) | + | +
Security | Physical security | + | + | + | (+) | (+) | + | +

Two preliminary conclusions are the following:
- in fact, all methods and techniques that were initially developed and are used to assess functional safety have analogues for assessing security and cyber security. For example, the FME(C)A technique (Failure ME(C)A) was modified for security assessment as IME(C)A (Intrusion Modes and Effects (Criticality) Analysis). The distinguishing feature of IMECA is that a failure is considered as a “vulnerability-attack” pair or as a combination of threats, vulnerabilities and attacks/intrusions [4, 11];
- the methods of assessment and V&V procedures can be used in combination. For that, a special graph model can be used that describes the different ways to obtain the required measures or V&V results, or to assure a high level of trustworthiness by obtaining the required measures with different combinations of the techniques.

3.3. Multi-FIT based verification
Fault (and vulnerability) injection testing (FIT) is one of the techniques applied for IV&V in accordance with the requirements of standards for safety critical CPSs. The goals of FIT are to assess test quality, considering test coverage and trustworthiness, to assess the efficiency of on-line testing, and to analyse fault- and intrusion-tolerance (with respect to design and physical faults).
“Natural” failures of complex SW, HW and CPSs are multiple failures caused by physical and design faults and by attacks with different scenarios. The main challenges of multiple fault injection testing (multi-FIT) are the complexity and time of verification (in general the number of faults equals 2kmn, where n is the number of faults, k is the number of fault types and m is the number of CPS levels), mutation/masking of faults, and blockage of verifiable performance. The standard NUREG/CR-7151 recommends employing multi-FIT, but it does not describe injection procedures. To address these challenges, two approaches can be applied [12]:
• development of injectable projects, i.e. assuring the ability to inject faults into the actual/specified physical scheme or code (FITability) in order to optimize the points and means of injection;
• implementing a multi-FIT technique based on a modified t-wise procedure and on operations of de-masking and de-blockage of injected fault subsets (Figure 4).

The following future steps are important from the research and practical points of view:
• development of techniques and tools that take into account the possibilities of injecting different fault/vulnerability types into different CPS components and system levels. For FPGA-based systems these may be physical faults injected at the module and chip levels, and design faults and vulnerabilities injected into VHDL code and top-level software code;
• development of methods assuring the ability to perform multi-fault injections, i.e. multi-FIT-ability.

Figure 4: IDEF diagram of multi-FIT technique [12]

4. Diversity and defence-in-depth: the second echelon
4.1. Multi-version computing and classification of version redundancy
Diversity is a basic principle of multi-version computing.
The main concepts of multi-version computing are the following:
• a version is an option of a different product and/or process realization of CPS function(s);
• version redundancy (VR) is a type of redundancy in which different versions are used;
• diversity or multiversity (MV) is the principle providing for the use of several versions;
• a multi-version system (MVS) is a system in which several versions are used;
• a multi-version technology (MVT) is a set of interconnected rules and design actions in which several version processes, leading to the development of two or more intermediate or end products, are used;
• a multi-version project (MVP) is a project in which the MVT is applied to create a one- or multi-version system;
• a strategy of diversity (MV) is a collection of general criteria and rules defining the principles of formation and selection of version redundancy types and volume, or of MVTs;
• a diversity metric is an indicator for assessing the level of diversity of versions.

To assess CPS safety measures, especially the probability of common cause failure, it is necessary to evaluate the diversity metrics [4, 9]. Figure 5 presents a set model of version faults (attacked vulnerabilities) causing failures. For a one-version, one-channel system (Figure 5,a) the number of single faults equals N (N = Card SF). In this case any fault of the set SF is fatal and is, in fact, a CCF. Hence the β-factor, a CCF metric defined as the ratio of the number of faults causing CCFs to the total number of such faults, equals one ($\beta = 1$).

The metrics of a two-version system (Figure 5,b,c) can be evaluated as follows:
$\beta = N_{CCF} / N$, $N_{CCF} = \mathrm{Card}(SF_1 \cap SF_2)$; $N = (N_1 + N_2) / 2$, $N_i = \mathrm{Card}\, SF_i$; $\delta = 1 - \beta$; $\beta_d = N_{MCCF} / N$; $\bar{\beta}_d = N_{DCCF} / N$; $\beta = \beta_d + \bar{\beta}_d$.
Metrics of the relative number of MCCFs and DCCFs (see Figure 2): $\beta^*_d = \beta_d / \beta$, $\bar{\beta}^*_d = \bar{\beta}_d / \beta$.

For a three-version system (Figure 5,d), $\delta = 1 - \beta - 2\gamma$, where $\gamma$ is the metric determining the share of CCFs of any two versions (PCCF), $\gamma = 2N_{PCCF} / N$. If $\gamma = 0$ (Figure 5,e), $\delta = 1 - \beta$.
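The two-version ratios above can be computed directly from fault sets. The following Python sketch is an illustration under assumptions, not the evaluation procedure of [4, 9]: the function name `two_version_metrics`, the integer fault identifiers and the `matching` argument (the common faults assumed to produce matching channel outputs) are hypothetical, and the dictionary keys are this sketch's own labels for the ratios N_CCF / N, N_MCCF / N and N_DCCF / N.

```python
def two_version_metrics(sf1, sf2, matching=frozenset()):
    """Set-based CCF ratios for a two-version system (illustrative sketch)."""
    sf1, sf2 = set(sf1), set(sf2)
    common = sf1 & sf2                    # N_CCF = Card(SF1 ∩ SF2)
    n = (len(sf1) + len(sf2)) / 2         # N = (N1 + N2) / 2
    if n == 0:
        raise ValueError("both fault sets are empty")
    n_mccf = len(common & set(matching))  # common faults with matching outputs
    n_dccf = len(common) - n_mccf         # common faults with different outputs
    beta = len(common) / n
    return {
        "beta": beta,                     # N_CCF / N
        "one_minus_beta": 1 - beta,       # share of faults that are not common
        "beta_matching": n_mccf / n,      # N_MCCF / N
        "beta_different": n_dccf / n,     # N_DCCF / N
    }


if __name__ == "__main__":
    # Hypothetical fault identifiers: versions share faults 7, 8 and 9,
    # and only fault 7 is assumed to give matching outputs.
    print(two_version_metrics(range(0, 10), range(7, 17), matching={7}))
```

In practice the fault sets are not known in advance; the sketch only shows how the ratios relate to each other once the sets have been estimated, for example from testing and operation statistics.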
These types of faults and vulnerabilities and the metrics can be used to build a profile of injected faults for FIT-based verification of multi-version CPSs. The metric values are determined from statistics of testing and operational failures and by expert methods [9].

Figure 5: Models of fault sets of one-version (a), two-version (b,c) and three-version (d,e) CPS

4.2. Application of defence-in-depth and diversity for safety and security assurance
A classification of different diversity types, and of D3 in general, is described in [7]. Table 3 contains the results of the applicability analysis of diversity and defence-in-depth (DiD) for assuring CPS safety and security. The following marks are used:
• applicable type of diversity, +;
• type of diversity can be applicable, (+);
• type of diversity can’t be applicable, x.

Table 3
Influence of diversity and defence-in-depth on safety and security assurance
Safety & Security | Types | Diversity: Signal | Diversity: Functional | Diversity: Equipment | Diversity: Software | Diversity: Design | Diversity: Human | Influence of DiD
Safety | Functional safety | + | + | + | + | + | + | +
Safety | Internet safety | x | x | x | x | x | x | (+)
Safety | Labor safety | + | x | + | x | + | + | +
Security | Information (cyber) security: Confidentiality | x | (+) | (+) | (+) | (+) | + | +
Security | Information (cyber) security: Integrity | + | + | + | + | + | + | +
Security | Information (cyber) security: Accessibility | + | + | + | + | + | + | +
Security | Physical security | + | + | + | + | + | + | +

Let us analyse two examples of applying diversity to assess and improve safety and security. In the first case the CPS has hardware and software diversity. The dependencies of up-state probabilities on time for the two-version structures are illustrated by Figure 6 [6]. The initial data for modelling are the following: failure rate of a version (channel) $\lambda_{version} = 3 \cdot 10^{-5}$ 1/h; metrics of diversity for physical and design HW faults, indexed Hp and Hd, equal to 0 and 0.2; metric of diversity for SW design faults, indexed Sd, equal to 0.8; SF metrics for one version, indexed hp, hd and sd, all equal to 1/3.
Figure 6: Dependencies of the CPS up-state probability on time for a one-version system (1V) and for two-version systems with diverse SW (2V SW), diverse HW (2V HW) and both diverse SW and HW (2V) versions

The second case describes a security assessment of an FPGA-based MVS. Table 4 summarizes some attacks and the results of their assessment using IMECA analysis. The table contains countermeasure strategies that could be applied as requirements from Regulatory Guide 5.71:2010 (Cyber Security Programs for Nuclear Facilities, U.S. NRC) to eliminate the attack causes and, moreover, the diversity types of the FPGA-based MVS and their attributes as countermeasures [4].

Table 4
The results of FPGA-based MVS security assessment using IMECA technique

No. 1 / 1
Attack mode: Readback
Attack nature: Active
Attack cause: Absence of chip security bit and/or availability of physical access to chip interface (e.g. Joint Test Action Group, JTAG)
Type of effects: Obtaining of secret information by adversary
Occurrence probability: M
Effect severity: H
Countermeasures (including RG 5.71): The use of security bit; application of physical security controls (B.1.18 Insecure and Rogue Connections, Appendix B to RG 5.71, Page B-6)
FPGA-based CPS diversity types and their attributes: Diversity of electronic elements (EEs): different technologies of EEs production

No. 3 / 3
Attack mode: Brute force
Attack nature: Active
Attack cause: Search for a valid output attempting all possible key values; exhaustion of all possible logic inputs to a device in order; gradual variation of the voltage input and other environmental conditions
Type of effects: Leak of undesirable information
Occurrence probability: L
Effect severity: M
Countermeasures (including RG 5.71): Detecting and documenting unauthorized changes to software and information (C.3.7, Appendix C to RG 5.71, Page C-7)
FPGA-based CPS diversity types and their attributes: Diversity of project development languages: combination of couples of diverse CASE-tools and HDLs

No. 4 / 3
Attack mode: Fault injection (glitch)
Attack nature: Active
Attack cause: Altering the input clock; creating momentary over- or under-shoots to the supplied voltage
Type of effects: Device to execute an incorrect operation; device left in a compromising state; leak of secret information
Occurrence probability: M
Effect severity: H
Countermeasures (including RG 5.71): Making sure all states are defined and, at the implementation level, verifying that glitches cannot affect the order of operations; detection of voltage tampering from within the device; clock supervisory circuits to detect glitches
FPGA-based CPS diversity types and their attributes: Diversity of EE: different manufacturers of EEs, different technologies of EEs production; diversity of scheme specification (SS): different SSs, combination of diverse CASE tools and SSs

5. Conclusions and recommendations
The problem of the “last faults” is one of the most challenging for critical cyber physical systems and, for commercial applications, a reputational one. There are two key approaches to minimizing the risk of failures caused by design (SW/FPGA) faults and by attacks on vulnerabilities: independent V&V and diversity. X (fault, vulnerability, anomaly) injection based techniques (X/FIT) are among the efficient V&V techniques. Important tasks are fault profiling, FIT coverage and FIT-ability, and multi-FIT and its tools. Systematization and aggregation of V&V techniques allow achieving higher accuracy and trustworthiness.
Diversity minimizes common cause failure (CCF) risk. Key problems are the assessment of CCF risk, the implementation of new types of internal/external diversity, the formal choice and combination of different types of version redundancy, multi-fault/vulnerability injection for multi-version systems and so on.

6. References
[1] A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr, Basic Concepts and Taxonomy of Dependable and Secure Computing, IEEE Transactions on Dependable and Secure Computing 1 (2004) 11-33.
[2] N. Leveson, Safeware: System Safety and Computers, Addison-Wesley, 1995.
[3] C. Harvey, N. Stanton, Safety in System-of-Systems: Ten key challenges, Safety Science 70 (2014) 358-366.
[4] M. Yastrebenetsky, V. Kharchenko (Eds.), Security and Safety of Nuclear Power Plant Instrumentation and Control Systems, Hershey, Pennsylvania, United States of America, IGI Global, 2020, 501 p.
[5] A. Kornecki, N. Subramanian, J. Zalewski, Studying Interrelationships of Safety and Security for Software Assurance in Cyber-Physical Systems, in: Proceedings of the Federated Conference on Computer Science and Information Systems, 2013.
[6] V. Kharchenko, Big Data and Internet of Things for Safety Critical Applications: Challenges, Methodology and Industrial Cases, International Journal on Information Technologies and Security 4 (2018) 3-16.
[7] NUREG/CR-7007:2009. Diversity Strategy for Nuclear Power Plant Instrumentation and Control Systems. URL: https://www.nrc.gov/docs/ML1005/ML100541256.pdf
[8] V. Kharchenko, Diversity for Safety and Security of Embedded and Cyber Physical Systems: Fundamentals Review and Industrial Cases, in: Proceedings of 15th Biennial Baltic Electronics Conference, 2016, pp. 21-30.
[9] V. Kharchenko, A. Siora, E. Bakhmach, Diversity-Scalable Decisions for FPGA-Based Safety-Critical I&Cs: from Theory to Implementation, in: Proceedings of the 6th ANS International Topical Meeting on Nuclear Plant Instrumentation, Controls, and Human Machine Interface Technology, NPIC&HMIT2009, Knoxville, TN, USA: American Nuclear Society, 2009, pp. 11-18.
[10] IEC 60812:2018. Failure modes and effects analysis (FMEA and FMECA), 2018. URL: https://webstore.iec.ch/publication/26359
[11] E. Babeshko, V. Kharchenko, A. Gorbenko, Applying F(I)MEA-technique for SCADA-Based Industrial Control Systems Dependability Assessment and Ensuring, in: 2008 Third International Conference on Dependability of Computer Systems DepCoS-RELCOMEX, Szklarska Poreba, Poland, 2008.
[12] O. Odarushchenko, V. Kharchenko, V. Sklyar, Multi-Fault Injection Testing: Cases for FPGA-Based NPP I&C Systems, in: Proceedings of ICONE-23 23rd International Conference on Nuclear Engineering, May 17-21, Chiba, Japan, 2015, pp. 31-38.