=Paper=
{{Paper
|id=Vol-1084/paper1
|storemode=property
|title=Modeling and Verification of Redundancy Policies
|pdfUrl=https://ceur-ws.org/Vol-1084/paper1.pdf
|volume=Vol-1084
|dblpUrl=https://dblp.org/rec/conf/models/ChouhCJBT13
}}
==Modeling and Verification of Redundancy Policies ==
Modeling and Verification of Redundancy Policies? Hamza Chouh, Charlotte Callon, Ghita Jalal Frédéric Boulanger, and Safouan Taha Supélec E3S – Computer Science Department firstname.lastname@supelec.fr Abstract. In this paper, we present a metamodel for specifying redund- able software and hardware architectures. This metamodel takes into account the constraints on the number of redundant elements, the num- ber of allowed failures, the execution times and allocation constraints. From such a specification, we generate all possible structural configura- tions. Then, we check that each of these configurations can be scheduled. This has been implemented as a tool chain relying on Alloy, SynDEx, and model transformations in Eclipse/EMF. This work allows system architects to explore different hardware and software architectures to implement different redundancy policies. It has been applied on a simple case study from the Ariane V launcher. 1 Introduction A large number of fields such as industry, medicine and transport have a need for highly complex systems that are required to provide their functionalities under constraints of safety and reliability. Since computer systems and software are increasingly used in such critical systems, it is well known that both software and hardware are likely to present some defects that can lead to failures. Handling these defects without any catastrophic consequences is required when there is no guarantee of removing all the defects from the system design. A way to design such systems is to provide redundancy, that is to duplicate some critical parts of hardware and/or software. Hardware redundancy is a common prac- tice, but software redundancy is also essential since a software execution can be corrupted by, for example, memory faults. In this paper, we first define a meta-model that helps designing redun- dant systems. Our meta-model provides the necessary constructs to spec- ify both hardware and software architectures together with allocation and timing constraints. Then, we give a tool chain to exploit such ab- stract models to generate concrete redundant configurations. The first step to model the redundant system is to give a high-level description of its hardware and software architectures in terms of com- ponents, without taking redundancy into consideration. This description can be thought of as a functional decomposition of the system. Then, to ? This work has been supported by CNES (http://www.cnes.fr). provide information about redundancy, the user will be able to specify the following constraints: Policy constraints: choice of cold, warm or hot redundancy, number of copies of a hardware or a software component, number of failures the system must support; Allocation constraints, or how software components can be mapped onto hardware components; Communication constraints between software components that are re- lated to control or data exchanges; Timing constraints about both the communications over the buses and the execution of the software components on the hardware compo- nents to which they may be allocated. From this user model, our tool chain will generate (if any) hardware and software configurations that satisfy all the constraints on policy, al- location and communication. These concrete configurations instantiate the abstract model with precise information about replication of hard- ware and software components, allocations, connections to buses etc. Then, the tool chain will take the timing constraints into account to check which configurations can be scheduled and therefore executed. It will also provide a valid schedule of software executions on the available hardware. Redundancy model Alloy specification with constraints Alloy Redundancy Validated Alloy solution configuration schedule SynDEx model SynDEx Fig. 1. Tool chain and work flow Our tool chain, shown on Fig. 1, relies on the following parts: 1. The Redundancy Metamodel for describing the HW/SW architec- ture, the redundancy policy as well as allocation, communication and timing constraints, 2. A model transformation from the Redundancy Metamodel toward the Alloy[1] language to obtain a set of Alloy predicates that are solved to generate concrete configurations, 3. A model transformation from the Alloy output solutions back to the Redundancy Metamodel to analyze the generated configurations, 4. A model transformation from the Redundancy Metamodel that pro- duces SynDEx[2] code for checking if a concrete configuration can be scheduled. 2 Simple Case Study To show how redundant systems can be modeled using our meta-model, we will go through an example which is a very simple model of the Ariane V flight control-command system. 2.1 System architecture For the purpose of this case study, we decomposed the Ariane V hardware system as follows: – SRI (Inertial Reference System): the sensors, – OBC (On-Board Computer): the processors, – EPH (Hydraulic flight control Electronics): the actuators. Similarly, we decomposed the software system as follows: – Measures, getting the position and attitude of the launcher. – Computations of the necessary actions for following the trajectory. – Commands, such as increasing thrust or steering. 2.2 Policy Now that we have a functional decomposition of our system, we need to choose the type of redundancy - cold, warm or hot - and the number of copies for each component. For our example, the following redundancy policy can be chosen for the hardware: – Hot redundancy for sensors: all sensors are measuring all the time, – Warm redundancy for processors: all processors are initialized but only one of them has the control over the system, – Cold redundancy for actuators: only one actuator is running, the others are disconnected. The redundancy policy for software: – Hot redundancy for measures, – Warm redundancy for computations: several types of algorithms will be running but only one will provide results, – Cold redundancy for commands. For this example, we choose to have two copies of each component of the system and we want to handle one failure for each duplicated component. 2.3 Allocation Allocation consists in mapping each software component onto a hardware component. The meta-model we designed allows the allocation of several software components on several hardware components. For instance, we could choose to allocate measures and computations on processors or sensors, so that a measure could be done indifferently on a processor or a sensor. The same would go for computations. This can help covering more cases. For our example of the Ariane space launcher, we keep things simple and allocate: – Measures on sensors, – Computations on processors, – Commands on actuators. 2.4 Communication and Timing We know that the computations depend on measures and produce commands. We thus add data dependencies and the size of the corre- sponding data (to compute communication time). We also add informa- tion about the period of software tasks and their execution time on the given hardware, to obtain the final model shown on Fig. 2. hot [2] «allocated on» hot [2] Measure execTime=5 sensor period=45 «dependency» dataSize=5 warm [2] «allocated on» warm [2] computation execTime=20 processor period=45 «dependency» dataSize=10 cold [2] «allocated on» cold [2] command execTime=5 actuator period=45 software hardware Fig. 2. Ariane V redundancy model 3 Related Work Several approaches of redundancy can be found in the literature. Many focus on the automatic generation of fault-tolerant schedules, using as- sumptions on the failure model (fail-silent or fail-safe), on the homogene- ity of the hardware (same execution time for a task on all processors, same communication time for a piece of data on all buses) or on the number of supported faults [3–5]. Other approaches dynamically sched- ule backup tasks when a primary tasks fails [6]. Our approach aims at offering analysis tools to the designer of a system and of its redundancy policy. We therefore rely on previous work about fault tolerance and redundancy and try to harness existing tools in a tool chain. The meta- model we created for modeling the hardware and software components of a system and the redundancy constraints that they should satisfy, as well as the model transformations between this metamodel and the tool specific formats, allows us to share information among these tools and to exploit their results. We support different execution times on differ- ent processors and different communication times on different buses, as in [7]. The meta-model we propose here is inspired from the MARTE profile [8] (GRM and HLAM packages) and we added an extension for supporting abstract allocation constraints since MARTE’s Alloc package only models static allocation. 4 Redundancy Metamodel As shown in the left part of Fig. 3, our Ecore metamodel provides the user with a concise specification language to model the hardware/software architectures together with policy, allocation, communication and timing constraints. For this purpose, the Redundancy Metamodel provides the following concepts : Redundable a type of software or hardware elements that will be re- dundant. It has a maximum number of instances, a maximum num- ber of failures and a policy type. Allocation a mapping constraint between a software cluster and a hard- ware cluster (e.g. a measure must be executed on a sensor). It has an executionTime parameter to set the specific execution time of an instance of sw when allocated on an instance of hw. Dependency a need for communication between two software clusters (e.g. the need for an algorithm to get data from sensors). It has a dataSize parameter that models the size of the data that is sent from the src cluster to dst. Fig. 3. Redundancy Metamodel A typical example of a user model for a redundant system was given in Fig. 2. Our tool chain generates, from such a high-level redundant system model, many concrete configurations with precise connections to buses and pre- cise allocations between software and hardware instances. As shown in the right part of Fig. 3, our Redundancy Metamodel provides the follow- ing low level concepts to model these concrete configurations : Instance an element that will be part of the system and directly derive from a cluster. Each hardware or software instance has a state to describe whether it is running, passive or failed. Bus a communication medium between hardware instances. Communication a communication between two software instances. It instantiates a dependency between software clusters. An example of concrete configuration written in our meta-model is shown on Fig. 5 in section 5. The first usage of the redundancy metamodel is the specification of re- dundant systems at a high level of abstraction. The second usage of the redundancy metamodel is to provide a way to generate, view and mod- ify concrete configurations. From a high level description, we generate concrete configurations automatically using Alloy. 5 Solving Structural Constraints using Alloy To generate sets of hardware and software configurations that fulfill all the requirements induced by the constraints from the high level redun- dancy model, a constraint solver is needed. We chose to use Alloy [1] for this purpose since it is very well adapted to constraint solving over classes and object-oriented models. Alloy is the name of a constraint modeling language and of the associated constraint solver, both developed by the M.I.T.’s Software Design Group. We wrote a set of Alloy classes and predicates that match the elements of our redundancy metamodel (e.g. SWCluster, SWInstance. . . ). Then, we developed a model to text (M2T) transformation based on Java to translate a redundant system model to the equivalent Alloy declarations. We show in the left of Fig. 4 some of the classes and predicates we wrote, and in the right of the figure some of the generated Alloy code for the launcher case study. Finally, the whole code is given to the Alloy solver. abstract sig Redundable { sig SRI extends HWCluster {} instances: some Instance, sig OBC extends HWCluster {} nbInstances : one Int sig EPH extends HWCluster {} ... sig Measure extends SWCluster {} } sig Computation extends SWCluster {} ... sig Command extends SWCluster {} sig AllocationConstraint { ... sw: one SWCluster, fact allocationConstraints { hw: one HWCluster, allocationConstraint[SRI, Measure] execTime: one Int allocationConstraint[OBC, Computation] } { allocationConstraint[EPH, Command] all i: sw.instances | } i.allocatedOn.class in hw ... run compute for execTime >= 0 exactly 2 SRI_, } exactly 2 Measure_ ... ... Fig. 4. Example: Alloy specification To illustrate the use of Alloy for solving the constraints of a high level redundancy model, we over-constrained our case study: – Two different computations must be running (hot redundancy), – These two computations must be allocated on different processors. We recall that there are two processors and that one may fail. Running our M2T generator on the corresponding model and then run- ning Alloy on the generated code leads to an error message from Alloy that indicates that no configuration could be found. From this error, we can conclude that our variant of the launcher system is inconsistent. We therefore modify it by allowing the two computation programs to be allocated on the same processor and keeping the other constraints unchanged. Running Alloy on the model produced by our tool chain gives us the configuration shown in Fig. 5 (in which only instances and allocations are shown). measure_1 «allocated on» sensor_1 Running Running measure_2 «allocated on» sensor_2 Running Running computation_1 «allocated on» processor_1 Running Running computation_2 «allocated on» processor_2 Running NotWorking command_1 «allocated on» actuator_1 NotWorking NotWorking command_2 «allocated on» actuator_2 Running Running software instances hardware instances Fig. 5. Example: a generated configuration Now that we have consistent configurations fitting all the constraints that were specified using the Redundancy Metamodel, we need to check their schedulability without having to manually write code for a schedulability checker. 6 Checking Timing Constraints using SynDEx Running Alloy gives an allocation configuration that complies with the system architecture and its structural constraints (policy, allocation and communication constaints). It remains that timing constraints should be taken into account to ensure the schedulability. We use SynDEx [2, 9], a software developed by the INRIA Paris-Rocquencourt Research Center in France. SynDEx allows us to model the timing constraints on inter- connected systems and to check their schedulability. The SynDEx model is composed of three descriptions: – The description of the software application as a set of intercon- nected nodes. Each node represents a computation which can be run only when its input data is present at the entry ports. The con- nections between nodes represent data dependencies. – The description of the hardware architecture, which contains two types of elements: media and operators. Operators have communi- cation ports and processing time constraints, and media have data transportation time constraints. – The description of the allocation of the application onto the archi- tecture with timing constraints. From the configuration generated by Alloy, we get back to a more de- tailed model based on our meta-model using an ANTLR-based compiler we wrote. Using this model and the timing constraints initially entered using the redundancy metamodel as inputs, another Java-based M2T transformation in our tool chain creates a SynDEx representation of the model. Then we ask SynDEx to generate a schedule. If SynDEx returns a schedule, we know that the configuration is schedulable and correct. Else, we can analyze the result of SynDEx to understand why the scheduling fails. If we consider again the model of the launcher depicted in Fig. 2, and if we add hot redundancy for computations, Alloy finds an allocation configuration such as the one of Fig. 5. However, when run through SynDEx, it appears that this configuration is not schedulable. We may try to run SynDEx on some other redundancy configurations generated by Alloy. However, in this case, the inconsistency is actually located in the constraints we have chosen. Indeed, two computations must be able to run on the same remaining processor (when one processor fails) and to send data to actuators within a period of 45 cycles. Sending data requires 10 cycles and computing requires 20. This is why SynDEx is unable to schedule such a cycle: (20 + 10) ∗ 2 > 45. By allowing one of the two computations not to run (warm redundancy), Alloy generates a configuration that SynDEx is able to schedule, and we obtain the final scheduling shown on Fig. 6. The concrete configuration generated by Alloy in this case runs one instance of the computation on the remaining processor after the failure of the first processor. 7 Discussion The tool chain we developed provides a mean to model a system and redundancy constraints over it. Then, it generates allocation configura- tions that respect these constraints. Finally, the schedulability of these configurations can be checked. However, this work is a first approach to the modeling and verification of redundancy policies and has some limitations we are aware of. First, our meta-model does not consider the cost of reconfigurations be- tween different allocation configurations. We only take into account the Fig. 6. Scheduling example execution time of each software task when checking schedulability, and in some cases, this is not enough. Indeed, considering the example of Ari- ane V where OBCs (On-Board Computer) have warm redundancy policy, if one of the OBCs fails, some time is needed for the redundant one to be ready to take over, during which the system is unable to perform computations. This is not necessarily fatal but it needs to be modeled. More generally, going from an allocation configuration to another takes some time that may have consequences on the behavior of the system and is important to model. In addition, we started from the hypothesis of a fail-stop system, which means that we only consider failures that are detectable. Tolerance to silent failures is a complex problem that we did not address in this work. Moreover, the generated redundancy configurations are not sorted, which means that some configurations could be better optimized from a schedu- lability point of view when others consume less resources. This is par- tially linked to the fact that no qualitative constraints are given at the generation step. 8 Conclusion We have presented a metamodel for redundancy modeling and a tool chain for generating and verifying configurations. The main contributions of this paper are: – The metamodel to express redudancy both at a high level using constraints, and at a lower level using detailed allocation. – The model transformations for building an Alloy model from the redundancy model, and for building an allocated redundancy model from the configurations generated by Alloy. – The model transformation from an allocated redundancy model to a SynDEx model in order to check its schedulability. The main motivation of this work was to ease the exploration of different redundancy policies for a system by verifying automatically if a redun- dancy model is sound (are there configurations that satisfy the redun- dancy constraints), and if the configurations that satisfy the constraints can be scheduled. Future Works The next steps of this work will improve our redundancy tool chain by adding the previously discussed features. We will add information about reconfiguration times to take into account the dynamic aspects of redundancy, and we will provide a quality function to evaluate the relevance of a generated configuration, which could, for example, help minimizing the costs or improving the safety of a system. References 1. Daniel Jackson. Software Abstractions. The MIT Press, September 2011. 2. T. Grandpierre, C. Lavarenne, and Y. Sorel. Optimized rapid pro- totyping for real-time embedded heterogeneous multiprocessors. In Proceedings of 7th International Workshop on Hardware/Software Co- Design, CODES’99, Rome, Italy, May 1999. 3. Alan A. Bertossi, Luigi V. Mancini, and Federico Rossini. Fault- tolerant rate-monotonic first-fit scheduling in hard-real-time systems. IEEE Trans. Parallel and Distributed Systems, 10:934–945, 1999. 4. Xiao Qin, Zongfen Han, Hai Jin, Liping Pang, and Shengli Li. Real- time fault-tolerant scheduling in heterogeneous distributed systems. In in Proceedings of the 2000 International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, pages 26–29, 2000. 5. Krithi Ramamritham. Allocation and scheduling of precedence- related periodic tasks. IEEE Trans. Parallel Distrib. Syst., 6(4):412– 420, April 1995. 6. Marco Caccamo, Giorgio Buttazzo, and Scuola Superiore S. Anna. Optimal scheduling for fault-tolerant and firm real-time systems. In In 5th International Conference on Real-Time Computing Systems and Applications. IEEE, pages 223–231, 1998. 7. A. Girault, H. Kalla, M. Sighireanu, and Y. Sorel. An algorithm for automatically obtaining distributed and fault-tolerant static sched- ules. In Proceedings of International Conference on Dependable Sys- tems and Networks, DSN’03, San Francisco, California, USA, June 2003. 8. Safouan Taha, Ansgar Radermacher, Sebastien Gerard, and Jean-Luc Dekeyser. An open framework for detailed hardware modeling. In SIES, pages 118–125. IEEE, 2007. 9. A. Vicard and Y. Sorel. Formalization and static optimization for parallel implementations. In Proceedings of Workshop on Distributed and Parallel Systems, DAPSYS’98, Budapest, Hungary, September 1998.