Decentralised Avionics and Software Architecture for Sounding Rocket Missions

Jasminka Matevska, Hochschule Bremen, Bremen, Germany, jasminka.matevska@hs-bremen.de
Enrico Noack, Airbus Defence and Space GmbH, Bremen, Germany, enrico.noack@airbus.com
Manuel Reinhold, Hochschule Bremen, Bremen, Germany, mreinhold@stud.hs-bremen.de
Eike-Kristian Diekmann, Hochschule Bremen, Bremen, Germany, ediekmann@stud.hs-bremen.de

Abstract: This paper describes our ongoing work in the context of the TEXUS/MAXUS sounding rocket program. Based on an analysis of requirements, technologies and tools, we propose a solution to cope with the increasing number of software applications and hardware components that results from decentralising the communication system on the basis of the OPC UA communication standard for distributed services. Our main goal is to provide an efficient avionics and software architecture configuration for both the initial development and the maintenance, while assuring consistency and increasing the availability and reliability of the system for different experiment and mission scenarios.

Keywords: decentralised avionics and software architecture, sounding rockets, hardware / software interfaces, sensor data, experiment control, decentralised / distributed services, “Industrie 4.0”, OPC UA, configuration, monitoring, error handling, availability, reliability

I. INTRODUCTION

Since the flight of the MAXUS 9 rocket in April 2017, a framework from the “Industrie 4.0” agenda [1] has been used to control the spacecraft experiments. This agenda provides a platform for automation and data exchange in an industrial context. Similar tasks are required for spacecraft operation. For example, three to five experiments are controlled on board each TEXUS/MAXUS sounding rocket. The responsible scientists from various disciplines and the experiment engineers monitor these experiments.
Each experiment has to be connected to the system, and its sensor data has to be collected in order to establish the appropriate control. An “Industrie 4.0” platform is therefore a good candidate as a reference system for spacecraft control [2]. The transition to the new system enables useful features, but it is challenging because it requires new concepts, as shown in this paper.

In the conventional TEXUS/MAXUS sounding rocket system (in use since December 1977), the data exchange between space and ground was purely centralised. There was no network connection between the flight and ground computers. The data exchange used a proprietary communication protocol, which required specialised hardware. It was not possible to operate a single experiment without this specialised hardware.

Replacing the proprietary communication with a standardised Ethernet/IP (Internet Protocol)-based interface and OPC UA (Open Platform Communications Unified Architecture) [3], [4] enables the operation of an experiment with just one standard laptop (or PC). The decision to implement OPC UA is based on several trade-offs performed by experts and students and documented in the master thesis [5]. The reference avionics and software architecture is presented in Fig. 1.

Fig. 1. TEXUS/MAXUS Avionics and Software Architecture

Furthermore, it is now possible for the scientists themselves to perform the experiment on various execution platforms such as a parabolic flight, a drop tower and even the laboratory. However, this decentralisation leads to two basic challenges [6], [7], [8].

The first challenge arises when an experiment goes on a campaign on its own. Some of the “home” services must also be available during this campaign. These can be simple services such as DHCP/DNS (Dynamic Host Configuration Protocol / Domain Name System) and NTP (Network Time Protocol), as well as more sophisticated services such as backup services or telemetry data storage. Therefore, parts of the services must join the campaign or must be globally available. Another important point in this scenario is the consistency of the overall system. The telemetry data and software produced during a particular campaign must be re-integrated into the system that stayed at home.

The second challenge is the growing number of computers. Due to the distribution of services, the different functions are deployed on different hardware platforms. Even the Ground Support Equipment now has a separate data interface. While in the past only two computers were used to control the experiment (one on board and one on the ground), today more than six computers are common: three in the flight system (experiment control, data services, video services) and three on the ground (Ground Support Equipment with its own data interface, and separate control stations for scientists and engineers). The overall network today hosts more than 30 computers, each of which is important for mission success. We need to keep track of every single service. Additionally, the system configuration changes quickly. Every two years, three sounding rockets with different experiments are launched, each having its own configuration adapted to the specific scientific needs. The growth of hardware and software has to be managed without decreasing availability and, at the same time, without increasing the effort to maintain such a system.

This paper presents our ongoing work on appropriate concepts to meet these demands.

Copyright © 2020 for this paper by its authors (Matevska, Noack, Reinhold, Diekmann). Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

II. REQUIREMENTS ON THE REFERENCE ARCHITECTURE

In order to meet the increased requirements on the ground system, a reference decentralised avionics and software system architecture is being developed. By keeping the effort for configuration, maintenance and update of the system as low as possible, we can ensure lasting stability and consistency, thus providing high availability and reliability of the system.

The requirements comprise several criteria that shall be met by the reference architecture. They are listed below.

A. Controlled Environment / Scenarios

The system functions shall remain stable and comprehensible in the following scenarios:

1) Ground testing with an experiment in the laboratory
2) Scientific tests with an experiment on a parabolic flight
3) System tests with different experiments
4) TEXUS/MAXUS Flight Operations
5) Post-Flight Evaluation

B. Modular Architecture

Changes to a configuration in a particular scenario (e.g. software updates) shall not affect the correctness, functionality and executability of other configurations.

C. Recovery

“An error is that part of the system state that can cause a subsequent failure. An error is detected if its presence is indicated by an error message or error signal” [9]. If an error occurs in the system, it shall be recognized and a corresponding action shall be proposed or executed in order to prevent a failure. Furthermore, if certain software is no longer functional, it shall be possible to recover from the failure easily and quickly and to set up a new system. This system shall be identical to the initial system before its failure.

D. Transparent Interface

After recovery, the user shall have an unmodified interface (hardware and software configuration). Windows 10 is used as the operating system, because users can operate and manage Windows machines themselves. In addition, some of the software in use is only available for Windows.

E. Mobile Systems

It shall be possible to use the ground system at different locations (for different scenarios) with full functionality. The system and the required services must also be available offline during a mission.

F. Effort

The effort for the configuration and maintenance of a new system shall be as low as possible. The resulting costs can be considered secondary, although the trade-off between costs and effort has to be taken into account. The reduced effort saves working hours, which the employees can use efficiently for other engineering work. This ultimately reduces the overall costs.

G. Availability / Reliability

“Availability is a system’s readiness for correct service. Reliability is a system’s ability to continuously deliver correct service” [9]. In order to carry out any space mission, and thus also a sounding rocket mission, many different sub-systems have to be available and work together properly. From an appropriate mission, spacecraft and sub-system design through to space mission operation, the avionics and software system components are the link between the spacecraft and the Ground Utilities. Therefore, we have to ensure that they are available and operate reliably as specified.

III. PROPOSED CONCEPT

We performed an analysis of the requirements and of suitable technologies and tools in order to find an appropriate solution. We decided that a DevOps (Development/IT-Operations) toolchain/pipeline is suitable for fulfilling the criteria, as it can automate the setup of user instances as far as possible. In a trade-off, we compared several concepts and analysed their advantages, their disadvantages and their suitability for meeting the requirements. Subsequently, a decision was made in favour of the proposed concept. For the TEXUS/MAXUS-specific environment, a pipeline built mainly from HashiCorp tools (https://www.hashicorp.com/) is considered suitable. HashiCorp provides products for the provisioning and configuration of individual systems up to entire system landscapes. An optimal solution can be achieved with the tools Packer, Vagrant and Ansible.
Ansible was not developed by HashiCorp, but it is an important part of the pipeline.

A. Tools

• Packer is used to create machine images. These images can be created from a single source configuration for multiple platforms, such as Amazon Machine Images (AMI) for Amazon Elastic Compute Cloud (EC2), VMDK (virtual discs) and VMX (configuration files) for VMware, or OVF (Open Virtualization Format) exports for VirtualBox.
• Vagrant is an application for creating and managing virtual machines (VM). With Vagrant it is possible to create and manage complete virtual machine environments with a single workflow. This drastically reduces the setup time of the development environment.
• Ansible is a tool that automates the configuration and administration of systems, ranging from simple to highly complex tasks. Only SSH (secure shell) access is required to reach remote systems, and a system can be managed without any additional software.

B. Architecture

For the implementation, a server is set up for configuration and deployment. The server hosts the tools for provisioning, configuration, execution and testing of VM images described in the previous section. The required software and the fully set-up VM images are made available to the servers, laptops and computers in the network via a file share service. The provisioning of the VM images is handled by Packer, which uses files in JSON (JavaScript Object Notation) and XML (Extensible Markup Language) format for the description. The subsequent configuration of the provisioned VM images is done with Ansible.

Depending on the component, different playbooks are used, which describe and execute the required software installations. Once a VM image is complete, it can be tested with Vagrant. From the command line, Vagrant can start and run a virtual machine in VirtualBox within minutes. This allows the engineer to test the functionality of the virtual machine. Since no automated tests are available for the Infrastructure as Code, the only way to do this is to review the built virtual machine manually. The effort for this is kept to a minimum. Once the virtual machine has been tested, it can be made available to the other engineers and scientists. For this purpose, the image is released in the file share. The written code is also committed and pushed to the Git repository for versioning.

The underlying concept of this architecture is also called Immutable Infrastructure. A schematic procedure is presented in Fig. 2.

Fig. 2. Concept Immutable Infrastructure

The chosen concept meets the requirements very well. The controlled environment can be guaranteed by using Infrastructure as Code: the state of the system is always identical, because it is never modified after deployment, which also ensures the transparent interface. Since configurations are encapsulated in a virtual machine, it can be ensured that other configurations are not affected. This also makes it easier to recover failed systems and configurations. In addition, online services have been avoided as far as possible, so that offline operation is possible. Due to the high degree of automation and the use of open source tools, the resulting effort can be kept low.

C. Other Considered Architectures

Another approach was to outsource all systems and services to a public cloud. From a technological point of view this would be a good approach, and provisioning and configuration would be considerably easier. The shortcoming of this option is that the system would no longer function or be accessible in offline mode.

An on-premise configuration was also considered. Only the tool Ansible would be necessary. The disadvantage of this approach is that the initially consistent setup is disturbed by manual actions of users (installation of additional software, misconfigurations). Correcting these actions individually would be very time-consuming.

IV. IMPROVING AVAILABILITY / RELIABILITY

A high level of availability of all systems is required to operate the ground station. System errors and a lack of resources must be recognized quickly, or even predicted, in order to be able to take countermeasures and prevent a system failure. To assess the system status, information about the systems has to be recorded and evaluated according to predefined rules.

The collected information has to be integrated into the system communication interface concept in use. The TEXUS/MAXUS project uses the open “Industrie 4.0” standard OPC UA to provide, for example, telemetry data and the data from scientific experiments. We are working on extending the existing monitoring system in order to integrate it into the OPC UA infrastructure and to include monitoring data analysis. Fig. 3 shows the proposed components extending the existing system.

Fig. 3. TEXUS/MAXUS Error Detection Architecture

A. Information Collection

The telemetry data and the data from the experiments are already provided as OPC UA nodes. Additional necessary information shall be collected from the infrastructure, development stations and other PC systems. At TEXUS/MAXUS, these systems mainly run Windows operating systems. On these systems, so-called agents that use Windows Management Instrumentation (WMI) collect the necessary information. Furthermore, our system collects information on processes, services and resources such as CPU, RAM, hard disk space and network status. This information is made available as OPC UA nodes, shown as “Services & Resources” and “Adapter OPC UA” in Fig. 3.

B. Information Rating

The information provided can be fetched centrally from the Health Status Server via the OPC UA gateway. It is then retrieved by the pre-processing stage, where error detection and error evaluation are performed. If necessary, measures for solving the problem are proposed or carried out. Furthermore, a distinction has to be made between the different application scenarios in order to select the corresponding method.

C. Relevant Scenarios

For the monitoring and analysis of the system, we consider the application scenarios 1) Ground testing with an experiment in the laboratory, 3) System tests with different experiments and 4) TEXUS/MAXUS Flight Operations from Section II.A.

V. ERROR HANDLING

According to the requirements, we propose the following rule-based error handling approach.

A. Error Detection

For error detection, we chose an expert system based on a knowledge database filled by specialists. Since a rocket launch is a rare event, AI (Artificial Intelligence) approaches such as neural networks or deep learning are only suitable to a limited extent, because they require large amounts of training data. In addition, operators and developers are required to ensure that errors are under the control of specialists. A distinction is made between error scenarios, because the different scenarios run different systems and processes and intentionally show different behaviour.

B. Error Rating

Errors are evaluated depending on the application scenario. The criticality differs between application scenarios and is divided into the following categories according to VDMA (Verband Deutscher Maschinen- und Anlagenbau e. V.) standard sheet 24582 [10]:

• Defect / error
• Critical condition
• Warning
• Good
• No status statement

C. Error / Fault / Failure Occurrence, Scenario Definition and Classification

TEXUS/MAXUS developers store error, fault and failure occurrences, the identified scenarios and their classification using common tools such as Microsoft Excel. The assessment of the monitoring and analysis system is based on these entries.

D. Recovery Actions

If an error is indicated by a fault, or a failure has already occurred, the monitoring system reports this event with the corresponding criticality and recommends actions to remedy the problem. In addition, the failure will be hierarchically traceable to the fault as the origin of the error. This helps to narrow down the errors and correct them. Measures and rules for detecting and correcting errors are provided by experts in the knowledge and rule set database.

By continuously monitoring all systems and applying appropriate recovery actions in the case of errors and failures, we can achieve high availability and reliability of the system.

SUMMARY

This paper presents work in progress within the TEXUS/MAXUS sounding rocket program. A standardised reference system based on the “Industrie 4.0” OPC UA communication platform faces new challenges due to the distribution of services and the increasing number of software applications and hardware components (mainly PCs and laptops). The configuration and maintenance of the systems for different experiments and mission scenarios shall be provided in an efficient and consistent way. Monitoring, information collection and error handling, including recovery mechanisms, shall be implemented in order to improve the availability of the systems. Based on an analysis of requirements, technologies and tools, we propose an appropriate avionics and software architecture for sounding rocket systems and missions. We are currently working on the implementation of the proposed solutions.

REFERENCES

[1] Bundesministerium für Wirtschaft und Energie, Bundesministerium für Bildung und Forschung. Plattform Industrie 4.0. url: https://www.plattform-i40.de.
[2] E. Diekmann, M. Reinhold, J. Matevska, E. Noack. “Industrie 4.0 in der Raumfahrt”. Deutscher Luft- und Raumfahrtkongress 2018, “Luft- und Raumfahrt: Digitalisierung und Vernetzung”. Sep. 2018.
[3] OPC Unified Architecture Foundation. url: https://opcfoundation.org/.
[4] Freeopcua Project. OPCUA Server and Client implementation. Aug. 2017. url: http://freeopcua.github.io/.
[5] P. Kathmann, J. Scheichel. Master thesis: “IP-Kommunikation auf Forschungsraketen”. Hochschule Bremen. Sep. 2017.
[6] J. Matevska. “ibacus (IP-BAsed CommUnication in Space)”. Presentation, ESC Kiruna, May 2018.
[7] P. Grashorn, A. Stein, E. Noack. “TEXUS 2.0: Neue Konzepte für Raketenexperimente in der Zukunft”. Presentation, ESC Kiruna, May 2018.
[8] E. Noack, J. Matevska. “TEXUS made in Bremen - Neues Datensystem für TEXUS kommt aus Bremen”. Presentation, Sternstunden 2018.
[9] A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr. Basic Concepts and Taxonomy of Dependable and Secure Computing. In: IEEE Transactions on Dependable and Secure Computing 1, 2004, no. 1, pp. 11-33.
[10] VDMA Verband Deutscher Maschinen- und Anlagenbau e. V. Einheitsblatt 24582: Feldbusneutrale Referenzarchitektur für Condition Monitoring in Fabrikautomation. Berlin: Beuth Verlag GmbH, April 2014.
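
As an illustration of the scenario-dependent error rating described in Section V.B, the following minimal Python sketch maps a monitored value to one of the five status categories. It is not the TEXUS/MAXUS implementation: the scenario names, the metric name `disk_used_percent`, the threshold values and the `Rule`/`rate` helpers are all hypothetical and chosen only to show how an expert-filled rule set could yield different criticalities for the same measurement in different scenarios.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    """Status categories as listed in Section V.B (after VDMA 24582)."""
    DEFECT = "Defect / error"
    CRITICAL = "Critical condition"
    WARNING = "Warning"
    GOOD = "Good"
    NO_STATEMENT = "No status statement"


@dataclass(frozen=True)
class Rule:
    """One expert rule: lower limits of each category for a metric (hypothetical)."""
    warning: float
    critical: float
    defect: float


# Hypothetical knowledge base: (scenario, metric) -> rule.
# All names and numbers are invented for illustration only.
RULES = {
    ("laboratory", "disk_used_percent"): Rule(80.0, 90.0, 99.0),
    ("flight_operations", "disk_used_percent"): Rule(60.0, 75.0, 90.0),
}


def rate(scenario: str, metric: str, value: Optional[float]) -> Status:
    """Rate a measured value according to the rule for the given scenario."""
    rule = RULES.get((scenario, metric))
    if rule is None or value is None:
        # No rule or no measurement: no statement about the status is possible.
        return Status.NO_STATEMENT
    if value >= rule.defect:
        return Status.DEFECT
    if value >= rule.critical:
        return Status.CRITICAL
    if value >= rule.warning:
        return Status.WARNING
    return Status.GOOD


if __name__ == "__main__":
    # The same measurement is rated differently depending on the scenario.
    for scenario in ("laboratory", "flight_operations", "parabolic_flight"):
        print(scenario, "->", rate(scenario, "disk_used_percent", 85.0).value)
```

Keeping the rules in a plain lookup table separates the expert knowledge from the evaluation logic, so specialists could adjust thresholds per scenario without touching the rating code.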