=Paper= {{Paper |id=Vol-2581/aviose2020paper4 |storemode=property |title=Decentralised Avionics and Software Architecture for Sounding Rocket Missions |pdfUrl=https://ceur-ws.org/Vol-2581/aviose2020paper4.pdf |volume=Vol-2581 |authors=Jasminka Matevska ,Enrico Noack,Manuel Reinhold,Eike-Kristian Diekmann |dblpUrl=https://dblp.org/rec/conf/se/MatevskaNRD20 }} ==Decentralised Avionics and Software Architecture for Sounding Rocket Missions== https://ceur-ws.org/Vol-2581/aviose2020paper4.pdf
   Decentralised Avionics and Software Architecture
            for Sounding Rocket Missions
                      Jasminka Matevska                                                            Enrico Noack
                      Hochschule Bremen                                                   Airbus Defence and Space GmbH
                       Bremen, Germany                                                           Bremen, Germany
               jasminka.matevska@hs-bremen.de                                                enrico.noack@airbus.com

                       Manuel Reinhold                                                         Eike-Kristian Diekmann
                      Hochschule Bremen                                                          Hochschule Bremen
                      Bremen, Germany                                                             Bremen, Germany
                 mreinhold@stud.hs-bremen.de                                                ediekmann@stud.hs-bremen.de

   Abstract — This paper describes our ongoing work in the
context of the TEXUS/MAXUS sounding rocket program.
Based on an analysis of requirements, technologies and tools, we
propose a solution to cope with the increasing number of software
applications and hardware components due to the decentralisation
of the communication system based on the OPC UA
communication standard for distributed services. Our main
goal is to provide an efficient avionics and software architecture
configuration for both the initial development and the
maintenance while assuring consistency and increasing
availability and reliability of the system for different experiment
and mission scenarios.

    Keywords — decentralised avionics and software architecture,
sounding rockets, hardware / software interfaces, sensor data,
experiment control, decentralised / distributed services, “Industrie
4.0”, OPC UA, configuration, monitoring, error handling,
availability, and reliability

                       I. INTRODUCTION
    Since the flight of the MAXUS 9 rocket in April 2017, a
framework from the “Industrie 4.0” agenda [1] has been used to
control the spacecraft experiments. This agenda provides a
platform for automation and data exchange in an industrial
context. Similar tasks are required for spacecraft operation.
For example, three to five experiments are controlled on board
each TEXUS/MAXUS sounding rocket. The responsible scientists
from various disciplines and the experiment engineers monitor
these experiments. Each experiment has to be connected to the
system, and its sensor data has to be collected in order to
establish the appropriate control. An “Industrie 4.0” platform
is therefore a good candidate as a reference system for
spacecraft control [2]. The transition to the new system
enables useful new features, but it is challenging because it
requires new concepts, as shown in this paper.
    The conventional TEXUS/MAXUS sounding rocket system (since
December 1977) relied on a purely centralised data exchange
between space and ground. There was no network connection
between the flight and ground computers. Data was exchanged
over a proprietary communication protocol that required
specialised hardware, so a single experiment could not be
operated without this hardware.
    Replacing the proprietary communication with the
standardised Ethernet/IP (Internet Protocol)-based interface
and OPC UA (Open Platform Communications Unified
Architecture) [3], [4] enables the operation of an experiment
with just one standard laptop (or PC). The decision to
implement OPC UA is based on several trade-offs performed by
experts and students and documented in the master thesis [5].
The reference avionics and software architecture is presented
in Fig. 1.

    Fig. 1. TEXUS/MAXUS Avionics and Software Architecture

    Furthermore, it is now possible for the scientists
themselves to perform the experiment on various execution
platforms such as a parabolic flight, a drop tower or even the
laboratory. However, this decentralisation leads to two basic
challenges [6], [7], [8].
    The first challenge arises when the experiment goes on a
campaign on its own. Some of the “home” services must also be
available during this campaign. These can be


© Matevska, Noack, Reinhold, Diekmann 2020
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
simple services like DHCP/DNS (Dynamic Host Configuration
Protocol / Domain Name System) and NTP (Network Time
Protocol), as well as more sophisticated services like backup
services or telemetry data storage. Therefore, parts of the
services must join the campaign or must be globally available.
Another important point in this scenario is the consistency of
the overall system. The telemetry data and software produced
during the particular campaign must be re-integrated into the
system that stayed at home.
    The second challenge is the growing number of computers.
Due to the distribution of services, the different functions are
deployed on different hardware platforms. Even the Ground
Support Equipment has a separate data interface today. While
in the past only two computers were used to control the
experiment (one on board and one on the ground), today more
than six computers are common. Three are in use in the flight
system (experiment control, data services, video services) and
three on the ground (Ground Support Equipment with its own
data interface, separate control stations for scientists and
engineers). The overall network hosts more than 30 computers
today, each of which is important for mission success. We need
to keep track of every single service. Additionally, the system
configuration changes quickly. Every two years, three sounding
rockets with different experiments are launched, each having
its own configuration adapted to the specific scientific needs.
The growth of hardware and software has to be managed without
decreasing availability and, at the same time, without
increasing the effort to maintain such a system.
    This paper presents our ongoing work on appropriate
concepts to meet these demands.

  II. REQUIREMENTS ON THE REFERENCE ARCHITECTURE

    In order to meet the increased requirements on the ground
system, a reference decentralised avionics and software system
architecture is being developed. By keeping the effort for
configuration, maintenance and update of the system as low as
possible, we can ensure lasting stability and consistency, thus
providing high availability and reliability of the system.
    The requirements include several criteria that shall be met
by the reference architecture. These are listed as follows.

A. Controlled Environment / Scenarios
    The system functions shall stay stable and comprehensible
in the following scenarios:
   1) Ground testing with an experiment in the laboratory
   2) Scientific tests with an experiment on a parabolic flight
   3) System tests with different experiments
   4) TEXUS/MAXUS Flight Operations
   5) Post-Flight Evaluation

B. Modular Architecture
    Changes to a configuration in a particular scenario (e.g.
software updates) shall not affect the correctness,
functionality and executability of other configurations.

C. Recovery
    “An error is that part of the system state that can cause a
subsequent failure. An error is detected if its presence is
indicated by an error message or error signal” [9]. If an error
occurs in the system, it shall be recognized, and a
corresponding action should be proposed or executed in order
to prevent any failure. Furthermore, if certain software is no
longer functional, it shall be possible to recover from the
failure easily and quickly and to set up a new system. This
system shall be identical to the initial system before its
failure.

D. Transparent Interface
    After recovery, the user shall have an unmodified interface
(hardware & software configuration). Windows 10 is used as the
operating system. The reason for this is that users can operate
and manage Windows machines themselves. In addition, some of
the software in use is only available for Windows.

E. Mobile Systems
    It shall be possible to use the ground system at different
locations (for different scenarios) with full functionality.
The system and the required services must be available offline
during a mission.

F. Effort
    The effort for the configuration and maintenance of a new
system shall be as low as possible. The resulting costs can be
considered secondary. Here the trade-off between costs and
effort has to be considered. The reduced effort can save
working hours, which the employees can use efficiently for
other engineering work. This ultimately reduces the overall
costs.

G. Availability / Reliability
    “Availability is a system’s readiness for correct service.
Reliability is a system’s ability to continuously deliver correct
service” [9]. In order to carry out any space mission, and thus
also a sounding rocket mission, many different sub-systems
have to be available and work together properly. From an
appropriate mission, spacecraft and sub-system design through
to space mission operation, the avionics and software system
components are the link between the spacecraft and the Ground
Utilities. Therefore, we have to ensure that they are available
and operate reliably as specified.

                   III. PROPOSED CONCEPT

    We performed an analysis of the requirements, suitable
technologies and tools in order to find an appropriate solution.
We decided that a DevOps (Development/IT-Operations)
toolchain/pipeline is suitable for fulfilling the criteria, as it
is capable of automating the setup of user instances as far as
possible. In a trade-off, we compared several concepts. We
analysed the advantages and disadvantages of the considered
concepts and their suitability for meeting the requirements.
Subsequently, a decision was made in favour of the proposed
concept. For the TEXUS/MAXUS specific environment, a pipeline
built mainly from HashiCorp tools (https://www.hashicorp.com/)
is considered suitable. HashiCorp provides products for the
provisioning and configuration of individual systems up to
entire system landscapes. A suitable solution can be achieved
with the tools Packer, Vagrant and Ansible. Ansible was not
developed by HashiCorp, but is an important part of the
pipeline.
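A minimal sketch of how the Packer, Ansible and Vagrant stages of such a pipeline could be sequenced is given here, following the order described in Section III.B (provision, configure, manual test, version). It only builds a plan and executes nothing. The file names, the `Stage` type and the `plan_pipeline` helper are hypothetical and chosen for illustration; the release of the image to the file share is left out because the paper does not specify how it is done.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One step of the image pipeline: a label plus the command line to run."""
    name: str
    command: List[str]


def plan_pipeline(template: str, playbook: str, inventory: str) -> List[Stage]:
    """Return the ordered pipeline stages as a dry-run plan (nothing is executed)."""
    return [
        # Packer provisions the VM image from a single source template.
        Stage("provision image", ["packer", "build", template]),
        # Ansible applies the component-specific playbook to the image.
        Stage("configure image", ["ansible-playbook", "-i", inventory, playbook]),
        # Vagrant boots the VM so that an engineer can review it manually.
        Stage("start VM for manual review", ["vagrant", "up"]),
        # The committed pipeline code is pushed to the Git repository.
        Stage("push the committed code", ["git", "push"]),
    ]


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    plan = plan_pipeline("ground-station.json", "ground-station.yml", "hosts.ini")
    for stage in plan:
        print(f"{stage.name}: {' '.join(stage.command)}")
```

Keeping the plan as data makes the stage order easy to review and version. Each command could later be passed to `subprocess.run(..., check=True)` once the assumed file names are replaced by real ones.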
A. Tools
    • Packer is used to create machine images. These images
      can be created from a single source configuration for
      multiple platforms such as Amazon Machine Images (AMI)
      for Amazon Elastic Compute Cloud (EC2), VMDK (virtual
      discs) and VMX (configuration files) for VMware or OVF
      (Open Virtualization Format) exports for VirtualBox.
    • Vagrant is an application for creating and managing
      virtual machines (VM). With Vagrant it is possible to
      create and manage complete virtual machine environments
      with a single workflow. This drastically reduces the
      setup time of the development environment.
    • Ansible is a tool that automates the configuration and
      administration of systems. This ranges from simple to
      highly complex tasks. Only SSH (secure shell) access is
      required to access remote systems, and the system can be
      managed without any additional software.

B. Architecture
    For the implementation, a server is set up for configuration
and deployment. The server has the tools for provisioning,
configuration, execution and testing of VM images as mentioned
in the previous section. The required software and the fully
set up VM images are made available to the servers, laptops and
computers in the network using a file share service. The
provisioning of the VM images is handled by the tool Packer.
Packer uses files in JSON (JavaScript Object Notation) and XML
(Extensible Markup Language) format for the description. The
subsequent configuration of the provisioned VM images is done
with Ansible.
    Depending on the component, different playbooks are used,
which describe the required software installations and execute
them. After completion of a VM image, it can be tested with
Vagrant. Using the command line, the Vagrant tool can start and
run a virtual machine in VirtualBox in minutes. This allows the
engineer to test the functionality of the virtual machine.
Since no automated tests are available for Infrastructure as
Code, the only way to do this is to manually review the built
virtual machine. The effort for this is kept to a minimum. Once
the virtual machine has been tested, it can be made available
to the other engineers and scientists. For this purpose, the
image is released in the file share. The written code is also
committed and pushed into the Git repository for versioning.
The underlying concept of this architecture is also called
Immutable Infrastructure. A schematic procedure is presented in
Fig. 2.

    Fig. 2. Concept Immutable Infrastructure

    The chosen concept meets the requirements very well. The
controlled environment can be guaranteed by using
Infrastructure as Code, because the state of the system is
always identical, as it is never modified after deployment.
This also ensures the transparent interface.
    Since configurations are encapsulated in a virtual machine,
it can be ensured that other configurations are not affected.
This also makes it easier to recover failed systems and
configurations. In addition, online services have been avoided
as far as possible in the concept, so that offline operation is
possible.
    Due to a high degree of automation and the use of open
source tools, the resulting effort can be kept low.

C. Other Considered Architectures
    Another approach was to outsource all systems and services
to a public cloud. From a technological point of view, this
would be a good approach. Provisioning and configuration would
also be considerably easier. The shortcoming of this option is
that the system would no longer function or be accessible in
offline mode.
    The On Premise Configuration was also considered. Only the
tool Ansible would be necessary. The disadvantage of this
approach is that the initially consistent setup is disturbed by
manual actions of users (installation of additional software,
misconfigurations). Correcting these actions individually would
be very time-consuming.

          IV. IMPROVING AVAILABILITY / RELIABILITY

    A high level of availability of all systems is required to
operate the ground station. System errors and lack of resources
must be recognized in a short time or even predicted in order
to be able to take countermeasures and prevent a system
failure. To assess the system status, information about the
systems has to be recorded and evaluated according to
predefined rules.
    The collected information has to be integrated into the
system communication interface concept in use. The
TEXUS/MAXUS project uses the open “Industrie 4.0” standard
OPC UA to provide, for example, telemetry data and the data
from scientific experiments. We are working on extending the
existing monitoring system in order to integrate it into the
OPC UA infrastructure and include monitoring data analysis.
Fig. 3 shows the proposed components extending the existing
system.

    Fig. 3. TEXUS/MAXUS Error Detection Architecture
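Section IV calls for recording system information and evaluating it against predefined rules, and Section V.B names the rating categories. The following minimal sketch shows one way such a rule-based rating could look. It is an illustration under assumptions, not the project's implementation: the metric names, thresholds, the severity ordering of the categories and the `Rule` and `rate` helpers are all hypothetical, and a plain dictionary stands in for values that would be read from OPC UA nodes.

```python
from enum import IntEnum
from typing import Callable, Dict, List, NamedTuple, Tuple


class Status(IntEnum):
    """Categories from Section V.B; the numeric severity order is an assumption."""
    NO_STATUS = 0
    GOOD = 1
    WARNING = 2
    CRITICAL = 3
    DEFECT = 4


class Rule(NamedTuple):
    metric: str                     # key in the sample, e.g. "disk_free_gb"
    fires: Callable[[float], bool]  # returns True when the rule is violated
    status: Status                  # rating assigned when the rule fires
    action: str                     # recommended recovery action


def rate(sample: Dict[str, float], rules: List[Rule]) -> Tuple[Status, List[str]]:
    """Return the worst status among fired rules and the recommended actions."""
    evaluated = False
    worst = Status.GOOD
    actions: List[str] = []
    for rule in rules:
        if rule.metric not in sample:
            continue  # no data for this rule, so it cannot be evaluated
        evaluated = True
        if rule.fires(sample[rule.metric]):
            worst = max(worst, rule.status)
            actions.append(rule.action)
    if not evaluated:
        # Nothing could be checked: make no statement about the status.
        return Status.NO_STATUS, []
    return worst, actions


if __name__ == "__main__":
    # Hypothetical rules; real ones would come from the expert rule set.
    rules = [
        Rule("disk_free_gb", lambda v: v < 5.0, Status.WARNING, "free disk space"),
        Rule("disk_free_gb", lambda v: v < 1.0, Status.CRITICAL, "stop recording"),
        Rule("cpu_percent", lambda v: v > 95.0, Status.WARNING, "check processes"),
    ]
    status, actions = rate({"disk_free_gb": 3.2, "cpu_percent": 40.0}, rules)
    print(status.name, actions)
```

Because the rules are plain data, a separate rule list could be kept per application scenario, which matches the distinction between scenarios that the paper requires.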
A. Information Collection
    The telemetry data and the data from the experiments are
already provided as OPC UA nodes. Additional necessary
information shall be collected from infrastructure, development
stations and other PC systems. At TEXUS/MAXUS, these systems
are mainly operated with Windows operating systems. On these
systems, so-called agents that implement Windows Management
Instrumentation (WMI) are used to collect the necessary
information. Furthermore, our system collects information on
processes, services and resources such as CPU, RAM, hard disk
space, network status, etc. This information is made available
as OPC UA nodes, shown as “Services & Resources” and “Adapter
OPC UA” in Fig. 3.

B. Information Rating
    The information provided can be fetched centrally from the
Health Status Server via the OPC UA gateway. This information
is then retrieved by the pre-processing stage, where error
detection and error evaluation are performed. If necessary,
measures for problem solving are proposed or carried out.
Furthermore, a distinction has to be made between different
application scenarios in order to select the corresponding
method.

C. Relevant Scenarios
    For the monitoring and analysis of the system, we consider
the application scenarios 1) Ground testing with an experiment
in the laboratory, 3) System tests with different experiments
and 4) TEXUS/MAXUS Flight Operations from Section II.A.

                   V. ERROR HANDLING

    According to the requirements, we propose the following
rule-based error handling approach.

A. Error Detection
    The approach of an expert system based on a knowledge
database filled by specialists was chosen for error detection.
Since a rocket launch is a rare event, AI (Artificial
Intelligence) approaches such as neural networks or deep
learning are only suitable to a limited extent, because they
require large amounts of training data. In addition, operators
and developers are required to ensure that errors are under the
control of specialists. A distinction is made between error
scenarios because the different scenarios run different systems
and processes and intentionally show different behaviour.

B. Error Rating
    An error evaluation takes place depending on the
application scenario. The criticality differs depending on the
application scenario and is divided into the following
categories according to VDMA (Verband Deutscher Maschinen- und
Anlagenbau e. V.) standard sheet 24582 [10]:
    • Defect / error
    • Critical condition
    • Warning
    • Good
    • No status statement

C. Error / Fault / Failure Occurrence, Scenario Definition and
   Classification
    TEXUS/MAXUS developers store error, fault and failure
occurrences, identified scenarios and their classification
using common tools such as Microsoft Excel. The assessment of
the monitoring and analysis system is based on these entries.

D. Recovery Actions
    If an error announces itself by a fault, or a failure has
already occurred, the monitoring system reports this event with
the corresponding criticality and recommends actions to remedy
the problem. In addition, the failure will be traceable
hierarchically to the fault as the origin of the error. This
helps to narrow down the errors and correct them. Measures and
rules for detecting and correcting errors are provided by
experts in the knowledge and rule set database.
    By continuous monitoring of all systems with appropriate
recovery actions in the case of errors and failures, we can
achieve high availability and reliability of the system.

                        SUMMARY

    This paper presents work in progress within the
TEXUS/MAXUS sounding rocket program. A standardised reference
system based on the “Industrie 4.0” OPC UA communication
platform is facing new challenges due to the distribution of
services and the increasing number of software applications
and hardware components (mainly PCs and laptops). The
configuration and maintenance of the systems for different
experiments and mission scenarios shall be provided in an
efficient and consistent way. Monitoring, information
collection and error handling, including recovery mechanisms,
shall be implemented in order to improve the availability of
the systems. Based on an analysis of requirements, technologies
and tools, we propose an appropriate avionics and software
architecture for sounding rocket systems and missions.
Currently we are working on the implementation of the proposed
solutions.

                       REFERENCES
[1]  Bundesministerium für Wirtschaft und Energie,
     Bundesministerium für Bildung und Forschung. Plattform
     Industrie 4.0. https://www.plattform-i40.de.
[2]  E. Diekmann, M. Reinhold, J. Matevska, E. Noack. “Industrie
     4.0 in der Raumfahrt”. Deutscher Luft- und
     Raumfahrtkongress 2018, “Luft- und Raumfahrt –
     Digitalisierung und Vernetzung”. Sep. 2018.
[3]  OPC Unified Architecture Foundation.
     https://opcfoundation.org/
[4]  Freeopcua Project. OPCUA Server and Client implementation.
     Aug. 2017. http://freeopcua.github.io/.
[5]  P. Kathmann, J. Scheichel. Master's thesis:
     “IP-Kommunikation auf Forschungsraketen”. Hochschule
     Bremen. Sep. 2017.
[6]  J. Matevska. “ibacus (IP-BAsed CommUnication in Space)”.
     Presentation, ESC Kiruna, May 2018.
[7]  P. Grashorn, A. Stein, E. Noack. “TEXUS 2.0: Neue Konzepte
     für Raketenexperimente in der Zukunft”. Presentation, ESC
     Kiruna, May 2018.
[8]  E. Noack, J. Matevska. “TEXUS made in Bremen - Neues
     Datensystem für TEXUS kommt aus Bremen”. Presentation,
     Sternstunden 2018.
[9]  A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr. “Basic
     Concepts and Taxonomy of Dependable and Secure Computing”.
     IEEE Transactions on Dependable and Secure Computing,
     vol. 1, no. 1, 2004, pp. 11-33.
[10] VDMA (Verband Deutscher Maschinen- und Anlagenbau e. V.).
     Einheitsblatt 24582: Feldbusneutrale Referenzarchitektur
     für Condition Monitoring in Fabrikautomation. Berlin:
     Beuth Verlag GmbH, April 2014.