=Paper= {{Paper |id=Vol-3610/paper-01 |storemode=property |title=Securing Communication in the Field: Protecting Geo-distributed Computing in an Untrusted Environment |pdfUrl=https://ceur-ws.org/Vol-3610/paper-01.pdf |volume=Vol-3610 |authors=Olivier Gilles,David Faura,Daniel Gracia Pérez |dblpUrl=https://dblp.org/rec/conf/cesar/GillesFP23 }} ==Securing Communication in the Field: Protecting Geo-distributed Computing in an Untrusted Environment== https://ceur-ws.org/Vol-3610/paper-01.pdf
                                Securing communication on the field: Protecting
                                geo-distributed computing in an untrusted
                                environment
                                Olivier Gilles¹, David Faura¹ and Daniel Gracia Pérez¹
                                ¹ Thales Research & Technology, 1 avenue Augustin Fresnel, 91767 Palaiseau, France


                                                                         Abstract
                                                                         As the number of geo-distributed connected sensors and the need to perform complex functionality
                                                                         increase in Industrial IoT-based systems, so does the transferred data volume. Mainframes and even edge
                                                                         computing show their limits when facing this intensive computing. To answer this challenge, an emerging
                                                                         trend, called Industrial Continuous Computing (ICC), advocates for truly distributed computation in
                                                                         the network, from the Smart Peripheral Devices (SPD) to the end-server or actuators. This, in turn,
                                                                         raises the issue of communication security. In this context, protecting communications in a dynamic
                                                                         architecture is challenging, as attackers may have physical access to a legitimate device. The attack surface of
                                                                         the communication covers the confidentiality and integrity of the data, which in turn require the integrity of
                                                                         the software stack manipulating them. Breaches of these properties are likely to lead to secret exposure
                                                                         or sabotage.
                                                                             We present a novel approach based on open source software and secure hardware aiming to ensure
                                                                         end-to-end security for communication as well as secure device authentication, enabling the confidentiality
                                                                         and integrity of data exchanges between the different SPDs and the end-server. Our approach uses
                                                                         publish-subscribe communication protocols (i.e. many-to-many) to build scalable and dynamic computing
                                                                         architecture. Amongst them, OPC UA PubSub provides security and interoperability to the ICC, allowing
                                                                         end-to-end encryption. This security however relies on securely embedded secrets on the node. We
                                                                         further secure authentication against specific threats to ICC, by using a Trusted Platform Module (TPM)
                                                                         as a secure element to protect the secrets embedded on devices, and relying on secure boot to protect
                                                                         this secure element against attackers. By doing so, we were able to set up a fully secure geo-distributed
                                                                         industrial computing infrastructure dedicated to the monitoring of railway facilities, connecting sensors,
                                                                         edge devices and data server in order to perform predictive maintenance.




                                1. Introduction
                                Industrial systems consist of four kinds of nodes: sensors, actuators, controllers and
                                computers. The most common architecture is a system where a supervisor component
                                implementing both control and computing functions is connected to a set of slave devices, which
                                can be either sensors or actuators. With the significant increase in data and computation needs,
                                supervisor components can no longer provide enough computing power to perform the
                                needed operations.

                                C&ESAR 2023 by DGA, November 21–23, 2023, Rennes, France
                                Email: olivier.gilles@thalesgroup.com (O. Gilles); david.faura@thalesgroup.com (D. Faura);
                                daniel.gracia-perez@thalesgroup.com (D. Gracia Pérez)
                                ORCID: 0000-0002-3776-2071 (O. Gilles); 0009-0004-9416-8855 (D. Faura); 0000-0002-5364-8244 (D. Gracia Pérez)
                                © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073




                                Proceedings of the 30th C&ESAR (2023)                                                                                                                                            21


   In traditional Industrial Control Systems such as SCADA systems, none of these compo-
nents are connected to open networks, and most of them rely on field buses, using wired
communications and proprietary protocols (e.g. PROFINET, EtherCAT).
   In the Industrial Internet of Things (IIoT), as opposed to the more generic IoT context, the things
(devices, but also supervisors) are rarely directly connected to open networks such as the Internet.
Indeed, the compromise of any device of the system could lead to catastrophic results [1, 2]. On
the other hand, the computing nodes needed to perform computing-intensive functions, as well as
access to third-party sensors, monitoring nodes or even actuators, require data to cross open
networks. To ensure data security along this journey, one must ensure (1) that the path is
safe and (2) that the data actually reaches its intended destination.
   While these issues are well addressed in traditional IT systems, IIoT differs in that
nodes can typically be physically accessed by users, or even by any bystander, including
potential attackers. This leads to a whole new class of possible attacks, against which the system
must be protected, where the attacker must be assumed to have total control over memory
(through memory dumping or fault injection [3]).
   In this context, which requires security for both data and software, we focus our analysis on
existing open protocols. We present in this paper a secure system architecture based on such
open protocols, namely MQTT and OPC UA PubSub, and on a hardware security element, the TPM,
and show how they can be combined to protect the nodes against attackers who have physical
access to the device and reasonable hacking skills.
   The remainder of this paper is organized as follows: Section 2 introduces key concepts needed
for distributed computing in critical systems and their implications in terms of security needs on
the system’s equipment; Section 3
describes the OPC UA protocol, which is commonly used in industrial systems and is a candidate
for IIoT, as well as our contribution, a solution to secure authentication by ensuring secret
confidentiality and integrity in the SPD; in Section 4, we describe the industry challenge of
predictive maintenance, as well as the prototype relying on our solution to address its security
and connectivity issues; finally, the conclusion discusses further enhancements.


2. Distributing Computation in Critical Systems
In recent years, the rising popularity of smart systems (Smart City, Autonomous System, Industry
4.0) and the proliferation of remote connected sensors and IoT devices have generated massive and
varied amounts of data, greatly increasing the processing and storage capacities required to
extract valuable information. This evolution in usage has shown the limitations of a remote
centralized processing architecture, which typically increases complexity and limits
real-time monitoring of and interaction with smart systems [4].

2.1. Geo-distributed industrial architectures
The Compute Continuum is an emerging distributed architecture, which brings back heteroge-
neous capabilities such as processing or storage close to the IoT devices. In that architecture,
the underlying infrastructure is composed of a cloud datacenter connected through a set of
heterogeneous computing networks to a vast number of geo-distributed computing nodes
forming the Edge domain.
   The benefits of this architecture are the ability to extract valuable information from the
IoT devices in real time, to reduce the reaction times of the actuators and to enable the reuse of
IT-domain components. Furthermore, this infrastructure can orchestrate and monitor the system’s
functions within the computing continuum. However, it increases the overall cybersecurity
exposure because the geo-distributed computing nodes interact with an uncontrolled environment,
increasing the possibility of attacks on physical nodes and on the communication medium.
   These issues are especially crippling when the architecture is applied to OT systems, within
an Industrial Compute Continuum (ICC, see Figure 1), as such systems have drastic safety requirements,
including real-time constraints. Regarding security in particular, the ICC requires (1) the ability to
process data and perform analyses as close as possible to the data sources while ensuring data
privacy and data security, (2) efficient isolation of critical and non-critical functions colocated on
the same computing nodes and (3) secure communications along the whole computing continuum,
including authentication of all actors (nodes and functions).




Figure 1: Industrial Compute Continuum



2.2. Designing the Industrial Compute Continuum
2.2.1. ICC communication topology
While many topologies may be used to implement the ICC, we advocate for the publish-subscribe (or
PubSub) communication pattern. In this pattern, publishers and subscribers are connected to an
agent commonly known as a broker, which may consist of a single component or be distributed.
The broker receives messages from publishers and dispatches them to the subscribers subscribed
to the topic of the messages. In other words, publishers send messages to potentially many
subscribers. This loose coupling facilitates network scalability and versatility. Recognized
protocols implementing PubSub mechanisms, such as MQTT¹ or AMQP², are widely used in the industry.
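The dispatching role of the broker can be illustrated with a minimal, single-process sketch. It is not MQTT or AMQP: the `Broker` class, its method names and the topic string are hypothetical and only show how one published message fans out to every subscriber of a topic.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives the topic and the raw payload of each message.
Handler = Callable[[str, bytes], None]


class Broker:
    """Single-process stand-in for a PubSub broker (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler for every message published on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        """Dispatch one message to all subscribers; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


broker = Broker()
received: List[str] = []
broker.subscribe("catenary/tension", lambda t, p: received.append("supervisor-a"))
broker.subscribe("catenary/tension", lambda t, p: received.append("supervisor-b"))

# The publisher sends a single message; the broker handles the fan-out.
delivered = broker.publish("catenary/tension", b"\x01\x02")
```

The publisher never learns who the subscribers are, which is the loose coupling referred to above.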
   In a network topology like the IIoT, where numerous (typically several hundred) sensors and
actuators need to communicate with supervisors, a classical client-server approach requires a
dense connection mesh to gather and distribute information. From this perspective, the PubSub
mechanism is attractive because the supervisors only need to send one message to communicate
information to all the remote devices, thus limiting the bandwidth usage on the source network.

¹ http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/errata01/os/mqtt-v3.1.1-errata01-os-complete.html
² http://docs.oasis-open.org/amqp/core/v1.0/amqp-core-complete-v1.0.pdf






In the ICC context, where the information goes through successive computing nodes, the PubSub
topology also makes it possible to factor out shared computation, reducing the overall computing power used.

2.2.2. IIoT communication security
Properly signed and encrypted messages are hard to modify and decrypt. Yet in the long
term no system is immune to compromise, not even mature, market-proven systems³. This
is particularly true for long-lived safety-critical systems (for instance automatic train controllers), which
are rarely updated: such systems must be thoroughly tested for systematic bugs before being
deployed [5, 6]. Since new vulnerabilities can be revealed after the device is deployed, a
defense-in-depth strategy must be adopted to ensure persistent security throughout the whole system’s
lifecycle.
   If the keys used by a device to sign and encrypt messages are compromised, the attacker is
able to forge messages as if they were produced by this device. Furthermore, the compromise allows
a rogue client to obtain the same access as the device, leading to potential leaks of industrial
secrets, or giving the attacker leverage for physical sabotage.
   IIoT often relies on a gateway to enforce security. IIoT devices such as remote sensors, together
with their gateway, form an important segment of an ICC. As gateways (1) can be hierarchical and (2) can
perform edge computing, they are sound candidates to build the ICC backbone.

2.2.3. Implementing a gateway function
Because of the limited computing power (or available energy) of end-devices in some use cases, it is
not always possible to secure all communications between all devices. One typical case is when
end-devices such as remote sensors must run on limited batteries for long periods, and thus
reduce their computation to the bare minimum.
   In this case, devices that must communicate without encryption must belong to the same
isolated private network. To overcome this limitation and enable communications with devices
outside of this network, a gateway may be used to bridge communications to other isolated
networks. The gateway function ensures the confidentiality and the integrity of the data it relays,
from the edge (OT-to-IT border) to the subscriber system. It can use secure and interoperable
protocols to communicate these data to/from other gateways or other remote controllers, even
through public infrastructures, encrypting and authenticating the communications when necessary.
The gateway function can be distributed amongst multiple physical devices.
   However, this approach implies that compromising a device implementing the gateway
function may give access to the whole subnetwork. In this case, it is possible to forge messages
on behalf of the compromised gateway. Such attacks may lead to catastrophic results.
   In order to ensure security of the industrial system, we present in this document a hardware-
based approach to secure the gateway communications, and give in Section 4.2 an example of
its usage in the predictive maintenance for the railway industry.




³ https://us-cert.cisa.gov/ics/advisories/icsa-19-274-01






2.2.4. End-to-End Encryption
Relying on a secure gateway is not always an option for very sensitive systems, for example
when they are subject to state security regulations or are a target of industrial espionage. In these
cases, confidentiality must involve only the ends of the communication, without any third party. As such,
end-to-end encryption is an important need for IIoT. Only the final receiver of a message should
be able to access its cleartext data. End-to-end encryption encompasses the capability to cross networks through
as many gateways as needed, while the data keeps its confidentiality and integrity, both being
guaranteed by cryptographic means. This latter feature is important, as devices in IIoT are
usually embedded into multiple layers of subnetworks.
   Using end-to-end encryption is only possible when using a data transmission protocol that
implements it, and it is usually not applicable for legacy systems.

2.2.5. Software integrity
Software integrity of the communicating devices is a mandatory foundation for security. This
applies to both system (firmware, kernel, drivers) and application software stacks.
  Device integrity must be ensured by the devices themselves through Secure Boot mechanisms.
Secure Boot ensures that the device will only boot software from the device manufacturer.
Furthermore, some Secure Boot implementations can provide decryption functionalities ensuring that
the protected software remains confidential while at rest, i.e. the protected software is
encrypted in storage memory such as MMC, flash, etc.


3. Securing geo-distributed systems communications
In order to design trustworthy geo-distributed systems in adversarial conditions, it is mandatory to
ensure communication security across the whole system lifetime. Two critical needs to achieve
that result are (1) secure communication and (2) secure authentication. In this section, we show
how OPC UA can be used to meet these needs in a protected environment, and how
hardware-based measures (Secure Boot and TPM) can be leveraged to extend security to untrusted
environments.

3.1. OPC UA PubSub
OPC UA is a standard [7] for data exchange in industrial communications. It provides safe and
secure means to connect supervision systems (SCADA) with programmable logic controllers
(PLCs), actuators, and sensors. Its publisher-subscriber variant (OPC UA PubSub) is a good
candidate for implementing IIoT-based systems, including ICC, as it is platform-independent,
offers great interoperability between standard IT and OT protocols and has undergone
thorough security assessments [8, 9].
   OPC UA PubSub is structured in two layers, enabling the inclusion of new underlying
protocols as technology evolves: the Transport layer and the Message layer. It currently
supports a set of well-established communication protocols to convey its payload
between two clients, ranging from raw Ethernet to MQTT/AMQP. The most lightweight of
these protocols (e.g. Ethernet or UDP) are used to implement field buses, and typically broadcast
messages on a local loop or take advantage of a physical switch to emulate a broker.
  The latter two transport protocols (MQTT and AMQP) are themselves publish-subscribe protocols,
which are only used here for their transport capabilities. These are better suited to cloud integration
as they are supported by the main cloud providers. MQTT’s lower overhead makes it better
suited to numerous IIoT scenarios.

3.1.1. OPC UA PubSub End-to-End Encryption
Messages are composed of headers and a payload. When using encryption, only the payload,
which bears data values, is encrypted. The whole message is then signed (including the sequence
number, so as to prevent replay attacks) and the signature is appended at the end of the message.
In this way, regardless of the Transport layer, end-to-end encryption is achieved.
   OPC UA PubSub supports three security modes: encryption and signature, signature only, or
neither. There are multiple levels of encryption, which are called security policies. They
describe which algorithms are used, with which key lengths, which signature scheme, and so on.
   For now, PubSub encryption always uses a block cipher in counter mode (AES in CTR
mode [10]), and the signature algorithm is a message authentication code (HMAC) based on the
SHA-256 hash algorithm. As these algorithms need a secret to work with (i.e. a symmetric key),
publishers and subscribers must possess the same secrets to securely exchange messages. No
other node in the communication path (including the broker or a potential man-in-the-middle
attacker) may know any of these secrets, which is what ensures end-to-end encryption.
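The "signature only" mode, with the sequence number covered by the signature, can be sketched with the Python standard library. This is an illustration under stated assumptions, not the OPC UA wire format: the 8-byte sequence-number header, the function `sign_message` and the class `Receiver` are hypothetical, and payload encryption (AES-CTR) is left out because it is not available in the standard library.

```python
import hashlib
import hmac
import struct

SIGNATURE_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA-256
HEADER_LEN = 8  # hypothetical header: one big-endian 64-bit sequence number


def sign_message(signing_key: bytes, sequence_number: int, payload: bytes) -> bytes:
    """Build header + payload and append the HMAC computed over both."""
    body = struct.pack(">Q", sequence_number) + payload
    signature = hmac.new(signing_key, body, hashlib.sha256).digest()
    return body + signature


class Receiver:
    """Subscriber side: checks the signature, then rejects stale sequence numbers."""

    def __init__(self, signing_key: bytes) -> None:
        self._key = signing_key
        self._last_sequence = -1

    def accept(self, message: bytes) -> bytes:
        if len(message) < HEADER_LEN + SIGNATURE_LEN:
            raise ValueError("message too short")
        body, signature = message[:-SIGNATURE_LEN], message[-SIGNATURE_LEN:]
        expected = hmac.new(self._key, body, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(signature, expected):
            raise ValueError("bad signature")
        (sequence_number,) = struct.unpack(">Q", body[:HEADER_LEN])
        if sequence_number <= self._last_sequence:
            raise ValueError("replayed or stale message")
        self._last_sequence = sequence_number
        return body[HEADER_LEN:]
```

Because the sequence number sits inside the signed region, an attacker without the group key can neither alter it nor resend an old message unnoticed; a broker relaying the message needs no key at all.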

3.1.2. OPC UA PubSub key distribution
The solution defined in the specification (see Section 5.4.3 of [7] Part 14) is to use a server entity
named Security Key Service (SKS). This OPC UA server is in charge of the authentication of
the agents of the network, and of the distribution of the security keys.
   As a consequence, publishers and subscribers must connect to the server using the classic
OPC UA protocol to fetch the security keys. They must do so in a secure way, so that the
security group keys remain confidential while crossing a possibly public network.
   In classic OPC UA, the authentication of the client is handled through a Public Key Infrastructure
(PKI) with Certificate Authorities (CA). It uses the X.509 [11] public key certificate and
Certificate Revocation List (CRL) standard. This implies that the client embeds a key pair to
prove its identity. This pair is composed of a public key that is signed by a certificate
authority (this signed key is also called a certificate), and a private key which must stay secret at
all times. The client and server shall also embed the trust chain needed to check each other’s
certificates (see [12] for details).
   Figure 2 describes the two main OPC UA PubSub phases, where communications using OPC
UA Client/Server paradigm (i.e. authentication and key distribution) are colored in red, while
communications using the PubSub paradigm (i.e. encrypted messaging) are colored in blue.








Figure 2: OPC UA PubSub


3.1.3. Publisher authentication
The OPC UA client-server protocol has been analyzed for security vulnerabilities in the past
(for instance, see [8, 9]) and is considered secure. Moreover, as of today, no vulnerability has
been identified in the OPC UA PubSub protocol, whose security relies on modern cryptographic
means. However, as mentioned in Section 3.1.1, all agents participating in a given security
group share the same encryption and signature keys. This holds true whether the clients are
subscribers or publishers, so a malicious client holding valid group keys is able to compromise
a whole group, as it could either feed forged data (as a publisher, an attack on data integrity),
or read secret data (as a subscriber, an attack on data confidentiality). Hence, the handling
and distribution of the group keys is a vulnerable process, and protecting the group keys is
critical to ensure the group security. As the group keys are exchanged over a secure
OPC UA client-server connection, their confidentiality is based on two factors: (1) their storage
in PubSub agents and (2) the authenticity of the client certificate used to fetch the group keys
from the SKS server.
   In the OPC UA PubSub specification, agents connect to the SKS for authentication using the
OPC UA client/server protocol. In order to assert its authenticity, the OPC UA client will (1)
encrypt a message containing its nonce with the SKS’s public key and (2) send this encrypted
and signed message, along with its certificate, to the SKS. On its side, the SKS will (1) check the validity of the certificate
through its Public Key Infrastructure (PKI), (2) decrypt the client’s nonce and (3) encrypt its
own nonce with the client’s public key, and sign it with its private key. Upon reception of this
response, the client will check its validity using the SKS’s public key, and decrypt it with its
own private key. If all these operations succeed, mutual authentication is achieved, and
both parties are deemed legitimate. The exchanged nonces are then used to generate temporary session
keys, which will be used in further exchanges until a key renewal is needed.
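How two exchanged nonces can yield identical session keys on both sides may be sketched as follows. The expansion function follows the HMAC-based P_hash construction known from TLS 1.2; the exact derivation, key lengths and key ordering used by OPC UA are defined in its specification and may differ, so `p_sha256`, `derive_session_keys` and the 32-byte key layout are assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Dict


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """Expand (secret, seed) into `length` pseudo-random bytes with HMAC-SHA-256."""
    output = b""
    a = seed  # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def derive_session_keys(local_nonce: bytes, remote_nonce: bytes) -> Dict[str, bytes]:
    """Derive the keys protecting messages sent by the owner of `local_nonce`.

    Assumed layout: a 32-byte signing key followed by a 32-byte encryption key.
    """
    material = p_sha256(remote_nonce, local_nonce, 64)
    return {"signing": material[:32], "encryption": material[32:]}
```

Both parties know both nonces after the handshake, so each can compute `derive_session_keys(client_nonce, server_nonce)` for client-to-server traffic and `derive_session_keys(server_nonce, client_nonce)` for the reverse direction, without any key ever being transmitted.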
   Now that the client has established a secure way to communicate with the server, it can
securely fetch the PubSub group keys. As the server has authenticated the client, it can either
authorize or deny the client request to fetch the group keys, depending on whether or not the
client is part of the security group. This process relies on the fact that the client private key
cannot be recovered, so that attackers cannot impersonate legitimate clients.
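The authorization decision taken by the server once a client is authenticated can be sketched as a small in-memory model. The class `SecurityKeyService`, its methods and the identity strings are hypothetical; a real SKS is an OPC UA server, and the client identity would come from the certificate validated during the handshake rather than from a plain string.

```python
import secrets
from typing import Dict, Set


class SecurityKeyService:
    """Toy stand-in for an SKS: releases a group key only to group members."""

    def __init__(self) -> None:
        self._group_keys: Dict[str, bytes] = {}
        self._members: Dict[str, Set[str]] = {}

    def create_group(self, group_id: str, members: Set[str]) -> None:
        """Create a security group with a fresh random 256-bit key."""
        self._group_keys[group_id] = secrets.token_bytes(32)
        self._members[group_id] = set(members)

    def fetch_group_key(self, authenticated_client: str, group_id: str) -> bytes:
        """Return the group key, or refuse if the client is not a member."""
        if authenticated_client not in self._members.get(group_id, set()):
            raise PermissionError(f"{authenticated_client} is not in {group_id}")
        return self._group_keys[group_id]

    def revoke(self, client: str, group_id: str) -> None:
        """Remove a client and rotate the key so its old copy becomes useless."""
        self._members[group_id].discard(client)
        self._group_keys[group_id] = secrets.token_bytes(32)
```

The sketch also shows why a stolen private key is so damaging: whoever can authenticate as a member receives the same key as every publisher and subscriber of the group, until the member is revoked and the key rotated.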






3.2. Securing secrets on the field
Security measures must be taken all along the life cycle of the client device, as many opportunities
may be used to steal the client’s private key. For instance, if the private key is provisioned
by an external actor (distributed over the network, brought physically to the device on an
external storage device, ...), it could be compromised even before its provisioning. Even if the
keys are provisioned securely, it is still possible to compromise the device’s private key. For
instance, if the private key is stored physically unprotected on the device, an attacker could
either compromise it at rest or at runtime.

3.2.1. Asserting Authenticity with a Secure Cryptoprocessor
The Trusted Platform Module (TPM) is a standard [13] for secure cryptoprocessors
providing services such as random number generation, generation and storage of cryptographic
keys, and encryption and decryption of data. The TPM standard is currently in its second iteration
and is frequently referred to as TPM2. The benefit of using a TPM is that it is in charge of
protecting the secrets and of performing the encryption/signature operations without loading the secrets
into the system’s memory. In doing so, the secrets never leave the TPM hardware module.
Moreover, some TPM implementations are tamper-resistant against physical intrusion and reset
themselves in such an event, protecting the secrets.
   TPMs are usually available as discrete hardware components integrated into the target
device. TPMs are also available as software components [14] executed in a Trusted
Execution Environment (TEE). In all cases, the communication between the device and the TPM
is specified by the TPM standard.

3.2.2. TPM and OPC UA
In the OPC UA client-server protocol, a TPM can be used to securely store the asymmetric
keys and to perform signature operations (robust authentication in Section 3.1.3) and decryption
(to obtain the nonce). This requires two additional configuration steps: (1) the TPM is first
provisioned with an asymmetric key pair, i.e. the TPM generates an asymmetric key pair for
which the public key part can be retrieved but the private key part cannot, and (2) the public
key part of the created key pair is retrieved from the TPM and signed by a trusted certificate
authority.
   The signed public key part is the certificate that uniquely authenticates an OPC UA client or
server in the network architecture. An OPC UA PubSub agent that has a configured TPM is
now able to recover the security group keys securely, and the private key used in the OPC UA
client-server protocol cannot be extracted from the device.

3.2.3. Secure TPM configuration at boot time
In order to ensure the confidentiality of the TPM secrets, it must be guaranteed that the secrets
in the TPM can only be used in a trusted environment, to avoid impersonation of our TPM-secured
device. To achieve this, the TPM secrets are protected by another secret known only to
trusted environments. In our case, this is ensured by cryptographically coupling the Secure
Boot mechanism of the device SoC and the TPM. The TPM standard provides the Platform
Configuration Registers (PCRs) mechanism for this purpose. The same mechanism can be used
to implement a Measured Boot (always on top of the SoC Secure Boot mechanism).
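The way PCRs tie key usage to the boot chain can be simulated in a few lines. A PCR cannot be written directly, only extended: the new value is the hash of the old value concatenated with the digest of the measured component. The sketch below models a SHA-256 PCR that is assumed to start at all zeros after reset; the helper names and the boot stage contents are hypothetical, and no real TPM is involved.

```python
import hashlib
from typing import Iterable

PCR_SIZE = 32  # size of one register in a SHA-256 PCR bank


def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """Extend operation: new PCR = SHA-256(old PCR || measurement digest)."""
    return hashlib.sha256(pcr + measurement).digest()


def measure_boot_chain(stages: Iterable[bytes]) -> bytes:
    """Measure each boot stage in order and return the final PCR value."""
    pcr = bytes(PCR_SIZE)  # assumed reset value: all zeros
    for image in stages:
        pcr = pcr_extend(pcr, hashlib.sha256(image).digest())
    return pcr


def key_usable(expected_pcr: bytes, current_pcr: bytes) -> bool:
    """Policy check: the protected key may be used only if the PCR matches."""
    return expected_pcr == current_pcr


# Reference value recorded at provisioning time for the trusted software.
golden = measure_boot_chain([b"bootloader-v1", b"kernel-v1", b"rootfs-v1"])

# A device booting a modified kernel ends with a different PCR value.
tampered = measure_boot_chain([b"bootloader-v1", b"kernel-evil", b"rootfs-v1"])
```

Since the extend operation is one-way and order-dependent, software that boots after a modified stage cannot steer the register back to the expected value, so `key_usable(golden, tampered)` is false and the secret stays locked.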

3.2.4. Improving authentication security with trusted secrets
As illustrated by the previous example, the TPM provides the following properties: (1) the
private key is not readable on non-volatile storage (hard-disk, ROM, ...), (2) the private key is
not loaded in live memory of the device and (3) no human operator ever accesses or manipulates
the private key.
   In addition to the increased security assurance, using a TPM significantly reduces the
attack surface of the system. Without a TPM (or other kinds of Hardware Security Modules,
HSM), the private key appears in static and/or live memory. It is also generated and provisioned
by different actors, human and/or automated. At any point the private key may be the target
of an attacker. In our solution, the only point of failure is the TPM, and thus the security
assessment is much easier and less error-prone.
   Having a single point of protected storage also helps in monitoring the security of all the
communications. This facilitates the identification of a potential key leak and the key’s
revocation, keeping the other agents of the network safe more efficiently.

3.2.5. Limits and Remaining Vulnerabilities
As it is an additional component, a TPM must be integrated into the target board.
Although this is technically not challenging, since TPM suppliers generally make this integration
as easy as possible, it is likely to require re-certification in critical domains (e.g. avionics).
   A second possible issue is the limited computational power of the TPM. In the example of
the OPC UA client-server connection, the TPM is used to sign a message to be sent (i.e. to
encrypt a 32B hash of the message, which is computed by the host) and to decrypt another
message (the AES256 nonce, so 16B of payload). In our benchmarks with ST33TPM/I²C, this
process takes around 1 second, using a rather naive implementation. Experience in TPM usage
tells us that a more optimized implementation would take around 100 ms (performance may
vary according to the TPM version and supplier). In the PubSub protocol, the client-server
connection is only made to fetch the group keys, so the impact of establishing the connection will
strongly depend on the group key lifetime. In some real-time systems with strict needs in terms
of worst-case latency, this may be an issue.
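The dependence on the group key lifetime can be made concrete with a short calculation. The handshake durations (1 s and 100 ms) are the figures quoted above; the key lifetimes of one minute and one hour are hypothetical values chosen only for illustration.

```python
def handshake_time_fraction(handshake_seconds: float, key_lifetime_seconds: float) -> float:
    """Fraction of time spent in TPM-backed handshakes, one handshake per key lifetime."""
    if key_lifetime_seconds <= 0:
        raise ValueError("key lifetime must be positive")
    return handshake_seconds / key_lifetime_seconds


# Handshake durations from the text; key lifetimes are assumed values.
for label, handshake in (("naive", 1.0), ("optimized", 0.1)):
    for lifetime in (60.0, 3600.0):
        fraction = handshake_time_fraction(handshake, lifetime)
        print(f"{label:9s} handshake, key lifetime {lifetime:6.0f} s: {fraction:.4%} of the time")
```

The average overhead shrinks as the key lifetime grows, but the worst case does not: a message that has to wait for a key renewal is still delayed by the full handshake duration, which is what matters for systems with strict worst-case latency bounds.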
   While the TPM reinforces security, it cannot prevent all types of attacks. Remote Code
Execution attacks on the software interacting with the TPM would allow an attacker to turn a
legitimate node into a malicious agent. This calls for a defense-in-depth approach, with the use
of threat protection and detection on software in addition to the solution presented in this paper.
   In a production system, the TPM can additionally be used to perform measured boot coupled
with a root of trust, and the use of the created key pair can be restricted so that the device
is only usable when the system integrity can be ensured, reinforcing the TPM’s protection of the
group keys.
   Finally, decommissioning a TPM should also be done with care, as the private key may still
be embedded in the device. A total and irreversible wipe of the key is possible, but
this action must be clearly planned in the device life cycle. In all cases, it is recommended to
also revoke the device certificate from the certificate authority, preventing potential use of an
incorrectly cleared TPM.


4. Industrial Application: Securing Railway Predictive
   Maintenance
We demonstrated the effectiveness and feasibility of our solution on a real-life industrial
challenge involving geo-distributed computation: predictive maintenance of catenaries. We also
ported our solution to industrial-grade equipment, demonstrating operational feasibility.
  In this section we first describe the industrial challenge, then we present our solution, which
addresses it and ensures communication security by applying the solutions presented in Section 3.

4.1. Predictive Maintenance
Usage of IIoT can bring benefits along the industrial systems’ lifecycle, including but not limited
to phases such as provisioning, deployment, production or operational use. Amongst these
different stages, we explored the maintenance phase as a promising target for introduction of
IIoTs.
   Predictive Maintenance is a technique aiming to optimize the maintenance time by using
information remotely harvested from sensors – typically IIoT devices – monitoring the industrial
system of interest. In this approach, monitoring the industrial equipment’s state makes it
possible to predict a short-term failure, and thus to dispatch a maintenance operator just in
time. Predictive maintenance also allows implementing an optimization feedback loop, by
harvesting knowledge on the industrial equipment behaviour and refining its state definitions and
thresholds, for instance through machine learning. Predictive maintenance is a popular way to
introduce IIoT technologies into industry as (1) it is purely additive to the industrial system,
without impacting the core critical functionalities and (2) it has a fast Return On Investment,
leading to a drastic decrease in the cost of ownership if done correctly.
   In order to define the security needs for the use case, we performed a risk analysis following the
EBIOS [15] methodology involving cybersecurity and safety experts from the railway industry.
We defined three critical assets: (1) the analytics datalake, (2) the train service availability and
(3) the safety-critical network (i.e. the system in charge of real-time signaling).
   Regarding data sent to the analytics, the main need is integrity, as forged data may trigger
emergency counter-measures, generally involving interruption of service on the whole track.
Thus, forged sensor data can be used to perform business-critical denial of service, impacting
asset (2). Furthermore, allowing an unknown amount of untrusted data to be integrated into the
analytics datalake defeats its purpose and prevents its smart exploitation, impacting asset (1).
   Availability must be ensured in order to guarantee persistence of service (asset 2). However,
considering the average duration of a maintenance operation compared with information
sampling, the only challenge will be ensuring that the security measures taken do not detract
from this objective.
   Confidentiality is also needed on data, since sensors may indicate the real-time position of
the trains, which is sensitive information since trains are national strategic assets. Hence the
operating company must ensure protection of such information.
   From a connection perspective, authentication and authorization must be strictly enforced
in order to ensure security of the safety-critical network (asset 3). Although the maintenance
and safety networks are currently strictly separated, this may not be the case in the future,
since signaling / critical sensor information may be used to refine predictive maintenance.
Furthermore, it cannot be ruled out that some hidden path already exists between the two
networks, for instance through the corporate IT network. Secure authentication ensures that no
rogue device can connect to the network, while authorization ensures that any legitimate
device will be used the way it is intended to (e.g. that a subscriber will not begin to publish
data).
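The distinction between authentication and authorization can be illustrated with a minimal Python sketch. It assumes that a device identity has already been authenticated (for instance from its certificate) and only checks what that identity is allowed to do. The device names, the `Permission` type and the `ACL` table are hypothetical and do not reflect the access-control mechanism of OPC UA or of any particular library.

```python
from enum import Flag


class Permission(Flag):
    """Actions a device may perform on the PubSub network."""
    SUBSCRIBE = 1
    PUBLISH = 2


# Hypothetical access-control table keyed by an already authenticated identity.
ACL = {
    "trackside-gateway": Permission.PUBLISH,
    "monitoring-server": Permission.SUBSCRIBE,
}


def is_authorized(device_id: str, requested: Permission) -> bool:
    """Return True only if every requested permission was granted to the device."""
    granted = ACL.get(device_id)
    if granted is None:
        # Unknown identities get nothing, even if authentication succeeded.
        return False
    return (granted & requested) == requested


print(is_authorized("trackside-gateway", Permission.PUBLISH))   # gateway may publish
print(is_authorized("monitoring-server", Permission.PUBLISH))   # subscriber may not
```

Denying by default for unknown identities keeps the check conservative: a device must be both authenticated and explicitly listed before it can publish or subscribe.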

4.2. Catenaries Monitoring Infrastructure
We built a prototype of a light SPD implementing our secure communication solution, and we
tested it on industrial equipment to ensure secure communications for predictive maintenance
in the railway industry, which is at the same time a critical asset of the supply chain and
an industrial system itself. Predictive maintenance, however, is not limited to this specific
industry, but can and indeed should be applied in any industry where critical assets are costly
to access and complex enough for failure to be difficult to predict on a strictly time-based
schedule. Since modern manufacturing processes are typically distributed over large areas, involve
multiple stakeholders [16] and indeed use complex equipment, they typically fit both requirements.




Figure 3: Catenary predictive maintenance


   Specifically, we monitored wireless trackside sensors (connected thermometers and unwinder)
to get information on the catenaries (respectively heating and physical tension), and exploited
this information to infer the catenaries’ state in real time. These data are then transmitted to an
analytics center that decides and prioritizes the maintenance operations according to different
business factors, such as maintenance team availability and current position, relative severity
of the catenary state, or per-hour cost of the line disruption, thus saving cost on maintenance
while minimizing service disruption for train users. Figure 3 illustrates the use case.
   Strict OT zone protection is not covered in this document, as it is heavily dependent on the
actual use case: (1) physical protections may impede access to the OT network, (2) physical
limitations of the OT devices or network may prevent cryptographic security measures from
being applied, or (3) the data samples may be of relatively low value compared with the correlated data
produced by the secure gateway. Nevertheless, if possible and needed, the approach we use to
secure the IT zone could be applied to the OT zone as well.

4.3. System architecture
We tested two sensors: an unwinder and a thermometer, both equipped with a STIMIO Railnode
module, which embeds their outputs into LoRa frames. The gateway was based on a
STIMIO Railnet4 , an ARM Cortex-A7-based gateway supporting both LoRaWAN and 2G/4G
communications. The LoRaWAN Join Server responsible for sensor enrollment was hosted by
the gateway. The Railnet gateway is railway-certified and used in actual railway infrastructures.




Figure 4: Smart Catenaries architecture


   We applied our approach to secure a trackside gateway for connected sensors, in order to
ensure connectivity and security of the predictive maintenance. As illustrated in Figure 4,
sensors are connected through LoRaWAN to the gateway, which is in turn connected to open
networks (such as the Internet) through LTE.
   The communication protocol between the gateway and open network subscribers (in our
case a monitoring server) is OPC UA PubSub with end-to-end encryption enabled. The MQTT
protocol is used to transport the encrypted OPC UA PubSub messages, as it is more convenient
to use a TCP broker to connect to remote devices. The gateway of the private subnetwork is
then represented by two entry points: the MQTT broker and the SKS server used to distribute
the group keys. Systerel’s S2OPC5 library is used to build the OPC UA PubSub subscriber on the
gateway, but also for the SKS server. The MQTT broker was the off-the-shelf, open-source
tool Mosquitto6 , while the Secure Key Service was implemented through an S2OPC server, this
time working in client/server OPC UA mode. Both services were run on the same physical
remote server, although in actual deployment we would encourage hosting them on two distinct
machines.
   Finally, the analytics center consisted of a back-end S2OPC subscriber, in charge of
establishing connection, getting OPC UA messages and transferring them to the analytics server.
The latter was simulated through a simple display of the results on a remote machine, although
connectivity to TIRIS7 , the actual analytics server used by Thales railway division, has been
demonstrated in previous experiments.
4 https://stimio.fr/documentation/railnet/
5 https://opcfoundation.org/products/view/safe-and-secure-opc/
6 Eclipse’s Mosquitto broker: https://mosquitto.org/
7 https://www.thalesgroup.com/en/markets/transport/railways-digitalisation/tiristm-smart-maintenance

   Certificates from both SKS and clients embed their public key used for the authentication
process described in Section 3.2.2 (associated with the related private key). In our use case,
these certificates are all signed by the same Certificate Authority (CA), using a SHA256 hash and
an RSA4096 key pair. All end-users owned the CA certificate, and thus were able to check the
validity of other equipment’s certificates. While we did not experiment with it, the TPM could
be used to ensure the CA certificate integrity, i.e. that an attacker did not tamper with the file.
To do so, a hash of the certificate can be stored within the TPM, and compared to the hash of the
certificate file on-disk before any usage.
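The integrity check suggested above can be sketched in a few lines of Python. This is only an illustration using the standard library: the reference digest is passed as a plain argument, whereas in the scheme described it would be read back from the TPM, and the file name and certificate bytes are placeholders.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def certificate_digest(path: Path) -> bytes:
    """Return the SHA-256 digest of the certificate file as stored on disk."""
    return hashlib.sha256(path.read_bytes()).digest()


def ca_certificate_is_intact(path: Path, reference_digest: bytes) -> bool:
    """Compare the on-disk certificate with a trusted reference digest.

    Assumption: in a real deployment reference_digest would come from the TPM;
    here it is simply an argument.
    """
    return hmac.compare_digest(certificate_digest(path), reference_digest)


with tempfile.TemporaryDirectory() as tmp:
    ca_file = Path(tmp) / "ca.der"  # hypothetical file name
    ca_file.write_bytes(b"placeholder CA certificate bytes")

    # Provisioning step: compute the digest once and keep it in protected storage.
    trusted_digest = certificate_digest(ca_file)
    intact_before = ca_certificate_is_intact(ca_file, trusted_digest)

    # Simulated tampering: the file on disk is replaced.
    ca_file.write_bytes(b"attacker-controlled certificate bytes")
    intact_after = ca_certificate_is_intact(ca_file, trusted_digest)

print(intact_before, intact_after)
```

The check is only as trustworthy as the storage of the reference digest and of the code performing the comparison, which is why keeping the digest in the TPM rather than next to the certificate file matters.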
   In our experiment, the CA was not connected to clients and SKS, so certificate revocation
was not directly possible. In most operationally deployed use cases, a complete PKI should be
set up, although the actual implementation of this PKI might differ a lot depending on the use
case needs and constraints.

4.4. Implementing the gateway function in an industrial-grade SPD
As shown in Figure 5, we integrated a TPM2 ST33 from STMicroelectronics (the STPM4RasPI8
extension board) as a discrete component into the STIMIO Railnet module. Its Common Criteria
evaluation reached EAL 4+ [17], hence offering a satisfactory level of security for this prototype.




Figure 5: Secure IIoT Gateway implementation


   The gateway operational communication with the IT zone was done through an OPC UA
PubSub Publisher, while its authentication was performed through an OPC UA client. Both
were built with the Systerel S2OPC library. Cryptographic services for both OPC UA components
relied on the open-source, light-weight cryptography library mbedTLS9 , which we adapted to
communicate with the TPM10 through the TPM library from the TSS2 open-source stack11 , as
well as the Thales-proprietary library libtpm2.
   An OPC UA server was also added to the PubSub modules to expose the published data
in a client-server manner. This access is mostly for convenience and tests, as data transfers
between devices are only done in PubSub. The Paho12 library, a light-weight MQTT client
implementation, is used by S2OPC as the library for the MQTT Transport of the OPC UA PubSub
messages. All software used in this experiment is open-source, with the exception of libtpm2, a
proprietary library developed by Thales and used as high-level API above TSS2. During the
authentication phase, the S2OPC stack will use the customized cryptography stack to perform
hardware-based secure authentication, as described in Section 3.2.1, communicating with the
SKS with OPC UA Client/Server directly over TCP/IP.

8 https://www.st.com/en/evaluation-tools/stpm4raspi.html
9 https://github.com/ARMmbed/mbedtls
10 https://gitlab.com/systerel/S2OPC/-/tree/pab-tpm2
11 https://github.com/tpm2-software
12 Eclipse’s Paho library: https://www.eclipse.org/paho/

   During the communication phase, data received in LoRa from the remote IIoT sensor (Railnode)
are processed by the LoRaWAN server, which dispatches messages to the local MQTT broker
(Mosquitto), in turn queried by the S2OPC stack through the Paho library. Once the data are
received and processed by the gateway sensor logic, a new message is sent through Paho to
the remote MQTT broker.
   Using these software and hardware components, we were able to implement a prototype of
a secure gateway for IIoT systems. More specifically, we ensured hardware-based authentication
of the gateway, and end-to-end signature and encryption for communications, as intended.
   Some very basic level of edge computing was also performed in the gateway, such as
computing relative heating and timestamping the data. In more realistic use cases, more extensive
computation should be performed at the edge (i.e. in the gateway).
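As an indication of how light this edge processing can be, the Python sketch below enriches a raw reading before publication. The text does not define relative heating; the sketch assumes it is the difference between the measured catenary temperature and a reference ambient temperature, and the `HeatingSample` type, field names and sensor identifier are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HeatingSample:
    """Enriched reading as it could be published by the gateway (illustrative)."""
    sensor_id: str
    relative_heating_c: float
    timestamp_utc: str


def enrich(sensor_id: str, catenary_temp_c: float, ambient_temp_c: float,
           now: Optional[datetime] = None) -> HeatingSample:
    """Compute relative heating (assumed: measured minus ambient) and timestamp it."""
    moment = now if now is not None else datetime.now(timezone.utc)
    return HeatingSample(
        sensor_id=sensor_id,
        relative_heating_c=catenary_temp_c - ambient_temp_c,
        timestamp_utc=moment.isoformat(),
    )


# Fixed instant so that the example is reproducible.
fixed_now = datetime(2023, 11, 21, 10, 0, 0, tzinfo=timezone.utc)
sample = enrich("thermometer-01", 37.5, 25.0, now=fixed_now)
print(sample)
```

Timestamping in the gateway rather than at the analytics center keeps the measurement time independent of network delays between the two.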
   Successful connection, persistent connectivity and effective signature and encryption were
demonstrated through the use of forged messages and rogue gateways, which were both rejected
by the SKS and unable to log in and access any secret or data from the system.
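The principle behind the rejection of forged messages can be illustrated with a simplified Python sketch. It is not the OPC UA PubSub wire format: encryption, nonces and key identifiers are omitted, the message layout (payload followed by a tag) is hypothetical, and HMAC-SHA256 from the standard library stands in for the signature computed with the group key.

```python
import hashlib
import hmac
import os
from typing import Optional

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def sign(group_key: bytes, payload: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with the shared group key."""
    return payload + hmac.new(group_key, payload, hashlib.sha256).digest()


def verify(group_key: bytes, message: bytes) -> Optional[bytes]:
    """Return the payload if the tag is valid, None otherwise."""
    if len(message) < TAG_LEN:
        return None
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(group_key, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None


group_key = os.urandom(32)   # would be distributed by the SKS to authenticated nodes
rogue_key = os.urandom(32)   # a rogue gateway never obtains the group key

genuine = sign(group_key, b"relative_heating=12.5")
forged = sign(rogue_key, b"relative_heating=99.9")
tampered = bytearray(genuine)
tampered[0] ^= 0x01          # flip one bit of the payload in transit

print(verify(group_key, genuine))          # payload accepted
print(verify(group_key, forged))           # rejected: wrong key
print(verify(group_key, bytes(tampered)))  # rejected: payload modified
```

Since only nodes that authenticate successfully to the SKS receive the group key, a rogue gateway can neither produce a valid tag nor modify a genuine message without detection.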


5. Related works
Organizations such as the German IUNO or the French GIMELEC mainly promote methods and tools
to ensure the conformance of national companies to Industry 4.0 standards [18]. While they
may also support research activities, they mainly propose high-level guidelines for the industry.
In the case of IUNO, usage of TPM for authentication is mentioned in [19], but no implementation
is provided, nor is its usage within an existing communication protocol described. In [20],
hardware properties (SRAM PUFs) are exploited to ensure secure authentication by guaranteeing
integrity and confidentiality of the secret key, similarly to our usage of the TPM solution. This
work, however, relies on a specific architecture which may not be usable in actual use cases.
Furthermore, no indication of how to integrate the proposed protocol into established standards
is given in the article.
   Regarding communication protocols of industrial use cases, Data Distribution Service [21]
(DDS) is an open standard for real-time distributed communications. While the standard is open
regarding the actual communications implementation, it supports a brokerless publish/subscribe
pattern of communications, and so can be compared to OPC UA which offers such a possibility
in its PubSub version. DDS is a rich standard offering fine-grained control of the Quality of
Service. A security standard has been published, allowing security patterns similar to OPC UA
PubSub [22]. Both the base and security DDS specifications, however, only describe APIs, whereas
OPC UA provides in-depth protocol specifications, hence drastically increasing tool
interoperability. Furthermore, to our knowledge no national cybersecurity authority has performed
a thorough review of DDS Security, as was the case for OPC UA by the German BSI [9].
   The OPC Foundation also mentions the possibility of using a TPM [23] or other secure storage
solutions. However, it does not discuss its potential usage or benefits for authentication.
   The wolfTPM library from wolfSSL13 allows integration of a TPM into the wolfSSL library, a
cryptographic library implementing the TLS1.3 standard [24]. This approach is quite similar
to ours, but it only applies to client/server connections. Its usage in publish/subscribe
communication schemes is not mentioned in the literature. Another example of such an approach
is presented by the authors of [25], where the integration of a TPM within the TLS cryptography
standard [26] is exploited not only for authentication but also for payload encryption.
However, as in the former case, it does not cover the publish/subscribe topology, and in the case
of multiple publishers may lead to multiple unnecessary encryptions.


6. Conclusion
We propose in this paper a consolidation of the authentication of devices that are part of a
distributed system, with both legacy devices that cannot be updated because of safety
requirements, and remote devices that must use the public network and require end-to-end encryption.
In [27], we assessed our solution against ISA/IEC 62443, a set of standards relating to the security
of industrial communication networks and systems.
   The proposed solution uses open protocols to communicate its data (MQTT, OPC UA PubSub),
enhancing the interoperability of the system, hence its maintainability. It also uses a TPM
hardware module to conceal secret identifiers and guarantee the authenticity of the remote modules.
We use secure boot in order to ensure the integrity of the device software. The developed
prototype shows that the remote data distribution works efficiently and securely. Functional
tests included in the S2OPC suite all passed successfully, and operational communications
worked as intended. Key distribution took a relatively long time (around 10
seconds), probably because of the low bandwidth of the SPI bus connecting the TPM to the
CPU. Considering the timing needs of our application, such a duration was acceptable. More
constrained applications may require a closer integration of the TPM.
   On a final note, while the performance of the remote component proved satisfactory on the
industrial prototype, complete industrialization of the solution requires derisking of the whole
architecture, and more specifically further performance testing, since the solution’s
domain of application is Industrial IoT where a high number of devices provide situations similar
to data lakes (this is particularly the case in the railway industry). In the current architecture,
we have identified two elements that deserve further testing for the complete industrialization
of the solution: the broker and the SKS server.


13 https://www.wolfssl.com

References
 [1] R. Langner, Stuxnet: Dissecting a cyberwarfare weapon, IEEE Security & Privacy (2011).
 [2] J. Nazario, BlackEnergy DDoS Bot Analysis, Arbor (2007).
 [3] C. H. Kim, J.-J. Quisquater, Faults, injection methods, and fault attacks, IEEE Design &
     Test of Computers (2007).
 [4] P. Tedeschi, S. Sciancalepore, Edge and fog computing in critical infrastructures: Analysis,
     security threats, and research challenges, in: 2019 IEEE European Symp. on Security and
     Privacy Workshops (EuroS&PW), 2019.
 [5] C. Metayer, P. Humbert, P.-A. Brameret, Vers une Implémentation Sûre et Cybersécurisée
     du protocole OPC UA, in: Congrès Lambda Mu 21,“Maîtrise des risques et transformation
     numérique: opportunités et menaces”, 2018.
 [6] M. Wolf, D. Serpanos, Safety and security in cyber-physical systems and internet-of-things
     systems (2018).
 [7] OPC Unified Architecture Specification, Specification, OPC Foundation, 2006-.
 [8] M. Puys, M.-L. Potet, P. Lafourcade, Formal Analysis of Security Properties on the OPC-UA
     SCADA Protocol, in: Computer Safety, Reliability, and Security, 2016.
 [9] Damm, Gappmeier, Zugfil, Plöb, Fiat, Störtkuhl, OPC UA Security Analysis, Technical
     Report, BSI, 2017.
[10] Advanced encryption standard, NIST FIPS PUB 197 (2001).
[11] S. Boeyen, S. Santesson, T. Polk, R. Housley, S. Farrell, D. Cooper, Internet X.509 Public
     Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile, 2008.
[12] Systerel, OPC UA Certificate Validation, https://blog.systerel.fr/fr/posts/2020-11/
     opcua-certificate-validation/, 2020.
[13] IEC, Trusted platform module library — Part 1: Architecture, Standard, 2015.
[14] H. Raj, et al., fTPM: A Software-Only Implementation of a TPM Chip, in: Proceedings of
     the 25th USENIX Conf. on Security Symp., 2016.
[15] Expression des Besoins et Identification des Objectifs de Sécurité - EBIOS, Agence nationale
     de la sécurité des systèmes d’information (ANSSI), 2010.
[16] W. Bauer, M. Hämmerle, S. Schlund, C. Vocke, Transforming to a hyper-connected society
     and economy – towards an “industry 4.0”, Procedia Manufacturing (2015).
[17] ANSSI, Rapport de certification ANSSI-CC-2018/42 ST33TPHF20, https://www.ssi.gouv.fr/
     uploads/2018/10/anssi-cc-2018_42fr.pdf, 2018.
[18] M. Waidner, M. Kasper, Security in industrie 4.0 - challenges and solutions for the fourth
     industrial revolution, in: 2016 Design, Automation & Test in Europe (DATE), 2016.
[19] D. A., et al., Putting things in context: Securing industrial authentication with context
     information, Int. Journal on Cyber Situational Awareness (IJCSA) (2019).
[20] C. Lipps, et al., Proof of concept for iot device authentication based on sram pufs using
     atmega 2560-mcu, in: 1st Int. Conf. on Data Intelligence and Security (ICDIS), 2018.
[21] DDS Security 1.1, Data Distribution Service - Version 1.4, Specifications, OMG, 2015.
[22] DDS Security 1.1, DDS Security - Version 1.1, Specifications, OMG, 2018.
[23] S. W. Group, Practical Security Recommendations for building OPC UA Applications,
     White paper, OPC Foundation, 2018.
[24] E. Rescorla, The Transport Layer Security (TLS) Protocol Version 1.3, RFC 8446, 2018.
[25] K. Li, M. Mass, M. Ralph, A Type-safe, TPM Backed TLS Infrastructure, Technical Report,
     Carnegie Mellon University, 2012.
[26] T. Polk, S. Turner, Prohibiting Secure Sockets Layer (SSL) Version 2.0, 2011.
[27] O. Gilles, D. Gracia Pérez, P.-A. Brameret, V. Lacroix, Securing IIoT communications using
     OPC UA PubSub and Trusted Platform Modules, Journal of Systems Architecture (2022).



