<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving MQTT Security Through the Generation of Malicious Test Cases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Camilla Cespi Polisiani</string-name>
          <email>camilla.cespipolisiani01@universitadipavia.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Carla Calzarossa</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Zuppelli</string-name>
          <email>marco.zuppelli@cnr.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Caviglione</string-name>
          <email>luca.caviglione@cnr.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Massimo Guarascio</string-name>
          <email>massimo.guarascio@cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for High Performance Computing and Networking</institution>
          ,
          <addr-line>Rende</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Applied Mathematics and Information Technologies</institution>
          ,
          <addr-line>Genova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Pavia</institution>
          ,
          <addr-line>Pavia</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The pervasive deployment of IoT technologies accounts for a variety of hazards often requiring a cross-layer approach. For example, the security posture of brokers responsible for handling the Message Queuing Telemetry Transport (MQTT) protocol has to be assessed at different functional layers, thus it is important to generate test cases ranging from network traffic conditions to application-specific patterns. Alas, this is a time-consuming and poorly generalizable process. Therefore, this paper proposes two frameworks for improving IoT security. The first is a suite for creating traffic flows starting from real traces or arbitrary configurations. The second is a Small Language Model that can produce realistic MQTT topics. To demonstrate their effectiveness, we showcase how they can be used to mitigate covert communications targeting IoT ecosystems. Results indicate that our tools can provide realistic test conditions for advancing IoT security, especially to better comprehend attacks targeting the MQTT protocol.</p>
      </abstract>
      <kwd-group>
        <kwd>covert communications</kwd>
        <kwd>IoT security</kwd>
        <kwd>test cases</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid diffusion of Internet of Things (IoT) technologies is responsible for the transformation of
various domains ranging from smart home environments to industrial control systems and digital urban
infrastructures [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. At the same time, their ubiquitous adoption opens the way to several security
hazards that might also damage physical assets or harm individuals. In fact, the mix of hardware,
software and network protocols results in a vast attack surface that is difficult to control. For instance, many
modern IoT deployments are plagued by data breaches, unauthorized access, and exfiltration attempts.
Therefore, enforcing IoT security is crucial for not endangering people, homes, industries, cities or the
whole Internet [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Several architectural blueprints and network protocols have been introduced to support the rapid
evolution of IoT applications, also with the goal of improving their robustness. Among others,
deployments based on the Message Queuing Telemetry Transport (MQTT) are demonstrating their
effectiveness, since they allow the organization of IoT nodes and data in a hierarchical structure, thus
offering the possibility of segmenting the network to prevent bottlenecks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Accordingly, many IoT
ecosystems rely on an MQTT broker, which serves as a central communication hub that receives and
dispatches messages across clients via a publish-subscribe paradigm. Given their importance, brokers
are attractive targets for attackers as they might be affected by vulnerabilities that can compromise the
entire IoT deployment [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
      <p>
        Owing to the critical nature of IoT technologies, efforts to counteract attacks have intensified but
spawned an “arms race” leading to malware endowed with mechanisms to “obscure” data within network
protocols or mimic normal application behaviors, just to mention some [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. A recent offensive
trend is to deploy covert channels, which establish hidden communication paths within legitimate
traffic flows to exfiltrate sensitive data, evade signature-based detection mechanisms, or orchestrate
nodes of a botnet [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. Alas, improving security requirements of IoT ecosystems is often a complex
task because of the dynamic, heterogeneous, and resource-constrained nature of IoT nodes. Among
the various issues, enforcing segmentation through traffic engineering and the ability to perform
automatic, reproducible, and consistent tests are major concerns (see [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and the references therein).
Therefore, this work showcases two mechanisms for the generation of test conditions to evaluate IoT
security. Specifically, the first approach entails a framework for assessing the network part of an IoT
ecosystem via the creation of traffic flows starting from real traces or arbitrary configurations. The
second leverages a Small Language Model (SLM) to produce realistic MQTT topics used to quantify the
permeability of brokers to covert communication attempts.
      </p>
      <p>Summing up, the contribution of this work is threefold: i) it introduces two frameworks for testing
the security properties of IoT ecosystems at different functional layers, i.e., traffic and topic levels; ii) it
showcases the effectiveness of SLMs to automatically generate test cases for improving the robustness of
IoT deployments against covert communications; iii) it considers a threat model leveraging information
hiding, which is often neglected in the literature dealing with IoT/Cyber-Physical Systems (CPS).</p>
      <p>The rest of the paper is structured as follows. Section 2 reviews past works on the assessment of
IoT security, while Section 3 introduces the proposed test mechanisms. Section 4 showcases numerical
results and Section 5 concludes the paper and hints at future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Owing to their diffusion, the literature abounds with works dealing with the security of IoT ecosystems. For
instance, a recent survey highlights the major challenges that should be addressed in the near future,
such as the lack of effective encryption schemes at the transport layer, insufficient
authentication/authorization mechanisms and insecure cloud interfaces [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Moreover, the need to orchestrate a vast array
of hardware entities (e.g., sensors, system on a chip frameworks, and resource constrained platforms)
accounts for major software hazards [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In fact, nodes and appliances are often plagued with
backdoors, firmware inheriting unpatched CVEs due to lack of control of the used codebase, and hazards
arising from prioritizing performance over security [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Another important aspect concerns the ability
to face threats that can virtually target all the functional layers of IoT technologies, e.g., from bare
metal to the application. Hence, being able to conduct tests is mandatory, especially to capture corner
cases or the complex interplay of different hardware, software and vendors characterizing real-world
deployments.
      </p>
      <p>
        Concerning the generation of network traffic to test IoT infrastructures, [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] introduces a tool for
supporting IoT network simulations, e.g., for evaluating security countermeasures. However, this tool
is affected by some limitations: it cannot accurately simulate event-driven IoT devices, and it imposes
a periodic publication pattern instead of more realistic time-varying behaviors. The literature offers
many works dealing with the development of synthetic traffic models. For example, [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] showcases how
the Scapy Python library can be used to produce traffic patterns characterizing IoT nodes deployed in
smart home scenarios, whereas [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] discusses a workaround to the scarcity of datasets needed to drive
models and obtain accurate results. Specifically, it suggests deploying generative adversarial networks
to create better traffic models, e.g., capable of also taking into account location information. A more
refined approach is presented in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        Regardless of the mechanisms used, traffic generation schemes share some limitations. The first is the
lack of comprehensive datasets, especially for the case of the MQTT protocol. A major exception
is [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which proposes a collection of IoT traffic traces capturing various network attacks. However,
background traffic conditions are generated using the tool described in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], thus the obtained dataset
contains IoT nodes with the same “duty cycle”. The second limitation observed in the literature is that
some tools are not publicly available or have been created to investigate very narrow deployments,
e.g., smart homes. For analyzing the impact of covert communications, [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] showcases a framework to
produce traffic conditions representative of a variety of covert channels, which may also be suitable for
IoT ecosystems. Moreover, [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] deals with a tool for cloaking information within .pcap traces that also
offers preliminary support for threats hiding data within MQTT message headers, such as the Keep-Alive
and Client ID.
      </p>
      <p>
        The mitigation of covert communications can surely benefit from the availability of mechanisms for
generating test cases. In fact, identifying covert communications a-posteriori is a hard task, especially
in IoT scenarios, where data could be hidden in multiple places, e.g., measurements of sensors or traffic
traits. This is even more evident for the case of cyber-physical deployments, where the physical behavior
of sensors/actuators or the timing of protocol data units could be exploited to conceal information [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
To this end, the literature proposes three main paradigms. The first acts in the early design stage of
protocols, mainly to eliminate functional ambiguities, imperfect isolation issues, or optional/unused
fields that can be abused as containers for the secret data [20]. The second takes advantage of some form
of AI to develop models capable of replicating traffic conditions or “challenging” detection frameworks for
improving their robustness [21]. In both cases, a core requirement is the ability to conduct a vast array
of real trials, e.g., to gather traffic traces that can be used to drive simulations or train models. A third
alternative approach deploys network-level fuzzers, proven to be effective in producing a multitude of test
conditions in an automatic manner, especially to reveal coding errors, bugs, or security issues [22]. Even
if pseudo-fuzzing techniques based on random permutation are effective in assessing the susceptibility of
HTTP headers to covert communication attempts [23], their systematic adoption is still vastly
unexplored. At the same time, the use of AI for creating network fuzzers appears to be a very promising
approach to investigate a wide range of security hazards or implementation issues [24].
      </p>
      <p>Summing up, the limitations of available tools and the lack of comprehensive MQTT traffic datasets
underscore the need for automated approaches able to generate realistic network traffic conditions or
ad-hoc test cases for capturing specific traits of IoT deployments.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Mechanisms for Testing IoT Ecosystem Security</title>
      <p>In this section, we present two mechanisms for assessing the security of IoT ecosystems. First, Section 3.1
describes a framework able to produce different traffic conditions to test IoT devices at the network
level. Then, Section 3.2 showcases how SLMs can be used to tune mechanisms to reveal manipulations
of MQTT topics.</p>
      <sec id="sec-3-1">
        <title>3.1. Threat-driven Traffic Generation</title>
        <p>To address the lack of realistic MQTT traffic datasets for engineering and research purposes, we created
a framework able to generate both benign and threat-driven MQTT traffic conditions implementing a
controlled yet realistic test environment. The tool enables the simulation of diverse network threats,
including malicious software endowed with different types of covert channels and Denial of Service
(DoS) attacks.</p>
        <p>From an architectural viewpoint, the generator is organized into three functional layers. The
initialization layer processes input configurations, establishing the parameter settings for the simulation
scenario. The traffic simulation layer generates the MQTT traffic by considering clients publishing and
subscribing to topics, and reproducing both benign and malicious behaviors. Finally, the execution and
control layer manages the overall operation of the generator, coordinating the network traffic creation
phase, ensuring the startup and shutdown of components, and enabling the traffic capture in .pcap
format for further analysis.</p>
        <p>
          The tool also supports a detailed modeling of IoT devices, including periodic sensors that transmit
data at a fixed rate, and event-driven sensors that publish messages at instants of time sampled from a
specified probability distribution, such as exponential or uniform. The flexibility in customizing the
timings using different distributions makes the tool particularly suitable for simulating diverse IoT
behaviors, such as CPS or large-scale urban deployments [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
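        <p>The two device models described above can be illustrated with a minimal Python sketch. It is not the implementation of the tool: the function names, the parameters, and the choice of sampling the gap between consecutive publications (rather than absolute instants) are assumptions made only for illustration.</p>
        <preformat>
```python
import random


def periodic_instants(period_s, duration_s):
    """Publication times (seconds) of a sensor transmitting at a fixed rate."""
    count = int(duration_s // period_s)
    return [i * period_s for i in range(1, count + 1)]


def event_driven_instants(duration_s, distribution="exponential",
                          mean_gap_s=5.0, low_s=1.0, high_s=10.0, seed=None):
    """Publication times whose gaps are sampled from a probability distribution.

    The parameter names (mean_gap_s, low_s, high_s) are hypothetical.
    """
    rng = random.Random(seed)
    instants = []
    now = 0.0
    while True:
        if distribution == "exponential":
            gap = rng.expovariate(1.0 / mean_gap_s)
        elif distribution == "uniform":
            gap = rng.uniform(low_s, high_s)
        else:
            raise ValueError("unsupported distribution: " + distribution)
        now += gap
        if now > duration_s:
            return instants
        instants.append(now)
```
        </preformat>
        <p>For instance, periodic_instants(5, 20) yields one publication every 5 seconds, whereas event_driven_instants(60, "uniform", low_s=1.0, high_s=3.0, seed=1) yields irregularly spaced publications within a 60-second window.</p>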
        <p>
          The tool has two built-in attack templates. The first allows the simulation of DoS attacks targeting
the MQTT broker by flooding it with PUBLISH messages with large payloads from a set of tampered
IoT nodes, while legitimate IoT devices, such as environmental sensors, periodically transmit telemetry
data. These mixed traffic patterns facilitate the analysis of MQTT performance and security under
realistic attack conditions. Being able to control the number of both legitimate and illicit IoT nodes
enables precise control over the simulated attack, especially in terms of duration and frequency.
The second template considers a malicious actor cloaking information in MQTT traffic, specifically in
MQTT topic names, either through case manipulation or ID modulation [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The tool allows
the configuration of multiple parameters, such as topic names, Quality of Service (QoS) for message
publishing, and message payload, as well as the customization of the IoT nodes, either publishers or subscribers, according
to the MQTT interaction pattern.
        </p>
        <p>The proposed traffic generation framework is implemented in Python, and leverages libraries such as
Pandas, NumPy, and the Eclipse Paho MQTT client<sup>1</sup> for efficient data handling and network
communications. The generator provides two modes of operation, namely, a manual configuration mode using a
.csv configuration file for synthetic traffic generation, and an empirical distribution mode that replays
previously captured traffic traces from .pcap files. This adaptability makes our tool a valuable asset
for testing detection systems, refining anomaly detection algorithms, and advancing security protocols
within MQTT-based IoT networks, under conditions that closely resemble real-world threats.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. SLMs for Automatic Generation of Test Cases</title>
        <p>Manipulation of MQTT topic names is used by attackers aiming to establish a covert communication
between two (or more) clients sharing the same MQTT broker. By taking advantage of the global
“visibility” of the MQTT topics to all the connected devices, a malicious IoT device could encode secret
data by altering the case sensitivity of a topic name: publishing a message on a topic composed of only
lowercase letters signals the bit 1, whereas sending a message on a topic containing an uppercase letter
signals the bit 0. We point out that such an attack model requires a suitable “visibility” over topics
handled by the MQTT broker. In fact, properly configured brokers or as-a-Service deployments (e.g.,
based on AWS) that enforce access control policies might limit the number of topics that can be used
to encode the secret data. Nevertheless, weak credentials or poor configuration choices often plague
MQTT brokers, which are exposed over the Internet without an adequate degree of security [25].</p>
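        <p>The case-based encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, and placing the uppercase letter at the beginning of the last topic level is one possible choice among many. The sketch also assumes that the last level starts with a letter.</p>
        <preformat>
```python
def encode_bits(bits, topic):
    """Return one topic name per secret bit.

    An all-lowercase topic signals bit 1; a topic containing an uppercase
    letter (here: first letter of the last level, an assumption) signals bit 0.
    """
    levels = topic.lower().split("/")
    lowercase_topic = "/".join(levels)
    uppercase_topic = "/".join(levels[:-1] + [levels[-1].capitalize()])
    return [lowercase_topic if bit == "1" else uppercase_topic for bit in bits]


def decode_topics(topics):
    """Recover the bit string from the sequence of observed topic names."""
    return "".join("1" if t == t.lower() else "0" for t in topics)
```
        </preformat>
        <p>A covert sender would publish one message per element of encode_bits("101", "Home/Kitchen/Humidity"), while a covert receiver subscribed to the same topics would apply decode_topics to the names it observes.</p>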
        <p>In this context, to simulate an attacker modifying the topic list of a targeted MQTT broker, we
developed an AI framework based on the Bidirectional Encoder Representations from Transformers
(BERT) [26]. In essence, this framework employs a transformer-based architecture that comprehensively
understands word context within a sentence by analyzing both preceding and subsequent words. In
more detail, BERT consists of a stack of transformer encoder layers, each comprising multiple
self-attention “heads”. This bidirectional approach captures linguistic nuances more effectively than
traditional unidirectional models.</p>
        <p>The pre-training process includes two main steps, i.e., Word Masking and Next Sentence Prediction.
In the masking step, a certain percentage of words within a sentence is either masked or randomly
substituted. The BERT model is subsequently trained to predict these masked words by analyzing
the surrounding context, which includes the words that precede and follow the masked word. This
task is designed to help the model grasp the contextual relationships between words in a sentence. In
the prediction step, the BERT model is trained to identify the relationships between two
consecutive sentences. This involves generating negative examples by replacing the second sentence
with a random one. The model is then trained to differentiate between positive pairs (authentic
consecutive sentences) and negative pairs (where the second sentence has been replaced).</p>
        <p>For generating synthetic MQTT topics allowing the emulation of an attacker cloaking information in
malicious/counterfeit topics, we employed a compact pre-trained variant of BERT<sup>2</sup> tailored for historical
and multilingual text processing. This model is optimized for handling multilingual data, making it
especially effective at producing diverse and meaningful topic variations across different languages
while maintaining computational efficiency.</p>
        <sec id="sec-3-2-1">
          <p><sup>1</sup> https://pandas.pydata.org, https://numpy.org, https://pypi.org/project/paho-mqtt/</p>
          <p><sup>2</sup> Hugging Face Model Hub: https://huggingface.co/dbmdz/bert-tiny-historic-multilingual-cased [Last Accessed: December
2024]</p>
          <p>[Figure: generation of MQTT topic variants. The topic Home/Kitchen/Motion goes through tokenization and masking (CLS, Home, Kitchen, MASK); the BERT transformer then performs model prediction and variant generation (CLS, Home, Kitchen, M0tion).]</p>
          <p>The MQTT topic generator is based on a masked language modeling technique. The main steps
involved in this process are:</p>
          <list list-type="order">
            <list-item><p>Tokenization: the input MQTT topic is tokenized by the BERT tokenizer. The text is then
converted into a sequence of tokens, which are mapped to their corresponding token IDs, e.g.,
from Home/Kitchen/Motion to Home, Kitchen, and Motion.</p></list-item>
            <list-item><p>Masking Tokens: some tokens in the sequence are randomly selected and replaced with a special
[MASK] token. The masking probability is set to 15%, as discussed in [27].</p></list-item>
            <list-item><p>Model Prediction: the masked sequence is fed to the BERT model, which predicts the original
tokens that were replaced by the [MASK] tokens. These predictions are influenced by the surrounding
tokens, which create a “context”.</p></list-item>
            <list-item><p>Variant Generation: the predicted tokens replace the masked ones to generate a variant of the
legitimate topic.</p></list-item>
          </list>
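          <p>The four steps can be sketched as follows. This is a simplified illustration and not the pipeline used in the experiments: the tokenizer is reduced to a split on the topic separator, the predictor is a hypothetical stand-in (toy_predict) for the masked language model, and forcing at least one masked token is an assumption introduced so that a variant is always produced for short topics.</p>
          <preformat>
```python
import random

MASK = "[MASK]"


def tokenize(topic):
    """Split a topic into its levels (a simplification of the BERT tokenizer)."""
    return topic.split("/")


def generate_variant(topic, predict, probability=0.15, rng=None):
    """Mask tokens at random, predict replacements, and rebuild the topic."""
    rng = rng or random.Random()
    tokens = tokenize(topic)
    # Masking step: each token is masked with the given probability.
    masked = [MASK if probability > rng.random() else tok for tok in tokens]
    if MASK not in masked:
        # Assumption: always mask one token so that a variant is produced.
        masked[rng.randrange(len(masked))] = MASK
    # Prediction step: the predictor sees the whole masked sequence (context).
    filled = [predict(masked, i) if tok == MASK else tok
              for i, tok in enumerate(masked)]
    # Variant generation: join the tokens back into a topic name.
    return "/".join(filled)


def toy_predict(masked_tokens, position):
    """Hypothetical stand-in for the model: returns a fixed look-alike token."""
    return "M0tion"
```
          </preformat>
          <p>In a realistic setting, the predict callable would wrap a masked language model returning the most likely token for the masked position given the surrounding ones.</p>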
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Numerical Results</title>
      <p>In this section, we present the experimental results. Specifically, Section 4.1 demonstrates the
effectiveness of the approach to generate two different types of malicious network conditions, whereas Section
4.2 presents how pseudo-fuzzing via SLMs can support the mitigation of covert channels targeting MQTT
topics.</p>
      <sec id="sec-4-1">
        <title>4.1. Generation of Network Threats</title>
        <p>To demonstrate the versatility of our traffic generation approach, we set up two distinct network threats,
i.e., a small DoS attack that tries to overwhelm an MQTT broker by flooding a specific topic name, and
a covert communication channel implemented through topic manipulation.</p>
        <p>The first experiment refers to a smart room environment that includes a broker that receives messages
from two types of sensors: a temperature sensor that publishes a legitimate message every 5 seconds and
500 compromised humidity sensors, each publishing a message every 0.05 seconds with a QoS level
set to 2. For implementing this offensive scenario, we deploy a Mosquitto broker version 2.0.18 on a
virtualized Raspberry Pi with an ARM1176 processor and 256 MB RAM that models a setup commonly
used in IoT home appliances. Instead, the traffic generators run on a workstation with an Intel Core
i5-2500, 8 GB RAM, and 1 TB RAID 1 HDDs.</p>
        <p>Figure 2(a) illustrates the throughput of the network, expressed in number of packets per second,
during the simulated DoS attack. As shown, the high-frequency packet flooding, sustained over a
10-second interval, causes a rapid increase in the network throughput, whose peaks exceed 20,000
packets/second, that is, approximately 13 Mbps. At the end of the attack, the load goes back to its
normal levels. Even though the attack is not able to saturate the network bandwidth, it affects
packet delivery performance. For example, the mean latency of packet delivery rises sharply from 2.88
seconds under normal traffic conditions to 51.09 seconds during the attack, clearly highlighting the
vulnerability of MQTT-based IoT networks to flooding attacks.</p>
        <p>[Figure 2: (a) DoS attack; (b) Covert channels.]</p>
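        <p>As a rough sanity check of the figures reported for the DoS experiment, the following Python sketch derives the aggregate publication rate of the compromised sensors and the average packet size implied by the reported peak. It assumes that 1 Mbps equals 10^6 bits per second and treats the peak values as exact, although the text only states that peaks exceed 20,000 packets/second; the variable names are illustrative.</p>
        <preformat>
```python
# Parameters taken from the experiment description.
compromised_sensors = 500
period_s = 0.05            # one message every 0.05 seconds per sensor
peak_packets_per_s = 20_000
peak_bits_per_s = 13_000_000  # assumption: 1 Mbps = 10**6 bits per second

# Each sensor publishes 1 / 0.05 = 20 messages per second.
messages_per_sensor_per_s = round(1 / period_s)
aggregate_publish_rate = compromised_sensors * messages_per_sensor_per_s

# Average packet size implied by the reported throughput peak.
avg_packet_bytes = peak_bits_per_s / 8 / peak_packets_per_s

print(aggregate_publish_rate)  # 10000 PUBLISH messages per second
print(avg_packet_bytes)        # 81.25 bytes per packet
```
        </preformat>
        <p>Under these assumptions, the flood corresponds to 10,000 PUBLISH messages per second and to small packets of roughly 81 bytes on average.</p>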
        <p>For the second experiment, i.e., the covert communication channel, we consider a smart home
environment consisting of ten legitimate sensors (i.e., two subscribers and eight publishers) and three
compromised IoT devices. These devices establish covert communication channels by exploiting the
MQTT topic name field, each publishing messages every 4 seconds with a different QoS level (i.e., 0, 1,
and 2). The use of various QoS levels allows the evaluation of the impact of the attack also in terms
of the additional traffic being generated. We recall that higher QoS levels require MQTT to produce
acknowledgments, which contribute to an increased packet volume and resource consumption.</p>
        <p>Figure 2(b) depicts the network throughput over a 100-second interval, namely, the total traffic in
blue, alongside the traffic from each compromised device, represented in red, green, and orange. During
the observation interval, the peak throughput for covert traffic alone reaches approximately 8 Kbps,
compared to a peak of approximately 48 Kbps for the total traffic. These results illustrate how higher
QoS levels lead to increased packet exchanges due to the different acknowledgment schemes of the
MQTT protocol, thus impacting both the overall throughput and the detectability of the covert channels.</p>
        <p>Figure 3 further details the behavior of one of the compromised devices of the scenario previously
described, namely, the device using a QoS level equal to 1. The flow shows the exchanges between
the device and the MQTT broker and clearly illustrates the process that involves a single PUBACK
acknowledgment per message, typical of this QoS level. Notably, the covert communication mechanism
embeds the secret information within the last level of the topic name (e.g., Home/Kitchen/Humidity)
by utilizing the case pattern of the first letter as the container for the hidden data.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Generation of MQTT Topics for Covert Communications</title>
        <p>As discussed in Section 3.2, an attacker could abuse MQTT topics to conceal information and build some
form of covert communications. In this vein, tuning ad-hoc detection metrics and anticipating
possible offensive hiding strategies are core tasks. To illustrate the effectiveness of SLMs for generating
test cases, we consider a set of 98 distinct broker configurations, each containing 30 unique topic names
representative of diverse, legitimate scenarios across multiple languages, including English, Spanish,
and German. These topic names are obtained by querying two publicly accessible MQTT test servers<sup>3</sup>
hosted by the Eclipse Foundation. The data collection process leverages a custom Python script built
with the Paho MQTT client library, enabling the connection to the test servers and the subscription to
all available topics using the # multi-level wildcard. From the total number of 4,918 retrieved topics, a
subset of 1,978 entries is randomly extracted to train the SLM used for generating the topic variations.</p>
        <p>To assess the SLM, we set up 5,880 test cases equally distributed between real topics and
SLM-generated topics that simulate an attacker crafting a covert, malicious variant closely resembling a
legitimate topic. For each of the 98 brokers used in the experiments, we sequentially exclude one
of its 30 topics at a time and we assess these topics with respect to the legitimate topic as well as a
counterfeit variant generated by the SLM. Because of the string-based nature of MQTT topic names, these
assessments are based on metrics commonly employed for text analysis, e.g., entropy, compressibility,
Levenshtein distance, and cosine similarity [29, 30]. Moreover, such metrics have proven to be an
effective first mechanism for identifying the presence of a threat actor trying to hide data through the
manipulation of topics, even if they require further tweaking [27]. To classify topic names as legitimate
or counterfeit, each metric is assigned a threshold value. These thresholds are determined based on
a given percentile derived from the distribution of the metric, computed across all topic pairs for the
considered brokers.</p>
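        <p>To make the four metrics concrete, the following Python sketch computes them for topic names using only the standard library. The exact definitions adopted in the experiments are not detailed here, so several choices are assumptions: entropy is the Shannon entropy over characters, compressibility is the ratio between the zlib-compressed size and the original size, and cosine similarity is computed over character bigram counts. Inputs are assumed to be non-empty strings.</p>
        <preformat>
```python
import math
import zlib
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy (bits per character) of the character distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compressibility(text):
    """Compressed size divided by original size (assumed definition)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cosine_similarity(a, b, n=2):
    """Cosine of the angle between character n-gram count vectors."""
    va = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    vb = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    dot = sum(va[g] * vb[g] for g in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0
```
        </preformat>
        <p>For example, levenshtein("Home/Kitchen/Motion", "Home/Kitchen/M0tion") returns 1, since a single substitution separates the two names, while the cosine similarity of a topic with itself is 1.</p>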
        <p>Figure 4 shows the distributions of the cosine similarity and Levenshtein distance. As can be seen
from Figure 4(a), the distribution of the cosine similarity is highly skewed towards small values. For
example, its median is equal to 0.14, whereas the first and third quartiles are equal to 0.06 and 0.25,
respectively.</p>
        <sec id="sec-4-2-1">
          <p><sup>3</sup> https://test.mosquitto.org/, https://mqtt.eclipseprojects.io/</p>
          <p>[Figure 4: (a) Cosine similarity distribution; (b) Levenshtein distance distribution.]</p>
          <p>
            We recall that cosine similarity quantifies the similarity of two vectors by calculating the
cosine of the angle between them, thus small values (close to 0) indicate high dissimilarity, suggesting
potential covert or counterfeit topics. On the contrary, high values (close to 1) indicate an alignment
with legitimate topics. In addition, the densely clustered lower percentiles suggest that thresholds
based on these values might effectively detect counterfeit topic names by filtering out topics with
minimal similarity with respect to the baseline. In contrast, higher percentiles exhibit greater sparsity,
indicating strongly aligned, legitimate topics. Instead, the distribution of Levenshtein distances (depicted
in Figure 4(b)) is rather evenly spread. The distances are in the range [1–60], and the first and third
quartiles are equal to 9 and 21, respectively. We recall that the Levenshtein distance measures the
minimum number of single-character edits needed to transform one string into another, thus it helps to
identify related or similar topics: smaller values indicate an alignment with expected patterns of the
considered broker, whereas higher values might signal counterfeit topics with significant deviations.
In this case, higher percentiles potentially serve as effective thresholds for detecting that topics have
been altered to conceal data, given that larger values represent greater divergence from legitimate topic
names. Results have shown that the cosine similarity achieves the best performance independently
of the value of the percentile, leading to the proposal of a detection method solely based on cosine
similarity. Specifically, according to the observed distribution, four percentiles, i.e., 10th, 20th, 25th,
and 50th, are selected to explore the balance between precision and accuracy. Each threshold percentile
is used to classify topic names, labeling a topic as counterfeit if its cosine similarity falls below the
given threshold.
          </p>
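          <p>The percentile-based rule can be sketched in Python as follows. The interpolation scheme used to compute the percentile and the function names are assumptions, since the text only specifies that a topic is labeled as counterfeit when its cosine similarity falls below the threshold.</p>
          <preformat>
```python
def percentile(values, q):
    """Percentile q (0 to 100) of a non-empty list, with linear interpolation."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def classify(similarity, threshold):
    """Label a topic as counterfeit if its similarity falls below the threshold."""
    return "counterfeit" if threshold > similarity else "legitimate"
```
          </preformat>
          <p>Given the cosine similarities observed on a broker, a threshold such as percentile(similarities, 10) can then be passed to classify for each incoming topic.</p>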
          <p>Table 1 presents the classification performance, as a function of the percentile, in terms of
accuracy, precision, recall, and F1 score. Results indicate that lower thresholds enhance performance by
applying stricter criteria for identifying dissimilar topics, thereby reducing false positives. At the 50th
percentile threshold, accuracy is 0.869, with high recall (0.997) but lower precision (0.794), suggesting
that more legitimate topics are misclassified. As the threshold decreases to the 20th and 10th percentiles,
precision and F1 scores peak at 0.968 and 0.985, respectively. This demonstrates that lower thresholds improve
the ability of the model to accurately classify legitimate topics, reducing false positives and achieving
more precise detection of covert communications.</p>
          <p>[Table 1. Columns: Percentile, Accuracy, Precision, Recall, F1 Score. Rows: 50th, 25th, 20th, 10th.]</p>
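          <p>For reference, the four metrics reported above can be computed from binary labels as in the following minimal sketch. It assumes that "counterfeit" is the positive class, which the text suggests but does not state explicitly; the label vectors are made-up illustrative data, not results from the paper.</p>
          <preformat>
```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (True means counterfeit)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score for binary labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels: every counterfeit topic is caught (high recall),
# but one legitimate topic is flagged by mistake (lower precision).
y_true = [True, True, True, True, False, False, False, False]
y_pred = [True, True, True, True, True, False, False, False]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```
          </preformat>
          <p>The example reproduces, on toy data, the pattern discussed for the 50th percentile: a permissive threshold flags nearly all counterfeit topics but also some legitimate ones, which lowers precision while recall stays high.</p>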
          <p>This percentile-based approach is particularly effective because it enables adaptable thresholding,
leveraging the distribution of similarity scores on each broker to capture variations between legitimate
and covert topics across diverse broker scenarios. Detection accuracy can be further improved by
combining cosine similarity with other text analysis metrics. In this case, the complementary percentile
is applied as a threshold, with higher metric values indicating counterfeit topics, while lower values
confirm legitimacy. Although these metrics can contribute to detection improvements in specific cases,
results indicate that their overall impact on accuracy is limited. For example, combining cosine similarity
with Levenshtein distance yields a peak accuracy of 0.978, reached at the 90th percentile. Moreover,
compressibility, Levenshtein distance, and entropy perform poorly in our model, leading to an overall
accuracy of 0.569. These results further support the advantages of relying solely on cosine similarity,
which achieves robust performance without the complexity of implementing additional metrics.</p>
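          <p>A possible way to combine cosine similarity with the Levenshtein distance through complementary percentiles is sketched below. The combination rule (a topic is flagged only when both metrics agree) is an assumption made for illustration, since the text does not specify how the two decisions are merged; the function names and the sample score lists are likewise hypothetical.</p>
          <preformat>
```python
import math


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    low, high = math.floor(pos), math.ceil(pos)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


def combined_is_counterfeit(cos_sim, lev_dist, cos_baseline, lev_baseline, p):
    """Flag a topic when similarity is low AND edit distance is high.

    The cosine threshold is the p-th percentile of the baseline similarities;
    the Levenshtein threshold is the complementary (100 - p)-th percentile of
    the baseline distances. Requiring both conditions is an assumption.
    """
    cos_threshold = percentile(cos_baseline, p)
    lev_threshold = percentile(lev_baseline, 100 - p)
    low_similarity = cos_threshold > cos_sim
    large_distance = lev_dist > lev_threshold
    return low_similarity and large_distance


# Hypothetical per-broker score distributions.
cos_baseline = [0.2, 0.5, 0.55, 0.6, 0.6]
lev_baseline = [3, 5, 8, 9, 12]

print(levenshtein("kitten", "sitting"))
print(combined_is_counterfeit(0.10, 14, cos_baseline, lev_baseline, 20))
print(combined_is_counterfeit(0.58, 4, cos_baseline, lev_baseline, 20))
```
          </preformat>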
          <p>To address a limitation of the proposed cosine similarity-based detection method, i.e., the potential
misclassification of valid topics containing synonyms inadvertently introduced by legitimate users,
we propose an enhanced approach. We still evaluate the cosine similarity of incoming topics against
the existing topics on the broker, but we incorporate a semantic equivalence analysis leveraging
Word2Vec embeddings. This integration accounts for word-level variations within incoming topics
prior to classification, thereby reducing false positives by effectively recognizing synonyms. Results
demonstrate that such an approach ensures robustness across diverse broker scenarios while maintaining
a high level of detection accuracy. For instance, at the 20th percentile threshold, accuracy is equal to
0.951, that is, only slightly lower than the value obtained without introducing synonyms. Similarly,
precision and recall are equal to 0.942 and 0.959, respectively.</p>
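          <p>The synonym-handling step can be illustrated with the following sketch. It does not use a trained Word2Vec model: the small EMBEDDINGS dictionary is a hand-made, hypothetical stand-in for word vectors, and both the splitting of topics on the "/" separator and the 0.95 similarity cut-off are assumptions chosen only for the example.</p>
          <preformat>
```python
import math

# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
EMBEDDINGS = {
    "temperature": (0.90, 0.10, 0.00),
    "temp": (0.88, 0.15, 0.02),
    "humidity": (0.10, 0.90, 0.00),
    "moisture": (0.15, 0.85, 0.05),
    "door": (0.00, 0.10, 0.90),
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def normalize_topic(topic, vocabulary, embeddings, min_similarity=0.95):
    """Replace each topic level by its closest known word, if similar enough."""
    levels = []
    for word in topic.split("/"):
        # Known words, and words without an embedding, are left untouched.
        if word in vocabulary or word not in embeddings:
            levels.append(word)
            continue
        best_word, best_score = word, min_similarity
        for known in vocabulary:
            if known in embeddings:
                score = cosine(embeddings[word], embeddings[known])
                if score >= best_score:
                    best_word, best_score = known, score
        levels.append(best_word)
    return "/".join(levels)


# Words appearing in the legitimate topics already present on the broker.
vocabulary = {"home", "kitchen", "temperature", "humidity", "door"}

for incoming in ["home/kitchen/moisture", "home/kitchen/temp", "home/kitchen/x9f3"]:
    print(incoming, normalize_topic(incoming, vocabulary, EMBEDDINGS))
```
          </preformat>
          <p>In this sketch, the normalized topic would then be passed to the cosine similarity check, so that a legitimate topic using a synonym is compared against the baseline in its canonical form, while unknown tokens are left unchanged and remain subject to the threshold.</p>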
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>In this paper, we presented an approach to improve the security of IoT ecosystems through the automatic
generation of test data. To this aim, we introduced a tool for the creation of traffic conditions based
on real-world threat templates, e.g., DoS and covert channels hidden within the MQTT protocol. We
also discussed a framework taking advantage of an SLM to generate realistic MQTT topics especially
to assess and improve possible countermeasures. As shown, both approaches can be effectively used
to conduct a wide array of investigations, such as implementing pseudo-fuzzing approaches against
detection metrics.</p>
      <p>Future work aims at refining the proposed tools, e.g., by improving the predefined offensive templates.
For instance, we are working towards hiding mechanisms based on topic wildcards to implement covert
communication schemes among multiple endpoints as well as channels with increased stealthiness.
Another relevant part of our ongoing research concerns the creation of an integrated framework able to
operate simultaneously at different layers of the protocol stack. For instance, this framework could be
used to create test scenarios for assessing the robustness against advanced attack schemes, e.g.,
multistage loading mechanisms using both the application messages and protocol data units to exchange
information with a remote command &amp; control facility.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research was partially funded by Project RAISE – Robotics and AI for Socio-economic Empowerment
(ECS00000035), Project SERICS – SEcurity and RIghts In the CyberSpace (PE00000014), and by Project
STRIVE/URAN – Advanced Approaches for Transitions in Urban Environments.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <sec id="sec-7-1">
        <p>The authors have not employed any Generative AI tools.</p>
        <p>Security, IH&amp;MMSec, 2021, pp. 113–124.</p>
        <p>[20] L. Caviglione, W. Mazurczyk, You Can’t Do That on Protocols Anymore: Analysis of Covert Channels in IETF Standards, IEEE Network 38 (2024) 255–263. doi:10.1109/MNET.2024.3352411.</p>
        <p>[21] M. A. Elsadig, A. Gafar, Covert Channel Detection: Machine Learning Approaches, IEEE Access 10 (2022) 38391–38405.</p>
        <p>[22] X. Zhu, S. Wen, S. Camtepe, Y. Xiang, Fuzzing: a Survey for Roadmap, ACM Computing Surveys 54 (2022).</p>
        <p>[23] K. Hölk, W. Mazurczyk, M. Zuppelli, L. Caviglione, Investigating HTTP Covert Channels Through Fuzz Testing, in: Proceedings of the 19th International Conference on Availability, Reliability and Security, ARES ’24, Association for Computing Machinery, 2024.</p>
        <p>[24] Y. Wang, P. Jia, L. Liu, C. Huang, Z. Liu, A Systematic Review of Fuzzing Based on Machine Learning Techniques, PLoS ONE 15 (2020) e0237749.</p>
        <p>[25] Rachit, S. Bhatt, P. R. Ragiri, Security Trends in Internet of Things: A Survey, SN Applied Sciences 3 (2021) 1–14.</p>
        <p>[26] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, volume 1 of NAACL-HLT, 2019, pp. 4171–4186. doi:10.18653/v1/N19-1423.</p>
        <p>[27] C. Cespi Polisiani, M. Zuppelli, M. C. Calzarossa, L. Caviglione, M. Guarascio, Mitigation of Covert Communications in MQTT Topics Through Small Language Models, in: Proceedings of the 32nd International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS, IEEE, 2024.</p>
        <p>[28] A. Velinov, A. Mileva, S. Wendzel, W. Mazurczyk, Covert Channels in the MQTT-based Internet of Things, IEEE Access 7 (2019) 161899–161915.</p>
        <p>[29] C. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.</p>
        <p>[30] W. Stallings, Cryptography and Network Security Principles and Practice, 8th Edition, Pearson Education, 2023.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mylonas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalogeras</surname>
          </string-name>
          , G. Kalogeras,
          <string-name>
            <given-names>C.</given-names>
            <surname>Anagnostopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Alexakos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Muñoz</surname>
          </string-name>
          ,
          <article-title>Digital Twins From Smart Manufacturing to Smart Cities: A Survey</article-title>
          ,
          <source>IEEE Access</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>143222</fpage>
          -
          <lpage>143249</lpage>
          . doi:10.1109/ACCESS.2021.3120843.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Omolara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alabdulatif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. I.</given-names>
            <surname>Abiodun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alawida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alabdulatif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. H.</given-names>
            <surname>Alshoura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Arshad</surname>
          </string-name>
          ,
          <article-title>The Internet of Things Security: A Survey Encompassing Unexplored Areas and New Insights</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>112</volume>
          (
          <year>2022</year>
          )
          <fpage>102494</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Quamara</surname>
          </string-name>
          ,
          <article-title>An Overview of Internet of Things (IoT): Architectural Aspects, Challenges, and Protocols</article-title>
          ,
          <source>Concurrency and Computation: Practice and Experience</source>
          <volume>32</volume>
          (
          <year>2020</year>
          )
          <elocation-id>e4946</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Hintaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Manickam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Aboalmaaly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Karuppayah</surname>
          </string-name>
          ,
          <article-title>MQTT Vulnerabilities, Attack Vectors and Solutions in the Internet of Things (IoT)</article-title>
          ,
          <source>IETE Journal of Research</source>
          <volume>69</volume>
          (
          <year>2023</year>
          )
          <fpage>3368</fpage>
          -
          <lpage>3397</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Raikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Meena</surname>
          </string-name>
          ,
          <article-title>Vulnerability Assessment of MQTT Protocol in Internet of Things (IoT)</article-title>
          ,
          <source>in: Proceedings of the 2nd International Conference on Secure Cyber Computing and Communications</source>
          , ICSCCC
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>535</fpage>
          -
          <lpage>540</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Caviglione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Choraś</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Corona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Janicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Mazurczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pawlicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wasielewska</surname>
          </string-name>
          ,
          <article-title>Tight Arms Race: Overview of Current Malware Threats and Trends in Their Detection</article-title>
          ,
          <source>IEEE Access</source>
          <volume>9</volume>
          (
          <year>2020</year>
          )
          <fpage>5371</fpage>
          -
          <lpage>5396</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Strachanski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmidbauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wendzel</surname>
          </string-name>
          ,
          <article-title>A Comprehensive Pattern-based Overview of Stegomalware</article-title>
          ,
          <source>in: Proceedings of the 19th International Conference on Availability, Reliability and Security</source>
          , ARES '24,
          Association for Computing Machinery,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mileva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Velinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hartmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wendzel</surname>
          </string-name>
          , W. Mazurczyk,
          <article-title>Comprehensive analysis of MQTT 5.0 susceptibility to network covert channels</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>104</volume>
          (
          <year>2021</year>
          )
          <fpage>102207</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hwang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Baqa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Le Gall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A. R.</given-names>
            <surname>Ortega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>IoT-TaaS: Towards a Prospective IoT Testing Framework</article-title>
          ,
          <source>IEEE Access</source>
          <volume>6</volume>
          (
          <year>2018</year>
          )
          <fpage>15480</fpage>
          -
          <lpage>15493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Duc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jabangwe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abrahamsson</surname>
          </string-name>
          ,
          <article-title>Security Challenges in IoT Development: a Software Engineering Perspective</article-title>
          ,
          <source>in: Proceedings of the XP2017 Scientific Workshops</source>
          , XP'17, Association for Computing Machinery,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>I.</given-names>
            <surname>Nadir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mahmood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Asadullah</surname>
          </string-name>
          ,
          <article-title>A Taxonomy of IoT Firmware Security and Principal Firmware Analysis Techniques</article-title>
          ,
          <source>International Journal of Critical Infrastructure Protection</source>
          <volume>38</volume>
          (
          <year>2022</year>
          )
          <fpage>100552</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghazanfar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ur</given-names>
            <surname>Rehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Fayyaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Shahzad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <article-title>IoT-Flock: An Open-source Framework for IoT Traffic Generation</article-title>
          ,
          <source>in: Proceedings of the 2020 International Conference on Emerging Trends in Smart Technologies</source>
          , ICETST
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Nguyen-An</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Silverston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yamazaki</surname>
          </string-name>
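          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miyoshi</surname>
          </string-name>
          ,
          <article-title>Generating IoT Traffic in Smart Home Environment</article-title>
          ,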
          <source>in: Proceedings of the IEEE 17th Annual Consumer Communications &amp; Networking Conference</source>
          , CCNC, IEEE,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Knowledge Enhanced GAN for IoT Traffic Generation</article-title>
          ,
          <source>in: Proceedings of the ACM Web Conference</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3336</fpage>
          -
          <lpage>3346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ormazabal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          , et al.,
          <article-title>IoTGemini: Modeling IoT Network Behaviors for Synthetic Traffic Generation</article-title>
          ,
          <source>IEEE Transactions on Mobile Computing</source>
          <volume>23</volume>
          (
          <year>2024</year>
          )
          <fpage>13240</fpage>
          -
          <lpage>13257</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>I.</given-names>
            <surname>Vaccari</surname>
          </string-name>
          , G. Chiola,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mongelli</surname>
          </string-name>
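          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cambiaso</surname>
          </string-name>
          ,
          <article-title>MQTTset, a New Dataset for Machine Learning Techniques on MQTT</article-title>
          ,
          <source>Sensors</source>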
          <volume>20</volume>
          (
          <year>2020</year>
          )
          <fpage>6578</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Meghdouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Annessi</surname>
          </string-name>
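          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zseby</surname>
          </string-name>
          ,
          <article-title>CCgen: Injecting Covert Channels into Network Traffic</article-title>
          ,
          <source>Security and Communication Networks</source>
          <volume>2022</volume>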
          (
          <year>2022</year>
          )
          <fpage>2254959</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zuppelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Caviglione</surname>
          </string-name>
          ,
          <article-title>pcapStego: A Tool for Generating Traffic Traces for Experimenting with Network Covert Channels</article-title>
          ,
          <source>in: Proceedings of the 16th International Conference on Availability, Reliability and Security</source>
          , ARES '21, Association for Computing Machinery,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lamshöft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Neubert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Krätzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vielhauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dittmann</surname>
          </string-name>
          ,
          <article-title>Information Hiding in Cyber Physical Systems: Challenges for Embedding, Retrieval and Detection Using Sensor Data of the SWAT Dataset</article-title>
          ,
          <source>in: Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>