<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Serverless Cloud-to-Thing Framework for TinyMLOps Workflows</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Loconte</string-name>
          <email>davide.loconte@poliba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saverio Ieva</string-name>
          <email>saverio.ieva@poliba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agnese Pinto</string-name>
          <email>agnese.pinto@poliba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filippo Gramegna</string-name>
          <email>filippo.gramegna@poliba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Loseto</string-name>
          <email>loseto@lum.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Floriano Scioscia</string-name>
          <email>floriano.scioscia@poliba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Ruta</string-name>
          <email>michele.ruta@poliba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LUM “Giuseppe Degennaro” University</institution>
          ,
          <addr-line>Strada Statale 100 km 18, I-70010 Casamassima (BA)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Polytechnic University of Bari</institution>
          ,
          <addr-line>Via Orabona 4, I-70125 Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TinyMLOps</institution>
          ,
          <addr-line>Edge Intelligence, Microcontroller Deployment, Model Compression, Cloud-to-Thing, Serverless</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Deploying Machine Learning (ML) across the Cloud-to-Thing continuum is a significant challenge, particularly given the heterogeneity of the available devices. This work introduces a general-purpose framework for Tiny Machine Learning Operations (TinyMLOps) that enables the orchestration of ML workflows in distributed, serverless environments spanning cloud, edge, and Internet of Things (IoT) nodes. The architecture follows an event-driven model in which each node, depending on its capabilities, implements a minimal mandatory set of components and an optional set of extended functionalities. Nodes advertise their capabilities and collaboratively fulfill MLOps tasks by handling the requests they can satisfy. This decentralized approach allows for dynamic, context-aware distribution of operations across networks of heterogeneous resource-constrained devices. The framework is validated by means of a prototype implementation and early experiments involving STM32-based microcontrollers and Raspberry Pi edge devices.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>CEUR Workshop Proceedings (ISSN 1613-0073)</p>
      <p>
        devices [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. TinyML can also integrate with the Semantic Web of Everything (SWoE) paradigm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which
extends the Semantic Web vision to everyday physical objects, creating a new class of intelligent,
interoperable and context-aware autonomous systems able to perform local knowledge-based inference
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], facilitating dynamic distributed decision-making and process automation across heterogeneous
environments. Nonetheless, the integration of TinyML into complete ML pipelines remains an open
challenge. Continuous model improvement depends on feedback received from deployed models.
However, edge devices cannot always store or transmit raw data due to bandwidth, power, or
privacy constraints. This limits the automated retraining workflows that are an important aspect of
MLOps. Additionally, the development of compressed models often requires specialized tools, which
may not integrate smoothly with Off-The-Shelf (OTS) ML development platforms. Finally, managing
the complete lifecycle of TinyML models, consisting of training, versioning, deployment and validation,
requires a lightweight yet robust MLOps framework adapted to the constraints of edge and IoT devices.
      </p>
      <p>This paper addresses these limitations by proposing a distributed, serverless, event- and
message-driven framework for TinyMLOps, aiming to support end-to-end ML lifecycle
execution across the Cloud-to-Thing continuum. The framework defines a modular architecture that
exposes a set of minimal and optional components and communicates through a Publish/Subscribe
(pub/sub) pattern. The primary contributions can be summarized as follows: (i) a computing node
architecture supporting serverless TinyMLOps across heterogeneous cloud, edge, and IoT devices; (ii) a
prototype implementation leveraging Amazon Web Services (AWS) IoT Core, AWS Greengrass, and
STM32-based nodes running the Zephyr Real Time Operating System (RTOS); (iii) an early experimental
evaluation quantifying latency and processing time across networked nodes under realistic constraints.</p>
      <p>The remainder of this paper is structured as follows. Section 2 reviews the foundational concepts
underlying this work. Section 3 describes the proposed conceptual architecture, which is used in
Section 4 to implement a prototype and perform an initial functional performance evaluation. Finally,
conclusions and future work directions are discussed in Section 5.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Background</title>
      <p>To ensure the paper is self-contained, this section presents the key technologies discussed throughout,
before a discussion of relevant related work.</p>
      <sec id="sec-3-1">
        <title>2.1. Serverless computing in the Cloud-to-Thing Continuum</title>
        <p>
          The Cloud-to-Thing continuum is a distributed computing paradigm that integrates IoT field devices,
edge nodes, and centralized cloud data centers. Serverless computing has emerged as a compelling
architectural paradigm encapsulating code in stateless functions, which can be deployed, invoked
and scaled automatically by the infrastructure, without the requirement for the user to manage the
underlying servers. Although initially limited to cloud environments, this paradigm has since expanded
to encompass both edge and IoT contexts. In the cloud, serverless platforms automatically scale and
allocate resources in response to incoming request workloads. Similarly, these principles are now
being applied at the network edge [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Deploying serverless functions across the cloud, fog, and edge
continuum offers substantial benefits. This methodology enables data processing closer to the point
where data is generated, aligning with emerging paradigms in distributed computing [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. It leads to
improvements in system reliability and end-to-end response times.
        </p>
        <p>
          Although serverless computing provides additional flexibility in edge environments, there are
significant technical challenges to solve. Compared to cloud servers, edge devices and IoT nodes typically
have restricted Central Processing Unit (CPU), memory, and power resources. If not properly
managed, these constraints can lead to increased response times or higher operational costs. Additionally,
the “design once, provide anywhere” goal [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for serverless functions is hindered by hardware and
software heterogeneity, requiring attention to resource management and compatibility. Addressing
these challenges often involves the use of intelligent function allocation strategies and lightweight
virtualization techniques [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Artificial Intelligence at the Network’s Edge with TinyML</title>
        <p>
          Edge Intelligence is the process of implementing inference capabilities directly on resource-constrained
nodes, thereby enabling local data analysis and actuation without requiring persistent cloud
connectivity. This paradigm addresses critical requirements in IoT deployments, including low latency, limited
bandwidth, and data locality. A key enabler is TinyML, which allows microcontroller-class devices to
perform machine learning inference and, in some cases, training while complying with the resource
constraints of such environments. By executing models directly on-device, TinyML minimizes
communication overhead and latency, supporting real-time or offline use cases, such as industrial anomaly
detection, environmental monitoring, and wearable computing [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>However, deploying Artificial Intelligence (AI) to edge hardware introduces constraints that do
not affect traditional ML pipelines. Models must be executed on bare-metal or thin RTOS runtime
environments that lack standard libraries and frameworks. As a result, model rollout and maintenance
are complicated by device heterogeneity and limited Application Programming Interfaces (APIs).
Additionally, models must be extensively re-engineered to meet resource constraints using techniques such
as quantization and pruning. To support embedded environments, Edge Intelligence requires
rethinking conventional AI workflows, with the focus shifting to lightweight deployment
mechanisms, support for heterogeneous endpoints, and runtime efficiency.</p>
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Related Work</title>
        <p>
          When extending MLOps to the Cloud-to-Thing continuum, the reproducibility, version control, and
continuous model delivery practices [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] have to be applied to distributed infrastructures featuring
resource-constrained devices. This implies a smart and context-aware distribution of tasks among
network nodes according to their energy, compute, and latency requirements. Typically, real-time
inference is performed closer to the data source, while training and storage are carried out in the cloud
[14].
        </p>
        <p>A primary objective is to optimize network usage and resource consumption by decomposing ML
pipelines into modular steps that can be dynamically allocated across the cloud, fog, and edge layers.
These steps are deployed on platforms capable of programmatically instantiating runtime environments,
configuring heterogeneous hardware and orchestrating deployment [15].</p>
        <p>TinyML-oriented frameworks further specialize these operations for highly constrained IoT devices.
Emerging solutions target MicroController Units (MCUs) with limited memory and energy budgets,
aiming to orchestrate ML pipelines that include inference, monitoring, and limited training. These
systems address challenges such as intermittent connectivity, firmware-level deployment, and model
adaptation in response to real-world operational variability [16].</p>
        <p>Recent contributions have also detailed operational challenges and proposed enabling toolkits for
TinyMLOps in real-world deployments. In particular, secure Over The Air (OTA) updates and
performance evaluation of deployed models on embedded platforms have been addressed through dedicated
platforms such as RIOT-ML, a toolkit built on the RIOT operating system (https://www.riot-os.org/)
that enables secure model deployment and runtime assessment on constrained devices [17].</p>
        <p>The issue of effective model updates on energy-constrained devices has been investigated in
intelligent vehicular networks, emphasizing mechanisms that tolerate high mobility, variable latency, and
intermittent connectivity while preserving model integrity and consistency [18]. Moreover,
event-driven architectures on ultra-low-power MCUs have been proposed for real-time classification tasks,
highlighting the feasibility of continuous inference and partial reconfiguration even on constrained
edge devices [19].</p>
        <p>A broader analysis of the current limitations and requirements for scaling Edge Intelligence with
TinyMLOps across heterogeneous hardware infrastructures is provided in [20], which outlines the
foundational components needed for widespread adoption.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Serverless Cloud-to-IoT Framework for TinyMLOps</title>
      <p>Starting from a summary of the main challenges of moving from MLOps to TinyMLOps, this section
describes the proposed framework based on serverless computing. The goal is to provide a
general-purpose platform, supporting distributed serverless functions for any type of application involving IoT,
edge, and cloud layers.</p>
      <sec id="sec-4-1">
        <title>3.1. From MLOps to TinyMLOps</title>
        <p>To transition from standard MLOps to TinyMLOps architectures in production, the ML lifecycle must
be adjusted to accommodate the following constraints of embedded deployments, which concern the model
footprint, runtime compatibility, deployment mechanisms, and monitoring infrastructure [16].</p>
        <p>a. Hardware-aware model packaging. To be compatible with IoT devices, ML models must guarantee
deterministic behavior, bounded latency, and fixed memory allocation. Techniques such as fixed-point
quantization, pruning, and code generation are required to match microcontroller-level constraints
[19].</p>
        <p>b. Cross-platform build and deployment. Heterogeneous toolchains, memory layouts, and instruction
sets complicate portability [19]. Due to the absence of a standardized runtime across microcontroller
architectures, it is necessary to develop cross-compilation pipelines and platform-specific abstractions.</p>
        <p>c. OTA and lifecycle management. Updating deployed models in the field requires OTA mechanisms
compatible with limited bandwidth and memory. Some MCUs lack OTA support or cannot afford update
operations due to energy and storage constraints, requiring pre-baked models during provisioning and
fallback strategies for critical updates [19].</p>
        <p>d. Minimalist runtime monitoring. Continuous telemetry is infeasible on resource-constrained nodes.
Instead, lightweight mechanisms such as error flags, status counters, or sampled summaries must be
used. These are typically aggregated through gateways or edge nodes [21].</p>
        <p>e. Operational resilience across nodes. Devices at different layers of the Cloud-to-Thing spectrum exhibit
variability in compute, memory, connectivity, and supported programming models (e.g., native code,
containers, or serverless blueprints). TinyMLOps must account for this heterogeneity when planning
deployment strategies, fallback behaviors, and runtime coordination [21].</p>
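        <p>As an illustration of the fixed-point quantization mentioned in item a, the following minimal sketch maps floating-point weights to 8-bit integers with a single symmetric scale factor. It is not taken from the prototype: the function names, the symmetric per-tensor scheme, and the sample weights are assumptions chosen for brevity.</p>
        <preformat>
```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to int8 values using one symmetric scale factor (assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    # Clamp to [-127, 127] so every value fits a signed 8-bit integer.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats; the rounding error is about scale / 2 at most."""
    return [q * scale for q in quantized]


# Hypothetical weights, used only to exercise the functions.
weights = [-1.0, 0.0, 0.25, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```
        </preformat>
        <p>Storing the integers plus one scale factor cuts weight storage to roughly a quarter of 32-bit floats, at the cost of a bounded rounding error per weight.</p>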
        <p>Despite significant advances in TinyMLOps, existing platforms encounter difficulties in handling
deployment and lifecycle management across multiple environments. Specifically, typical solutions
frequently assume a homogeneous and reliable software infrastructure, which is not available in many
IoT contexts. The subsequent section introduces a modular architecture aimed at addressing these gaps.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Architecture</title>
        <p>The fundamental element of the architecture is the Computing Node, representing any
computing-capable component. All nodes communicate over a Message Bus, which adheres to a pub/sub
communication model. In this setup, a Message Bus Server manages multiple topics. Multiple Message
Bus Clients can subscribe to or publish messages on any topic; publication is allowed even without a
prior subscription. When the Message Bus Server receives a message on a given topic, it broadcasts
it to all subscribed Message Bus Clients. This model aligns with widely adopted IoT protocols such
as Constrained Application Protocol (CoAP) (via the observer pattern) [22] and Message Queuing
Telemetry Transport (MQTT) [23], thereby facilitating practical implementation.</p>
        <p>The architecture
defines the internal structure and responsibilities of the Computing Node. Within the Cloud-to-Thing
continuum, it is infeasible to enforce a uniform software stack across all nodes. For instance, MCUs
are constrained in both memory and storage, making deployment of modules such as Local Storage
impractical. Furthermore, due to their low duty cycle and energy constraints, these devices are generally
unsuitable to act as Message Bus Servers. Based on these observations, the framework comprises two
classes of computing nodes, depicted in Figure 1 and Figure 2 respectively: the Minimal Computing
Node, which includes only the mandatory software components defined by the architecture, and the
Complete Computing Node, which includes both required and optional components.</p>
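        <p>The pub/sub behavior described above can be sketched with a toy in-memory broker. This is only an illustration under stated assumptions: the class and method names are hypothetical, delivery is synchronous, and none of the networking, authentication, or quality-of-service features of a real MQTT or CoAP stack are modeled.</p>
        <preformat>
```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, bytes], None]


class MessageBusServer:
    """Toy in-memory broker: tracks topic subscriptions and fans messages out."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        # Publishing needs no prior subscription; every current subscriber
        # of the topic receives the message.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


class MessageBusClient:
    """Client side: publishes to a server and collects what it is subscribed to."""

    def __init__(self, node_id: str, server: MessageBusServer) -> None:
        self.node_id = node_id
        self._server = server
        self.inbox: List[Tuple[str, bytes]] = []

    def subscribe(self, topic: str) -> None:
        self._server.subscribe(topic, self._on_message)

    def publish(self, topic: str, payload: bytes) -> int:
        return self._server.publish(topic, payload)

    def _on_message(self, topic: str, payload: bytes) -> None:
        self.inbox.append((topic, payload))


# Hypothetical node identifiers and topic name.
server = MessageBusServer()
edge = MessageBusClient("edge-1", server)
mcu = MessageBusClient("mcu-1", server)
edge.subscribe("sensors/temperature")
delivered = mcu.publish("sensors/temperature", b"21.5")
```
        </preformat>
        <p>Here the microcontroller-class client publishes without ever subscribing, and only the subscribed edge client receives the payload.</p>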
        <p>[Figure 1: Minimal Computing Node. Blocks shown: Message Bus Connector, Message Bus Client, Core Service, and Built-in functions (f1, f2).]</p>
        <sec id="sec-4-2-5">
          <title>Built-in functions</title>
          <p>These two node classes illustrate two extremes. Concrete implementations may include any
configuration that integrates the mandatory modules shown in Figure 1, along with any combination of the
optional modules and connectors depicted in Figure 2. This design choice makes the proposal
flexible enough to accommodate a large spectrum of hardware devices, spanning from less capable
MCUs to cloud nodes. In particular, the mandatory functional blocks are the following:</p>
          <p>• Message Bus Client: receives messages from the subscribed topics, publishes messages on the
desired topic, and manages the connection with the nearest Message Bus Server;</p>
          <p>• Message Bus Connector: the only mandatory networking interface for the framework. It must implement
the communication stack required to allow the Message Bus Client to connect and exchange data
with the closest Message Bus Server;</p>
          <p>• Core Service: manages the node and orchestrates the lifecycle of the other internal components.
It processes incoming messages from the Message Bus Client, parses them, and ascertains
whether the current node is able to respond to the request or whether the message can be forwarded
to another Message Bus Server that is capable of satisfying it. If both options fail, an error
message must be returned. In addition, this component allows third parties to schedule more
complex tasks, transfer data, move functions, and manage resources effectively by exposing the
local node’s capabilities to the external world;</p>
          <p>• Built-in functions: zero or more hard-coded functions that are embedded in the device. They cannot
be migrated or moved to other nodes, nor updated, making them particularly relevant for devices
that cannot support OTA updates or do not have sufficient resources to run a serverless runtime.
Despite their static nature, these functions remain valuable within the overall architecture. This
block can be omitted for devices that perform no local computation, such as fog nodes that only relay
messages between the edge and the cloud platform.</p>
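          <p>The Core Service decision described above (serve locally, otherwise forward, otherwise return an error) can be sketched as follows. The message layout, the class name, and the way remote capabilities are represented as callables are all assumptions made for illustration; the paper does not prescribe them.</p>
          <preformat>
```python
from typing import Any, Callable, Dict, List, Optional


class CoreService:
    """Hypothetical dispatcher: local function first, then forwarding, then error."""

    def __init__(
        self,
        local_functions: Dict[str, Callable[..., Any]],
        remote_capabilities: Optional[Dict[str, Callable[[dict], Any]]] = None,
    ) -> None:
        self.local_functions = dict(local_functions)
        self.remote_capabilities = dict(remote_capabilities or {})

    def capabilities(self) -> List[str]:
        # What this node would advertise to the outside world.
        return sorted(self.local_functions)

    def handle(self, message: dict) -> dict:
        operation = message.get("operation")
        args = message.get("args", {})
        if operation in self.local_functions:
            result = self.local_functions[operation](**args)
            return {"status": "ok", "handled_by": "local", "result": result}
        if operation in self.remote_capabilities:
            result = self.remote_capabilities[operation](message)
            return {"status": "ok", "handled_by": "forwarded", "result": result}
        return {"status": "error", "reason": f"no node can satisfy '{operation}'"}


def mean_of(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical operations: "mean" is built in, "train" is known to live elsewhere.
node = CoreService(
    local_functions={"mean": mean_of},
    remote_capabilities={"train": lambda msg: "accepted by edge-1"},
)
local_reply = node.handle({"operation": "mean", "args": {"values": [1, 2, 3]}})
forwarded_reply = node.handle({"operation": "train"})
error_reply = node.handle({"operation": "compress"})
```
          </preformat>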
          <p>A more capable node can incorporate additional components. While they are not essential for basic
compliance, they provide the architecture with the flexibility to execute complex tasks throughout the
entire Cloud-to-Thing continuum by opportunistically utilizing all available resources.</p>
          <p>• Local Storage: provides persistent or semi-persistent storage capabilities. It can be used to cache
intermediate results, store logs, or maintain state across reboots or network partitions. Nodes
equipped with this module can support more complex functions and participate in data-centric
workflows.</p>
          <p>• Message Bus Server: in addition to acting as a client, a node may optionally implement the message
bus server component, enabling it to work as a communication hub for other nodes. This is
particularly useful for edge clusters, where a device can act as a local coordinator or aggregation
point for nearby constrained nodes.</p>
          <p>• Function Runtime: enables the dynamic deployment, execution, and migration of serverless
functions across nodes. Unlike Built-in functions, which are statically embedded, the
function runtime supports computational offloading, adaptive deployment, and orchestration of
distributed tasks.</p>
          <p>• Sensor Connector and Actuator Connector: logical connectors allowing the node to interact with
the physical environment. Sensor connectors receive input from attached sensing peripherals,
while actuator connectors allow the node to command physical components.</p>
          <p>• Management Connector: allows external systems and human interfaces to carry out advanced
node administration. Through this connector, remote entities can inspect status, retrieve metrics,
configure components, update deployed functions, or control lifecycle transitions.</p>
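          <p>To make the distinction between mandatory and optional blocks concrete, the sketch below classifies a node from the set of components it declares. The component identifiers and the class labels are hypothetical; Built-in functions are left out of the check because the text allows a node to have none.</p>
          <preformat>
```python
from typing import Iterable

# Hypothetical identifiers for the blocks named in the architecture.
MANDATORY = frozenset({"message_bus_connector", "message_bus_client", "core_service"})
OPTIONAL = frozenset({
    "local_storage",
    "message_bus_server",
    "function_runtime",
    "sensor_connector",
    "actuator_connector",
    "management_connector",
})


def classify_node(components: Iterable[str]) -> str:
    """Return 'non-compliant', 'minimal', 'intermediate', or 'complete'."""
    present = frozenset(components)
    unknown = present - MANDATORY - OPTIONAL
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    if not MANDATORY.issubset(present):
        return "non-compliant"
    extras = present.intersection(OPTIONAL)
    if not extras:
        return "minimal"
    if extras == OPTIONAL:
        return "complete"
    return "intermediate"


mcu_class = classify_node(MANDATORY)
edge_class = classify_node(MANDATORY.union({"local_storage", "message_bus_server"}))
cloud_class = classify_node(MANDATORY.union(OPTIONAL))
```
          </preformat>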
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Lifecycle Operations</title>
        <p>Several workflows can be implemented using the architecture outlined in Section 3.2. Operations are
executed and coordinated in a decentralized manner. In general, a node joins the network by establishing
a connection with a neighboring Message Bus Server. A node can establish connections with multiple
neighbors. A unique identifier is always assigned to the node. When the connection is established, the
node publishes its capabilities. The Core Service is responsible for storing this information and routing
the message appropriately.</p>
        <p>Topics are organized following the pub/sub communication pattern: requests are sent
to the Message Bus Server on a general topic, while responses are published on topics exclusively
designated for the node-server pair. As shown in Figure 3, if the server-acting node
is unable to fulfill the request, the message is forwarded, on a dedicated and private topic, to a node in the
network that possesses the desired capability. If no such node is found, the server may attempt to forward
the message to another Message Bus Server. The requesting node receives an error message if an error
occurs or if no node capable of responding is found.</p>
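        <p>One possible topic layout consistent with this description is sketched below. The "tinymlops/" prefix and the path segments are purely hypothetical naming choices; the paper states only that requests use a general topic and that responses use a topic reserved for the node-server pair.</p>
        <preformat>
```python
from typing import Dict


def request_topic(server_id: str) -> str:
    """General topic on which any node submits requests to a given server."""
    return f"tinymlops/{server_id}/requests"


def response_topic(server_id: str, node_id: str) -> str:
    """Private topic reserved for replies from one server to one node."""
    return f"tinymlops/{server_id}/responses/{node_id}"


def topics_for(node_id: str, server_id: str) -> Dict[str, str]:
    """Topics a node publishes to and subscribes to after joining a server."""
    return {
        "publish": request_topic(server_id),
        "subscribe": response_topic(server_id, node_id),
    }


# Hypothetical identifiers for one IoT node and one edge server.
plan = topics_for("mcu-3", "edge-1")
```
        </preformat>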
        <p>The following high-level operations form the basis of typical TinyMLOps workflows. Each operation is
executed via message exchange over the Message Bus Server, and routed by the Core Service component
based on the discovery of available capabilities. All data and model artifacts are associated with
descriptors that specify their type and structure. Data descriptors may specify dimensionality, format,
and usage type (e.g., training, test, validation). For models, descriptors define input shape, output
type, and associated task metadata. The Core Service is responsible for registering and matching these
descriptors against operation requirements. An operation is executed only if the associated descriptors
satisfy the input constraints of the function to be applied.</p>
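        <p>The descriptor check can be illustrated with a small sketch. The field names and the matching rule (per-sample shape equal to the model input shape, plus the expected usage type) are assumptions; the paper lists the kind of information descriptors carry but not their concrete schema.</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataDescriptor:
    shape: Tuple[int, ...]  # assumed layout: (samples, features...)
    fmt: str                # e.g. "float32"
    usage: str              # "training", "test", or "validation"


@dataclass(frozen=True)
class ModelDescriptor:
    input_shape: Tuple[int, ...]
    output_type: str
    task: str


def satisfies(data: DataDescriptor, model: ModelDescriptor, required_usage: str) -> bool:
    """True when the data can feed the model for the requested kind of operation."""
    per_sample_shape = data.shape[1:]
    return per_sample_shape == model.input_shape and data.usage == required_usage


# Hypothetical descriptors for a small sensor dataset and a classifier.
readings = DataDescriptor(shape=(120, 3), fmt="float32", usage="validation")
classifier = ModelDescriptor(input_shape=(3,), output_type="class_label", task="anomaly_detection")
can_validate = satisfies(readings, classifier, "validation")
can_train = satisfies(readings, classifier, "training")
```
        </preformat>
        <p>In this example the validation request would be accepted, while a training request on the same data would be rejected because the usage type does not match.</p>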
        <p>Table 1 summarizes the operations available in the proposed framework that define the basic building
blocks of TinyMLOps workflows. These functions are either built-in or serverless, and each is
initiated by the Core Service based on messages received on the Message Bus. The logic for the
removal and deletion of old, stale data is implemented in the Core Service. In the context of TinyMLOps,
this set of operations can be combined to implement complex distributed architectural patterns.</p>
        <p>[Figure 3: message flow for a Requesting Node. Labels shown: Operation Request, Result Delivery, Capability Resolution, Capability Result, Forward To Another Server, Dispatch Operation, Operation Result, Dispatch Remote Operation, Remote Operation Result, Execute Function.]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Early Experiments</title>
      <p>This section presents a preliminary experimental evaluation carried out to assess the feasibility of a
prototype implementation of the proposed TinyMLOps framework. This prototype has undergone
functional testing to verify whether the proposed architecture accurately responds to system events and
user requests, as well as to identify the potential benefits and limitations of adopting a Cloud-to-Thing
approach to TinyMLOps.</p>
      <sec id="sec-5-1">
        <title>4.1. Prototype Implementation</title>
        <p>The prototype implementation adopts the Cloud-to-Thing TinyMLOps architecture presented in
Section 3.2. A guiding design principle during experimentation is the reuse of OTS components whenever
possible. The prototype consists of the following nodes:
• Cloud Node: AWS (https://aws.amazon.com/) serves as the public cloud infrastructure used to
implement the Cloud Node for this experiment.
• Edge Node: a Raspberry Pi 4 Model B, representative of a generic microprocessor-class device.
• IoT Nodes: twelve STM32 B-L4S5I-IOT01A Discovery kits, simulating a network of distributed
IoT devices equipped with sensors for environmental monitoring.</p>
        <p>Hardware specifications for the Edge and IoT nodes are listed in Table 2. The AWS platform is used
to implement the Cloud Node, which provides all mandatory components and the complete set of optional
modules. AWS services such as Lambda (for the Function Runtime), Simple
Storage Service (S3) (for Local Storage), and IoT Core (for the Message Bus Client and Message Bus
Connector) map directly to the Complete Computing Node components described in Section 3.2 and
shown in Figure 2. The Management Connector is represented by the AWS Console and AWS Software
Development Kit (SDK), which provide global visibility and control over the system. The Core Service
can be interpreted as the backbone code of the cloud provider to orchestrate and expose those services.
Sensor Connector and Actuator Connector are not implemented at this level, as AWS cloud nodes do
not directly interface with the physical environment.</p>
        <p>The Edge Node is implemented using a Raspberry Pi 4 Model B running AWS IoT Greengrass. The
node is provisioned directly from the AWS Console, along with Moquette, a Java-based MQTT server
acting as both a Message Bus Server and a Message Bus Client for nearby IoT devices. The Local Storage
component is realized as a custom Greengrass component wrapping a Redis (https://redis.io/) instance.
The Greengrass Core Service itself represents the Core Service, which includes a Function Runtime
capable of executing and migrating serverless Lambda functions from the Cloud Node. The node can
also be configured to communicate with physical sensors and actuators; however, these capabilities
have not been implemented in the proposed experiments.</p>
        <p>The IoT Node is a resource-constrained device based on an STM32 B-L4S5I-IOT01A development
board, running custom firmware built with the Zephyr RTOS (https://zephyrproject.org/). Combined
with some domain-specific logic, this firmware acts as the Core Service. Due to its limited resources,
the node supports only the minimal configuration required for compliance. Specifically, it includes a
Message Bus Client implemented using the built-in Zephyr API (see footnote 1) and a collection of statically defined
Built-in Functions. The onboard sensors are accessed using the integrated STM32 Hardware Abstraction
Layer (HAL) and Zephyr drivers.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Materials, Methods and Results</title>
        <p>This section reports on a set of integration and communication tests conducted with the prototype
described in Section 4.1 to validate the feasibility of the TinyMLOps architecture. Furthermore, a
simplified data collection task has been implemented to evaluate the messaging pipeline performance
in terms of communication latency and node-level processing time.</p>
        <p>[Figure 4: scatter plots for the Edge Node and the Cloud Node; horizontal axis: Message Number, with ticks from 0 to 30.]</p>
        <sec id="sec-5-2-4">
          <title>Cloud Node</title>
          <p>The experimental setup begins with the configuration of both AWS IoT Core and AWS IoT Greengrass.
This involves the registration of each device as a Thing in IoT Core, the provisioning of X.509 certificates
for mutual Transport Layer Security (TLS) authentication, and the attachment of proper IoT policies
to establish permissions. To authorize peer-to-peer communication and access to other AWS services,
such as S3, each device is also assigned appropriate Identity and Access Management (IAM) roles. The
second step involves configuring the edge device by installing the Greengrass Core Device software. Once
the device appears on the AWS Console, a Greengrass deployment is initiated, including all essential
components: Message Bus Server, Message Bus Client, Function Runtime, Local Storage, and the relevant
Lambda functions. The third setup step focuses on the IoT devices. While the edge device benefits from
OTS modules provided by AWS Greengrass, the IoT devices require a custom software stack developed
from scratch. To abstract board-specific hardware and facilitate code portability, Zephyr is used
as the firmware’s core RTOS. Identical firmware is flashed onto all twelve devices, differing only in
authentication credentials and target destination: six devices (indexed 1, …, 6) are configured to send messages to the
cloud node directly, and six (indexed 1, …, 6) to the edge node.</p>
          <p>Footnote 1: https://docs.zephyrproject.org/latest/connectivity/networking/api/mqtt.html</p>
          <p>The edge and cloud nodes are configured to run a Lambda function that saves incoming messages in
local storage during the experiment. Each IoT device sends one message per second for five seconds,
using data from onboard sensors and publishing it via the message client. Upon receipt, the edge
(resp. cloud) node sends an acknowledgment on a dedicated topic through its Message Bus Client.
Communication latency is measured by the IoT device as the time elapsed between sending the message
and receiving the first acknowledgment. A second acknowledgment is sent by the edge (resp. cloud)
node after data is successfully stored, allowing the IoT device to calculate the processing time as the
interval between the two acknowledgments.</p>
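          <p>The two metrics defined above can be sketched as follows. This is a minimal illustration, not the firmware code: the class and function names are assumptions, and the timestamps are assumed to be taken on the IoT device's own clock, so no clock synchronization with the edge or cloud node is needed.</p>
          <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageTimestamps:
    """Timestamps for one data message, in seconds on the device clock."""
    sent: float        # data message published by the IoT device
    first_ack: float   # acknowledgment of receipt arrived at the device
    second_ack: float  # acknowledgment of successful storage arrived


def communication_latency(ts: MessageTimestamps) -> float:
    """Time between sending the message and the first acknowledgment."""
    return ts.first_ack - ts.sent


def processing_time(ts: MessageTimestamps) -> float:
    """Interval between the two acknowledgments."""
    return ts.second_ack - ts.first_ack


def turnaround_time(ts: MessageTimestamps) -> float:
    """Overall time from sending to confirmation of storage."""
    return communication_latency(ts) + processing_time(ts)


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    sample = MessageTimestamps(sent=0.0, first_ack=0.25, second_ack=0.75)
    print(communication_latency(sample), processing_time(sample), turnaround_time(sample))
```
          </preformat>
          <p>Because both acknowledgments travel over the same network path, the processing time computed this way also absorbs any difference in delivery delay between them.</p>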
          <p>Figure 4 and Table 3 present the experimental results. In the scatter plot, messages are ordered by the
timestamp of the data message received from the IoT devices, as recorded by the respective Message Bus
Server. The cloud node experienced substantially higher latency for the first few messages
it received, which is likely due to the cold-start behavior of Lambda functions. As expected, the edge
node shows significantly lower communication latency than the cloud. In contrast, the
cloud node’s superior computational resources are reflected in its shorter processing times. In some
cases, the edge node achieved processing times comparable to the cloud, but with greater variance.
Once communication latency is taken into account, the edge node tends to have a shorter overall turnaround time.
These findings suggest that overall operational efficiency can be optimized through intelligent and
resource-aware task allocation. However, without careful management, task distribution may lead to
suboptimal outcomes and even degrade performance compared to a purely cloud-based solution.</p>
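          <p>A comparison of this kind can be sketched as below: per-node mean and standard deviation, optionally skipping the first few messages to separate cold-start effects from steady-state behavior. The function names and the warm-up parameter are assumptions made for the example, and the sample values are illustrative, not the measured results.</p>
          <preformat>
```python
from statistics import mean, stdev


def summarize(samples, warmup=0):
    """Return (mean, sample standard deviation) after skipping warm-up samples."""
    steady = samples[warmup:]
    if not steady:
        raise ValueError("no samples left after skipping warm-up")
    # stdev needs at least two points; report 0.0 for a single sample.
    spread = stdev(steady) if len(steady) > 1 else 0.0
    return mean(steady), spread


def mean_turnaround(latencies, processing_times, warmup=0):
    """Mean of per-message (communication latency + processing time)."""
    totals = [lat + proc for lat, proc in zip(latencies, processing_times)]
    return summarize(totals, warmup)[0]


if __name__ == "__main__":
    # Illustrative values in milliseconds, not the measurements from the experiment.
    cloud_latency = [900.0, 120.0, 110.0, 130.0]
    cloud_processing = [40.0, 30.0, 35.0, 25.0]
    print(summarize(cloud_latency, warmup=1))
    print(mean_turnaround(cloud_latency, cloud_processing, warmup=1))
```
          </preformat>
          <p>Reporting the spread alongside the mean matters here, since a node with a comparable average but higher variance is a weaker candidate for real-time critical tasks.</p>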
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>This article introduced a serverless, modular framework for TinyMLOps that enables the execution
of ML workflows in Cloud-to-Thing architectures. The system employs a lightweight event-driven
architecture and defines a set of minimal and optional software components, along with a corresponding
set of operations to support a variety of devices, including microcontroller-class IoT nodes, edge nodes
and cloud infrastructures. The feasibility of the proposed architecture has been demonstrated by means
of a prototype implementation that employed MCU nodes running Zephyr RTOS-based firmware and
AWS services. A preliminary experiment has validated the prototype’s functional compliance and
highlighted both the benefits and limitations of the edge computing approach compared to a centralized
architecture. In particular, response time and resource utilization can be optimized by intelligently
offloading certain real-time critical operations from the cloud to edge devices, despite the latter’s limited
performance capabilities.</p>
      <p>Future work will concern several areas. Firstly, an extension of the orchestration capabilities will
be defined through lightweight and resource-aware approaches to the distribution of computational
tasks [24]. The main goal is to allow the system to automate complex tasks based on user requests
and contextual requirements. Integrating support for federated learning and formalizing service-level
guarantees under resource constraints are two other key areas of development. Thorough case studies
will also be conducted to assess the impact of task distribution across the cloud, edge, and IoT nodes.
These studies will include a scalability analysis and the characterization of the framework’s fault tolerance
and responsiveness.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The work has been partially supported by the Space It Up project, funded by the Italian Space Agency
(ASI) and the Ministry of University and Research (MUR), under contract n. 2024-5-E.0 - CUP n.
I53D24000060005, and by the NXT Digital Platform project (grant M1ERDW2), co-funded by NTT
DATA Italia S.p.A. and the European Regional Development Fund for Apulia Region 2014/2020 Operating
Program.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>[14] E. Raj, D. Buffoni, M. Westerlund, K. Ahola, Edge MLOps: An Automation Framework for AIoT
Applications, in: 2021 IEEE International Conference on Cloud Engineering (IC2E), IEEE, 2021, pp.
191–200.</p>
      <p>[15] R. Miñón, J. Diaz-de Arcaya, A. I. Torre-Bastida, P. Hartlieb, Pangea: an MLOps tool for
automatically generating infrastructure and deploying analytic pipelines in edge, fog and cloud layers,
Sensors 22 (2022) 4425.</p>
      <p>[16] M. Antonini, M. Pincheira, M. Vecchio, F. Antonelli, Tiny-MLOps: a framework for orchestrating
ML applications at the far edge of IoT systems, in: 2022 IEEE International Conference on Evolving
and Adaptive Intelligent Systems (EAIS), 2022, pp. 1–8. doi:10.1109/EAIS51927.2022.9787703.</p>
      <p>[17] Z. Huang, K. Zandberg, K. Schleiser, E. Baccelli, RIOT-ML: toolkit for over-the-air secure updates
and performance evaluation of TinyML models, Annals of Telecommunications 80 (2025) 283–297.
doi:10.1007/s12243-024-01041-5.</p>
      <p>[18] T. K. S. Flores, I. Silva, M. B. Azevedo, T. d. A. d. Medeiros, M. d. A. Medeiros, D. G. Costa, P. Ferrari,
E. Sisinni, Advancing Tiny Machine Learning Operations: Robust Model Updates in the Internet
of Intelligent Vehicles, IEEE Micro 45 (2025) 76–86. doi:10.1109/MM.2024.3354323.</p>
      <p>[19] M. T. Lê, J. Arbel, TinyMLOps for real-time ultra-low power MCUs applied to frame-based
event classification, in: Proceedings of the 3rd Workshop on Machine Learning and Systems,
EuroMLSys ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 148–153.
doi:10.1145/3578356.3592586.</p>
      <p>[20] S. Leroux, P. Simoens, M. Lootus, K. Thakore, A. Sharma, TinyMLOps: Operational Challenges for
Widespread Edge AI Adoption, in: 2022 IEEE International Parallel and Distributed Processing
Symposium Workshops (IPDPSW), 2022, pp. 1003–1010. doi:10.1109/IPDPSW55747.2022.00160.</p>
      <p>[21] C. Banbury, V. Janapa Reddi, A. Elium, S. Hymel, D. Tischler, D. Situnayake, C. Ward, L. Moreau,
J. Plunkett, M. Kelcey, M. Baaijens, A. Grande, D. Maslov, A. Beavis, J. Jongboom, J. Quaye, Edge
Impulse: An MLOps Platform for Tiny Machine Learning, in: D. Song, M. Carbin, T. Chen (Eds.),
Proceedings of Machine Learning and Systems, volume 5, Curan, 2023, pp. 254–268.</p>
      <p>[22] C. Bormann, A. P. Castellani, Z. Shelby, CoAP: An application protocol for billions of tiny Internet
nodes, IEEE Internet Computing 16 (2012) 62–67.</p>
      <p>[23] OASIS Standard, MQTT Version 3.1.1, https://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html,
2014. OASIS Standard, 29 October 2014.</p>
      <p>[24] G. Loseto, F. Scioscia, M. Ruta, F. Gramegna, S. Ieva, C. Fasciano, I. Bilenchi, D. Loconte, Osmotic
Cloud-Edge Intelligence for IoT-based Cyber-Physical Systems, Sensors 22 (2022) 2166.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Loconte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gramegna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Bilenchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fasciano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Loseto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scioscia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Di Sciascio</surname>
          </string-name>
          ,
          <article-title>Serverless Microservice Architecture for Cloud-Edge Intelligence in Sensor Networks</article-title>
          ,
          <source>IEEE Sensors Journal</source>
          <volume>25</volume>
          (
          <year>2024</year>
          )
          <fpage>7875</fpage>
          -
          <lpage>7885</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Rajapakse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Karunanayake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <article-title>Intelligence at the extreme edge: A survey on reformable TinyML</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Njor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hasanpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Madsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Fafoutis</surname>
          </string-name>
          ,
          <article-title>A Holistic Review of the TinyML Stack for Predictive Maintenance</article-title>
          ,
          <source>IEEE Access</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ficco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guerriero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Milite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Palmieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pietrantuono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <article-title>Federated learning for IoT devices: Enhancing TinyML with on-board training</article-title>
          ,
          <source>Information Fusion</source>
          <volume>104</volume>
          (
          <year>2024</year>
          )
          <fpage>102189</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Loconte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Loseto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scioscia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruta</surname>
          </string-name>
          ,
          <article-title>Expanding the cloud-to-edge continuum to the IoT in serverless federated learning</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>155</volume>
          (
          <year>2024</year>
          )
          <fpage>447</fpage>
          -
          <lpage>462</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scioscia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Bilenchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gramegna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Loseto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinto</surname>
          </string-name>
          ,
          <article-title>A multiplatform reasoning engine for the Semantic Web of Everything</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>73</volume>
          (
          <year>2022</year>
          )
          <elocation-id>100709</elocation-id>
          :
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scioscia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Loseto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Di Sciascio</surname>
          </string-name>
          ,
          <article-title>Machine Learning in the Internet of Things: a Semantic-enhanced Approach</article-title>
          ,
          <source>Semantic Web Journal</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>183</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <article-title>FireFace: Leveraging Internal Function Features for Configuration of Functions on Serverless Edge Platforms</article-title>
          ,
          <source>Sensors</source>
          <volume>23</volume>
          (
          <year>2023</year>
          )
          <fpage>7829</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Calavaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cardellini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lo Presti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Russo Russo</surname>
          </string-name>
          ,
          <article-title>Beyond Cloud: Serverless Functions in the Compute Continuum</article-title>
          ,
          <source>SN Computer Science</source>
          <volume>6</volume>
          (
          <year>2025</year>
          ).
          doi:10.1007/s42979-025-03699-7.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Ahmadon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          ,
          <article-title>The Design Once, Provide Anywhere Concept for the Internet of Things Service Implementation</article-title>
          ,
          <source>Evolution of Information, Communication and Computing System</source>
          (
          <year>2023</year>
          )
          <fpage>43</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dautov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barišić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Da Rocha</surname>
          </string-name>
          ,
          <article-title>Function-as-a-service for the cloud-to-thing continuum: a systematic mapping study</article-title>
          , in:
          <source>IoTBDS 2023-8th International Conference on Internet of Things, Big Data and Security</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>82</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Abadade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Temouden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bamoumen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Benamar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chtouki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Hafid</surname>
          </string-name>
          ,
          <article-title>A Comprehensive Survey on TinyML</article-title>
          ,
          <source>IEEE Access</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>96892</fpage>
          -
          <lpage>96922</lpage>
          . doi:10.1109/ACCESS.2023.3294111.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Alla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Adari</surname>
          </string-name>
          ,
          <article-title>What is MLOps?</article-title>
          , in:
          <source>Beginning MLOps with MLFlow: Deploy Models in AWS SageMaker, Google Cloud, and Microsoft Azure</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>