<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>From Architecture to Implementation: Robotic Data Pipelines for Digital Agriculture GIS (Application Paper)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Piotr Skrzypczyński</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filip Baranowski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krzysztof Ćwian</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maciej Krupka</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jakub Pilarski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antoni Sopata</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Wrembel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Poznan University of Technology</institution>
          ,
          <addr-line>Poznań</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>The integration of mobile robotic platforms with Geographic Information Systems (GIS) is a key enabler of data-driven digital agriculture. While recent research has proposed architectures for integrating heterogeneous robotic and sensor data with GIS tools in cloud-edge environments, their practical realisation using existing data engineering technologies remains underexplored. This paper reports on the development of actual data processing pipelines compliant with the GIS4IoRT architecture, developed within our CHIST-ERA project. Focusing on precision agriculture use cases, we study representative spatio-temporal queries using real data collected from field robots and implement them using three distinct data integration approaches that differ in programming model, system architecture, and execution paradigm. We analyse design choices, implementation effort, and system limitations, highlighting the trade-offs between the approaches. The results offer practical guidance on instantiating the GIS4IoRT architecture and on designing robotic-GIS data integration pipelines.</p>
      </abstract>
      <kwd-group>
        <kwd>data integration</kwd>
        <kwd>GIS</kwd>
        <kwd>robotics</kwd>
        <kwd>spatio-temporal stream processing</kwd>
        <kwd>edge-cloud data processing</kwd>
        <kwd>sustainable agriculture</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and motivation</title>
      <p>Digital agriculture increasingly relies on mobile robots and sensor networks to monitor crops, soil
conditions, and farming operations at high spatial and temporal resolution [1]. Ground and aerial robots
equipped with GNSS, cameras, LiDAR, and environmental sensors continuously produce large volumes
of heterogeneous data streams that must be processed, integrated, and analysed in near-real-time
[2]. Geographic Information Systems (GIS) play a central role by providing spatial reference data,
visualisation, and spatial analytics required for decision support in precision agriculture [3].</p>
      <p>Integrating robotic data with GIS tools remains challenging. Robotic platforms typically rely on
specialised middleware such as ROS 2 [4], custom data formats, and streaming communication models,
whereas GIS systems have traditionally been designed for static or slowly evolving spatial datasets [5].
Bridging this gap requires architectures capable of handling multi-modal spatio-temporal data streams,
tolerating intermittent connectivity, and supporting both continuous and historical queries [6].</p>
      <p>Several stream processing architectures have been proposed for IoT and spatio-temporal data analytics,
including Lambda-style designs [7] and cloud-based streaming platforms built on engines such as
Apache Storm [8] and Kafka [9]. However, their integration with GIS environments is typically
not discussed. These technologies offer very limited support for GIS-native spatial representations,
geofencing semantics, or unified handling of continuous and historical spatial queries. This highlights
the need for architectures that explicitly bridge stream processing systems and GIS-oriented data
management.</p>
      <p>To the best of our knowledge, there is no work on building and comparing integration architectures
for data produced by the Internet of Robotic Things (IoRT) and their orchestration with GIS tools. For
this reason, the CHIST-ERA GIS4IoRT project addresses these challenges by proposing a plug-and-play,
cloud-based middleware architecture for integrating data produced by the IoRT with GIS environments
(see [10]). The architecture combines edge–fog–cloud computing with a mediated data integration
and querying layer, Quality of Service (QoS) and Quality of Data (QoD) aware query execution, and
compliance with GIS standards.</p>
      <p>While the GIS4IoRT architecture specifies the required functionality, its practical realisation using
existing data processing frameworks remains largely unexplored. In particular, it is unclear how different
implementation choices affect system complexity, flexibility, and performance [11]. Addressing this gap
is essential for translating architectural designs into practical systems.</p>
      <p>Our previous corresponding paper [12] introduced the architecture, identified key research challenges,
and outlined its applicability to several domains, including precision agriculture. This paper adopts
an implementation-oriented perspective. We demonstrate how the GIS4IoRT architecture can be
instantiated through three alternative processing pipelines based on different programming models and
execution paradigms, namely:
1. a declarative SQL-based streaming approach built on a messaging backbone;
2. a low-level distributed stream processing framework with explicit state management and spatial
indexing;
3. an IoRT-oriented edge–cloud stream processing system designed for execution across
heterogeneous devices.</p>
      <p>To ensure comparability, all three processing pipelines implement the same core functionality, centred
on three representative queries derived from real precision agriculture scenarios:
1. geofencing query: determines whether a robot is outside a specified plot, evaluated as a
continuous query with periodic updates (1a), and as a historical query over persistently stored
data (1b);
2. collision detection query: detects potential spatial conflicts between robots based on their
relative positions over time;
3. IoT sensor proximity query: raises an alarm when a robot is within a specified distance from a
sensor and the sensor reading exceeds a defined threshold.</p>
      <p>All queries combine real robot odometry data obtained from ROS file recordings (called rosbags)
with static plot geometries stored in GIS-compatible formats (PostGIS). The data reflect real conditions
encountered in agricultural field deployments in France.</p>
      <p>The novelty of this paper lies in: (1) providing end-to-end implementations of spatio-temporal
robotic–GIS queries on real data, (2) systematically comparing three different processing paradigms
under a common architectural framework, and (3) analyzing practical trade-offs in terms of development
effort, flexibility, and suitability for GIS4IoRT-style deployments.</p>
      <p>The remainder of the paper is organized as follows. Section 2 briefly recalls the GIS4IoRT architecture
as the backbone of our experiments. Section 3 presents the three implementation frameworks and
their design choices. Section 4 reports experimental results and qualitative comparisons based on real
agricultural robotic data. Section 5 concludes the paper and outlines directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Architecture</title>
      <p>This paper builds on the GIS4IoRT architecture previously introduced as part of a CHIST-ERA project
[12]. This section provides a condensed view of its key requirements and functional components, with
a particular emphasis on those elements that are instantiated in the implementations described in this
paper.</p>
      <sec id="sec-2-1">
        <title>2.1. Goals and requirements</title>
        <p>The GIS4IoRT architecture targets the processing of dynamic, heterogeneous, spatio-temporal data
produced by mobile robots operating in environments such as agricultural fields. The requirements
relevant to this study are as follows:
• heterogeneous stream integration: ingestion and correlation of robotic telemetry streams with
static GIS datasets, including trajectories and spatial geometries [6];
• spatio-temporal query processing: support for spatial predicates combined with temporal
semantics, enabling continuous real-time and historical queries [11];
• streaming and replay-based execution: low-latency processing of live streams and retrospective
analysis via stream replay or batch execution;
• dynamic deployment: flexible query configuration in the presence of mobile data sources and
changing connectivity [13];
• GIS-compatible outputs: emission of results in standard spatial formats consumable by GIS
applications.</p>
        <p>These requirements directly inform the architectural layering and underpin the choice of processing
frameworks compared in Section 3.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Overview of the GIS4IoRT architecture</title>
        <p>At a high level, the GIS4IoRT architecture follows a mediated data integration model [14, 15] deployed
across an edge–fog–cloud continuum [6]. As illustrated in Fig. 1, the architecture consists of four main
conceptual layers:
• IoRT data sources: mobile and stationary devices, including ground and aerial robots, sensors,
cameras, and LiDARs. In this work, robots run ROS 2 and produce time-stamped localisation
streams (e.g., nav_msgs/Odometry or sensor_msgs/NavSatFix) representing spatial
trajectories. At the same time, these data are uploaded to a central repository that also stores metadata
and ontologies for mapping data from multiple IoRT devices, i.e., data in different modalities.
• Data integration and querying layer: abstracts heterogeneous data sources and enables unified
spatio-temporal query processing by translating high-level queries into executable operations on
underlying processing engines [11], which are IoRT devices and a repository of historical static
data.
• GIS4IoRT middleware: provides orchestration, data routing, and caching across the infrastructure,
supporting spatio-temporal queries, QoS/QoD-aware execution, and transformation of results
into GIS-compatible representations.
• GIS applications: issue queries, visualise results, and combine robotic data with external spatial
datasets such as plot boundaries stored in GIS repositories.</p>
        <p>Beyond data quality and real-time performance constraints, this architecture poses two significant
challenges. First, the integration environment is highly dynamic. This dynamicity stems from: (1)
the on-the-fly deployment of new IoRT devices and (2) intermittent connectivity, where limited or
unavailable Wi-Fi causes mobile devices to temporarily disappear from the network. To manage this,
the architecture must support automatic service discovery and registration, allowing devices to be
incorporated immediately upon deployment or re-connection.</p>
        <p>Second, the query mechanism must account for the intermittent availability of robotic data sources.
For instance, while <italic>n</italic> devices may be present when a query is formulated, a few of them may disconnect
before or while the query is executed. For this reason, the system must be capable of dynamic query
routing to the remaining active devices. Furthermore, query results must be augmented with metadata
describing their quality, specifically: (1) completeness metrics, indicating the percentage of missing
data due to device unavailability, and (2) data fidelity indicators, reflecting quality degradation caused
by network throughput limitations.</p>
        <p>[Figure 1: diagram of the GIS4IoRT architecture. Recoverable labels: GIS data repository; external geographical data; GIS applications; data in the GIS standard; query; GIS4IoRT middleware; interface to GIS4IoRT; data.]</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Architectural instantiation</title>
        <p>The design, development, and experimental work presented in this paper instantiates the GIS4IoRT
architecture using real agricultural robot data and three alternative implementations of the data integration
and querying layer.</p>
        <p>At the data source level, rosbag recordings are used to replay robot trajectories collected in real field
deployments. These trajectories represent the dynamic data streams processed by the system. Static
spatial data describing agricultural plots are provided in GIS-compatible formats (PostGIS geometry
exported to CSV), serving as reference data for geofencing queries.</p>
        <p>At the integration and processing level, the data integration and querying layer is instantiated using
three alternative frameworks that differ in their execution model and system scope. These include: (1)
a SQL-oriented streaming solution built on a messaging backbone, (2) a general-purpose distributed
stream processing framework with native support for spatial indexing, and (3) an IoRT-focused stream
processing system supporting distributed edge–cloud execution. Although all three frameworks realise
semantically the same queries, they differ substantially in how data ingestion, state management, spatial
evaluation, and windowing are handled.</p>
        <p>The middleware functionality is implemented in a lightweight manner using messaging systems
(Kafka or MQTT) and REST/WebSocket-based APIs [13]. These components handle configuration
updates (e.g., plot definitions), dissemination of query results, and interaction with external clients or
visualization tools. At the GIS application level, query results are produced in standard, GIS-friendly
representations (e.g., JSON with spatial attributes), enabling straightforward integration with GIS tools
and dashboards.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Implementation frameworks</title>
      <p>This section presents three alternative implementations of the GIS4IoRT processing layer, each
realizing the same functional requirements but using different tools, programming models, and execution
paradigms [11]. All implementations process real robot data collected in agricultural fields and support
all of the core queries introduced in Section 1, namely: a continuous geofencing query (denoted as
Query 1a), a historical geofencing query (Query 1b), a collision detection query (Query 2), and an IoT
sensor proximity query (Query 3).</p>
      <p>The goal of this section is not to advocate a single “best” solution, but to demonstrate how different
technological choices affect system design, flexibility, and operational complexity within the same
architectural backbone.</p>
      <sec id="sec-3-1">
        <title>3.1. Declarative stream processing with ksqlDB and Apache Kafka</title>
        <p>The first approach adopts a declarative stream processing model, using Apache Kafka [16] as the
messaging backbone and ksqlDB as the processing engine [11]. This design emphasizes simplicity, rapid
development, and tight integration with event-driven architectures.</p>
        <p>Figure 2 shows the data flow in which robot telemetry data is produced by a ROS 2–Kafka bridge
that replays rosbag recordings and publishes odometry messages as JSON events into Kafka topics.
Messages are keyed by robot identifier to enable partition-based parallelism [13]. Plot definitions and
robot–plot assignments are managed through a control stream, allowing configuration updates to be
applied dynamically without restarting the system.</p>
        <p>Query 1a is implemented directly in ksqlDB using SQL-like continuous queries. Telemetry streams
are joined with the configuration table and evaluated in 1-second tumbling windows. Since ksqlDB
does not natively support spatial data types or operators, the point-in-polygon test is implemented as
a custom User-Defined Function (UDF) based on a standard geometry library. If a robot is detected
outside its assigned plot at least once within a window, an alert event is emitted to an output Kafka
topic.</p>
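        <p>The windowed geofencing logic described above can be illustrated with a minimal Python sketch. It is not the ksqlDB UDF used in this work: the ray-casting test stands in for the geometry library, and the function names, the event tuple layout (robot identifier, timestamp in seconds, x, y), and the square example plot are assumptions made purely for illustration.</p>
        <code language="python"><![CDATA[
```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies strictly inside the polygon.

    polygon is a list of (x, y) vertices. Points lying exactly on an edge
    are not treated specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def geofence_alerts(events, polygon, window_s=1.0):
    """Return sorted (robot_id, window_start) pairs for every tumbling
    window in which the robot was outside the plot at least once."""
    alerts = set()
    for robot_id, ts, x, y in events:
        if not point_in_polygon(x, y, polygon):
            window_start = (ts // window_s) * window_s
            alerts.add((robot_id, window_start))
    return sorted(alerts)


# Hypothetical 10 m x 10 m plot and a short trajectory of one robot.
plot = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
events = [
    ("r1", 0.2, 5.0, 5.0),   # inside
    ("r1", 0.7, 11.0, 5.0),  # outside, window starting at 0.0
    ("r1", 1.1, 12.0, 5.0),  # outside, window starting at 1.0
    ("r1", 2.4, 9.0, 5.0),   # inside again
]
alerts = geofence_alerts(events, plot)
```
]]></code>
        <p>In this sketch a window produces at most one alert per robot, mirroring the "at least once within a window" rule stated above.</p>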
        <p>Query 1b is realized outside the streaming engine using batch processing. A Python-based tool parses
the recorded ROS bag files, extracts odometry data, and applies the same geometric logic to identify all
historical violations. While this results in a split streaming/batch architecture, it significantly simplifies
implementation by reusing existing geospatial libraries.</p>
        <p>Query 2 is implemented as an external Python program running alongside ksqlDB. This hybrid
approach was necessitated by the limitations of the standard Kafka partitioning, where sharding by
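the robot identifier prevents the spatial co-locality required for collision detection. Because each
new position is compared with that of every other robot, the logic effectively runs with <inline-formula><tex-math>O(n^2)</tex-math></inline-formula> complexity in the number of robots <italic>n</italic>. The primary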
bottleneck of this implementation is its centralized state architecture. While it ensures accurate and
fast collision detection for small fleets, it scales poorly for larger ones.</p>
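        <p>A minimal sketch of such a centralized pairwise check is given below. It only illustrates the idea and is not the external program used in this work: the function name, the dictionary holding the latest positions, and the distance threshold are hypothetical.</p>
        <code language="python"><![CDATA[
```python
import math


def detect_collisions(latest, robot_id, x, y, min_distance):
    """Store the new position in the central state and return the sorted
    identifiers of all other robots closer than min_distance."""
    latest[robot_id] = (x, y)
    conflicts = []
    # Every update scans all known robots, so n updates cost O(n^2) in total.
    for other_id, (ox, oy) in latest.items():
        if other_id == robot_id:
            continue
        if math.hypot(x - ox, y - oy) < min_distance:
            conflicts.append(other_id)
    return sorted(conflicts)


latest = {}  # single shared state: robot_id -> last known (x, y)
first = detect_collisions(latest, "r1", 0.0, 0.0, 2.0)
second = detect_collisions(latest, "r2", 1.0, 1.0, 2.0)
third = detect_collisions(latest, "r3", 10.0, 10.0, 2.0)
```
]]></code>
        <p>The single shared dictionary is what makes the check simple for small fleets and, at the same time, what prevents it from being partitioned across workers.</p>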
        <p>Query 3 is performed within ksqlDB utilizing the Broadcast Join strategy. Unlike the high-frequency
robot collision scenario, the lower frequency of sensor updates permits replicating sensor messages
across all partitions. This enables a local stream-stream join to pair robots with sensors. However, this
logic is based on the regularity of sensor updates. Irregular intervals create temporal blind spots where
telemetry may pass unchecked. Although the current approach prioritizes responsiveness, eliminating
these gaps would require windowed aggregation, inherently introducing processing lag.</p>
        <p>This approach fits well with the GIS4IoRT data integration and querying layer, offering moderate
operational overhead along with high developer productivity. Its main limitations are the lack of native
spatial abstractions and limited control over execution internals.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Distributed spatial stream processing with Apache Flink and GeoFlink</title>
        <p>The second approach relies on Apache Flink [17] combined with the GeoFlink library [18], representing
a low-level but highly scalable implementation of the GIS4IoRT processing layer [11]. Unlike the
declarative ksqlDB approach, this solution uses Flink DataStream API and explicit state management.</p>
        <p>As shown in Fig. 3, robot telemetry and query configuration are ingested into the stream processing
engine from Kafka topics. Configuration info is distributed to all processing nodes using a broadcast
state pattern, ensuring local access to query rules. GeoFlink provides spatial data structures and indexing
mechanisms, most notably a uniform grid index used to spatially partition incoming data [18]. This
allows robot locations and complex plot definitions to be routed only to the relevant grid cells.</p>
        <p>Query 1a is implemented as a stateful streaming operator. Incoming telemetry events are routed
to grid cells, where local point-in-polygon checks are performed against the cached plot geometry.
Results are aggregated in 1-second windows and emitted as alert events. The explicit spatial partitioning
enables efficient scaling with increasing numbers of robots and higher data rates [11].</p>
        <p>Query 1b can be implemented by replaying historical data through the same Flink job or by running
a dedicated batch-style Flink pipeline, preserving a consistent processing model.</p>
        <p>Query 2 is implemented without time windows to ensure immediate reaction. Instead, it uses a
per-event processing model, where each incoming telemetry message is instantly compared against the
cached positions of nearby robots routed to the same spatial partition. Cached positions are retained
only for a limited duration (TTL) to prevent detection against obsolete data.</p>
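        <p>The per-event check with a time-limited position cache can be sketched in plain Python as follows. This is not Flink or GeoFlink code: the class name, the cell size, and the TTL value are hypothetical. Unlike the description above, the sketch also inspects the eight neighbouring cells, which is an assumption added here so that pairs on either side of a cell border are not missed; it requires the cell size to be at least the collision distance.</p>
        <code language="python"><![CDATA[
```python
import math


class GridCollisionIndex:
    """Uniform grid of cached robot positions with a time-to-live (TTL)."""

    def __init__(self, cell_size, ttl_s, min_distance):
        self.cell_size = cell_size
        self.ttl_s = ttl_s
        self.min_distance = min_distance
        self.cells = {}       # (ix, iy) -> {robot_id: (ts, x, y)}
        self.robot_cell = {}  # robot_id -> cell currently holding the robot

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def update(self, robot_id, ts, x, y):
        """Cache the new position and return nearby robots (sorted ids)."""
        cell = self._cell(x, y)
        old = self.robot_cell.get(robot_id)
        if old is not None and old != cell:
            self.cells[old].pop(robot_id, None)  # robot moved to another cell
        self.robot_cell[robot_id] = cell
        self.cells.setdefault(cell, {})[robot_id] = (ts, x, y)

        conflicts = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                bucket = self.cells.get((cell[0] + dx, cell[1] + dy), {})
                for other_id, (ots, ox, oy) in list(bucket.items()):
                    if other_id == robot_id:
                        continue
                    if ts - ots > self.ttl_s:
                        del bucket[other_id]  # expired: never compare against it
                        continue
                    if math.hypot(x - ox, y - oy) < self.min_distance:
                        conflicts.append(other_id)
        return sorted(conflicts)


index = GridCollisionIndex(cell_size=5.0, ttl_s=2.0, min_distance=2.0)
a = index.update("r1", 0.0, 4.9, 1.0)
b = index.update("r2", 0.5, 5.1, 1.0)  # adjacent cell, 0.2 m from r1
c = index.update("r3", 5.0, 5.0, 1.5)  # earlier positions are older than the TTL
```
]]></code>
        <p>Each update touches at most nine cells, so the cost per event depends on local robot density rather than on fleet size.</p>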
        <p>Query 3 combines real-time robot telemetry with environmental sensor readings. The status of each
sensor, merging configuration data (e.g., thresholds) with the latest measurements, is maintained in the
worker’s local memory to track external conditions. This enables the system to trigger alerts based on
the robot’s proximity to a sensor and the simultaneous violation of specific safety limits.</p>
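        <p>The combination of sensor configuration, latest reading, and robot position can be illustrated with the following Python sketch. It is not the Flink operator itself: the class and method names, the circular sensor range, and the example threshold are assumptions chosen for illustration.</p>
        <code language="python"><![CDATA[
```python
import math


class SensorProximityMonitor:
    """Keeps per-sensor state (configuration plus latest reading) in memory."""

    def __init__(self):
        self.sensors = {}  # sensor_id -> dict with x, y, radius, threshold, reading

    def configure(self, sensor_id, x, y, radius, threshold):
        state = self.sensors.setdefault(sensor_id, {"reading": None})
        state.update(x=x, y=y, radius=radius, threshold=threshold)

    def on_reading(self, sensor_id, value):
        # Readings of sensors without configuration are ignored in this sketch.
        if sensor_id in self.sensors:
            self.sensors[sensor_id]["reading"] = value

    def on_telemetry(self, robot_id, x, y):
        """Return sorted (robot_id, sensor_id) alarms for this position."""
        alarms = []
        for sensor_id, s in self.sensors.items():
            if s["reading"] is None:
                continue  # no measurement received yet
            near = math.hypot(x - s["x"], y - s["y"]) <= s["radius"]
            if near and s["reading"] > s["threshold"]:
                alarms.append((robot_id, sensor_id))
        return sorted(alarms)


monitor = SensorProximityMonitor()
monitor.configure("s1", x=0.0, y=0.0, radius=5.0, threshold=30.0)
before_reading = monitor.on_telemetry("r1", 1.0, 1.0)
monitor.on_reading("s1", 35.0)
near_and_high = monitor.on_telemetry("r1", 1.0, 1.0)
far_and_high = monitor.on_telemetry("r1", 10.0, 10.0)
monitor.on_reading("s1", 20.0)
near_and_low = monitor.on_telemetry("r1", 1.0, 1.0)
```
]]></code>
        <p>An alarm is raised only when both conditions hold for the same telemetry event, which matches the conjunction described above.</p>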
        <p>This approach offers the greatest flexibility and performance potential, closely aligning with the
scalability goals of GIS4IoRT. At the same time, it requires deep expertise in distributed stream processing
and, in some cases, the implementation of custom spatial operators beyond those provided by GeoFlink.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Edge–cloud IoRT stream processing with NebulaStream</title>
        <p>The third approach utilises NebulaStream [19], an end-to-end stream processing system designed
specifically for IoT and IoRT environments spanning sensors, edge devices, and cloud infrastructure [6].
This framework is architecturally closest to the original GIS4IoRT vision of distributed, heterogeneous
execution (Fig. 4).</p>
        <p>In this setup, a NebulaStream Coordinator instance is initialised. The Coordinator is responsible
for receiving, compiling, optimising, and dispatching partial query plans to NebulaStream Workers.
Multiple workers can be deployed across various machines, including edge devices. A worker is created
for each robot or group of sensors and is paired with a ROS 2–MQTT bridge that captures ROS 2 telemetry
data. All queries are written in Java, contain Java UDFs [20], and are sent to the Coordinator via the
REST API [6].</p>
        <p>Query 1a is implemented as a continuous NebulaStream query that computes each robot’s new
position and evaluates geofence violations. Plot geometries are provided in binary spatial formats. The
query results are published to an MQTT topic and consumed by downstream services or visualisation
components.</p>
        <p>Since NebulaStream focuses primarily on streaming execution, Query 1b is emulated by replaying
historical data through the same pipeline. While this preserves a unified execution model, it increases
operational complexity.</p>
        <p>Query 2 necessitates the real-time, simultaneous comparison of robot positions for every pair. Since
data processing is strictly stream-based, there is no operator available to access the most recent historical
data. This necessitates the use of a join operator that functions exclusively on tumbling windows.
Consequently, the system must calculate average robot positions and subsequent distances within a
1-second window. This introduces additional processing latency due to the time required to aggregate
all messages within that window.</p>
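        <p>The window-based variant of the collision query can be sketched in Python as shown below. It is not NebulaStream query code: the function name, the event tuple layout (robot identifier, timestamp in seconds, x, y), and the distance threshold are assumptions made for illustration only.</p>
        <code language="python"><![CDATA[
```python
import math
from collections import defaultdict
from itertools import combinations


def windowed_collisions(events, min_distance, window_s=1.0):
    """Average each robot's position per tumbling window, then compare the
    averaged positions of every robot pair within the same window."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # (window, robot) -> [sum_x, sum_y, count]
    for robot_id, ts, x, y in events:
        acc = sums[(int(ts // window_s), robot_id)]
        acc[0] += x
        acc[1] += y
        acc[2] += 1

    by_window = defaultdict(dict)  # window -> {robot_id: (mean_x, mean_y)}
    for (window, robot_id), (sx, sy, count) in sums.items():
        by_window[window][robot_id] = (sx / count, sy / count)

    results = []
    for window in sorted(by_window):
        positions = by_window[window]
        for a, b in combinations(sorted(positions), 2):
            (ax, ay), (bx, by) = positions[a], positions[b]
            if math.hypot(ax - bx, ay - by) < min_distance:
                results.append((window * window_s, a, b))
    return results


events = [
    ("r1", 0.1, 0.0, 0.0), ("r1", 0.6, 2.0, 0.0),   # window 0 mean: (1, 0)
    ("r2", 0.3, 2.0, 0.0), ("r2", 0.9, 2.0, 2.0),   # window 0 mean: (2, 1)
    ("r1", 1.2, 0.0, 0.0), ("r2", 1.5, 10.0, 10.0), # window 1: far apart
]
conflicts = windowed_collisions(events, min_distance=2.0)
```
]]></code>
        <p>A result for a window can only be produced once all of its events have arrived, which is the source of the additional latency noted above.</p>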
        <p>Query 3 encounters a similar issue; to facilitate comparison, the maximum sensor value within a
given window must be determined.</p>
        <p>This approach demonstrates how GIS4IoRT-style queries can be deployed across edge–cloud
infrastructures. However, it requires significant configuration effort, extensive use of UDFs, and struggles
with the instantaneous comparison of distinct data streams. Furthermore, it currently lacks support for
dynamically attaching new data sources to running queries.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Comparison of implementation approaches</title>
        <p>Table 1 summarises and contrasts the three implementation approaches discussed above with respect
to their programming model, execution scope, spatial support, and operational characteristics within
the GIS4IoRT architecture. By presenting their key strengths and limitations side by side, the table
highlights the trade-offs that arise from different technological choices when realising the same
spatio-temporal queries. These differences motivate the experimental evaluation presented in the next section,
which assesses how the approaches perform in practice on real agricultural robot data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and results</title>
      <p>This section presents an experimental evaluation of the three alternative GIS4IoRT processing pipelines
presented in Section 3. Using real robot trajectories and GIS datasets, we analyse how different
implementation choices affect query execution, latency, and system behaviour.</p>
      <sec id="sec-4-1">
        <title>4.1. Experimental setup</title>
        <p>Despite the distributed nature of the proposed architecture, the comparative tests were conducted on a
single physical machine to ensure a controlled environment and eliminate network variability. The
worker processes and processing engines were simulated within a Docker Compose environment. The
hardware platform consisted of an Intel Core i5-14600K CPU, 64 GB of DDR5 RAM, and a Docker
Engine environment running on WSL2.</p>
        <p>The evaluation was performed using recorded robot trajectories replayed from ROS bag files, rather
than live telemetry streams. This approach allowed for deterministic reproducibility of the experiments.</p>
        <p>We focused on two key metrics, outlined below.</p>
        <p>• Latency: defined as the difference between the data ingestion timestamp and the time the alert
was received by the data collector. To ensure statistical reliability, extreme outliers were excluded
from the final calculations.
• Correctness: verified by cross-referencing the system’s output against a ground truth dataset
derived directly from the raw telemetry logs. For queries involving time windows, the raw logs
were pre-aggregated to match the windowing logic, enabling precise verification of the results.</p>
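        <p>The latency metric can be illustrated with a short Python sketch. The criterion used to exclude extreme outliers is not specified above, so the fence of three interquartile ranges used here is purely an assumption, as are the function names and the example timestamps (in milliseconds).</p>
        <code language="python"><![CDATA[
```python
import statistics


def latencies_ms(records):
    """Latency = alert reception time minus data ingestion time."""
    return [received - ingested for ingested, received in records]


def drop_extreme_outliers(values, k=3.0):
    """Keep values within k interquartile ranges of the quartiles.

    The factor k = 3 is an assumed convention, not the rule used in the paper.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical (ingestion, reception) timestamp pairs in milliseconds.
records = [(0, 10), (100, 111), (200, 212), (300, 313),
           (400, 414), (500, 515), (600, 616), (700, 1200)]
raw = latencies_ms(records)
kept = drop_extreme_outliers(raw)
mean_latency = statistics.mean(kept)
```
]]></code>
        <p>With these example values, only the single 500 ms measurement falls outside the fence and is excluded before averaging.</p>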
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Baseline performance: geofencing</title>
        <p>The geofencing query (Query 1a) was selected as the primary baseline for quantitative performance
comparison. The test measured the system’s ability to detect boundary violations on a real-world
trajectory, as illustrated in Fig. 5. In this scenario, the standard 1-second tumbling window requirement
was intentionally relaxed to event-at-a-time processing. This modification allowed us to isolate the
inherent processing overhead of each engine without the masking effect of window aggregation delays.</p>
        <p>Table 2 presents comparative latency results for the geofencing scenario. All three engines correctly
detected 100% of the boundary crossing events with single-telemetry precision. The measurements,
however, reveal substantial differences in processing overhead and execution behaviour.</p>
        <p>NebulaStream achieved the lowest latency across all metrics, reflecting the benefits of its compiled
C++ execution model and streamlined query pipelines. Apache Flink exhibited higher but stable latency
values, which are consistent with JVM-based execution and explicit state management. In contrast,
ksqlDB incurred significantly higher latency and variance, primarily due to the additional abstraction
layers introduced by its declarative processing model.</p>
        <p>These results illustrate the trade-off between execution efficiency and abstraction level: lower-level
stream processing frameworks offer minimal latency at the cost of increased development effort, whereas
higher-level declarative systems favour simplicity and rapid development but introduce measurable
performance overhead.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Functional evaluation: collision and proximity</title>
        <p>For the collision detection (Query 2) and IoT sensor proximity (Query 3) scenarios, a direct latency
comparison was not suitable due to significant differences in implementation.</p>
        <p>Specifically, NebulaStream required window-based aggregations to handle complex joins, whereas
ksqlDB SQL limitations necessitated offloading logic to external Python consumers. Comparing the
latency of a windowed operation (which naturally waits for window closure) against continuous
stream processing would be misleading. Therefore, these scenarios were evaluated based on functional
correctness, as discussed below.</p>
        <p>Collision detection: the system analyzed robot trajectories within the field and correctly identified
all spatial violation events, fully matching the ground truth derived from raw GPS logs.</p>
        <p>IoT sensor proximity: the engines successfully identified the intervals during which robots entered
the sensor range (Fig. 6). However, minor discrepancies were observed at the level of individual telemetry
packets. These artifacts resulted from the asynchronous nature of the data streams, where a sensor
state update was occasionally processed before a slightly delayed telemetry packet. Such temporal
synchronization offsets are characteristic of distributed real-time systems and did not impact the overall
reliability of the detected events.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper we presented preliminary results comparing three alternative technologies for building an
integration architecture for the IoRT. We compared the technologies with respect to their functionality,
data latency, and correctness of query results. To this end we used real data from robotic devices and
three types of the most common queries in robotic GIS. To the best of our knowledge, this is the only
work that compares three alternative system designs based on Kafka/ksqlDB, Flink/GeoFlink, and
NebulaStream, for the same business scenario. The work has been realized within the EU CHIST-ERA
project.</p>
      <p>Since the project is still in its early stage, future work will focus on: (1) evaluating the scalability of
each architectural design, (2) developing techniques for dynamically including robotic devices into the
integration architecture, (3) developing mechanisms for querying intermittent robotic data sources, and (4)
extending the query mechanism to include parameters such as the quality of service and the quality of data.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research is supported by the National Science Centre, Poland, grant no. 2024/06/Y/ST6/00136,
funded under the CHIST-ERA Call 2023 EU project Development of a Plug-and-Play Middleware for
Integrating Robot Sensor Data with GIS Tools in a Cloud Environment. The authors would like to thank
Dr Sandro Bimonte (INRAE) for sharing the field experiment data.</p>
      <p>Declaration on Generative AI. The authors have not employed any Generative AI tools.</p>
      <p>[5] S. Bimonte, G. Bellocchi, F. Pinet, G. Chalhoub, M. A. Sakr, P. Skrzypczynski, Data engineering for sustainable agriculture: developments, challenges, and case studies of a novel IoRT architecture, Journal of Big Data 12 (2025) 195.</p>
      <p>[6] N. V. B. Yogeswaranathan Kalyani, R. Collier, Digital twin deployment for smart agriculture in cloud-fog-edge infrastructure, Int. Journal of Parallel, Emergent and Distributed Systems 38 (2023).</p>
      <p>[7] M. Villari, A. Celesti, M. Fazio, A. Puliafito, AllJoyn Lambda: An architecture for the management of smart environments in IoT, in: International Conference on Smart Computing Workshops (SMARTCOMP Workshops), 2014.</p>
      <p>[8] G. S. Thakur, B. L. Bhaduri, J. O. Piburn, K. M. Sims, R. N. Stewart, M. L. Urban, PlanetSense: a real-time streaming and spatio-temporal analytics platform for gathering geo-spatial intelligence from open source data, in: SIGSPATIAL Int. Conf. on Advances in Geographic Information Systems, 2015.</p>
      <p>[9] S. Kamburugamuve, L. Christiansen, G. Fox, A framework for real time processing of sensor data in the cloud, Journal of Sensors 2015 (2015).</p>
      <p>[10] CHIST-ERA, Development of a plug-and-play middleware for integrating robot sensor data with GIS tools in a cloud environment (GIS4IoRT). Chist-Era Project Call 2023, 2025. URL: https://www.geoscity.uliege.be/cms/c_13470217/en/gis4iort.</p>
      <p>[11] S. A. Errami, H. Hajji, K. A. E. Kadi, H. Badir, Spatial big data architecture: From data warehouses and data lakes to the lakehouse, Journal of Parallel and Distributed Computing 176 (2023).</p>
      <p>[12] J. Kasprzyk, R. Billen, S. Bimonte, L. d’Orazio, D. Sacharidis, P. Skrzypczynski, R. Wrembel, On integrating robotic data with GIS tools in a cloud environment, in: Workshops of the EDBT/ICDT Joint Conference, volume 3946 of CEUR Workshop Proceedings, 2025.</p>
      <p>[13] A. Prountzos, E. G. M. Petrakis, Defog: dynamic micro-service placement in hybrid cloud-fog-edge infrastructures, Int. Journal of Web and Grid Services 20 (2024).</p>
      <p>[14] P. Brezany, A. M. Tjoa, H. Wanek, A. Wöhrer, Mediators in the architecture of grid information systems, in: Int. Conf. on Parallel Processing and Applied Mathematics (PPAM), volume 3019 of LNCS, Springer, 2003.</p>
      <p>[15] G. Wiederhold, Mediators in the architecture of future information systems, Computer 25 (1992).</p>
      <p>[16] M. J. Sax, Apache Kafka, in: Encyclopedia of Big Data Technologies, Springer, Cham, 2018.</p>
      <p>[17] A. Katsifodimos, S. Schelter, Apache Flink: Stream analytics at scale, in: IEEE International Conference on Cloud Engineering Workshop (IC2EW), 2016, pp. 193–193.</p>
      <p>[18] S. A. Shaikh, K. Mariam, H. Kitagawa, K.-S. Kim, GeoFlink: A distributed and scalable framework for the real-time processing of spatial streams, in: ACM Int. Conf. on Information and Knowledge Management (CIKM), 2020.</p>
      <p>[19] M. M. G. Duarte, D. P. A. Nugroho, G. Tod, E. Bevernage, P. Moelans, E. Tas, E. Zimányi, M. Sakr, S. Zeuch, V. Markl, Mobility stream processing on NebulaStream and MEOS, in: Companion of the Int. Conf. on Management of Data (SIGMOD/PODS), 2025.</p>
      <p>[20] Nebula Stream documentation, 2025. URL: https://web.archive.org/web/20250115180134/https://docs.nebula.stream/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Raja</surname>
          </string-name>
          ,
          <article-title>Software architecture for agricultural robots: Systems, requirements, challenges, case studies, and future perspectives</article-title>
          ,
          <source>IEEE Transactions on AgriFood Electronics</source>
          <volume>2</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Dritsas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Trigka</surname>
          </string-name>
          ,
          <article-title>Remote sensing and geospatial analysis in the Big Data era: A survey</article-title>
          ,
          <source>Remote Sensing</source>
          <volume>17</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gillet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>É.</given-names>
            <surname>Leclercq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cullot</surname>
          </string-name>
          ,
          <article-title>Lambda+, the renewal of the lambda architecture: Category theory to the rescue</article-title>
          ,
          in:
          <source>Int. Conf. on Advanced Information Systems Engineering (CAiSE)</source>
          , volume
          <volume>12751</volume>
          of
          <series>LNCS</series>
          , Springer,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Macenski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Foote</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gerkey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lalancette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Woodall</surname>
          </string-name>
          ,
          <article-title>Robot Operating System 2: Design, architecture, and uses in the wild</article-title>
          ,
          <source>Science Robotics</source>
          <volume>7</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>