<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MobilityDuck: Mobility Data Management with DuckDB</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nhu Ngoc Hoang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ngoc Hoa Pham</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viet Phuong Hoang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Esteban Zimányi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Université Libre de Bruxelles</institution>
          ,
          <addr-line>Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Eficient management and analysis of spatiotemporal data are critical for modern mobility applications, yet existing moving object database (MOD) systems oftentimes remain overly complex or heavyweight for practical analytics workflows. While MobilityDB, a notable MOD extension, can expand PostgreSQL with advanced spatiotemporal capabilities, its reliance on a traditional database architecture introduces significant overhead and complexity. In this paper, we present MobilityDuck, a DuckDB extension that integrates the MEOS library to provide support for spatiotemporal and other temporal data types in DuckDB. MobilityDuck leverages DuckDB's lightweight, columnar, in-memory executable properties to deliver eficient analytics. To the best of our knowledge, no existing in-memory or embedded analytical system ofers native spatiotemporal types and continuous trajectory operators as MobilityDuck does. We evaluate MobilityDuck using the BerlinMOD-Hanoi benchmark dataset and compare its performance to MobilityDB. Our results show that MobilityDuck preserves the expressiveness of spatiotemporal queries while benefiting from DuckDB's in-memory, columnar architecture.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Spatiotemporal</kwd>
        <kwd>Trajectories</kwd>
        <kwd>Mobility</kwd>
        <kwd>DuckDB</kwd>
        <kwd>BerlinMOD</kwd>
        <kwd>MEOS</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The rapid growth of spatiotemporal data has created new opportunities for mobility analytics, where
discovering patterns and trends in object trajectories plays a central role in applications such as urban
planning, intelligent transportation systems, and mobility-as-a-service platforms.</p>
      <p>Despite an extensive body of research in moving object databases (MODs), and the emergence
of systems like MobilityDB, mainstream adoption is still limited by architectural complexity, setup
overhead, and integration challenges in modern analytics pipelines. However, MobilityDB inherits
PostgreSQL’s complexity, which limits its eficiency for lightweight querying, embedded deployment,
and exploratory data science workflows where ease of use and speed of integration are paramount. At
the same time, DuckDB has rapidly emerged as a modern analytical database, designed to be lightweight,
embeddable, and highly optimized for in-memory, columnar query execution. Nevertheless, DuckDB
currently lacks first-class support for spatiotemporal data types and operators.</p>
      <p>This paper introduces MobilityDuck, the first DuckDB extension to support spatiotemporal and
temporal data types. By combining DuckDB’s in-memory, vectorized execution model with MEOS’s
mature spatiotemporal algebra, MobilityDuck brings the expressiveness of moving object databases
into a lightweight analytical engine. We also adapt the BerlinMOD benchmark to the Hanoi urban
environment, producing BerlinMOD-Hanoi, a reproducible dataset and query workload for diverse
mobility analytics. Our experimental evaluation shows that MobilityDuck maintains query expressiveness
while delivering significant performance improvements on most benchmark tasks. Beyond engineering
eforts, MobilityDuck addresses important research questions at the intersection of spatiotemporal data
management and high-performance in-memory analytics, demonstrating that mature and semantically
rich temporal data models can be incorporated into modern columnar, vectorized DBMSs with minimal
overhead. We further present integration patterns and experimental analysis, creating a blueprint for
research libraries to enter production-strength analytics platforms.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>
        Research on spatiotemporal data management has a long history in both the database and GIS
communities with early eforts extending spatial databases with temporal versioning or enriching temporal
databases with spatial attributes [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. A comprehensive review of these early models can be found in
the literature [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Among the many research prototypes, MobilityDB [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] has emerged as the most complete open-source
implementation of a moving object database [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It extends PostgreSQL and PostGIS with temporal types
and spatiotemporal operators, building on the MEOS (Mobility Engine Open Source) library. It supports
moving points (e.g., vehicle trajectories), temporal spans, and temporal aggregates. MobilityDB has
become a reference implementation for managing mobility data, but inherits PostgreSQL’s overhead in
query execution and storage management. However, its performance remains limited by PostgreSQL’s
general-purpose query engine and storage layer.
      </p>
      <p>
        Motivated by the need for faster analytical processing and simpler deployment, several systems have
explored in-memory and memory-eficient architectures for spatiotemporal data. S4STRD presents a
scalable in-memory storage system for real-time trajectory data, keeping recent updates in RAM and
using NoSQL backends for persistence [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. SharkDB [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is an in-memory, column-oriented trajectory
storage system that partitions trajectories into time-based frames, allowing eficient compression,
memory throughput, and parallel processing across cores. In a complementary direction, Richly et al.
propose optimized spatiotemporal data structures for in-memory columnar databases, adapting memory
layouts, compression, and tiering to trajectory workloads [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These works illustrate the feasibility
and challenges of in-memory spatiotemporal storage, particularly for reducing I/O overhead, but they
emphasize storage and access optimizations rather than full query semantics.
      </p>
      <p>In parallel, DuckDB1 is an open-source relational database management system developed by Mark
Raasveldt and Hannes Mühleisen [9]. DuckDB is optimized for online analytical processing (OLAP)
workloads, making it a suitable system for handling complex querying on large datasets [10]. The key
features of DuckDB are as follows:
• Embeddability: Unlike traditional database systems with large servers running as stand-alone
processes, DuckDB is designed to be an embedded database system that runs completely within
another host process.
• Analytical: While other embedded systems (e.g., SQLite) focus more on transactional (OLTP)
workloads, DuckDB is geared towards eficiently executing analytical SQL queries.
• High performance: DuckDB employs a vectorized interpreted execution engine, which
optimizes CPU cache usage and allows batch processing of data.</p>
      <p>Recent work has extended DuckDB with domain-specific extensions, e.g., for geospatial analytics via
DuckDB Spatial Extension[11], or for machine learning via QuackML [12]. However, there is no native
support for spatiotemporal types and operators.</p>
      <p>Evaluating spatiotemporal DBMSs requires reproducible benchmarks. BerlinMOD [13] is the
standard benchmark for moving object databases. It defines a synthetic mobility model, a trip generation
based on an underlying road-network, and a set of queries measuring performance on indexing, joins,
and aggregates. In this work, to adapt BerlinMOD to diferent geographic contexts, we introduced
BerlinMOD-Hanoi (see Section 5), which applies the BerlinMOD benchmark using the Hanoi road
network from OpenStreetMap data as base map. Rather than proposing new benchmark queries,
BerlinMOD-Hanoi serves as a case-study dataset that enables empirical evaluation of spatiotemporal
analytics in a non-European urban context.</p>
      <p>Taken together, existing work provides mature spatiotemporal data models and eficient storage
techniques, but there is limited empirical understanding of how such functionality can be embedded into
an in-memory, vectorized analytical DBMS. This paper positions MobilityDuck as a systems case study
that explores the engineering challenges, design tradeofs, and performance implications of integrating
MEOS-based spatiotemporal analytics into DuckDB.</p>
    </sec>
    <sec id="sec-3">
      <title>3. MobilityDuck: Architecture and Implementation</title>
      <sec id="sec-3-1">
        <title>3.1. Design Goals</title>
        <p>Our primary goal with MobilityDuck is to enable spatiotemporal analytics within DuckDB by reusing
the mature functionality of the MEOS library. The design is guided by the following principles:
• Lightweight integration: MobilityDuck is implemented as a DuckDB extension, preserving</p>
        <p>DuckDB’s embedded deployment model.
• Reuse of MEOS: Instead of reimplementing temporal types and operators, we wrap MEOS
natively in C++, ensuring correctness and consistency with MobilityDB.
• DuckDB compatibility: All types and functions are exposed as DuckDB user-defined types
(UDTs) and functions, allowing seamless integration with DuckDB’s SQL engine, storage manager,
and vectorized execution model.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. System Architecture</title>
        <p>MobilityDuck follows a simple and modular architecture that connects DuckDB with the MEOS library
through a thin C++ extension layer. At query time, DuckDB executes SQL statements as usual, while
the extension intercepts calls to spatiotemporal functions and forwards them to MEOS.</p>
        <p>Conceptually, the system has three main layers:
• DuckDB core: provides the SQL parser, planner, storage engine, and vectorized execution
framework. MobilityDuck registers its custom types and functions within this engine at load
time.
• MobilityDuck extension layer: acts as the bridge between DuckDB and MEOS. It defines
DuckDB user-defined types and functions (e.g., tint, tfloat, span) based on their
corresponding MEOS structures.
• MEOS library: provides the underlying temporal and spatial operators and data structures used
by MobilityDB.</p>
        <p>This design ensures minimal overhead while maintaining full compatibility with existing DuckDB
operations.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Data Types, Functions and Operators</title>
        <p>MobilityDuck adopts the same logical type system as MobilityDB in order to preserve the semantics of
temporal and spatiotemporal data. Internally, all MEOS types are represented using the native DuckDB
type BLOB, with explicit type aliases used to expose them as first-class spatiotemporal types at the SQL
level.</p>
        <p>In addition, MobilityDuck exposes functionality through three categories of functions, including Cast
functions, Scalar functions and Operators. Cast functions implement explicit conversions between
logical types in DuckDB. Scalar functions implement MobilityDuck operations following DuckDB’s
scalar function interface, taking one or more input values and producing a result value that can be
used in SQL expressions. Finally, operators are exposed using DuckDB’s function mechanism, allowing
familiar spatiotemporal predicates (e.g., overlap tests) to be used directly in SQL expressions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Indexing System</title>
      <p>
        To accelerate spatiotemporal range queries, MobilityDuck implements an R-tree index over
spatiotemporal bounding boxes (stbox). R-trees are specifically designed for multidimensional data and provide
eficient spatial access methods for indexing geographic and spatiotemporal information by organizing
data using topological containment relations, making them ideal for spatial queries [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In contrast,
MobilityDB (built on PostgreSQL) supports both R-tree (via GiST) and quad-tree (via SP-GiST) indexes
for spatiotemporal data, ofering alternative indexing strategies depending on the data and query types
(both types of indexes are used for benchmarking MobilityDB in Section 6).
      </p>
      <p>The indexing system integrates with DuckDB’s query optimizer to enable eficient spatial query
processing. Index scans are registered as specialized operators on stbox data. The indexing system
seamlessly integrates with MEOS spatial functions, ensuring that:
• Spatiotemporal bounding boxes are correctly extracted from temporal geometries,
• R-tree insertion and search operations use MEOS spatial predicates,
• Index maintenance operations preserve spatial integrity, and
• Query results are consistent with MobilityDB semantics.</p>
      <sec id="sec-4-1">
        <title>4.1. Index Construction</title>
        <p>Our implementation supports two distinct scenarios for index construction, each optimized for diferent
use cases in database operations.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Incremental Construction: Index-First Approach</title>
          <p>In the first scenario, an index already exists on a table, and new data is being inserted. When new
data is inserted into a table that already has an RTree index, the Append method handles incremental
updates. This method evaluates index expressions on the new data and constructs index entries using
the MEOS RTree insertion functionality. The Construct method processes data chunks and inserts
them into the RTree structure. Once the stbox is prepared, the method applies the MEOS library’s
insert function rtree_insert to handle the actual insertion into the RTree data structure.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Bulk Construction: Data-First Approach</title>
          <p>The second scenario occurs when creating an index on a table that already contains data, typically
through a CREATE INDEX statement. This situation requires a diferent strategy optimized for processing
large volumes of existing data. Our implementation follows a three-phase pipeline that leverages parallel
processing for eficiency:
• Phase 1: Data Collection As DuckDB’s execution framework scans the table in parallel across
multiple threads, each thread processes its assigned data partition through the Sink() method.
This method receives chunks of data containing stbox values and row identifiers, appending
them to thread-local storage.
• Phase 2: Data Combination The Combine() method consolidates thread-local collections into
a single global dataset through thread-safe merging operations. This consolidation is protected
by a mutex to ensure data consistency.
• Phase 3: Index Construction In this phase, the system constructs the actual index entries from
the collected data. For each chunk, the task deserializes stbox data, performs SRID normalization,
and collects valid entries into arrays. It then calls the BulkConstruct method with these arrays,
which inserts entries through rtree_insert.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Query Optimization and Index Scan Injection</title>
        <p>DuckDB’s query optimizer automatically replaces sequential scans with index scans when applicable
predicates are detected. To enable this optimization, the index registers a scan matcher for operators
between two stbox operands. When a spatial filter predicate matches the indexable pattern, the
optimizer substitutes the original table scan with an index scan operator. MobilityDuck currently
supports pattern matching for the spatial overlap operator (&amp;&amp;) between two stbox operands. During
query optimization, when the optimizer encounters a filter expression containing this operator, the
index attempts to bind the operands. If one operand is a constant stbox value, the index can perform
an eficient bounding-box search using the R-tree structure.</p>
        <p>During index scan execution, the scan normalizes the query’s spatial reference system (SRID) to
ensure geometric consistency. Then, the normalized bounding box queries the R-tree structure using
the underlying MEOS R-tree implementation, which returns identifiers of all entries whose bounding
boxes overlap with the query region. Finally, the scan operator iterates through these candidate row
IDs, retrieving and returning qualifying tuples to the query pipeline.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. BerlinMOD-Hanoi</title>
      <p>BerlinMOD is the de facto benchmark for evaluating spatiotemporal database systems. However, its
MobilityDB implementation tailors the road network and mobility model to Brussels, Belgium. To
extend its applicability, we created BerlinMOD-Hanoi, an adaptation of the benchmark to the urban
setting of Hanoi, Vietnam. This allows us to evaluate MobilityDuck on realistic mobility data from a
nonEuropean city with diferent trafic patterns, densities, and cultural mobility habits. BerlinMOD-Hanoi
datasets, SQL scripts, and visualization functions are publicly available.2</p>
      <sec id="sec-5-1">
        <title>5.1. Dataset Preparation</title>
        <p>We followed the BerlinMOD methodology but replaced Brussels’s road network with the one from
OpenStreetMap (OSM) for Hanoi:
• Extracted the Hanoi road network using osm2pgsql and osm2pgrouting, configured with</p>
        <p>BerlinMOD’s mapconfig.xml to select road types.
• Constructed a routable network topology with pgRouting.
• Applied the BerlinMOD trip generation logic, adjusted with population statistics of Hanoi’s
administrative regions using a customized SQL script hanoi_preparedata.sql.</p>
        <p>The generated trips simulate commuting activities, sampled according to home–work distributions
derived from administrative region statistics. Each trip is represented as a temporal sequence of positions
(tgeompoint) with associated time instants, fully compatible with MEOS/MobilityDB types.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Dataset Characteristics</title>
        <p>BerlinMOD-Hanoi produces scalable datasets through the scale factor (SF) parameter. We also provide
GeoJSON exports of trips and administrative regions, enabling visualization in Kepler.gl.3 Figure 1
shows an animation of the synthetic trips generated by BerlinMOD-Hanoi, while Figure 2 shows the
administrative boundaries used to sample realistic home and work locations.
2https://github.com/MobilityDB/MobilityDB-BerlinMOD-Hanoi
3https://kepler.gl/</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Experimental Evaluation</title>
      <p>This section demonstrates the utilization of MobilityDuck in trajectory manipulation using the
BerlinMOD-Hanoi dataset. Initial data exploration is conducted by integrating MobilityDuck with
DuckDB’s Python API and traditional Python libraries for visualization. Additionally, MobilityDuck
(DuckDB) is evaluated against MobilityDB (PostgreSQL) using 17 range queries provided by the
BerlinMOD benchmark.</p>
      <p>All experiments in this section were conducted on an Oracle virtual machine running Ubuntu 20.04,
4 CPUs, 24GB RAM, 20GB swap memory. The BerlinMOD-Hanoi datasets used were generated at
4 diferent scale factors (see Section 6.2.1). MobilityDuck was built with DuckDB version 1.3.2. The
exploratory steps using DuckDB’s Python API were run with Python 3.9, DuckDB Python client 1.3.2.
MobilityDB benchmarking was conducted on PostgreSQL 15.13.</p>
      <sec id="sec-6-1">
        <title>6.1. Use Case Demonstration</title>
        <p>This section presents a preliminary demonstration to showcase MobilityDuck’s capacity in manipulating
spatiotemporal data as well as its potential in integrating with external APIs and libraries.</p>
        <p>The demonstration utilizes the existing BerlinMOD-Hanoi dataset containing trips, where each row
shows the coordinates (longitude and latitude) of a specific vehicle made during a given trip at a specific
timestamp. The coordinates and timestamp of a row are used to create a tgeompoint value, which is
ideal for representing a temporal geometry (which, in this case, is a POINT). Then, the tgeompoint
values are aggregated by vehicle IDs and trip IDs to create tgeompointSeq values, which are temporal
geometries with the additional sequence subtype to represent the evolution of the geometries over a
sequence of time instants. Finally, to facilitate visualization in Python, the tgeompointSeq values are
turned into trajectories in GEOMETRY type using the trajectory() function.</p>
        <p>The GEOMETRY data type, which is prevalent in other spatial database systems such as PostgreSQL
(extended with PostGIS), is supported in DuckDB by the Spatial extension.4 As such, handling the
GEOMETRY type is beyond the scope of MobilityDuck. The latest version of MobilityDuck includes a
preliminary interface with Spatial’s GEOMETRY and WKB_BLOB types to ensure the usability of MEOS
functions originally involving geometries. When integrating with Python, the geometries can be loaded
using the Shapely library5 to return a GeoPandas6 dataframe for further processing and visualizing.</p>
        <p>Having loaded the data and conducted the aforementioned data preparation steps, a number of
operations are run and their results are captured and visualized:
1. Show the trajectories of all trips (Figure 3)
2. Show the trip(s) that cross the highest number of districts (Figure 4)
3. Show the trips that cross Hai Ba Trung district (Figure 5)
4. Show the total distance traveled per district (Figure 6)
5. Show 6 districts with the highest number of trips crossing them, and show parts of the trips that
cross the districts (Figure 7)</p>
        <p>The SQL queries and Python script for recording and visualizing results are available as a Jupyter
Notebook in the example section of MobilityDuck repository.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. BerlinMOD-Hanoi Benchmarking</title>
        <sec id="sec-6-2-1">
          <title>6.2.1. Introduction</title>
          <p>Performance of MobilityDuck (in DuckDB) and MobilityDB (in PostgreSQL) are compared using 17
range-style queries on the BerlinMOD-Hanoi dataset of 4 scale factors: SF-0.05, SF-0.1, SF-0.15, and
SF-0.2. Table 1 summarizes the scale factors.
4https://duckdb.org/docs/stable/core_extensions/spatial/overview.html
5https://shapely.readthedocs.io/en/stable/
6https://geopandas.org/en/stable/</p>
          <p>Scale factor # vehicles # trips # raw GPS points</p>
          <p>SF-0.05
SF-0.1
SF-0.15
SF-0.2
447
632
775
894
9,491
18,910
26,919
35,319
35,670,635
72,888,909
101,557,323
131,250,325</p>
          <p>While MobilityDuck includes an initial R-tree index implementation, at the time of our experiments,
the index was in its early stages of development and did not support all spatiotemporal data types
required by the benchmark workload. As a result, we ran all MobilityDuck benchmarks without index
supports. In contrast, MobilityDB was evaluated both with R-tree (GiST) and quad-tree (SP-GiST)
indexes enabled on relevant columns. The loading phase is excluded from the evaluation, and only the
elapsed times of running the queries are used for the subsequent comparisons. We introduce next a
selected number of queries. All queries are available in the benchmark section of the accompanying
GitHub repository.</p>
          <p>Query 5: What is the minimum distance between places, where a vehicle with a license from Licenses1
and a vehicle with a license from Licenses2 have been?
WITH Temp1 ( L i c e n s e 1 , T r a j s ) AS (</p>
          <p>SELECT l 1 . L i c e n s e , S T _ C o l l e c t ( l i s t ( t r a j e c t o r y ( t 1 . T r i p ) : : GEOMETRY ) )
FROM T r i p s t 1 , L i c e n s e s 1 l 1
WHERE t 1 . V e h i c l e I d = l 1 . V e h i c l e I d</p>
          <p>GROUP BY l 1 . L i c e n s e ) ,
Temp2 ( L i c e n s e 2 , T r a j s ) AS (</p>
          <p>SELECT l 2 . L i c e n s e , S T _ C o l l e c t ( l i s t ( t r a j e c t o r y ( t 2 . T r i p ) : : GEOMETRY ) )
FROM T r i p s t 2 , L i c e n s e s 2 l 2
WHERE t 2 . V e h i c l e I d = l 2 . V e h i c l e I d</p>
          <p>GROUP BY l 2 . L i c e n s e )
SELECT L i c e n s e 1 , L i c e n s e 2 , S T _ D i s t a n c e ( t 1 . T r a j s , t 2 . T r a j s ) AS</p>
          <p>M i n D i s t
FROM Temp1 t 1 , Temp2 t 2
ORDER BY L i c e n s e 1 , L i c e n s e 2 ;</p>
          <p>This query processes high volumes of trajectory values, which involves casting between WKB_BLOB
and GEOMETRY type, heavily increasing the runtimes. To optimize such bulky operations, we
implemented MobilityDuck-native equivalents of DuckDB’s spatial functions that take GEOMETRY as input,
such as ST_Collect() and ST_Distance(). The modified version of the query is shown below:
WITH Temp1 ( L i c e n s e 1 , T r a j s ) AS (</p>
          <p>SELECT l 1 . L i c e n s e ,</p>
          <p>c o l l e c t _ g s ( l i s t ( t r a j e c t o r y _ g s ( t 1 . T r i p ) ) )
FROM T r i p s t 1 , L i c e n s e s 1 l 1
WHERE t 1 . V e h i c l e I d = l 1 . V e h i c l e I d</p>
          <p>GROUP BY l 1 . L i c e n s e ) ,
Temp2 ( L i c e n s e 2 , T r a j s ) AS (</p>
          <p>SELECT l 2 . L i c e n s e ,</p>
          <p>c o l l e c t _ g s ( l i s t ( t r a j e c t o r y _ g s ( t 2 . T r i p ) ) )
FROM T r i p s t 2 , L i c e n s e s 2 l 2
WHERE t 2 . V e h i c l e I d = l 2 . V e h i c l e I d</p>
          <p>GROUP BY l 2 . L i c e n s e )
SELECT L i c e n s e 1 , L i c e n s e 2 ,</p>
          <p>d i s t a n c e _ g s ( t 1 . T r a j s , t 2 . T r a j s ) AS M i n D i s t
FROM Temp1 t 1 , Temp2 t 2
ORDER BY L i c e n s e 1 , L i c e n s e 2 ;</p>
          <p>This version uses trajectory_gs(), a version of trajectory() that returns a GSERIALIZED
object as a BLOB in DuckDB instead of the well-known binary WKB_BLOB format. collect_gs(), a
modified version of ST_Collect(), then takes an array of these geometries and aggregates them into
a collection. Finally, distance_gs() takes two geometries, still in GSERIALIZED format, and returns
the distance between them. This optimized query overcomes the drawback of the current interface with
Spatial’s GEOMETRY data type by utilizing MEOS’ PostGIS-based functions.</p>
          <p>Query 7: What are the license plate numbers of the passenger cars that have reached the points from
Points first of all passenger cars during the complete observation period?
WITH Timestamps AS (</p>
          <p>SELECT DISTINCT v . L i c e n s e , p . P o i n t I d , p . Geom ,</p>
          <p>MIN( s t a r t T i m e s t a m p ( a t V a l u e s ( t . T r i p ,</p>
          <p>p . Geom : : WKB_BLOB ) ) ) AS I n s t a n t
FROM T r i p s t , V e h i c l e s v , P o i n t s 1 p
WHERE t . V e h i c l e I d = v . V e h i c l e I d AND
v . V e h i c l e T y p e = ’ p a s s e n g e r ’ AND
t . T r i p &amp;&amp; s t b o x ( p . Geom : : WKB_BLOB ) AND</p>
          <p>S T _ I n t e r s e c t s ( t r a j e c t o r y ( t . T r i p ) : : GEOMETRY , p . Geom )</p>
          <p>GROUP BY v . L i c e n s e , p . P o i n t I d , p . Geom )
SELECT t 1 . L i c e n s e , t 1 . P o i n t I d , t 1 . Geom , t 1 . I n s t a n t
FROM Timestamps t 1
WHERE t 1 . I n s t a n t &lt;= ALL (</p>
          <p>SELECT t 2 . I n s t a n t
FROM Timestamps t 2</p>
          <p>WHERE t 1 . P o i n t I d = t 2 . P o i n t I d )
ORDER BY t 1 . P o i n t I d , t 1 . L i c e n s e ;</p>
          <p>This query first creates the common table expression (CTE) Timestamps containing the vehicle
information and the timestamps at which a passenger car reaches the points. In the SELECT statement,
the query utilizes the startTimestamp() and atValues() functions. The atValues() function
takes in a temporal value (in this case, temporal point geometry tgeompoint from Trips) and a
base value (in this case, point geometry from Points1) to return the temporal value restricted to the
second argument. Essentially, this function takes the full trip and returns only the temporal geometry
values at the points from Points1. This value is then passed to startTimestamp() which, as the
name suggests, returns the start timestamp of the temporal value, equivalent to the earliest timestamp
at which a trip reaches any point from Points1. The third join condition utilizes the overlaps (&amp;&amp;)
predicate to filter trips that overlap with the point geometry by first creating a spatiotemporal bounding
box (stbox) around the point.</p>
          <p>Query 10: When and where did the vehicles with license plate numbers from Licenses1 meet other
vehicles (distance &lt; 3 meters) and what are the latter licenses?
WITH Temp AS (</p>
          <p>SELECT l 1 . L i c e n s e AS L i c e n s e 1 , t 2 . V e h i c l e I d AS C a r 2 I d ,</p>
          <p>whenTrue ( t D w i t h i n ( t 1 . T r i p , t 2 . T r i p , 3 . 0 ) ) AS P e r i o d s
FROM T r i p s t 1 , L i c e n s e s 1 l 1 , T r i p s t2 , V e h i c l e s v
WHERE t 1 . V e h i c l e I d = l 1 . V e h i c l e I d AND
t 2 . V e h i c l e I d = v . V e h i c l e I d AND
t 1 . V e h i c l e I d &lt;&gt; t 2 . V e h i c l e I d AND
t 2 . T r i p &amp;&amp; e x p a n d S p a c e ( t 1 . t r i p : : STBOX , 3 . 0 ) )
SELECT L i c e n c e 1 , C a r 2 I d , P e r i o d s
FROM Temp
WHERE P e r i o d s I S NOT NULL ;</p>
          <p>This query first creates the CTE Temp to store the Periods when the vehicles met other vehicles
within the spatial constraint. The tDwithin() function used in the projection is one of the spatial
relationships generalized for temporal geometries. This function first computes, at each instant, whether
the distance between the temporal points (Trip values from the Trips table) is less than or equal
to 3. The function yields a tbool (temporal boolean) value representing the condition at all time
instants of the trips. The resulting tbool value is passed to whenTrue(), which returns the time when
the temporal boolean takes the value true as a tstzspanset value. This mobility type represents
sets of ranges of timestamptz values. To filter out trips that are very far from each other, the
expandSpace() function is used in the fourth join condition. The trip (from t2) is first cast into the
stbox type, representing a spatiotemporal bounding box around the whole trip. expandSpace()
expands the spatial dimension of this bounding box by 3 units. Only trips that overlap with this
expanded box, filtered using the overlaps (&amp;&amp;) predicate, are kept.</p>
        </sec>
        <sec id="sec-6-2-2">
          <title>6.2.2. Results and Discussions</title>
          <p>Figure 8 visualizes the runtimes of all 17 queries, across 4 scale factors, and for 3 scenarios: using
MobilityDuck on DuckDB (yellow bars), MobilityDB on PostgreSQL with R-tree indexes (GiST, dark
blue bars), and MobilityDB with quad-tree indexes (SP-GiST, light blue bars).</p>
          <p>MobilityDuck outperforms MobilityDB both with and without indexes across all scale factors in 13
out of 17 queries (1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 16, 17). For Query 5, MobilityDuck still manages to
achieve the best runtimes in all scale factors except SF-0.15. Similarly, for Query 15, MobilityDuck
achieves the best runtimes in all scales except SF-0.2 (where it runs approximately 3.7 seconds slower
than MobilityDB).</p>
          <p>MobilityDuck shows lower performance across all scale factors for Query 10 and Query 14. This can
be explained due to the fact that both of these queries extensively call upon table columns with indexes
when running with MobilityDB in PostgreSQL.</p>
          <p>Overall, in the majority of cases, MobilityDuck without indexes outperforms MobilityDB with
indexes. These results demonstrate the efectiveness of integrating spatiotemporal querying directly
within DuckDB’s analytical engine, rather than relying on external index-based acceleration. The
strong performance of MobilityDuck, even in the absence of specialized indexes, suggests an efective
utilization of DuckDB’s architecture to handle mobility workloads.</p>
        </sec>
        <sec id="sec-6-2-3">
          <title>6.2.3. Scalability Limits on Commodity Hardware</title>
          <p>While the system demonstrated robust performance up to SF-0.2 (corresponding to approximately
20GB of on-disk data), experiments at SF 0.3 and 0.5 (corresponding to about 303 million raw GPS
points) encountered resource exhaustion on the test configuration. Despite DuckDB’s out-of-core
capabilities (spilling to disk), the system experienced critical memory saturation (near 100% utilization
of RAM and swap), resulting in process termination. We attribute this limitation to the specific
characteristics of spatiotemporal workloads on commodity hardware. Unlike standard relational data,
spatiotemporal query processing involves complex intermediate structures such as high-cardinality
geometries or variable-length spatial objects, incurring significant memory overhead that might not
always be immediately spillable. We therefore interpret this as a practical upper bound for interactive
spatiotemporal analytics using an in-memory analytical engine on commodity hardware under the
evaluated workload. Importantly, this limitation reflects the constraints of the execution environment
rather than an inherent scalability limit of DuckDB itself. Larger scale factors are expected to be feasible
on machines with higher memory capacity or through distributed and cloud-based deployments.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Limitations and Future Work</title>
      <p>The latest implementation of MobilityDuck integrates MEOS and binds spatiotemporal types and
functions adapted from the implementation of MobilityDB. As MEOS and MobilityDB are both constantly
evolving with frequent additions of types and functionalities, the volume of such adaptation can grow
very rapidly. Future development of MobilityDuck can benefit from an automated tool for generating
bindings of all types and functions to ensure the most complete and up-to-date implementation on a
par with MEOS.</p>
      <p>In working with GEOMETRY type, we do not work with this type directly due to its specialized
implementation by the Spatial extension. In the latest MobilityDuck implementation, we use an
additional proxy layer where the functions that are supposed to return GEOMETRY type will, instead,
return either WKB_BLOB or VARCHAR types, which are standardized. The casting to and from these types
and GEOMETRY is left for the Spatial extension to handle, which is enforced by adding ::GEOMETRY,
::WKB_BLOB, etc. to the relevant values. This interface, while simple in terms of implementation,
results in unnecessary overheads when dealing with data of type GEOMETRY, as previously discussed
in Section 6.2. Future development of MobilityDuck will focus on a more refined integration with the
Spatial extension in order to support the GEOMETRY-related functions more natively and eficiently.
(a) Query runtimes at SF-0.05</p>
      <p>(b) Query runtimes at SF-0.1
(c) Query runtimes at SF-0.15
(d) Query runtimes at SF-0.2</p>
      <p>MobilityDuck currently does not support the geography type. Future work will examine the native
support for this data type, such as with the use of the Geography extension,7 in order to develop
MobilityDuck’s interface for handling this type.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>This paper introduces MobilityDuck, a DuckDB extension which integrates the mobility data
management capacity of MEOS into a lightweight, in-memory analytical database system. By embedding
spatiotemporal types and trajectory operators directly into an in-memory analytical engine,
MobilityDuck is among the first systems to bridge the gap between traditional moving object databases
and modern embedded analytics systems. It enables eficient analysis of large spatiotemporal datasets
while preserving DuckDB’s strengths in performance, simplicity, and seamless integration with data
7https://duckdb.org/community_extensions/extensions/geography.html
science environments. By using the BerlinMOD-Hanoi dataset, we demonstrated the performance of
MobilityDuck. First, we presented a use case scenario that integrates MobilityDuck with DuckDB’s
Python API and the Jupyter Notebook environment, enabling fast visualizations of query results and
showing the quick integration of MobilityDuck into the existing DuckDB ecosystem. Secondly, we
compared the runtime of MobilityDuck against MobilityDB using a set of benchmark queries of
BerlinMOD. In the majority of cases, MobilityDuck achieved better results than MobilityDB with two types
of indices, showing its potential for developing a unified, high-performance analytical framework for
spatiotemporal data.</p>
      <p>For future work, we aim to expand the spatiotemporal analytics capabilities of MobilityDuck by
adding support for the remaining types and functions of MEOS, and potentially develop an automated
tool for keeping MobilityDuck up-to-date with both MEOS and MobilityDB. Additionally, we plan to
further develop the indexing capabilities of MobilityDuck to support indexing more spatiotemporal data
types, as well as to re-evaluate its performance once indexing has been more thoroughly supported.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Artifacts</title>
      <p>All relevant code and instructions for building MobilityDuck as well as the use case demonstration and
benchmark are available on the oficial GitHub repository. 8 In addition, precompiled binary extension
packages are also provided in GitHub repository for Linux, macOS, and DuckDB-Wasm. At present,
Windows is not supported due to compatibility limitations in the underlying MEOS library. Support for
Windows will be made available once MEOS provides the necessary compatibility. All code and data
related to BerlinMOD-Hanoi are also available.9</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT in order to paraphrase and reword.
After using this tool/service, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.
8https://github.com/MobilityDB/MobilityDuck
9https://github.com/MobilityDB/MobilityDB-BerlinMOD-Hanoi
[9] M. Raasveldt, H. Mühleisen, Duckdb: An embeddable analytical database, in: Proc. of the 2019 Int.</p>
      <p>Conf. on Management of Data, ACM, 2019.
[10] M. Raasveldt, H. Mühleisen, Data management for data science-towards embedded analytics, in:</p>
      <p>Proc. of the 10th Conference on Innovative Data Systems Research, 2020.
[11] M. Gabrielsson, PostGEESE? Introducing The DuckDB Spatial Extension, 2023. https://duckdb.</p>
      <p>org/2023/04/28/spatial.html.
[12] P. Gabel, quackML: A duckdb extension implementing a full service ai/ml engine, 2025. https:
//github.com/parkerdgabel/quackML.
[13] C. Düntgen, T. Behr, R. Güting, Berlinmod: A benchmark for moving object databases, The VLDB
Journal 18 (2009) 1335–1368.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Newell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Theriault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Easterfield</surname>
          </string-name>
          ,
          <article-title>Temporal gis: Modeling the evolution of spatial data in time</article-title>
          ,
          <source>Computers &amp; Geosciences</source>
          <volume>18</volume>
          (
          <year>1992</year>
          )
          <fpage>427</fpage>
          -
          <lpage>433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gebbert</surname>
          </string-name>
          ,
          <string-name>
            <surname>E. Pebesma,</surname>
          </string-name>
          <article-title>The grass gis temporal framework</article-title>
          ,
          <source>International Journal of Geographical Information Science</source>
          <volume>31</volume>
          (
          <year>2017</year>
          )
          <fpage>1273</fpage>
          -
          <lpage>1292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pelekis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Theodoulidis</surname>
          </string-name>
          , I. Kopanakis,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Theodoridis</surname>
          </string-name>
          ,
          <article-title>Literature review of spatio-temporal database models</article-title>
          ,
          <source>The Knowledge Engineering Review</source>
          <volume>19</volume>
          (
          <year>2004</year>
          )
          <fpage>235</fpage>
          -
          <lpage>274</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Zimányi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sakr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lesuisse</surname>
          </string-name>
          ,
          <article-title>Mobilitydb: A mobility database based on postgresql and postgis</article-title>
          ,
          <source>ACM Transactions on Database Systems</source>
          (
          <year>2020</year>
          )
          <volume>19</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          :
          <fpage>42</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sakr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaisman</surname>
          </string-name>
          , E. Zimányi,
          <source>Mobility Data Science: From Data to Insights</source>
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <article-title>S4strd: A scalable in memory storage system for spatio-temporal real-time data</article-title>
          ,
          <source>in: Proc. of the 2015 IEEE Int. Conf. on Smart City/SocialCom/SustainCom</source>
          , IEEE,
          <year>2015</year>
          , pp.
          <fpage>896</fpage>
          -
          <lpage>901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , S. Sadiq,
          <string-name>
            <surname>SharkDB:</surname>
          </string-name>
          <article-title>An in-memory column-oriented trajectory storage</article-title>
          ,
          <source>in: Proc. of the 23rd ACM Int. Conf. on Information and Knowledge Management</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1409</fpage>
          -
          <lpage>1418</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Richly</surname>
          </string-name>
          ,
          <article-title>Memory-eficient storing of timestamps for spatio-temporal data management in columnar in-memory databases</article-title>
          ,
          <source>in: Proc. of the Int. Conf. on Database Systems for Advanced Applications</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>542</fpage>
          -
          <lpage>557</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>