<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neuro-symbolic Complex Event Recognition in Autonomous Driving</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tatiana Boura</string-name>
          <email>tatiana.boura@tugraz.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nikos Katzouris</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Complex Event Recognition, autonomous driving, neuro-symbolic AI</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Graz Center for Machine Learning</institution>
          ,
          <addr-line>Graz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Informatics and Telecommunications, NCSR 'Demokritos'</institution>
          ,
          <addr-line>Ag. Paraskevi</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Machine Learning and Neural Computation, TU Graz</institution>
          ,
          <addr-line>Graz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Complex Event Recognition (CER) systems aim to identify critical events of interest that emerge from streams of data. These complex events are typically specified as spatio-temporal compositions of simpler events (e.g., sensor readings), using symbolic temporal patterns such as finite state machines or temporal logic rules. In practice, CER systems often need to operate on sub-symbolic input. For instance, autonomous vehicles must detect complex, temporally extended events, such as overtaking maneuvers, based on raw sensor data like video streams, in order to react safely and effectively. Neuro-symbolic (NeSy) AI offers a promising framework in this context, as it combines neural networks' ability to interpret sub-symbolic data with symbolic reasoning over structured knowledge, such as CER patterns. However, the application of NeSy techniques to temporal learning and reasoning in real-world domains remains significantly underexplored. To address this gap, we propose a NeSy approach, which utilizes the NeSyA framework, for detecting overtaking events between vehicles in an autonomous driving setting. We conduct an empirical evaluation on the ROAD dataset and demonstrate that our approach outperforms purely neural baselines in terms of complex event recognition performance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Many applications require processing of continuously
streaming data from geographically dispersed sources.
Complex event recognition (CER) involves identifying events
within these streams, enabling the implementation of both
reactive and proactive actions [1]. Beyond their time
efficiency, CER systems are valued for their emphasis on
trustworthy decision-making. This is achieved through
well-defined theoretical frameworks, such as logic specifications
and automata, and machine learning methods like Inductive
Logic Programming and structure learning, which provide
symbolic pattern definitions, sound pattern learning and
efficient inference.</p>
      <p>However, in applications involving sub-symbolic input,
such as video data, there is a need to integrate these
symbolic methods with sub-symbolic models to maintain
performance. This necessity motivates the introduction of
Neuro-Symbolic Artificial Intelligence (NeSy) into the CER domain.
NeSy systems integrate neural-based learning with
logic-based reasoning, combining sub-symbolic data processing
with symbolic knowledge representation. This integration
aims to enhance interpretability, robustness, and
generalization of sub-symbolic methods, particularly improving their
capacity to handle out-of-distribution data.</p>
<p>A relevant domain for the integration of NeSy methods
and CER is autonomous driving, since –given the
mission-critical nature of this domain– event recognition must be
both efficient and reliable. In this context, vehicles must
interpret data from cameras and sensors to quickly identify
events that may require action. Many events in this domain
can be formally described using rules and enriched with
background knowledge, which can be effectively defined
and leveraged through CER methods.
</p>
      <p>© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License. CEUR Workshop Proceedings (ISSN 1613-0073).</p>
      <p>In this setting, simple event predictors can be modeled
using sub-symbolic structures, while complex event
recognition is addressed through established symbolic CER
frameworks. Several NeSy works have been proposed that handle
temporal dynamics present in data sequences [2, 3, 4, 5, 6],
but they are application-specific and do not offer a
generalized framework that learns over a formalization of
simple events. On the other hand, generalizable NeSy
frameworks such as DeepStochLog [7], DeepProbLog [8], and
NeurASP [9] are not inherently designed to model temporal
events and need to be augmented with time-aware
reasoning (e.g., timestamps, sequential neural models,
stochastic processes, etc.). One model that addresses both
limitations is NeSyA (Neuro-Symbolic Automata) [10], which
combines symbolic automata with neural-based perception
under probabilistic semantics in an end-to-end differentiable
framework. NeSyA supports temporal reasoning while
enabling the learning of common symbolic structures used in
CER.</p>
<p>This work represents an initial effort to address complex
events in autonomous driving with NeSy, and specifically
NeSyA, with the aim of achieving better results than purely
neural approaches, focusing on the recognition of overtake
incidents between agents in the ROAD dataset [11]. The
remainder of the paper is structured as follows. Section 2
presents the necessary theoretical background, focusing on
CER and its relation to symbolic automata and autonomous
driving. Section 3 outlines our neuro-symbolic approach,
explaining the integration of the sub-symbolic models and
symbolic automata for training and inference. The complete
dataset, the experimental setup, results and analysis are
provided in Section 4, for the challenging task of recognizing
the complex event where a road agent overtakes another.
Finally, Section 5 concludes the paper and outlines directions
for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. Complex Event Recognition</title>
        <p>Complex Event Recognition (CER), also known as complex
event pattern matching, refers to the detection of complex
events in streaming data by identifying temporal patterns
composed of simple events, i.e. low-level occurrences, or
even other complex events [12]. Typically, CER systems
operate on streams of event tuples [1, 13], which are
timestamped collections of attribute-value pairs. Conceptually,
CER input can be seen as a multivariate sequence, with one
sub-sequence per event attribute. For example, an attribute
might represent the output of a specific sensor, and its
values correspond to the sensor’s readings over time, whether
numerical, categorical, and/or sub-symbolic. Each event
tuple serves as an observation of the joint evolution of all
relevant attributes at a specific time point. Complex event
patterns define both a temporal structure over these event
tuples and a set of constraints on their attributes. A pattern
is matched when a sequence of event tuples satisfies both
the required temporal ordering and the attribute constraints.</p>
        <p>These patterns are typically specified by domain experts
using event specification languages [13]. Such languages
must support a core set of event-processing operators [14, 1,
15], including: (a) sequence, indicating that specific events
must occur in temporal succession; (b) iteration (Kleene
Closure), requiring one or more repeated occurrences of an
event type; (c) filtering, which restricts matches to events
satisfying predefined predicates.</p>
        <p>These operators naturally align with a computational
model based on Symbolic Finite Automata (SFAs) [16].
Unlike classical automata, which assume finite alphabets, SFAs
generalize transitions to be governed by logical predicates
over potentially infinite domains, represented using
effective Boolean algebras [17, 18]. This enables expressive and
compact representations of complex event structures. As a
result, most existing CER systems rely on SFA-based pattern
representations [19, 15, 20, 21, 22, 23, 24]. In these systems,
patterns are typically written in declarative languages (e.g.,
SQL-like syntax) and compiled into symbolic (often
non-deterministic) automata.</p>
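        <p>As an illustrative sketch of this computational model (the event-tuple fields and guard predicates below are hypothetical, not the syntax of any particular CER system), a symbolic automaton can be run over a stream of event tuples by evaluating the logical guard attached to each transition:</p>
        <preformat>
```python
# Minimal symbolic finite automaton (SFA) sketch: transitions are guarded
# by predicates over event tuples rather than by symbols from a finite alphabet.
def run_sfa(transitions, start, accepting, stream):
    """transitions: dict mapping a state to a list of (guard, next_state)."""
    state = start
    for event in stream:
        for guard, nxt in transitions.get(state, []):
            if guard(event):  # the first satisfied guard fires (deterministic run)
                state = nxt
                break
    return state in accepting

# Hypothetical pattern: a speed reading above 50 followed later by one below 10.
transitions = {
    0: [(lambda e: e["speed"] > 50, 1)],
    1: [(lambda e: e["speed"] < 10, 2)],
}
stream = [{"speed": 30}, {"speed": 60}, {"speed": 5}]
print(run_sfa(transitions, start=0, accepting={2}, stream=stream))  # True
```
        </preformat>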
      </sec>
      <sec id="sec-2-2">
        <title>2.2. CER in Autonomous Driving</title>
        <p>Existing work in the autonomous driving domain typically
describes activities as driving events, i.e., events occurring
during driving [25, 26, 27]. The connection to the CER
theory is evident: autonomous vehicles must process
numerical and/or sub-symbolic sensor data to recognize driving
events. For example, sudden braking may follow a sequence
in which a car stops at a red light, accelerates when it turns
green, and then brakes abruptly as a deer crosses the road.</p>
        <p>Framing these problems as CER tasks is motivated by the
fact that many target patterns are either known or can be
explicitly defined. When such patterns are not predefined,
learning-based methods can be used to discover patterns
compatible with CER systems. A relevant example is the
ROAD dataset [11, 28], a richly annotated autonomous
driving dataset based on the RobotCar dataset [29]. ROAD
provides frame-level annotations for agents, including their
identity (e.g., vehicle, pedestrian), action(s) (e.g., overtaking,
turning left), and semantic location(s) (e.g., left pavement,
incoming lane).</p>
        <p>From a CER perspective, certain actions, such as
‘overtake’, constitute complex events, while others, such as ‘green
traffic light’, represent states. Among these, ‘overtake’ is
particularly notable due to its temporal extent, involvement
of multiple (simple) sub-events, and significant impact on
the scene, making it a compelling CER task. However, the
‘overtake’ pattern is not predefined. Given the complexity of
scenes –multiple agents, dynamic locations, and concurrent
actions– and the lack of domain experts, manual
specification is infeasible. Section 4 details the learning approach
used to extract such patterns.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Neuro-Symbolic Approach</title>
      <p>To perform CER on the video input, we combine ideas from
(sequential) NeSy frameworks and standard CER pipelines:
neural networks process sub-symbolic input to detect
simple events (actions and semantic locations), while symbolic
automata handle pattern matching to recognize ‘overtake’
incidents. In this section, we describe in detail the NeSy
integration in our work, outlining the NeSyA framework
and its theoretical basis and connecting it to our decisions,
driven by the task at hand.</p>
      <sec id="sec-3-1">
        <title>3.1. SFAs and Markov Models</title>
        <p>In sequence modeling, it is often reasonable to assume that
recent observations are more predictive than distant ones.
This motivates the use of Markov models, where future
states depend only on a limited history, typically just the
current or previous state [30]. In these models, transitions
between states are governed by probabilities. A model is
considered non-stationary if these transition probabilities
change over time.</p>
        <p>Markov models represent sequences using a state space
and a transition function in the form of a matrix that defines
the likelihood of moving from one state to another. At each
time step, the distribution over states is updated based on the
previous distribution and the current transition probabilities.
This formulation allows for efficient modeling of temporal
dynamics in data.</p>
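        <p>The update described above amounts to a single vector-matrix product per time step. A minimal sketch with a hypothetical two-state chain:</p>
        <preformat>
```python
import numpy as np

# State-distribution update for a Markov chain: the row vector of state
# probabilities is multiplied by the transition matrix at every step.
P = np.array([[0.9, 0.1],    # P[i, j] = probability of moving from state i to j
              [0.2, 0.8]])
dist = np.array([1.0, 0.0])  # all probability mass starts in state 0
for _ in range(3):
    dist = dist @ P          # one time step
print(dist)                  # still a valid distribution: entries sum to 1
```
        </preformat>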
<p>A seemingly different approach comes from SFAs. Rather
than using probabilities, SFAs define transitions using
logical conditions over structured inputs. Specifically, inputs
are interpreted as truth assignments over a set of
propositional variables and transitions occur when the current
input satisfies a logical formula attached to an edge in the
automaton.</p>
<p>Both frameworks process sequences by transitioning
through states in response to observed inputs, whether
those inputs are numeric symbols or logical interpretations.
Moreover, when SFAs are applied to data streams (where input
patterns or variable co-occurrences can be estimated),
transitions can be interpreted probabilistically, much like in a
non-stationary Markov chain. Thus, SFAs can subsume Markov
models by encoding structured dependencies while
remaining amenable to probabilistic analysis.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Diferentiable Probabilistic Inference via SFAs</title>
        <p>Probabilistic reasoning over structured domains typically
involves modeling uncertainty using joint probability
distributions over finite sets of variables [31, 32]. While
expressive, these distributions grow exponentially with the
number of variables, rendering exact inference intractable.
A widely used approach to address this is Weighted Model
Counting (WMC), which encodes the probabilistic model
as a weighted logical theory, consisting of a propositional
formula and a function assigning weights (probabilities) to
literals [33]. The probability of a query is then computed
by summing the weights of all satisfying assignments,
generalizing the classical model counting problem.</p>
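        <p>For intuition, the WMC of a small propositional formula can be computed by brute-force enumeration (an illustrative sketch under an independence assumption on the variables; practical systems rely on knowledge compilation rather than enumeration):</p>
        <preformat>
```python
from itertools import product

def wmc(formula, weights):
    """Weighted model count: sum, over the satisfying assignments of `formula`,
    of the product of per-literal weights; `weights[v]` is the weight of v=True."""
    variables = sorted(weights)
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if formula(model):
            w = 1.0
            for v in variables:
                w *= weights[v] if model[v] else 1.0 - weights[v]
            total += w
    return total

# Query: probability that (a or b) holds, with P(a)=0.3 and P(b)=0.6.
print(wmc(lambda m: m["a"] or m["b"], {"a": 0.3, "b": 0.6}))  # 1 - 0.7*0.4 = 0.72
```
        </preformat>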
        <p>This process underlies probabilistic logical inference,
where one computes the probability that a logical formula
holds under uncertain inputs. Since WMC is a #P-complete
problem, practical inference relies on Knowledge
Compilation, which transforms formulas into tractable
representations, such as deterministic decomposable negation normal
form (d-DNNF) circuits [34]. Once compiled, inference
becomes linear in the size of the circuit and differentiable.</p>
        <p>Symbolic automata define transitions between states
using propositional formulas over input variables. When
inputs are uncertain or noisy, each transition can be evaluated
probabilistically by applying WMC to the corresponding
formula. If the automaton is constructed using compiled
circuits for each transition, the entire system becomes a
differentiable probabilistic model, enabling integration with
gradient-based learning methods.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. End-to-end training</title>
<p>In this section, we present the NeSy pipeline in both
inference and learning scenarios. Note that probabilistic
inference and learning are embedded in NeSyA,
but we do not distinguish them here, so that the pipeline reads
more coherently. We begin by outlining the inference process
–a single feed-forward pass from input to prediction.</p>
<p>A video is processed by simple event recognition
networks, which output probability distributions over simple
events, specifically the two agents’ actions and semantic
locations for every frame. These distributions are then used
to classify (ground) the agents’ discrete actions and
locations for the evaluation of the symbolic automaton. Next, a
smooth d-DNNF circuit is compiled from the ASP
representation of the automaton. The circuit includes one variable
for each possible action and location value, and supports
probabilistic queries corresponding to the automaton’s
transitions. These queries form the transition matrix by
computing weighted model counts that accumulate probabilities
in the states of the automaton.</p>
        <p>For each video, a row vector representing the
probability distribution over automaton states at each time step is
maintained. It is initialized such that the start state has
probability mass 1, with all others set to 0. As each frame
is processed, the state vector is updated by multiplying it
with the current transition matrix. Each column of the
transition matrix represents the probability of transitioning into
a particular state at a given frame. Because the transition
matrix is computed from neural network outputs, which
vary at every timestep, we consider our symbolic automata
non-stationary. The final output is the state distribution
after processing the last frame.</p>
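        <p>The forward pass just described reduces to a sequence of vector-matrix products. A sketch with a hypothetical 3-state automaton, where the made-up per-frame matrices stand in for the WMC-derived transition probabilities:</p>
        <preformat>
```python
import numpy as np

# Per-frame transition matrices (in the pipeline these are computed by WMC
# from the neural networks' outputs, hence differ per frame: non-stationary).
frames = [
    np.array([[0.7, 0.3, 0.0],
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]]),
    np.array([[0.2, 0.8, 0.0],
              [0.0, 0.1, 0.9],
              [0.0, 0.0, 1.0]]),
]
state = np.array([1.0, 0.0, 0.0])  # probability mass 1 on the start state
for T in frames:
    state = state @ T              # update with the current frame's matrix
acceptance = state[-1]             # mass on the final (accepting) state
print(acceptance)
```
        </preformat>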
        <p>We now turn to the learning procedure. After each
forward pass, the computed state probability distribution can
be used to evaluate the prediction loss over the complex
event. This loss can be defined over the entire
distribution or based solely on the acceptance probability –that
is, the probability mass assigned to the automaton’s final
(accepting) state. Since the compiled symbolic automaton
is diferentiable, the loss can be backpropagated through
the symbolic layer. This enables end-to-end training of the
simple event recognition networks via gradient descent. As
a result, the model learns to adjust its predictions of
simple events in a way that improves recognition of complex
events, which in our task is the ‘overtake’ event through
distant supervision. A visualization of the proposed pipeline
is presented in Figure 1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Sequential datasets</title>
<p>The ROAD dataset consists of 22 real-world 8-minute videos
recorded between November 2014 and December 2015 in
central Oxford, covering a range of routes and seasonal
conditions. Of these, 20 videos are currently available for
training and evaluation.</p>
        <p>Road events are defined as a series of bounding boxes
linked in time (frames), annotated with the agent’s label,
action(s), and semantic location(s) (cf. Table 1). Regarding
the autonomous vehicle (AV), we only know its unique
ego-action (Table 1). Each agent has a unique identifier per video.
The dataset includes approximately 122K annotated frames
(12 fps) at 1280 × 960 resolution with a multitude of agents
per frame.</p>
        <p>Regarding the complex ‘overtake’ actions, the dataset
contains 30 unique overtakes, performed either by the AV
or other agents. Durations range from 2 to 164 frames (mean:
49.83; std: 41.87), all occurring within 9 videos. Figure 2
illustrates an overtaking instance from the ROAD dataset.</p>
<p>To enable neuro-symbolic integration and construct a
pipeline that extracts sub-symbolic information from video
and feeds it into a symbolic reasoning module for overtake
recognition, we extract two aligned sequential datasets from
the complete dataset: one symbolic and one sub-symbolic,
in one-to-one correspondence. We differentiate between
overtakes involving the AV and those involving two external
agents. This distinction is necessary, as each type exhibits
different visual and symbolic patterns. When the AV is
involved, its position is fixed, and its visual representation
is not relevant, unlike scenarios where the AV is not part
of the overtake. The dataset consists of sequences ranging
from 6 to 10 frames (approximately 0.5 to 1 second), a
duration sufficient for humans to recognize overtakes in both
symbolic and sub-symbolic modalities.</p>
        <p>We define three classes: 0 for negative examples (no
overtake), 1 when the first agent overtakes the second, and 2
when the second agent overtakes the first. This labeling
explicitly captures the directionality of the overtake.
Positive instances were generated by selecting video segments
with a maximum length of 10 frames, using non-overlapping
chunks to prevent overfitting during NeSy training. A
sliding window approach was avoided due to the limited
number of positive examples, which would result in highly
similar instances. This process yielded 92 positive instances,
each containing an overtake event that concludes the sequence.</p>
        <p>Selecting negative instances is inherently more
challenging, as any sequence not classified as an overtake could
theoretically serve as a negative. To ensure informative
training, we focused on close negatives: sequences that
initially resemble overtakes but do not culminate in one
vehicle passing another. To construct these, we identified
the action pairs performed by agents prior to overtakes,
along with their frequency, and stochastically searched the
dataset for similar sequences that do not result in
overtakes. Only one instance per agent pair was included, and
both agents were required to appear for at least 6 frames.
This process yielded approximately 2,000 negative instances.
While downsampling negative examples could balance the
dataset, we deliberately avoided this approach. Overtake
events are inherently sparse, and artificially balancing the
dataset would introduce unrealistic conditions. Moreover,
training on simplified, artificially balanced data would lead to
poor performance, given the sub-symbolic complexity of
the task.</p>
        <p>The symbolic dataset provides a structured, logic-based
representation of events occurring within each frame. Each
instance encodes facts describing the two agents involved,
including their identity (e.g., AV, large vehicle), actions,
semantic locations, and normalized bounding box coordinates
at each timepoint (frame). The instance’s class label is also
included. This representation enables the grounding of the
complex overtake event in terms of simple events, defined
by combinations of agent actions and locations within the
symbolic framework. The sub-symbolic dataset includes the
corresponding images of the frames that constitute the symbolic
dataset.</p>
        <p>To ensure unbiased evaluation, we enforced a strict
separation between training and testing sets, preventing overlap
of augmented positives or negatives from the same video
segments. We performed an 80/20 train/test split, analogous
to k-fold cross-validation, using disjoint sets of videos for
positive samples. This resulted in up to 36 splits, allowing
testing on out-of-distribution data. While we initially
applied the same strategy to negative samples, we observed a
drawback: videos vary significantly in visual characteristics
(e.g., snow-covered vs. leafy junctions), and training solely
on one type reduces generalization. To mitigate this, we
allowed negatives from all videos but enforced a minimum
temporal distance of 100 frames between any two selected
instances, avoiding redundancy while maintaining visual
diversity.</p>
        <p>To simplify the task, we focused only on one positive
class and overtakes not involving the AV. As a result, not
all data splits remained suitable, since some lacked relevant
positives or exhibited more positives in the testing set. We
randomly selected four viable splits for training and
evaluation. Across these splits, the number of positive sequences
in the training set ranges from 46 to 75, and from 17 to
46 in the test set. The corresponding number of negative
sequences is approximately 550 for training and 250 for
testing.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Extracting Background Knowledge</title>
        <p>Since ‘overtake’ patterns were not predefined, we
employed the ASAL framework [35] to learn the patterns from
the symbolic sequential dataset. ASAL learns Answer Set
Automata, an extension of SFAs tailored for CER over
multivariate event streams, where transition predicates are
defined via ASP rules. Through declarative learning with
symbolic reasoning, it produces compact models with strong
generalization performance.</p>
        <p>We used ASAL with the objective of maximizing
generalization on the test set. We learned a general automaton
from the different symbolic splits. This led to the selection
of a subset of simple events most relevant for complex event
recognition. The selected actions were: moving away,
moving towards, stop, and other (none of the above). The selected
semantic locations were: incoming lane, vehicle lane,
junction, and other. Intuitively, this aligns with human reasoning:
recognizing an ‘overtake’ primarily requires understanding
the orientation and motion direction of the vehicle.</p>
        <p>The above process resulted in the automaton shown in
Figure 3. This learned symbolic automaton accepts multiple
patterns as valid instances of overtakes, represented by
different paths leading to the accepting state. Examples of such
paths include: f(1,1) → f(1,1) → f(1,2) → f(2,4) or
f(1,1) → f(1,4). Let us give an intuitive overtaking
pattern that is validated by the shortest accepting path f(1,4):
• AV detects two vehicles in the same lane as itself
(vehicle lane)
• Both vehicles are visible in front of the AV, meaning
they are positioned side by side without overlapping
in the AV’s field of view
• If one of these vehicles is detected as moving, while
the other is static or moving slower, the moving
vehicle is classified as overtaking the other</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Experimental Setup</title>
<p>At a higher level of abstraction, the task is framed as a
binary sequence classification problem: determining whether
a given sequence of frames constitutes an ‘overtake’.
Experiments were conducted on the four (sub-symbolic) data splits
described in Section 4.1. We trained NeSy models and
compared their performance against purely neural baselines.</p>
        <p>For simple event recognition, we employed two
architectures: a 2D-CNN for semantic location prediction and a
3D-CNN for action recognition, both with multiple
convolutional layers. The temporal modeling capability of the
3D-CNN is particularly important for recognizing motion-based
actions. Each module outputs eight predictions per frame:
probability distributions over the actions and locations of
each agent. Although the annotations are multi-label (e.g.,
an agent may simultaneously move toward the AV and
signal a left turn), the task is cast as multi-class due to the
requirement in the NeSy pipeline for probability
distributions over mutually exclusive classes. Both networks receive
the same input: a 10-frame video segment and bounding
boxes of the two agents of interest per frame.</p>
        <p>To evaluate the temporal reasoning capabilities of our
NeSy model, we compare it against a standard
spatiotemporal neural architecture: a Long Short-Term Memory
(LSTM) network [36]. In this baseline, the outputs of the
simple event recognition modules are passed to an LSTM
(hidden size 10), whose output is used to predict the final
classification probability.</p>
<p>For training, we used the Adam optimizer [37] with a
batch size of 8. Due to the differing temporal context –80
frames for the semantic location network versus 8 for the
action recognition network– we set distinct learning rates
for each. Empirically, we found that the semantic location
module required a lower learning rate, so we used 10<sup>−5</sup> for
the 3D-CNN action recognizer and halved it for the location
module.</p>
<p>All CER models were trained for a fixed 40 epochs. The
neural baseline took approximately 20 seconds per epoch,
whereas the NeSy approach took 30 seconds. Given the
scarcity of positive examples in the training set, we did not
employ a validation set. Instead, model selection was based
on training loss dynamics: we normalized losses to the [0, 1]
range using the first epoch’s loss as the maximum and 0 as
the minimum, then selected the model at the earliest epoch
where the loss plateaued, defined as a change of less than
0.05 across a window of two consecutive epochs.</p>
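        <p>The selection rule can be sketched as follows (an illustrative reading of the plateau criterion; the loss values and the helper name are made up, and the exact window semantics is an assumption):</p>
        <preformat>
```python
def select_epoch(losses, threshold=0.05, window=2):
    """Normalize losses to [0, 1] using the first epoch's loss as the maximum,
    then return the earliest epoch whose normalized loss changed by less than
    `threshold` over `window` consecutive epochs."""
    normalized = [l / losses[0] for l in losses]
    for epoch in range(window, len(normalized)):
        if abs(normalized[epoch] - normalized[epoch - window]) < threshold:
            return epoch
    return len(normalized) - 1  # no plateau found: fall back to the last epoch

losses = [2.0, 1.2, 0.8, 0.7, 0.68, 0.67]
print(select_epoch(losses))  # the normalized loss has flattened out by epoch 5
```
        </preformat>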
        <p>Since ‘overtake’ instances are sparse, comprising only
10% of the dataset, the task becomes a highly imbalanced
binary classification problem. To address this, we
evaluated two loss functions for NeSy and baseline training:
weighted binary cross-entropy (weighted BCE) and focal loss.
While weighted BCE increases the contribution of the
minority class by reweighting class loss terms, focal loss
downweights easy examples, focusing learning on harder,
misclassified ones.</p>
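        <p>For a binary prediction p, the two losses can be sketched as follows (the γ value and class weights below are illustrative hyperparameters, not the values used in the experiments):</p>
        <preformat>
```python
import math

def weighted_bce(p, y, w_pos=9.0, w_neg=1.0):
    """Binary cross-entropy with class weights: upweights the minority class."""
    w = w_pos if y == 1 else w_neg
    return -w * (y * math.log(p) + (1 - y) * math.log(1 - p))

def focal_loss(p, y, gamma=2.0):
    """Focal loss: the (1 - p_t)**gamma factor down-weights easy examples."""
    p_t = p if y == 1 else 1 - p
    return -((1 - p_t) ** gamma) * math.log(p_t)

# An easy, correctly classified negative contributes far less to the focal loss.
print(focal_loss(0.05, 0), weighted_bce(0.05, 0))
```
        </preformat>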
        <p>
          In the neural baseline, outputting a complex event
probability is straightforward. In contrast, the NeSy model
produces a state probability distribution over the automaton.
The first and last entries in this vector correspond to the start
and accepting states, respectively. We experimented with
two approaches for mapping this distribution to a
classification probability: (a) using only the acceptance probability,
and (b) comparing the full state distribution to the target
distribution (
          <xref ref-type="bibr" rid="ref1">0, 0, 0, 1</xref>
          ) using the Kolmogorov-Smirnov (KS)
distance. The KS distance provides a bounded [0, 1]
similarity score between cumulative distributions, ofering a
principled, interpretable metric to evaluate whether the final
state is reached.
% State 1 -&gt; 2: if agent 2 is moving towards the AV and not transitioning to State 4.
f(1,2) :- action_2(movtow), not f(1,4).
% State 1 -&gt; 4: if the agents are in the same lane and agent 2 is moving away from the AV.
f(1,4) :- same_lane(l1, l2), action_2(movaway).
% Stay in State 1: if not moving to State 2 or 4.
f(1,1) :- not f(1,2), not f(1,4).
% State 2 -&gt; 3: if agent 2 is moving towards the AV and not transitioning to State 4.
f(2,3) :- action_2(movtow), not f(2,4).
% State 2 -&gt; 4: if agent 1 is stopped and agent 2 is in the incoming lane.
f(2,4) :- action_1(stop), location_2(incomlane).
% Stay in State 2: if not moving to State 3 or 4.
f(2,2) :- not f(2,3), not f(2,4).
% State 3 -&gt; 1: if agent 1 is in the incoming lane.
f(3,1) :- location_1(incomlane).
% Stay in State 3: if not moving to State 1.
f(3,3) :- not f(3,1).
% Stay in State 4: always; absorbing state.
f(4,4) :- #true.
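For intuition, the transition rules above can be simulated as a crisp, deterministic step function in plain Python. This is a hedged sketch, not the NeSy implementation: simple events arrive as a plain dict, and each negation-as-failure condition is resolved by checking the higher-priority transition first.

```python
def step(state, obs):
    """One symbolic transition of the overtake automaton.

    `obs` maps simple-event predicates to observed values, e.g.
    {"action_2": "movtow", "same_lane": True}. State 4 is absorbing.
    """
    if state == 1:
        if obs.get("same_lane") and obs.get("action_2") == "movaway":
            return 4  # f(1,4)
        if obs.get("action_2") == "movtow":
            return 2  # f(1,2): fires only when f(1,4) does not
        return 1      # f(1,1)
    if state == 2:
        if obs.get("action_1") == "stop" and obs.get("location_2") == "incomlane":
            return 4  # f(2,4)
        if obs.get("action_2") == "movtow":
            return 3  # f(2,3): fires only when f(2,4) does not
        return 2      # f(2,2)
    if state == 3:
        if obs.get("location_1") == "incomlane":
            return 1  # f(3,1)
        return 3      # f(3,3)
    return 4          # f(4,4): absorbing accepting state
```

In the NeSy setting these crisp transitions are replaced by probabilistic ones, weighted by the neural simple event predictors' outputs.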
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Results and Discussion</title>
        <sec id="sec-4-4-1">
          <title>4.4.1. End-to-end NeSy</title>
          <p>Our primary objective is to evaluate complex event
recognition, i.e., the recognition of the ‘overtake’ event, across
the four sub-symbolic data splits. To ensure a fair
comparison across splits with imbalanced class distributions, we
adopt the micro-averaged F1 score as our evaluation
metric. Table 2 presents the comparative complex event
results for the NeSy model and the neural baseline under all
loss configurations.</p>
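Concretely, micro-averaged F1 pools true positives, false positives, and false negatives over all classes before computing the score, so every instance contributes equally regardless of class frequency. A stdlib-only sketch (in practice a library routine such as scikit-learn's `f1_score` with micro averaging would be used):

```python
def micro_f1(y_true, y_pred, labels):
    """Micro-averaged F1: aggregate TP/FP/FN over all labels, then score."""
    tp = fp = fn = 0
    for lab in labels:
        for t, p in zip(y_true, y_pred):
            if p == lab and t == lab:
                tp += 1
            elif p == lab and t != lab:
                fp += 1
            elif p != lab and t == lab:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)
```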
          <p>Overall, the NeSy counterpart outperforms the neural
baseline by a large margin across all configurations.
Additionally, focal loss yields better performance than weighted
BCE in both model types. However, no single acceptance
probability computation strategy consistently outperforms
the other across all loss types within the NeSy
configurations. Specifically, using the full state probability
distribution is superior when employing focal loss, whereas relying
solely on the acceptance probability yields better results
under weighted BCE.</p>
          <p>This discrepancy can be attributed to the characteristics
of each loss function. Focal loss is particularly effective at
emphasizing hard, misclassified examples, especially from
the minority class. In such cases, the richer information
provided by the full automaton state distribution enables
finer-grained adjustments that help reduce the loss more
effectively. The KS-derived score, computed from the full
distribution, provides a softer, less confident prediction
signal that is less biased and better reflects uncertainty across
states. Focal loss benefits from this nuance, as it is designed
not for probability calibration but for modulating loss based
on prediction confidence. In contrast, weighted BCE
operates as a weighted maximum likelihood estimator under
asymmetric class priors, assuming calibrated, true
probabilities as input. Consequently, it performs best when provided
with a single, well-defined probability, such as the
acceptance probability, rather than a heuristic proxy derived
from distributional similarity.</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>4.4.2. Evaluation on Simple Events</title>
<p>However, as seen in Table 2, the F1 scores on the testing set
remain relatively low. As noted in Section 4.1,
the underlying computer vision task is difficult, so low scores on
the distant-supervision task of classifying an ‘overtake’ are
expected. For the neural baseline, this outcome is further
explained by the high variability among ‘overtake’
instances, which hinders generalization. In contrast, the
reduced performance of the NeSy model points to deficiencies
in simple event recognition, since the symbolic automaton
itself generalizes well on the testing set.</p>
          <p>To investigate this hypothesis, we overfit a NeSy model
on the training set and then evaluate its simple event
predictors directly on the training data. As shown in Table 3,
although the model achieves perfect recognition of ‘overtake’
instances, it relies on what can be described as reasoning
shortcuts: it learns to exploit superficial cues in the input
to satisfy the automaton transitions without truly
understanding or modeling the intended semantics of the simple
events. Note that in preliminary experiments we also used
pre-trained simple event predictors, but the complex event
training still converged to such shortcuts.
Given the sub-optimal performance of the NeSy model, one
natural consideration is to decouple training and reasoning,
i.e., to first train the simple event predictors independently,
and then incorporate the symbolic component only at
inference time.</p>
          <p>Two approaches are possible: (a) utilizing the entire
dataset for the simple event prediction task, and (b)
utilizing only the sub-symbolic dataset splits defined for the
end-to-end NeSy task, as described in Section 4.1. The
evaluation results for the simple event predictors trained using
these two approaches are presented in Table 4. As expected,
leveraging a larger portion of the dataset for training leads
to improved performance in simple event recognition.
However, since our primary evaluation pertains to the NeSy
training and inference process, we proceed with the simple
event predictors trained on the dataset used for the
end-to-end NeSy component. This configuration serves as the
baseline for the current task definition and dataset setup.</p>
          <p>If we evaluate ‘overtake’ recognition using the pre-trained
simple event recognizers by appending the symbolic
automaton, the results show that relying solely on this sequential
setup, without end-to-end training, yields a complex event
F1 score of 0.0, indicating that end-to-end training is
essential for achieving non-trivial performance.</p>
          <p>However, while the overall complex event performance
is low, a score of exactly zero warrants further investigation.
We therefore conduct an additional experiment in Table 5,
where we evaluate complex event recognition while
selectively fixing some simple event predictions to their
ground-truth labels. This allows us to assess whether the accurate
prediction of specific simple events has a disproportionately
large influence on complex event recognition and whether
certain errors in simple event prediction are particularly
detrimental.</p>
          <p>If we provide the symbolic automaton with the
ground-truth distribution of all simple events, as expected, we
recover the automaton’s maximum F1 score on the testing
set (cf. Figure 3). When providing only the ground truth
for the agents’ actions, the ‘overtake’ recognition F1 score
increases to 0.43. In contrast, supplying only the ground
truth for the agents’ semantic locations yields a much lower
score of 0.01. Interestingly, when fixing agent 1’s action and
agent 2’s location to their true values, the F1 score rises to
0.81, very close to the automaton’s upper limit.</p>
          <p>
            This observation highlights that not all simple event
predictions contribute equally to complex event
recognition. Intuitively, one might expect that accurate semantic
location predictions would significantly improve
performance, as predicates such as same_lane, location_1, and
location_2 appear in multiple transitions within the
automaton, but that is not the case. On the contrary, examining
the learned automaton reveals that action_1 is involved
only in the transition f(2,4), where it is conjoined with
location_2(incomlane). Accurately predicting this
specific conjunction appears to be critical for achieving high
complex event recognition performance. These results
indicate that certain transitions in the symbolic automaton
are more crucial for temporal reasoning than others, and
accurate prediction of the literals involved in these key
transitions has a disproportionately large impact on overall
complex event recognition.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>In this work, we presented a Neuro-Symbolic (NeSy)
pipeline for Complex Event Recognition, focusing on the
recognition of overtake incidents between two vehicles from
video data. Our experiments demonstrate that the NeSy
model significantly outperforms its purely neural
counterpart across all configurations.</p>
      <p>We also evaluated the learned simple events as well as
a loosely coupled NeSy setting. Interestingly, our findings
show that the end-to-end NeSy model does not rely solely on
accurate simple event predictions for correct complex event
recognition; instead, it is subject to reasoning shortcuts. In
the loosely coupled setting, we observed that the importance
of specific simple events depends more on their role in key
automaton transitions rather than on their frequency within
the automaton structure.</p>
      <p>A primary direction for future work is the reformulation
and expansion of the dataset. Incorporating more data and
a broader range of complex events would address one of the
main limitations of our study, namely, the limited training
data combined with the inherent complexity of the computer
vision tasks involved.</p>
      <p>Another promising direction is the systematic study of
the relationship between symbolic automaton structure and
NeSy training dynamics. It is plausible that certain
automaton architectures are more suitable for guiding the neural
component. For instance, automata with fewer conjunctive
conditions in their transitions may make the simple event
training easier, while more complex automata could offer
smoother convergence or improved generalization.</p>
      <p>Finally, a highly relevant avenue is the joint learning of
both the neural and symbolic components. Instead of fixing
background knowledge in advance, we could provide a
flexible knowledge base and allow the system to learn both the
automaton structure and the neural network parameters
simultaneously. While this approach poses considerable
challenges, it holds the potential for creating more flexible
and powerful models that can incorporate symbolic
knowledge without introducing domain-specific biases.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the project EVENFLOW – Robust
Learning and Reasoning for Complex Event Forecasting, which
has received funding from the European Union’s Horizon Europe
research and innovation programme under grant agreement
No 101070430.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used
ChatGPT-3.5 for grammar and spelling checking. After
using this tool, the authors reviewed and edited the content
as needed and take full responsibility for the publication’s
content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Giatrakos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Alevizos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Artikis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deligiannakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Garofalakis</surname>
          </string-name>
          ,
          <article-title>Complex event recognition in the big data era: a survey</article-title>
          ,
          <source>The VLDB Journal</source>
          <volume>29</volume>
          (
          <year>2020</year>
          )
          <fpage>313</fpage>
          -
          <lpage>352</lpage>
. doi:10.1007/s00778-019-00557-w.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Vilamala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cerutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Preece</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>Neuroplex: learning to detect complex events in sensor networks through knowledge injection</article-title>
          ,
          <source>in: Proceedings of the 18th Conference on Embedded Networked Sensor Systems</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>489</fpage>
          -
          <lpage>502</lpage>
. doi:10.1145/3384419.3431158.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Apriceno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <article-title>A NeuroSymbolic Approach to Structured Event Recognition</article-title>
          ,
          <source>in: 28th International Symposium on Temporal Representation and Reasoning (TIME</source>
          <year>2021</year>
          ), volume
          <volume>206</volume>
          <source>of Leibniz International Proceedings in Informatics (LIPIcs)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
. doi:10.4230/LIPIcs.TIME.2021.11.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vilamala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , L. Garcia,
          <string-name>
            <given-names>M.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Preece</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kimmig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cerutti</surname>
          </string-name>
          ,
<article-title>Using DeepProbLog to perform complex event processing on an audio stream</article-title>
(
<year>2021</year>
). doi:10.48550/arXiv.2110.08090.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Apriceno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <article-title>A NeuroSymbolic Approach for Real-World Event Recognition from Weak Supervision</article-title>
          , in: 29th
          <source>International Symposium on Temporal Representation and Reasoning (TIME</source>
          <year>2022</year>
          ), volume
          <volume>247</volume>
          <source>of Leibniz International Proceedings in Informatics (LIPIcs)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
. doi:10.4230/LIPIcs.TIME.2022.12.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Apriceno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Erculiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passerini</surname>
          </string-name>
          ,
          <article-title>A NeuroSymbolic Approach for Non-Intrusive Load Monitoring</article-title>
          ,
          <source>in: Volume 372: ECAI</source>
          <year>2023</year>
          ,
          <source>Frontiers in Artificial Intelligence and Applications</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>3175</fpage>
          -
          <lpage>3181</lpage>
. doi:10.3233/FAIA230638.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Winters</surname>
          </string-name>
          , G. Marra,
          <string-name>
            <given-names>R.</given-names>
            <surname>Manhaeve</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Raedt</surname>
          </string-name>
          ,
          <article-title>DeepStochLog: Neural stochastic logic programming</article-title>
,
<source>AAAI Conference on Artificial Intelligence</source>
          <volume>36</volume>
          (
          <year>2022</year>
          )
          <fpage>10090</fpage>
          -
          <lpage>10100</lpage>
. doi:10.1609/aaai.v36i9.21248.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Manhaeve</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumančić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kimmig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Demeester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Raedt</surname>
          </string-name>
          ,
          <article-title>Neural probabilistic logic programming in DeepProbLog</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>298</volume>
          (
          <year>2021</year>
). doi:10.1016/j.artint.2021.103504.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ishay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>NeurASP: Embracing neural networks into answer set programming</article-title>
          ,
          <source>in: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, International Joint Conferences on Artificial Intelligence Organization</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1755</fpage>
          -
          <lpage>1762</lpage>
. doi:10.24963/ijcai.2020/243.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Manginas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Paliouras</surname>
          </string-name>
          , L. De Raedt, NeSyA: Neurosymbolic automata, in: J.
          <string-name>
            <surname>Kwok</surname>
          </string-name>
          (Ed.),
          <source>Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, International Joint Conferences on Artificial Intelligence Organization</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>5950</fpage>
          -
          <lpage>5958</lpage>
. doi:10.24963/ijcai.2025/662.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Akrigg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Di</given-names>
            <surname>Maio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Fontana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Alitappeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jeddisaravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yousefi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Culley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nicholson</surname>
          </string-name>
          , et al.,
          <article-title>ROAD: The ROad event awareness dataset for autonomous driving</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis &amp; Machine Intelligence</source>
          (
          <year>2022</year>
). doi:10.1109/TPAMI.2022.3150906.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Artikis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Sergot</surname>
          </string-name>
,
<string-name>
<given-names>G.</given-names>
<surname>Paliouras</surname>
</string-name>
,
          <article-title>An event calculus for event recognition</article-title>
          ,
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>27</volume>
          (
          <year>2015</year>
          )
          <fpage>895</fpage>
          -
          <lpage>908</lpage>
. doi:10.1109/TKDE.2014.2356476.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Riveros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ugarte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vansummeren</surname>
          </string-name>
          ,
<article-title>A formal framework for complex event recognition</article-title>
<volume>46</volume>
(
<year>2021</year>
). doi:10.1145/3485463.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Alevizos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Skarlatidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Artikis</surname>
          </string-name>
          , G. Paliouras,
<article-title>Probabilistic complex event recognition: A survey</article-title>
<volume>50</volume>
(
<year>2017</year>
). doi:10.1145/3117809.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Diao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Immerman</surname>
          </string-name>
          ,
          <article-title>On complexity and optimization of expensive queries in complex event processing</article-title>
          ,
          <source>in: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>217</fpage>
          -
          <lpage>228</lpage>
. doi:10.1145/2588555.2593671.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
<string-name>
<given-names>L.</given-names>
<surname>D'Antoni</surname>
</string-name>
,
<string-name>
<given-names>M.</given-names>
<surname>Veanes</surname>
</string-name>
,
          <article-title>The power of symbolic automata and transducers</article-title>
          , in: Computer Aided Verification, Springer International Publishing, Cham,
          <year>2017</year>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B. W.</given-names>
            <surname>Watson</surname>
          </string-name>
          ,
          <article-title>Implementing and using finite automata toolkits</article-title>
          ,
          <source>Natural Language Engineering</source>
          <volume>2</volume>
          (
          <year>1996</year>
          )
          <fpage>295</fpage>
          -
          <lpage>302</lpage>
. doi:10.1017/S135132499700154X.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
<string-name>
<given-names>G.</given-names>
<surname>van Noord</surname>
</string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gerdemann</surname>
          </string-name>
          ,
          <article-title>Finite state transducers with predicates and identities</article-title>
          ,
          <source>Grammars</source>
          <volume>4</volume>
          (
          <year>2001</year>
          )
          <fpage>263</fpage>
          -
          <lpage>286</lpage>
. doi:10.1023/A:1012291501330.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Diao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gyllstrom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Immerman</surname>
          </string-name>
          ,
<article-title>Efficient pattern matching over event streams</article-title>
          ,
          <source>in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>147</fpage>
          -
          <lpage>160</lpage>
. doi:10.1145/1376616.1376634.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gyllstrom</surname>
          </string-name>
,
<string-name>
<given-names>E.</given-names>
<surname>Wu</surname>
</string-name>
,
<string-name>
<given-names>H.-J.</given-names>
<surname>Chae</surname>
</string-name>
,
<string-name>
<given-names>Y.</given-names>
<surname>Diao</surname>
</string-name>
,
<string-name>
<given-names>P.</given-names>
<surname>Stahlberg</surname>
</string-name>
,
<string-name>
<given-names>G.</given-names>
<surname>Anderson</surname>
</string-name>
,
<article-title>SASE: Complex event processing over streams</article-title>
          ,
          <source>CoRR abs/cs/0612128</source>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>E.</given-names>
            <surname>Alevizos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Artikis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Paliouras</surname>
          </string-name>
          ,
          <article-title>Complex event forecasting with prediction suffix trees</article-title>
          ,
          <source>VLDB J</source>
          .
          <volume>31</volume>
          (
          <year>2022</year>
          )
          <fpage>157</fpage>
          -
          <lpage>180</lpage>
          . doi:10.1007/s00778-021-00698-x.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>G.</given-names>
            <surname>Cugola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Margara</surname>
          </string-name>
          ,
          <article-title>TESLA: a formally defined event specification language</article-title>
          ,
          <source>in: Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>61</lpage>
          . doi:10.1145/1827418.1827427.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Apache</surname>
          </string-name>
          , FlinkCEP, URL: https://nightlies.apache.org/flink/flink-docs-master/docs/libs/cep/, version 2.2-SNAPSHOT, accessed
          <year>July 2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bucchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Grez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Quintana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Riveros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vansummeren</surname>
          </string-name>
          ,
          <article-title>CORE: a complex event recognition engine</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>15</volume>
          (
          <year>2022</year>
          )
          <fpage>1951</fpage>
          -
          <lpage>1964</lpage>
          . doi:10.14778/3538598.3538615.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mitrović</surname>
          </string-name>
          ,
          <article-title>Reliable method for driving events recognition</article-title>
          ,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          <volume>6</volume>
          (
          <year>2005</year>
          )
          <fpage>198</fpage>
          -
          <lpage>205</lpage>
          . doi:10.1109/TITS.2005.848367.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>E.</given-names>
            <surname>Lokman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. T.</given-names>
            <surname>Goh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. T. V.</given-names>
            <surname>Yap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <article-title>Driving event recognition using machine learning and smartphones</article-title>
          ,
          <source>F1000Research</source>
          <volume>11</volume>
          (
          <year>2022</year>
          ). doi:10.12688/f1000research.73134.2.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Yazd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. T.</given-names>
            <surname>Sarteshnizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Samimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sarvi</surname>
          </string-name>
          ,
          <article-title>A robust machine learning structure for driving events recognition using smartphone motion sensors</article-title>
          ,
          <source>Journal of Intelligent Transportation Systems</source>
          <volume>28</volume>
          (
          <year>2024</year>
          )
          <fpage>54</fpage>
          -
          <lpage>68</lpage>
          . doi:10.1080/15472450.2022.2101109.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>G.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sapienza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Torr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cuzzolin</surname>
          </string-name>
          ,
          <article-title>Online real-time multiple spatiotemporal action localisation and prediction</article-title>
          ,
          <source>in: Proceedings of the IEEE International Conference on Computer Vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3637</fpage>
          -
          <lpage>3646</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>W.</given-names>
            <surname>Maddern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pascoe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Linegar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <article-title>1 year, 1000 km: The Oxford RobotCar dataset</article-title>
          ,
          <source>The International Journal of Robotics Research (IJRR)</source>
          <volume>36</volume>
          (
          <year>2017</year>
          )
          <fpage>3</fpage>
          -
          <lpage>15</lpage>
          . doi:10.1177/0278364916679498.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>D.</given-names>
            <surname>Barber</surname>
          </string-name>
          ,
          <source>Bayesian Reasoning and Machine Learning</source>
          , Cambridge University Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>L.</given-names>
            <surname>Getoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Taskar</surname>
          </string-name>
          ,
          <article-title>Introduction</article-title>
          , The MIT Press,
          <year>2007</year>
          . doi:10.7551/mitpress/7432.003.0003.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Raedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Natarajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Poole</surname>
          </string-name>
          ,
          <source>Statistical Relational Artificial Intelligence: Logic, Probability, and Computation</source>
          ,
          <year>2016</year>
          . doi:10.1007/978-3-031-01574-8.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>J.</given-names>
            <surname>Renkens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shterionov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Van den Broeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vlasselaer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fierens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Meert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Janssens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Raedt</surname>
          </string-name>
          ,
          <article-title>ProbLog2: From probabilistic programming to statistical relational learning</article-title>
          ,
          <source>in: Proceedings of the NIPS Probabilistic Programming Workshop</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Darwiche</surname>
          </string-name>
          ,
          <article-title>Tractable boolean and arithmetic circuits</article-title>
          , in: P. Hitzler, M. K. Sarker (Eds.),
          <source>NeuroSymbolic Artificial Intelligence: The State of the Art</source>
          , volume
          <volume>342</volume>
          <source>of Frontiers in Artificial Intelligence and Applications</source>
          , IOS Press,
          <year>2021</year>
          , pp.
          <fpage>146</fpage>
          -
          <lpage>172</lpage>
          . doi:10.3233/FAIA210350.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>N.</given-names>
            <surname>Katzouris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Paliouras</surname>
          </string-name>
          ,
          <article-title>Answer set automata: A learnable pattern specification framework for complex event recognition</article-title>
          ,
          <source>ECAI</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>Long Short-Term Memory</article-title>
          ,
          <source>Neural Comput.</source>
          <volume>9</volume>
          (
          <year>1997</year>
          )
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          . doi:10.1162/neco.1997.9.8.1735.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>