<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.13140/RG.2.2.25449.98401</article-id>
      <title-group>
        <article-title>with Logical Temporal Knowledge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elena Umili</string-name>
          <email>umili@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriel Paludo Licks</string-name>
          <email>licks@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Patrizi</string-name>
          <email>patrizi@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIAG, Sapienza University of Rome</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>3432</volume>
      <fpage>46</fpage>
      <lpage>57</lpage>
      <abstract>
        <p>Despite significant advancements in deep learning for sequence forecasting, neural models are typically trained only on data, and the incorporation of high-level prior logical knowledge in their training is still a hard challenge. This limitation hinders the exploitation of background knowledge, such as common sense or domain-specific information, in predictive tasks performed by neural networks. In this work, we propose a principled approach to integrate prior knowledge in Linear Temporal Logic over finite traces (LTLf) into deep autoregressive models for multistep symbolic sequence generation (i.e., suffix prediction) at training time. Our method involves representing logical knowledge through continuous probabilistic relaxations and employing a differentiable schedule for sampling the next symbol from the network. We test our approach on synthetic datasets based on background knowledge in Declare, inspired by Business Process Management (BPM) applications. The results demonstrate that our method consistently improves the performance of the neural predictor, achieving lower Damerau-Levenshtein (DL) distances from target sequences and higher satisfaction rates of the logical knowledge compared to models trained solely on data.</p>
      </abstract>
      <kwd-group>
        <kwd>Suffix prediction</kwd>
        <kwd>Neurosymbolic AI</kwd>
        <kwd>Deep learning with logical knowledge</kwd>
        <kwd>Linear Temporal Logic</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>This paper addresses the problem of suffix prediction by exploiting both data and prior logical
temporal knowledge. The task of suffix prediction is particularly important in Business Process
Management (BPM) for forecasting the future of a process trace, enabling resource allocation
and the anticipation of future steps in a process based on historical data.</p>
      <p>
        Recently, there has been significant interest in employing deep learning techniques for suffix
prediction in BPM [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], involving the use of Recurrent Neural Networks (RNNs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Transformers
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and Deep Reinforcement Learning algorithms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Despite the significant advancements
in machine learning and deep learning, some studies show how deep models can surprisingly
fail to satisfy even the most basic logical constraints [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ,
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] derived from common sense or
additional domain knowledge. This occurs because these models are trained solely on data,
with the integration of logical knowledge into the training process remaining an open challenge.
As a result, while in many application fields, such as BPM, both data and logical knowledge
about the process are often available, the latter typically remains unused. This work explores a
novel approach, aimed at bridging this gap, by taking advantage of prior knowledge expressed
in Linear Temporal Logic over finite traces (LTLf) when training a deep generative model for
suffix prediction.
      </p>
      <p>
        Very few methods incorporate logical knowledge into deep sequence generation [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
Specifically, STLnet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] employs specifications in Signal Temporal Logic on continuous temporal
sequences for applications other than BPM; however, our attempts to apply it to our domains
produced poor results. On the other hand, Di Francescomarino et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] apply their method
to suffix prediction in BPM but use background knowledge only at test time, employing a
variant of beam search that also takes into account the temporal formula modeling some process
properties. Thus, the proposed method does not fully integrate background knowledge into
the training process but, rather, provides a smart way to query the pre-trained network, considering the
LTLf specification. Interestingly, though, this approach can be seamlessly combined with ours
to further maximize constraint satisfaction.
      </p>
      <p>
        Our contribution is a method to integrate the knowledge of certain process properties,
expressed in Linear Temporal Logic over finite traces (LTLf), into the training process of a sequence neural
predictor. This enables us to leverage both these sources of information –data and background
LTLf knowledge– at training time. Specifically, we achieve this by defining a differentiable
counterpart of the LTLf knowledge and leveraging a differentiable sampling process known as
the Gumbel-Softmax trick [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. By utilizing these two elements, we define a logical loss function
that can be used in conjunction with any other loss employed by the predictor. This ensures
that the network learns to generate traces similar to those seen in the training dataset and, at the same time, to
satisfy the given temporal specifications. Through preliminary experiments on
synthetic data, we show that incorporating our loss function at training time results, for RNN
models, in predicted suffixes that feature both a lower Damerau-Levenshtein (DL) distance [9]
from the target suffixes and a higher rate of satisfaction of the LTLf constraints.
      </p>
      <p>We observe that, although grounded here on a typical BPM task, our approach can be applied
to any scenario involving multi-step symbolic sequence generation through deep learning
models, thus making the main results of this work potentially of general interest to the wider
audience focusing on Machine Learning and, in particular, neurosymbolic AI.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Recently, there has been significant interest in employing deep Neural Networks (NN) in
predictive monitoring of business processes, for tasks such as next activity prediction, suffix
prediction, and attribute prediction [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Despite significant advances in the field, nearly all works
rely on training these models solely on data without utilizing any formal prior knowledge about
the process. They mainly focus on two aspects: (i) enhancing the neural model, including
RNNs [
        <xref ref-type="bibr" rid="ref7">10, 11, 7, 12</xref>
        ], Convolutional NNs (CNN) [13], Generative Adversarial Networks (GANs) [14,
12], Autoencoders [12], and Transformers [12]; and (ii) wisely choosing the sampling technique
to query the network at test time to generate the suffix, mostly using greedy search [10], random
search [11], or beam search [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and more recently, policies trained with Reinforcement Learning
(RL) [
        <xref ref-type="bibr" rid="ref4">15, 4</xref>
        ]. Among all these works, only one exploits prior process knowledge [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], expressed as
a set of LTLf formulas, but it uses this knowledge only at test time, modifying the beam search
sampling algorithm to select traces potentially compliant with the background knowledge.
      </p>
      <p>
        In this work, we take a radically different approach by introducing a principled way to
integrate background knowledge in LTLf with a deep NN model for suffix prediction at training
time. This is based on defining a logical loss that can be combined with the loss of any
autoregressive neural model and any sampling technique at test time, and that draws inspiration
from the literature on Neurosymbolic (NeSy) AI [16]. In this field, many prior works focus on
exploiting temporal logical knowledge in deep learning tasks, but none have been used for
multi-step symbolic sequence generation. T-leaf [17] creates a semantic embedding space to
represent both formulas and traces and uses it in tasks such as sequential action recognition
and imitation learning, which do not involve multi-step prediction. [18] modifies Logic Tensor
Networks (LTN) [19] to integrate LTLf background knowledge in image sequence classification
tasks rather than generative tasks. STLnet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] adopts a student-teacher training scheme where
the student network proposes a suffix based on the data, which is corrected by the teacher network
to satisfy the formula. This work uses Signal Temporal Logic (STL) formulas and is applied to
continuous trajectories rather than discrete traces. Our attempts to apply it to discrete data
and LTLf formulas translated into STL yielded poor results, as the resulting STL formulas were
extremely challenging for the framework to handle.
      </p>
      <p>Our work is the first to integrate temporal knowledge in the generation of multi-step symbolic
sequences. It is based on encoding LTLf formulas using a matrix representation that we
previously used for very different tasks –such as learning RL policies for non-Markovian tasks
[20] and inducing automata from a set of labeled traces [21]– and that we adapt here for use in the
generative task of suffix prediction.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Background and Notation</title>
      <p>Linear Temporal Logic and Deterministic Finite Automata Linear Temporal Logic (LTL)
[22] is a language which extends traditional propositional logic with modal operators. With the
latter we can specify rules that must hold through time. In this work we use LTL interpreted
over finite traces (LTLf) [23], which models finite, but length-unbounded, process executions,
and is thus adequate for finite-horizon problems. Given a finite set P of atomic propositions,
the set of LTLf formulas φ is inductively defined as follows:
φ ::= ⊤ ∣ ⊥ ∣ p ∣ ¬φ ∣ φ1 ∧ φ2 ∣ X φ ∣ φ1 U φ2, (1)
where p ∈ P. We use ⊤ and ⊥ to denote true and false respectively. X (Strong Next) and U
(Until) are temporal operators. Other temporal operators are W (Weak Next) and R (Release),
defined respectively as W φ ≡ ¬X ¬φ and φ1 R φ2 ≡ ¬(¬φ1 U ¬φ2); G (globally), with G φ ≡ ⊥ R φ, and F
(eventually), with F φ ≡ ⊤ U φ. A trace τ = [τ(1), τ(2), …, τ(n)] is a sequence of propositional
assignments to the propositions in P, where τ(t) ⊆ P is the set of all and only propositions that are true
at instant t. Additionally, |τ| = n represents the length of τ. Since every trace is finite, |τ| &lt; ∞ and
τ ∈ (2^P)*. If the propositional symbols in P are all mutually exclusive, e.g., the domain produces
one and only one symbol true at each step, we have that τ(t) ∈ P. As customary in BPM, we
make this assumption, known as the Declare assumption [24]. By τ ⊨ φ we denote that the trace
τ satisfies the LTLf formula φ. We refer the reader to [23] for a formal description of the LTLf
semantics.</p>
      <p>Any LTLf formula φ can be translated into a Deterministic Finite Automaton (DFA)
[23] A_φ = (Σ, Q, q_0, δ, F), with Σ the automaton alphabet, Q the finite set of states, q_0 ∈ Q the
initial state, δ : Q × Σ → Q the transition function, and F ⊆ Q the set of final states, such that for a
trace τ ∈ (2^P)*, we have that τ ∈ L(A_φ) iff τ ⊨ φ, where L(A_φ) denotes the language (of words)
accepted by A_φ. Depending on whether the Declare assumption holds or not, Σ can be P or 2^P,
respectively. In our case, the former holds.</p>
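      <p>As a minimal illustration of this machinery, the following plain-Python sketch encodes the DFA of a Declare-style LTLf constraint under the Declare assumption and checks trace acceptance. The formula G(a → F b) (the "response" template), the activity set {a, b, c}, and the state names are illustrative assumptions, not taken from the paper.</p>
      <preformat>
# DFA of the (assumed, illustrative) constraint G(a -> F b) over activities {a, b, c}.
DFA = {
    "initial": "q0",
    "final": {"q0"},                                   # q0: no pending obligation
    "delta": {                                         # delta[state][activity] gives the next state
        "q0": {"a": "q1", "b": "q0", "c": "q0"},
        "q1": {"a": "q1", "b": "q0", "c": "q1"},       # q1: a 'b' is still owed
    },
}

def accepts(dfa, trace):
    """True iff the finite trace (a list of activities) ends in a final state,
    i.e., the trace satisfies the encoded LTLf constraint."""
    state = dfa["initial"]
    for activity in trace:
        state = dfa["delta"][state][activity]
    return state in dfa["final"]

print(accepts(DFA, ["a", "c", "b"]))   # True: the 'a' is eventually followed by a 'b'
print(accepts(DFA, ["c", "a", "c"]))   # False: the 'a' is never followed by a 'b'
      </preformat>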
      <sec id="sec-4-1">
        <title>Deep Autoregressive Models and Suffix Prediction</title>
        <p>Deep autoregressive models are a
class of deep learning models that automatically predict the next component in a sequence by
using the previous elements in the sequence as inputs. These models can be applied to both
continuous and categorical (symbolic) data, finding applications in various generative AI tasks
such as Natural Language Processing (NLP) and Large Language Models (LLM) [25, 26], image
synthesis [27, 28], and time-series prediction [29]. They encompass deep architectures such as
RNNs and Transformers and, in general, any neural model capable of estimating the probability
p(a(t) ∣ a(1), a(2), …, a(t−1)) = p(a(t) ∣ a(&lt;t)) (2)
where τ = [a(1), a(2), …, a(n)] is a sequence of data. Its probability can be calculated as
p(τ) = ∏_{t=1}^{n} p(a(t) ∣ a(&lt;t)) (3)</p>
        <p>In suffix prediction in BPM, given a subsequence (or prefix) of past events τ_p = [a(1), a(2), …, a(k)]
that the process has produced up to the current time step k, with a(t) in a finite set of activities
A, we aim to complete the trace by generating the sequence of future events, also called suffix,
τ_s = [ã(k+1), ã(k+2), …]. Note that we denote ground-truth events observed in the domain as a
and predicted events as ã, which may in general deviate from actual future events. This task can
also be accomplished using autoregressive models, by choosing at each step the most probable
next event according to the neural network, concatenating it with the prefix, and continuing
to predict the next event in this manner until a special end-of-trace symbol is predicted or the
trace has reached a maximum number of steps T. We denote the trace obtained in this way as
τ̃ = τ_p + τ_s, with:
ã(t) = argmax_{a ∈ A} p(ã(t) = a ∣ a(1), …, a(k), ã(k+1), …, ã(t−1)), for k &lt; t ≤ T (4)
This method of generating the suffix is a greedy search strategy and generally may not produce
the most probable suffix, i.e., the trace that maximizes the probability in Equation 3. Other
non-optimal sampling strategies commonly used for this task include Beam Search, Random
Sampling, and Temperature Sampling [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].</p>
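        <p>To make the greedy strategy in Equation 4 concrete, the following is a minimal PyTorch sketch of an RNN-based next-activity predictor queried autoregressively. The architecture details (LSTM, layer sizes), the end-of-trace index, and all identifiers are illustrative assumptions, not the exact implementation used in the paper.</p>
        <preformat>
import torch
import torch.nn as nn

class NextActivityPredictor(nn.Module):
    """f (recurrent state update) and g (output head) of a next-activity predictor."""
    def __init__(self, n_activities, hidden=100):
        super().__init__()
        self.rnn = nn.LSTM(n_activities, hidden, num_layers=2, batch_first=True)  # f
        self.out = nn.Linear(hidden, n_activities)                                # g

    def forward(self, one_hot_seq, state=None):
        h, state = self.rnn(one_hot_seq, state)
        p_next = torch.softmax(self.out(h[:, -1]), dim=-1)   # probability of the next activity
        return p_next, state

@torch.no_grad()
def greedy_suffix(model, prefix, n_activities, eos_idx, max_len=40):
    """prefix: list of activity indices; returns the predicted suffix indices (Eq. 4)."""
    one_hot = torch.eye(n_activities)[prefix].unsqueeze(0)   # shape (1, k, |A|)
    p_next, state = model(one_hot)
    suffix = []
    for _ in range(max_len - len(prefix)):
        a = p_next.argmax(dim=-1).item()                     # most probable next activity
        suffix.append(a)
        if a == eos_idx:                                     # stop at the end-of-trace symbol
            break
        step = torch.eye(n_activities)[[a]].unsqueeze(0)     # feed the choice back as one-hot
        p_next, state = model(step, state)
    return suffix
        </preformat>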
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Method</title>
      <p>
        Our method assumes an autoregressive neural model to estimate the probability of the next
event given a trace of previous events (Equation 2). For explanation purposes, we consider a
simple RNN-based next-activity predictor, where:
h(t) = f(a(t), h(t−1); θ_f),    p(t) = g(h(t); θ_g) (5)
Here a(t) ∈ [0, 1]^|A| is the event of the trace at time t, encoded as a one-hot vector; h(t) ∈ ℝ^{d_h} is the
hidden state of the RNN, a continuous vector representation of the past events in the trace, having
size d_h; and f : [0, 1]^|A| × ℝ^{d_h} → ℝ^{d_h} is a parametric recursive function calculating the current
hidden state h(t) given the previous one h(t−1) and the current event a(t) as input. We denote its
learnable parameters as θ_f. Finally, g : ℝ^{d_h} → [0, 1]^|A| is the learnable function, with parameters
θ_g, mapping the hidden state to the probability vector of the next event p(t). Functions f and g
can be implemented by an RNN and a multi-layer perceptron (MLP), respectively, as shown in
Figure 1. The model parameters are usually trained with a supervised loss L_D evaluated on a
dataset of N ground-truth traces D = {τ_1, τ_2, …, τ_N} recorded by observing the process. The loss
formulation for a trace τ_i ∈ D of length n is as follows:
L_D(τ_i; θ_f, θ_g) = (1/n) ∑_{t=1}^{n} crossentropy(g(h(τ_i^{(≤t)})), a_i^{(t+1)}) (6)
This loss teaches the network to produce the next symbol in the trace so as to mimic as closely as
possible the data contained in the dataset. In our method, this loss is combined with a logic loss
L_L, which imposes the satisfaction of prior knowledge, in the following way:
L = α L_D + (1 − α) L_L (7)
where α is a constant chosen between 0 and 1 balancing the influence of each loss on the training
process.
      </p>
      <sec id="sec-5-1">
        <title>4.1. Logic Loss Calculation</title>
        <p>We assume known certain process properties that need to be enforced at training time. Such
properties, which constitute the background (or prior) knowledge we have about the process,
are expressed as an LTLf formula φ and constitute the basis for defining the logic loss L_L.</p>
        <p>Essentially, L_L estimates the probability that the suffix generated by the RNN satisfies φ and
is enforced to be as close as possible to 1. Observe that, while the supervised loss in Equation
6 takes into account only the next-step prediction, the satisfaction of φ must be checked over
the entire predicted trace, which is produced by querying the network multiple times. This
is because compliance of the trace with φ can be evaluated only on the complete trace. Indeed,
the fact that the predicted partial trace τ̃ satisfies (or violates) φ before termination does not
preclude the entire predicted trace from violating (or satisfying) φ.</p>
        <p>Additionally, notice that the next-activity predictor is trained with perfectly one-hot (or
symbolic) data and produces a (continuous) probability vector as output, which may be very
different from a one-hot vector. Therefore, we cannot directly feed the network with the
probabilities predicted in previous steps, as this may generate unexpected outputs. Instead,
we need to sample from these probabilities, in order to produce one-hot vectors again. At
the same time, this sampling process needs to guarantee differentiability, so as to ensure
backpropagation’s applicability.</p>
        <p>The calculation of the logic loss is exemplified in the following steps:
1. given a prefix τ_p, generate a suffix τ_s that is at the same time: (i) very probable according
to the network; (ii) differentiable; and (iii) at least nearly symbolic, i.e., as close as possible
to a one-hot vector;
2. calculate the probability that the predicted trace τ̃ = τ_p + τ_s satisfies φ through a
differentiable procedure;
3. enforce maximization of the obtained probability.</p>
        <p>Given the differentiability of both the knowledge evaluator and the sampling process, the
loss can be backpropagated through the suffix and influence the parameters θ_f and θ_g of the
next activity predictor.</p>
        <sec id="sec-5-1-1">
          <title>Differentiable Suffix Generation</title>
          <p>
            To sample the next activity ã(t+1) from the probability
distribution returned by the neural network p(t), while ensuring differentiability of the output
sample, we use the Gumbel-Softmax reparameterization trick [
            <xref ref-type="bibr" rid="ref8">8, 30</xref>
            ]:
ã(t+1) = softmax((log(p(t)) + γ) / λ) (8)
where γ is a random sample drawn from the Gumbel distribution, and λ is a temperature
parameter used to interpolate between discrete one-hot-encoded categorical distributions and
continuous categorical probabilities. When λ → 0, the output samples tend to be one-hot, while
they tend to the continuous probabilities in p(t) when λ = 1.
          </p>
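          <p>Below is a small PyTorch sketch of this sampling step, using the library's built-in Gumbel-Softmax. The temperature value 0.5 is the one used later in the experiments; the tensor shapes and the epsilon constant are illustrative assumptions.</p>
          <preformat>
import torch
import torch.nn.functional as F

def sample_next_activity(p_next, temperature=0.5, eps=1e-10):
    """p_next: (batch, |A|) probability vector produced by the predictor.
    Returns a nearly one-hot, differentiable sample of the next activity (Eq. 8)."""
    logits = torch.log(p_next + eps)             # Gumbel-Softmax operates on log-probabilities
    return F.gumbel_softmax(logits, tau=temperature, hard=False)

# Gradients flow back through the sample into the predictor's output:
raw = torch.randn(1, 4, requires_grad=True)
p = torch.softmax(raw, dim=-1)
a_soft = sample_next_activity(p)
a_soft[0, 0].backward()                          # populates raw.grad
          </preformat>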
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Differentiable LTLf Evaluation</title>
      <p>In order to evaluate the LTLf specification in a differentiable
way, we represent φ through a matrix representation that is generally used to encode Probabilistic
Finite Automata (PFA) [21, 31, 20]. We first translate φ into the equivalent DFA A_φ [32] and
simplify the automaton under the Declare assumption. We then represent the automaton A_φ =
(A, Q, q_0, δ, F) as an input vector v_0 ∈ [0, 1]^{1×|Q|}, a transition matrix M ∈ [0, 1]^{|A|×|Q|×|Q|}, and
a finality vector v_F ∈ [0, 1]^{|Q|×1}. The input vector v_0 takes the value 1 at index i if i is the initial
state q_0, and 0 otherwise. The matrix M takes the value 1 at index (a, q, q′) if δ(q, a) = q′, and 0
otherwise. We denote M[a] ∈ [0, 1]^{|Q|×|Q|} as the 2D transition matrix for symbol a. The output
vector v_F has a 1 at position i if i ∈ F, and 0 otherwise. Given a string σ = [σ(1), σ(2), …, σ(n)],
we denote α_q = [α_q^(0), α_q^(1), …, α_q^(n)] as the sequence of probabilities of visiting a certain state, and
α_F = [α_F^(0), α_F^(1), …, α_F^(n)] as the probabilities that the sequence is accepted. Here, α_q^(t) ∈ [0, 1]^{1×|Q|}
is a row vector containing at position i the probability of being in state i at time t, while
α_F^(t) ∈ [0, 1] is a scalar representing the probability that the sequence is accepted at time t:
α_q^(0) = v_0,    α_q^(t) = α_q^(t−1) × M[σ(t)] ∀t &gt; 0,    α_F^(t) = α_q^(t) × v_F (9)
Here, × denotes the inner product. Therefore, the probability of a string being accepted is the
probability of being in a final state in the last computed state α_q^(n). Note that σ(t) ∈ A is an integer,
while the predicted symbol sampled by the network ã(t) (Equation 8) is a probability vector,
which tends to approximate a one-hot vector but is not guaranteed to be composed only of
zeros and ones and may contain some noise. For this reason, we adapt Equation 9 to our case
by adding an expectation computation over the next automaton state:
α_q^(t) = ∑_{a=1}^{|A|} (α_q^(t−1) × M[a]) ã(t)[a] ∀t &gt; 0 (10)</p>
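      <p>A compact PyTorch sketch of Equations 9 and 10 follows. The tensor names (v0, M, v_F) mirror the matrix representation above; the shapes and the einsum formulation are our own illustrative choices, not code from the paper.</p>
      <preformat>
import torch

def acceptance_probability(soft_trace, v0, M, v_F):
    """soft_trace: (T, |A|) tensor whose rows are (nearly one-hot) symbol distributions.
    v0: (|Q|,) initial-state indicator, M: (|A|, |Q|, |Q|) transition tensor,
    v_F: (|Q|,) final-state indicator.
    Returns alpha_F at the last step: the probability that the trace is accepted."""
    alpha_q = v0                                                  # Eq. 9: alpha_q^(0) = v0
    for a_soft in soft_trace:
        # Eq. 10: expected next-state distribution over all possible symbols
        next_per_symbol = torch.einsum("q,aqr->ar", alpha_q, M)   # shape (|A|, |Q|)
        alpha_q = torch.einsum("a,ar->r", a_soft, next_per_symbol)
    return alpha_q @ v_F                                          # probability of a final state
      </preformat>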
      <p>Enforcing Compliance with the Formula In summary, given a prefix τ_p = [a(1), a(2), …, a(k)],
we calculate the suffix τ_s = [ã(k+1), …, ã(T)] up to a time step T &gt; k by iterating through the
following steps at each time step: (i) producing the RNN probability for the next symbol (Eq. 5),
(ii) differentiably sampling the next activity from this probability (Eq. 8), and (iii) concatenating
the next activity to the current trace. The trace generated in this way, τ̃ = τ_p + τ_s, is processed by
the differentiable automaton (Eq. 10), which produces an acceptance probability at the last step,
α_F^(T). We enforce this probability to stay close to 1 by minimizing the loss
L_L = − log(α_F^(T)) (11)</p>
      <p>The entire process is illustrated in Figure 1. The figure shows how the loss is backpropagated
through both the suffix prediction and knowledge evaluation processes, affecting the parameters
of the learnable functions f and g of the next activity predictor.</p>
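      <p>Putting the pieces together, the sketch below shows how the logic loss of Eq. 11 could be computed on a differentiably sampled suffix and combined with the supervised loss as in Eq. 7, reusing sample_next_activity and acceptance_probability from the previous sketches. The predictor interface and all names are illustrative assumptions rather than the authors' exact implementation.</p>
      <preformat>
import torch

def logic_loss(model, prefix_one_hot, v0, M, v_F, max_steps, eps=1e-10):
    """Differentiably generate a suffix, evaluate the soft DFA on prefix + suffix,
    and penalise a low acceptance probability: L_L = -log(alpha_F)  (Eq. 11)."""
    soft_trace = [row for row in prefix_one_hot[0]]            # prefix symbols (one-hot rows)
    p_next, state = model(prefix_one_hot)
    for _ in range(max_steps):
        a_soft = sample_next_activity(p_next)                  # Eq. 8 (differentiable sample)
        soft_trace.append(a_soft[0])
        p_next, state = model(a_soft.unsqueeze(1), state)      # feed the soft sample back
    alpha_F = acceptance_probability(torch.stack(soft_trace), v0, M, v_F)
    return -torch.log(alpha_F + eps)

def combined_loss(supervised_loss, l_logic, alpha=0.6):
    return alpha * supervised_loss + (1.0 - alpha) * l_logic   # Eq. 7
      </preformat>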
    </sec>
    <sec id="sec-7">
      <title>5. Experiments</title>
      <p>In this section, we describe the setup for our method’s empirical evaluation and discuss the
results achieved. The experiments are reproducible using our implementation at https://github.
com/whitemech/suffix-prediction-pmai2024.
5.1. Experimental Setup
We aim to evaluate our method on a diverse set of formulas with varying levels of difficulty. To
achieve this, we construct random formulas by combining Declare constraints [33] as follows:
(i) we set the maximum number of conjuncts c and the maximum number of disjuncts d;
(ii) we generate a random formula φ as the disjunction of up to d disjuncts, each of which is
the conjunction of up to c randomly selected Declare constraints, possibly negated, with each
constraint having random symbols from A; (iii) we evaluate φ on a set of random strings and
select the formula for the experiment only if the satisfaction rate is between 10% and 90%, in
order to eliminate formulas that are too obvious or too difficult. A sketch of this generation recipe is given below.</p>
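      <p>The following Python sketch illustrates the recipe. The Declare templates (written here as LTLf-style strings), the random-trace generator, and the satisfies(formula, trace) checker (for instance, the DFA evaluation sketched in Section 3) are assumed helpers for illustration, not code from the paper.</p>
      <preformat>
import random

ACTIVITIES = ["a", "b", "c"]
DECLARE_TEMPLATES = [                      # illustrative subset of Declare templates
    "G({x} -> F {y})",                     # response(x, y)
    "F {x}",                               # existence(x)
    "G({x} -> X {y})",                     # chain response(x, y)
]

def random_formula(max_conj=5, max_disj=5):
    """Disjunction of up to max_disj conjunctions of (possibly negated) random constraints."""
    disjuncts = []
    for _ in range(random.randint(1, max_disj)):
        conjuncts = []
        for _ in range(random.randint(1, max_conj)):
            x, y = random.sample(ACTIVITIES, 2)
            c = random.choice(DECLARE_TEMPLATES).format(x=x, y=y)
            if random.random() > 0.5:
                c = "!(" + c + ")"
            conjuncts.append("(" + c + ")")
        disjuncts.append(" and ".join(conjuncts))
    return " or ".join("(" + d + ")" for d in disjuncts)

def keep_formula(formula, satisfies, n_samples=500, trace_len=20):
    """Retain the formula only if 10%-90% of random traces satisfy it."""
    traces = [[random.choice(ACTIVITIES) for _ in range(trace_len)] for _ in range(n_samples)]
    rate = sum(satisfies(formula, t) for t in traces) / n_samples
    return 0.1 &lt;= rate &lt;= 0.9
      </preformat>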
      <p>For each random formula, we construct a dataset D composed of N traces of length n that are
compliant with the formula. We then split the dataset into 90% for training and 10% for testing.
We generate 25 dataset-formula pairs by setting c = d = 5, an activity set A of three activities, N = 1000, and n = 20.
These pairs are used to evaluate our framework. Additionally, each experiment’s training and
testing are performed on different prefix lengths |τ_p| ∈ {5, 10, 15} to evaluate the model’s ability
to predict shorter and longer suffixes. During testing, the suffix prediction stops when the
model outputs the end-of-trace symbol or when the predicted suffix reaches a maximum of
T = 2 × n = 40 timesteps. Each experiment is repeated 5 times with different seeds to ensure
robustness and reliability of the results.
5.2. Empirical Results
Our empirical evaluation compares the performance of four models:</p>
      <p>
        • RNN (random): An RNN trained on the training data, queried with random sampling.
The next activity is selected by randomly sampling according to the probability
distribution predicted by the RNN output layer.
• RNN+LTL (random): An RNN trained with the combined loss in Eq. 7, where L_L is
calculated using the random formula associated with the dataset, and queried with
random sampling.
• RNN (greedy): An RNN trained only with data, with suffixes generated using greedy
sampling (Eq. 4), always taking the most probable next event predicted by the output
layer. This model is equivalent to the one proposed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
• RNN+LTL (greedy): Similar to the previous model but with the addition of LTLf
background knowledge during training.
      </p>
      <p>We instantiate all models as LSTMs with two layers and 100 neurons in each layer (as in
[10]), a batch size of 64, and an Adam optimizer with a learning rate of 5 × 10−4. The models
that incorporate LTLf background knowledge use α set to 0.6 (Eq. 7), the Gumbel-Softmax temperature
set to 0.5 (Eq. 8), and T equal to 2 × n = 40 timesteps (Eq. 11).</p>
      <p>The plots in Figure 2 show the average and standard deviation of the results obtained from
running experiments on all 25 datasets. The evaluated metrics are the model’s test accuracy
on next activity prediction (Figure 2a), the satisfaction rate of test suffixes (Figure 2b), and
the Damerau-Levenshtein (DL) distance (Figure 2c) of the predicted traces against the ground
truth traces. The empirical results show that the integration of background knowledge does not
affect the accuracy of next activity prediction, indicating that the predictor retains its capability
to mimic training data when combined with logical knowledge. However, RNN+LTL models
consistently achieve the best performance in suffix prediction. In terms of the satisfaction
percentage metric, utilizing LTLf background knowledge consistently results in an improved
satisfaction rate over using only data, regardless of the sampling technique employed. Varying
prefix lengths display no significant impact on the satisfaction rate, implying that the models
are equally capable of sampling traces that satisfy the formulas for any input prefix length. The
DL distance comparison further supports the superiority of models that utilize LTLf background
knowledge, showing a lower distance between the sampled traces and the ground truth traces
for both sampling strategies used. The prefix length, however, does impact the DL distance,
with shorter prefix inputs leading to a higher distance between the predicted and ground truth
traces. In general, random sampling shows better performance compared to greedy sampling in
terms of both metrics. Additionally, the DL distance results of the random sampling methods
exhibit a more stable standard deviation than those of the greedy sampling methods. Overall,
our empirical evaluation supports the hypothesis that integrating background knowledge into
the models consistently results in improved performance.</p>
    </sec>
    <sec id="sec-8">
      <title>6. Conclusions and Future Work</title>
      <p>In conclusion, we present a novel NeSy framework for incorporating background knowledge in
LTLf into the training process of a deep autoregressive model for multistep symbolic sequence
generation (i.e., suffix prediction). Our method involves: (i) translating the LTLf formula into
a DFA, (ii) representing the DFA probabilistically through a neural framework, and (iii) softly
enforcing DFA acceptance on differentiable suffixes obtained via Gumbel-Softmax sampling.
This process allows us to define a logical loss that smoothly guides the predicted suffixes to
satisfy the given knowledge, which can be integrated with other loss functions of the predictor.
Experimental results on synthetic datasets demonstrate that applying our loss function to RNN
models consistently produces suffixes with both lower DL distances and higher rates of LTLf
knowledge satisfaction, regardless of the sampling technique used for querying the network.
In future work, we plan to apply our framework to realistic BPM datasets and more advanced
neural predictors, such as GANs and Transformers.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by PNRR MUR project PE0000013-FAIR, the Sapienza
Project MARLeN, the EU H2020 project AIPlan4EU (No. 101016442), the ERC-ADG WhiteMech
(No. 834228), and the PRIN project RIPER (No. 20203FFYLK).</p>
    </sec>
    <sec id="sec-10">
      <title>References (continued)</title>
      <p>[8] (continued) International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
[9] F. J. Damerau, A technique for computer detection and correction of spelling errors, Commun. ACM 7 (1964) 171–176. doi:10.1145/363958.363994.
[10] N. Tax, I. Verenich, M. L. Rosa, M. Dumas, Predictive business process monitoring with LSTM neural networks, in: Advanced Information Systems Engineering - 29th International Conference, CAiSE 2017, Essen, Germany, June 12-16, 2017, Proceedings, 2017, pp. 477–492. doi:10.1007/978-3-319-59536-8_30.
[11] J. Evermann, J. Rehse, P. Fettke, Predicting process behaviour using deep learning, Decis. Support Syst. 100 (2017) 129–140. doi:10.1016/j.dss.2017.04.003.
[12] I. Ketykó, F. Mannhardt, M. Hassani, B. F. van Dongen, What averages do not tell: predicting real life processes with sequential deep learning, CoRR abs/2110.10225 (2021). URL: https://arxiv.org/abs/2110.10225. arXiv:2110.10225.
[13] N. D. Mauro, A. Appice, T. M. A. Basile, Activity prediction of business process instances with inception CNN models, in: AI*IA 2019 - Advances in Artificial Intelligence - XVIIIth International Conference of the Italian Association for Artificial Intelligence, Rende, Italy, November 19-22, 2019, Proceedings, 2019, pp. 348–361. doi:10.1007/978-3-030-35166-3_25.
[14] F. Taymouri, M. L. Rosa, S. M. Erfani, A deep adversarial model for suffix and remaining time prediction of event sequences, in: Proceedings of the 2021 SIAM International Conference on Data Mining, SDM 2021, Virtual Event, April 29 - May 1, 2021, 2021, pp. 522–530. doi:10.1137/1.9781611976700.59.
[15] A. Chiorrini, C. Diamantini, A. Mircoli, D. Potena, A preliminary study on the application of reinforcement learning for predictive process monitoring, in: Process Mining Workshops - ICPM 2020 International Workshops, Padua, Italy, October 5-8, 2020, Revised Selected Papers, 2020, pp. 124–135. doi:10.1007/978-3-030-72693-5_10.
[16] T. R. Besold, A. S. d’Avila Garcez, S. Bader, H. Bowman, P. M. Domingos, P. Hitzler, K. Kühnberger, L. C. Lamb, P. M. V. Lima, L. de Penning, G. Pinkas, H. Poon, G. Zaverucha, Neural-symbolic learning and reasoning: A survey and interpretation, in: Neuro-Symbolic Artificial Intelligence: The State of the Art, 2021, pp. 1–51. doi:10.3233/FAIA210348.
[17] Y. Xie, F. Zhou, H. Soh, Embedding symbolic temporal knowledge into deep sequential models, CoRR abs/2101.11981 (2021). URL: https://arxiv.org/abs/2101.11981. arXiv:2101.11981.
[18] E. Umili, R. Capobianco, G. D. Giacomo, Grounding LTLf specifications in image sequences, in: Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning, KR 2023, Rhodes, Greece, September 2-8, 2023, 2023, pp. 668–678. doi:10.24963/kr.2023/65.
[19] S. Badreddine, A. d’Avila Garcez, L. Serafini, M. Spranger, Logic tensor networks, Artificial Intelligence 303 (2022) 103649. doi:10.1016/j.artint.2021.103649.
[20] E. Umili, F. Argenziano, A. Barbin, R. Capobianco, Visual reward machines, in: Proceedings of the 17th International Workshop on Neural-Symbolic Learning and Reasoning, La Certosa di Pontignano, Siena, Italy, July 3-5, 2023, 2023, pp. 255–267. URL: https://ceur-ws.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rama-Maneiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lama</surname>
          </string-name>
          ,
          <article-title>Deep learning for predictive business process monitoring: Review and benchmark</article-title>
          ,
          <source>IEEE Transactions on Services Computing</source>
          <volume>16</volume>
          (
          <year>2020</year>
          )
          <fpage>739</fpage>
          -
          <lpage>756</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tax</surname>
          </string-name>
          , I. Verenich,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <article-title>Predictive business process monitoring with LSTM neural networks</article-title>
          ,
          <source>in: Advanced Information Systems</source>
          Engineering - 29th International Conference, CAiSE
          <year>2017</year>
          , Essen, Germany, June 12-16,
          <year>2017</year>
          , Proceedings,
          <year>2017</year>
          , pp.
          <fpage>477</fpage>
          -
          <lpage>492</lpage>
          . URL: https://doi.org/10.1007/978-3-319-59536-8_30. doi:10.1007/978-3-319-59536-8_30.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rivera</surname>
          </string-name>
          <string-name>
            <surname>Lazo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ñanculef</surname>
          </string-name>
          <article-title>, Multi-attribute transformers for sequence prediction in business process management</article-title>
          ,
          <source>in: Discovery Science: 25th International Conference, DS</source>
          <year>2022</year>
          , Montpellier, France,
          <source>October 10-12</source>
          ,
          <year>2022</year>
          , Proceedings, Springer-Verlag, Berlin, Heidelberg,
          <year>2022</year>
          , p.
          <fpage>184</fpage>
          -
          <lpage>194</lpage>
          . doi:10.1007/978-3-031-18840-4_14.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rama-Maneiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Patrizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lama</surname>
          </string-name>
          ,
          <article-title>Towards learning the optimal sampling strategy for sufix prediction in predictive monitoring</article-title>
          ,
          <source>in: Proc. of CAISE</source>
          <year>2024</year>
          ,
          <year>2024</year>
          , p. To Appear.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Stoian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cuzzolin</surname>
          </string-name>
          , T. Lukasiewicz, ROAD-R:
          <article-title>the autonomous driving dataset with logical requirements</article-title>
          ,
          <source>Mach. Learn</source>
          .
          <volume>112</volume>
          (
          <year>2023</year>
          )
          <fpage>3261</fpage>
          -
          <lpage>3291</lpage>
          . doi:10.1007/S10994-023-06322-Z.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stankovic</surname>
          </string-name>
          , Stlnet:
          <article-title>Signal temporal logic enforced multivariate recurrent neural networks</article-title>
          , in: H.
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Balcan</surname>
          </string-name>
          , H. Lin (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2020</year>
          , pp.
          <fpage>14604</fpage>
          -
          <lpage>14614</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Francescomarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Petrucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yeshchenko</surname>
          </string-name>
          ,
          <article-title>An eye into the future: Leveraging a-priori knowledge in predictive business process monitoring</article-title>
          ,
          <source>in: Business Process Management - 15th International Conference, BPM</source>
          <year>2017</year>
          , Barcelona, Spain,
          <source>September 10-15</source>
          ,
          <year>2017</year>
          , Proceedings,
          <year>2017</year>
          , pp.
          <fpage>252</fpage>
          -
          <lpage>268</lpage>
          . doi:10.1007/978-3-319-65000-5_15.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Poole</surname>
          </string-name>
          ,
          <article-title>Categorical reparameterization with gumbel-softmax</article-title>
          ,
          <source>in: 5th</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>