<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.13140/RG.2.2.25449.98401</article-id>
      <title-group>
        <article-title>with Logical Temporal Knowledge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elena Umili</string-name>
          <email>umili@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriel Paludo Licks</string-name>
          <email>licks@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Patrizi</string-name>
          <email>patrizi@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIAG, Sapienza University of Rome</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>3432</volume>
      <fpage>46</fpage>
      <lpage>57</lpage>
      <abstract>
        <p>Despite significant advancements in deep learning for sequence forecasting, neural models are typically trained only on data, and the incorporation of high-level prior logical knowledge in their training is still a hard challenge. This limitation hinders the exploitation of background knowledge, such as common sense or domain-specific information, in predictive tasks performed by neural networks. In this work, we propose a principled approach to integrate prior knowledge in Linear Temporal Logic over finite traces (LTLf) into deep autoregressive models for multistep symbolic sequence generation (i.e., suffix prediction) at training time. Our method involves representing logical knowledge through continuous probabilistic relaxations and employing a differentiable schedule for sampling the next symbol from the network. We test our approach on synthetic datasets based on background knowledge in Declare, inspired by Business Process Management (BPM) applications. The results demonstrate that our method consistently improves the performance of the neural predictor, achieving lower Damerau-Levenshtein (DL) distances from target sequences and higher satisfaction rates of the logical knowledge compared to models trained solely on data.</p>
      </abstract>
      <kwd-group>
        <kwd>Suffix prediction</kwd>
        <kwd>Neurosymbolic AI</kwd>
        <kwd>Deep learning with logical knowledge</kwd>
        <kwd>Linear Temporal Logic</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>This paper addresses the problem of suffix prediction by exploiting both data and prior logical
temporal knowledge. The task of suffix prediction is particularly important in Business Process
Management (BPM) for forecasting the future of a process trace, enabling resource allocation
and the anticipation of future steps in a process based on historical data.</p>
      <p>
        Recently, there has been significant interest in employing deep learning techniques for suffix
prediction in BPM [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], involving the use of Recurrent Neural Networks (RNNs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Transformers
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and Deep Reinforcement Learning algorithms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Despite the significant advancements
in machine learning and deep learning, some studies show how deep models can surprisingly
fail to satisfy even the most basic logical constraints [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ,
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] derived from common sense or
additional domain knowledge. This occurs because these models are trained solely on data,
with the integration of logical knowledge into the training process remaining an open challenge.
As a result, while in many application fields, such as BPM, both data and logical knowledge
about the process are often available, the latter typically remains unused. This work explores a
novel approach, aimed at bridging this gap, by taking advantage of prior knowledge expressed
in Linear Temporal Logic over finite traces (LTLf) when training a deep generative model for
suffix prediction.
      </p>
      <p>
        Very few methods incorporate logical knowledge into deep sequence generation [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
Specifically, STLnet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] employs specifications in Signal Temporal Logic on continuous temporal
sequences for applications other than BPM; however, our attempts to apply it to our domains
produced poor results. On the other hand, Di Francescomarino et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] apply their method
to suffix prediction in BPM but use background knowledge only at test time, employing a
variant of beam search that also takes into account the temporal formula modeling some process
properties. Thus, the proposed method does not fully integrate background knowledge into
the training process but, rather, provides a smart way to query the pre-trained network, considering the
LTLf specification. Interestingly, though, this approach can be seamlessly combined with ours
to further maximize constraint satisfaction.
      </p>
      <p>
        Our contribution is a method to integrate the knowledge of certain process properties,
expressed in Linear Temporal Logic over finite traces (LTLf), into the training process of a sequence neural
predictor. This enables us to leverage both these sources of information –data and background
LTLf knowledge– at training time. Specifically, we achieve this by defining a differentiable
counterpart of the LTLf knowledge and leveraging a differentiable sampling process known as
the Gumbel-Softmax trick [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. By utilizing these two elements, we define a logical loss function
that can be used in conjunction with any other loss employed by the predictor. This ensures
that the network learns to generate traces similar to those seen in the training dataset and, at the same time, to
satisfy the given temporal specifications. Through preliminary experiments on
synthetic data, we show that incorporating our loss function at training time results, for RNN
models, in predicted suffixes that feature both a lower Damerau-Levenshtein (DL) distance [9]
from the target suffixes and a higher rate of satisfaction of the LTLf constraints.
      </p>
      <p>We observe that, although grounded here on a typical BPM task, our approach can be applied
to any scenario involving multi-step symbolic sequence generation through deep learning
models, thus making the main results of this work potentially of general interest to the wider
audience focusing on Machine Learning and, in particular, neurosymbolic AI.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Recently, there has been significant interest in employing deep Neural Networks (NN) in
predictive monitoring of business processes, for tasks such as next activity prediction, suffix
prediction, and attribute prediction [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Despite significant advances in the field, nearly all works
rely on training these models solely on data without utilizing any formal prior knowledge about
the process. They mainly focus on two aspects: (i) enhancing the neural model, including
RNNs [
        <xref ref-type="bibr" rid="ref7">10, 11, 7, 12</xref>
        ], Convolutional NNs (CNN) [13], Generative Adversarial Networks (GANs) [14,
12], Autoencoders [12], and Transformers [12]; and (ii) wisely choosing the sampling technique
to query the network at test time to generate the suffix, mostly using greedy search [10], random
search [11], or beam search [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and more recently, policies trained with Reinforcement Learning
(RL) [
        <xref ref-type="bibr" rid="ref4">15, 4</xref>
        ]. Among all these works, only one exploits prior process knowledge [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], expressed as
a set of LTLf formulas, but it uses this knowledge only at test time, modifying the beam search
sampling algorithm to select traces potentially compliant with the background knowledge.
      </p>
      <p>
        In this work, we take a radically different approach by introducing a principled way to
integrate background knowledge in LTLf with a deep NN model for suffix prediction at training
time. This is based on defining a logical loss that can be combined with the loss of any
autoregressive neural model and any sampling technique at test time, and that draws inspiration
from the literature on Neurosymbolic (NeSy) AI [16]. In this field, many prior works focus on
exploiting temporal logical knowledge in deep learning tasks, but none have been used for
multi-step symbolic sequence generation. T-leaf [17] creates a semantic embedding space to
represent both formulas and traces and uses it in tasks such as sequential action recognition
and imitation learning, which do not involve multi-step prediction. [18] modifies Logic Tensor
Networks (LTN) [19] to integrate LTLf background knowledge in image sequence classification
tasks rather than generative tasks. STLnet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] adopts a student-teacher training scheme where
the student network proposes a suffix based on the data, which is corrected by the teacher network
to satisfy the formula. This work uses Signal Temporal Logic (STL) formulas and is applied to
continuous trajectories rather than discrete traces. Our attempts to apply it to discrete data
and LTLf formulas translated into STL yielded poor results, as the resulting STL formulas were
extremely challenging for the framework to handle.
      </p>
      <p>Our work is the first to integrate temporal knowledge in the generation of multi-step symbolic
sequences. It is based on encoding LTLf formulas using a matrix representation that we
previously used for very different tasks –such as learning RL policies for non-Markovian tasks
[20] and inducing automata from a set of labeled traces [21]– and that we adapt here for use in the
generative task of suffix prediction.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Background and Notation</title>
      <p>Linear Temporal Logic and Deterministic Finite Automata Linear Temporal Logic (LTL)
[22] is a language which extends traditional propositional logic with modal operators. With the
latter we can specify rules that must hold through time. In this work we use LTL interpreted
over finite traces (LTLf) [23], which models finite, but length-unbounded, process executions,
and is thus adequate for finite-horizon problems. Given a finite set P of atomic propositions,
the set of LTLf formulas φ is inductively defined as follows:
φ ::= ⊤ ∣ ⊥ ∣ p ∣ ¬φ ∣ φ1 ∧ φ2 ∣ X φ ∣ φ1 U φ2, (1)
where p ∈ P. We use ⊤ and ⊥ to denote true and false respectively. X (Strong Next) and U
(Until) are temporal operators. Other temporal operators are W (Weak Next) and R (Release),
defined respectively as W φ ≡ ¬X ¬φ and φ1 R φ2 ≡ ¬(¬φ1 U ¬φ2); G (globally), with G φ ≡ ⊥ R φ, and F
(eventually), with F φ ≡ ⊤ U φ. A trace τ = [τ(1), τ(2), …, τ(n)] is a sequence of propositional
assignments to the propositions in P, where τ(t) ⊆ P is the set of all and only propositions that are true
at instant t. Additionally, |τ| = n represents the length of τ. Since every trace is finite, |τ| &lt; ∞ and
τ ∈ (2^P)*. If the propositional symbols in P are all mutually exclusive, e.g., the domain produces
one and only one symbol true at each step, we have that τ(t) ∈ P. As customary in BPM, we
make this assumption, known as the Declare assumption [24]. By τ ⊨ φ we denote that the trace
τ satisfies the LTLf formula φ. We refer the reader to [23] for a formal description of the LTLf
semantics.</p>
      <p>Any LTLf formula φ can be translated into a Deterministic Finite Automaton (DFA)
[23] A_φ = (Σ, Q, q_0, δ, F), with Σ the automaton alphabet, Q the finite set of states, q_0 ∈ Q the
initial state, δ : Q × Σ → Q the transition function, and F ⊆ Q the set of final states, such that for a
trace τ ∈ (2^P)*, we have that τ ∈ L(A_φ) iff τ ⊨ φ, where L(A_φ) denotes the language (of words)
accepted by A_φ. Depending on whether the Declare assumption holds or not, Σ can be P or 2^P,
respectively. In our case, the former holds.</p>
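      <p>As a minimal illustration of this machinery, the following plain-Python sketch encodes the DFA of a Declare-style LTLf constraint under the Declare assumption and checks trace acceptance. The formula G(a → F b) (the "response" template), the activity set {a, b, c}, and the state names are illustrative assumptions, not taken from the paper.</p>
      <preformat>
# DFA of the (assumed, illustrative) constraint G(a -> F b) over activities {a, b, c}.
DFA = {
    "initial": "q0",
    "final": {"q0"},                                   # q0: no pending obligation
    "delta": {                                         # delta[state][activity] gives the next state
        "q0": {"a": "q1", "b": "q0", "c": "q0"},
        "q1": {"a": "q1", "b": "q0", "c": "q1"},       # q1: a 'b' is still owed
    },
}

def accepts(dfa, trace):
    """True iff the finite trace (a list of activities) ends in a final state,
    i.e., the trace satisfies the encoded LTLf constraint."""
    state = dfa["initial"]
    for activity in trace:
        state = dfa["delta"][state][activity]
    return state in dfa["final"]

print(accepts(DFA, ["a", "c", "b"]))   # True: the 'a' is eventually followed by a 'b'
print(accepts(DFA, ["c", "a", "c"]))   # False: the 'a' is never followed by a 'b'
      </preformat>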
      <sec id="sec-4-1">
        <title>Deep Autoregressive Models and Suffix Prediction</title>
        <p>Deep autoregressive models are a
class of deep learning models that automatically predict the next component in a sequence by
using the previous elements in the sequence as inputs. These models can be applied to both
continuous and categorical (symbolic) data, finding applications in various generative AI tasks
such as Natural Language Processing (NLP) and Large Language Models (LLM) [25, 26], image
synthesis [27, 28], and time-series prediction [29]. They encompass deep architectures such as
RNNs and Transformers and, in general, any neural model capable of estimating the probability
p(a(t) ∣ a(1), a(2), …, a(t−1)) = p(a(t) ∣ a(&lt;t)) (2)
where τ = [a(1), a(2), …, a(n)] is a sequence of data. Its probability can be calculated as
p(τ) = ∏_{t=1}^{n} p(a(t) ∣ a(&lt;t)) (3)</p>
        <p>In suffix prediction in BPM, given a subsequence (or prefix) of past events τ_p = [a(1), a(2), …, a(k)]
that the process has produced up to the current time step k, with a(t) in a finite set of activities
A, we aim to complete the trace by generating the sequence of future events, also called suffix,
τ_s = [ã(k+1), ã(k+2), …]. Note that we denote ground-truth events observed in the domain as a
and predicted events as ã, which may in general deviate from actual future events. This task can
also be accomplished using autoregressive models, by choosing at each step the most probable
next event according to the neural network, concatenating it with the prefix, and continuing
to predict the next event in this manner until a special end-of-trace symbol is predicted or the
trace has reached a maximum number of steps T. We denote the trace obtained in this way as
τ̃ = τ_p + τ_s, with:
ã(t) = argmax_{a ∈ A} p(ã(t) = a ∣ a(1), …, a(k), ã(k+1), …, ã(t−1)), for k &lt; t ≤ T (4)
This method of generating the suffix is a greedy search strategy and generally may not produce
the most probable suffix, i.e., the trace that maximizes the probability in Equation 3. Other
non-optimal sampling strategies commonly used for this task include Beam Search, Random
Sampling, and Temperature Sampling [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].</p>
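        <p>To make the greedy strategy in Equation 4 concrete, the following is a minimal PyTorch sketch of an RNN-based next-activity predictor queried autoregressively. The architecture details (LSTM, layer sizes), the end-of-trace index, and all identifiers are illustrative assumptions, not the exact implementation used in the paper.</p>
        <preformat>
import torch
import torch.nn as nn

class NextActivityPredictor(nn.Module):
    """f (recurrent state update) and g (output head) of a next-activity predictor."""
    def __init__(self, n_activities, hidden=100):
        super().__init__()
        self.rnn = nn.LSTM(n_activities, hidden, num_layers=2, batch_first=True)  # f
        self.out = nn.Linear(hidden, n_activities)                                # g

    def forward(self, one_hot_seq, state=None):
        h, state = self.rnn(one_hot_seq, state)
        p_next = torch.softmax(self.out(h[:, -1]), dim=-1)   # probability of the next activity
        return p_next, state

@torch.no_grad()
def greedy_suffix(model, prefix, n_activities, eos_idx, max_len=40):
    """prefix: list of activity indices; returns the predicted suffix indices (Eq. 4)."""
    one_hot = torch.eye(n_activities)[prefix].unsqueeze(0)   # shape (1, k, |A|)
    p_next, state = model(one_hot)
    suffix = []
    for _ in range(max_len - len(prefix)):
        a = p_next.argmax(dim=-1).item()                     # most probable next activity
        suffix.append(a)
        if a == eos_idx:                                     # stop at the end-of-trace symbol
            break
        step = torch.eye(n_activities)[[a]].unsqueeze(0)     # feed the choice back as one-hot
        p_next, state = model(step, state)
    return suffix
        </preformat>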
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Method</title>
      <p>
        Our method assumes an autoregressive neural model to estimate the probability of the next
event given a trace of previous events (Equation 2). For explanation purposes, we consider a
simple RNN-based next-activity predictor, where:
h(t) = f(a(t), h(t−1); θ_f),    p(t) = g(h(t); θ_g) (5)
Here a(t) ∈ [0, 1]^|A| is the event of the trace at time t, encoded as a one-hot vector; h(t) ∈ ℝ^{d_h} is the
hidden state of the RNN, a continuous vector representation of the past events in the trace, having
size d_h; and f : [0, 1]^|A| × ℝ^{d_h} → ℝ^{d_h} is a parametric recursive function calculating the current
hidden state h(t) given the previous one h(t−1) and the current event a(t) as input. We denote its
learnable parameters as θ_f. Finally, g : ℝ^{d_h} → [0, 1]^|A| is the learnable function, with parameters
θ_g, mapping the hidden state to the probability vector of the next event p(t). Functions f and g
can be implemented by an RNN and a multi-layer perceptron (MLP), respectively, as shown in
Figure 1. The model parameters are usually trained with a supervised loss L_D evaluated on a
dataset of N ground-truth traces D = {τ_1, τ_2, …, τ_N} recorded by observing the process. The loss
formulation for a trace τ_i ∈ D of length n is as follows:
L_D(τ_i; θ_f, θ_g) = (1/n) ∑_{t=1}^{n} crossentropy(g(h(τ_i^{(≤t)})), a_i^{(t+1)}) (6)
This loss teaches the network to produce the next symbol in the trace so as to mimic as closely as
possible the data contained in the dataset. In our method, this loss is combined with a logic loss
L_L, which imposes the satisfaction of prior knowledge, in the following way:
L = α L_D + (1 − α) L_L (7)
where α is a constant chosen between 0 and 1 balancing the influence of each loss on the training
process.
      </p>
      <sec id="sec-5-1">
        <title>4.1. Logic Loss Calculation</title>
        <p>We assume known certain process properties that need to be enforced at training time. Such
properties, which constitute the background (or prior) knowledge we have about the process,
are expressed as an LTLf formula φ and constitute the basis for defining the logic loss L_L.</p>
        <p>Essentially, L_L estimates the probability that the suffix generated by the RNN satisfies φ and
is enforced to be as close as possible to 1. Observe that, while the supervised loss in Equation
6 takes into account only the next-step prediction, the satisfaction of φ must be checked over
the entire predicted trace, which is produced by querying the network multiple times. This
is because compliance of the trace with φ can be evaluated only on the complete trace. Indeed,
the fact that the predicted partial trace τ̃ satisfies (or violates) φ before termination does not
preclude the entire predicted trace from violating (or satisfying) φ.</p>
        <p>Additionally, notice that the next-activity predictor is trained with perfectly one-hot (or
symbolic) data and produces a (continuous) probability vector as output, which may be very
different from a one-hot vector. Therefore, we cannot directly feed the network with the
probabilities predicted in previous steps, as this may generate unexpected outputs. Instead,
we need to sample from these probabilities, in order to produce one-hot vectors again. At
the same time, this sampling process needs to guarantee differentiability, so as to ensure
backpropagation’s applicability.</p>
        <p>The calculation of the logic loss is exemplified in the following steps:
1. given a prefix τ_p, generate a suffix τ_s that is at the same time: (i) very probable according
to the network; (ii) differentiable; and (iii) at least nearly symbolic, i.e., as close as possible
to a one-hot vector;
2. calculate the probability that the predicted trace τ̃ = τ_p + τ_s satisfies φ through a
differentiable procedure;
3. enforce maximization of the obtained probability.</p>
        <p>Given the differentiability of both the knowledge evaluator and the sampling process, the
loss can be backpropagated through the suffix and influence the parameters θ_f and θ_g of the
next activity predictor.</p>
        <sec id="sec-5-1-1">
          <title>Differentiable Suffix Generation</title>
          <p>
            To sample the next activity ã(t+1) from the probability
distribution returned by the neural network p(t), while ensuring differentiability of the output
sample, we use the Gumbel-Softmax reparameterization trick [
            <xref ref-type="bibr" rid="ref8">8, 30</xref>
            ]:
ã(t+1) = softmax((log(p(t)) + γ) / λ) (8)
where γ is a random sample drawn from the Gumbel distribution, and λ is a temperature
parameter used to interpolate between discrete one-hot-encoded categorical distributions and
continuous categorical probabilities. When λ → 0, the output samples tend to be one-hot, while
they tend to the continuous probabilities in p(t) when λ = 1.
          </p>
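          <p>Below is a small PyTorch sketch of this sampling step, using the library's built-in Gumbel-Softmax. The temperature value 0.5 is the one used later in the experiments; the tensor shapes and the epsilon constant are illustrative assumptions.</p>
          <preformat>
import torch
import torch.nn.functional as F

def sample_next_activity(p_next, temperature=0.5, eps=1e-10):
    """p_next: (batch, |A|) probability vector produced by the predictor.
    Returns a nearly one-hot, differentiable sample of the next activity (Eq. 8)."""
    logits = torch.log(p_next + eps)             # Gumbel-Softmax operates on log-probabilities
    return F.gumbel_softmax(logits, tau=temperature, hard=False)

# Gradients flow back through the sample into the predictor's output:
raw = torch.randn(1, 4, requires_grad=True)
p = torch.softmax(raw, dim=-1)
a_soft = sample_next_activity(p)
a_soft[0, 0].backward()                          # populates raw.grad
          </preformat>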
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Differentiable LTLf Evaluation</title>
      <p>In order to evaluate the LTLf specification in a differentiable
way, we represent φ through a matrix representation that is generally used to encode Probabilistic
Finite Automata (PFA) [21, 31, 20]. We first translate φ into the equivalent DFA A_φ [32] and
simplify the automaton under the Declare assumption. We then represent the automaton A_φ =
(A, Q, q_0, δ, F) as an input vector v_0 ∈ [0, 1]^{1×|Q|}, a transition matrix M ∈ [0, 1]^{|A|×|Q|×|Q|}, and
a finality vector v_F ∈ [0, 1]^{|Q|×1}. The input vector v_0 takes the value 1 at index i if i is the initial
state q_0, and 0 otherwise. The matrix M takes the value 1 at index (a, q, q′) if δ(q, a) = q′, and 0
otherwise. We denote M[a] ∈ [0, 1]^{|Q|×|Q|} as the 2D transition matrix for symbol a. The output
vector v_F has a 1 at position i if i ∈ F, and 0 otherwise. Given a string σ = [σ(1), σ(2), …, σ(n)],
we denote α_q = [α_q^(0), α_q^(1), …, α_q^(n)] as the sequence of probabilities of visiting a certain state, and
α_F = [α_F^(0), α_F^(1), …, α_F^(n)] as the probabilities that the sequence is accepted. Here, α_q^(t) ∈ [0, 1]^{1×|Q|}
is a row vector containing at position i the probability of being in state i at time t, while
α_F^(t) ∈ [0, 1] is a scalar representing the probability that the sequence is accepted at time t:
α_q^(0) = v_0,    α_q^(t) = α_q^(t−1) × M[σ(t)] ∀t &gt; 0,    α_F^(t) = α_q^(t) × v_F (9)
Here, × denotes the inner product. Therefore, the probability of a string being accepted is the
probability of being in a final state in the last computed state α_q^(n). Note that σ(t) ∈ A is an integer,
while the predicted symbol sampled by the network ã(t) (Equation 8) is a probability vector,
which tends to approximate a one-hot vector but is not guaranteed to be composed only of
zeros and ones and may contain some noise. For this reason, we adapt Equation 9 to our case
by adding an expectation computation over the next automaton state:
α_q^(t) = ∑_{a=1}^{|A|} (α_q^(t−1) × M[a]) ã(t)[a] ∀t &gt; 0 (10)</p>
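      <p>A compact PyTorch sketch of Equations 9 and 10 follows. The tensor names (v0, M, v_F) mirror the matrix representation above; the shapes and the einsum formulation are our own illustrative choices, not code from the paper.</p>
      <preformat>
import torch

def acceptance_probability(soft_trace, v0, M, v_F):
    """soft_trace: (T, |A|) tensor whose rows are (nearly one-hot) symbol distributions.
    v0: (|Q|,) initial-state indicator, M: (|A|, |Q|, |Q|) transition tensor,
    v_F: (|Q|,) final-state indicator.
    Returns alpha_F at the last step: the probability that the trace is accepted."""
    alpha_q = v0                                                  # Eq. 9: alpha_q^(0) = v0
    for a_soft in soft_trace:
        # Eq. 10: expected next-state distribution over all possible symbols
        next_per_symbol = torch.einsum("q,aqr->ar", alpha_q, M)   # shape (|A|, |Q|)
        alpha_q = torch.einsum("a,ar->r", a_soft, next_per_symbol)
    return alpha_q @ v_F                                          # probability of a final state
      </preformat>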
      <p>Enforcing Compliance with the Formula In summary, given a prefix τ_p = [a(1), a(2), …, a(k)],
we calculate the suffix τ_s = [ã(k+1), …, ã(T)] up to a time step T &gt; k by iterating through the
following steps at each time step: (i) producing the RNN probability for the next symbol (Eq. 5),
(ii) differentiably sampling the next activity from this probability (Eq. 8), and (iii) concatenating
the next activity to the current trace. The trace generated in this way, τ̃ = τ_p + τ_s, is processed by
the differentiable automaton (Eq. 10), which produces an acceptance probability at the last step,
α_F^(T). We enforce this probability to stay close to 1 by minimizing the loss
L_L = − log(α_F^(T)) (11)</p>
      <p>The entire process is illustrated in Figure 1. The figure shows how the loss is backpropagated
through both the suffix prediction and knowledge evaluation processes, affecting the parameters
of the learnable functions f and g of the next activity predictor.</p>
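      <p>Putting the pieces together, the sketch below shows how the logic loss of Eq. 11 could be computed on a differentiably sampled suffix and combined with the supervised loss as in Eq. 7, reusing sample_next_activity and acceptance_probability from the previous sketches. The predictor interface and all names are illustrative assumptions rather than the authors' exact implementation.</p>
      <preformat>
import torch

def logic_loss(model, prefix_one_hot, v0, M, v_F, max_steps, eps=1e-10):
    """Differentiably generate a suffix, evaluate the soft DFA on prefix + suffix,
    and penalise a low acceptance probability: L_L = -log(alpha_F)  (Eq. 11)."""
    soft_trace = [row for row in prefix_one_hot[0]]            # prefix symbols (one-hot rows)
    p_next, state = model(prefix_one_hot)
    for _ in range(max_steps):
        a_soft = sample_next_activity(p_next)                  # Eq. 8 (differentiable sample)
        soft_trace.append(a_soft[0])
        p_next, state = model(a_soft.unsqueeze(1), state)      # feed the soft sample back
    alpha_F = acceptance_probability(torch.stack(soft_trace), v0, M, v_F)
    return -torch.log(alpha_F + eps)

def combined_loss(supervised_loss, l_logic, alpha=0.6):
    return alpha * supervised_loss + (1.0 - alpha) * l_logic   # Eq. 7
      </preformat>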
    </sec>
    <sec id="sec-7">
      <title>5. Experiments</title>
      <p>In this section, we describe the setup for our method’s empirical evaluation and discuss the
results achieved. The experiments are reproducible using our implementation at https://github.
com/whitemech/suffix-prediction-pmai2024.
5.1. Experimental Setup
We aim to evaluate our method on a diverse set of formulas with varying levels of difficulty. To
achieve this, we construct random formulas by combining Declare constraints [33] as follows:
(i) we set the maximum number of conjuncts c and the maximum number of disjuncts d;
(ii) we generate a random formula φ as the disjunction of up to d disjuncts, each of which is
the conjunction of up to c randomly selected Declare constraints, possibly negated, with each
constraint having random symbols from A; (iii) we evaluate φ on a set of random strings and
select the formula for the experiment only if the satisfaction rate is between 10% and 90%, in
order to eliminate formulas that are too obvious or too difficult. A sketch of this generation recipe is given below.</p>
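      <p>The following Python sketch illustrates the recipe. The Declare templates (written here as LTLf-style strings), the random-trace generator, and the satisfies(formula, trace) checker (for instance, the DFA evaluation sketched in Section 3) are assumed helpers for illustration, not code from the paper.</p>
      <preformat>
import random

ACTIVITIES = ["a", "b", "c"]
DECLARE_TEMPLATES = [                      # illustrative subset of Declare templates
    "G({x} -> F {y})",                     # response(x, y)
    "F {x}",                               # existence(x)
    "G({x} -> X {y})",                     # chain response(x, y)
]

def random_formula(max_conj=5, max_disj=5):
    """Disjunction of up to max_disj conjunctions of (possibly negated) random constraints."""
    disjuncts = []
    for _ in range(random.randint(1, max_disj)):
        conjuncts = []
        for _ in range(random.randint(1, max_conj)):
            x, y = random.sample(ACTIVITIES, 2)
            c = random.choice(DECLARE_TEMPLATES).format(x=x, y=y)
            if random.random() > 0.5:
                c = "!(" + c + ")"
            conjuncts.append("(" + c + ")")
        disjuncts.append(" and ".join(conjuncts))
    return " or ".join("(" + d + ")" for d in disjuncts)

def keep_formula(formula, satisfies, n_samples=500, trace_len=20):
    """Retain the formula only if 10%-90% of random traces satisfy it."""
    traces = [[random.choice(ACTIVITIES) for _ in range(trace_len)] for _ in range(n_samples)]
    rate = sum(satisfies(formula, t) for t in traces) / n_samples
    return 0.1 &lt;= rate &lt;= 0.9
      </preformat>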
      <p>For each random formula, we construct a dataset D composed of N traces of length n that are
compliant with the formula. We then split the dataset into 90% for training and 10% for testing.
We generate 25 dataset-formula pairs by setting c = d = 5, an activity set A of three activities, N = 1000, and n = 20.
These pairs are used to evaluate our framework. Additionally, each experiment’s training and
testing are performed on different prefix lengths |τ_p| ∈ {5, 10, 15} to evaluate the model’s ability
to predict shorter and longer suffixes. During testing, the suffix prediction stops when the
model outputs the end-of-trace symbol or when the predicted suffix reaches a maximum of
T = 2 × n = 40 timesteps. Each experiment is repeated 5 times with different seeds to ensure
robustness and reliability of the results.
5.2. Empirical Results
Our empirical evaluation compares the performance of four models:</p>
      <p>
        • RNN (random): An RNN trained on the training data, queried with random sampling.
The next activity is selected by randomly sampling according to the probability
distribution predicted by the RNN output layer.
• RNN+LTL (random): An RNN trained with the combined loss in Eq. 7, where L_L is
calculated using the random formula associated with the dataset, and queried with
random sampling.
• RNN (greedy): An RNN trained only with data, with suffixes generated using greedy
sampling (Eq. 4), always taking the most probable next event predicted by the output
layer. This model is equivalent to the one proposed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
• RNN+LTL (greedy): Similar to the previous model but with the addition of LTLf
background knowledge during training.
      </p>
      <p>We instantiate all models as LSTMs with two layers and 100 neurons in each layer (as in
[10]), a batch size of 64, and an Adam optimizer with a learning rate of 5 × 10−4. The models
that incorporate LTLf background knowledge use α set to 0.6 (Eq. 7), the Gumbel-Softmax temperature
set to 0.5 (Eq. 8), and T equal to 2 × n = 40 timesteps (Eq. 11).</p>
      <p>The plots in Figure 2 show the average and standard deviation of the results obtained from
running experiments on all 25 datasets. The evaluated metrics are the model’s test accuracy
on next activity prediction (Figure 2a), the satisfaction rate of test suffixes (Figure 2b), and
the Damerau-Levenshtein (DL) distance (Figure 2c) of the predicted traces against the ground
truth traces. The empirical results show that the integration of background knowledge does not
affect the accuracy of next activity prediction, indicating that the predictor retains its capability
to mimic training data when combined with logical knowledge. However, RNN+LTL models
consistently achieve the best performance in suffix prediction. In terms of the satisfaction
percentage metric, utilizing LTLf background knowledge consistently results in an improved
satisfaction rate over using only data, regardless of the sampling technique employed. Varying
prefix lengths display no significant impact on the satisfaction rate, implying that the models
are equally capable of sampling traces that satisfy the formulas for any input prefix length. The
DL distance comparison further supports the superiority of models that utilize LTLf background
knowledge, showing a lower distance between the sampled traces and the ground truth traces
for both sampling strategies used. The prefix length, however, does impact the DL distance,
with shorter prefix inputs leading to a higher distance between the predicted and ground truth
traces. In general, random sampling shows better performance compared to greedy sampling in
terms of both metrics. Additionally, the DL distance results of the random sampling methods
exhibit a more stable standard deviation than those of the greedy sampling methods. Overall,
our empirical evaluation supports the hypothesis that integrating background knowledge into
the models consistently results in improved performance.</p>
    </sec>
    <sec id="sec-8">
      <title>6. Conclusions and Future Work</title>
      <p>In conclusion, we present a novel NeSy framework for incorporating background knowledge in
LTLf into the training process of a deep autoregressive model for multistep symbolic sequence
generation (i.e., suffix prediction). Our method involves: (i) translating the LTLf formula into
a DFA, (ii) representing the DFA probabilistically through a neural framework, and (iii) softly
enforcing DFA acceptance on differentiable suffixes obtained via Gumbel-Softmax sampling.
This process allows us to define a logical loss that smoothly guides the predicted suffixes to
satisfy the given knowledge, which can be integrated with other loss functions of the predictor.
Experimental results on synthetic datasets demonstrate that applying our loss function to RNN
models consistently produces suffixes with both lower DL distances and higher rates of LTLf
knowledge satisfaction, regardless of the sampling technique used for querying the network.
In future work, we plan to apply our framework to realistic BPM datasets and more advanced
neural predictors, such as GANs and Transformers.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by PNRR MUR project PE0000013-FAIR, the Sapienza
Project MARLeN, the EU H2020 project AIPlan4EU (No. 101016442), the ERC-ADG WhiteMech
(No. 834228), and the PRIN project RIPER (No. 20203FFYLK).</p>
    </sec>
    <sec id="sec-10">
      <title>References (continued)</title>
      <p>[8] (continued) International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
[9] F. J. Damerau, A technique for computer detection and correction of spelling errors, Commun. ACM 7 (1964) 171–176. doi:10.1145/363958.363994.
[10] N. Tax, I. Verenich, M. L. Rosa, M. Dumas, Predictive business process monitoring with LSTM neural networks, in: Advanced Information Systems Engineering - 29th International Conference, CAiSE 2017, Essen, Germany, June 12-16, 2017, Proceedings, 2017, pp. 477–492. doi:10.1007/978-3-319-59536-8_30.
[11] J. Evermann, J. Rehse, P. Fettke, Predicting process behaviour using deep learning, Decis. Support Syst. 100 (2017) 129–140. doi:10.1016/j.dss.2017.04.003.
[12] I. Ketykó, F. Mannhardt, M. Hassani, B. F. van Dongen, What averages do not tell: predicting real life processes with sequential deep learning, CoRR abs/2110.10225 (2021). URL: https://arxiv.org/abs/2110.10225. arXiv:2110.10225.
[13] N. D. Mauro, A. Appice, T. M. A. Basile, Activity prediction of business process instances with inception CNN models, in: AI*IA 2019 - Advances in Artificial Intelligence - XVIIIth International Conference of the Italian Association for Artificial Intelligence, Rende, Italy, November 19-22, 2019, Proceedings, 2019, pp. 348–361. doi:10.1007/978-3-030-35166-3_25.
[14] F. Taymouri, M. L. Rosa, S. M. Erfani, A deep adversarial model for suffix and remaining time prediction of event sequences, in: Proceedings of the 2021 SIAM International Conference on Data Mining, SDM 2021, Virtual Event, April 29 - May 1, 2021, 2021, pp. 522–530. doi:10.1137/1.9781611976700.59.
[15] A. Chiorrini, C. Diamantini, A. Mircoli, D. Potena, A preliminary study on the application of reinforcement learning for predictive process monitoring, in: Process Mining Workshops - ICPM 2020 International Workshops, Padua, Italy, October 5-8, 2020, Revised Selected Papers, 2020, pp. 124–135. doi:10.1007/978-3-030-72693-5_10.
[16] T. R. Besold, A. S. d’Avila Garcez, S. Bader, H. Bowman, P. M. Domingos, P. Hitzler, K. Kühnberger, L. C. Lamb, P. M. V. Lima, L. de Penning, G. Pinkas, H. Poon, G. Zaverucha, Neural-symbolic learning and reasoning: A survey and interpretation, in: Neuro-Symbolic Artificial Intelligence: The State of the Art, 2021, pp. 1–51. doi:10.3233/FAIA210348.
[17] Y. Xie, F. Zhou, H. Soh, Embedding symbolic temporal knowledge into deep sequential models, CoRR abs/2101.11981 (2021). URL: https://arxiv.org/abs/2101.11981. arXiv:2101.11981.
[18] E. Umili, R. Capobianco, G. D. Giacomo, Grounding LTLf specifications in image sequences, in: Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning, KR 2023, Rhodes, Greece, September 2-8, 2023, 2023, pp. 668–678. doi:10.24963/kr.2023/65.
[19] S. Badreddine, A. d’Avila Garcez, L. Serafini, M. Spranger, Logic tensor networks, Artificial Intelligence 303 (2022) 103649. doi:10.1016/j.artint.2021.103649.
[20] E. Umili, F. Argenziano, A. Barbin, R. Capobianco, Visual reward machines, in: Proceedings of the 17th International Workshop on Neural-Symbolic Learning and Reasoning, La Certosa di Pontignano, Siena, Italy, July 3-5, 2023, 2023, pp. 255–267. URL: https://ceur-ws.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rama-Maneiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lama</surname>
          </string-name>
          ,
          <article-title>Deep learning for predictive business process monitoring: Review and benchmark</article-title>
          ,
          <source>IEEE Transactions on Services Computing</source>
          <volume>16</volume>
          (
          <year>2020</year>
          )
          <fpage>739</fpage>
          -
          <lpage>756</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tax</surname>
          </string-name>
          , I. Verenich,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <article-title>Predictive business process monitoring with LSTM neural networks</article-title>
          ,
          <source>in: Advanced Information Systems</source>
          Engineering - 29th International Conference, CAiSE
          <year>2017</year>
          , Essen, Germany, June 12-16,
          <year>2017</year>
          , Proceedings,
          <year>2017</year>
          , pp.
          <fpage>477</fpage>
          -
          <lpage>492</lpage>
          . URL: https://doi.org/10.1007/978-3-319-59536-8_30. doi:10.1007/978-3-319-59536-8_30.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rivera</surname>
          </string-name>
          <string-name>
            <surname>Lazo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ñanculef</surname>
          </string-name>
          <article-title>, Multi-attribute transformers for sequence prediction in business process management</article-title>
          ,
          <source>in: Discovery Science: 25th International Conference, DS</source>
          <year>2022</year>
          , Montpellier, France,
          <source>October 10-12</source>
          ,
          <year>2022</year>
          , Proceedings, Springer-Verlag, Berlin, Heidelberg,
          <year>2022</year>
          , p.
          <fpage>184</fpage>
          -
          <lpage>194</lpage>
          . doi:10.1007/978-3-031-18840-4_14.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rama-Maneiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Patrizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lama</surname>
          </string-name>
          ,
          <article-title>Towards learning the optimal sampling strategy for sufix prediction in predictive monitoring</article-title>
          ,
          <source>in: Proc. of CAISE</source>
          <year>2024</year>
          ,
          <year>2024</year>
          , p. To Appear.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Stoian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cuzzolin</surname>
          </string-name>
          , T. Lukasiewicz, ROAD-R:
          <article-title>the autonomous driving dataset with logical requirements</article-title>
          ,
          <source>Mach. Learn</source>
          .
          <volume>112</volume>
          (
          <year>2023</year>
          )
          <fpage>3261</fpage>
          -
          <lpage>3291</lpage>
          . doi:10.1007/S10994-023-06322-Z.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stankovic</surname>
          </string-name>
          , Stlnet:
          <article-title>Signal temporal logic enforced multivariate recurrent neural networks</article-title>
          , in: H.
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Balcan</surname>
          </string-name>
          , H. Lin (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2020</year>
          , pp.
          <fpage>14604</fpage>
          -
          <lpage>14614</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Francescomarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Petrucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yeshchenko</surname>
          </string-name>
          ,
          <article-title>An eye into the future: Leveraging a-priori knowledge in predictive business process monitoring</article-title>
          ,
          <source>in: Business Process Management - 15th International Conference, BPM</source>
          <year>2017</year>
          , Barcelona, Spain,
          <source>September 10-15</source>
          ,
          <year>2017</year>
          , Proceedings,
          <year>2017</year>
          , pp.
          <fpage>252</fpage>
          -
          <lpage>268</lpage>
          . doi:10.1007/978-3-319-65000-5_15.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Poole</surname>
          </string-name>
          ,
          <article-title>Categorical reparameterization with gumbel-softmax</article-title>
          ,
          <source>in: 5th</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>