<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title/>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/978-3-642-14538-4</article-id>
      <title-group>
        <article-title>Probabilistic Compliance in Declarative Process Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michela Vespa</string-name>
          <email>michela.vespa@unife.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Bellodi</string-name>
          <email>elena.bellodi@unife.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Chesani</string-name>
          <email>federico.chesani@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniela Loreti</string-name>
          <email>daniela.loreti@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paola Mello</string-name>
          <email>paola.mello@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evelina Lamma</string-name>
          <email>evelina.lamma@unife.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Ciampolini</string-name>
          <email>anna.ciampolini@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica - Scienza e Ingegneria</institution>
          ,
          <addr-line>Viale Risorgimento 2, Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Ingegneria, Università di Ferrara</institution>
          ,
          <addr-line>Via Saragat 1, Ferrara</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <volume>56</volume>
      <fpage>41</fpage>
      <lpage>63</lpage>
      <abstract>
        <p>When addressing real-world processes, it is crucial to account for their intrinsic uncertainty to better reflect the nature of such processes. In this work, we introduce the concept of Probabilistic Declarative Process Specification (PDS), which is based on the Distribution Semantics from Probabilistic Logic Programming, in order to describe declarative process models with both crisp and probabilistic constraints. The probability associated with a constraint represents its strength or importance in a specific process domain. From this, we propose a novel notion of probabilistic compliance of a process trace w.r.t. a PDS, and show how to compute it with an existing algorithm. Preliminary experimental results on a healthcare protocol are presented to evaluate the feasibility of our proposed semantics for process conformance checking.</p>
      </abstract>
      <kwd-group>
        <kwd>Process Mining</kwd>
        <kwd>Declarative language</kwd>
        <kwd>Compliance</kwd>
        <kwd>Distribution Semantics</kwd>
        <kwd>Probabilistic Logic Programs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the Business Process Mining community, many different languages have been proposed to
model/describe a process, capturing the many aspects of a work process, ranging from the resource perspective, to
the artifact-based and data perspective, up to the control-flow aspects. The latter viewpoint in particular
has been subject to intense research activity. Two families of process modeling languages emerged:
procedural approaches, such as BPMN, and declarative ones, such as Declare [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] and DCR Graphs
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Declarative approaches have been introduced to overcome some forms of rigidness derived from
procedural ones, with the aim of finding a balance between flexibility and control. Declare in particular
was defined with this aim: allowing the specification of which properties a process instance should
exhibit, without specifying the exact steps to achieve them. Since the initial proposal, Declare has
been equipped with a formal semantics based on a strict subset of LTL. This enables a straightforward
specification of process properties through the use of established formulas.
      </p>
      <p>
        However, the adoption of a logic-based semantics has also raised a practical issue when dealing
with the problem of evaluating if a log is compliant with a process specification. On one side, a log
is composed of many traces; on the other side, a Declare process specification is composed of many
constraints. Typically, in real-world applications, different subsets of traces are compliant with different
subsets of constraints. The direct consequence is that a logically crisp notion of compliance might be
difficult or unsatisfactory to identify. In turn, such difficulty could undermine the capacity of a process
specification to properly describe the process. In a seminal work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the authors tackled the problem
by introducing the notion of probabilities into the Declare formalism: each constraint is assigned a
probability (and a relational operator). A probabilistic constraint is satisfied over the log if the ratio
of traces satisfying the constraint over the log cardinality achieves the mathematical relation established
by the relational operator and the probability assigned. As a practical example, a constraint c1 with
assigned probability (0.9, ≥) is satisfied if the number of traces compliant with c1 is at least (≥) 90%
of the total traces. In summary, probabilities express the relative frequency of traces, and are employed
with such a meaning for the tasks of discovering, monitoring and conformance checking. Nonetheless,
when the available log does not represent the whole process domain, the application of the approach of
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] by a domain expert might be challenging.
      </p>
      <p>
        In this work, we also tackle the problem of probabilistic conformance checking with Declare
constraints, but we propose a different semantics underlying the probabilistic constraints. We introduce
the concept of Probabilistic Declarative Process Specification (PDS) starting from Probabilistic Logic
Programming (PLP) and the Distribution Semantics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A PLP program under that semantics defines a
probability distribution over normal logic programs called worlds. In a PLP program, logic formulas
are equipped with a probability, which is interpreted as the probability of them appearing or not in a
normal logic program. Similarly, the probability of a Declare constraint is treated as the probability
that such a constraint will appear (or not) in a possible process specification. The presence or absence of a
constraint in a process specification tells us how “important” or strong that constraint is. The stronger
or more important the constraint, the greater its probability. According to our semantics, all worlds
including one such constraint "inherit" its probability, while those not including the constraint take into
account the complement to 1 of its probability. Differently from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], this probability does not represent
a fraction of traces satisfying the constraint, but allows one to model uncertainty in the domain by
means of the probability attached to the constraint.
      </p>
      <p>To better clarify our proposal, let us consider the following university course scenario, described
with two constraints: (i) it is mildly advised that a student only take the final examination if she
has attended all the classes; (ii) it is strongly advised that a student attend a practical session only
if she attended the corresponding preparatory class. Declare provides the way for expressing such
constraints; however, there is no way to specify that rule (i) is “mildly advised”, while rule (ii) is “strongly
advised”. Our proposal consists of attaching probabilities to constraints (i) and (ii): such values would
indicate the probability that each rule would end up in a possible model (process specification), and
the probability attached to (ii) would be higher than the one attached to (i). By making (i) and (ii)
probabilistic, we would find four possible process models, and we can formally define the compliance
of the students’ careers (process traces) towards these four models (i.e., worlds in the Distribution
Semantics terminology).</p>
      <p>The contribution of this work is the following: we introduce a semantics for probabilistic declarative
process specifications, provide a novel notion of compliance, and ground its application on a health
protocol. Then, we exploit previous results, and in particular (i) the mapping provided for Declare
in terms of the SCIFF modeling language [6] and (ii) the extension of the Distribution Semantics to
Abductive Logic Programming (ALP) in the SCIFF Framework [7]. Thanks to this existing work, we
can implement our proposed semantics and compute a compliance probability. In order to evaluate the
feasibility of our approach, we report some preliminary experimentation over a clinical guideline.</p>
      <p>The paper is organized as follows: Section 2 introduces background on Declare and the Distribution
Semantics, Section 3 describes the proposed semantics, and Section 4 describes how we perform probabilistic
compliance checking. Section 5 presents first experimental results and Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. Traces, logs, and the Declare modeling language</title>
        <p>In the BPM setting [8] the starting point is the observation of a process execution in terms of the
execution of the activities that compose the process. Each execution of the process is usually referred
to as a process instance or trace, and the executions of the distinct activities are referred to as distinct
activity instances or activity executions. Each activity is characterized by at least a name, and each
activity instance is usually denoted with a temporal timestamp. The timestamp provides, within the
same trace, an ordering relation between the different activity executions: it is often the case that traces are
represented directly as a sequence of activity names, ordered on the basis of the timestamp. Depending
on the specific context, the timestamp can be omitted: this is also our choice.</p>
        <p>Figure 1: (a) the response template; (b) the init template; (c) the precedence template.</p>
        <p>Formally, we consider the existence of a finite alphabet of symbols Σ corresponding to the activity
names. A trace and a log are then defined as follows:
Definition 1 (Trace t and Log ℒ). Given a finite set Σ of symbols (i.e., activity names), a trace t is a
finite, ordered sequence of symbols over Σ, i.e. t ∈ Σ*, where the latter is the infinite set of all the possible
finite sequences over Σ. A log ℒ is a finite set of traces.</p>
        <p>In a process log, each trace t represents a different process instance. Different process instances may
have the same trace, although referring to different executions.</p>
        <p>Example 1. Let us suppose that a process is made of activities a, b, c, and d, so Σ = {a, b, c, d}. An example
of a log might be: ℒ = {t1 = ⟨a, b, c⟩, t2 = ⟨a, b, a, d⟩, t3 = ⟨a, a, d⟩, t4 = ⟨a, b, c⟩, t5 = ⟨a, b, c⟩}.</p>
        <p>
          The Declare modeling language was initially introduced by [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and subsequently studied in [9].
Aimed at overcoming the “rigidness” of procedural modeling languages, it focuses on modeling processes
by specifying what are the relevant properties that a process instance should exhibit, without specifying
how these properties should be achieved. To this end, Declare models a process in terms of constraints,
i.e., rules about the activities that can appear in a process execution (i.e., in a trace), with qualitative
temporal relations among these activities. A simple example of a Declare constraint is response(a,b),
meaning that every occurrence of (the execution of) activity a in a trace should be followed by the
occurrence of an activity b.
        </p>
        <p>Declare provides a finite set of constraint patterns, whose arguments should be properly instantiated
with the activity names peculiar to the specific process to be modeled. For example, a common pattern
is response(x,y), with x and y placeholders to be substituted with proper activities. Another pattern is
init(x), which specifies that every process instance should always start with the execution of activity x.
Another common pattern is precedence(x,y), which states that every occurrence of the execution of y
should be preceded by the execution of x.</p>
        <p>
          Declare provides a graphical representation for the patterns (see Figure 1 for a few examples), and
has been equipped with two different formal semantics, both based on logic formalisms. In the original
formulation in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] the semantics was given using the LTL temporal logic; subsequent works have shown
the feasibility of adopting the LTL logic [10]: for a recent recap see [11]. A second formal semantics
has been proposed in [6], where the SCIFF language and ALP [12] have been exploited.
        </p>
        <p>Both semantics exploit the idea that each Declare template can be mapped onto one (or more)
logic formulas φ, and that logical entailment can be used to define the notion of compliance/violation of a
trace t w.r.t. a constraint formula φ. With the aim of being as general as possible, we provide an intuitive
definition of compliance/violation, where the meaning of the entailment symbol |= should be referred
to the chosen semantics (LTL or ALP).</p>
        <p>Definition 2 (Compliance/violation of a trace versus a constraint). A trace t is compliant with a Declare
constraint if, named φ the corresponding logic formula modelling that constraint, it holds: t |= φ.</p>
        <p>A trace t violates a Declare constraint if it does not entail the corresponding formula φ, i.e. if t ̸|= φ.
Definition 3 (Declarative Process Specification, from [11]). A declarative process specification is a
tuple DS = (Rep, Σ, 𝒞) where:
• Rep is a finite non-empty set of templates, where each template is a predicate c(x1, . . . , xn) ∈ Rep
on variables x1, . . . , xn (with n ∈ ℕ the arity of c);
• Σ is a finite non-empty set of activity names;
• 𝒞 is a finite set of constraints, obtained by instantiating templates from Rep over Σ; we will compactly
denote such constraints with c(a1, . . . , an), a1, . . . , an ∈ Σ.</p>
        <p>Usually the constraints c(a1, . . . , an) in a DS are considered as being in logical conjunction. The
notion of compliance can then be lifted from a trace vs. a constraint towards a trace vs. a DS as follows:
Definition 4 (Compliance of a trace versus a Declarative Process Specification). A trace t is compliant
with a DS if it entails the conjunction of the formulas φi corresponding to the ci ∈ 𝒞:
t |= φ1 ∧ . . . ∧ φm
where m is the cardinality of 𝒞.</p>
        <p>Example 2. Let us consider the log introduced in Example 1, and the following DS (Rep and Σ are omitted
for the sake of simplicity): 𝒞 = {response(a,b) (c1), init(a) (c2)}. Then:
• t1, t4, and t5 are compliant with c1;
• t2 is not compliant with c1 because the second occurrence of activity a is not followed by an occurrence
of activity b;
• t3 is not compliant with c1 because two occurrences of a are not followed by an occurrence of b;
• t1, . . . , t5 are all compliant with c2.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Distribution Semantics and Probabilistic Logic Programming</title>
        <p>
          Probabilistic Logic Programming has recently received increasing attention for its ability to
incorporate probability into logic programming. Among the various proposals for PLP, the one based on the
distribution semantics [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] has gained popularity: it was introduced for the PRISM language [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] but
is the basis for many other languages such as Probabilistic Logic Programs [13], Probabilistic Horn
Abduction (PHA) [14], Independent Choice Logic (ICL) [15], pD [16], Logic Programs with Annotated
Disjunctions (LPADs) [17], ProbLog [18] and CP-logic [19]. Such semantics is particularly appealing for
its intuitiveness and because efficient inference algorithms have been proposed for it. By combining probability
with logic programming, PLP languages are a suitable framework to handle uncertain information.
        </p>
        <p>
          A program in one of these languages defines a probability distribution over normal logic programs
called worlds. The distribution semantics has been defined both for programs that do not contain
function symbols, and thus have a finite set of worlds, and for programs that contain them, which have
an infinite set of worlds. We consider here the first case for the sake of simplicity; for the treatment of
function symbols see [20]. A survey of the distribution semantics in PLP can be found in [21]. In the
following, the distribution semantics will be described with reference to LPADs. Formally, an LPAD
consists of a finite set of "annotated disjunctive clauses", where the head is a disjunction in which each
atom is annotated with a probability. If the body holds true, only one of the atoms in the head will be
true, with the associated probability. An annotated disjunctive clause C_i is of the form
h_i1 : Π_i1 ; . . . ; h_in_i : Π_in_i :− b_i1, . . . , b_im_i
where h_i1, . . . , h_in_i are logical atoms and Π_i1, . . . , Π_in_i are real numbers in the interval [0, 1] such
that ∑_{k=1}^{n_i} Π_ik ≤ 1; b_i1, . . . , b_im_i is indicated with body(C_i). If ∑_{k=1}^{n_i} Π_ik &lt; 1, the head implicitly
contains an extra atom null that does not appear in the body of any clause and whose annotation is
1 − ∑_{k=1}^{n_i} Π_ik. We denote by ground(T) the grounding of an LPAD T.
        </p>
        <p>An atomic choice [15] is a triple (C_i, θ_j, k) where C_i ∈ T, θ_j is a substitution that grounds C_i and
k ∈ {1, . . . , n_i} identifies one of the head atoms. (C_i, θ_j, k) means that, for the ground clause C_i θ_j,
the head h_ik was chosen. A set of atomic choices κ is consistent if only one head is selected from the
same ground clause; we assume independence between the different choices. A composite choice κ is a
consistent set of atomic choices [15]. The probability P(κ) of a composite choice κ is the product of the
probabilities of the independent atomic choices, i.e. P(κ) = ∏_{(C_i, θ_j, k) ∈ κ} Π_ik.</p>
        <p>A selection σ is a composite choice that, for each clause C_i θ_j in ground(T), contains an atomic
choice (C_i, θ_j, k). Let us indicate with S_T the set of all selections. A selection σ identifies a normal
logic program w_σ defined as w_σ = {(h_ik ← body(C_i)) θ_j | (C_i, θ_j, k) ∈ σ}. w_σ is called a (possible)
world of T. Since selections are composite choices, we can assign a probability to worlds: P(w_σ) =
P(σ) = ∏_{(C_i, θ_j, k) ∈ σ} Π_ik.</p>
        <p>We denote the set of all worlds of T by W_T. P(w) is a probability distribution over worlds, i.e.,
∑_{w ∈ W_T} P(w) = 1. A composite choice κ identifies a set of worlds ω_κ = {w_σ | σ ∈ S_T, σ ⊇ κ}.
Similarly we can define the set of possible worlds associated to a set of composite choices K:
ω_K = ∪_{κ ∈ K} ω_κ.</p>
        <p>Example 3. Consider the following LPAD T encoding the effect of flu and hay fever on the sneezing
symptom.</p>
        <p>(1) strong_sneezing(X) : 0.3 ; moderate_sneezing(X) : 0.5 :− flu(X).
(2) strong_sneezing(X) : 0.2 ; moderate_sneezing(X) : 0.6 :− hay_fever(X).
(3) flu(bob).
(4) hay_fever(bob).</p>
        <p>If somebody has the flu or hay fever, there is the possibility that they experience sneezing symptoms with
different intensity: if she has the flu, then she might show strong sneezing with probability 0.3, or moderate
sneezing with probability 0.5; similarly for the second clause. She might not experience any symptom with
probability 0.2 in both cases. We know for sure that Bob has both the flu and hay fever. T has 3 · 3 = 9
worlds, as each probabilistic clause has one grounding θ_1 = {X/bob}. Worlds are shown in Table 1.</p>
        <p>The probability of a goal q is defined as
P(q) = ∑_{w ∈ W_T} P(q|w) P(w) = ∑_{w ∈ W_T : w |= q} P(w)    (1)
where the probability of a goal q given a world w is P(q|w) = 1 if w |= q and 0 otherwise, and P(w) =
P(σ), i.e. the product of the annotations Π_ik of the atoms selected in σ. Therefore, the
probability of a goal q can be computed by summing the probability of the worlds where the goal is
true. In practice, given a goal to solve, it is unfeasible to enumerate all the worlds where q is entailed.
Inference algorithms, instead, find explanations for a goal: a composite choice κ is an explanation for
q if q is entailed by every world of ω_κ. In particular, algorithms find a covering set of explanations
w.r.t. q, where a set of composite choices K is covering with respect to q if every program w ∈ W_T
in which q is entailed is in ω_K.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Probabilistic Declarative Process Specifications under the Distribution Semantics</title>
      <p>
        In this Section, we propose a semantics for Declarative Process Specifications that is highly inspired by
the Distribution Semantics introduced in Section 2.2. First of all, we introduce the notion of probabilistic
constraint. Please note that the term probabilistic constraint is also adopted in Definition 8 of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], but
that definition also includes a relational operator that we do not need in our semantics.
Definition 5 (Probabilistic Constraint). Given a finite, non-empty set Rep of constraint templates and a
set Σ of activity names a1, . . . , an, a probabilistic constraint is a constraint template c ∈ Rep instantiated
over Σ that is annotated with a real number p ∈ [0, 1] (the probability of the constraint). We will indicate
a generic i-th probabilistic constraint with the syntax:
      </p>
      <p>:: c(1, . . . , )</p>
      <p>The probability is to be interpreted as the strength or importance of c. The stronger or more important
the constraint, the greater its probability. By applying the Distribution Semantics, a constraint c(. . .)
will have a probability p of being part of a Probabilistic Declarative Process Specification, defined as
follows:
Definition 6 (Probabilistic Declarative Process Specification). A Probabilistic Declarative Process
Specification PDS is a Declarative Process Specification DS where each constraint c ∈ 𝒞 is a probabilistic
constraint.</p>
      <p>We will refer to constraints annotated with probability p = 1 as crisp constraints. If all the constraints
have probability equal to 1, we simply end up having a Declarative Specification as in Def. 3. Probabilities
will be omitted when they are equal to 1. We now propose to interpret the probability p of a constraint
by borrowing a few concepts from Section 2.2.</p>
      <p>Definition 7 (Atomic Choice, and its probability). An atomic choice over a probabilistic constraint
p :: c is a couple (c, e) where e ∈ {0, 1}. e indicates whether c is chosen to be included in a PDS with
probability p (e = 1), or not with probability 1 − p (e = 0).</p>
      <p>Note that here we do not need the substitution θ_j as in the definition of atomic choice in subsection
2.2, since, according to Def. 5, the constraint c is ground.</p>
      <p>Definition 8 (Composite Choice, and its probability). A set of atomic choices κ is consistent if there is
only one choice in κ for each probabilistic constraint c; we assume independence between the different
atomic choices. A composite choice κ is a consistent set of atomic choices.</p>
      <p>The probability P(κ) of a composite choice κ is P(κ) = ∏_{(c,1) ∈ κ} p · ∏_{(c,0) ∈ κ} (1 − p), where p is
the probability associated with c.</p>
      <p>Definition 9 (Selection σ over a PDS, and its probability). A selection σ over a Probabilistic Declarative
Process Specification is a composite choice that contains an atomic choice (c, e) for every probabilistic
constraint of the PDS. A selection σ identifies a world in this way: w_σ = {c | (c, 1) ∈ σ}. The probability
of a world w_σ is P(w_σ) = P(σ) = ∏_{(c,1) ∈ σ} p · ∏_{(c,0) ∈ σ} (1 − p).</p>
      <p>Definition 10 (Probability of a DS). Given the set of all selections S over a Probabilistic Declarative
Process Specification, every world w_{σ_i} identified by a selection σ_i ∈ S is a different Declarative Process
Specification DS_i, and its probability is P(DS_i) = P(w_{σ_i}) = P(σ_i).</p>
      <p>A PDS defines a probability distribution over regular (non-probabilistic) DSs that correspond to
worlds. Let us consider the following example to better illustrate this concept.</p>
      <p>Example 4. Let us consider the following PDS, obtained as an extension of the DS in Example 2:
𝒞 = {0.9 :: response(a,b) (c1), 0.8 :: init(a) (c2)}.
The probability of c1 indicates that the constraint response(a,b) is considered very strong; therefore, the
more traces satisfy it, the better. The probability of c2 indicates that the fact that a trace starts with the
execution of activity a is rather important, but less than the fact that a is followed by b. For each constraint
there are 2 different atomic choices, and this leads to 4 selections and 4 possible worlds; in turn, 4 different
regular DSs are possible, each one corresponding to a world:</p>
      <p>Selection σ_i | DS_i | P(DS_i)
σ1 = {(c1, 1), (c2, 1)} | DS1 = {response(a,b), init(a)} | P(DS1) = 0.9 × 0.8 = 0.72
σ2 = {(c1, 1), (c2, 0)} | DS2 = {response(a,b)} | P(DS2) = 0.9 × 0.2 = 0.18
σ3 = {(c1, 0), (c2, 1)} | DS3 = {init(a)} | P(DS3) = 0.1 × 0.8 = 0.08
σ4 = {(c1, 0), (c2, 0)} | DS4 = { } | P(DS4) = 0.1 × 0.2 = 0.02</p>
      <p>Example 4 allows us to highlight some characteristics of the semantics proposed in this paper. First
of all, we might notice that, thanks to the notion of world, we started from a probabilistic declarative
specification, and ended up with a set of non-probabilistic declarative specifications. Secondly, we
might notice that each different DS_i has its own probability (i.e., the one obtained by the corresponding
selection σ_i), and that the set of all the possible selections achieves a probability distribution over DSs,
i.e. ∑_{DS} P(DS) = 1.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Probabilistic Compliance</title>
      <p>In this Section we will define the notion of compliance of a trace w.r.t. a Probabilistic Declarative
Process Specification PDS. In doing so, we will heavily rely on the simpler notion of compliance of
a trace vs. a DS as in Definition 4. Note that in turn Def. 4 builds upon Def. 2, that abstracts away
from the formal Declare semantics (  or ALP). As a consequence, our definition of compliance
towards a probabilistic specification is general, and is valid for both   and ALP. Obviously, in order
to compute the compliance, one semantics should be chosen: we will discuss such a choice and the
implementation in the next Section.</p>
      <p>Definition 11 (Compliance of a trace versus a PDS). Given a Probabilistic Declarative Process
Specification PDS, let us consider all the possible selections σ_i over the PDS, and all the possible Declarative
Specifications DS_i associated with the σ_i. A trace t has a probability of compliance w.r.t. the PDS defined
as follows:
P(t, PDS) = ∑_{DS_i : t is compliant with DS_i} P(DS_i)    (2)</p>
      <p>We might observe that, w.r.t. Def. 4, we move forward from a crisp boolean concept towards a degree
of compliance: this is an expected consequence of introducing probabilities in the specification. Then,
we might notice that the formula above is indeed simply an application of Equation 1 to the process
specification domain. Let us apply the notion of compliance of a trace vs. a PDS by considering the
following example:
Example 5 (continued from Ex. 4). Consider the PDS of Ex. 4 and the trace t1 = ⟨a, b, c⟩: t1 is compliant
with all 4 DS_i and, as expected, its probability of compliance is:
P(t1, PDS) = P(DS1) + P(DS2) + P(DS3) + P(DS4) = 0.72 + 0.18 + 0.08 + 0.02 = 1
Let us consider then a trace t2 = ⟨c, a, b⟩ and suppose temporarily that c1 and c2 are crisp constraints: t2
is compliant with c1 since after a there is b; however it is not compliant with c2 since t2 does not start with
a. By considering again the constraints as probabilistic, out of the four declarative specifications DS_i, t2 is
compliant with DS2 and DS4, i.e. those specifications that do not contain c2. The probability of compliance
of t2 is:</p>
      <p>P(t2, PDS) = P(DS2) + P(DS4) = 0.18 + 0.02 = 0.2</p>
      <p>By violating c2, which was a rather important constraint as indicated by a probability value of 0.8, the
trace has a low degree of compliance vs. the PDS.</p>
      <p>Finally, let us consider the trace t3 = ⟨c, a⟩. t3 is not compliant with c1 nor with c2, considered
individually; however, it is compliant with the empty specification DS4, so its probability of compliance is
not zero but is equal to P(DS4) = 0.02.</p>
      <p>Example 5 illustrates the intuition behind the idea, proposed in this work, of interpreting the
probability of a constraint as its strength or, in other words, how important it is to satisfy such a constraint.
In the aforementioned example, trace t1 is compliant with both constraints, hence its degree of
compliance is the maximum possible. Trace t2 instead is compliant with only one of the two constraints,
hence its degree is lower in relation to the strength/importance we have associated to the violated
constraint. We might finally notice that trace t3, even if it does not comply with any constraint, still
achieves a score. This is a direct consequence of the interpretation of the probability of a constraint as
its strength: if the strength p is lower than 1, then we are saying that we will admit non-compliant
traces with a degree (1 − p).</p>
      <p>In more formal terms, if all the constraints in a PDS have attached probabilities p &lt; 1, then there
is always a selection σ* corresponding to including no constraint in the DS* (the empty specification),
and whose probability is P(σ*) &gt; 0. In Example 4, such a special selection is σ4. Every possible trace
will always be compliant with DS*, since it is empty. Similarly, all the constraints annotated with
probability p = 1 will appear in all the corresponding non-probabilistic DSs, as shown in the following
example:
Example 6. Let us consider the following PDS:
𝒞 = {0.8 :: response(a,b) (c1), init(a) (c2)}
The corresponding selections and specifications would be:</p>
      <p>Selection  DS  P(σ)
σ1 = {(c1, 1)}  S1 = {response(a,b), init(a)}  P(σ1) = 0.8
σ2 = {(c1, 0)}  S2 = {init(a)}  P(σ2) = 0.2</p>
      <p>Example 6 clearly illustrates how our proposed semantics for PDS accommodates both probabilistic
and non-probabilistic constraints: the latter are treated as mandatory constraints, as usual in Process
Mining, and the derived specifications S will always contain them.</p>
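      <p>The treatment of crisp constraints in Example 6 can be reproduced mechanically: any selection that excludes a constraint with probability 1 gets probability 0 and can be discarded. A small Python sketch (illustrative, not the paper's code) enumerating the surviving selections:</p>

```python
from itertools import product

# Example 6 PDS: (constraint label, probability); crisp constraints have p = 1.
pds = [("response(a,b)", 0.8), ("init(a)", 1.0)]

def selections(pds):
    """Yield (specification, probability) for each selection with non-zero probability."""
    for bits in product((1, 0), repeat=len(pds)):
        prob, spec = 1.0, []
        for (label, p), bit in zip(pds, bits):
            prob *= p if bit else 1.0 - p
            if bit:
                spec.append(label)
        if prob > 0.0:   # dropping a crisp constraint yields probability 0
            yield spec, prob
```

      <p>Running it yields exactly the two rows above: S1 = {response(a,b), init(a)} with probability 0.8 and S2 = {init(a)} with probability 0.2; the crisp constraint init(a) appears in every specification.</p>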
    </sec>
    <sec id="sec-6">
      <title>5. Implementation and Evaluation</title>
      <p>For computing the probabilistic compliance of a trace against a PDS, we leveraged the implementation
presented in [22]. This implementation supports reasoning on ALP programs featuring "integrity
constraints" similar to those offered by IFF [23], extended with the possibility of annotating them with
a probability value. The semantics of these programs, called IFFProb programs, defines a probability
distribution over IFF programs inspired by the distribution semantics, so the IFFProb implementation
could be directly used for our goals. IFFProb is based on the implementation of the SCIFF proof procedure.
SCIFF [12] is an extension of the IFF proof procedure that also features constraints (à la CLP) and
universally quantified abducibles. In [22], the IFF sub-language is extended to the probabilistic case with
a new CHR constraint that represents the current explanation (Δ, P), meaning that, in the current
derivation branch, the explanation is Δ and has probability P. Δ is a collection of couples (c, b),
holding the constraint c and the Boolean value b indicating whether c is included in the current selection or not.</p>
      <p>In [9], Declare constraints were mapped into a rule-based language known as CLIMB (Computational
Logic for the verIfication and Modeling of Business constraints), a specialized subset of the SCIFF
language that is based on abductive semantics. This mapping allows declarative models to be converted
into sets of executable, logic-based rules. These rules are referred to as integrity constraints, since they
constrain the courses of interaction to ensure their integrity and compliance with interaction protocols.
Each atomic Declare activity can be expressed in CLIMB as a term and is linked to a single time point,
allowing its execution to be represented as either happened (H), expected (E), or forbidden. For example,
the fact that an atomic activity a has happened at time T can be denoted by H(exec(a), T), while
E(exec(a), T) states that a is expected to occur at time T. We do not consider forbidden activities here.
Integrity constraints are of the form Body → Head, where Body contains (a conjunction of) happened
events, together with constraints on their variables, as well as Prolog predicates; Head contains (a
disjunction of conjunctions of) positive and negative expectations, together with constraints and Prolog
predicates, applied on their variables and/or on variables contained in the Body.</p>
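      <p>To make the Body → Head shape concrete, here is a minimal Python sketch (a hypothetical trace representation of our own, not the SCIFF engine) of checking a response-style integrity constraint H(exec(a), T1) → E(exec(b), T2) ∧ T2 &gt; T1 against a list of happened events:</p>

```python
# A trace is a list of happened events: (activity, timestamp) pairs.
# check_response verifies H(exec(a), T1) -> E(exec(b), T2) with T2 > T1:
# every occurrence of a must be followed, strictly later, by an occurrence of b.
def check_response(trace, a, b):
    for activity, t1 in trace:
        if activity == a:
            # the expectation E(exec(b), T2), T2 > T1, must be fulfilled by the trace
            if not any(other == b and t2 > t1 for other, t2 in trace):
                return False
    return True
```

      <p>For instance, check_response succeeds on a trace containing program_admission at time 0 followed by counseling at time 5, while dropping the counseling event leaves the expectation unfulfilled.</p>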
      <p>We created a Declare process model for elective colorectal surgery based on the ERAS® (Enhanced
Recovery After Surgery) Society guidelines for perioperative care [24], as shown in Figure 2. The
model is a PDS written in the CLIMB language, where the protocol activities are subject to both crisp
and probabilistic constraints: the former represent strongly recommended practices according to the
guidelines, reflecting critical, evidence-supported activities that must be adhered to rigorously, while the
latter represent weakly recommended practices, whose associated probability states the importance of
applying them in the healthcare flow. Weakly recommended practices may have some degree of flexibility
in their implementation due to individual patient needs, local variations in resources and demographics,
advancements in medical research, institutional policies, and cultural considerations. The model includes
21 constraints capturing essential perioperative events, from patient admission to post-surgery recovery.
Of these, 14 constraints were identified as crisp (p = 1) and 7 as probabilistic. An excerpt of the PDS is:
true → E(exec(program_admission), 0). (1)
H(exec(program_admission), T1) → E(exec(counseling), T2) ∧ T2 &gt; T1. (2)
H(exec(start_surgery), T2) → E(exec(counseling), T1) ∧ T1 &lt; T2. (3)
0.3 :: H(exec(start_surgery), T2) → E(exec(preanesthesia), T1) ∧ T1 &lt; T2. (4)
0.4 :: H(exec(start_surgery), T2) → E(exec(fasting), T1) ∧ T1 &lt; T2. (5)</p>
      <p>Here, integrity constraint (1) represents the init constraint. It specifies that the process must
always start with the event program_admission of the patient, occurring at time 0. Obviously, it is a
mandatory constraint. Constraint (2) represents the response constraint, indicating that whenever the
event program_admission happens, the pre-operative patient counseling and education must eventually
follow at a time T2 later than T1. It is strongly recommended to begin patient counseling after admission
into the program and prior to surgical procedures, thus constraints (2) and (3) are treated as crisp. Instead,
(4) and (5) are modeled as probabilistic precedence constraints since, in the guidelines, preanesthesia
and fasting before surgery can sometimes be skipped. The relatively low probability assigned to
them derives from the consensus around those specific perioperative practices, which suggests they are
generally not enforced for optimal postoperative outcomes. Each event trace was then adapted to the
format required by the algorithm, transforming each trace into an interpretation. Every fact has two
arguments: the first one is a ground term that records the event name, and the second one is an integer
that indicates the timestamp. An example of an interpretation is the following:
h(exec(program_admission), 0).
h(exec(counseling), 5).
h(exec(discharge), 80).</p>
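      <p>Putting the excerpt and the interpretation format together, the probability of compliance admits a simple closed form under our semantics: it is 0 if any crisp constraint is violated, and otherwise the product of (1 − p) over the violated probabilistic constraints. A Python sketch of this computation (checkers and event names are illustrative assumptions, not the IFFProb implementation):</p>

```python
# A trace is a list of (activity, timestamp) pairs, as in the interpretation above.
def precedes(trace, earlier, later):
    """Every occurrence of `later` must be preceded by some strictly earlier `earlier`."""
    for activity, t2 in trace:
        if activity == later:
            if not any(a == earlier and t2 > t1 for a, t1 in trace):
                return False
    return True

def compliance(trace, crisp, probabilistic):
    if not all(check(trace) for check in crisp):
        return 0.0                      # a mandatory constraint is violated
    degree = 1.0
    for p, check in probabilistic:
        if not check(trace):
            degree *= 1.0 - p           # pay the strength of each violated constraint
    return degree

trace = [("program_admission", 0), ("counseling", 5), ("start_surgery", 40)]
crisp = [lambda t: len(t) > 0 and t[0] == ("program_admission", 0),   # constraint (1)
         lambda t: precedes(t, "counseling", "start_surgery")]        # constraint (3)
probabilistic = [
    (0.3, lambda t: precedes(t, "preanesthesia", "start_surgery")),   # constraint (4)
    (0.4, lambda t: precedes(t, "fasting", "start_surgery")),         # constraint (5)
]
```

      <p>Here the sample trace violates (4) and (5) only, so its degree is (1 − 0.3) · (1 − 0.4) = 0.42.</p>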
      <p>To perform our experiments, we wrote one synthetic trace composed of 21 events, one for each
activity of the protocol; the trace satisfied all the model constraints.</p>
      <p>We provide two preliminary scalability tests to show how both crisp and probabilistic
constraints influence the execution time for computing the probability of compliance of that trace.
Experiments were carried out on a Linux machine with an Intel® Xeon® E5-2630v3 running at 2.40 GHz
with 20 GB of RAM and a time limit of 8 hours. In the first test, the number of crisp constraints was
fixed at 3, 6, 9, and 14, respectively, and the number of probabilistic constraints was increased from
1 to 7 in steps of 1. In the second test, the number of probabilistic constraints was fixed at 2, 4, and
7, and the number of crisp constraints was increased from 1 to 14 in steps of 1.</p>
      <p>Figure 3: (a) Execution time as the number of probabilistic constraints varies, keeping the
number of crisp constraints fixed. (b) Execution time as the number of crisp constraints varies,
keeping the number of probabilistic constraints fixed.</p>
      <p>Figure 3 illustrates the results, showing an exponential trend in execution times as the number
of either crisp or probabilistic constraints increases. The impact on the execution time is higher when using
probabilistic constraints: we obtain an exponential trend with 7 probabilistic constraints compared to
14 crisp ones.</p>
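      <p>The asymmetry between the two curves is expected from the semantics: each probabilistic constraint doubles the number of selections to explore, while crisp constraints belong to every specification and only add per-specification checking work. A one-line Python illustration of this back-of-the-envelope model (ours, not a measured cost model):</p>

```python
def num_selections(probabilistic_constraints):
    # Each probabilistic constraint can independently be kept or dropped,
    # so the number of selections (and induced specifications) doubles per constraint.
    return 2 ** probabilistic_constraints

# With the model of Section 5: 7 probabilistic constraints induce 128 selections,
# whereas 14 crisp constraints still induce a single specification.
```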
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions and Future Work</title>
      <p>In this paper, we presented a new semantics for Declare process model specifications which allows one
to specify probabilistic constraints representing their strength/importance in a specific process domain.
In this way, we can model domains where some constraints are stronger (or weaker) than others. Such
models are called Probabilistic Declarative Process Specifications. The underlying semantics is inspired
by the distribution semantics from Probabilistic Logic Programming, and abstracts away from the
formal Declare semantics (LTL or ALP), aiming at computing the probability of compliance of a trace
with respect to the model. The computation of the compliance relies on an existing algorithm. Preliminary
tests show that the time taken for computing the probability grows exponentially as the number of
both crisp and probabilistic constraints increases. Future work includes: extending the experimental
evaluation; defining the compliance of a set of traces (a log) with respect to a PDS; extending the new semantics
to manage uncertainty at the level of events, traces, or the log itself; and studying the profiles of energy
consumption when computing the probability of compliance.</p>
    </sec>
    <sec id="sec-8">
      <title>7. Acknowledgments</title>
      <p>Research funded by the Italian Ministerial grant PRIN 2022 “Probabilistic Declarative Process Mining
(PRODE)”, n. 20224C9HXA - CUP F53D23004240006, funded by European Union – Next Generation
EU. Research partly funded by the Italian Ministry of University and Research through PNRR - M4C2
Investimento 1.3 (Decreto Direttoriale MUR n. 341 del 15/03/2022), Partenariato Esteso PE00000013
- "FAIR - Future Artificial Intelligence Research" - Spoke 8 "Pervasive AI" - CUP J33C22002830006,
funded by the European Union under the NextGeneration EU programme. This work was realized by
Daniela Loreti with a research contract co-financed by the European Union (PON Ricerca e Innovazione
2014-2020 art. 24, comma 3, lett. a), della Legge 30/12/2010, n. 240 e s.m.i. e del D.M. 10/08/2021 n. 1062).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] M. Pesic, H. Schonenberg, W. M. van der Aalst, Declare: Full support for loosely-structured processes, in: 11th IEEE International Enterprise Distributed Object Computing Conference (EDOC 2007), 2007, p. 287. doi:10.1109/EDOC.2007.14.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] M. Pesic, Constraint-based workflow management systems: shifting control to users, Ph.D. thesis, Industrial Engineering and Innovation Sciences, 2008. doi:10.6100/IR638413.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] T. T. Hildebrandt, R. R. Mukkamala, Declarative event-based workflow as distributed dynamic condition response graphs, in: K. Honda, A. Mycroft (Eds.), Proceedings Third Workshop on Programming Language Approaches to Concurrency and communication-cEntric Software (PLACES 2010), Paphos, Cyprus, 21st March 2010, volume 69 of EPTCS, 2010, pp. 59–73. doi:10.4204/EPTCS.69.5.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] A. Alman, F. M. Maggi, M. Montali, R. Peñaloza, Probabilistic declarative process mining, Inf. Syst. 109 (2022) 102033. doi:10.1016/J.IS.2022.102033.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] T. Sato, A statistical learning method for logic programs with distribution semantics, in: L. Sterling (Ed.), Logic Programming, Proceedings of the Twelfth International Conference on Logic Programming, Tokyo, Japan, June 13–16, 1995, MIT Press, 1995, pp. 715–729.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>