Keeping interval-based functional dependencies up-to-date � Carlo Combi and Pietro Sala Department of Computer Science, University of Verona {carlo.combi|pietro.sala} @univr.it Abstract. In the temporal database literature, every fact stored in a database may be equipped with two temporal dimensions: the valid time, which describes the time when the fact is true in the modeled reality, and the transaction time, which describes the time when the fact is current in the database and can be retrieved. Temporal functional dependen- cies (TFDs) add valid time to classical functional dependencies (FDs) in order to express database integrity constraints over the flow of time. Currently, proposals dealing with TFDs adopt a point-based approach, where tuples hold at specific time points, to express integrity constraints such as “for each month, the salary of an employee depends only on his role”. To the best of our knowledge, there are no proposals dealing with interval-based temporal functional dependencies (ITFDs), where the as- sociated valid time is represented by an interval and there is the need of representing both point-based and interval-based data dependencies. In this paper, we propose ITFDs based on Allen’s interval relations and discuss their expressive power with respect to other TFDs proposed in the literature: ITFDs allow us to express interval-based data dependen- cies, which cannot be expressed through the existing point-based TFDs. ITFDs allow one to express constraints such as “employees starting to work the same day with the same role get the same salary” or “employees with a given role working on a project cannot start to work with the same role on another project that will end before the first one”. Furthermore, we propose new algorithms based on B-trees to efficiently verify the sat- isfaction of ITFDs in a temporal database. These algorithms guarantee that, starting from a relation satisfying a set of ITFDs, the updated relation still satisfies the given ITFDs. 1 An example of interval-based constraints Most health care institutions collect a large quantity of clinical information about patient and physician actions, such as therapies and surgeries, as well as about health care processes, such as admissions, discharges, and exam requests. All these pieces of information are temporal in nature and the associated tempo- ral dimension needs to be carefully considered in order to be able to properly represent clinical data and to reason about them [2]. In this section, we briefly � A short summary of the results published in [3] and [4]. 330 C. Combi and P. Sala. Keeping interval-based functional dependencies up-to-date # TherType PatId Phys DrugCode Qty B E 1 antiviral 1 Dorian 0458 300 1 16 2 analgesics 1 Cox 0976 200 2 10 3 cardiovascular 1 Turk 0118 100 3 8 4 antipyretics 1 Cox 0976 100 9 11 5 sedative 1 Turk 0345 10 13 15 6 anxiolytic 1 Dorian 0345 10 17 19 7 antiviral 2 Kelso 0458 200 1 10 8 cardiovascular 2 Quinlan 0118 100 4 7 9 analgesics 2 Reid 0976 150 5 9 10 antiviral 2 Reid 0458 300 8 14 11 antiviral 1 Dorian 0789 200 1 18 Dorian Dorian Cox Turk Cox Turk Dorian Kelso Quinlan Reid Reid 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Fig. 1. An instance of relation PatTherapies, storing data about patient therapies and its representation on the time line with values for attribute Phys introduce a real-world example taken from clinical medicine, namely that of patient therapies. Suppose we have patients who undergo several di↵erent therapies: each ther- apy can be supervised by a physician, and consists of the administration of some drug to the patient. Information about patients and therapies is stored in a re- lation according to the schema PatTherapies(TherType, PatId , DrugCode, Qty, Phys, B , E ), where TherType identifies a type of pharmacological therapy, PatId represents a patient ID, DrugCode and Qty the prescribed drug and its quan- tity, respectively, and Phys the physician who made the prescription (and is re- sponsible for the therapy). Finally, attributes B and E represent the beginning and end time points of the tuple valid interval, respectively: they represent the bounds of the interval specified by the physician for each therapy. An instance of PatTherapies is provided in Fig. 1. Example 1. A policy of the hospital may be described as follows: Every patient may receive several therapies at the same time from di↵erent physicians, but overlapping therapies for the same patient must be prescribed by the same physician. In other words, if a patient during a therapy needs another therapy which lasts beyond the end of the current therapy, then this therapy must be prescribed by the same physician who prescribed the other one; 331 C. Combi and P. Sala. Keeping interval-based functional dependencies up-to-date I1 finishes I0 (I1 F I0 ) I1 I2 during I0 (I2 D I0 ) I3 starts I0 (I3 S I0 ) I2 I4 overlaps I0 (I4 O I0 ) I3 I5 meets I0 (I5 M I0 ) I6 before I0 (I6 B I0 ) I0 I0 equals I0 (I0 = I0 ) I0 finished by I1 (I0 F̄ I1 ) I4 I0 contains I2 (I0 D̄ I2 ) I0 started by I3 (I0 S̄ I3 ) I5 I0 overlapped by I4 (I0 Ō I4 ) I6 I0 met by I5 (I0 M̄ I5 ) I0 after I6 (I0 B̄ I6 ) Fig. 2. The thirteen Allen relations between intervals It is easy to see that in order to verify these policies through the acquired data, both the start points and the end points of every pair of tuples come into play. 2 Interval-based functional dependencies Given a totally ordered set O = �O, ≤�, an interval I over O is a pair I = [b, e] where b, e ∈ O and b ≤ e. For any interval I = [b, e] over O let points(I) denote the set of points in O between b and e: points(I) = {p � p ∈ O and b ≤ p ≤ e}. While the possible distinct relations between two points considering only the linear order are reduced to three (equality, successor, and predecessor), considering the order among the two endpoints of two intervals leads us to have thirteen possible relations. These relations are depicted in Fig. 2 according to the notation proposed by Allen in [1]. It is worth noting that every relation has its dual obtained by switching the position of the two intervals. Consider, for example, two intervals I1 = [b1 , e1 ] and I2 = [b2 , e2 ]: we have that I1 D I2 (I1 during I2 ), if and only if b2 < b1 < e1 < e2 . By reverting the arguments, we have that I2 D I1 (I2 contains I1 ), if and only if b2 < b1 < e1 < e2 , which is equivalent to I1 D I2 . More precisely, given two intervals I = [b, e] and I ′ = [b′ , e′ ] we say that: (1) I = I ′ i↵ b = b′ and e = e′ ; (2) I M I ′ i↵ e = b′ ; (3) I S I i↵ b = b and e < e ; ′ ′ ′ (4) I F I ′ i↵ b > b′ and e = e′ ; (5) I O I i↵ b < b and b < e < e ; (6) I D I ′ i↵ b′ < b and e < e′ ; ′ ′ ′ ′ (7) I B I ′ i↵ e < b′ . In discussing our new functional dependencies based on intervals within a relational framework, we use a simple temporal (relational) data model based on the concept of temporal relation. A temporal relation r is a relation on a tem- poral relation schema R defined on attributes U ∪ {B, E}, where U represents a 332 C. Combi and P. Sala. Keeping interval-based functional dependencies up-to-date set of atemporal attributes and B, E are the temporal attributes describing the valid interval of a tuple. We assume that the domain of both attributes B and E is a totally ordered set O. Clearly, a tuple t ∈ r satisfies t[B] ≤ t[E]. We recall that, assuming the underlying domain for attributes A1 and A2 has a total order, atomic formulas for comparing tuples are either of the form t[A1 ] ✓ t′ [A2 ] or of the form t[A1 ] ✓ c, with ✓ ∈ {=, ≠, <, ≤, >, ≥}, A1 , A2 being attribute names, c a constant value and t, t′ tuples of relation r. To avoid ambiguities in the terminol- ogy employed, in the following we will use (temporal) instance for “(temporal) relation” and will let relation refer to Allen’s interval relations. 2.1 Interval-based temporal functional dependencies Let us now consider the basic definition of an Interval-based Temporal Functional Dependency (ITFD). In the following, we will only deal with interval relations in the set A = {S, F, B, M, D, O, =}. Indeed, in this case it is not meaningful to distinguish between a relation and its dual, as it will be clear from the following definition of interval-based temporal functional dependency. Definition 1. Let X and Y be sets of atemporal attributes of a temporal relation schema R = R(U, B, E) and ∼ an Allen’s interval relation. An instance r of R satisfies an ITFD X →∼ Y if for each pair of tuples t1 and t2 such that [t1 [B], t1 [E]] ∼ [t2 [B], t2 [E]] and t1 [X] = t2 [X], it is also true that t1 [Y ] = t2 [Y ]. Basically, ITFDs group tuples whose B and E attribute values satisfy the interval relation ∼. In the above definition, all the possible tuples having as valid interval either [b, e] or [b′ , e′ ], where [b, e] ∼ [b′ , e′ ] are considered together. If there exist two tuples having their valid intervals related through the considered relation ∼, respectively, and both tuples agree on (the tuple of) values of atemporal attributes X, then the ITFD imposes that both tuples must agree on (the tuple of) values of atemporal attributes Y . As already mentioned, we focus only on (sub) set A of Allen’s interval rela- tions, without considering the dual ones. Indeed, dual relations are not needed for the specification and verification of ITFDs, because ITFDs are based on the equality of the considered (atemporal) values. Thus, each (ordered) pair of tu- ples satisfying an interval relation will satisfy also the dual one, where tuples will be considered in the pair with the opposite order. In other words, any ITFD with a given interval relation implies also the corresponding ITFD with the dual relation (and vice versa). Let us now consider the first requirement expressed in Example 1 of Sect. 1: it can be rephrased as “overlapping drug administrations for a given patient must have the same physician”. This constraint can be expressed by the ITFD PatId →O Phys. A time-oriented graphical account of tuples of relation PatTherapies is pro- vided in the lower part of Fig. 1. As we may notice, the instance satisfies ITFD 333 C. Combi and P. Sala. Keeping interval-based functional dependencies up-to-date PatId →O Phys only for tuples related to the patient with PatId = 1. Dr. Cox added a therapy antipyretics, but the related valid interval is contained in the in- terval of therapy antiviral prescribed by Dr. Dorian. Tuples related to therapies of patient with PatId = 2 instead do not satisfy ITFD PatId →O Phys, as both intervals of therapies prescribed by Dr. Reid overlap a therapy prescribed by another physician. This kind of property cannot be expressed with point-based TFDs. ∼ Verifying the satisfaction of X → Y may be considered in two di↵erent but intertwined ways: i) given an instance r of R, check whether or not r satisfies ∼ ∼ X → Y , ii) given an instance r of R satisfying X → Y and a tuple t, verify ∼ whether r ∪ {t} still satisfies X → Y . We call the first problem checking ITFD satisfaction, while the second one is called incremental ITFD verification. It is not difficult to see that these two problems are closely related. In fact, checking ITFD satisfaction reduces to the incremental ITFD verification by adopting the algorithm developed for this problem and, starting from i = 0 with instance r0 = � with schema R, incrementally verifying whether ri ∪ {ti } with ti ∈ r � ri satisfies ∼ ∼ ITFD X → Y . If the update of ri with ti still verifies X → Y , then ri+1 = ri ∪{ti }, ∼ i = i + 1 and the algorithm is applied again. If r satisfies X → Y , after �r� iterations we can determine ITFD satisfaction. Some complexity improvements to this naive approach can be done as shown in Table 1. ITFD tuple insertion tuple deletion ITFD satisfaction checking X →S Y O(log(�r�)) O(log(�r�)) O(�r� ⋅ log(�r�)) X →F Y O(log(�r�)) O(log(�r�)) O(�r� ⋅ log(�r�)) X →B Y O(log(�r�)) O(log(�r�)) O(�r� ⋅ log(�r�)) X →M Y O(log(�r�)) O(log(�r�)) O(�r� ⋅ log(�r�)) X →D Y O(log(�r�)) O(�r�) O(�r� ⋅ log(�r�)) X →O Y O(log(�r�)) O(�r�) O(�r� ⋅ log(�r�)) Table 1. The complexities for the tuple insertion, deletion, and ITFD satisfaction checking, by our proposed incremental verification algorithm of ITFDs References [1] James F. Allen. Maintaining knowledge about temporal intervals. Commun. ACM, 26(11):832–843, 1983. [2] Carlo Combi, Elpida Keravnou-Papailiou, and Yuval Shahar. Temporal Information Systems in Medicine. Springer-Verlag New York, Inc., New York, NY, USA, 2010. [3] Carlo Combi and Pietro Sala. Temporal functional dependencies based on interval relations. In Carlo Combi, Martin Leucker, and Frank Wolter, editors, TIME, pages 23–30. IEEE, 2011. [4] Carlo Combi and Pietro Sala. Interval-based temporal functional dependencies: specification and verification. Annals of Mathematics and Artificial Intelligence, pages 1–46, 2013. 334