=Paper= {{Paper |id=Vol-2214/paper1 |storemode=property |title=An Algorithm for Verifying Approximate Pure Evolving Functional Dependencies |pdfUrl=https://ceur-ws.org/Vol-2214/paper1.pdf |volume=Vol-2214 |authors=Pietro Sala |dblpUrl=https://dblp.org/rec/conf/cilc/Sala18 }} ==An Algorithm for Verifying Approximate Pure Evolving Functional Dependencies== https://ceur-ws.org/Vol-2214/paper1.pdf
    An algorithm for verifying Approximate Pure
         Evolving Functional Dependencies‹

                                      Pietro Sala

             Department of Computer Science, University of Verona, Italy
                             pietro.sala@univr.it



        Abstract. Fuctional dependencies (FDs) form the foundations of data-
        base theory since the beginning of time, given they may represent con-
        straints such as “the employee role determines his monthly salary”. If
        we consider temporal databases where each tuple is timestamped with a
        valid time (V T ) attribute, finer constraints may be imposed on the evo-
        lution of the data such as “the employee role before a promotion and his
        role afterwards determine his new monthly salary”. Such constraints are
        called Pure Evolving Functional Dependencies (PEFDs) according to the
        classification introduced in [3]. By adding approximation to such rules
        they may be used for extracting knowledge in place of imposing con-
        straints. Then we may want to discover that “the employee role before a
        promotion and his role afterwards generally determine his new monthly
        salary”. This kind of approximate dependencies are called Approximate
        Pure Evolving Functional Dependencies (APEFDs). Verifying whether or
        not a given APEFD holds over a database instance is an NP-Complete
        problem [4]. Despite the discouraging complexity, in this work we pro-
        pose an algorithms that solve in an efficient way this problem by trying
        to divide-and-conquer and prune the search space as much as possible.

        Keywords: Temporal Databases · Approximate Functional Dependen-
        cies · Data Mining.


1     Temporally Mixed Functional Dependencies
According to [3], in the following we study a family of temporal functional de-
pendencies [2, 5, 6] i.e., the Pure Evolving Functional Dependencies, called PEFD.
From now on our temporal functional dependencies will be given over a temporal
schema R “ U Y tV T u, where U is a set of atemporal attributes and V T is a
special attribute denoting the valid time of each tuple. Hereinafter we assume
tuples timestamped with natural numbers (i.e, DompV T q “ N). Let J Ď U be
a non-empty subset of U . We define the set W as W “ U zJ and set W , which
is basically a renaming of attributes in W . Formally, for each attribute A P W ,
we have A P W (i.e., W “ tA : A P W u). Given an instance r of R, we define an
instance τJr of schema Rev “ JW W tV T, V T u as follows:
‹
    This work has not been previously published and it is an extension of the results
    published in [4].
                   $ ˇ       ¨                                                      ˛,
                   ’
                   ’
                     ˇ
                     ˇ          rptq^rpt1 q^trJs“t1 rJs“urJs^urW s“trW s^             /
                   &         ˚                                                      ‹/
                     ˇ       ˚ urW s“t1 rW s^trV T s“urV T s^t1 rV T s“urV T s ‹ .
            τJr “ u ˇˇ Dt,t1 ˚
                             ˚
                                                                                    ‹
                                                                                    ‹
                 ’
                 ’   ˇ       ˚
                             ˝                 ^trV T săt1 rV T s^                  ‹/
                                                                                    ‚/
                 % ˇ                                                                  -
                               @t2 pprpt2 q^trV T săt2 rV T sqÑt1 rV T sďt2 rV T sq

     Schema Rev is called the evolution schema of R. We will denote by τJR the
view on R that is built by expression τJr for every instance r of R. View τJR
joins two tuples t1 and t2 that agree on the values of the attributes in J (i.e.
t1 rJs “ t2 rJs) and t1 rV T s ă t2 rV T s. Moreover, such tuples are joined if does
not exist a tuple t P r with trJs “ t1 rJs and t1 rV T s ă trV T s ă t2 rV T s (i.e.,
there exists a tuple that holds at some point in between the valid times of such
tuples). For application purposes, it is correct correct to consider in a evolution
schema only those pair of consecutive tuples whose the difference between V T
and V T respects some given bound. Given a parameter k P N Y t`8u, tuples
of τJr are filtered by means of the selection ∆k pτJr q “ σtrV T s´trV T sďk pτJr q (notice
that ∆`8 pτJr q “ τJr ). ∆k pτJr q forces to consider only those tuples belonging to τJr
having a temporal distance within the given threshold k. In the following, given
a tuple t P τJr , we denote its temporal distance trV T s ´ trV T s with ∆ptq.
     A Pure Temporally Evolving Functional Dependency [3] over the temporal
schema R “ U Y tV T u, PEFD for short, is an expression of the form
                                     r∆k pτJR qsXY Ñ Z.
    We have that X Ď W and Y , Z Ď W with X ‰ H and |Z| “ 1 (Z contains a
single attribute). We say that an instance r of R fulfills a PEFD r∆k pτJR qsXY Ñ
Z, written r |ù r∆k pτJR qsXY Ñ Z, if and only if for each pair of tuples t, t1 P
∆k pτJr q we have trXs “ t1 rXs ^ trY s “ t1 rY s Ñ trZs “ t1 rZs.
    Now add approximation to PEFD in a very standard way [7, 2]. First, we
provide measurement G to deal with PEFD as follows:
    Gpr∆k pτJR qsXY Ñ Z, rq “ |r| ´ maxt|s| : s Ď r, s |ù r∆k pτJR qsXY Ñ Zu.
    By means of G we can define the relative scaled measurement g for TM-FD
as follows:
                                                  Gpr∆k pτJR qsXY Ñ Z, rq
               gpr∆k pτJR qsXY Ñ Z, rq “                                  .
                                                              |r|
    Now we are ready to define the Approximate Pure Temporally Evolving Func-
tional Dependency (APEFD for short). Formally, given a real number 0 ď  ď 1,
                                                                    
we say that an instance r of R satisfies the APEFD r∆k pτJR qsX Y Ñ Z, written
                    
r |ù r∆k pτJR qsXY Ñ Z, if and only if gpr∆k pτJR qsXY Ñ Z, rq ď .
    In Section 2 we provide an algorithm for checking an APEFD against an
instance r, we call this problem Check-APEFD:
Problem 1. (Check-APEFD). Given a temporal schema R, a PEFD r∆k pτJR qs
XY Ñ Z on R, an instance r of R, and a real number 0 ď  ď 1 determine
                                  
whether or not r |ù r∆k pτJR qsXY Ñ Z.
             #           Name     P hys     CT Durpminutesq      VT
             1        M cM urphy Sayer      self   „15       1 Jan 2016
             2        M cM urphy Sayer f amily     „5        5 Jan 2016
             3        M cM urphy M aguire f amily  „15      10 Jan 2016
             4        M cM urphy M aguire self     „15      15 Jan 2016
             5        M cM urphy M aguire self     „5        20 Jan 2016
          r“ 6        M cM urphy M aguire self     „5       25 Jan 2016
             7           Lowe     Sayer f amily    „15       7 Jan 2016
             8           Lowe     Sayer f amily    „15      15 Jan 2016
             9           Lowe    M aguire self     „15      22 Jan 2016
             10          Lowe     Sayer     self   „5       28 Jan 2016
             11          Lowe    M aguire self     „15       3 Feb 2016
             12          Lowe     Sayer     self   „5       6 Feb 2016

Fig. 1. An instance r of schema Contact that stores the phone contacts about two
psychiatric cases. Attribute # represents the tuple number and it is used only for
referencing tuples in the text (i.e., # does not belong to the schema Contact).




                                                   r
                                                  τName

   #       Name        P hys      CT       Dur       VT       P hys CT Dur             VT
  1, 2   M cM urphy    Sayer      self     „15   1 Jan 2016 Sayer f amily „5        5 Jan 2016
 2, 3    M cM urphy    Sayer     f amily   „5    5 Jan 2016 M aguire f amily „15   10 Jan 2016
  3, 4   M cM urphy M aguire f amily       „15   10 Jan 2016 M aguire self „15     15 Jan 2016
  4, 5   M cM urphy M aguire      self     „15   15 Jan 2016 M aguire self   „5    20 Jan 2016
  5, 6   M cM urphy M aguire      self     „5    20 Jan 2016 M aguire self   „5    25 Jan 2016
  7, 8      Lowe       Sayer     f amily   „15   7 Jan 2016 Sayer f amily „15      15 Jan 2016
  8, 9      Lowe       Sayer     f amily   „15   15 Jan 2016 M aguire self „15     22 Jan 2016
 9, 10      Lowe      M aguire    self     „15   22 Jan 2016 Sayer self      „5    28 Jan 2016
10, 11      Lowe       Sayer      self     „5    28 Jan 2016 M aguire self „15     3 Feb 2016
11, 12      Lowe      M aguire    self     „15   3 Feb 2016 Sayer self       „5    6 Feb 2016

                                                            r
                          Fig. 2. The evolution expression τN ame .
    Now we consider a scenario, borrowed from the clinical domain, in order to
provide examples of how PEFDs and APEFDs work. In particular, this scenario
is taken from psychiatric case register. Let us consider the temporal schema
Contact “ tN ame, P hys, CT, Duru Y tV T u. Such a schema stores values about
a phone-call service provided to psychiatric patients. This service is intended for
monitoring and helping psychiatric patients, who are not hospitalized. Whenever
a patient feels the need to talk to a physician, he can call the service. Data
about calls are collected according to schema Contact. For the sake of simplicity,
temporal attribute V T identifies the day when the call has being received, not
the exact time the call has begun which is usually its semantics in real world
applications. In addition, the service may be used by people somehow related
to patients, as, for instance, relatives afraid of current conditions of a patient.
More precisely, attribute N ame identifies patients, P hys identifies physicians,
CT (Contact Type) identifies the physical person who is doing the call (e.g.,
value ’self’ stand for the patient himself, ’family’ for a relative), and Dur stores
information about total duration of calls (value „ n means approximately n
                                                                        r            r
minutes). An instance r of R is provided in Figure 1. Instance τN         ame , and τJ
in general, may be seen as the output of a two-phase procedure. First, table
Contact is partitioned into subsets of tuples, one for each value of N ame. Then,
each tuple is joined with its immediate successor in its partition, w.r.t. V T
                               r
values. The whole relation τN    ame is provided in Figure 2. In the following we
will use t for referencing tuples of r and u for referencing tuples of τJr . Moreover,
in the following each tuple u in τJr will be identified by the pair of indexes of the
tuples in r that generate u. For instance, the first tuple of τJr in Figure 2 will be
denoted by u1,2 since it is generated by the join of tuples t1 and t2 in r.

     Going back to our example, it is worth noting that tuples t2 and t7 are
                       r
not joined in τN         ame , even if t7 rV T s “ t2 rV T s ` 2 and there is no tuple t
with trV T s “ t7 rV T s ` 1. This is due to the fact that t7 rNames ‰ t2 rNames
                                r                                                                r
forbids the join in τN            ame . Moreover, t1 and t3 are not joined in τName . In-
deed, the presence of tuple t2 with t1 rNames “ t2 rNames “ t3 rNames and
                                                               r
t1 rV T s ă t2 rV T s ă t3 rV T s forbids the join in τName           . Figure 3 graphically depicts
how pairs of tuples pt1 , t2 q, pt2 , t3 q, pt3 , t4 q, pt4 , t5 q, pt5 , t6 q and pt7 , t8 q, pt8 , t9 q, pt9 ,
                                                 r
t10 q, pt10 , t11 q, pt11 , t12 q are joined in τName   . In both scenarios depicted in Figure
3, nodes represent tuples and are labeled by the corresponding tuple number.
Values for attribute Dur are reported above each node. Values of P hys and
CT attributes are reported below every node, respectively. Every edge pti , tj q is
labeled by value ∆pui,j q “ tj rV T s ´ ti rV T s (i.e., the temporal distance between
                                                   r
two tuples). Basically, each tuple u P τN            ame corresponds to an edge in Figure 3
while we have a node for each tuple in r.
                                                 Contact
     In our example, we have that r |ù r∆5 pτName        qsP hys, P hys Ñ CT . How-
                Contact
ever, r * r∆6 pτName qsP hys, P hys Ñ CT , because of pairs pt2 , t3 q and pt10 , t11 q.
More precisely, we have that t2 rNames “ t3 rNames “ M cM urphy, t10 rNames “
t11 rNames “ Lowe, t2 rP hyss “ t10 rP hyss “ Sayer, t3 rP hyss “ t11 rP hyss “
M aguire, but t3 rCT s ‰ t11 rCT s (i.e., t3 rCT s “ f amily, and t11 rCT s “ self ).
                                          M cM urphy

                                        10

    „15     4 days
                       „5      5 days
                                        „15   5 days
                                                       „15     5 days
                                                                        „5    5 days
                                                                                        „5
     t1                t2                t3             t4              t5              t6
    Sayer             Sayer         M aguire         M aguire       M aguire       M aguire
     self            f amily        f amily            self           self           self

                                              Lowe


    „15               „15               „15             „5              „15             „5
            8 days             7 days         6 days           6 days         3 days
     t7                t8                t9             t10             t11             t12
  Sayer               Sayer         M aguire           Sayer        M aguire           Sayer
 f amily             f amily          self              self          self              self

                                                                 r
                        Fig. 3. A graph-based representation of τN ame .




In other words, we have that the set of tuples tu2,3 , u10,11 u does not satisfy the
FD P hysP hys Ñ CT .
    Let us observe that the two proposed PEFDs differ only for the maximum
temporal distance allowed. In particular, tuple u10,11 is responsible for r |zù
       Contact                                                             Contact
r∆6 pτName     qsP hys, P hys Ñ CT and it does not belong to ∆5 pτName             q because
                                                 Contact
of ∆pu10,11 q ą 5 then we have r |ù r∆5 pτName qsP hys, P hys Ñ CT . This allows
us to point out a general property of PEFDs, such a property holds for APEFDs,
too. Given a PEFD r∆k pτJR qsXY Ñ Z we have that for every instance r of R such
that r |ù r∆k pτJR qsXY Ñ Z then for every h ď k we have r |ù r∆h pτJR qsXY Ñ
Z.
                                                           R                    
    In our example, if we consider APEFD r∆6 pτName            qsP hys, P hys Ñ CT with
      1                                  R                      
 “ 12  , we have that r |ù r∆6 pτName        qs P hys, P hys Ñ CT . It is worth noting
                                      1
that, by considering relation r “ rztt3 u, this dependency would hold without
the need of approximation (i.e., if tuple t3 is deleted from relation r). More pre-
                  r1         r
cisely, we have τName    “ τName    ztu2,3 , u3,4 uYtu2,4 u. Notice that tuple u2,4 was not
                r
originally in τName because of tuple t3 . Figure 3 depicts this new scenario, by re-
placing edges pt2 , t3 q and pt3 , t4 q with the dashed edge pt2 , t4 q. Moreover, we have
                      R                                                       R
that r1 |ù r∆`8 pτName      qsP hys, P hys Ñ CT . Thus, r |ù r∆`8 pτName             qsP hys,
                          1
P hys Ñ CT with  “ 12 .


2     An algorithm for checking APEFDs

As we proved in [4], the problem Check-APEFD given an APEFD rτJR , swpkqsXY
 
Ñ Z and an instance r of R is NP-Complete in |r|. This is not an uncommon
situation in the realm of temporal functional dependencies (see [8, 1, 9, 7] for
further examples).
     Then, in principle, there is no asimptoptycally better algorithm than explore
                                                                1
the whole set of of possible subsets r1 of r with |r|´|r    |r|
                                                                  |
                                                                     ď . However, in the
following, we provide an algorithm that make use of heuristics for pruning the
search space in order to achieve the tractability for many cases.
     Such algorithm is general and it may be applied under no assumptions on
the input instance r, it makes use of two optimization techniques. The first op-
timization technique consists of trying, whenever is possible, to split the current
subset of r in two subsets on which the problem may be solved independently
(i.e., choices in one subset do not affect choices in the other one and viceversa).
The latter optimization technique consists of checking if the current partial so-
lution may not lead to an optimal solution (i.e., a solution r1 where |r1 | is the
maximum possible number of tuples that may be kept) if this happen the subtree
is pruned immediately (i.e., we are looking only for optimal solutions).
     Our algorithm relies on the concept of color that we will explain through an
                                                                    
example in the following. Given APEFD rτJR , swpkqsXY Ñ Z and an instance r
of R, let us suppose that we are solving problem Check-APEFD on such instance
with a simple guess-and-check procedure that make use of two, initially empty,
subset r` (the tuples to be kept in the solution) and r´ (the tuples to be deleted
in the solution) of r. At each step the procedure guess a tuple t in rzpr` Yr´ q and
decides non-deterministically (guessimg phase) either to update r` to r` Y ttu
(i.e., t is kept in the current partial solution) to update r´ to r´ Y ttu (i.e, t
is deleted in the current partial solution). When r “ r` Y r´ (check phase),
the procedure returns Y ES if r` |ù rτJR , swpkqsXY Ñ Z and |r´ | ď  ¨ |r|,
otherwise it returns N O. From now on, for us a partial solution is a triple
pr, r` , r´ q such that pr` Y r´ q Ď r and r` X r´ “ H, if r “ r` Y r´ we simply
say that pr` Y r´ q is a solution. A solution is consistent pr, r` , r´ q if and only if
                           
r` |ù rτJR , swpkqsXY Ñ Z. Given two partial solution pr, r`               ´            ` ´
                                                                      1 , r1 q and pr, r2 , r2 q
we say that pr, r2 , r2 q extends pr, r1 , r1 q if and only if r1 Ď r2 and r1 Ď r´
                    ` ´                 ` ´                         `       `       ´
                                                                                            2.
     Is there a way to check if we are generating an inconsistent solution possibly
without guessing all the tuples in r? Violations of the latter constraint (i.e.,
|r´ | ď  ¨ |r|) are fairly simple to detect during the guessing phase, it suffices to
check after each insertion in r´ if r´ exceeds  ¨ |r|, if it is the case the procedure
may return N O immediately without guessing any further. Violations of the
first constraint (i.e., r` |ù rτJR , swpkqsXY Ñ Z) during the guessing phase are
trickier to detect, then we need the following additional definitions. From now on
when two tuples t, t1 share the same value for the attribute J (i.e., trJs “ t1 rJs)
we will say that they are in the same J-group and we will say that trJs is the
value of the J-group containing t and t1 . For the sake of brevity, for a given
j P DompJq we will use j-group for denoting the J-group with value j. An
ordered pair, written o-pair, is a pair pt, t1 q P r ˆ r such that t and t1 are in the
same J-group and trV T s ă t1 rV T s. Given an o-pair t, t1 P r we say that the pair
pt, t1 q is an edge if and only if 0 ă t1 rV T s ´ trV T s ď k. Given triple pr, r` , r´ q,
an o-pair pt, t1 q P r is active if and only if t, t1 P r` and for every tuple t in
the same J-group of t if trV T s ă t ă t1 rV T s we have t P r´ (i.e., t and t1 are
selected in the current partial solution and all the tuples between t and t1 in
the same J-group are deleted). Given two valid times vt, vt1 P DompV T q and
a value j P DompJq we say that vt and vt1 are consecutive in j-group if and
only if there exists an active o-pair pt, t1 q with trV T s “ vt, t1 rV T s “ vt1 , and
trJs “ j. Notice that we may have two distinct values j, j 1 P DompJq and two
distinct valid times vt, vt1 P DompV T q which are consecutive in j-group and not
consecutive in j 1 -group. Moreover, we may have edges pt, t1 q that are not active
and active pairs pt, t1 q that are not edges, such is the case of active pairs pt, t1 q
with t1 rV T s´trV T s ą k. A color is a tuple c on the schema C “ XY Z, two colors
px, y, zq and px1 , y 1 , z 1 q are conflicting if and only if x “ x1 , y “ y 1 and z ‰ z 1 .
Given an o-pair pt, t1 q, denoted by cpt, t1 q is the tuple cpt, t1 q “ ptrXs, trY s, t1 rZsq.
Two o-pairs pt, t1 q, pt2 , t3 q are conflicting if and only if cpt, t1 q and cpt2 , t3 q are
conflicting. The following result it is easy to prove and its proof is left as exercise
for the reader.
                                                         
Theorem 1. Given an APEFD rτJR , swpkqsXY Ñ Z, an instance r of R, and a
partial solution pr, r` , r´ q, if there exists two active edges pt, t1 q and pt2 , t3 q
then every solution pr, r` f , r´ f q that follows pr, r` , r´ q satisfy r`
                                                                          f  z|ù rτJR ,
swpkqsXY Ñ Z (i.e., is inconsistent).

     The above theorem guarantees that from a partial solution pr, r` , r´ q that
feature at least two conflicting edges we cannot reach a solution pr, r` f , r´ f q
that satisfy the first constraint and in such a case we may return immedi-
ately N O without going any further from pr, r` , r´ q. The colours of a par-
tial solution pr, r` , r´ q is the set colourspr, r` , r´ q “ tptrXs, t1 rY s, t1 rZsq :
pt, t1 q is an active edge in pr, r` , r´ qu. It is easy to see that the hypothesis of
Theorem 1 applies if and only if colourspr, r` , r´ q contains at least two conflict-
ing colours. Then, by means of colours, our above guess-and-check procedure
may be improved by adding the control on the size of r´ and by keeping up-
dated the current set of colours colourspr, r` , r´ q. Once an insertion of a tuple
in either r` or r´ introduce a color c that is conflicting with at least one colour
in pr, r` , r´ q the procedure answers N O immediately.
     An example of how the procedure works is given in Figure 4 where we have
an instance of 5 tuples with  “ 0.2 (i.e., we may delete at most one tuple) and
swpkq “ 6 (all the tuples are in the same window). The execution depicted in
Figure 4 guesses the values of the tuples from the oldest (t1 ) to the newest one
(t2 ) according to the value of V T . First it tries to put the current tuple t in r` if
no violation arises it continues, if some violation arises it tries to insert the tuple
t in r´ if no violation arises it continues otherwise it goes back to the previous
choice (i.e., backtracking). Every internal node is labelled with the current tuple
which will be guessed next, every leaf is labelled either with Y ES (i.e., the
current branch is a solution) or N O (i.e, a violation has arisen), the current set
of colours is reported within the node. Nodes are numbered according to their
order of appearance. We have that the root is n1 followed by the introduction of
the nodes n1 . . . n4 in this precise order. If we introduce t4 in the partial solution
associated to n4 we violate the first constraint. Since in n4 adding t4 in r´ does
not generate any violation, node n5 is created as child of n4 . However, node
                                                            px1 ,y1 ,z2 q

                                         px1 ,y1 ,z2 q                       px2 ,y1 ,z2 q

                         px1 ,y1 ,z2 q                      px2 ,y1 ,z2 q                    px1 ,y1 ,z2 q




x1 |y1 |z1 px1 ,y1 ,z1 q x2 |y1 |z1 px2 ,y1 ,z2 q x1 |y1 |z2 px1 ,y1 ,z2 q x2 |y1 |z2 px2 ,y1 ,z2 q x1 |y1 |z2
                                                   n1 t1 ,tu


                                                             t1 Pr`

                                                   n2 t2 ,tu

                                                                                                             # J X Y    Z VT
                                                             t2 Pr`
                                                                                                             1 j1 x1 y1 z1   1
                                              n3 t3 ,tpx1 ,y1 ,z1 qu
                                                                                                             2 j1 x2 y1 z1   2
                                              t3 Pr`                    t3 Pr´                               3 j1 x1 y1 z2   3
                            "               *
                     n4 t4 , px1 ,y1 ,z1 q,                              n6 t4 ,tpx1 ,y1 ,z1 qu              4 j1 x2 y1 z2   4
                              px2 ,y1 ,z2 q
                                                                                                             5 j1 x1 y1 z2   5
                          t4 Pr`
                                            t4 Pr´                                           t4 Pr`
                       NO                                      n5
                conflict between                        "
                                                            px1 ,y1 ,z1 q,
                                                                             *             "               *
          px1 ,y1 ,z1 q and px1 ,y1 ,z2 q        t5 ,                               n7 t5 , px1 ,y1 ,z1 q,
                                                            px2 ,y1 ,z2 q                    px2 ,y1 ,z2 q

                                               t5 Pr`               t5 Pr´                       t5 Pr`

                                         NO                                  NO                   Y ES
                                conflict between                        |r´ |ą1          r“r` Yr´ , |r´ | ď 1, "
                          px1 ,y1 ,z1 q and px1 ,y1 ,z2 q                                                                       *
                                                                                                                 px1 ,y1 ,z1 q,
                                                                                         colourspr, r` , r´ q “  px2 ,y1 ,z2 q




Fig. 4. An example of how the use of colors improve a guess and check procedure for
solving the problem Check-APEFD.



n5 cannot be extended without introducing a violation in the above constraints
because if put t5 in r` we introduce a conflicting color, while if we put t5 in r´
we exceed the maximum number of allowed deletions. We backtrack to n4 and
we have that even for it all the possible choices have been explored and thus we
backtrack to n3 where the choice of adding t3 to r´ is attempted generating the
node n6 . From n6 we put t5 in r` without violating any constraint and thus we
                                              0.2
have that tt1 , t2 , t4 , t5 u |ù rτJR , 6sXY Ñ Z.
    Let us consider deeper in the description of the first algorithm. On an higher
lever the algorithm operates like the procedure above apart from some trivial
technicalities. However, two more heuristics are introduced in order to possibly
stop early during the exploration of a branch in the tree of computation. The
main procedure of the algorithm is reported in Figure 7 with auxiliary procedures
reported in Figure 5 and in Figure 6. The algorithm is implemented by the
function T upleW iseM in that takes 4 arguments. The first argument is Gr which
                                                                          
is a pre-processing of r based on the APEFD rτJR , swpkqsXY Ñ Z that has to be
checked. More precisely, Gr is an instance of the schema J, X, Y, Z, V T, count,
with Dompcountq “ N. We have that t P Gr if and only if there exists t1 P r
for which pt1 rJs, t1 rXs, t1 rY s, t1 rZsq “ ptrJs, trXs, trY s, trZsq and trcounts “ |tt1 P
r : pt1 rJs, t1 rXs, t1 rY s, t1 rZsq “ ptrJs, trXs, trY s, trZsqu|, that is, we count how
procedure InBetweenpGr , t1 , t2 q
           returns the subset of Gr consisting of all the tuples in the same J-group
 comment:
           of t1 and t2 that feature valid times between t1 rV T s and t2 rV T s.
 return tt P Gr : t1 rV T s ă trV T s ă t2 rV T s ^ trJs “ t1 rJsu

procedure EdgeConflict?ppt1 , t2 q, pt11 , t12 qq
 comment: returns true if and only if the two input edges feature conflicting colors.
 if t1 rXs “ t11 rXs ^ t2 rY s “ t12 rY s ^ t2 rZs ‰ t12 rZs
   then return true
   else return false

procedure ColorConflict?ppt1 , t2 q, Cq
           returns true if and only if the color of the edge pt1 , t2 q is conflicting
 comment:
           with at least one color in the set of colors C.
 if Dcpc P C ^ t1 rXs “ crXs ^ t2 rY s “ crY s ^ t2 rZs ‰ crZsq
   then return true
   else return false

procedure E!pGr , k, G`         ´
                          r , Gr q
           returns the set of all and only active edges in the current partial
 comment:
           solution.
       "                                                               *
                     t1 rJs “ t2 rJs ^ 0 ă t2 rV T s ´ t1 rV T s ď k^
 return pt1 , t2 q :
                     tt1 , t2 u Ď G`                                 ´
                                   r ^ InBetweenpGr , t1 , t2 q Ď Gr q


procedure E?pGr , k, G`        ´
                          r , Gr , Cq
           returns the set of all and only pending edges in the current partial
 comment:
           solution.
       $                                                                                    ,
       ’
       ’             t1 rJs “ t2 rJs ^ 0 ă t2 rV T s ´ t1 rV T s ď k ^ tt1 , t2 u X G´
                                                                                     r “ H^ /
                                                                                            /
                     ^ ColorConf lict?ppt1 , t2 q, Cq^
       &                                                                                    .
 return pt1 , t2 q :                                 `
       ’
       ’             ^InBetweenpGr , t1 , t2 q X Gr “ H^                                    /
                                                                                            /
                     ^ptt1 , t2 u Y InBetweenpGr , t1 , t2 qq Ę pG`        ´
                                                                    r Y Gr q
       %                                                                                    -


procedure Reach?pt1 , t2 , N odes, Edgesq
           returns true if and only if there exists a path from t1 to t2 in the graph
 comment: pN odes, Edgesq. It is a function that checks wether or not there exits
           a path between two nodes in a graph.
 if Dptt1 . . . tm u Ď N odesqpt1 “ t1 ^ tm “ t2 ^ @p1 ď i ă mqppti , ti`1 q P Edgesqq
   then return true
   else return false


   Fig. 5. Auxiliary procedures used by procedures presented in Figures 6 and 7.
procedure GroupIndependent?pj, Gr , k, G`         ´
                                             r , Gr , Cq
           returns true if and only if in the current partial solution pending edges
 comment: involving tuples belonging to the J-group with value j are not
           conflicting with the pending edges introduced by the others J-groups.
 Ej Ð tpt1 , t2 q P E?pGr , k, G`          ´
                                     r , Gr , Cq : t1 rJs “ ju
                        `       ´
 E j Ð E?pGr , k, Gr , Gr , CqzEj
 if Dt1 Dt2 Dt1 Dt2 ppt1 , t2 q P Ej ^ pt, t2 q P E j ^ EdgeConf lict?ppt1 , t2 q, pt1 , t2 qqq
   then return false
   else return true

procedure MacConsistentSubsetpEq
           returns the maximal subset E 1 of E such that d for every edge
 comment: pt11 , t12 q P E 1 and for every edge pt1 , t2 q P E we have that cpt11 , t12 q and
           cpt1 , t2 q are not conflicting.
 E1 Ð H                         "
                                 if @t1 @t2 ppt1 , t2 q P E Ñ EdgeConf lict?ppt1 , t2 q, pt1 , t2 qqq
 for each pt1 , t2 q P E do
                                   then E 1 Ð E 1 Y tpt1 , t2 qu
 return E 1

procedure ReplacePath?pt, t, Gr , k, G`      ´
                                        r , Gr , Cq
           returns true if and only if in the current partial solution the consecutive
 comment:
           valid times trV T s and trV T s in trJs-group may be safely replaced.
 if   Dt1 pt1 P Gr ^ t1 rJs “ trJs ^ trV T s ă t1 rV T s ă trV T sq
   then return false
 Ns Ð tt1 P G`    r : t1 rJs “ trJs ^ t1 rV T s “ trV T su
 Ne Ð tt1 P G`    r : t1 rJs “ trJs ^ t1 rV T s “ trV T su
 Nm Ð tt1 P G´     r : t1 rJs “ trJs ^ trV T s ă t1 rV T s ă trV T su
 N Ð Ns Y Ne Y Nm
 Ẽ Ð tpt1 , t2 q : t1 P N ^ t2 P N ^ t1 rV T s ă t2 rV T s ^ pt1 R Ns _ t2 R Ne qu
 P airs Ð tpt1 , t2 q : t1 P N ^ t2 P N ^ t1 rV T s ` k ă t2 rV T s ^ pt1 R Ns _ t2 R Ne qu
 Ẽok Ð pM axConsistentSubsetpE?pGr , k, G`                  ´
                                                        r , Gr , Cqq X Ẽq Y P airs
 N otSaf e Ð H
 for each
        $ pt1 , t2 q P ẼzẼok                       `  ´                      `  ´
        &for each " pt1 , t2 q P pE?pGr , k, Gr , Gr , Cq Y E!pGr , k, Gr , Gr q Y Ẽq
   do               if EdgeConf lict?ppt1 , t2 q, pt1 , t2 qq
        % do
                      then N otSaf e Ð N otSaf e Y tpt1 , t2 qu
 Ẽ Ð ẼzN otSafˆe                                                           ˙
                              t1 , t2 P Nm ^t1 P Ns ^t2 P Ne^
 if Dt1 Dt2 @t1 @t2
                       ^pt1 , t1 q, pt2 , t2 q P Ẽ ^Reach?pt1 , t2 , N, Ẽq
   then return true
 return false


      Fig. 6. Auxiliary procedures used by procedure T upleW iseM in (Figure 7).
Algorithm 2.1: TupleWiseMin(Gr , k, G`       ´
                                     r “ H, Gr “ H, C “ H)

procedure MaximalPaths?pGr , k, G`        ´
                                     r , Gr , Cq
           returns true if and only if in the current partial solution for every
 comment: J-group j-group every pair of consecutive valid times vt and vt1 in
           j-group cannot be safely replaced.
 if Dt1 Dt2 ppt1 , t2 q P E!pGr , k, G`    ´                                      ` ´
                                      r , Gr q ^ ReplaceP ath?pt1 , t2 , k, Gr , Gr Gr , Cqq
   then return false
   else return true

procedure IsConsistent?pGr , k, G`      ´
                                   r , Gr , Cq
           returns true if and only if the current partial solution features
 comment: consistent colors in C and for every J-group j-group every pair of
           consecutive valid times vt and vt1 in j-group cannot be safely replaced.
 if Dt1 Dt2 ppt1 , t2 q P E!pGr , k, G`     ´
                                       r , Gr q ^ ColorConf lict?ppt1 , t2 q, Cqq
   then return false
 if M aximalP aths?pGr , k, G`             ´
                                     r , Gr , Cq
   then return true
   else return false

main
                returns the minimum value m “ min ||Gr zG1r ||
 comment:                                       XY Zcount
                for G1r in tG2r Ď Gr : G1r |ù rτGr
                                                          sXY Ñ Zu.
 if Gr “ H then return ||G´     r ||
 if Djpj P$DompJq ^ GroupIndependent?pj, Gr , k, G`                ´
                                                              r , Gr , Cqq
          &Gr Ð ttˆP Gr : trJs “ ju                                              ˙
   then              T upleW iseCheckpGr , k, G`                    ´
                                                        r X Gr , Gr X Gr , Cq`
          %return                                            `          ´
                       T upleW iseCheckpGr zGr , k, Gr zGr , Gr zGr , Cq
 let t P Gr
                                 `           ´
 if IsConsistent?pG
          " 1        r zttu, k, Gr Y ttu, Gr , Cq
           C Ð C Y tpt1 rXs, t2 rY s, t2 rZsq : pt1 , t2 q P E!pGr zttu, k, G`         ´
                                                                             r Y ttu, Gr qu
   then                                                 `           ´    1
           mt Ð T upleW iseCheckpGr zttu, k, Gr Y ttu, Gr , C q
   else mt Ð `8
                                 `    ´
 if IsConsistent?pG
          " 1        r zttu, k, Gr , Gr Y ttu, Cq
           C Ð C Y tpt1 rXs, t2 rY s, t2 rZsq : pt1 , t2 q P E!pGr zttu, k, G`    ´
                                                                             r , Gr Y ttuqu
   then                                                 `    ´           1
           mzt Ð T upleW iseCheckpGr zttu, k, Gr , Gr Y ttu, C q
   else mzt Ð `8
 return minpmt , mzt q


Fig. 7. The main procedure for checking APEFDs in a tuple-wise fashion. Notice that
we use a compact notation for the recursive procedure which is initially called as
T upleW iseM inpGr , kq where when G`      ´
                                      r , Gr , and C are ommitted in the procedure call
they get their respective default values specified in the procedure declaration (i.e., H
for each of them in this case).
many tuples in r share the same values for attributes J, X, Y, and Z. The input
parameter k is the length for the sliding window grouping swpkq. The sets G`                   r
and G´    r , originally initialized to H, represent the tuples of Gr that are either
kept or deleted in the current solution respectively. On instances r of schema
R satisfying count P R we denote       ř      with ||r|| the sum on the count attribute for
the tuples in r (i.e., ||r|| “ tPr trcounts). Finally, C is a set of colors which is
initially set to H. A color c is a tuple on the schema X, Y, Z. As we will see, C
keeps track via colours of the constraints introduced so far in the construction
of the solution.
    The procedure T upleW iseM in returns the minimum number of tuples that
has to be deleted from r in order to obtain an instance r1 such that r1 |ù
rτJR , swpkqsXY Ñ Z. Then if such minimum is less or equal than  ¨ |r| we
                                                                                         
can conclude r |ù rτJR , swpkqsXY Ñ Z else we have r |ù               z rτJR , swpkqsXY Ñ Z.
Given Gr , G`          ´                                                           1
                 r , Gr , and a set of colours C we say that an edge pt, t q P Gr ˆ Gr
is pending if and only if the following conditions hold:
 1. t, t1 R G´                     1
                r and ptrXs, t rY s, zq R C for every z P DompZq;
 2. for every t with t rJs “ trJs and trV T s ă t2 rV T s ă t1 rV T s we have t2 R G`
                   2         2
                                                                                              r ;
 3. there exists t2 P pGr Y tt, t1 uqzpG´          r Y  G`
                                                         r q with t2
                                                                     rJs “  trJs  and trV T s ď
      t2 rV T s ď t1 rV T s.
Informally speaking, a pending edge is an edge that is not active in the current
partial solution but it may become active during the computation and, if it
happens, it introduces a new color in C. In our algorithm, pending edges for the
current partial solution are retrieved by the procedure E? while active edges are
retrieved by the procedure E!.
    The procedure T upleW iseM in (Figure 7) works as follows. If G`                    ´
                                                                                  r YGr “ Gr
it means that we have obtained a solution without violating any constraint and
thus we can return ||G´                                                           `     ´
                               r || (i.e., the number of deleted tuples). If Gr Y Gr ‰ Gr
                                                     `     ´
the algorithm guesses a tuple t P Gr zpGr Y Gr q and proceed as follows. First, it
checks if inserting t into G`        r does not cause any violation in the constraints, if so
it stores in mt the value of the recursive call to T upleW iseM in where t belongs
to G`  r and C has been updated accordingly. Notice that by inserting a tuple t
in G`  r the algorithm is asserting that t belongs to the current partial solution,
while by inserting t in G´         r the algorithm is asserting that t does not belong to
the current partial solution. If a constraint is violated the algorithm stores in mt
the value `8, which means that t may not be kept in the current solution.
    Then, it checks if inserting t into G´            r does not cause any violation in the
constraints, if so it stores in mzt the value of the recursive call to T upleW iseM in
where G´    r and C are updated accordingly. If a constraint is violated the algorithm
stores in mzt the value `8, which means that t must be kept in the current partial
solution. In procedure T upleW iseM in the only way in which a constraint may
be violated is that, after the insertion a tuple t in G`                    ´              1 2
                                                                r (resp. Gr ), an edge pt , t q
                                                 1     2     2
turns out to be active and its color pt rXs, t rY s, t rZsq turns out to be conflicting
with at least one color in C.
    The first one allows us to restrict the search space by splitting the problem
into independent sub-problems in a divide-and-conquer fashion. Let us suppose
that at a certain step of our computation there exists a value j P ttrJs : t P Gr u
                                                                          1
for which for each pair of conflicting pending edges pt, t1 q and pt, t q we have that
                           1                                         1
either all t, t1 , t, and t belong to j-group or all t, t1 , t, and t do not belong to j-
group (such condition is verified by sub-procedure GroupIndependent? reported
in Figure 6.). Let Gr “ tt : trJs “ ju, if every edge involving tuples in j-group
is not conflicting with every edge that may be introduced outside j-group then
we can split the problem into the two sub-problems pGr , k, G`                   ´
                                                                       r X Gr , Gr X Gr q
and pGr zGr , k, G`    r zGr , G´
                                r zGr q. Such  problems  are    independent and  may be
resolved in a separate way. The resulting value for the solution is the sum of the
values returned by T upleW iseM in applied to both the two sub-problems. Let
H “ |G´         `        ´                    ´    `    ´
        r zpGr Y Gr q| and h “ |tt P pGr zpGr Y Gr qq : trJs “ ju|, notice that, in
this case, the upper bound of the complexity at the current step of computation
drops from Op2H q to Op2H´h ` 2h q.

    The second optimization allows us to prune a sub-tree of computation even
before a contradiction arises. The second optimization implements a criteria to
detect, in many cases, if every possible solution that may be built starting from
the current partial one turns to be not minimal. Suppose that there exists an
active o-pair pt, t1 q in a partial solution pGr , G`    ´
                                                    r , Gr q such that there exists t P Gr
in the same J-group of t with trV T s ă trV T s ă t1 rV T s. By definition of active
                                                                        1
o-pair, we have that t belongs to G´       r as well as every tuple t in the same J-
                               1
group of t with trV T s ă t rV T s ă t1 rV T s, here the additional condition is that
there exists at least one of such tuples. Given a partial solution pGr , G`             ´
                                                                                   r , Gr q
                                    `   ´                          `    ´
we denote with colorspGr , Gr , Gr q the set colorspGr , Gr , Gr q “ tpx, y, zq :
there exists an active edge pt, t1 q with cpt, t1 q “ px, y, zqu. Let us define the set
of colors pendingpGr , G`         ´
                             r , Gr q “ tpx, y, zq : there exists a pending edge pt, t q
                                                                                         1
            1
 with cpt, t q “ px, y, zqu which collects all and only the colors that may be
introduced later on in the current computation.

   A color px, y, zq is safe in pvt, vt1 , jq if and only if one of the following three
conditions hold:

 1. px, y, zq P colorspGr , G`      ´
                              r , Gr q;
 2. every color px, y, z q in pendingpGr , G`
                         1                         ´              1
                                              r , Gr q satisfies z “ z (i.e., px, y, zq is a
    pending color and there is no pending color that is conflicting with px, y, zq);
 3. the color is not conflicting with any color in colorspGr , G`         ´
                                                                    r , Gr q Y pendingp
                                                         1                      2
    Gr , G`     ´                                                `     ´
           r , Gr q and do not exist two tuples t, t P pGr Y Gr q X tt P Gr :
     2                        2                             1
    t rJs “ j ^ trV T s ď t ď t1 rV T su such that pt, t q is an edge and the color
             1      1
    ptrXs, t rY s, t rZsq is conflicting with px, y, zq

Notice that the above three conditions imply that if a color is safe in pvt, vt1 , jq
then it is neither in conflict with a color colorspGr , G`      ´
                                                           r , Gr q nor with a color in
                `    ´
pendingpGr , Gr , Gr q, however this is just a necessary but not sufficient condi-
tion. Given a partial solution pGr , G`     ´                         1              1
                                       r , Gr q and the triple pvt, vt , jq a pvt, vt , jq-
                                                    `
replace DAG is a DAG pV, Eq where V “ tt P Gr : trV T s “ vt ^ trJs “ ju Y tt P
G´                      1                        `                 1
 r : vt ă trV T s ă vt ^ trJs “ ju Y tt P Gr : trV T s “ vt ^ trJs “ ju and
        "                                                                             *
                             ptrV T s ‰ vt _ t1 rV T s ‰ vt1 q ^ trV T s ă t1 rV T s^
          pt, t1 q P V ˆ V :
                                         cpt, t1 q is safe in pvt, vt1 , jq
  E“
                                               Y
        tpt, t1 q P V ˆ V : ptrV T s ‰ vt _ t1 rV T s ‰ vt1 q ^ t1 rV T s ´ trV T s ą ku

      A node t P V is a starting node (resp. ending node) if and only if vt ă
trV T s ă vt1 and for every t1 P V with t1 rV T s “ vt (resp. t1 rV T s “ vt1 ) we have
pt1 , tq P E (resp. pt, t1 q P E). A replace path is pvt, vt1 , jq-replace DAG pV, Eq is
any path t1 . . . tm in pV, Eq for whicht t1 is a starting node and tm is an ending
node. We say that vt and vt1 in j can be safely replaced if and only if there exists
a replace path in the pvt, vt1 , jq-replace DAG pV, Eq.
      Using the above definitions of replace DAGs/paths we can provide the fol-
lowing result.

Theorem 2. Given a partial solution pGr , G`       ´
                                              r , Gr q if there exists a group j with
two consecutive valid times vt and vt such that vt and vt1 can be safely replaced
                                      1

in j then every consistent solution that follows pGr , G`      ´
                                                          r , Gr q is not optimal.

    The proof of the theorem is very simple and left as exercise to the reader.
Let us suppose that t1 . . . tm is a replace path in the pvt, vt1 , jq-replace DAG,
it must exists by hypothesis and by definition we have t1 , . . . , tm P G´            r , it suf-
                                                         `    ´
fices to take any consistent solution pGr , Gr , Gr q that follows pGr , G`                    ´
                                                                                          r , Gr q
                `                        ´
and that pGr , Gr Y tt1 , . . . , tm u, Gr ztt1 , . . . , tm uq is still a consistent solution,
non-optimality immediately follows. We take advantage of Theorem 2 by prun-
ing every computation rooted in a partial solution pGr , G`                     ´
                                                                          r , Gr q that features
                                                                              1
a J-group j-group and two consecutive valid times vt and vt in j-group such
that vt and vt1 can be safely replaced in j. Notice that verify wether or not
such condition applies may be performed in polynomial time. In the procedure
T upleW iseM in, this optimization is realized by the joint work of sub-procedures
M aximalP aths? and ReplaceP ath? reported in Figure 7 and 6 respectively.



3    Conclusion

In this work we propose an algorithm for solving the problem of verifying whether
or not a given APEFD EF holds over a given instance r of a temporal schema.
Such problem is known to be NP-Complete in the size of r (i.e., data-complexity).
We expect our algrithm to perform considerably better w.r.t. standard branch
and bound procedures since it considerably prune the search space in most cases.
We are building a prototype in order to measure the effective improvement w.r.t.
frameworks that solve generic NP-Complete problems (i.e., Integer Linear Pro-
grammming tools, SAT solvers, and logic programming), obviously this implies
that we have to encode our problem in such formalisms.
References
1. Combi, C., Mantovani, M., Sabaini, A., Sala, P., Amaddeo, F., Moretti, U.,
   Pozzi, G.: Mining approximate temporal functional dependencies with pure tem-
   poral grouping in clinical databases. Comp. in Bio. and Med. 62, 306–324 (2015).
   https://doi.org/10.1016/j.compbiomed.2014.08.004, https://doi.org/10.1016/j.
   compbiomed.2014.08.004
2. Combi, C., Mantovani, M., Sala, P.: Discovering quantitative temporal functional
   dependencies on clinical data. In: 2017 IEEE International Conference on Health-
   care Informatics, ICHI 2017, Park City, UT, USA, August 23-26, 2017. pp. 248–257.
   IEEE (2017). https://doi.org/10.1109/ICHI.2017.80, https://doi.org/10.1109/
   ICHI.2017.80
3. Combi, C., Montanari, A., Sala, P.: A uniform framework for temporal functional
   dependencies with multiple granularities. In: Pfoser, D., Tao, Y., Mouratidis, K.,
   Nascimento, M.A., Mokbel, M.F., Shekhar, S., Huang, Y. (eds.) Advances in Spatial
   and Temporal Databases - 12th International Symposium, SSTD 2011, Minneapolis,
   MN, USA, August 24-26, 2011, Proceedings. Lecture Notes in Computer Science,
   vol. 6849, pp. 404–421. Springer (2011). https://doi.org/10.1007/978-3-642-22922-
   0 24, https://doi.org/10.1007/978-3-642-22922-0$\_$24
4. Combi, C., Rizzi, R., Sala, P.: The price of evolution in temporal databases.
   In: Grandi, F., Lange, M., Lomuscio, A. (eds.) 22nd International Sympo-
   sium on Temporal Representation and Reasoning, TIME 2015, Kassel, Ger-
   many, September 23-25, 2015. pp. 47–58. IEEE Computer Society (2015).
   https://doi.org/10.1109/TIME.2015.24, https://doi.org/10.1109/TIME.2015.24
5. Combi, C., Sala, P.: Temporal functional dependencies based on interval
   relations. In: Combi, C., Leucker, M., Wolter, F. (eds.) Eighteenth In-
   ternational Symposium on Temporal Representation and Reasoning, TIME
   2011, Lübeck , Germany, September 12-14, 2011. pp. 23–30. IEEE (2011).
   https://doi.org/10.1109/TIME.2011.15, https://doi.org/10.1109/TIME.2011.15
6. Combi, C., Sala, P.: Interval-based temporal functional dependencies: spec-
   ification and verification. Ann. Math. Artif. Intell. 71(1-3), 85–130 (2014).
   https://doi.org/10.1007/s10472-013-9387-1,              https://doi.org/10.1007/
   s10472-013-9387-1
7. Combi, C., Sala, P.: Mining approximate interval-based temporal dependen-
   cies. Acta Inf. 53(6-8), 547–585 (2016). https://doi.org/10.1007/s00236-015-0246-x,
   https://doi.org/10.1007/s00236-015-0246-x
8. Sala, P.: Approximate interval-based temporal dependencies: The complex-
   ity landscape. In: Cesta, A., Combi, C., Laroussinie, F. (eds.) 21st Interna-
   tional Symposium on Temporal Representation and Reasoning, TIME 2014,
   Verona, Italy, September 8-10, 2014. pp. 69–78. IEEE Computer Society (2014).
   https://doi.org/10.1109/TIME.2014.20, https://doi.org/10.1109/TIME.2014.20
9. Sala, P., Combi, C., Cuccato, M., Galvani, A., Sabaini, A.: A framework for mining
   evolution rules and its application to the clinical domain. In: Balakrishnan, P., Sri-
   vatsava, J., Fu, W., Harabagiu, S.M., Wang, F. (eds.) 2015 International Conference
   on Healthcare Informatics, ICHI 2015, Dallas, TX, USA, October 21-23, 2015. pp.
   293–302. IEEE Computer Society (2015). https://doi.org/10.1109/ICHI.2015.42,
   https://doi.org/10.1109/ICHI.2015.42