=Paper= {{Paper |id=Vol-1703/paper15 |storemode=property |title=Characterization of Order-like Dependencies with Formal Concept Analysis |pdfUrl=https://ceur-ws.org/Vol-1703/paper15.pdf |volume=Vol-1703 |authors=Victor Codocedo,Jaume Baixeries,Mehdi Kaytoue,Amedeo Napoli |dblpUrl=https://dblp.org/rec/conf/ecai/CodocedoBKN16 }} ==Characterization of Order-like Dependencies with Formal Concept Analysis == https://ceur-ws.org/Vol-1703/paper15.pdf
Contributions to the Formalization of Order-like
           Dependencies using FCA?

 Victor Codocedo1 , Jaume Baixeries2 , Mehdi Kaytoue1 , and Amedeo Napoli3
     1
      Université de Lyon. CNRS, INSA-Lyon, LIRIS. UMR5205, F-69621, France.
        2
          Universitat Politècnica de Catalunya. 08032, Barcelona. Catalonia.
    3
      LORIA (CNRS - Inria Nancy Grand Est - Université de Lorraine), B.P. 239,
                            F-54506, Vandœuvre-lès-Nancy.




          Abstract. Functional Dependencies (FDs) play a key role in many fields
          of the relational database model, one of the most widely used database
          systems. FDs have also been applied in data analysis, data quality, knowl-
          edge discovery and the like, but in a very limited scope, because of their
          fixed semantics. To overcome this limitation, many generalizations have
          been defined to relax the crisp definition of FDs. FDs and a few of their
          generalizations have been characterized with Formal Concept Analysis
          which reveals itself to be an interesting unified framework for charac-
          terizing dependencies, that is, understanding and computing them in a
          formal way. In this paper, we extend this work by taking into account
          order-like dependencies. Such dependencies, well defined in the database
          field, consider an ordering on the domain of each attribute, and not sim-
          ply an equality relation as with standard FDs.



1        Introduction

Functional dependencies (FDs) are well-known constraints in the relational model
used to show a functional relation between sets of attributes [10], i.e. when the
values of a set of attributes are determined by the values of another set of at-
tributes. They are also used in different tasks within the relational data model,
as for instance, to check the consistency of a database, or to guide the design of
a data model [9].
    Different generalizations of FDs have been     id Month Year Av. Temp. City
defined in order to deal with imprecision, errors  t1   1   1995    36.4   Milan
and uncertainty in real-world data, or simply,     t2   1   1996    33.8   Milan
                                                   t3   5   1996    63.1   Rome
to mine and discover more complex patterns         t4   5   1997    59.6   Rome
and constraints within data when the seman-        t5   1   1998    41.4   Dallas
                                                   t6   1   1999    46.8   Dallas
tics of FDs have shown to be too restrictive for   t7   5   1996    84.5  Houston
modeling certain attribute domains. For exam-      t8   5   1998    80.2  Houston
ple, consider the database in the table above as
?
    Note: A longer version of this paper has been accepted at the conference Concept
    Lattices and their Applications, Moscow, Russia, July 2016.
an example4 . Attributes of these 8 tuples are city names, month identifiers, years
and average temperatures. From this table, we could expect that the value for
average temperature is determined by a city name and a month of the year (e.g.
the month of May in Houston is hot, whereas the month of January in Dallas
is cold). Therefore, we would expect that this relationship should be somehow
expressed as a (functional) dependency in the form city name, month → average
temperature. However, while the average temperature is truly determined by a
city and a time of the year, it is very hard that it will be exactly the same from
one year to another. Instead, we can expect that the value will be similar, or
close throughout different years, but rarely the same. Unfortunately, semantics
of FDs is based on an equivalence relation and fail to grasp the dependencies
among these attributes.
    To overcome the limitations of FDs while keeping the idea that some at-
tributes are functionally determined by other attributes, different generalizations
of functional dependencies have been defined, as recently deeply reviewed in a
comprehensive survey [3]. Actually, the example presented in the last paragraph
is a so-called similarity dependency [1,3].
    In this paper we present an FCA-based characterization of order-like de-
pendencies, a generalization of functional dependencies in which the equality of
values is replaced by the notion of order. Firstly, we show that the characteriza-
tion of order dependencies in their general definition [7] can be achieved through
a particular use of general ordinal scaling [6]. Secondly, we extend our charac-
terization in order to support restricted order dependencies through which other
FDs generalizations can be modeled, namely sequential dependencies and trend
dependencies [3].
    The rest of this paper is organized as follows. In Section 2 we formally intro-
duce the definition of functional dependencies, formal concept analysis and the
principle of the characterization of FDs with FCA. In Section 3, we characterize
order dependencies in their general definition. We show that our formalization
can be adapted to restricted ordered dependencies in Section 4 before to conclude.


2     Preliminaries

2.1    Functional dependencies

We deal with datasets which are sets of tuples. Let U be a set of attributes and
Dom be a set of values (a domain). For the sake of simplicity, we assume that
Dom is a numerical set. A tuple t is a function t : U 7→ Dom and then a table T is
a set of tuples. We define the functional notation of a tuple for a set of attributes
X ⊆ U as follows, assuming that there exists a total ordering on U. Given a tuple
t ∈ T and X = {x1 , x2 , . . . , xn }, we have: t(X) = ht(x1 ), t(x2 ), . . . , t(xn )i.
4
     Example from The University of Dayton, that shows the month average tempera-
    tures for different cities: http://academic.udayton.edu/kissock/http/Weather/
Definition 1 (Functional dependency [10]). Let T be a set of tuples (data
table), and X, Y ⊆ U. A functional dependency (FD) X → Y holds in T if:
                     ∀t, t0 ∈ T : t(X) = t0 (X) → t(Y ) = t0 (Y )          id   ab c d
                                                                           t1   1 3 4 1
Example. The table on the right presents 4 tuples T =                      t2   4 3 4 3
                                                                           t3   1 8 4 1
{t1 , t2 , t3 , t4 } over attributes U = {a, b, c, d}. We have that        t4   4 3 7 8
t2 ({a, c}) = ht2 (a), t2 (c)i = h4, 4i. Note that the set notation is
usually omitted and we write ab instead of {a, b}. In this example,         Table 1
the functional dependency d → c holds and a → c does not hold.

2.2   Characterization of Functional Dependencies with FCA
It has been shown in previous work that functional dependencies can be char-
acterized with FCA. For example, Ganter & Wille [6] presented a data trans-
formation of the initial set of tuples into a formal context. In this context, im-
plications are in 1-to-1 correspondence with the functional dependencies of the
initial dataset.
    On the bottom-right figure, we illustrate this characterization with the set
of tuples from Table 1 where each possible pair of tuples is an object in the
formal context. Attributes remain the same. Object (ti , tj ) has attribute m iff
ti (m) = tj (m).
    The concept lattice in the bottom right: there are two im-
plications, namely d → c and d → a, which are also the func-
tional dependencies in the original set of tuples. However, this        K       a b c d
approach implies that a formal context much larger than the (t1 , t2 ) × ×
                                                                     (t1 , t3 ) ×   ××
original dataset must be processed. It was then shown that (t1 , t4 ) ×
this formal context can actually be encoded with a pattern (t2 , t3 )               ×
                                                                     (t , t ) × ×
structure [5]: each attribute of the original dataset becomes (t23 , t44 )
an object of the pattern structure and is described by a par-
tition on the tuple set. Actually, each block of the partition
is composed of tuples taking the same value for the given at-
tribute [8]. For example, in Table 1, the partition describing
a is {{t1 , t3 }, {t2 , t4 }}. Then, the implications in the pattern
concept lattice are here again in 1-to-1 correspondence with
the functional dependencies of the initial dataset [2]. What
is important to notice is that this formalization is possible as a partition is an
equivalence relation: a symmetric, reflexive and transitive binary relation. In [1],
another kind of dependencies was formalized in a similar way, i.e. similarity de-
pendencies, where the equality relation is relaxed to a similarity relation when
comparing two tuples. An attribute is not anymore described by a partition, but
by a tolerance relation, i.e. a symmetric, reflexive, but not necessarily transitive
binary relation. Each original attribute is then described by a set of tolerance
blocks, each being a maximal set of tuples that have pairwise similar values
(instead of equal values for classical dependencies).
    As we will show next, this way of characterizing FDs and similarity depen-
dencies actually fails for order dependencies, as the relation in this case is not
symmetric: it is neither an equality nor a similarity but a partial order in the
general case.


3    Characterization of Order Dependencies with FCA

Although functional dependencies are used in several domains, they cannot be
used to express some relationships that exist in data. Many generalizations have
been proposed and we focus in this article on order dependencies [7,3]. Such
dependencies are based on the attribute-wise order on tuples. This order assumes
that each attribute follows a partial order associated to the values of its domain.
For the sake of generality, we represent this order with the symbol vx for all
x ∈ U. In practice, this symbol will be instantiated by intersections of any
partial order on the domain of this attribute, as, for instance, <, ≤, >, ≥, etc.
We remark that this order on the set of values of a single attribute does not
need to be a total order, although in many different instances, like numeric or
character strings domains, this will be the case. Now we formalize operator vx
(Definition 2) and define accordingly order dependencies (Definition 3).
Definition 2 (Attribute-wise ordering). Given two tuples ti , tj ∈ T and a
set of attributes X ⊆ U, the attribute-wise order of these two tuples on X is:
ti vX tj ⇔ ∀x ∈ X : ti [x] vx tj [x]
This definition states that one tuple is greater –in a sense involving the order of
all attributes– than another tuple if their attribute-wise values meet this order.
This operator induces a partial order ΠX = (T, ≺X ) on the set T of tuples.

Definition 3 (Order dependency). Let X, Y ⊆ U be two subsets of attributes
in a dataset T . An order dependency X → Y holds in T if and only if: ∀ti , tj ∈
T : ti vX tj → ti vY tj
Example. Consider the table on the right with six tuples and three            id   ab c
attributes. Taking va , vb and vc defined as the ordering ≤. The              t1   1 3 1
                                                                              t2   2 7 2
orders induced by the sets of attributes {a},{b},{c} and {a, b} are:          t3   3 4 4
    Πa = (T, ≺a ) = {{t1 } ≺ {t2 } ≺ {t3 , t6 } ≺ {t5 } ≺ {t4 }}              t4   5 3 9
    Πb = (T, ≺b ) = {{t5 } ≺ {t1 , t4 } ≺ {t3 } ≺ {t2 } ≺ {t6 }}              t5   4 2 5
                                                                              t6   3 8 4
    Πc = (T, ≺c ) = {{t1 } ≺ {t2 } ≺ {t3 , t6 } ≺ {t5 } ≺ {t4 }}
    Πab = (T, ≺ab ) = {{t1 } ≺ {t2 } ≺ {t6 }; {t1 } ≺ {t3 }; {t1 } ≺ {t4 };   Table 2
                      {t5 } ≺ {t4 }}
    These orders are such that the order dependency {a, b} → {c} holds. Remark
that Definition 3 is generic since the orders that are assumed for each attribute
need to be instantiated: we chose ≤ in this example for all attributes, while
taking the equality would produce standard functional dependencies.
    To achieve the characterization of order dependencies with FCA, we propose
to represent the partial order ΠX = (T, ≺X ) associated to each subset of at-
tribute X ⊆ U as a formal context KX (a binary relation on T × T thanks to
a general ordinal scaling [6]). Then, we show that an order dependency X → Y
holds iff KX = KXY .
      @c t1 t2 t3 t4 t5 t6          @ab t1 t2 t3 t4 t5 t6    @abc t1 t2 t3 t4 t5 t6
      t1    × × × × ×                t1    × ×         ×      t1     × ×         ×
      t2       × × × ×               t2                ×      t2                 ×
      t3          × × ×              t3                       t3
      t4                             t4                       t4
      t5          ×                  t5          ×            t5           ×
      t6          × ×                t6                       t6


     Table 3: (T, T, @c )          Table 4: (T, T, @ab )    Table 5: (T, T, @abc )


Definition 4 (General ordinal scaling of the tuple set). Given a subset
of attributes X ⊆ U and a table dataset T , we define a formal context for ΠX =
(T, ≺X ) (the partial order it induces) as follows: KX = (T, T, @X ) where @X =
{(ti , tj ) | ti , tj ∈ T, ti vX tj }. This formal context is the general ordinal scale
of ΠX [6]. All formal concepts (A, B) ∈ KX are such that A is the set of lower
bounds of B and B is the set of upper bounds of A. Its concept lattice is the
smallest complete lattice in which the order ΠX can be order embedded.

    This way to characterize a partial order is only one among several pos-
sibilities. However, the choice of formal contexts is due to their versatility,
since they can characterize binary relations, hierarchies, dependencies, differ-
ent orders [6] and graphs [4]. In the next section we will see how this versatil-
ity allows us to generalize similarity dependencies. Given the set of attributes
X ⊆ U, an associated partial order ΠX = (T, ≺X ) and the formal context
(T, T, @X ), it is easy to T
                           show that the later is a composition of contexts defined
as: (T, T, @X ) = (T, T,      @x ).
                             x∈X
   We can now propose a characterization of order dependencies with FCA.
Proposition 1. An order dependency X → Y holds in T iff KX = KXY .

Proof. Recall that KXY = (T, T, @XY ) = (T, T, @X ∩ @Y ). We have that

                   X → Y ⇐⇒ @X =@X ∩ @Y ⇐⇒ @X ⊆ @Y
                              ⇐⇒ ∀ti , tj ∈ T, ti vX tj → ti vY tj

   The fact that the order dependency {a, b} → {c} holds can be illustrated
with the formal contexts in Tables 3,4 and 5. We have indeed that Kab = Kabc .


Order dependencies and other FDs generalizations. We have seen that
the definition of order dependencies replaces the equality condition present in
FDs or other similarity measures present in other dependencies, by an order
relation. This may suggest that order dependencies and other kinds of FDs gen-
eralizations are structurally very similar, whereas this is not the case. Functional
dependencies generate a reflexive, symmetric and transitive relation in the set
of tuples, i.e. an equivalence relation. Then the set of tuples can be partitioned
into equivalence classes that are used to characterize and compute the set of
FDs holding in a dataset, as presented in a previous work [2].
    In the generalization of functional dependencies that replaces the equality
condition by a similarity measure or a distance function, this measure generates
a symmetric relation in the set of tuples, but not necessarily a transitive relation.
In turn, this implies that the set of tuples can be partitioned into blocks of
tolerance instead of equivalence classes, as shown in [1].
    In this article, the novelty is that we are dealing with a transitive relation,
but not necessarily a symmetric relation. That means that we are not dealing
with equivalence classes nor blocks of tolerance any longer, but, precisely, with
orders. Since the characterization of these dependencies cannot be performed in
terms of equivalence classes nor blocks of tolerance, it requires for a more general
approach: general ordinal scaling.


4   Characterization of Restricted Order Dependencies

Order dependencies allow taking into account the or-           Time People waiting
dering of the values of each attribute when looking for     t1 10:00     101
                                                            t2 10:20     103
dependencies in data. However, violations of the or-        t3 10:40     105
dering due to value variations should sometimes not         t4 11:00      77
                                                            t5 11:20      80
be considered in many real world scenarios. Consider        t6 11:40      85
the example given in Table 6: it gives variations on the
number of people waiting at a bus station over time.                Table 6
In such a scenario we can expect that more people will be waiting in the station
as time moves on (P eople waiting → T ime). However, at some point, a bus
arrives and the number of people waiting decreases and starts increasing again.
It is easy to observe that the order dependency P eople waiting → T ime does
not hold as we have the counter-example: t4 vP eople waiting t3 and t3 vT ime t4 .
    However, the gap between the values 77 and 105 is significant enough to be
considered as a different instance of the ordering. We can formalize this idea
by introducing a similarity threshold θ = 10 for the attribute P eople waiting
such that the ordering between values is checked iff the difference is smaller than
θ. In this way, the previous counter-example is avoided (restricting the binary
relation) along with any other counter-example and we have that the restricted
order dependency P eople waiting → T ime holds.
    We now formalize the tuple ordering relation, and consequently the notion
of restricted order dependencies.

Definition 5. Given two tuples ti , tj ∈ T and a set of attributes X ⊆ U, the
attribute-wise order on X is: ti v∗x tj ⇔ ∀x ∈ X : 0 ≤ tj [x] − ti [x] ≤ θx .

Definition 6. Let X, Y ⊆ U two sets of attributes in a table T such that |T | = n,
and let θX , θY be thresholds values of tuples in X and Y respectively. A restricted
order dependency X → Y holds in T iff: t[X] v∗X t0 [X] → t[Y ] v∗Y t0 [Y ]

   Using these definitions we can encode the tuple ordering relations as formal
contexts for any subset of attributes X ⊆ U. Indeed, the binary relations between
tuples by operator v∗X can be encoded in a formal context K∗X = (T, T, v∗X )
  v∗
   T m t1 t2 t3 t4 t5 t6
                            v∗
                             P p t1 t2 t3 t4 t5 t6    v∗
                                                       T m,P p t1 t2 t3 t4 t5 t6
   t1 × × × × × ×            t1   × × ×                  t1    × × ×
   t2     × × × × ×          t2     × ×                  t2       × ×
   t3        × × × ×         t3       ×                  t3          ×
   t4           × × ×        t4           × × ×          t4             × × ×
   t5              × ×       t5             × ×          t5                × ×
   t6                 ×      t6               ×          t6                   ×

Table 7: (T, T, v∗T m )    Table 8: (T, T, v∗P p )   Table 9: (T, T, v∗T m,P p )


which in turn, can be composed from single attributes x ∈ U: v∗X =                       v∗x .
                                                                                   T
                                                                                   x∈X
Moreover, we can use the same rationale we used to mine order dependencies to
find restricted order dependencies.
Proposition 2. A restricted order dependency X → Y holds in T iff

                              X → Y ⇐⇒ K∗X = K∗XY

Proof. This proposition can be proved similarly to Proposition 1.
Example. For the previous example, we calculate the corresponding formal con-
texts shown in Tables 7 and 8 (v∗T m for Time, and v∗P p for People waiting). It
is easy to observe that the restricted order dependency P eople waiting → T ime
holds as we have that K∗P p = K∗P p,T m .


Restricted order dependencies and other FDs generalizations. Simi-
larity dependencies (SDs) generalize functional dependencies through the use of
a tolerance relation instead of an equivalence relation between values of tuples
for a given set of attributes. A tolerance relation is a reflexive, symmetric and
non-transitive binary association between two tuples given a threshold θ. In a
nutshell, a SD is established between two tuples if their values are within a
given distance controlled by the threshold. Such dependencies were studied in
a previous work [1]. However, from the perspective of order dependencies, we
can request that such distance has a certain polarity. As we have previously dis-
cussed, order dependencies arise from anti-symmetric, not necessarily reflexive,
and transitive binary relations (<, ≤). Then, it can be expected that using a
threshold of distance θ between tuple values for a given set of attributes requires
an antisymmetric, non-transitive relation between the values of tuples w.r.t. a
set of attributes X, that we have defined as v∗X .
    The current approach has the potential to implement some other FD gen-
eralizations of such as sequential dependencies and trend dependencies [3]. The
latter is actually a particular case of restricted order dependencies where the
threshold is applied to an attribute not contained in the attributes of the depen-
dency. Instead, it is applied to a time attribute that allows defining snapshots of
a database. In sequential dependencies, the antecedent is a mapping of a set of
attributes with a complete unrestricted order (without a threshold). Details on
both these dependencies have been left out from this paper for space reasons.
5   Conclusion

We have presented a characterization of order dependencies with FCA, which
can be potentially extended to other types of order-like dependencies, used in
different fields of database theory, knowledge discovery and data quality. These
dependencies are part of a set of functional dependencies generalizations where
equality condition is replaced with a more general relation. In some cases, the
equality is replaced by an approximate measure, in other cases, like in order
dependencies, by an order relation.
    We have seen that order dependencies are based on a transitive, but not
necessarily symmetric relation, contrasting similarity dependencies, which are
based on a symmetric, but not necessarily transitive relation. It is precisely this
formalization in terms of FCA that allows us to find these structural differences
between these types of dependencies.
    Nevertheless, the present work needs to be extended to other kinds of order-
like dependencies while experimentation needs to be performed in order to verify
the computational feasibility of this approach.
Acknowledgments. This research work has been supported by the SGR2014-890
(MACDA) project of the Generalitat de Catalunya, the MINECO project APCOM
(TIN2014-57226-P) and the French National Project FUI AAP 14 Tracaverre.


References
 1. J. Baixeries, M. Kaytoue, and A. Napoli. Computing similarity dependencies with
    pattern structures. In CLA 2013, CEUR Workshop Proceedings, 2013.
 2. J. Baixeries, M. Kaytoue, and A. Napoli. Characterizing functional dependencies
    in formal concept analysis with pattern structures. Annals of Mathematics and
    Artificial Intelligence, 72(1-2):129–149, Oct. 2014.
 3. L. Caruccio, V. Deufemia, and G. Polese. Relaxed functional dependencies - A
    survey of approaches. IEEE Trans. Knowl. Data Eng., 28(1):147–165, 2016.
 4. S. Ferré. A Proposal for Extending Formal Concept Analysis to Knowledge Graphs.
    In Formal Concept Analysis, volume LNCS 9113, pages 271–286, June 2015.
 5. B. Ganter and S. O. Kuznetsov. Pattern structures and their projections. In ICCS
    2001, LNCS 2120, pages 129–142. Springer, 2001.
 6. B. Ganter and R. Wille. Formal Concept Analysis. Springer, Berlin, 1999.
 7. S. Ginsburg and R. Hull. Order dependency in the relational model. Theoretical
    Computer Science, 26(1):149 – 195, 1983.
 8. Y. Huhtala, J. Kärkkäinen, P. Porkka, and H. Toivonen. Tane: An efficient algo-
    rithm for discovering functional and approximate dependencies. Computer Journal,
    42(2):100–111, 1999.
 9. H. Mannila and K.-J. Räihä. The Design of Relational Databases. Addison-Wesley,
    Reading (MA), USA, 1992.
10. J. Ullman. Principles of Database Systems and Knowledge-Based Systems, volumes
    1–2. Computer Science Press, Rockville (MD), USA, 1989.