Imprecise Data and Knowledge Based OLAP
                   Ermir Rogova1, Panagiotis Chountas1, Krassimir Atanassov2
         1
             Harrow School of Computer Science, Data & Knowledge Management Group
                                   University of Westminster
                           Watford Road, Northwick Park, HA1 3TP
                                        London, UK
             2
                 CLBME – Bulgarian Academy of Sciences, B1. 105, Sofia-1113, Bulgaria
                                        rogovae@wmin.ac.uk


       Abstract. In this paper we present our approach for extending the OLAP model
       to include treatment of value uncertainty as part of a multidimensional model
       inhabited by flexible data and non-rigid hierarchical structures of organisation.
       A new multidimensional-cubic model named as the IF-Cube is introduced
       which is able to operate over data with imprecision either in the facts or in the
       dimensional hierarchies.


1. Introduction

   In this paper we introduce the semantics of the Intuitionistic Fuzzy cubic
representation in contrast to the basic multidimensional-cubic structures. The basic
cubic operators are extended and enhanced with the aid of [1], [2] Intuitionistic Fuzzy
Logic.
   Since the emergence of the OLAP technology [3] different proposals have been
made to give support to different types of data and application purposes. One of this
is to extend the relational model (ROLAP) to support the structures and operations
typical of OLAP. Further approaches [4], [5] are based on extended relational systems
to represent data-cubes and operate over them. The other approach is to develop new
models using a multidimensional view of the data [6].
   Nowadays, information and knowledge-based systems need to manage imprecision
in the data and more flexible structures are needed to represent the analysis domain.
New models have appeared to manage incomplete datacube [7], imprecision in the
facts and the definition of fact using different levels in the dimensions [8].


2. Semantics of the IF-Cube in contrast to Crisp Cube

   Each element of an Intuitionistic fuzzy [1], [2] set has degrees of membership or
truth (µ) and non-membership or falsity (ν), which don’t sum up to 1.0 thus leaving a
degree of hesitation margin (π).


Y. Ioannidis, B. Novikov and B. Rachev (Eds.): Local Proceedings of ADBIS 2007, pp. 50-55
© Technical University of Varna, 2007
51    Ermir Rogova, Panagiotis Chountas and Krassimir Atanassov

    As opposed to the classical definition of a fuzzy set given by A′ = {< x, µA′(x) > |x
∈ X} where µA(x) ∈ [0, 1] is the membership function of the fuzzy set A′, an
intuitionistic fuzzy set A is given by:
                                   A = {< x, µA(x),vA(x) > |x ∈ X}
where: µA : X → [0, 1] and vA : X → [0, 1] such that 0≤ µA(x) + vA(x)≤1 and µA(x)
vA(x) ∈ [0, 1] denote a degree of membership and a degree of non-membership of x
∈ A, respectively.
    Obviously, each fuzzy set may be represented by the following Intuitionistic fuzzy
set                              A={<x, µA′ (x), (x), 1− µA′ (x)>|x ∈ X}
    For each intuitionistic fuzzy set in X, we will call πA (x) = 1 − µA(x) − vA(x) an
intuitionistic fuzzy index (or a hesitation margin) of x ∈ A which expresses a lack of
knowledge of whether x belongs to A or not. For each x ∈ A 0<πA (x)<1.
    The IF-Cube is an abstract structure that serves as the foundation for the
multidimensional data cube model. Cube C is defined as a five-tuple (D, l, F, O, H)
where:
     • D is a set of dimensions
     • l is a set of levels                l1,…, ln,
     • A dimension di = (l ≤ O, l┴, l┬) dom(di) where l = li i=1...n.
          li is a set of values and li ∩ lj = {},
           ≤ O is a partial order between the elements of l.
          To identify the level l of a dimension, as part of a hierarchy we use dl.
                                 l┴: base level      l┬: top level
                     for each pair of levels li and lj we have the relation
                     µij : li × lj  [0,1] νij : li × lj  [0,1] 0 < µij + νij < 1
     • F is a set of fact instances with schema F = {<x, µF(x) , νF(x)>| x∈ X },
          where x=<att1, …,attn> is an ordered tuple belonging to a given universe X,
          µF(x) and νF(x) are the degree of membership and non-membership of x in
          the fact table F respectively.
     • H is an object type history that corresponds to a cubic structure( l, F, O, H′ )
          which allows us to trace back the evolution of a cubic structure after
          performing a set of operators i.e. aggregation.


3. Cubic operators

   Selection (Σ): The selection operator selects a set of fact-instances from a cubic
structure that satisfy a predicate (θ). A predicate (θ) involves a set of atomic
predicates (θ1, …, θn ) associated with the aid of logical operators p ( i.e. ∧, ∨, etc.) .
The set of possible facts (cubic instances) that satisfy the θ should carry a degree of
membership µ and non-membership ν expressed as
   F = {<x, min(µF(x), µ(θ (x))), max(νF(x), ν(θ (x))))> | x∈ X }
Input:    Ci = (D, l, F, O, H) and the predicate θ
Output: Co= (D, l, Fo , O, H) where Fo ⊆ F and Fo={f | (f ∈ F) ∧ (f satisfies θ)
Mathematical notation:         ∑ (Ci ) = Co
                                 θ
                                          Imprecise Data and Knowledge Based OLAP         52


   Cubic Product ( ⊗ ): This is a binary operator Ci1 ⊗ Ci2 . It is used to relate two
cubes Ci1 and Ci2 assuming that D1 ⊆ D2 and O1 , O2 are reconcilable partial orders.
Thus, l1, l2 could lead to lo being a ragged hierarchy.
Input: Ci1 = (D1, l1, F1, O1, H1) and Ci2 = (D2,l2, F2, O2, H2)
Output: Co= (Do, lo, Fo, Oo, Ho) where
          Do= D1 ∪ D2 , lo= l1 ∪ l2, Oo= O1 ∪ O2 Ho= H1 ∪ H2, Fo= F1 X F2
         Fo ={<<x, y>, min(µf1(x), µf2(y)), max(νf1(x), νf2(y),)>|<x, y>∈ X×Y}
Mathematical notation: Ci1 ⊗ Ci2 = Co
   Join (Θ): It can be expressed using Cubic Product operator. Ci1 = (D1, l1, F1, O1,
H1)) and Ci2 = (D2 ,l2, F2, O2, H2) are candidates to join if D1 ∩ D2 ≠ 0,
Input: Ci1 = (D1, l1, F1, O1, H1) and Ci2 = (D2,l2, F2, O2, H2)
Output: Co= (Do, lo, Fo, Oo, Ho)
Mathematical notation: Ci1 Θ Ci2 = σp(Ci1 ⊗ Ci2)
   Union (∪  ∪): The union operator is a binary operator that finds the union of two
cubes. Ci1 and Ci2 have to be union compatible. The operator also coalesces the value-
equivalent facts using the minimum membership and maximum non-membership.
Input: Ci1 = (D1, l1, F1, O1, H1) and Ci2 = (D2,l2, F2, O2, H2)
Output: Co= (Do, lo, Fo, Oo, Ho) where Do=D1=D2, lo=l1=l2, Oo=O1=O2, Ho=H1=H2,
Fo= F1 ∪ F2 = { < x, max(µF1 (x), µF2(x)), min(νF1(x),νF2(x)) > | x ∈ X }
Mathematical notation: Ci1 ∪ Ci2 = Co
   Difference (-):. The difference operator removes the portion of the cube Ci1 that is
common to both cubes. Ci1 and Ci2 have to be union compatible
Input: Ci1 = (D1, l1, F1, O1, H1) and Ci2 = (D2, l2, F2, O2, H2)
Output: Co= (Do, lo, Fo, Oo, Ho) where Do=D1=D2, lo=l1=l2, Oo=O1=O2, Ho=H1=H2,
          Fo= F1 ∩ F2 = { < x, min(µF1(x),µF2(x)), max(νF1(x),νF2(x)) > | x ∈ X }
Mathematical notation: Ci1 – Ci2 = Co
   Aggregation (A): An aggregation operator A is a function A(G) where G = {<x,
µF(x) , νF(x)>| x∈ X } where x=<att1, …,attn> is an ordered tuple belonging to a
given universe X, {att1, …, attn} is the set of attributes of the elements of X, µF(x) and
νF(x) are the degree of membership and non-membership of x. The result is a bag of
the type {<x′, µF(x′) , νF(x′)>| x′∈ X }. To this extent, the bag is a group of elements
that can be duplicated and each one has a degree of µ and ν.
Input: Ci = (D, l, F, O, H) and the function A(G)
Output: Co = (D, lo, Fo , Oo , Ho)
   The definition of the extended group operators allows us to define the extended
group operators Roll up (∆), and Roll Down (Ω).
   Roll up (∆ ∆): The result of applying Roll up over dimension di at level dlr using the
aggregation operator A over a datacube Ci = (Di ,li ,Fi , O , Hi ) is another datacube
Co = (Do ,lo ,Fo , O , Ho ).
Input:       Ci = (Di ,li ,Fi , O , Hi )
Output: Co = (Do ,lo ,Fo , O , Ho )
                                                           ω is the initial state of the cube
An object of type history is a recursive structure H =
                                                          (l, D, A, H’) is the state of the
                                                          cube    after    performing    an
                                                          operation on the cube
53    Ermir Rogova, Panagiotis Chountas and Krassimir Atanassov

    The structured history of the datacube allows us to keep all the information when
applying Roll up and get it all back when Roll Down is performed. To be able to apply
the operation of Roll Up we need to make use of the IFSUM aggregation operator.
    Roll Down (Ω): This operator performs the opposite function of the Roll Up
operator. It is used to roll down from the higher levels of the hierarchy with a greater
degree of generalization, to the leaves with the greater degree of precision. The result
of applying Roll Down over a datacube Ci = (D, l, F, O, H) having H=( l’, D’, A’, H’ )
is another datacube Co= (D’, l’, F’, O, H’).
Input: Ci=(D, l, F, O, H)
Output: Co=(D’,l’, F’, O, H’) where F’  set of fact instances defined by operator A.
    To this extent, the Roll Down operative makes use of the recursive history structure
previously created after performing the Roll Up operator.
    The definition of aggregation operator points to the need of defining the IF
extensions for traditional group operators [9], [10], [11] such as SUM, AVG, MI? and
MAX. Based on the standard group operators, we provide their IF extensions and
meaning.
    IFSUM : The IFsum aggregate, like its standard counterpart, is only defined for
numeric domains. Given a fact F defined on the schema X (att1, …,attn), let attn-1
defined on the domain U={u1 , …, un ). The fact F consists of fact instances Fi with 1
≤ i ≤ m. The fact instances Fi are assumed to take Intuitionistic Fuzzy values for the
attribute attn-1 for i = 1 to m we have Fi[attn-1] = {<µi(uki), νi(uki)>/ uki | 1 ≤ ki ≤ n } .
The IFsum of the attribute attn-1 of the fact table F is defined by:
                              IFSUM((attn-1)(F)) =
                       m
{<u>/ y | (( u= min i =1 (µi(uki), νi(uki)) ∧ (y = ∑km u ki ) (∀ k1, …km : 1 ≤ k1, …km ≤ n))}
                                                     ki = k 1

   IFAVG : The IFAVG aggregate, like its standard counterpart, is only defined for
numeric domains. This aggregate makes use of the IFSUM that was discussed
previously and the standard COU?T. The IFAVG can be defined as:
            IFAVG((attn-1)(F) = IFSUM((attn-1)(F)) / COU?T((attn-1)(F))
   IFMAX : The IFMAX aggregate, like its standard counterpart, is only defined for
numeric domains. The IFsum of the attribute attn-1 of the fact table F is defined by:
                              IFMAX((attn-1)(F)) =
{<u>y|((u= min im=1 (µi(uki),νi(uki))∧(y= max im=1 (µi(uki),νi(uki)))(∀k1,…km :1≤k1,…km≤ n))}
   IFMI) : The IFMI? aggregate, like its standard counterpart, is only defined for
numeric domains. Given a fact F defined on the schema X (att1, …,attn), let attn-1
defined on the domain U={u1 , …, un ). The fact F consists of fact instances fi with 1 ≤
i ≤ m. The fact instances fi are assumed to take Intuitionistic Fuzzy values for the
attribute attn-1 for i = 1 to m we have fi[attn-1] = {<µi(uki), νi(uki)>/ uki | 1 ≤ ki ≤ n } .
The IFsum of the attribute attn-1 of the fact table F is defined by:
                     IFMI?((attn-1)(F)) =
{<u>/ y|(( u= min im=1 (µi(uki),νi(uki))∧(y= min im=1 (µi(uki),νi(uki)))(∀k1,…km :1≤ k1,…km≤ n))}
   We can observe that the IFMI? is extended in the same manner as IFMAX aggregate
except for replacing the symbol max in the IFMAX definition with min.
   Once we have defined our Intuitionistic Fuzzy multidimensional model and have
defined the IF cubic-algebra, the concept of knowledge based OLAP is introduced.
                                                                        Imprecise Data and Knowledge Based OLAP                                      54


4. The Case for Knowledge Based OLAP-K,OLAP

   Let us consider the Intuitionistic fuzzy set M defined as: {Milk<0.8,0.1>, Whole-
Milk<0.7,0.1>, Condensed-Milk<0.4,0.3>}} which is presented in “Fig.1”. Then the
next step is to calculate the <µ, ν> values for “Pasteurized milk”, “Whole Pasteurized milk
” and “Condensed whole milk.”
     • If the hierarchical IF structure expresses preferences in a query, the choice of the
          maximum values for µ and minimum value ν from the pairs of values <µ, ν>
          from the parent elements to the sub elements allows us not to exclude any
          possible answer (high possibility necessity degrees). In real cases, the lack of
          answers to a query generally makes this choice preferable, because it consists of
          widening the query answer rather than restricting it.
     • If the hierarchical IF represents an ill-known concept, the choice of the maximum
          value for µ and minimum value ν allows us to preserve all the possible values,
          but it also makes the answer less specific. In a way, it also participates in
          enlarging the query, as a less specific datum may share more common values
          with the query (the possibility degree of matching can thus be higher, although
          the necessity degree can decrease).
                                      Milk                                                                   Milk
                                   <0.8, 0.1>                                                             <0.8, 0.1>


            Pasteurized milk      Wholemilk             Condensedmilk             Pasteurized milk       Whole milk                 Condensed milk
                                  <0.7, 0.1>              <0.4, 0.3>                  <0.8, 0.1>         <0.7, 0.1>                   <0.4, 0.3>


          Whole pasteurizedmilk          Condensedwhole milk                    Whole pasteurized milk          Condensed whole milk
                                                                                      <0.8, 0.1>                       <0.7, 0.1>


           Fig. 1. “IF Hierarchy ‘Milk’ ”                                    Fig. 2. “Fully weighted Hierarchy ‘Milk’ “

“Fig.2” is a fully weighted Hierarchy after applying the maximum values for µ and
minimum value ν from the pairs of values <µ, ν> from the parent elements to the sub
elements, i.e. from (whole-milk, condensed-milk) to (condensed-whole-milk), from (milk)
to (pasteurized milk), and from (whole-milk, pasteurized milk) to (pasteurized-whole-
milk).
The complete study of the hierarchical IF requires the formal definition of the IF
hierarchical closure. We will further need to formally define the containment of an IF
hierarchical set to another.


5. Conclusions

   In this paper we have presented a new multidimensional-cubic model named as
the IF-Cube. The main contribution of this new model is that is able to operate over
data with imprecision in the facts and the summarisation hierarchies. Classical
55    Ermir Rogova, Panagiotis Chountas and Krassimir Atanassov

models imposed a rigid structure that made the models present difficulties when
merging information from different but still reconcilable sources. We introduce the
automatic recommendation of analysis according to the context of users’ explorations
in order to guide the decision making with the aid of Intuitionistic fuzzy set over a
universe that has a hierarchical structure and the corresponding hierarchies.
   There is finally a need to formally define the closure of Intuitionistic fuzzy set over
a universe that has a hierarchical structure as well the containment between different
versions of these sets.


References

[1] K., Atanassov (1999). Intuitionistic Fuzzy Sets, Springer-Verlag, Heidelberg
[2] K., Atanassov Intuitionistic Fuzzy Sets, Fuzzy Sets and Systems, 20, 87–96, (1986)
[3] R. Kimball, The Data Warehouse Toolkit. New York: John Wiley & Sons, 1996.
[4] S., Chaudhuri U. Dayal V., Ganti. Database Technology for Decision Support Systems. In:
   Computer, Vol. 34, p. 48-55, 2001
[5] M., Jarke et al. Fundamentals of data warehouses. Springer, London, 2002
[6] H. Thomas & A., Datta, A Conceptual Model and Algebra for On-Line Analytical
   Processing in Decision Support Databases. Information Systems Research 12: 83-102, 2001
[7] C. Dyreson, Information retrieval from an incomplete data cube, VLDB, Morgan Kaufman
   Publishers, pp. 532-543, 1996.
[8] T. Pedersen, C. Jensen, and C. Dyreson, A foundation for capturing and querying complex
   multidimensional data, Information Systems, vol. 26, pp. 383-483, 2001.
[9]D., Dubois et al (1988). Handling Incomplete or Uncertain Data and Vague Queries in
   Database Applications, Plenum Press, 1988
[10] H., Prade (1993). Annotated bibliography on fuzzy information processing. Readings on
   Fuzzy Sets in Intelligent Systems, H. Prade, D. Dubois, and R. Yager, Eds. Morgan
   Kaufmann Publishers Inc., 1993
[11]E. Rundensteiner L. Bic. Aggregates in posibilistic databases, VLDB'89, pp. 287-295,
      1989