Imprecise Data and Knowledge Based OLAP Ermir Rogova1, Panagiotis Chountas1, Krassimir Atanassov2 1 Harrow School of Computer Science, Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK 2 CLBME – Bulgarian Academy of Sciences, B1. 105, Sofia-1113, Bulgaria rogovae@wmin.ac.uk Abstract. In this paper we present our approach for extending the OLAP model to include treatment of value uncertainty as part of a multidimensional model inhabited by flexible data and non-rigid hierarchical structures of organisation. A new multidimensional-cubic model named as the IF-Cube is introduced which is able to operate over data with imprecision either in the facts or in the dimensional hierarchies. 1. Introduction In this paper we introduce the semantics of the Intuitionistic Fuzzy cubic representation in contrast to the basic multidimensional-cubic structures. The basic cubic operators are extended and enhanced with the aid of [1], [2] Intuitionistic Fuzzy Logic. Since the emergence of the OLAP technology [3] different proposals have been made to give support to different types of data and application purposes. One of this is to extend the relational model (ROLAP) to support the structures and operations typical of OLAP. Further approaches [4], [5] are based on extended relational systems to represent data-cubes and operate over them. The other approach is to develop new models using a multidimensional view of the data [6]. Nowadays, information and knowledge-based systems need to manage imprecision in the data and more flexible structures are needed to represent the analysis domain. New models have appeared to manage incomplete datacube [7], imprecision in the facts and the definition of fact using different levels in the dimensions [8]. 2. Semantics of the IF-Cube in contrast to Crisp Cube Each element of an Intuitionistic fuzzy [1], [2] set has degrees of membership or truth (µ) and non-membership or falsity (ν), which don’t sum up to 1.0 thus leaving a degree of hesitation margin (π). Y. Ioannidis, B. Novikov and B. Rachev (Eds.): Local Proceedings of ADBIS 2007, pp. 50-55 © Technical University of Varna, 2007 51 Ermir Rogova, Panagiotis Chountas and Krassimir Atanassov As opposed to the classical definition of a fuzzy set given by A′ = {< x, µA′(x) > |x ∈ X} where µA(x) ∈ [0, 1] is the membership function of the fuzzy set A′, an intuitionistic fuzzy set A is given by: A = {< x, µA(x),vA(x) > |x ∈ X} where: µA : X → [0, 1] and vA : X → [0, 1] such that 0≤ µA(x) + vA(x)≤1 and µA(x) vA(x) ∈ [0, 1] denote a degree of membership and a degree of non-membership of x ∈ A, respectively. Obviously, each fuzzy set may be represented by the following Intuitionistic fuzzy set A={|x ∈ X} For each intuitionistic fuzzy set in X, we will call πA (x) = 1 − µA(x) − vA(x) an intuitionistic fuzzy index (or a hesitation margin) of x ∈ A which expresses a lack of knowledge of whether x belongs to A or not. For each x ∈ A 0<πA (x)<1. The IF-Cube is an abstract structure that serves as the foundation for the multidimensional data cube model. Cube C is defined as a five-tuple (D, l, F, O, H) where: • D is a set of dimensions • l is a set of levels l1,…, ln, • A dimension di = (l ≤ O, l┴, l┬) dom(di) where l = li i=1...n. li is a set of values and li ∩ lj = {}, ≤ O is a partial order between the elements of l. To identify the level l of a dimension, as part of a hierarchy we use dl. l┴: base level l┬: top level for each pair of levels li and lj we have the relation µij : li × lj  [0,1] νij : li × lj  [0,1] 0 < µij + νij < 1 • F is a set of fact instances with schema F = {| x∈ X }, where x= is an ordered tuple belonging to a given universe X, µF(x) and νF(x) are the degree of membership and non-membership of x in the fact table F respectively. • H is an object type history that corresponds to a cubic structure( l, F, O, H′ ) which allows us to trace back the evolution of a cubic structure after performing a set of operators i.e. aggregation. 3. Cubic operators Selection (Σ): The selection operator selects a set of fact-instances from a cubic structure that satisfy a predicate (θ). A predicate (θ) involves a set of atomic predicates (θ1, …, θn ) associated with the aid of logical operators p ( i.e. ∧, ∨, etc.) . The set of possible facts (cubic instances) that satisfy the θ should carry a degree of membership µ and non-membership ν expressed as F = { | x∈ X } Input: Ci = (D, l, F, O, H) and the predicate θ Output: Co= (D, l, Fo , O, H) where Fo ⊆ F and Fo={f | (f ∈ F) ∧ (f satisfies θ) Mathematical notation: ∑ (Ci ) = Co θ Imprecise Data and Knowledge Based OLAP 52 Cubic Product ( ⊗ ): This is a binary operator Ci1 ⊗ Ci2 . It is used to relate two cubes Ci1 and Ci2 assuming that D1 ⊆ D2 and O1 , O2 are reconcilable partial orders. Thus, l1, l2 could lead to lo being a ragged hierarchy. Input: Ci1 = (D1, l1, F1, O1, H1) and Ci2 = (D2,l2, F2, O2, H2) Output: Co= (Do, lo, Fo, Oo, Ho) where Do= D1 ∪ D2 , lo= l1 ∪ l2, Oo= O1 ∪ O2 Ho= H1 ∪ H2, Fo= F1 X F2 Fo ={<, min(µf1(x), µf2(y)), max(νf1(x), νf2(y),)>|∈ X×Y} Mathematical notation: Ci1 ⊗ Ci2 = Co Join (Θ): It can be expressed using Cubic Product operator. Ci1 = (D1, l1, F1, O1, H1)) and Ci2 = (D2 ,l2, F2, O2, H2) are candidates to join if D1 ∩ D2 ≠ 0, Input: Ci1 = (D1, l1, F1, O1, H1) and Ci2 = (D2,l2, F2, O2, H2) Output: Co= (Do, lo, Fo, Oo, Ho) Mathematical notation: Ci1 Θ Ci2 = σp(Ci1 ⊗ Ci2) Union (∪ ∪): The union operator is a binary operator that finds the union of two cubes. Ci1 and Ci2 have to be union compatible. The operator also coalesces the value- equivalent facts using the minimum membership and maximum non-membership. Input: Ci1 = (D1, l1, F1, O1, H1) and Ci2 = (D2,l2, F2, O2, H2) Output: Co= (Do, lo, Fo, Oo, Ho) where Do=D1=D2, lo=l1=l2, Oo=O1=O2, Ho=H1=H2, Fo= F1 ∪ F2 = { < x, max(µF1 (x), µF2(x)), min(νF1(x),νF2(x)) > | x ∈ X } Mathematical notation: Ci1 ∪ Ci2 = Co Difference (-):. The difference operator removes the portion of the cube Ci1 that is common to both cubes. Ci1 and Ci2 have to be union compatible Input: Ci1 = (D1, l1, F1, O1, H1) and Ci2 = (D2, l2, F2, O2, H2) Output: Co= (Do, lo, Fo, Oo, Ho) where Do=D1=D2, lo=l1=l2, Oo=O1=O2, Ho=H1=H2, Fo= F1 ∩ F2 = { < x, min(µF1(x),µF2(x)), max(νF1(x),νF2(x)) > | x ∈ X } Mathematical notation: Ci1 – Ci2 = Co Aggregation (A): An aggregation operator A is a function A(G) where G = {| x∈ X } where x= is an ordered tuple belonging to a given universe X, {att1, …, attn} is the set of attributes of the elements of X, µF(x) and νF(x) are the degree of membership and non-membership of x. The result is a bag of the type {| x′∈ X }. To this extent, the bag is a group of elements that can be duplicated and each one has a degree of µ and ν. Input: Ci = (D, l, F, O, H) and the function A(G) Output: Co = (D, lo, Fo , Oo , Ho) The definition of the extended group operators allows us to define the extended group operators Roll up (∆), and Roll Down (Ω). Roll up (∆ ∆): The result of applying Roll up over dimension di at level dlr using the aggregation operator A over a datacube Ci = (Di ,li ,Fi , O , Hi ) is another datacube Co = (Do ,lo ,Fo , O , Ho ). Input: Ci = (Di ,li ,Fi , O , Hi ) Output: Co = (Do ,lo ,Fo , O , Ho ) ω is the initial state of the cube An object of type history is a recursive structure H = (l, D, A, H’) is the state of the cube after performing an operation on the cube 53 Ermir Rogova, Panagiotis Chountas and Krassimir Atanassov The structured history of the datacube allows us to keep all the information when applying Roll up and get it all back when Roll Down is performed. To be able to apply the operation of Roll Up we need to make use of the IFSUM aggregation operator. Roll Down (Ω): This operator performs the opposite function of the Roll Up operator. It is used to roll down from the higher levels of the hierarchy with a greater degree of generalization, to the leaves with the greater degree of precision. The result of applying Roll Down over a datacube Ci = (D, l, F, O, H) having H=( l’, D’, A’, H’ ) is another datacube Co= (D’, l’, F’, O, H’). Input: Ci=(D, l, F, O, H) Output: Co=(D’,l’, F’, O, H’) where F’  set of fact instances defined by operator A. To this extent, the Roll Down operative makes use of the recursive history structure previously created after performing the Roll Up operator. The definition of aggregation operator points to the need of defining the IF extensions for traditional group operators [9], [10], [11] such as SUM, AVG, MI? and MAX. Based on the standard group operators, we provide their IF extensions and meaning. IFSUM : The IFsum aggregate, like its standard counterpart, is only defined for numeric domains. Given a fact F defined on the schema X (att1, …,attn), let attn-1 defined on the domain U={u1 , …, un ). The fact F consists of fact instances Fi with 1 ≤ i ≤ m. The fact instances Fi are assumed to take Intuitionistic Fuzzy values for the attribute attn-1 for i = 1 to m we have Fi[attn-1] = {<µi(uki), νi(uki)>/ uki | 1 ≤ ki ≤ n } . The IFsum of the attribute attn-1 of the fact table F is defined by: IFSUM((attn-1)(F)) = m {/ y | (( u= min i =1 (µi(uki), νi(uki)) ∧ (y = ∑km u ki ) (∀ k1, …km : 1 ≤ k1, …km ≤ n))} ki = k 1 IFAVG : The IFAVG aggregate, like its standard counterpart, is only defined for numeric domains. This aggregate makes use of the IFSUM that was discussed previously and the standard COU?T. The IFAVG can be defined as: IFAVG((attn-1)(F) = IFSUM((attn-1)(F)) / COU?T((attn-1)(F)) IFMAX : The IFMAX aggregate, like its standard counterpart, is only defined for numeric domains. The IFsum of the attribute attn-1 of the fact table F is defined by: IFMAX((attn-1)(F)) = {y|((u= min im=1 (µi(uki),νi(uki))∧(y= max im=1 (µi(uki),νi(uki)))(∀k1,…km :1≤k1,…km≤ n))} IFMI) : The IFMI? aggregate, like its standard counterpart, is only defined for numeric domains. Given a fact F defined on the schema X (att1, …,attn), let attn-1 defined on the domain U={u1 , …, un ). The fact F consists of fact instances fi with 1 ≤ i ≤ m. The fact instances fi are assumed to take Intuitionistic Fuzzy values for the attribute attn-1 for i = 1 to m we have fi[attn-1] = {<µi(uki), νi(uki)>/ uki | 1 ≤ ki ≤ n } . The IFsum of the attribute attn-1 of the fact table F is defined by: IFMI?((attn-1)(F)) = {/ y|(( u= min im=1 (µi(uki),νi(uki))∧(y= min im=1 (µi(uki),νi(uki)))(∀k1,…km :1≤ k1,…km≤ n))} We can observe that the IFMI? is extended in the same manner as IFMAX aggregate except for replacing the symbol max in the IFMAX definition with min. Once we have defined our Intuitionistic Fuzzy multidimensional model and have defined the IF cubic-algebra, the concept of knowledge based OLAP is introduced. Imprecise Data and Knowledge Based OLAP 54 4. The Case for Knowledge Based OLAP-K,OLAP Let us consider the Intuitionistic fuzzy set M defined as: {Milk<0.8,0.1>, Whole- Milk<0.7,0.1>, Condensed-Milk<0.4,0.3>}} which is presented in “Fig.1”. Then the next step is to calculate the <µ, ν> values for “Pasteurized milk”, “Whole Pasteurized milk ” and “Condensed whole milk.” • If the hierarchical IF structure expresses preferences in a query, the choice of the maximum values for µ and minimum value ν from the pairs of values <µ, ν> from the parent elements to the sub elements allows us not to exclude any possible answer (high possibility necessity degrees). In real cases, the lack of answers to a query generally makes this choice preferable, because it consists of widening the query answer rather than restricting it. • If the hierarchical IF represents an ill-known concept, the choice of the maximum value for µ and minimum value ν allows us to preserve all the possible values, but it also makes the answer less specific. In a way, it also participates in enlarging the query, as a less specific datum may share more common values with the query (the possibility degree of matching can thus be higher, although the necessity degree can decrease). Milk Milk <0.8, 0.1> <0.8, 0.1> Pasteurized milk Wholemilk Condensedmilk Pasteurized milk Whole milk Condensed milk <0.7, 0.1> <0.4, 0.3> <0.8, 0.1> <0.7, 0.1> <0.4, 0.3> Whole pasteurizedmilk Condensedwhole milk Whole pasteurized milk Condensed whole milk <0.8, 0.1> <0.7, 0.1> Fig. 1. “IF Hierarchy ‘Milk’ ” Fig. 2. “Fully weighted Hierarchy ‘Milk’ “ “Fig.2” is a fully weighted Hierarchy after applying the maximum values for µ and minimum value ν from the pairs of values <µ, ν> from the parent elements to the sub elements, i.e. from (whole-milk, condensed-milk) to (condensed-whole-milk), from (milk) to (pasteurized milk), and from (whole-milk, pasteurized milk) to (pasteurized-whole- milk). The complete study of the hierarchical IF requires the formal definition of the IF hierarchical closure. We will further need to formally define the containment of an IF hierarchical set to another. 5. Conclusions In this paper we have presented a new multidimensional-cubic model named as the IF-Cube. The main contribution of this new model is that is able to operate over data with imprecision in the facts and the summarisation hierarchies. Classical 55 Ermir Rogova, Panagiotis Chountas and Krassimir Atanassov models imposed a rigid structure that made the models present difficulties when merging information from different but still reconcilable sources. We introduce the automatic recommendation of analysis according to the context of users’ explorations in order to guide the decision making with the aid of Intuitionistic fuzzy set over a universe that has a hierarchical structure and the corresponding hierarchies. There is finally a need to formally define the closure of Intuitionistic fuzzy set over a universe that has a hierarchical structure as well the containment between different versions of these sets. References [1] K., Atanassov (1999). Intuitionistic Fuzzy Sets, Springer-Verlag, Heidelberg [2] K., Atanassov Intuitionistic Fuzzy Sets, Fuzzy Sets and Systems, 20, 87–96, (1986) [3] R. Kimball, The Data Warehouse Toolkit. New York: John Wiley & Sons, 1996. [4] S., Chaudhuri U. Dayal V., Ganti. Database Technology for Decision Support Systems. In: Computer, Vol. 34, p. 48-55, 2001 [5] M., Jarke et al. Fundamentals of data warehouses. Springer, London, 2002 [6] H. Thomas & A., Datta, A Conceptual Model and Algebra for On-Line Analytical Processing in Decision Support Databases. Information Systems Research 12: 83-102, 2001 [7] C. Dyreson, Information retrieval from an incomplete data cube, VLDB, Morgan Kaufman Publishers, pp. 532-543, 1996. [8] T. Pedersen, C. Jensen, and C. Dyreson, A foundation for capturing and querying complex multidimensional data, Information Systems, vol. 26, pp. 383-483, 2001. [9]D., Dubois et al (1988). Handling Incomplete or Uncertain Data and Vague Queries in Database Applications, Plenum Press, 1988 [10] H., Prade (1993). Annotated bibliography on fuzzy information processing. Readings on Fuzzy Sets in Intelligent Systems, H. Prade, D. Dubois, and R. Yager, Eds. Morgan Kaufmann Publishers Inc., 1993 [11]E. Rundensteiner L. Bic. Aggregates in posibilistic databases, VLDB'89, pp. 287-295, 1989