=Paper= {{Paper |id=Vol-2161/paper45 |storemode=property |title=Inverse Tree-OLAP: Definition, Complexity and First Solution |pdfUrl=https://ceur-ws.org/Vol-2161/paper45.pdf |volume=Vol-2161 |authors=Domenico Saccà,Edoardo Serra,Alfredo Cuzzocrea |dblpUrl=https://dblp.org/rec/conf/sebd/SaccaSC18 }} ==Inverse Tree-OLAP: Definition, Complexity and First Solution== https://ceur-ws.org/Vol-2161/paper45.pdf
                         Inverse Tree-OLAP:
              Definition, Complexity and First Solution

               Domenico Saccà1 , Edoardo Serra2 , and Alfredo Cuzzocrea3
                                  1
                                   University of Calabria, Italy
                                     sacca@unical.it
                                2
                                   Boise State University, USA
                            edoardoserra@boisestate.edu
                          3
                            University of Trieste and ICAR-CNR, Italy
                          alfredo.cuzzocrea@dia.units.it



         Abstract. Count constraint is a data dependency that requires the results of given
         count operations on a relation to be within a certain range. By means of count
         constraints a new decisional problem, called the Inverse OLAP, has been recently
         introduced: given a flat fact table, does there exist an instance satisfying a set
         of given count constraints? This paper focus on a special case of Inverse OLAP,
         called Inverse Tree-OLAP, for which the flat fact table key is modeled by a Di-
         mensional Fact Model (DFM) with a tree structure.


1     Introduction
Emerging “Big Data” platforms and applications call for the invention of novel data
analysis techniques that are capable to handle large amount of data. There is therefore
an increasing need to use real-life datasets for data-driven experiments but, as pointed
out in a recent ACM SIGMOD Blog post by Gerhard Weikum [13], datasets used into
research papers are often poor. Companies have their own interesting data, and indus-
trial labs have access to such data and real-life workloads; however, such datasets are
often proprietary and out of reach for academic research. In order to ensure high quality
experimental findings, inverse mining techniques can be applied to generate artificial
datasets that reflect the patterns of real ones: the patterns are first discovered by data
mining techniques (or directly provided by domain experts) and then used to generate
“realistic” privacy-preserving datasets.
     A promising and important example of inverse mining is Inverse Frequent itemset
Mining (IFM), first stated in [9], which consists of finding a transactional database D
satisfying given support constraints on some itemsets, which are typically the frequent
ones. As an example, given the items a, b, c, d, the itemsets I1 = {a, b}, I2 = {b, c}
and I3 = {c, d}, the support constraints σI1 = σI2 = 100 and σI3 = 50, and a fixed
database size equal to 170, a feasible transactional database D consists of the following
(replicated) transactions: h{a, b, c}, 70i (i.e., there are 70 occurrences of the transaction
{a, b, c} in D), h{b, c, d}, 10i, h{a, b}, 30i, h{b, c}, 20i and h{c, d}, 40i. Observe that
the support constraint σI1 = 100 is satisfied by D as there are 30 occurrences of I1 in
    SEBD 2018, June 24-27, 2018, Castellaneta Marina, Italy. Copyright held by the author(s).
D and I1 is a subset of {a, b, c} occurring 70 times in D – it is easy to see that the other
support constraints are satisfied as well.
     Count constraints have been used in [12] to define a new integrity constraint deci-
sional problem: given a relation scheme and a number of count constraints on it, does
there exist a relation instance satisfying the count constraints? The problem has been
called Inverse OLAP because the typical relation scheme dealt with is a (flat) fact table,
whose attributes are dimensions (i.e., properties, possibly structured at various levels of
abstraction) – no measure attributes are considered as the only aggregation operator is
the count. We recall that a flat fact table can be represented as a star schema with ad-
ditional tables for storing dimensions or a snowflake schema, which includes additional
dimensional tables describing dimension hierarchies. This kind of fact table is called a
factless fact table [7] and it arises in some scenarios storing many-to-many mappings
in which no attribute qualifies as a measure – typical cases are event records, where an
event is given by a combination of simultaneously occurring dimensional characteris-
tics.
     Inverse OLAP has a potential high relevance in synthesizing data cubes having pre-
defined characteristics to be used in benchmarks of novel techniques for handling big
data. Inverse OLAP has been proved in [12] to be NEXP-complete under various con-
ditions: data complexity (i.e, the number of attributes and the size of constraints are
constant), program complexity (i.e, the domains are constant) and combined complex-
ity.
     We are not discouraged by the high complexity and in this paper we present a so-
lution approach of the Inverse OLAP problem for the case of a flat fact table, called
tree-fact table, whose schema is a special case of Dimensional Fact Model (DFM) of
[5]: it is a tree (and not a graph as in the general definition) having dimension hierarchy
levels as nodes and the “rolls-up-t” relationships between them as edges.
     This paper is the short version of the paper [11], where we present the main results
of our research.


2    Inverse Tree-OLAP

In this section we define the Inverse Tree-OLAP problem. We assume that the flat
fact table R has the following scheme (tree-fact table): R(B1 , . . . , Bn , S 1 , . . . , S n ,
M1 , . . . , Mm ), where (B1 , . . . , Bn ) is the table key (basic dimensions), S i , 1 ≤ i ≤ n,
is a dimension hierarchy tree rooted on Bi , and M1 , . . . , Mm ) are measures. We also
assume that the all the once-to-many relationships in the subtrees S 2 , . . . , S n are stored
as facts in ad-hoc additional domain tables (as in an ontology) whereas the first level of
the one-to-many relationships in the subtree S 1 are not. In addition we assume that all
the dimension domains are given in suitable tables except for the basic dimension B1
(aggregating basic dimension), whose values are generated by means of a polynomial
time function. Finally the values of measures are determined by means of a polynomial
time function having the table key domains as range.
    We make the following restrictions on the count constraints that can be applied to a
tree-fact table:
 – Tree-Tuple Count Constraint: the table key {B1 , . . . , Bn } must be contained in
   X ∪ Y, being X and Y suitable subsets of dimensions, respectively;
 – Tree-Tuple Count Constraint-B1 : Y = {B1 }, X is a subset of dimensions in the
   subtree S 1 rooted at A1 and all other basic dimensions B2 , . . . , Bn as well as the
   ones in the corresponding subtrees S 2 , . . . , S n must be contained in Z, being Z a
   suitable subset of dimensions;
 – Tree-Group Count Constraint: W = B1 , Y = {B2 , . . . , Bn } while all other di-
   mensions must be in Z (being Z a suitable subset of dimensions), i.e., they are
   existentially quantified.
    Observe that Tree-Group Count Constraints define an IFM-like subproblem, whose
very objective is to model how the values of the basic dimensions can occur together in
a many-to-many relationship. Restrictions on Tree-Tuple Count Constraints avoid the
problem of handling duplicates on table projections. In fact, the projected tuples are
guaranteed to be distinct as the table key B1 , . . . , Bn is included in the set of selection
/ projection attributes. The problem of removing duplicates from table projections has
been recognized in [1] as the critical step for checking satisfaction of cardinality con-
straints. As for Tree-Tuple Count Constraint-B1 , distinctness of projected tuples will
be guaranteed by our solution approach that group the tuples by the aggregation basic
dimension.
    We are now ready to define the specialization of Inverse OLAP.
Inverse Tree-OLAP Problem. Given a tree-fact table R on a DFM D and a set of 2D
count constraints C, the Inverse Tree-OLAP problem consists of deciding whether there
exists a relation r on R such that r |= C ∪ F , where F are the functional dependencies
corresponding to D.                                                                  2
  The next result shows that the (both data and combined) complexity of Inverse Tree-
OLAP remains NEXP-complete.
Proposition 1. Inverse Tree-OLAP is NEXP-complete.
     P ROOF (sketch). Membership to NEXP derives from the fact that Inverse OLAP is
in NEXP and Inverse Tree-OLAP is a specialization of Inverse OLAP. Actually satis-
faction of FDs is not explicitly required in the general Inverse OLAP formulation as
possible FDs are expressed as count constraints. Checking FDs can be obviously done
in time polynomial in the size of a problem certificate and, therefore, the complexity re-
mains in NEXP. To prove NEXP-hardness, it is sufficient to show that any instance x of
IFMI can be transformed in logarithmic space into an instance x0 of Inverse Tree-OLAP,
such that x is a YES-instance for IFMI if and only if x0 is a YES-instance for Inverse
Tree-OLAP. We shall give some intuition on this part of the proof in the example below.
2
    Observe that NEXP-completeness only depends on the presence of Tree-Group
Count Constraints. The fact that Tuple Constraints do not have high complexity is due
to our restrictions that produce table projections without duplicates.
    To get an intuition of our approach, consider a tree-fact table R(S, W, B,
C, R, T, L, A, G, P ) modeling a classical example of point-of-sales transaction appli-
cation. Recall that the dimensions are: S (Sale), W (Ware), B (ware Brand), C (sale
Customer), R (customer Ranking), T (customer Town), L (sale Location), A (sale Area)
and G (ware Group). The basic dimensions are S, W and B and S is the aggregating
one. There is a unique measure: P (item Price), where an item is a pair ware-brand
and its price may vary from one location to another. Then the value of P is univocally
determined by the values of L, W and B – we assume that the function returning the
price value is suitably stored to be used after a feasible solution has been produced. The
domains of the dimensions, denoted by DS , DW and so on, are given and they are finite
tables; however, DS is not explicitly stored but the values are generated as needed.
    The prefixed one-to-many relationships are given and stored into the tables: DW G ,
DLA , and DCT . The other one-to-many relationships are not predefined.
    A domain table D{W B}σ1 σ2 stores a number of predefined itemsets (i.e., sets of
items) and a range of admissible supports σ1 ≤ σ2 for each of them. There are ad-
ditional tables: DGA α1 α2 storing the admissible numbers of ware groups that must be
assigned for location areas, DBR γ1 γ2 storing the admissible numbers of brands that
must be sold to all customers of each ranking and DR ω1 ω2 storing the admissible num-
bers of regions that must be interested to all customers of each ranking. Observe that
the minimum value ω1 for the lowest ranking is 0.
Enforcing the total number of sales
               true → s1 ≤ #({S : R(S, w, b, c, r, t, l, a, g, p)}) ≤ s2                   (1)
where low letters represent existentially quantified variables and s1 , s2 with s1 ≤ s2 are
two non negative integers, denoting respectively the minimal and the maximal number
of sales in a feasible solution. Constraint (1) is a Tree-Tuple Count Constraint-B1 .
Enforcing frequency support constraints on itemsets
               D{W B}σ1 σ2 (Ẍ, σ̈1 , σ̈2 ) →
                            σ̈1 ≤ #({S : Ẍ ⊆ {W B : R(S, W, B, v)}}) ≤ σ̈2                (2)
where v is the list of existentially quantified variables [c, r, t, l, a, g, p], dotted letters
represent universally quantified variables and σ¨1 , σ¨2 are the minimal and maximal sup-
port for frequent itemsets. Constraint (2) is a Tree-Group Count Constraint that enforces
that all itemsets stored in D{I}β1 β2 be frequent.
Enforcing infrequency support constraints on itemsets
             Ÿ ={W B : DW (W )×DB (B)} ∧ D{W B}σ1 σ2 (Ẍ, σ̈1 , σ̈2 ) ∧
                  Ÿ 6⊆ Ẍ → #({S : Ẍ ⊆ {W B : R(S, W, B, v))}}) ≤ σ                      (3)
where σ is a non negative integer denoting the infrequency support threshold. Constraint
3 is a Tree-Group Count Constraint that enforces that all itemsets that are not a (not
necessarily proper) subset of some itemset in D{W B}σ1 σ2 must have a support below
the threshold σ.
    We point out that Constraints (1), (2) and (3) expresses the IFMI problem – so they
can be used in the NEXP-hardness proof of Proposition 1. Our formalism is rather
powerful as it enable to define additional constraints to IFMI – as an example, we next
show how to enforce a size range for the fact table and a maximum number of items for
transactions (the latter constraint has been analyzed in [3]).
Enforcing the total number of tuples in the fact table
                    true → s3 ≤ #({S W B : R(S, W, B, v)}) ≤ s4                       (4)
where s3 , s4 with s3 ≤ s4 are two non negative integers, denoting respectively the
minimal and the maximal cardinality of a feasible fact table instance. Constraint (1) is
a Tree-Tuple Count Constraint.
Enforcing a maximum number of itemsets in a sale
                        DS (S̈) → #({W B : R(S̈, W, B, v)}) ≤ λ                       (5)
where λ is a positive integer denoting the maximal transaction length. Constraint 5 is a
Tree-Tuple Count Constraint.
   We next define a number of additional Tree-Tuple Count Constraints (including type
B − 1) that involve also non-basic dimensions.
Enforcing prefixed one-to-many relationships Consider first the prefixed one-to-
many relationship from G (ware Group) to W (Ware), defined by DW G . The following
Tree-Tuple Count Constraint:
              DW (Ẅ) ∧ DS (S̈) ∧ DB (B̈) →
               #({G : R(S̈, Ẅ, B̈, c, r, t, l, a, G, p) ∧ ¬DW G (Ẅ, G)}) = 0.       (6)
excludes the possibility that any ware can take a group value different from the one that
has been fixed in DW G . Observe that we have slightly modified the notation inside the
function # by enabling a conjunction of the fact predicate with domain predicates – the
semantics of this extension is obvious.
    We add similar constraints for all other prefixed one-to-many relationships: from L
(sale Location) to A (Area), from C (Customer) to T (Town) and from C to R (customer
Ranking).
Distributing ware groups on areas
               DGA α1 α2 (G̈, Ä, α̈1 , α̈2 ) →
                  α̈1 ≤ #({S W B : R(S, W, B, c, r, t, l, Ä, G̈, p)}) ≤ α̈2 .        (7)
    This Tree-Tuple Count Constraint fixes the number of wares of each group that must
be sold to all locations of a given area.
Distributing sales on locations
                    DL (L̈) → γ1 ≤ #({S : R(S, W, B, v)}) ≤ γ2 .                      (8)
where v has been defined above and γ1 , γ2 with γ1 ≤ γ2 are two non negative inte-
gers denoting the same distribution range for all locations. This is a Tree-Tuple Count
Constraint-B1 .
Assigning brands to rankings
                DB R τ1 τ2 (B̈, R̈, τ̈1 , τ̈2 ) →
                   τ1 ≤ #({S W B̈ : R(S, W, B̈, c, R̈, t, l, a, g, p}) ≤ τ2 .         (9)
   This is a Tree-Tuple Count Constraint.
Assigning the numbers of sales to customers per ranking
                DR ω1 ω2 (R̈, ω̈1 , ω̈2 ) →
                        ω1 ≤ #({S : R(S, w, b, c, R̈, t, l, a, g, p)}) ≤ ω2 .               (10)
This is a Tree-Tuple Count Constraint-B1 . Observe that a ranking dimension is consid-
ered in [8] to show an example of derived (hidden) dimensions.


3   Solving Inverse Tree-OLAP
In this section we present a method for finding an approximate solution of the Inverse
Tree-OLAP. Given a instance x of the Inverse Tree-OLAP:

 1. Construct an instance y of IFMI using Tree-Group Count Constraints and possible
    Tree-Tuple Count Constraints (also in the variant B1 ) for which all non-basic di-
    mensions are existentially quantified. Any itemset is a set of (n − 1)-tuples on the
    basic dimensions B2 , . . . , Bn that share the same value for the aggregating basic
    dimension B1 .
 2. Solve y using the algorithm of [6] – let D be the transaction database computed by
    the algorithm, where a transaction is a set of items and each item is an element of
    B2 × · · · × Bn .
 3. Represent the database D as a set of triples (t, Z(t), δ(t)), where Z(t) is an itemset
    occurring in D, δ(t) is the number of its duplicates in D and t is an identifier for
    the itemset Z(t). Note that t is not a value in the domain of the aggregating basic
    domain B1 but it represents a group of δ(t) distinct values for B1 .
 4. Represent an instance r of the relation R by means of variables as follows: for each
    (t, Z(t), δ(t)), introduce variables xb̂1 b2 ...bn d , where b2 . . . bn d are all admissible
    values for B2 , . . . , Bn , d is the list of all possible values for non-basic domains that
    are not determined by the prefixed one-to-many relationships and b̂1 represents the
    portion of t duplicates that are assigned to the values of the other subscript indices.
    Observe that this representation can be seen as a chase, which is a finite “canonical”
    database, witnessing the satisfiability of integrity constraints [2].
 5. Use the Tree-Tuple Count Constraints and Tree-Tuple Count Constraints-B1 in-
    volving non-basic dimensions to compute the variable values. To this end, we use
    linear equations that are solved using a linear program whose objective function is
    the cost of violating the above constraints.
 6. Generate a flat table starting from the computed variable values by including the
    values for non-basic domains that are determined by the prefixed one-to-many re-
    lationships and by computing the measure values.

    For presentation sake, we next illustrate our method by referring to the example
described in Section 2.
    As already mentioned, constraints (1), (2) and (3) encode an instance of IFMI , de-
scribed in the following. We first present the formal definition of IFMI . Let:
 1. I be a given set of n items and UI be the set of all non-empty itemsets over I
 2. S be a given set of m itemsets over the items in I
 3. S 0 denote {I ∈ UI | I 0 ∈ S : I ⊆ I 0 }
 4. Γσ = {(I, σ1I , σ2I ) | I ∈ S, 0 ≤ σ1I ≤ σ2I } be a given set of triples assigning a
    minimum and maximum support to each itemset in S
 5. σ 0 ≥ 0 be the maximum support threshold for all itemsets in S 0
 6. size = (size1 , size2 ), 0 ≤ size1 ≤ size2 , be the minimal and maximal number
    of transactions.
The inverse frequent itemset mining problem with infrequency constraint (IFMI ) on I,
S, Γσ , σ 0 and size consists of finding a database D over I such that the following
conditions hold (or of eventually stating that there is no such a database):
                                       I
                             ∀I ∈ S : σmin ≤ σ D (I) ≤ σmax
                                                        I
                                                                                          (11)
                                   ∀I ∈ S 0 : σ D (I) ≤ σ 0                               (12)
                                  size1 ≤ |D| ≤ size2 .                                   (13)

    By the anti-monotonocity property, constraints (12) can be replaced by:

                                  ∀I ∈ BS 0 : σ D (I) ≤ σ 0                               (14)

where BS 0 = {I ∈ S 0 | I 0 ∈ S 0 : I 0 ⊂ I} is the set of the bottom (i.e., minimal)
elements of S 0 .
    An instance of IFMI is constructed by fixing: I = {hw bi | DW (w) × DB b}, Γσ =
{(X, σ1 , σ2 ) | D{W B}σ1 σ2 (X, σ1 , σ2 )}, σ 0 = σ (see constraint 3), size1 = s1 and size2 =
s2 (see constraint 1). It is then straightforward to derive S and BS 0 .
    For the solution of the IFMI instance, we adopt the approach of [6]. The instance
is represented as an integer linear program with a linear number of constraints (corre-
sponding to the support constraints of the itemsets in S ∪ BS 0 and to the two database
size constraints) and an exponential number 2n of variables xj , one for each possible
transaction Ij , indicating to number of its occurrences in the database. The program
is represented in a succinct format with size O(n (m + m0 )) – recall that a succinct
problem is a problem whose instances are not given straightforward, but are themselves
encoded using an a concise input with size logarithmic in the actual input size [10]. Then
the integer constraint on the variables xj are relaxed so that an approximate solution is
obtained by solving a linear program. Because of the exponential number of variables,
the column generation algorithm (see e.g [4]), which is a version of the simplex dealing
with a large number of variables (large-scale linear programs). This method solves a
linear program without explicitly including all columns (i.e., variables), in the coeffi-
cient matrix but only a subset of them with cardinality equal to the number of rows (i.e.,
constraints). Columns are dynamically generated by solving an auxiliary optimization
problem called the pricing problem.
    Observe that constraints (4) and (5) are not taken into account in the resolution
algorithm of [6]. Constraint (4) can be easily handled by a simple additional constraint
in the liar program. Constraint (5) states that for each itemset with length greater than λ,
the associated variable xj must be equal to zero. This restriction can be easily handled
inside the pricing problem resolution.
    The algorithm returns a transaction database D that is an approximate solution of
the problem. The database D is represented in a succinct format: (ŝ, Z, δ), where Z is
an itemset occurring in D, δ is the number of its duplicates and ŝ is an identifier for Z.
We stress that ŝ is not a sale value in DS as the itemset Z will be eventually assigned to
δ distinct sale values. Let Ŝ the set of all identifiers ŝ. For each ŝ ∈ Ŝ, δ(ŝ) denotes the
duplicate number for ŝ, Z(ŝ) is the itemset Z.
    So far we have implemented steps (1)-(3). To implement step (4), we introduce the
variables xŝwblc for each ŝ in Ŝ, for each ware w of the domain of W , for each brand b
of the domain of B, for each location l of the domain L and for each customer c of the
domain of C. There are no indices corresponding to non-basic domains that are defined
by means of on-to-many fixed relationships. Indeed, because of constraints of type (6),
once that a domain value is fixed, the value of its roll-up domain value is automatically
fixed as well.
    To make the computation effective, we relax the integer constraint on all variables;
so they are defined on the domain of non-negative rational numbers. We introduce the
following equations:

              X
                   xŝwblc =δ(ŝ)                           ∀ŝ ∈ Ŝ ∀ hw, bi ∈ Z(ŝ)       (15)
          l∈DL ,c∈DC

                                                      ∀ŝ ∈ Ŝ ∀l ∈ DL ∀c ∈ DC
                   xŝwblc =0                                                               (16)
                                                    ∀ hw, bi ∈ (DW ×DB ) \ Z(ŝ)

    Equation (15) states that, fixed a transaction ẑ and any item hw, bi of Z(Ŝ), the sum
of all assignments of the item to all possible locations and customers must be equal to
the duplicate number δ(ŝ). On the other hand, equation 16 enforces that xŝwblc = 0
whenever the item hw, bi i is not part of any transaction ŝ.
    Let us now implement step (5). To this end, we consider all constraints from (7) to
(10).
    Constraints (7) is implemented as:

                                   X                                            ∀g ∈ DG
            α1ga ≤                            xŝwblc ≤ α2ga                                (17)
                                                                                ∀a ∈ DA
                     ∀ŝ ∈ Ŝ ∀c ∈ DC ∀l : DLA (l, a)
                      ∀ hw, bi ∈ Z(ŝ) : DW G (w, g)

where (α1ga , α2ga ) is the pair s.t. DGA α1 α2 (g, a, α1lc , α2lc ) is true.
   Constraints (8), (9) and (10) are implemented as:

                                   X
                γ1 ≤                       xŝwblc ≤ γ2                     ∀l ∈ DL         (18)
                       ∀ŝ∈Ŝ ∀hw,bi∈Z(ŝ) ∀c∈DC


                                    X                                       ∀b ∈ DB
                 τ1br ≤                   xŝwblc ≤ τ2br                                    (19)
                                                                            ∀r ∈ DR
                         ∀ŝ ∈ Ŝ ∀c : DCR (c, r)
                        ∀w ∈ DW : hw, bi ∈ Z(ŝ)
                                  X
               ω1r ≤                      xŝwblc ≤ ω2r                     ∀r ∈ DR         (20)
                   ∀ŝ∈Ŝ ∀hw,bi∈Z(ŝ) ∀c: DCR (c,r)
where (τ1br , τ2br ) and (ω1r , ω2r ) are the pairs such that DBR τ1 τ2 (b, r, τ1gr , τ2gr ) and
DR ω1 ω2 (r, ω1r , ω2r ) are true.
    Obviously equations (15)–(20) can be solved in time polynomial in the domain
sizes. Nevertheless, there is a problem that must be dealt with: the equations could
be too restrictive so that no solution exists. However, as we are interest in finding an
approximate solution that has the minimal violation the problem can be solved by intro-
ducing two artificial variables for each equation. For instance, equation (18) is rewritten
as:

                              X
              w1l +                  xŝwblc ≥ γ1                       ∀l ∈ DL
                   ∀ŝ∈Ŝ ∀hw,bi∈Z(ŝ) ∀c∈DC
                              X
              w2l −                  xŝwblc ≥ − γ2                     ∀l ∈ DL
                   ∀ŝ∈Ŝ ∀hw,bi∈Z(ŝ) ∀c∈DC

where w1l and w2l are non-negative artificial variables. We take the summation of all
artificial variables as an objective function to be minimized. Therefore we obtain a
linear program representation of the equation resolution problem. By solving the linear
program we compute a solution minimizing the objective function – that is, the one with
the minimal constraint violation.
     We point out that the size of the linear program is not exponential as for the IFMI
linear program but polynomial. Nevertheless, the number of both rows and columns
may be large. So, a row-column generation approach can be adopted: at each iteration
step, the pricing problem has to also decide which row must be eventually expanded.
     Once solved the linear equation system, we have to perform the last task (6) to
construct a feasible instance r. To this end, for each positive basic variable xŝwblc , say
with value k, we select k distinct values s1 , . . . , sk from the domain of S, add the
values of non-basic domains R, T , A and G, compute the measure P and insert the so
constructed tuples into r.


4   Conclusions

Inverse OLAP is the problem of deciding whether a given fact table satisfies a set of
given count constraints [12]. In this paper we have defined a special case of this prob-
lem, called Inverse Tree-OLAP, for which the flat fact table key is modeled by a Dimen-
sional Fact Model (DFM) with a tree structure. The count constraints define aggregation
patterns to be respected by both the many-to-many relationship among the basic dimen-
sions and the one-to-many relationships within dimension hierarchies.


References

 1. Arvind Arasu, Raghav Kaushik, and Jian Li. Data generation using declarative constraints.
    In Proceedings of the 2011 international conference on Management of data, SIGMOD ’11,
    pages 685–696, New York, NY, USA, 2011. ACM.
 2. Catriel Beeri and Moshe Y. Vardi. Polynomial-time implication problems for unary inclusion
    dependencies. J. of the ACM, 37:15–46, 1990.
 3. Toon Calders. The complexity of satisfying constraints on databases of transactions. Acta
    Inf., 44(7-8):591–624, 2007.
 4. George B. Dantzig and Mukund N. Thapa. Linear Programming 2: Theory and Extensions.
    Springer-Verlag, 2006.
 5. Matteo Golfarelli, Dario Maio, and Stefano Rizzi. The dimensional fact model: A conceptual
    model for data warehouses. Int. J. Cooperative Inf. Syst., 7(2-3):215–247, 1998.
 6. Antonella Guzzo, Luigi Moccia, Domenico Saccà, and Edoardo Serra. Solving inverse fre-
    quent itemset mining with infrequency constraints via large-scale linear programs. TKDD,
    7(4):18:1–18:39, 2013.
 7. Ralph Kimball. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional
    Data Warehouses. John Wiley & Sons, 1996.
 8. Svetlana Mansmann, Nafees Ur Rehman, Andreas Weiler, and Marc H. Scholl. Discovering
    olap dimensions in semi-structured data. In Il-Yeol Song and Matteo Golfarelli, editors,
    DOLAP, pages 9–16. ACM, 2012.
 9. T. Mielikainen. On inverse frequent set mining. In IEEE Computer Society, editor, Proc. of
    2nd Workshop on Privacy Preserving Data Mining (PPDM), pages 18–23, 2003.
10. Christos H. Papadimitriou. Computational complexity. Addison-Wesley, Reading, Mas-
    sachusetts, 1994.
11. Domenico Saccà, Edoardo Serra, and Alfredo Cuzzocrea, editors. IDEAS 2018: 22nd In-
    ternational Database Engineering and Applications Symposium, June 1820, 2018, Villa San
    Giovanni, Italy. ACM, 2018.
12. Domenico Saccà, Edoardo Serra, and Antonella Guzzo. Count constraints and the inverse
    olap problem: Definition, complexity and a step toward aggregate data exchange. In FoIKS,
    pages 352–369, 2012.
13. Gerhard Weikum. Where’s the Data in the Big Data Wave? 2013. ACM Sigmod BLOG:
    http://wp.sigmod.org/?p=786.