<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal) = {(a1</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Logic Programs for Repairing Inconsistent Dimensions in Data Warehouses</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Loreto Bravo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M´onica Caniup´an</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Hurtado</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Adolfo Ib ́an ̃ez</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad de Bio-Bio</institution>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad de Concepci ́on</institution>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A Data Warehouse (DW) is a data repository that integrates data from multiple sources and organizes the data according to a set of data structures called dimensions. Each dimension provides a perspective upon which the data can be viewed. In order to support an efficient processing of queries, a dimension is usually required to satisfy different classes of integrity constraints. In this paper, we study the problem of repairing a dimension when it fails to satisfy a set of two classes of integrity constraints: strictness constraints and covering constraints. We introduce the notion of minimal repair of a dimension in this context. A minimal repair is defined as a new dimension that is consistent with respect to the integrity constraints, which is obtained by applying a minimal amount of updates to the original dimension. We study the complexity of computing minimal repairs. Finally, we show how to characterize and compute minimal repairs of a dimension using Datalog programs with stable model semantics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Data Warehouses (DWs) are data repositories that integrate data from different
sources, and keep historical data for analysis and decision support [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. When
generating reports, it is of central importance for these systems to compute
queries that are summaries of data in a simple and efficient way. In order to
do this, DWs organize data according to the multidimensional model. In the
multidimensional model, dimensions reflect the perspectives upon which facts
are viewed. Facts correspond to events which are usually associated to numeric
values known as measures, and are referenced using the dimension elements.
Dimensions are modeled as hierarchies whose nodes are called elements, where
each element belongs to a category. The categories are also organized into a
hierarchy called hierarchy schema.
Downloads
Article Date N
a1 01-01-2007 3
a2 01-01-2007 2
a1 02-01-2007 5
a2 02-01-2007 5
(c) Fact table
Journal
      </p>
      <p>Article
Area</p>
      <p>Subject</p>
      <p>
        all
(a) Publication dimension: (b) Publication dimension:
elehierarchy schema ments and rollup relations
to Subject category. The top category is All. Figure 1(b) shows the elements of
the Publication dimension, along with the relations between elements of different
categories, called rollup relations. For example, a1 and a2 are elements of category
Article and b1, b2 are elements of category Journal. For each edge in the hierarchy
schema, there is a rollup relation. For example, the rollup relation from the
category Article to the category Journal contains the pairs or elements: (a1, b1)
and (a2, b2). The fact table of the DW is shown in Figure 1(c). Each fact stored
in the table represents the numbers of times an article was downloaded in a given
date. As an example, article a1 was downloaded three times on January 1, 2007.
The hierarchical structure of dimensions allows users to access facts at different
levels of granularity. As an example, using the aforementioned DW it is easy to
compute summaries such as: number of times that each article was downloaded
per month, or number of downloads broken down by area and year. 2
In order to compute summaries efficiently, DWs use pre-computed summaries
at low level categories to derive summaries at higher level categories. Two main
classes of integrity constraints, strictness and covering constraints, are used to
check whether such computations, called summarizations, are correct [
        <xref ref-type="bibr" rid="ref13 ref19 ref25">13, 19, 25</xref>
        ].
In a consistent summarization, each fact is aggregated once and not more than
once. Strictness constraints are used to require rollup relations to be strict
(manyto-one) relations. If a rollup relation is required to be strict, then there cannot
exist an element connected to two different elements in the rollup relation. As an
example, in the Publication dimension (Figure 1(a) and (b)) the rollup relation
from Article to Journal is strict. In contrast, the rollup relation from Article to
Subject is not strict, because a2 is connected to two different elements in the
category Subject. Covering constraints are used to require a rollup relation to
connect all the elements that belongs to the category from which the rollup
relation departs. As an example, in the dimension of Figure 1(b), the rollup
relation from Article to Subject it is not covering since the element a1 is not
connected to any element in Subject.
      </p>
      <p>Example 2. The summary SArea(Area, N ) = {(c1, 8), (c2, 7)} (Figure 1) can be
correctly derived from the summary SArticle(Article, N ) = {(a1, 8), (a2, 7)} by
summing up the number of downloads for each group of articles that go to the
same area in the dimension (element a1 goes to c1, and a2 goes to c2). The
correctness of this derivation follows from the fact that the rollup relation from</p>
      <p>all
Area Subject c1 c2</p>
      <p>d1
Article
(a) Dimension D</p>
      <p>all</p>
      <p>all</p>
      <p>
        d1
Article to Area is strict and covering. Similarly, SArea can be correctly derived
from SJournal(Journal, N ) = {(b1, 8), (b2, 7)}. However, the summary SAll(All, N ) =
{(all, 15)} cannot be correctly derived from SSubject(Subject, N ) = {(d1, 0), (d2, 7)}
since such derivation would yield SAll(All, N ) = {(all, 7)}. This derivation is
incorrect because the rollup relation from Article to Subject is not covering. 2
It has been shown that the dimensions that arise in real-world applications do not
always have strict and covering rollup relations. The flexibility to represent rollup
relations has been incorporated as a central feature of several dimension models
[
        <xref ref-type="bibr" rid="ref13 ref24">13, 24</xref>
        ]. In order to keep the ability to check consistency of summarizations, it is
important for DW systems to know the set of strictness and covering constraints
that hold. This can be achieved by allowing DW designers to formulate the
constraints that must hold when the dimensions are modeled. In this paper, we
study the form of inconsistency that arises when a dimension does not satisfy
a set of strictness and covering constraints. Similar to the vast majority of the
work on inconsistency handling in databases (cf. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), we address the scenario
where the constraints prevail over the data and the dimension must be updated
in order for it to satisfy the constraints.
      </p>
      <p>
        Example 3. Suppose that the DW administrator considers that the following
constraints should be satisfied by the dimension D in Figure 2: ϕ1 : Article ⇒ All,
ϕ2 : Journal ⇒ All, ϕ3 : Area ⇒ All, ϕ4 : Subject ⇒ All, which state that the rollup
relations from categories Article, Journal, Area and Subject to All are covering, and
the constraint ϕ5 : Article → Area establishing that the rollup relation from Article
to Area is strict. However, dimension D fail to satisfy constraint ϕ5, since element
a1 goes to two different elements: c1 and c2 in the Area category. Also, D violates
ϕ1 because element a2 does not reach all. The DW administrator considers that
the constraints properly capture the semantics of the data, and therefore their
violation indicates the presence of errors. Therefore, the dimension should be
repaired (corrected) in order to resolve the inconsistency. Figure 2(b)-(c) are
two possible repairs of D. Repair D1 is obtained by deleting edge (a1, b2) to
remove the violation of ϕ5 and inserting (a2, d1) to remove the violation of ϕ1.
Repair D2 is obtained by deleting edge (a1, b1) and inserting (a2, d1). 2
We introduce the notion of a minimal repair of a dimension that does not satisfy
a set of constraints. A minimal repair is defined as a new dimension that satisfies
the constraints and is obtained by applying a minimal number of changes to the
original dimension. We study the complexity of the problem of computing
minimal repairs, and show that in general the problem is NP-hard. We also explore
the use of Datalog programs with stable model semantics [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to characterize
minimal repairs. Logic programs act as a compact and executable logical
specification of dimension repairs. The rest of the paper is organized as follows: Section
2 presents DW dimensions, and strictness and covering constraints. Next,
Section 3 presents the notion of repair and describes complexity issues. Section 4
presents the logic programs to compute repairs. Finally, Sections 5 and 6 present
related work and the conclusions of the paper.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        In this section we formalize dimensions using standard concepts from
multidimensional models obtained from [
        <xref ref-type="bibr" rid="ref15 ref17 ref24 ref9">9, 15, 17, 24</xref>
        ], with some minor modifications to
facilitate presentation.
      </p>
      <p>A hierarchy schema H consists of a pair (CH, րH), where (CH, րH) is an
acyclic directed graph. Vertices in the set CH are categories and the edges րH
represent the child/parent relations between categories. The transitive and
re∗
flexive closure of րH is denoted by րH. The set of categories CH contains a
distinguished top category denoted AllH, which is reachable from every other
category in CH and has no outgoing edges, that is, there is no category ci ∈ CH
∗
such that (AllH, ci) ∈րH and for every cj ∈ C, (cj , AllH) ∈րH. Sometimes, we
will write ca րH cb instead of (ca, cb) ∈րH.</p>
      <p>Example 4. The hierarchy schema H = (CH, րH), depicted in Figure 1(a), is as
follows: CH = {Article, Journal, Subject, Area, All}; AllH = All; and րH={ (Article,
Journal), (Journal, Area), (Area, All), (Article, Subject), (Subject, All)}. 2
A dimension D is a tuple (HD, ED, CatD, &lt;D), where HD = (CHD , րHD ) is a
hierarchy schema; ED is a set of constants, called elements; CatD : ED → CHD is
a function that defines to which category each element in ED belongs to; and the
relation &lt;D⊆ ED × ED represents the child/parent relations between elements
∗
of different categories. We denote by &lt;D the reflexive and transitive closure
of &lt;D. The following conditions hold: (i) allD is the only element in category
AllHD (ii) for all pair of elements a, b ∈ ED if a &lt;D b then CatD(a) րHD CatD(b).
Condition (ii) ensures that the child/parent relation (&lt;D) only connects elements
of categories that are connected in the schema.</p>
      <p>Example 5. Let D = (HD, ED, CatD, &lt;D) be the dimension given in Figure 1(b).
Then D is as follows ED = {all, a1, a2, b1, b2, c1, c2, d1, d2}; allD = all; CatD =
{all 7→ All, a1 7→ Article, a2 7→ Article, b1 7→ Journal, b2 7→ Journal, c1 7→ Area, c2 7→
Area, d1 7→ Subject, d2 7→ Subject, }; and &lt;D= {(a1,b1), (a2,b2), (b1,c1), (b2,c2), (c1,all),
(c2,all), (a2,d1), (a2,d2), (d1,all), (d2,all)}. 2
The set of rollup relations of a dimension D is defined as follows. For each pair
∗
of categories ci, cj ∈ CHD such that ci րHD cj , there is a rollup relation denoted
by RD(ci, cj ) that has the following set of pairs {(a, b)|CatD(a) = ci, CatD(b) =
cj and a &lt;∗D b}.
We say that the rollup relation RD(ci, cj ) is strict if for all elements x, y, z in
E , if (x, y) ∈ RD(ci, cj ) and (x, z) ∈ RD(ci, cj ) then y = z. The rollup relation
RD(ci, cj ) is covering if for all elements e ∈ E such that Cat(e) = ci, there exists
an element e′ ∈ E such that (e, e′) ∈ RD(ci, cj ). A dimension is strict if all
its rollup relations are strict. Otherwise, the dimension is said to be non-strict.
Similarly, we use the notions of covering and non-covering dimension. Covering
dimensions are also called homogeneous and non-covering dimensions are called
heterogeneous.</p>
      <p>
        The vast majority of research and industrial applications of DWs consider
dimensions that are strict and covering [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. For strict and covering dimensions [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
summarization operations that aggregate facts between two categories that are
connected in the hierarchy schema are always correct. However, it has been
shown that in some real situations dimensions fail to satisfy these conditions
[
        <xref ref-type="bibr" rid="ref14 ref24">14, 24</xref>
        ]. In these cases, in order to keep the ability to verify summarizability,
it is useful to allow the DW administrator to specify integrity constraints to
identify rollup relations that are strict or covering [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Let H = (CH, րH) be a hierarchy schema and let D = (HD, ED, CatD, &lt;D)
be a dimension such that HD = H.
(i) A strictness constraint over H is an expression of the form ci → cj where
∗
ci, cj ∈ CH and ci րH cj . The dimension D satisfies the strictness constraint
ci → cj if and only if the rollup relation RD(ci, cj ) is strict.
(ii) A covering constraint over H is an expression of the form ci ⇒ cj where
∗
ci, cj ∈ CH and ci րH cj . The dimension D satisfies the covering constraint
ci ⇒ cj if and only if the rollup relation RD(ci, cj ) is covering.</p>
      <p>We denote by Σs(H) and Σc(H) the set of all possible strictness constraints
and covering constraints, respectively, over hierarchy schema H. We say that a
dimension D satisfies a set of constraints Σ if D satisfies every constraint in Σ.
Otherwise, we say that the dimension D is inconsistent with respect to Σ.
Example 7. Let D be the publication dimension depicted in Figure 1(b). This
dimension satisfies the following set Σ of strictness constraints: {Article → Journal,
Article → Area, Journal → Area}. However, D does not satisfy Article → Subject,
since an element of Article may reach more than one element in Subject. In
contrast, dimension given in Figure 2 is inconsistent with respect to Σ since
it violates the strictness constraint Article → Area. Indeed, the rollup relation
RD(Article, Area) = {(a1, c1), (a1, c2)} is non-strict. 2
The problem of determining if a dimension D satisfies a set of constraints can
∗
be solved in polynomial time by computing &lt;D.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Dimension Repairs</title>
      <p>If a dimension D does not satisfy a set of constraints Σ, we propose to compute
the minimal repairs of D, that is, dimensions over the hierarchy schema of D
that satisfy Σ and that stay as close as possible to the inconsistent dimension.
Example 8. Let D be the dimension given in Figure 2, and let Σ = {Article ⇒
All, Journal ⇒ All, Area ⇒ All, Subject ⇒ All, Article → Area}. Clearly, D is
inconsistent with respect to Σ. Figure 2(b)-(c) shows repairs of the dimension D
with respect to Σ. Repair D1 is obtained by deleting edge (a1,b2) and inserting
(a2,d1), and repair D2 is obtained by deleting (a1,b1) and inserting (a2,d1). There
are other possible repairs, for instance, a repair D3 can be obtained by deleting
edge (b1,c1) and inserting edges (b1,c2) and (a2,d1). Even though repair D3 get
rid of the inconsistencies, repairs D1, and D2 remain closer to dimension D. 2
Given two dimensions D = (HD, ED, CatD, &lt;D) and D′ = (HD′ , ED′, CatD′, &lt;D′),
′
the distance between them, dist (D, D ), is defined as |(&lt;D′ \ &lt;D) ∪ (&lt;D \ &lt;D′)|.</p>
      <p>′
The distance dist (D, D ) is the size of the symmetric difference between the
child/parent relations of the two dimensions. Next, we define the notions of
repair and minimal repair.</p>
      <p>Definition 1. [Repair, Minimal Repair] Let D = (HD, ED, CatD, &lt;D) be a
dimension and Σ be a set of integrity constraints over HD.
(i) A repair of D with respect to Σ is a dimension D′ = (HD′ , ED′, CatD′, &lt;D′)
such that HD′ = HD, ED′ = ED, CatD′ = CatD, and D′ satisfies Σ.
(ii) A minimal repair of D with respect to Σ is a repair D′, such that dist (D, D′)
is minimal among all the repairs of D with respect to Σ. 2
In the notion of repair we propose, the set of elements in each category of the
original dimension is preserved over all the repairs, that is deletions or additions
of new elements are not allowed.</p>
      <p>Example 9. The distance between D and the repairs in Figure 2(b)-(c) are
dist (D, D1) = dist (D, D2) = 2. Indeed, D1, and D2 are the minimal repairs
of D with respect to Σ. 2
It can be shown that there always exists a repair of a dimension provided that
the dimension has at least one element in each category. However, there might be
an exponential number of them. Thus, it becomes relevant to study the problem
of finding one.</p>
      <p>Theorem 1. Let D be a dimension, and k an integer. The problem of deciding
if there exists a repair D′ of D such that dist (D, D′) ≤ k is NP-complete. 2
Hardness is proven by reduction from the set covering problem. It follows then,
that the problem of computing a minimal repair is NP-hard and that deciding
if a dimension is a minimal repair is co-NP-complete.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Computing Repairs</title>
      <p>
        Computing repairs of dimensions inconsistent with respect to a set of covering
and strictness constraints can be expensive in terms of computational resources.
However, we can use Datalog programs with negation under stable model
semantics [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to represent and compute the minimal repairs of a dimension. Because
of space limitations, we will present here the programs with examples.
      </p>
      <p>
        Even though Datalog programs with negation under stable models semantics
can be used to express NP problems, current implementations, such as DLV4
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and Smodels5 [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], have shown to be quite efficient in practice [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. In what
follows, we will present a Datalog program with negation using predicates in
Table 1 that can be used to obtain the minimal repairs of a dimension.
      </p>
      <p>Atom Interpretation</p>
      <p>C(a) element a belongs to category C in dimension D
R(a, b, C1, C2) in D, the parent of element a is b, CatD(a) = C1 and CatD(b) = C2
R′(a, b, C1, C2) in D′, the parent of element a is b, CatD(a) = C1 and CatD(b) = C2
RT ′(a, b, C1, C2) transitive closure of R′
Ins(a, b, C1, C2) the child/parent relation (a, b) with CatD(a) = C1 and CatD(b) = C2,
was inserted to get D′
Del (a, b, C1, C2) the child/parent relation (a, b) with CatD(a) = C1 and CatD(b) = C2,
was removed to get D′
In order to simplify the presentation, we will first discuss the case where Σ =
Σs(H) ∪ Σc(H), that is we have to fix a dimension that, according to the
constraints, should be strict and covering. Let us observe that a covering and strict
dimension is such that every element will rollup to exactly one element in each
parent category. The repair program relies on this observation and first computes
a set of candidate dimensions that contains the same elements as the inconsistent
dimension and only one rollup between every element and every parent category
(note that all these candidate dimensions satisfy the covering constraints). Then,
it discards the models that do not satisfy the strictness constraints.
Example 10. Consider the Phone dimensions DP hone in Figure 3. Even though
the dimension is expected to be covering and strict, the dimension is not since
the strictness constraint Number → Region and the covering constraint Number ⇒
AreaCode are violated. This is, DP hone is inconsistent with respect to the set of
constraints Σc(H) ∪ Σs(H). The repair program to compute the minimal repairs
contains the following facts and rules:
Number(N1). Number(N2). Number(N3). AreaCode(45). AreaCode(41).
City(TCH). City(TEM). City(CCP). All(all). Region(VIII).
Region(IX). R(N1,41,Number,AreaCode). R(N3,45,Number,AreaCode).
R(N1,TCH,Number,City). R(N2,TEM,Number,City). R(N3,CCP,Number,City).
R(45,IX,AreaCode,Region). R(41,VIII,AreaCode,Region). R(TCH,VIII,City,Region).
R(TEM,IX,City,Region). R(CCP,VIII,City,Region). R(IX,all,Region,All).
RR(′(VXII,I,Ya,lln,Ruemgbioenr,,Aarlle)a.Code) ← Number(X), AreaCode(Y ), choice((X, areaCode)(Y )). (1)
R′(X, Y, number, city) ← Number(X), City(Y ), choice((X, city)(Y )). (2)
R′(X, Y, areaCode, region) ← AreaCode(X), Region(Y ), choice((X, region)(Y )). (3)
R′(X, Y, city, region) ← City(X), Region(Y ), choice((X, region)(Y )). (4)
R′(X, Y, areaCode, all) ← AreaCode(X), All(Y ), choice((X, all)(Y )). (5)
R′(X, Y, city, all) ← City(X), All(Y ), choice((X, all)(Y )). (6)
4 DLV System: http://www.dbai.tuwien.ac.at/proj/dlv/
5 Smodels System: http://www.tcs.hut.fi/Software/smodels/</p>
      <p>Region
AreaCodeCity</p>
      <p>Number
all
IX VIII
all</p>
      <p>IX VIII
Facts of the repair program correspond to the elements and rollup relations
in DP hone. The rest of the rules in the repair program are used to compute
the repairs. In general terms, the program: generates all possible dimensions
D′ = (HD′, ED′, CatD′, &lt;D′) such that HD′ = HDPhone , ED′ = EDPhone , CatD′ =
CatDPhone and such that every element has a unique parent in every parent
category (using rules (1) to (6)), then discards the ones that are not strict (with
rules (7) to (9)) and finally chooses the ones that are at a minimal distance
(using rules (10) to (13)).</p>
      <p>
        More specifically, rules (1) to (6) populate predicate R′ with a new covering
child-parent relation for the elements in EDPhone. This new relation assigns to
each element a a new parent in every category cj such that Cat(a) րHDPhone cj.
This parent is obtained by using the choice operator [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] that generates one
model for each c such that Cat(c) =cj. In fact, choice((X, cj), (Y )) will assign a
unique value to Y in each stable model for each combination (X, cj).
      </p>
      <p>
        Rules (7) and (9) compute the transitive closure of the child-parent relation of
the repair. Rule (9) is a program constraint (a rule without head) which enforces
that it is not possible to have that an element x rolls-up to two different parents
in the same category. Namely, it discards all the models in which the rollup
relations, captured in predicate RT ′, are not strict. Rules (10) and (11) keep
track of the insertion and deletions, respectively, that are needed to obtain the
repair. The weak constraints [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], denoted by ⇐, in rules (12) and (13) are used
to minimize the number of insertion and deletions needed to restore consistency.
In general, a weak constraint is of the form ⇐ ϕ [weight : level ], where ϕ is a
conjunction of atoms, and weight and level are integers. If ϕ is true in the model
the weak constraint is violated and the weight is added up to the cost of the
level. The stable models which minimize the sum of the weights of the violated
weak constraints are called the best models of the program.
      </p>
      <p>This program has three best models of total weight three6: M1= {Ins(N2, 45,
Number, AreaCode), Ins(CCP, IX, City, Region), Del (CCP, VIII, City, Region)}, M2 = {
Ins(N3, 41, Number, AreaCode), Ins(N2, 45, Number, AreaCode), Del (N3, 45, Number,
AreaCode)} and M3={Ins(N3, TEM, Number, City), Ins(N2, 45, Number, AreaCode),
Del (N3, CCP, Number, City)}. The total weight of each best model corresponds to
the distance between each of the repairs and dimension D. Each of these models
corresponds indeed to the repairs of DP hone with respect to Σc(H) ∪ Σs(H)
shown in Figure 3(c)-(e). 2
The correctness of the program relies on the fact that in strict and covering
dimensions every element contains exactly one parent in each parent category.
In the general case, where the dimension is not necessarily required to be strict
and covering, this does not hold anymore. However, in a minimal repair of a
dimension D, the number of parents in a category for every element can be
shown to be between zero and the maximum between one (1) and the number
of parents for that same element in D. Thus, in the general case, in order to
compute the minimal repairs, we have to be able to reduce the number of parents
an element can have. This is achieved by constructing a new dimension Ddel from
D for which the repairs of D can be computed only through reclassification of
child-parent relations of Ddel.</p>
      <p>Example 11. Figure 4(a) shows the del-dimension Ddel obtained from dimension
D in Figure 2(a) which is inconsistent with respect to Σ = {Article ⇒ All,
Journal ⇒ All, Area ⇒ All, Subject ⇒ All, Article → Area}. From it we can compute
a set of candidate repairs by reclassifying all the child/parent relations. From
these candidate repairs, the only two that satisfy the constraints (ignoring the
child/parent relations that involve any del-elements) are shown in Figure
4(b)(c). If the del-elements are removed, these dimensions correspond to the repairs
of D with respect to Σ shown in Figure 2(b)-(c). The repair program for D with
respect to Σ contains the following facts and rules:
Article(a1). Article(a2). Article(del). Journal(b1). Journal(b2).
Journal(del). Area(c1). Area(c2). Area(del). Subject(d1).
Subject(del). All(all). All(del). R(a1,b1,Article,Journal).
6 To simplify the presentation we show only the Ins and Del atoms which show the
modifications needed to restore consistency.</p>
      <p>all del
all del
all del
R(a1,b2,Article,Journal). R(b1,c1,Journal,Area). R(b2,c2,Journal,Area).</p>
      <p>R(c1,all,Area,All). R(c2,all,Area,All). R(d1,all,Subject,All).</p>
      <p>R(a1,del,Article,Subject). R(a2,del,Article,Subject).
The facts of the repair program correspond to the elements and rollup relations
in Ddel. Rules (14) to (17) compute a new candidate child-parent relation for the
elements in EDdel. Then, the consistency of each candidate repair with respect
to strictness (rules (18) to (20)) and covering (rules (21) to (23)) constraints is
checked ignoring every child/parent relation with a del-element. Next, rules (24)
and (26) chooses the ones that are at a minimal distance. Finally, the repairs are
obtained from the consistent candidate repairs by removing every del-element.
The best models of this program correspond to the repairs of D as shown in
Figure 2(b)-(c). 2
5</p>
    </sec>
    <sec id="sec-5">
      <title>Related Work</title>
      <p>
        Letz et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] motivate the importance of enforcing strictness constraints in
dimensions using constraints represented in dimensions hierarchies. In
particular, the constraints are used to keep dimensions strict upon successive update
operations. Pedersen et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] present a method to model non-strict dimensions
as strict dimensions when the OLAP implementation requires the dimensions to
be strict. In our setting, in contrast, the objective is to clean errors that turn a
dimension inconsistent with respect to a set of constraints.
      </p>
      <p>
        The problem of repairing relational databases with respect to a set of integrity
constraints (e.g. functional dependencies and inclusion dependencies) has been
extensively studied (see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for a survey). Even though there are several ways
to represent a data warehouse using relations (e.g. star schema or snowflake
schema [
        <xref ref-type="bibr" rid="ref10 ref18">10, 18</xref>
        ]) it is not possible to use the relational repair techniques to
compute DW’s repairs. This is because, it is either not possible to represent the
required strictness and covering constraints with the relational constraints, or
the minimal relational repairs obtained by tuple deletions, insertions or updates
do not coincide with the minimal repairs of DWs. Indeed, they might result in
the removal of more roll-up relations than needed or in discarding repairs that
are minimal in the DWs sense, but not in the relational case.
      </p>
      <p>
        Logic programs have been used to specify the repairs of relational databases
with respect to relational integrity constraints (e.g. functional dependencies,
foreign key constraints) [
        <xref ref-type="bibr" rid="ref2 ref3 ref7">2, 3, 7</xref>
        ]. As far as we know, this paper is the first work
on the application of logic programs to restore consistency of DWs.
6
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this paper we propose logic programs to compute repairs of dimensions, which
are obtained by performing insertions and deletions of edges between dimension
elements. We do not consider the insertion or elimination of dimension elements,
options that might be helpful to restore consistency of dimensions. We leave this
analysis for future work.</p>
      <p>
        Since an enterprise data warehouses can contain terabytes of data [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
ensuring that the dimensions satisfy the required integrity constraints can be vital for
an efficient computation of reports and summaries. Therefore, it is important to
develop tools to allow the designer of a DW to check integrity constraints and to
help in the process of repairing inconsistencies with respect to them. We believe
that such tools would be particularly useful in the creation and cleaning stage
of DWs and when the dimensions are updated and adapted due to changes in
data sources or modifications to the business rules [
        <xref ref-type="bibr" rid="ref15 ref16 ref22 ref6">6, 15, 16, 22</xref>
        ].
      </p>
      <p>
        We have explored a novel line of research which is the application of logic
programming to support data cleaning tasks in data warehousing. An interested
idea for future research is to compute consistent answers to queries over DWs,
following the framework proposed in [
        <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
        ], where repairs are used as auxiliary
concepts to characterize consistent answers to queries. A preliminary analysis in
this direction is presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Acknowledgements: Loreto Bravo is funded by CONICYT grant PSD-57.
M´onica Caniup´an is funded by FONDECYT grant #11070186.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bertossi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Chomicki</surname>
          </string-name>
          .
          <article-title>Consistent Query Answers in Inconsistent Databases</article-title>
          .
          <source>In PODS'99</source>
          , pages
          <fpage>68</fpage>
          -
          <lpage>79</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bertossi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Chomicki</surname>
          </string-name>
          .
          <article-title>Answer Sets for Consistent Query Answering in Inconsistent Databases</article-title>
          . TPLP,
          <volume>3</volume>
          (
          <issue>4</issue>
          -5):
          <fpage>393</fpage>
          -
          <lpage>424</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. P. Barcel´o and
          <string-name>
            <given-names>L.</given-names>
            <surname>Bertossi</surname>
          </string-name>
          .
          <article-title>Logic Programs for Querying Inconsistent Databases</article-title>
          .
          <source>In PADL 03</source>
          ,
          <string-name>
            <surname>Springer</surname>
            <given-names>LNCS</given-names>
          </string-name>
          2562, pages
          <fpage>208</fpage>
          -
          <lpage>222</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>L.</given-names>
            <surname>Bertossi</surname>
          </string-name>
          .
          <source>Consistent Query Answering in Databases. ACM Sigmod Record</source>
          ,
          <volume>35</volume>
          (
          <issue>2</issue>
          ):
          <fpage>68</fpage>
          -
          <lpage>76</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>L.</given-names>
            <surname>Bertossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bravo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Caniupan</surname>
          </string-name>
          .
          <article-title>Consistent Query Answering in Data Warehouses</article-title>
          .
          <source>In AMW'09</source>
          , volume
          <volume>450</volume>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Body</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Miquel</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          <article-title>B´edard, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Tchounikine</surname>
          </string-name>
          .
          <article-title>Handling Evolutions in Multidimensional Structures</article-title>
          .
          <source>In ICDE'03</source>
          , pages
          <fpage>581</fpage>
          -
          <lpage>591</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>L.</given-names>
            <surname>Bravo</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Bertossi</surname>
          </string-name>
          .
          <article-title>Semantically Correct Query Answers in the Presence of Null Values</article-title>
          .
          <source>In IIDB 06</source>
          ,
          <string-name>
            <surname>Springer</surname>
            <given-names>LNCS</given-names>
          </string-name>
          4254, pages
          <fpage>336</fpage>
          -
          <lpage>357</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>F.</given-names>
            <surname>Buccafurri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leone</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Rullo</surname>
          </string-name>
          .
          <article-title>Enhancing Disjunctive Datalog by Constraints</article-title>
          .
          <source>TKDE</source>
          ,
          <volume>12</volume>
          (
          <issue>5</issue>
          ):
          <fpage>845</fpage>
          -
          <lpage>860</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>L.</given-names>
            <surname>Cabibbo</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Torlone</surname>
          </string-name>
          .
          <article-title>Querying Multidimensional Databases</article-title>
          .
          <source>In DBLP'98</source>
          , pages
          <fpage>319</fpage>
          -
          <lpage>335</lpage>
          . Springer-Verlag,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Dayal</surname>
          </string-name>
          .
          <article-title>An Overview of Data Warehousing and OLAP Technology</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>26</volume>
          (
          <issue>1</issue>
          ):
          <fpage>65</fpage>
          -
          <lpage>74</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>M.</given-names>
            <surname>Gelfond</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Lifschitz</surname>
          </string-name>
          .
          <article-title>The Stable Model Semantics for Logic Programming</article-title>
          .
          <source>In ICLP'88</source>
          , pages
          <fpage>1070</fpage>
          -
          <lpage>1080</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Greco</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          <article-title>Sacc`a, and</article-title>
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          .
          <article-title>Programming with Nondeterminism in Deductive Databases</article-title>
          . AMAI,
          <volume>19</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>97</fpage>
          -
          <lpage>125</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>C. Hurtado</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Mendelzon</surname>
          </string-name>
          .
          <article-title>Capturing Summarizability with Integrity Constraints in OLAP</article-title>
          . TODS,
          <volume>30</volume>
          (
          <issue>3</issue>
          ):
          <fpage>854</fpage>
          -
          <lpage>886</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>C.</given-names>
            <surname>Hurtado</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Mendelzon</surname>
          </string-name>
          .
          <article-title>Reasoning about Summarizability in Heterogeneous Multidimensional Schemas</article-title>
          .
          <source>In ICDT'01</source>
          , pages
          <fpage>375</fpage>
          -
          <lpage>389</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>C. Hurtado</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Mendelzon</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaisman</surname>
          </string-name>
          .
          <article-title>Maintaining Data Cubes under Dimension Updates</article-title>
          .
          <source>In ICDE'99</source>
          , pages
          <fpage>346</fpage>
          -
          <lpage>355</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>C. Hurtado</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Mendelzon</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaisman. Updating OLAP</surname>
          </string-name>
          <article-title>Dimensions</article-title>
          .
          <source>In DOLAP'99</source>
          , pages
          <fpage>60</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lakshmanan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          .
          <article-title>What can Hierarchies do for Data Warehouses</article-title>
          ? In VLDB'
          <volume>99</volume>
          , pages
          <fpage>530</fpage>
          -
          <lpage>541</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>R.</given-names>
            <surname>Kimball</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ross</surname>
          </string-name>
          .
          <article-title>The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling</article-title>
          . John Wiley &amp; Sons, Inc.,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>H.</given-names>
            <surname>Lenz</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Shoshani</surname>
          </string-name>
          .
          <article-title>Summarizability in OLAP and Statistical Data Bases</article-title>
          .
          <source>In SSDBM'97</source>
          , pages
          <fpage>132</fpage>
          -
          <lpage>143</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>N.</given-names>
            <surname>Leone</surname>
          </string-name>
          , G. Pfeifer,
          <string-name>
            <given-names>W.</given-names>
            <surname>Faber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          , G. Gottlob,
          <string-name>
            <given-names>C.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mateis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Perri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Scarcello</surname>
          </string-name>
          .
          <article-title>The DLV System for Knowledge Representation and Reasoning</article-title>
          . TOCL,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>499</fpage>
          -
          <lpage>562</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>C. Letz</surname>
            , E. Henn, and
            <given-names>G. Vossen.</given-names>
          </string-name>
          <article-title>Consistency in Data Warehouse Dimensions</article-title>
          .
          <source>In IDEAS'02</source>
          , pages
          <fpage>224</fpage>
          -
          <lpage>232</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>A.</given-names>
            <surname>Mendelzon</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaisman</surname>
          </string-name>
          .
          <article-title>Temporal Queries in OLAP</article-title>
          .
          <source>In VLDB'00</source>
          , pages
          <fpage>242</fpage>
          -
          <lpage>253</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>T.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jensen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyreson. Extending Practical</surname>
          </string-name>
          Pre-Aggregation in
          <article-title>On-Line Analytical Processing</article-title>
          .
          <source>In VLDB'99</source>
          , pages
          <fpage>663</fpage>
          -
          <lpage>674</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>T.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jensen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyreson</surname>
          </string-name>
          .
          <article-title>A Foundation for Capturing and Querying Complex Multidimensional Data</article-title>
          . Inf. Syst.,
          <volume>26</volume>
          (
          <issue>5</issue>
          ):
          <fpage>383</fpage>
          -
          <lpage>423</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>M.</given-names>
            <surname>Rafanelli</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Shoshani</surname>
          </string-name>
          .
          <article-title>STORM: a Statistical Object Representation Model</article-title>
          .
          <source>In SSDBM'90</source>
          , pages
          <fpage>14</fpage>
          -
          <lpage>29</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>T. Syrj</surname>
          </string-name>
          <article-title>¨anen and I. Niemel¨a. LPNMR'01, chapter The Smodels System</article-title>
          , pages
          <fpage>434</fpage>
          -
          <lpage>438</lpage>
          . Springer LNCS 2173.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>