Introduction

Non-monotone Dualization via Monotone Dualization

Yoshitaka Yamamoto

Koji Iwanuma

Katsumi Inoue

0 0 National Institute of Informatics 2-1-2 Hitotsubashi , Chiyoda-ku, Tokyo 101-8430 , Japan 1 University of Yamanashi 4-3-11 Takeda , Kofu-shi, Yamanashi 400-8511 , Japan

The non-monotone dualization (NMD) is one of the most essential tasks required for finding hypotheses in various ILP settings, like learning from entailment [1, 2] and learning from interpretations [3]. Its task is to generate an irredundant prime CNF formula ψ of the dual f d where f is a general Boolean function represented by CNF [4]. The DNF formula φ of f d is easily obtained by De Morgan's laws interchanging the connectives of the CNF formula. Hence, the main task of NMD is to translate the DNF φ to an equivalent CNF ψ. This translation is used to find an alternative representation of the input form. For instance, given a set of models, it is desirable to seek underlying structure behind the models.

Introduction

Example 1. Suppose that a set of models M are given as follows: By treating M as the DNF formula, we translate it into CNF with NMD: In fact, H is regarded as a hypothesis in learning from interpretations [ 3 ]. In contrast, by converting a given CNF formula into DNF, we obtain the models satisfying the CNF formula. This fact shows an easy reduction from SAT problems to NMD, and then gives NP-hardness of it [ 5 ]. In this context, the research has been much focused on some restricted classes of NMD.

Monotone dualization (MD) is one such class that deals with monotone Boolean functions for which CNF formulas are negation-free [ 6, 7 ]. MD is one of the few problems whose tractability status with respect to polynomial-total time is still unknown. Besides, it is known that MD has many equivalent problems in discrete mathematics, such as the minimal hitting set enumeration and the transversal hypergraph computation [ 6 ]. Thus, this class has received much attention that yields remarkable algorithms: in terms of complexity, Fredman and Khachiyan [ 8 ] show that MD is solvable in a quasi-polynomial-total time (i.e., (n + m)O(log(n+m)) where n and m denote the input and output size, respectively). Uno [ 9 ] shows a practically fast algorithm whose average computation time is experimentally O(n) per output, for randomly generated instances.

This paper aims at clarifying whether or not NMD can be solved using these techniques of MD, and if it can be then how it is realized. In general, it is not straightforward to use them because of the following two problems in NMD: – NMD has to treat redundant clauses like resolvents and tautologies.

Example 2. Let a CNF formula φ be (x1 ∨ x2) ∧ (x2 ∨ x3). If we treat negated variables as regular variables, we can apply MD to φ and obtain the CNF formula ψ = (x1 ∨ x2) ∧ (x1 ∨ x3) ∧ (x2 ∨ x2) ∧ (x2 ∨ x3). However, ψ contains the tautology x2 ∨ x2 and the resolvent x1 ∨ x3 of x1 ∨ x2 and x2 ∨ x3. – Unlike MD, the output of NMD is not necessarily unique. It is known that the output of MD uniquely corresponds to the set of all the prime implicates of f d [ 10 ]. In contrast, some prime implicates can be redundant in NMD problems. Thus, the output of NMD corresponds to an irredundant subset of the prime implicates. However, such a subset is not unique in general. For the first problem, this paper shows a technique to prohibit any resolvents from being generated in MD. This is done by simply adding some tautologies to the input CNF formula φ in advance. We denote by φt and ψt the extended input formula and its output by MD, respectively. Then, ψt contains no resolvents. Example 3. Recall Example 2. We consider the CNF formula φt = (x1 ∨ x2) ∧ (x2∨x3)∧(x2∨x2) obtained by adding one tautology x2∨x2. Then, MD generates the CNF formula ψt = (x1 ∨ x2) ∧ (x2 ∨ x2) ∧ (x2 ∨ x3). Indeed, ψt does not contain the resolvent x1 ∨ x3, unlike ψ in Example 2.

By removing all tautologies from ψt, we obtain an irredundant CNF formula, denoted by ψir. Note that ψir is (x1 ∨ x2) ∧ (x2 ∨ x3) in Example 3.

We next address the second problem using a good property of ψir: a subset of the prime implicates is irredundant (i.e., an output of NMD) if and only if it subsumes ψir but never subsumes ψir if any clause is removed from it. This particular relation is called minimal subsumption. We then show that every subset satisfying the minimal subsumption is generated by MD. In this way, we reduce an original NMD problem into two MD problems: the one for computing ψir, and the other for generating a subset corresponding to an output of NMD. Due to space limitations, full proofs are omitted. 2 2.1

Background Preliminaries

A Boolean function is a mapping f : {0, 1}n → {0, 1}. We write g |= f if f and g satisfy g(v) ≤ f (v) for all v ∈ {0, 1}n. g is (logically) equivalent to f , denoted by g ≡ f , if g |= f and f |= g. A function f is monotone if v ≤ w implies f (v) ≤ f (w) for all v, w ∈ {0, 1}n; otherwise it is non-monotone. Boolean variables x1, x2, . . . , xn and their negations x1, x2, . . . , xn are called literals. The dual of a function f , denoted by f d, is defined as f (x) where f and x is the negation of f and x, respectively.

A clause (resp. term) is a disjunction (resp. conjunction) of literals which is often identified with the set of its literals. It is known that a clause is tautology if it contains complementary literals. A clause C is an implicate of a function f if f |= C. An implicate C is prime if there is no implicate C′ such that C′ ⊂ C.

A conjunctive normal form (CNF) (resp. disjunctive normal form (DNF)) formula is a conjunction of clauses (resp. disjunction of terms) which is often identified with the set of clauses in it. A CNF formula φ is irredundant if φ ̸≡ φ − {C} for every clause C in φ; otherwise it is redundant. φ is prime if every clause in φ is a prime implicate of φ; otherwise it is non-prime. Let φ1 and φ2 be two CNF formulas. φ1 subsumes φ2, denoted by φ1 ≽ φ2, if there is a clause C ∈ φ1 such that C ⊆ D for every clause D ∈ φ2. In turn, φ1 minimally subsumes φ2, denoted by φ1 ≽♮ φ2, if φ1 subsumes φ2 but φ1 − {C} does not subsume φ2 for every clause C ∈ φ1.

Let φ be a CNF formula. τ (φ) denotes the CNF formula obtained by removing all tautologies from φ. We say φ is tautology-free if φ = τ (φ). Now, we formally define the dualization problem as follows.

Definition 1 (Dualization problem). Input: A tautology-free CNF formula φ

Output: An irredundant prime CNF formula ψ such that

ψ is logically equivalent to φd We call it monotone dualization (MD) if φ is negation-free; otherwise it is called non-monotone dualization (NMD). As well known [ 6 ], the task of MD is equivalent to enumerating the minimal hitting sets (MHSs) of a family of sets. 2.2

MD as MHS enumeration

Definition 2 ((Minimal) Hitting set). Let Π be a finite set and F be a subset family of Π. A finite set E is a hitting set of F if for every F ∈ F , E ∩ F ̸= ∅. A finite set E is a minimal hitting set (MHS) of F if E satisfies that 1. E is a hitting set of F ; 2. For every subset E′ ⊆ E, if E′ is a hitting set of F , then E′ = E. Note that any CNF formula φ can be identified with the family of clauses in φ. Now, we consider the CNF formula, denoted by M (φ), which is the conjunction of all the MHSs of the family φ. Then, the following holds.

Theorem 1. [ 10 ] Let φ be a tautology-free CNF formula. A clause C is in τ (M (φ)) if and only if C is a non-tautological prime implicate of φd. Hence, the output of MD for φ uniquely corresponds to τ (M (φ)): the set of all MHSs of the family φ by Theorem 1. 2.3

NMD as MHS enumeration

Our motivation is to clarify whether or not any NMD problem can be reduced into some MD problems. While MD is done by the state-of-the-art algorithms to compute MHSs [ 9 ], it is not straightforward to use them for NMD. Here, we review the two problems explained before in the context of MHS enumeration. 1. Appearance of redundant clauses: τ (M (φ)) is prime but not irredundant. Example 4. Recall the input CNF formula φ2 = {{x1, x2}, {x2, x3}} of Example 2. Then, τ (M (φ2)) = {{x1, x2}, {x1, x3}, {x2, x3}}. Indeed, this contains the redundant clause {x1, x3}. 2. Non-uniqueness of NMD solutions: there are many subsets of τ (M (φ)) that are prime and irredundant.

Example 5. Let the input CNF formula φ be {{x1, x2, x3}, {x1, x2, x3}}. τ (M (φ)) consists of the non-tautological prime implicates as follows: τ (M (φ)) = {{x1, x2}, {x1, x3}, {x2, x3}, {x1, x3}, {x1, x2}, {x3, x2}}. Then, we may notice that there are at least two irredundant subsets of τ (M (φ)): ψ1 = {{x1, x2}, {x1, x3}, {x2, x3}}. ψ2 = {{x1, x3}, {x1, x2}, {x3, x2}}. Indeed, ψ1 is equivalent to ψ2, and thus both are equivalent to τ (M (φ)). To address the two problems, this paper focuses on the following CNF formula. Definition 3 (Bottom formula). Let φ be a tautology-free CNF formula and T aut(φ) be { x ∨ x | φ contains both x and x }. Then, the bottom formula wrt φ (in short, bottom formula) is defined as the CNF formula τ (M (φ ∪ T aut(φ))). 3

Properties of bottom formulas Now, we show two properties of bottom formulas.

Theorem 2. Let φ be a tautology-free CNF formula. Then, the bottom formula wrt φ is irredundant.

Example 6. Recall φ2 in Example 4. Then, T aut(φ2) is {x2∨x2}, and the bottom formula wrt φ2 is {{x1, x2}, {x2, x3}}. Indeed, it is irredundant, since it does not contain the resolvent {x1, x3}, unlike Example 4.

While any bottom formula is irredundant, it is not necessarily prime. Example 7. Recall the CNF formula φ in Example 5. Since T aut(φ) is the set {{x1, x1}, {x2, x2}, {x3, x3}}, the bottom formula is as follows: {{x1, x2, x3}, {x3, x2, x1}, {x3, x2, x1}, {x2, x3, x1}, {x2, x3, x1}, {x2, x3, x1}}. We write C1, C2, . . . , C6 for the above clauses in turn (i.e., C4 is {x2, x3, x1}). We then notice that the bottom formula is non-prime, because it contains a non-prime implicate C1 whose subset {x1, x2} is an implicate of φd. As shown in Example 7, the bottom formula itself is not necessarily an output of NMD. However, every NMD output is logically described with this formula. Theorem 3. Let φ be a tautology-free CNF formula. ψ is an output of NMD for φ iff ψ ⊆ τ (M (φ)) and ψ minimally subsumes the bottom formula wrt φ. Example 8. Recall Example 5 and Example 7. Fig. 1 describes the subsumption lattice bounded by two irredundant prime outputs ψ1 and ψ2 as well as the bottom formula {C1, C2, . . . , C6}. The solid (resp. dotted) lines show the subsumption relation between ψ1 (resp. ψ2) and the bottom formula. We then notice that both outputs ψ1 and ψ2 minimally subsume the bottom formula.

ψ1 {x1, x2} {x1, x3} {x2, x3} {x1, x3} {x1, x2} {x3, x2}

C3 C4 Bottom formula

C6 ψ2 C5 Theorem 3 shows that every NMD output ψ can be generated by selecting a subset ψ of τ (M (φ)) that minimally subsumes the bottom formula. Now, we show that the task of this selection is done by MD computation.

Let the bottom formula be {C1, C2, . . . , Cn}. We then denote by Si (1 ≤ i ≤ n) the set of clauses in τ (M (φ)) each of which is a subset of Ci. Fφ denotes the family of those sets {S1, S2, . . . , Sn}.

Theorem 4. Let φ be a tautology-free CNF formula. ψ is an output of NMD for φ if and only if ψ is an MHS of Fφ.

Example 9. Recall Example 8. We denote each clause of ψ1 and ψ2 in Fig. 4 by D1, . . . , D6, starting from left to right (i.e., D4 is {x1, x3}). Fφ is as follows: Fφ = {{D1, D4}, {D1, D6}, {D2, D6}, {D3, D4}, {D3, D5}, {D2, D5}}. By MHS computation, we have the five solutions, which contain {D1, D2, D3} and {D4, D5, D6} that correspond to ψ1 and ψ2, respectively. 5

Concluding remarks

This paper has presented a reduction technique from arbitrary NMD problem to two equivalent MD problems. Whereas algorithms and computation on MD has been extensively studied, it was not clarified whether or not NMD can be solved using the state-of-the-art MD computation. For this open problem, we give a solution how it is to be realized.

Our result can be used to investigate the complexity of NMD from the viewpoint of MD computation. For instance, the complexity of generating one NMD output can be described as follows: (n + t + x)O(log(n+t+x)) + (x + 1)O(log(x+1)), (1) where n, t and x are the sizes of the input formula φ, the tautologies T aut(φ) and the bottom formula τ (M (φ ∪ T aut(φ))), respectively. This is simply derived from the result on the complexity of MD computation [ 8 ]. Note that the righthand term in Formula (1) comes from the complexity of the incremental MHS generation problem [ 11 ]. In contrast, the complexity of MD for computing the prime implicates of φ is described as (n + m)O(log(n+m)) where m is the size of τ (M (φ)). Hence, the difference of their complexities lies in how big the blow-up in size can be when the bottom formula is derived. In other words, our proposal is not a polynomial reduction to MD, and does not establish NP-completeness of MD. However, if x is equal to m, like Example 4 and Example 7, there is no difference between NMD and MD in terms of the complexity. In this sense, it is an important future work to characterize those subclasses of NMD with only polynomial increase in the problem size x.

1. Inoue , K. : Induction as consequence finding . Machine Learning 55(2) ( 2004 ) 109 - 135

2. Yamamoto , Y. , Inoue , K. , Iwanuma , K. : Inverse subsumption for complete explanatory induction . Machine Learning 86(1) ( 2011 ) 115 - 139

3. DeRaedt , L.: Logical settings for concept-learning . Artificial Intelligence 95 ( 1997 ) 187 - 201

4. Eiter , T. , Makino , K. : Abduction and the dualization problem . In: Proceedings of the 6th Int. Conf. on Discovery Science. Volume 2843 of LNCS . ( 2003 ) 1 - 20

5. Eiter , T. , Ibaraki , T. , Makino , K. : Recognition and dualization of disguised bidual Horn functions . Information Processing Letters 82 ( 2002 ) 283 - 291

6. Eiter , T. , Makino , K. , Gottlob , G.: Computational aspects of monotone dualization: A brief survey . Discrete Applied Mathematics 156 ( 2008 ) 2035 - 2049

7. Hagen , M. : Algorithmic and computational complexity issues of MONET . PhD thesis , Friedrich Schiller University Jena ( 2008 )

8. Fredman , M. , Khachiyan , L. : On the complexity of dualization of monotone disjunctive normal forms . Algorithms 21 ( 1996 ) 618 - 628

9. Uno , T.: A practical fast algorithm for enumerating minimal set coverings . In: Proceedings of the 83rd SIGAL Conf. of the Information Processing Society of Japan . ( 2002 ) 9 - 16

10. Rymon , R. : An SE-tree based prime implicant generation algorithm . Annals of Mathematics and Artificial Intelligence ( 1994 ) 351 - 366

11. Elbassioni , K.M. : On the complexity of monotone dualization and generating minimal hypergraph transversals . Discrete Applied Mathematics 156 ( 2008 ) 2109 - 2123