Partial enumeration of minimal transversals of a
                   hypergraph

               Lhouari Nourine, Alain Quilliot and Hélène Toussaint

        Clermont-Université, Université Blaise Pascal, LIMOS, CNRS, France
                   {nourine, quilliot, helene.toussaint}@isima.fr


        Abstract. In this paper, we propose the first approach to deal with
        enumeration problems with huge number of solutions, when interesting-
        ness measures are not known. The idea developed in the following is to
        partially enumerate the solutions, i.e. to enumerate only a representative
        sample of the set of all solutions. Clearly many works are done in data
        sampling, where a data set is given and the objective is to compute a
        representative sample. But, to our knowledge, we are the first to deal
        with sampling when data is given implicitly, i.e. data is obtained using
        an algorithm. The experiments show that the proposed approach gives
        good results according to several criteria (size, frequency, lexicographical
        order).


 1    Introduction
 Most of problems in data mining ask for the enumeration of all solutions that
 satisfy some given property [1, 10]. This is a natural process in many applications,
 e.g. marked basket analysis [1] and biology [2] where experts have to choose
 between those solutions. An enumeration problem asks to design an output-
 polynomial algorithm for listing without duplications the set of all solutions. An
 output-polynomial algorithm is an algorithm whose running time is bounded by
 a polynomial depending on the sum of the sizes of the input and output.
     There are several approachs to enumerate all solutions to a given enumeration
 problem. Johnson et al. [13] have given a polynomial-time algorithm to enumer-
 ate all maximal cliques or stables of a given graph. Fredman and Khachiyan
 [7] have proposed a quasi-polynomial-time algorithm to enumerate all minimal
 transversal of an hypergraph. For enumeration problems the size of the output
 may be exponential in the size of the input, which in general is different from
 optimization problems where the size of the output is polynomially related to
 the size of the input. The drawback of the enumeration algorithms is that the
 number of solutions may be exponential in the size of the input, which is infea-
 sible in practice. In data mining, some interestingness measures or constraints
 are used to bound the size of the output, e.g. these measures can be explicitly
 specified by the user [8]. In operation research, we use quality criteria in order
 to consider appropriate decision [21].
     In this paper, we deal with enumeration problems with huge number of so-
 lutions, when interestingness measures are not known. This case happens when

c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA
 2015, pp. 123–134, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS
 laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and
 academic purposes.
124     Lhouari Nourine, Alain Quilliot and Hélène Toussaint


the expert has no idea about data and knowledges that are looking for. The
objective is to enumerate only a representative sample of the set of all solutions.
Clearly many works are done in data sampling, where are given a data set and
the objective is to compute a representative sample. To our knowledges, this idea
is new for sampling when data is given implicitly, i.e. data is obtained using an
algorithm. One can use the naive approach which first enumerates all the solu-
tions and then applies sampling methods, which is not possible for huge number
of solutions.
    To evaluate our approach, we consider a challenging enumeration problem,
which is related to mining maximal frequent item sets [1, 10], dualization of
monotone boolean functions [5] and other problems [10]. We applied our ap-
proach to several instances of transversal hypergraphs [17, 20], and obtain good
results.


2     Related works

Golovach et al. [9] have proposed an algorithm to enumerate all minimal domi-
nating sets of a graph. First they generate maximal independent sets and then
apply a flipping operation to them to generate new minimal dominating sets,
where the enumeration of maximal independent sets is polynomial. Clearly, a
relaxation of the flipping operation leads to a partial enumeration since the
number of minimal dominating sets can be exponential in the number of min-
imal independent sets, e.g. cobipartie graphs. Jelassi et al.[12] and Raynaud et
al.[19] have considered some kind of redundancy in hypergraphs like twin ele-
ments to obtain a concise representation. Their ideas can avoid the enumeration
of similar minimal transversals of an hypergraph.


3     Transversal hypergraph enumeration

A hypergraph H = (V, E) consists of a finite collection E of sets over a finite set
V . The elements of E are called hyperedges, or simply edges. An hypergraph is
said simple if for any E, E 0 ∈ E E 6⊆ E 0 . A transversal (or hitting set) of H is
a set T ⊆ V that intersects every edge of E. A vertex x in a transversal T is
said to be redundant if T \ {x} is still a transversal. A transversal is minimal if
it does not contain any redundant vertex. The set T of all minimal transversal
of H = (V, E) constitutes together with V also a hypergraph T r(H)    P = (V, T ),
which is called the transversal hypergraph of H. We denote by k = E∈E | E |.

Example 1. Consider the hypergraph H = (V, E): V = {1, 2, 3, 4, 5} and E =
{E1 , E2 , E3 } with E1 = {1, 3, 4}, E2 = {1, 3, 5} and E3 = {1, 2}. The set of all
minimal transversals is T = {{1}, {2, 3}, {2, 4, 5}} and k = 3 + 3 + 2 = 8

    Given a simple hypergraph H = (V, E), the transversal hypergraph enumera-
tion problem concerns the enumeration without repetitions of T r(H). This prob-
lem has been intensively studied due to its link with several problems isuch as
               Partial enumeration of minimal transversals of a hypergraph    125


data mining and learning [3, 4, 11, 15, 18]. Recently, Kante et al.[14] have shown
that the enumeration of all minimal transversals of an hypergraph is polynomi-
ally equivalent to the enumeration of all minimal domination sets of a graph.
It is known that the corresponding decision problem belongs to coNP, but still
open whether there exists an output-polynomial-time algorithm.


4     Partial transversal hypergraph enumeration
We introduce the partial (or incomplete) search algorithm for enumerating min-
imal transversals of an hypergraph H. The search space is the set of all transver-
sals which is very large. The strategy is divided into two steps:
 – The initialization procedure considers a transversal T of H and then ap-
   plies a reduction (at random) algorithm to T in order to obtain a minimal
   transversal Tm of H. This step is detailed in section 4.1.
 – The local search algorithm considers a minimal transversal Tm and then
   applies local changes to Tm in which we add and delete vertices according
   to some ordering of the vertices. This step is detailed in section 4.2
   These steps are repeated for at most k transversals depending on the input
hypergraph H. Figure 1 illustrates the proposed approach.


          Fig. 1. Approach to partial enumeration of minimal transversals


4.1   Initialization
Let H(V, E) be the input hypergraph, E ∈ E and x ∈ E. The initialization
step starts with the transversal (V \ E) ∪ {x} and then applies a reduction
126       Lhouari Nourine, Alain Quilliot and Hélène Toussaint


algorithm to obtain a minimal transversal. The following property shows that
the set (V \ E) ∪ {x} is a transversal and any minimal transversal contained in
(V \ E) ∪ {x} contains the vertex x.

Property 1. Let H = (V, E) be a simple hypergraph, E ∈ E and x ∈ E. Then
(V \ E) ∪ {x} a transversal of H. Moreover, any minimal transversal T ⊆ (V \
E) ∪ {x} will contain x.

Proof. Let H = (V, E) be a simple hypergraph and E 0 ∈ E with E 6= E 0 . Since
H is simple then there exists at least one element y ∈ E 0 such that y 6∈ E. So
y ∈ (V \ E) and thus E 0 ∩ (V \ E) 6= ∅. We conclude that (V \ E) ∪ {x} is a
transversal since x ∈ E.
   Now let T ⊆ (V \E)∪{x} be a minimal transversal. Then E ∩(V \E)∪{x} =
{x} and thus x must belong to T otherwise T does not intersect E. 
    According to property 1, we can apply the initialization procedure to any
pair (x, E) where E ∈ E and x ∈ E. In other words, the initialization is applied
to at most k transversals of H as shown in Algorithm 1.
 Algorithm 1: Initialization
  Input : A hypergraph H(V, E) and σ an ordering of V
  Output: A sample of minimal transversals
  begin
         ST RAN S = ∅;
         for E ∈ E do
            for x ∈ E do
               T = (V \ E) ∪ {x};{Initial transversal}
               Tm = Reduce(T, σ);
               ST RAN S = ST RAN S ∪ {Tm };
         return(ST RAN S);

    Now we describe the reduction process, which takes a transversal T and
a random ordering σ of V and returns a minimal transversal Tm . Indeed, we
delete vertices from T according to the ordering σ until we obtain a minimal
transversal.

 Algorithm 2: Reduce(T, σ)
  Input : A transversal T and an ordering σ = σ1 ...σ|V | of the vertices of
          H.
  Output: A minimal transversal
      for i = 1 to |V | do
         if T \ {σi } is a transversal then
             T ← T \ {σi } ;
      Return(T );
               Partial enumeration of minimal transversals of a hypergraph   127


Example 2 (continued). Suppose we are given the hypergraph in example 1 and
σ = (1, 2, 3, 4, 5) for the input to Algorithm 1. First, it takes the hyperedge
E = {1, 3, 4} and for x = 1 we obtain the minimal transversal {1}, for x = 3
we obtain {2, 3} and for x = 4 we obtain {2, 4, 5}. Then the algorithm con-
tinue with the hyper edges {1, 3, 5} and {1, 2}. Finally the algorithm returns
ST RAN S = {{1}, {2, 3}, {2, 4, 5}}, i.e. the other iterations do not add new
minimal transversals.
Theorem 1. Algorithm 1 computes at most k minimal transversals of an input
hypergraph H = (V, E).
Proof. The initialization procedure considers at most k minimal transversals of
H. Since a minimal transversal can be obtained several times, the result follows.

   The following proposition shows that any minimal transversal of the hyper-
graph H = (V, E) can be obtained using the initialization procedure. Indeed, the
choice of the ordering σ is important in the proposed strategy.

Proposition 1. Let H = (V, E) be an hypergraph and T be a minimal transver-
sal of H. Then, there exists a total order σ, E ∈ E and x ∈ E such that
T = Reduce((V \ E) ∪ {x}, σ).
Proof. Let T be a minimal transversal of H = (V, E). Then there exists at least
one hyperedge E ∈ E such that T ∩ E = {x}, x ∈ V , otherwise T is not minimal.
Thus T ⊆ (V \ E) ∪ {x}. Now, if we take the elements that are not in T before
the elements in T in σ, the algorithm Reduce((V \ E) ∪ {x}, σ) returns T . 
   The initialization procedure guaranties that for any vertex x ∈ V at least one
minimal transversal containing x is listed. The experiments in section 5, shows
the sample of minimal transversals obtained by the initialization procedure is a
representative sample of the set of all minimal transversals.

4.2   Local search algorithms
The local search algorithm takes each minimal transversal found in the initial-
ization step and searches for new minimal transversals to improve the initial
solution. The search of neighbors is based on vertices orderings.
    Let H = (V, E) be an hypergraph and x ∈ V . We define the frequency of
x as the number of minimal transversals of H that contain x. The algorithm
takes a minimal transversal T and a bound N max which bounds the number of
iterations and the number of neighboors generated by T . Each iteration of the
while loop, starts with a minimal transversal T and computes two orderings as
follows:
 – σ c is an ordering according to the increasing order of frequency of vertices
   in V \ T in minimal transversals already obtained by the current call. This
   ordering has a better coverage of the solution set, i.e. by keeping the rarest
   vertices in the transversals.
128       Lhouari Nourine, Alain Quilliot and Hélène Toussaint


 – σ is a random ordering of the vertices in T .

    Algorithm 3: Neighboor(T, N max)
     Input : A minimal transversal T of H = (V, E) and an integer N max
     Output: A set of minimal transversals
      Q = T;
      i = 1;
      while i ≤ N max do
          σ c ← the set V \ T sorted in increasing order of frequency of vertices
          in minimal transversals in Q;
          σ ← sort T at random;
          Add elements the elements in σ c to T until a vertex x ∈ T \ σ c
          becomes redundant in T ;
          T =Reduce(T, σ);
          Q = Q ∪ {T };
          i = i + 1;
      return(Q);

      Now we give the global procedure of the proposed approach.
    Algorithm 4: Global procedure for partial enumeration of minimal
    transversals
      Input : An hypergraph H = (V, E) and an integer N max
      Output: A sample of minimal transversals of H
      σ =choose a random ordering of V ;
      ST RAN = Q = Initialization(H, σ);
      while Q 6= ∅ do
         T = choose a minimal transversal T in Q;
         ST RAN S = ST RAN S ∪ N eighboor(T, N max);
      Return(ST RAN S);

   In the following, we describe experiments to evaluate the results that have
been obtained.


5      Experimentation
The purpose of the experiments is to see if the proposed approach allow us to
generate a representative set of solutions. For this reason, we have conducted
the experiments on two different classes of hypergraphs (see [20]) for which the
number of minimal transversals is huge compared to the size of the input. We use
Uno’s Algorithm SHD (Sparsity-based Hypergraph Dualization, ver. 3.1) [20],
to enumerate all minimal transversals. The experiments are done using linux
CentOS cpu Intel Xeon 3.6 GHz and C++ language.
   In the following, we denote Tpartial the set of minimal transversals generated
by Algorithm 4, and Texact the set of all minimal transversals. First, we analyze
                 Partial enumeration of minimal transversals of a hypergraph    129

                 T
the percentage Tpartial
                 exact
                        and then we evaluate the representativeness of the sample
Tpartial .


5.1      The size of Tpartial

We will distinguish between minimal transversals that are obtained using Algo-
rithm 1 (or the initialization procedure) and those that are generated using the
local search. For these tests we set the maximal number of neighboors N max to
3.
Tables 1 and 2 show the results for the two classes of hypergraph instances,
namely ”lose” and random ”p8”.

      The first three columns have the following meaning:

 – instance: instance name.
 – instance size: the size of the instance (number of edges × number of vertices).
 – total # of transv.: the exact number of minimal transversals | Texact |.

   The second (resp. last) three columns give the results for the initialization
procedure (resp. Global algorithm):

 – # transv. found : the number of minimal transversals found.
 – % transv. found : the percentage of minimal transversals found.
 – cpu (s): the run time in seconds


                       Table 1. Results for all ”lose” instances


    According to these tests, we can see that the percentage of minimal transver-
sals found using the initialization procedure is very low, but it decreases as far
as the size of Texact increases. Clearly, this percentage is strongly related to the
input. Indeed, the number k (entropy) of the hypergraph increases according to
the size of the input hypergraph. We can also see that the local search increases
significantly the number of solutions found by a factor 2 to 2.5 approximatively.
130     Lhouari Nourine, Alain Quilliot and Hélène Toussaint


                       Table 2. Results for all ”p8 ” instances


But it remains coherent with the chosen value N max = 3. It argues that the lo-
cal search finds other transversals that are not found either by the initialization
procedure nor previous local search. Notice that the parameter N max can be
increased whenever the size of Tpartial is not sufficient.


5.2   The representativeness of Tpartial

To evaluate the representativeness of Tpartial , we consider three criteria:

 – Size of the minimal transversals in Tpartial .
 – Frequency of vertices in Tpartial .
 – Lexicographical rank of the minimal transversals in Tpartial .

    Each criteria is illustrated using a bar graph for two instances from different
classes. The bar graphs in figures 2 and 3 are surprising. Indeed the bar graphs
vary nearly in the same manner with respect to the initialization and the local
search algorithm for all the considered criteria.

    Figures 2(a) 3(a) show that the percentage of minimal transversals of each
size (found either by the initialization procedure and local search) fits the per-
centage of all minimal transversals that are found.
    Figures 2(b) and 3(b) show that the same analysis holds when ordering min-
imal transversals lexicographically (e.g. based on a total ordering of vertices).
Clearly, the lexicographical rank of a minimal transversal belongs to the interval
[1..2|V | ]. For visualization aspect, we divide this interval into |V | subintervals,
where the subinterval i contains the number of minimal transversals with a rank
        |V |     |V |
r ∈ [ 2|V | i; 2|V | (i + 1)[, (i = 0, . . . , |V | − 1).
    Figures 2(c) and 3(c) confirm this behavior when considering frequency of
vertices. Indeed, frequency of vertices in minimal transversals in Tpartial is the
same when considering all minimal transversals, i.e.. the set Texact .
               Partial enumeration of minimal transversals of a hypergraph    131


Fig. 2. The bar graph for ”lose100”: a) number of minimal transversals per size b)
number of minimal transversals according the lexicographical rank; c) Frequency of
vertices in minimal transversals.


Fig. 3. The bar graph for ”p8 200”; a) number of minimal transversals per size; b)
number of minimal transversals according the lexicographical rank; c) Frequency of
vertices in minimal transversals.
132     Lhouari Nourine, Alain Quilliot and Hélène Toussaint


Fig. 4. Visualizing the solutions space of ”lose100”. The abscissa is given by the size
of the transversal (transversals of the same size are spread out using a norm) and the
ordinate corresponds to the frequency of its vertices.


Fig. 5. Visualizing the solutions space of ”p8 200”. The abscissa is given by the size
of the transversal (transversals of the same size are spread out using a norm) and the
ordinate corresponds to the frequency of its vertices.
                Partial enumeration of minimal transversals of a hypergraph       133


    Figures 4 and 5 show that the set Tpartial is also representative even when
considering minimal transversals with the same size. Indeed, minimal transver-
sals having the same size are spread out using a norm. We notice that the points
corresponding to minimal transversals in Tpartial are scattered in the image.
    This experiment allows us to conclude that the sample Tpartial produced by
Algorithm 4 is representative relatively to the criteria under consideration. Other
results can be found in http://www2.isima.fr/˜toussain/


6   Conclusion and discussions

We are convinced that the initialization procedure is the most important in this
approach. Indeed, the set of minimal transversals obtained using this procedure is
a representative sample, since it garantee that for any vertex of the hypergraph
there is at least one minimal transversal which contains it (see property 1).
Moreover the local search procedure can be used to increase the number of
solutions, and as we have seen in the experiments, it keeps the same properties
as the initialization procedure.
    We hope that this approach improves enumeration in big data and will be
of interests to the readers to investigate heuristics methods [6] for enumeration
problems.
    This paper opens new challenges related to partial and approximate enu-
meration problems. For example, given an hypergraph H = (V, E), is there an
algorithm that for any given ε, it enumerates a set Tpartial ⊆ T r(H) such that
(1 − ε)|T r(H)| ≤ |Tpartial | ≤ |T r(H)|? We also require that the algorithm is
output-polynomial in the sizes of H, Tpartial and 1ε . To our knowledge, there is
no work on approximate algorithms for enumeration problems, but results on
approximate counting problems may be applied [16].

   Acknowledgment: This work has been funded by the french national re-
search agency (ANR DAG project, 2009-2013) and CNRS (Mastodons PETASKY
project, 2012-2015).


References
 1. R. Agrawal, T. Imielinski, and A. Swami. Mining associations between sets of items
    in massive databases. In ACM SIGMOD 1993, Washington D.C., pages 207–216,
    1993.
 2. J. Y. Chen and S. Lonardi. Biological Data Mining. Chapman and Hall/CRC,
    2009.
 3. T. Eiter and G. Gottlob. Identifying the minimal transversals of a hypergraph and
    related problems. SIAM J. Comput., 24(6):1278–1304, 1995.
 4. T. Eiter, G. Gottlob, and K. Makino. New results on monotone dualization and
    generating hypergraph transversals. SIAM J. Comput., 32(2):514–537, 2003.
 5. T. Eiter, K. Makino, and G. Gottlob. Computational aspects of monotone dual-
    ization: A brief survey. Discrete Applied Mathematics, 156(11):2035–2049, 2008.
134     Lhouari Nourine, Alain Quilliot and Hélène Toussaint


 6. T. Feo and M. Resende. Greedy randomized adaptive search procedures. Journal
    of Global Optimization, 6(2):109–133, 1995.
 7. M. Fredman and L. Khachiyan. On the complexity of dualization of monotone
    disjunctive normal forms. Journal of Algorithms, 21:618–628, 1996.
 8. L. Geng and H. J. Hamilton. Interestingness measures for data mining: A survey.
    ACM Comput. Surv., 38(3), Sept. 2006.
 9. P. Golovach, P. Heggernes, D. Kratsch, and Y. Villanger. An incremental polyno-
    mial time algorithm to enumerate all minimal edge dominating sets. Algorithmica,
    72(3):836–859, 2015.
10. D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. S. Sharm.
    Discovering all most specific sentences. ACM Trans. Database Syst., 28(2):140–174,
    2003.
11. D. Gunopulos, R. Khardon, H. Mannila, and H. Toivonen. Data mining, hyper-
    graph transversals, and machine learning. In PODS, pages 209–216, 1997.
12. M. Jelassi, C. Largeron, and S. Ben Yahia. Concise representation of hypergraph
    minimal transversals: Approach and application on the dependency inference prob-
    lem. In Research Challenges in Information Science (RCIS), 2015 IEEE 9th In-
    ternational Conference on, pages 434–444, May 2015.
13. D. S. Johnson, C. H. Papadimitriou, and M. Yannakakis. On generating all maxi-
    mal independent sets. Inf. Process. Lett., 27(3):119–123, 1988.
14. M. M. Kanté, V. Limouzy, A. Mary, and L. Nourine. On the enumeration of mini-
    mal dominating sets and related notions. SIAM Journal on Discrete Mathematics,
    28(4):1916–1929, 2014.
15. L. Khachiyan, E. Boros, K. M. Elbassioni, and V. Gurvich. An efficient implemen-
    tation of a quasi-polynomial algorithm for generating hypergraph transversals and
    its application in joint generation. Discrete Applied Mathematics, 154(16):2350–
    2372, 2006.
16. J. Liu and P. Lu. Fptas for counting monotone cnf. In Proceedings of the Twenty-
    Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’15, pages
    1531–1548. SIAM, 2015.
17. K. Murakami and T. Uno. Efficient algorithms for dualizing large-scale hyper-
    graphs. Discrete Applied Mathematics, 170:83–94, 2014.
18. L. Nourine and J.-M. Petit. Extending set-based dualization: Application to pat-
    tern mining. In ECAI, pages IOS Press ed, Montpellier, France, 2012.
19. O. Raynaud, R. Medina, and C. Noyer. Twin vertices in hypergraphs. Electronic
    Notes in Discrete Mathematics, 27:87–89, 2006.
20. T. Uno. http://research.nii.ac.jp/ uno/dualization.html.
21. C. A. Weber, J. R. Current, and W. Benton. Vendor selection criteria and methods.
    European Journal of Operational Research, 50(1):2 – 18, 1991.