Preliminary Results on Mixed Integer
        Programming for Searching Maximum
      Quasi-Bicliques and Large Dense Biclusters

                               Dmitry Ignatov1,3
                   (0000-0002-6584-8534) dignatov@hse.ru,
                                Polina Ivanova1
                (0000-0001-6010-7991) ivanova.p.m@gmail.com,
                            Albina Zamaletdinova1
          (0000-0002-4116-9633) aazamaletdinova 1@edu.hse.ru, and
                               Oleg Prokopyev2,1
                    (0000-0003-2888-8630) droleg@pitt.edu
          1
           National Research University Higher School of Economics, Russia
                         2
                            University of Pittsburgh, USA
3
  St. Petersburg Department of Steklov Mathematical Institute of Russian Academy
                                of Sciences, Russia


        Abstract. This short paper is related to the problem of finding maxi-
        mum quasi-bicliques in a bipartite graph (bigraph). A quasi-biclique in
        a bigraph is its “almost” complete subgraph; here, we assume that the
        subgraph is a quasi-biclique if it lacks γ · 100% of the edges to become a
        biclique. The problem of finding the maximal quasi-biclique(s) consists
        of finding subset(s) of vertices of an input bigraph such that the induced
        by these subsets subgraph is a quasi-biclique and its size is maximal.
        A model based on mixed integer programming (MIP) to search for a
        quasi-biclique is proposed and tested. Another its variant is tested that
        simultaneously maximizes both the size of the quasi-biclique and its den-
        sity, using the least-square criterion similar to the one exploited by Tri-
        Box method for tricluster generation. Therefore, the output patterns
        can be called large dense biclusters as well.

        Keywords: quasi-biclique, maximal quasi-biclique, mixed integer pro-
        gramming, biclustering, triclustering


1     Introduction

There are many data sources that can be represented as bipartite graphs. For
example, social network data, where the underlying binary relation represents
interactions between people and communities, advertisement data with a set of
manufacturers and corresponding set of products etc.
    Copyright c 2019 for this paper by its author. Copying permitted for private and
    academic purposes.
2       D.I. Ignatov et al.

    In this study, we are interested in analysis of such bipartite data and search
for largest dense communities, where almost all elements are connected. A situ-
ation where all elements of a community are actually involved can be described
by a biclique (complete subgraph) of a bipartite graph.
   Unfortunately, the community completeness requirement excludes almost
complete communities frequently met in real-world data. Due to this reason
we allow some edges to be absent and introduce the concept of quasi-biclique.
In order to bound the size of quasi-biclique we can use the subgraph minimal
density or the maximum number of absent edges needed to complete a subgraph.
    The problem of searching for maximal quasi-clique as well as the problem
of searching for maximal clique is NP-hard. Many algorithms that solve the
problem are being developed. For instance, Pattillo et al. [1] offered an integer
programming model for searching for maximal quasi-clique but the case of bi-
partite graph with biclique was not studied in details. Note that quasi-bicliques
and dense biclusters can be considered as relaxed versions of formal concept
definition [2, 3].
    The aim of this paper is to propose a mixed integer programming models
for finding a maximum quasi-bicliques in a bipartite graph and compare those
models with existing ones.
   The remainder of the paper is organised as follows. Section 2 proposes two
MIP models for quasi-biclique search. In Section 3, we provide the preliminary
experimental results. Section 4 concludes the paper.


2    Quasi-biclique searching models


Definition 1. A γ-quasi-biclique in a bipartite graph G = (U, V, E) is its bipar-
tite induced subgraph G[V 0 , U 0 ] with U 0 ⊆ U , V 0 ⊆ V and its number of edges
|E 0 | ≥ γ|V 0 | · |U 0 |, where γ ∈ (0, 1].


   The problem of maximum quasi-biclique in a bipartite graph G(U, V, E)
with fixed γ: 0 < γ ≤ 1 is to find U 0 ⊆ U and V 0 ⊆ V such that vertex-induced
subgraph G[U 0 , V 0 ] is a γ-quasi-biclique of size |U 0 | + |V 0 |, maximum for this
graph. Lets denote a maximum γ – quasi-biclique in the graph G by ωγ (G)
Model 1. Here, we adapt model F3 from [4] for searching for maximum quasi-
bicliques. We introduce the following variables: ui = 1 ⇔ i ∈ U 0 , vj = 1 ⇔ j ∈
                                             (1)                  (2)
V 0 , yij = 1 ⇔ ∃(i, j) ∈ E ∩ (U 0 × V 0 ), zk = 1 ⇔ |U 0 | = k, zk = 1 ⇔ |V 0 | = k,
  (1)    (1)                                                          (2) (2)
ωl , ωu are the lower and upper bounds for vertex set U 0 , ωl , ωu are the
                                               0
lower and upper bounds for vertex set V .
                                              Searching for Maximum Quasi-Bicliques                        3

   Then we build Model 1.
                                                    
                                         X      X
                           ωγ (G) = max   ui +   vj  ,                                               (1)
                                          u,v,y,z
                                                       i∈U           j∈V
                                                              (1)      (2)
                                                             ωu       ωu
                                   X                         X        X
           under conditions                 yij ≥ γ                           n · m · zn(1) · zm
                                                                                               (2)
                                                                                                   ,   (2)
                                  (i,j)∈E                  (1)  (2)
                                                        n=ωl m=ωl

                 yij ≤ ui , yij ≤ vj , yij ≥ vi + vj − 1, ∀i ∈ U, ∀j ∈ V,                              (3)
                                    (1)                                 (2)
                                   ωu                                  ωu
                     X             X                   X               X
                           ui =             nzn(1) ,          vj =                (2)
                                                                                mzm   ,                (4)
                     i∈U             (1)
                                  n=ωl                 j∈V              (2)
                                                                     m=ωl
                                (1)                       (2)
                               ωu                        ωu
                               X                         X
                                       zn(1) = 1,                    (2)
                                                                    zm   = 1,                          (5)
                                (1)                       (2)
                             n=ωl                      m=ωl

      ui ∈ {0, 1}, vj ∈ {0, 1} ∀i ∈ U, ∀j ∈ V, yij ∈ {0, 1} ∀i < j, (i, j) ∈ E,                        (6)
                           (1)                                  (2)
          zn(1) ≥ 0 ∀n ∈ {ωl , . . . , ωu(1) }, zm
                                                 (2)
                                                     ≥ 0 ∀m ∈ {ωl , . . . , ωu(2) }.                   (7)
                                                       (1)           (2)
   As in the model F3 we can bound zk                        and zk        and recast them into contin-
uous variables.
Remark. In Model 1, condition 2 is not linear, so we linearize it further. In
the worst-case scenario for dense graphs there are |U | + |V | binary variables and
|E| + |U | · |V | continuous variables to be optimized.
Model 2. Let us have a look at a different maximizing criteria for our MIP
model from [5] dedicated to triclustering generation. By narrowing this criteria
to binary setting, it is possible to produce another maximising criteria for model
GF3(f) from [4].
    For a bipartite graph G(U, V, E) and its subgraph G[C1 , C2 ] the function f
is maximized over the density and the size of G[C1 , C2 ].

                                                                                                       2
                                           (|{(i, j) : i ∈ C1 , j ∈ C2 , (i, j) ∈ E}|)
f (C1 , C2 ) = ρ2 (G[C1 , C2 ]) · |C1 | · |C2 | =                                      .
                                                            |C1 | · |C2 |
                                                                                     (8)
Using variables definitions from the previous model we can apply logarithm and
rewrite f as follows:
                                                         !                     
                                  X                  X                    X
      flog (C1 , C2 ) = 2 · log     yij  − log         ui − log            v j . (9)
                                      (i,j)∈E                         i∈U                   j∈V

   Without any extra bounds on vertex sets of a quasi-biclique and the minimum
number of edges in it, the model has 2 · (|U | + |V | + |E|) variables.
4        D.I. Ignatov et al.

3      Experimental Validation

The greedy algorithm of finding of maximal γ-quasi-bicliques in a bipartite graph
was written in Python 2.7. The MIP models were implemented with the opti-
misation package IBM Cplex. All calculations were carried out on a laptop with
the MacOs operating system, 2.7 GHz Intel Core i5 processor, RAM 8 GB 1867
MHz.
   Datasets for testing the performance of the algorithms are taken from [6, 7]:

 1. Divorse in US: 9 × 50 vertices, 225 edges;
 2. Dutch Elite: 3810 × 937 vertices, 5221 edges;
 3. Dutch Elite (TOP-200): 200 × 395 vertices, 877 edges;
 4. Movie-Lens (ml-latest-small): 9125 × 20 vertices, 20340 edges.

    The results of applying the algorithms are presented in the Table 1 for γ =
0.6. For each algorithm the main parameters are indicated: the algorithm running
time (time)4 , the number of founded maximum quasi-bicliques (count) and the
maximum size of the founded solution.
    If one of the maximal quasi-biclique components has cardinality 1, this is
marked in the table as (U 0 , V 0 ), where U 0 and V 0 are the sizes of the fractions.


    Table 1: Results of maximum γ-quasi-biclique search. Parameters: γ = 0.6.

                             Model 1        Model 2      Greedy algorithm [8]
          Data
                         time count size time count size time count    size
      Divorse in US     1.23 s 1      54 3.38 s 1    53 360ms 1      (37, 1)
    DutchElite (top200) 7602 s 2 (26,1) 181 s   1    14   3s    1       13
        DutchElite         -    -      - 3265 s 1    22 1954 s 1        21
    Movie-Lens (small) 28068 2       694 5042 s 1 712 2591 s 1         681


4      Results and conclusions

One can note, that MIP models work an order of magnitude slower than the [8]
algorithm, but they find more quasi-cliques and generally each has larger size.
   If we consider the results not in terms of speed, but in terms of quality,
then Model 2 on the examples of data worked best. This model produced more
unique and larger quasi-bicliques than other algorithms (however, only sizes of
the maximum quasi-bicliques are mentioned in the tables).
4
    Dashes (”-”) in the following tables mean that the algorithm worked too long and
    did not find a solution.
                                      Searching for Maximum Quasi-Bicliques           5

Acknowledgement. The work of Dmitry Ignatov shown in all the sections has
been supported by the Russian Science Foundation grant no. 17-11-01276 and
performed at St. Petersburg Department of Steklov Mathematical Institute of
Russian Academy of Sciences, Russia.


References
1. Pattillo, J., Veremyev, A., Butenko, S., Boginski, V.: On the maximum quasi-clique
   problem. Discrete Applied Mathematics 161(1) (2013) 244 – 257
2. Ignatov, D.I., Kuznetsov, S.O., Poelmans, J.: Concept-based biclustering for internet
   advertisement. In: 12th IEEE International Conference on Data Mining Workshops,
   ICDM Workshops, Brussels, Belgium, December 10, 2012. (2012) 123–130
3. Kaytoue, M., Kuznetsov, S.O., Macko, J., Napoli, A.: Biclustering meets triadic
   concept analysis. Ann. Math. Artif. Intell. 70(1-2) (2014) 55–79
4. Veremyev, A.: Exact mip-based approaches for finding maximum quasi-cliques and
   dense subgraphs. Computational Optimization and Applications 64(1) (May 2016)
   177–214
5. Ignatov, D.I., Gnatyshak, D.V., Kuznetsov, S.O., Mirkin, B.G.: Triadic formal
   concept analysis and triclustering: searching for optimal patterns. Machine Learning
   101(1) (Oct 2015) 271–302
6. Borgatti, S.P., Everett, M.G., Freeman, L.C.: UCINET. In: Encyclopedia of Social
   Network Analysis and Mining. (2014) 2261–2267
7. Batagelj, V., Mrvar, A.: Pajek. In: Encyclopedia of Social Network Analysis and
   Mining. (2014) 1245–1256
8. Liu, X., Li, J., Wang, L.: Quasi-bicliques: Complexity and binding pairs. In Hu, X.,
   Wang, J., eds.: Computing and Combinatorics, Berlin, Heidelberg, Springer Berlin
   Heidelberg (2008) 255–264