<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Preliminary Results on Mixed Integer Programming for Searching Maximum Quasi-Bicliques and Large Dense Biclusters</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Dmitry Ignatov</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Pittsburgh</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This short paper is related to the problem of nding maximum quasi-bicliques in a bipartite graph (bigraph). A quasi-biclique in a bigraph is its \almost" complete subgraph; here, we assume that the subgraph is a quasi-biclique if it lacks 100% of the edges to become a biclique. The problem of nding the maximal quasi-biclique(s) consists of nding subset(s) of vertices of an input bigraph such that the induced by these subsets subgraph is a quasi-biclique and its size is maximal. A model based on mixed integer programming (MIP) to search for a quasi-biclique is proposed and tested. Another its variant is tested that simultaneously maximizes both the size of the quasi-biclique and its density, using the least-square criterion similar to the one exploited by TriBox method for tricluster generation. Therefore, the output patterns can be called large dense biclusters as well.</p>
      </abstract>
      <kwd-group>
        <kwd>quasi-biclique</kwd>
        <kwd>maximal quasi-biclique</kwd>
        <kwd>mixed integer programming</kwd>
        <kwd>biclustering</kwd>
        <kwd>triclustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>There are many data sources that can be represented as bipartite graphs. For
example, social network data, where the underlying binary relation represents
interactions between people and communities, advertisement data with a set of
manufacturers and corresponding set of products etc.</p>
      <p>Copyright c 2019 for this paper by its author. Copying permitted for private and
academic purposes.</p>
      <p>In this study, we are interested in analysis of such bipartite data and search
for largest dense communities, where almost all elements are connected. A
situation where all elements of a community are actually involved can be described
by a biclique (complete subgraph) of a bipartite graph.</p>
      <p>Unfortunately, the community completeness requirement excludes almost
complete communities frequently met in real-world data. Due to this reason
we allow some edges to be absent and introduce the concept of quasi-biclique.
In order to bound the size of quasi-biclique we can use the subgraph minimal
density or the maximum number of absent edges needed to complete a subgraph.</p>
      <p>The problem of searching for maximal quasi-clique as well as the problem
of searching for maximal clique is NP-hard. Many algorithms that solve the
problem are being developed. For instance, Pattillo et al. [1] o ered an integer
programming model for searching for maximal quasi-clique but the case of
bipartite graph with biclique was not studied in details. Note that quasi-bicliques
and dense biclusters can be considered as relaxed versions of formal concept
de nition [2, 3].</p>
      <p>The aim of this paper is to propose a mixed integer programming models
for nding a maximum quasi-bicliques in a bipartite graph and compare those
models with existing ones.</p>
      <p>The remainder of the paper is organised as follows. Section 2 proposes two
MIP models for quasi-biclique search. In Section 3, we provide the preliminary
experimental results. Section 4 concludes the paper.
2</p>
      <p>Quasi-biclique searching models
De nition 1. A -quasi-biclique in a bipartite graph G = (U; V; E) is its
bipartite induced subgraph G[V 0; U 0] with U 0 U , V 0 V and its number of edges
jE0j jV 0j jU 0j, where 2 (0; 1].</p>
      <p>
        The problem of maximum quasi-biclique in a bipartite graph G(U; V; E)
with xed : 0 &lt; 1 is to nd U 0 U and V 0 V such that vertex-induced
subgraph G[U 0; V 0] is a -quasi-biclique of size jU 0j + jV 0j, maximum for this
graph. Lets denote a maximum { quasi-biclique in the graph G by ! (G)
Model 1. Here, we adapt model F3 from [4] for searching for maximum
quasibicliques. We introduce the following variables: ui = 1 , i 2 U 0, vj = 1 , j 2
V 0, yij = 1 , 9(i; j) 2 E \ (U 0
!l(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ); !u(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) are the lower and upper bounds for vertex set U 0, !l(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ); !u(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) are the
lower and upper bounds for vertex set V 0.
      </p>
      <p>
        V 0), zk(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) = 1 , jU 0j = k, zk(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) = 1 , jV 0j = k,
X ui =
i2U
(i;j)2E
!u(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
      </p>
      <p>
        X
n=!l(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
!u(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
      </p>
      <p>
        X
n=!l(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
z(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) = 1;
n
i2U
!u(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
X
j2V
!u(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
      </p>
      <p>
        X
n=!l(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) m=!l(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
j2V
!u(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
      </p>
      <p>
        X
m=!l(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
!u(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
      </p>
      <p>
        X
m=!l(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
zm(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) = 1;
nzn(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ); X vj =
mzm(
        <xref ref-type="bibr" rid="ref2">2</xref>
        );
Then we build Model 1.
2
      </p>
      <p>3
! (G) = um;va;yx;z 4X ui +</p>
      <p>
        X vj 5 ;
under conditions X
yij
n m z(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) zm(
        <xref ref-type="bibr" rid="ref2">2</xref>
        );
n
yij
ui; yij
vj ; yij
vi + vj
1; 8i 2 U; 8j 2 V;
(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
(
        <xref ref-type="bibr" rid="ref3">3</xref>
        )
(
        <xref ref-type="bibr" rid="ref4">4</xref>
        )
(
        <xref ref-type="bibr" rid="ref5">5</xref>
        )
(
        <xref ref-type="bibr" rid="ref6">6</xref>
        )
(
        <xref ref-type="bibr" rid="ref7">7</xref>
        )
ui 2 f0; 1g; vj 2 f0; 1g 8i 2 U; 8j 2 V; yij 2 f0; 1g 8i &lt; j; (i; j) 2 E;
z(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
n
0 8n 2 f!l(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ); : : : ; !u(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )g; zm(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
0 8m 2 f!l(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ); : : : ; !u(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )g:
      </p>
      <p>
        As in the model F3 we can bound zk(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) and zk(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) and recast them into
continuous variables.
      </p>
      <p>Remark. In Model 1, condition 2 is not linear, so we linearize it further. In
the worst-case scenario for dense graphs there are jU j + jV j binary variables and
jEj + jU j jV j continuous variables to be optimized.</p>
      <p>Model 2. Let us have a look at a di erent maximizing criteria for our MIP
model from [5] dedicated to triclustering generation. By narrowing this criteria
to binary setting, it is possible to produce another maximising criteria for model
GF3(f) from [4].</p>
      <p>
        For a bipartite graph G(U; V; E) and its subgraph G[C1; C2] the function f
is maximized over the density and the size of G[C1; C2].
f (C1; C2) =
(
        <xref ref-type="bibr" rid="ref8">8</xref>
        )
Using variables de nitions from the previous model we can apply logarithm and
rewrite f as follows:
2
2(G[C1; C2]) jC1j jC2j = (jf(i; j) : i 2 C1; j 2 C2; (i; j) 2 Egj) :
jC1j jC2j
0
      </p>
      <p>X
(i;j)2E</p>
      <p>1
yij A
log</p>
      <p>X ui
i2U
!
0</p>
      <p>1
j2V
(9)</p>
      <p>Without any extra bounds on vertex sets of a quasi-biclique and the minimum
number of edges in it, the model has 2 (jU j + jV j + jEj) variables.</p>
      <p>Experimental Validation
The greedy algorithm of nding of maximal -quasi-bicliques in a bipartite graph
was written in Python 2.7. The MIP models were implemented with the
optimisation package IBM Cplex. All calculations were carried out on a laptop with
the MacOs operating system, 2.7 GHz Intel Core i5 processor, RAM 8 GB 1867
MHz.</p>
      <p>Datasets for testing the performance of the algorithms are taken from [6, 7]:
1. Divorse in US: 9 50 vertices, 225 edges;
2. Dutch Elite: 3810 937 vertices, 5221 edges;
3. Dutch Elite (TOP-200): 200 395 vertices, 877 edges;
4. Movie-Lens (ml-latest-small): 9125 20 vertices, 20340 edges.</p>
      <p>The results of applying the algorithms are presented in the Table 1 for =
0:6. For each algorithm the main parameters are indicated: the algorithm running
time (time)4, the number of founded maximum quasi-bicliques (count) and the
maximum size of the founded solution.</p>
      <p>If one of the maximal quasi-biclique components has cardinality 1, this is
marked in the table as (U 0; V 0), where U 0 and V 0 are the sizes of the fractions.
One can note, that MIP models work an order of magnitude slower than the [8]
algorithm, but they nd more quasi-cliques and generally each has larger size.</p>
      <p>If we consider the results not in terms of speed, but in terms of quality,
then Model 2 on the examples of data worked best. This model produced more
unique and larger quasi-bicliques than other algorithms (however, only sizes of
the maximum quasi-bicliques are mentioned in the tables).
4 Dashes ("-") in the following tables mean that the algorithm worked too long and
did not nd a solution.
Acknowledgement. The work of Dmitry Ignatov shown in all the sections has
been supported by the Russian Science Foundation grant no. 17-11-01276 and
performed at St. Petersburg Department of Steklov Mathematical Institute of
Russian Academy of Sciences, Russia.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Pattillo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veremyev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Butenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boginski</surname>
          </string-name>
          , V.:
          <article-title>On the maximum quasi-clique problem</article-title>
          .
          <source>Discrete Applied Mathematics</source>
          <volume>161</volume>
          (
          <issue>1</issue>
          ) (
          <year>2013</year>
          )
          <volume>244</volume>
          {
          <fpage>257</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poelmans</surname>
          </string-name>
          , J.:
          <article-title>Concept-based biclustering for internet advertisement</article-title>
          .
          <source>In: 12th IEEE International Conference on Data Mining Workshops</source>
          , ICDM Workshops, Brussels, Belgium, December
          <volume>10</volume>
          ,
          <year>2012</year>
          . (
          <year>2012</year>
          )
          <volume>123</volume>
          {
          <fpage>130</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kaytoue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macko</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Biclustering meets triadic concept analysis</article-title>
          .
          <source>Ann. Math. Artif. Intell</source>
          .
          <volume>70</volume>
          (
          <issue>1-2</issue>
          ) (
          <year>2014</year>
          )
          <volume>55</volume>
          {
          <fpage>79</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Veremyev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Exact mip-based approaches for nding maximum quasi-cliques and dense subgraphs</article-title>
          .
          <source>Computational Optimization and Applications</source>
          <volume>64</volume>
          (
          <issue>1</issue>
          ) (May
          <year>2016</year>
          )
          <volume>177</volume>
          {
          <fpage>214</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gnatyshak</surname>
            ,
            <given-names>D.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mirkin</surname>
            ,
            <given-names>B.G.</given-names>
          </string-name>
          :
          <article-title>Triadic formal concept analysis and triclustering: searching for optimal patterns</article-title>
          .
          <source>Machine Learning</source>
          <volume>101</volume>
          (
          <issue>1</issue>
          ) (Oct
          <year>2015</year>
          )
          <volume>271</volume>
          {
          <fpage>302</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Borgatti</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Everett</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freeman</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          : UCINET. In:
          <article-title>Encyclopedia of Social Network Analysis and Mining</article-title>
          . (
          <year>2014</year>
          )
          <volume>2261</volume>
          {
          <fpage>2267</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Batagelj</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mrvar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          : Pajek. In:
          <article-title>Encyclopedia of Social Network Analysis and Mining</article-title>
          . (
          <year>2014</year>
          )
          <volume>1245</volume>
          {
          <fpage>1256</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Quasi-bicliques: Complexity and binding pairs</article-title>
          . In
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
          </string-name>
          , J.,
          <source>eds.: Computing and Combinatorics</source>
          , Berlin, Heidelberg, Springer Berlin Heidelberg (
          <year>2008</year>
          )
          <volume>255</volume>
          {
          <fpage>264</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>