BicOPT: Biochips Data Clustering Algorithm


                             Faouzi Mhamdi1,2                        Ahmed Zammali1
                        faouzi.mhamdi@ensi.rnu.tn                zammaliahmad@gmail.com,

 1Laboratory of Technologies of Information and Communication and Electrical Engineering (LaTICE)

         National Superior School of Engineers of Tunis (ENSIT), University of Tunis, Tunisia
 2Higher Institute of Applied Language and Computer Science of Beja, University of Jendouba, Tunisia


Abstract

Biochips are a technology for analyzing the expression level of genes, and biclustering is one of the techniques applicable to the data they produce. The main objective of biclustering is to extract groups of genes while taking into account their coherence across all the conditions that characterize them. A variety of biclustering algorithms have already been proposed in the field of biochips, each differing from the others in a set of characteristics. In this paper, we focus on the BicFinder algorithm and propose improvements that make it faster. We first present a fast variant of this algorithm, then our version of the algorithm, named BicOPT, followed by a set of experiments on real data.

Keywords: biochips, biclustering, BicFinder, BicOPT, evaluation functions, experimental study.


1    Introduction
In a data matrix, links can exist among the rows, among the columns, or among rows and columns simultaneously. Clustering detects only the first and second cases, so it is too simplistic to capture the third. A more interesting technique, called simultaneous classification, cross-classification or block classification, and also referred to as biclustering [8,9], extracts groups of rows while taking into account their consistency across the columns. This technique can be used in several fields, among them biochips.

The input of a biclustering algorithm for biochip data is a data matrix whose rows correspond to genes and whose columns correspond to conditions. Let M be a data matrix with n rows and m columns. A group of bigroups B is a set of pairs (I, J), where I is a subset of the rows of M and J is a subset of the columns of M; each pair defines a submatrix of M called a bigroup (bicluster). The aim of biclustering algorithms is to produce coherent, stable and homogeneous bigroups. The homogeneity criteria vary from one algorithm to another. In general, the biclustering problem is NP-hard, so heuristic algorithms are used to construct biclusters close to the optimum. The biclustering problem can be formulated as follows [2]:

    \[ f(B_{opt}) = \max_{B \in BC(M)} f(B) \tag{1} \]

where
    •     f is an objective function measuring the quality, i.e. the degree of coherence, of a group of bigroups;
    •     BC(M) is the set of all possible groups of bigroups associated with M.

Madeira and Oliveira [9] propose to classify biclustering algorithms according to the approach used for their construction. These approaches fall into five categories [7]: IRCCC (Iterative Row and Column Clustering Combination), DC (Divide and Conquer), GIS (Greedy Iterative Search), EBE (Exhaustive Bicluster Enumeration) and DPI (Distribution Parameter Identification). BicOPT is based on the BicFinder algorithm, follows the Greedy Iterative Search approach, and has a polynomial complexity of O(n⁴m). In this paper, we first present the BicFinder algorithm. We then detail our BicOPT contributions, illustrate them with an experimental study of our approach, and end with a conclusion.

2     BicFinder
BicFinder is a systematic greedy algorithm of polynomial complexity O(n⁵m), based on the construction of a directed acyclic graph (DAG). BicFinder extracts a set of bigroups close to what a biologist would obtain by looking for the maximal homogeneous zones. The generation of bigroups passes through four essential steps: first the discretization of matrix M into M' (see equation 2), then the construction of the DAG from M', then the extraction by applying the ACSI function (see equation 4), and finally the validation using the ASR function (see equation 5).

The group extraction process is thus subdivided into four main steps (see Figure 1).

             Figure 1. BicFinder algorithm process

      Algorithm 1. BicFinder [1]
      Input: M, α, β ; Output: B
      Discretize M using equation 2 to obtain M'    // Discretization step
      Build the DAG associated with M'    // Construction step
      B = Ø    // Extraction step
      For each node nᵢ in the DAG do
          I′ᵢ = Ø; J′ᵢ = Ø;    // Bᵢ = (I′ᵢ, J′ᵢ)
          Sort the arcs of nᵢ in decreasing order of their number of true values
          For each arc (nᵢ, nₖ) do
              Ic = I′ᵢ ∪ {gᵢ, gₖ}; Jc = J′ᵢ ∪ {cₗ, cₗ₊₁ with T(M′[i, l] = M′[k, l]) = true};
              If ACSIᵢ(Ic, Jc) >= α then Bᵢ = (Ic, Jc)
          End
          B = B ∪ Bᵢ
      End
      For each bigroup Bᵢ = (I′ᵢ, J′ᵢ) in B do    // Selection step
          If ASR(I′ᵢ, J′ᵢ) < β then B = B \ Bᵢ
      End
      Return B

2.1     Discretization

To compute ACSI, we must first discretize the initial matrix M(I, J), with I = {1, 2, ..., n} and J = {1, 2, ..., m}, into the matrix M' (see equation 2).

    \[ M'[i, l] = \begin{cases} 1 & \text{if } M[i, l] < M[i, l+1] \\ -1 & \text{if } M[i, l] > M[i, l+1] \\ 0 & \text{if } M[i, l] = M[i, l+1] \end{cases} \tag{2} \]

with i ∈ [1..n] and l ∈ [1..m−1].
The discretization captures the shape of the gene expression profile (which can be monotonically increasing, monotonically decreasing, ...).

2.2    Construction of DAG

The graph is associated with the matrix M', where each node nᵢ represents a gene gᵢ. Two nodes nᵢ and nⱼ are connected by an arc if and only if (i > j). A weight CSLᵢ,ⱼ is assigned to each arc (nᵢ, nⱼ).

                  Figure 2. Example of DAG

2.3    Extraction: ACSI
ACSI is an extraction function based on the Concordance Index (CI) [12]. To calculate ACSI, the CSI function must first be calculated for each arc of the DAG (see equation 3).

    \[ CSI(i, j, k) = \frac{\sum_{l=1}^{m-1} T(M'[i, l] = M'[j, l] = M'[k, l])}{MaxCSL_i} \tag{3} \]

with i ∈ [1..n−2], j ∈ [2..n−1], k ∈ [3..n], l ∈ [1..m−1] and i < j < k.

    \[ ACSI_i(I', J') = 2 \cdot \frac{\sum_{j \in I';\, j \geq i+1} \; \sum_{k \in I';\, k \geq j+1} CSI(i, j, k)}{|I''| \, (|I''| - 1)} \tag{4} \]

In the worked example of Section 2.5, |I''| equals the number of arcs selected so far for node nᵢ.

A bigroup starts with an initial arc (MaxCSLᵢ,ⱼ), and at each iteration an arc is added if and only if ACSIᵢ(I', J') >= α; otherwise we pass to the next arc.

2.4    Evaluation: ASR

The last step is the evaluation of the generated bigroups by applying the ASR function.
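    \[ ASR(I', J') = 2 \max \left\{ \frac{\sum_{i \in I'} \sum_{j \in I';\, j \geq i+1} p_{ij}}{|I'| \, (|I'| - 1)} ,\; \frac{\sum_{k \in J'} \sum_{l \in J';\, l \geq k+1} p_{kl}}{|J'| \, (|J'| - 1)} \right\} \tag{5} \]

with

    \[ p_{ij} = 1 - \frac{6 \sum_{k=1}^{m} \left( r_k^i(x_k^i) - r_k^j(x_k^j) \right)^2}{m \, (m^2 - 1)} \tag{6} \]

   A bigroup is valid if its ASR >= β.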
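
As a rough illustration of equations 5 and 6, the following Python sketch computes ASR for a small submatrix. It assumes that r_k^i(x_k^i) is the rank of the k-th value within vector i (so that p_ij is Spearman's rank correlation), that the values contain no ties, and that columns are compared in the same way as rows. The function names are hypothetical and are not taken from BicFinder.

```python
from itertools import combinations


def ranks(values):
    """Rank of each value within its vector (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Equation 6: p = 1 - 6 * sum(d_k^2) / (m * (m^2 - 1)) for vectors of length m."""
    m = len(x)
    if m < 2:
        raise ValueError("at least two values are needed")
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (m * (m * m - 1))


def mean_pairwise(vectors):
    """2 * (sum of p over unordered pairs) / (count * (count - 1))."""
    count = len(vectors)
    if count < 2:
        return 0.0  # assumption: a single vector contributes no correlation
    total = sum(spearman(a, b) for a, b in combinations(vectors, 2))
    return 2 * total / (count * (count - 1))


def asr(submatrix):
    """Equation 5: the larger of the row-wise and column-wise averages."""
    rows = [list(r) for r in submatrix]
    cols = [list(c) for c in zip(*rows)]
    return max(mean_pairwise(rows), mean_pairwise(cols))
```

Under these assumptions, a bigroup whose submatrix is `sub` would be kept when `asr(sub) >= beta`, with `beta` playing the role of the threshold β.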

2.5     Clustering: K-medoids
Having presented the algorithm and explained its operating principle, we describe in this section the BicFinder process using an illustrative example.
We fix the parameter α, which controls the extraction and addition of arcs, and the parameter β, which controls the validation of bigroups. Let α = 0.75 and β = 0.5.

                      Table I. Data Matrix M

          C0         C1         C2        C3          C4      C5
 g0       13         7          5         20          10      -5
 g1       15         10         20        30          -2      15
 g2       15         9          8         20          10      10
 g3       3          8          10        9           15      4
 g4       13         15         17        8           3       1
 g5       20         8          12        25          27      1
 g6       13         15         17        8           3       1

            Table II. Matrix M' after discretization

                      C0      C1     C2      C3       C4
               g0     -1      -1     1       -1       -1
               g1     -1      1      1       -1       1
               g2     -1      -1     1       -1       0
               g3     1       1      -1      1        -1
               g4     1       1      -1      -1       -1
               g5     -1      1      1       1        -1
               g6     1       1      -1      -1       -1

The DAG is constructed from the matrix M'. The arcs are sorted in decreasing order of the weight associated with each arc (the weight being the number of true values).

                Figure 3. DAG associated with the matrix M'

For the first node g0 we have CSL(g0) = {(b), (a), (e), (d), (f), (c)}. So we take the first two arcs "b" and "a":

    \[ ACSI_{g0}(b, a) = \frac{CSI(0,1,2)}{2(2-1)/2} = \frac{3/4}{1} = 0.75 \]

We have ACSI_{g0}(b, a) = α, so both arcs are kept and we try to add the arc "e":

    \[ ACSI_{g0}(b, a, e) = \frac{CSI(0,1,2) + CSI(0,1,5) + CSI(0,2,5)}{3(3-1)/2} = \frac{\frac{3}{4} + \frac{2}{4} + \frac{2}{4}}{3} = 0.58 < \alpha \]

    \[ ACSI_{g0}(b, a, d) = \frac{CSI(0,1,2) + CSI(0,1,4) + CSI(0,2,4)}{3(3-1)/2} = \frac{\frac{3}{4} + \frac{1}{4} + \frac{1}{4}}{3} = 0.41 < \alpha \]

    \[ ACSI_{g0}(b, a, f) = \frac{CSI(0,1,2) + CSI(0,1,6) + CSI(0,2,6)}{3(3-1)/2} = \frac{\frac{3}{4} + \frac{1}{4} + \frac{1}{4}}{3} = 0.41 < \alpha \]

    \[ ACSI_{g0}(b, a, c) = \frac{CSI(0,1,2) + CSI(0,1,3) + CSI(0,2,3)}{3(3-1)/2} = \frac{\frac{3}{4}}{3} = 0.25 < \alpha \]

None of the arcs "e", "d", "f" and "c" reaches the threshold, so the bigroup of g0 keeps the genes {g0, g1, g2}.
We apply the same process to the rest of the nodes and obtain as a result:
B = {({g0, g1, g2}; {c'0, c'1, c'2, c'3, c'4}); ({g3, g4, g6}; {c'0, c'1, c'2, c'3, c'4, c'5})}.
Only the bigroups with a score ASR >= β are selected. We have ASR({g0, g1, g2}; {c'0, c'1, c'2, c'3, c'4}) > β and ASR({g3, g4, g6}; {c'0, c'1, c'2, c'3, c'4, c'5}) < β. Finally, we obtain: B = {({g0, g1, g2}; {c'0, c'1, c'2, c'3, c'4})}.
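
The figures of this example can be checked with a short Python sketch. It assumes, consistently with the values above, that the weight of an arc (gᵢ, gⱼ) is the number of columns of M' on which both genes agree, that MaxCSLᵢ is the largest such weight among the arcs of gᵢ, and that arcs of equal weight keep their gene order. All function names are hypothetical, and the sketch covers only the discretization and the greedy extraction for one node, not the ASR validation.

```python
from itertools import combinations

M = [
    [13, 7, 5, 20, 10, -5],    # g0
    [15, 10, 20, 30, -2, 15],  # g1
    [15, 9, 8, 20, 10, 10],    # g2
    [3, 8, 10, 9, 15, 4],      # g3
    [13, 15, 17, 8, 3, 1],     # g4
    [20, 8, 12, 25, 27, 1],    # g5
    [13, 15, 17, 8, 3, 1],     # g6
]


def discretize(matrix):
    """Equation 2: 1 / -1 / 0 according to the trend between adjacent columns."""
    return [[(b > a) - (b < a) for a, b in zip(row, row[1:])] for row in matrix]


def agreement(md, genes):
    """Number of columns of M' on which all the given genes share the same value."""
    return sum(len({md[g][l] for g in genes}) == 1 for l in range(len(md[0])))


def sorted_arcs(md, i):
    """Genes reachable from g_i, sorted by decreasing arc weight (stable on ties)."""
    others = range(i + 1, len(md))
    return sorted(others, key=lambda j: -agreement(md, (i, j)))


def acsi(md, i, selected):
    """Equation 4 for node g_i; `selected` must hold at least two genes."""
    max_csl = max(agreement(md, (i, j)) for j in range(i + 1, len(md)))
    pairs = list(combinations(selected, 2))
    total = sum(agreement(md, (i, j, k)) / max_csl for j, k in pairs)
    return total / len(pairs)


def extract(md, i, alpha):
    """Greedy extraction for one node: keep an arc only if ACSI stays >= alpha."""
    arcs = sorted_arcs(md, i)
    selected = arcs[:1]
    for j in arcs[1:]:
        if acsi(md, i, selected + [j]) >= alpha:
            selected.append(j)
    return sorted([i] + selected)


M_prime = discretize(M)
print(M_prime[0])                       # [-1, -1, 1, -1, -1]
print(acsi(M_prime, 0, [2, 1]))         # 0.75
print(extract(M_prime, 0, alpha=0.75))  # [0, 1, 2]
```

With α = 0.75, the same call for node g3, `extract(M_prime, 3, alpha=0.75)`, yields the genes of the second bigroup, g3, g4 and g6.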

                                                                    3     BicOPT
                                                                    The BicFinder algorithm has shown better performance
                                                                    compared to other bicluster algorithms [1]. The results
                                                                    obtained prompted us to study and improve this algorithm.
3.1     Optimization

3.1.1     Main Program
The time complexity of the extraction step is O(n⁵m) [1], which is rather high. Equations 3 and 4 show that, for a single node, the minimum time complexity of the extraction step is O(n²m); since the whole data file must be browsed, the minimum overall time complexity is O(n³m). Our main algorithm is divided into five steps (see Algorithm 2):

     •        Discretization
     •        Construction of DAG
     •        Extraction of bigroups
     •        Evaluation of bigroups
     •        Results visualization

      Algorithm 2. Main program
      F : File              // Initial file
      M : Integer           // Number of columns
      N : Integer           // Number of rows
      α, β : Real           // Thresholds
      Ɣ : Integer           // Maximum number of rows in a bigroup
      Mat : Matrix          // Matrix after discretization
      T : Table             // Table of DAG
      Begin
          Mat := Discretization(F);
          T := DagTree(Mat);
          B := Extraction(T, Mat);
          B := Evaluation(B);
          Visualization(B);
      End

3.1.2     Function “Extraction”
The extraction function uses the ACSI function (equation 4). We have already mentioned that this equation has a minimum time complexity of O(n³m), so we tried to implement it with a time complexity lower than O(n⁵m) (see Algorithm 3).

      Algorithm 3. Extraction(T, Mat)
      A, B, C, Arc : table
      Tab : table           // Contains the detected arcs for a bicluster
      nbrLine : integer     // Number of rows for a bicluster
      Begin
      For i := 0 to n-2 do  // Browse all the lines of the input file
          Arc[] := extractionEntier(T[i], ',');    // Arc[]: table containing all the arcs of the selected node
          nbrLine := 0;
          Tab[0] := Arc[0];
          j := 1;
          While (j < Arc.length and nbrLine <= Ɣ)    // Ɣ is the maximum number of rows in a bigroup
              nbrLine++;
              Tab[nbrLine] := Arc[j];
              Eval := 0;
              For k := 0 to nbrLine do    // The maximum number of rows in a bigroup = n
                  For z := k+1 to nbrLine do
                      A[] := extractionEntier(Tab[k]);
                      B[] := extractionEntier(Tab[z]);
                      C[] := extractionEntier(Tab[0]);
                      Eval := Eval + csi(A[1], A[2], B[2]) / C[0];    // CSI computation, complexity O(m)
                  End
              End
              If (Eval / (((nbrLine+1) * nbrLine) / 2) < seuil) nbrLine--;    // seuil is the threshold α: if the evaluation falls below it, the last added arc is removed
              j := j + 1;    // Move on to the next arc
          End
          Row := returnRow(Tab);    // O(n²)
          colomn := returnColomn(Row, Mat);    // O(nm)
          B := B + {Row, colomn};
      End    // Extraction has a complexity of O(n⁴m)
      Return B;
      End.

The time complexity is derived from the nested for loops of the extraction part, which gives O(n⁴m + n³ + n²m). Therefore:
O(n⁴m + n³ + n²m) = O(n²(n²m + n + m)) = O(n²(n(1 + nm) + m)).
Since 1 is negligible with respect to nm and m ≤ nm, this is O(n²(n²m + nm)) = O(n²(nm(1 + n))). Since 1 is also negligible with respect to n, we obtain O(n⁴m) as the final time complexity.

3.2     Improvement
Among the improvements made to our application, we mention the addition of the gamma parameter (Ɣ), which limits the number of rows of a bigroup, and the creation of a graphical interface that displays the results obtained and plots the gene expression curves of each group.

3.2.1    Size of biclusters
The columns of a bigroup are computed from the rows obtained from the ACSI function, so in the case of decreasing
values of the threshold α, the number of rows increases and in return the number of columns can decrease until no column is obtained. Hence the risk of losing the bigroup altogether. We therefore use another parameter, Ɣ, which limits the number of rows of a bigroup.

           Figure 4. Results obtained with gamma

For a first test, Ɣ was set to n (where n is the number of rows of the data file), and we obtained six biclusters with an execution time of 1145660 ms. We then set Ɣ (the number of genes generated) to 20 and obtained ten biclusters with a run time of 6354 ms. The addition of this parameter thus lets the user gain both in execution time and in the number of biclusters obtained.

3.2.2   Display of biclusters
Unlike command-line interface (CLI) systems, which require commands to be memorized, GUI systems offer a relatively intuitive approach. Even users without significant training can easily learn the system and use it to achieve their goals.
The design of the interface of our system offers the user several advantages: it allows him to follow and visualize all the steps of the extraction of bigroups, and to determine the position of a bigroup in the initial data matrix.

              Figure 5. Display of biclusters

To measure the coherence of the detected biclusters, BicOPT makes it possible to visualize curves plotted as a function of the gene expression level (see Figure 6). If the curves are similar, we can deduce that the bigroup has a strong coherence.

               Figure 6. Gene expression curve

4    Results
The data file used by our program must be structured in the following format:

                  Table III. Sample data file

 Gene0            15                …           100
 …                …                 …           …
 Genei            12.4              …           -15
 …                …                 …           …
 Genen            10.5              …           125

The first column of the file must contain the gene names. The columns are separated by spaces. Each row contains the gene name followed by a set of gene expression values.
To test BicOPT's ability to extract different types of biclusters, we used a set of real files: the Human B-cell Lymphoma dataset (available at http://arep.med.harvard.edu/biclustering/) with the size of (4026 rows, 96
columns) and the Saccharomyces Cerevisiae dataset (available at http://people.ee.ethz.ch/~sop/bimax/SupplementaryMaterial/Datasets/BiologicalValidation/data/saccharomyces/yeast_GOEnrichment,Gasch2000,2944x173.txt) with the size of (2993 rows, 173 columns).
To compare the results obtained by BicOPT with those of other algorithms, we used the BicAT toolbox [3], which contains a set of biclustering algorithms (BiMax [11], CC [6], ISA [5], OPSM [4], xMotives [10]). BicOPT and all the other algorithms were run on a machine with an Intel Core i3 2.2 GHz processor and 4 GB of RAM.

4.1     Human B-cell Lymphoma dataset
The BicOPT settings are 0.8 for α and 0.4 for β. The execution of our algorithm took 137.15 minutes.

        Figure 7. Expression profiles of two biclusters

The first bigroup, of size (8, 57), is shown by curve (a) and the second bigroup, of size (12, 49), by curve (b) (see Figure 7). A curve is associated with each gene and plotted as a function of the gene expression level. The curves are grouped together in the same frame to form the expression profile of a bicluster. Curve (a) shows a strong gene expression profile with some noise. Curve (b) also shows a strong gene expression profile, but without noise. The presence of noise in the data file is invaluable in the groups detected by BicOPT.

4.2     Saccharomyces Cerevisiae dataset
After testing a set of simulations, the BicOPT parameters were set to 0.85 for α and 0.7 for β. The execution of our algorithm took 54.5 minutes. To evaluate the biological relevance of the bigroups detected by our algorithm, we used two web tools, GOTermFinder (available at http://www.yeastgenome.org/cgi-bin/GO/goTermFinder.pl) and FuncAssociate (available at http://llama.mshri.on.ca/funcassociate/), to calculate the p-value [13]. The lower the p-value, the more consistent the genes of the bigroup.

4.2.1     GOTermFinder
GOTermFinder is a well-known web tool that checks the quality of each detected group and searches for the significant gene ontology terms shared by the selected gene groups. To identify the characteristics that the genes may have in common, we selected three groups at random (see Table IV).

        Table IV. The most significant GO terms for three bigroups.

 Biclusters        Biological Process                   Molecular Function                      Cell Component
 20 Genes x 101    Cytoplasmic translation              Structural component of the ribosome    Cytosolic ribosome
 Conditions        (95.0%, 3.1E-28)                     (95.0%, 3.82E-27)                       (95.0%, 8.66E-29)
                   Biosynthesis process of peptides     Structural molecular activity           Ribosomal subunit
                   (95.0%, 2.22E-16)                    (95.0%, 1.31E-23)                       (95.0%, 5.85E-26)
 18 Genes x 91     Metabolic process of ncRNA           SnoRNA binding                          Nucleolus
 Conditions        (88.9%, 1.33E-14)                    (27.8%, 2.65E-08)                       (83.3%, 1.70E-16)
                   Treatment of ncRNA                   RNA binding                             Preribosome
                   (83.3%, 2.41E-14)                    (55.6%, 0.00017)                        (72.2%, 3.74E-16)
 18 Genes x 88     Cytoplasmic translation              Structural component of the ribosome    Cytosolic ribosome
 Conditions        (83.3%, 3.66E-20)                    (83.3%, 1.61E-19)                       (83.3%, 1.19E-20)
                   Translation                          Structural molecular activity           Ribosomal subunit
                   (83.3%, 5.70E-11)                    (83.3%, 9.18E-17)                       (83.3%, 1.90E-18)

We used the GOTermFinder tool to describe the most significant shared terms for the biological process, molecular function and cellular component categories, respectively (see Table IV). For the first bigroup, of size (20, 101), we have for example Cytoplasmic translation (95.0%, 3.1E-28), so this bigroup is involved in cytoplasmic translation with a frequency of 95.0% (among the 20 genes in the first bigroup, 19
belong to this process) and with a p-value equal to 3.1E-28 (a very significant value).

4.2.2     FuncAssociate
FuncAssociate is a web application that helps to discover properties enriched in lists of genes or proteins that emerge from large-scale experiments [13]. The basic idea is to select 20 biclusters and then to determine whether or not the set of genes discovered by the biclustering algorithms shows a significant enrichment with respect to a Gene Ontology (GO) annotation (see Figure 8).

    Figure 8. Percentages of biclusters enriched by GO

References

[1]    W. Ayadi, M. Elloumi, and J. K. Hao. BicFinder: A biclustering algorithm for microarray data analysis. Knowl. Inf. Syst., Vol. 30, N°2: (2012), p341–358.

[2]    W. Ayadi and M. Elloumi. Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, chapter Biclustering of Microarray Data. Wiley Book Series on Bioinformatics: Computational Techniques and Engineering. Wiley-Blackwell, John Wiley & Sons Ltd., New Jersey, USA (Publish), (2011), p651–664.

[3]    S. Barkow, S. Bleuler, A. Prelic, P. Zimmermann, and E. Zitzler. Bicat: a biclustering analysis toolbox. Bioinformatics, Vol. 22, N°10: (2006), p1282–1283.

[4]    A. Ben-Dor, B. Chor, R. Karp, and Z. Yakhini. Discovering local structure in gene expression data: the order-preserving submatrix problem. In RECOMB ’02: Proceedings of the sixth annual
                       annotations
                                                                       international conference on Computational
                                                                       biology, p49–57, New York, NY, USA, 2002. ACM.
For values associated with parameter p, BicOPT surpassed
the other algorithms with a percentage of 100% followed by
Opsm with a percentage 90% for p = 0.0001. The other             [5]    S. Bergmann, J. Ihmels, and N. Barkai. Defining
algorithms also perform reasonably well. The experiments                transcription modules using large-scale gene
applied to the real data set used prove that our proposed               expression data. Bioinformatics, Vol. 20, N°13 :
algorithm can identify bigroups with high biological                    (2004), p1993–2003.
relevance.
                                                                 [6]   G. F. Berriz, J. E. Beaver, C. Cenik, M. Tasan, and F.
5       Conclusion                                                     P. Roth, “Next generation software for functional
                                                                       trend analysis,” vol. 25, N°22 : (2009), p3043–
In this paper, we propose the BicOPT algorithm which                   3044.
presents a new optimized version of the BicFinder
algorithm. The complexity time of our algorithm is equal to
                                                                 [7]    M. (Riadi) Charrad, G. (Riadi) Saporta, Y.
O (n⁴m), which is less than that of BicFinder. BicOPT allows
                                                                        Lechevallier, and M. Ben Ahmed, “Le bi-
the extraction and the production of a set of biclusters based
                                                                        partitionnement : Etat de l’art sur les approches
on the construction of an acyclic oriented graph (DAG). We
                                                                        et les algorithmes,”Ecol’IA'08 : (2008).
added a new GAMMA parameter to limit the gene numbers
of each generated bigroup. BicOPT has a graphical interface
                                                                 [8]    Y. Cheng, G.M. Church, Biclustering of
allowing to manage well the obtained bigroupes We
                                                                        Expression Data, in Proc. International
realized different tests on real databases to evaluate the
                                                                        Conference on Intelligent Systems for Molecular
performance of BicOPT. In the realization of this study, we
                                                                        Biology : (2000), p93-103.
used two web tools GOTermFinder, FuncAssociate and a
BicAT application. The experimental study of our approach
to biclustering have good results.                               [9]    S. C. Madeira, A. L. Oliveira, Biclustering
                                                                        Algorithms for Biological Data Analysis: A
                                                                        Survey, IEEE Transactions on Computational
       Biology and Bioinformatics , Vol.1, N°1 : (2004),
       p24-45.

[10]   T. M. Murali and S. Kasif. Extracting conserved gene
       expression motifs from gene expression data.
       Pac. Symp. Biocomput., Vol. 8 : (2003), p77–88.

[11]   A. Prelic, S. Bleuler, P. Zimmermann, P.
       Buhlmann, W. Gruissem, L. Hennig, L. Thiele,
       and E. Zitzler. A systematic comparison and
       evaluation of biclustering methods for gene
       expression data. Bioinformatics, Vol. 22, N°9 :
       (2006), p1122–1129.

[12]   Y. S. Son and J. Baek. A modified correlation
       coefficient based similarity measure for
       clustering time-course gene expression data.
       Pattern Recognition Letters, Vol. 29, N°3 :
       (2008), p232–242.

[13]   R. L. Wasserstein and N. A. Lazar. The
       ASA's Statement on p-Values: Context, Process,
       and Purpose. The American Statistician, Vol.
       70, N°2 : (2016), p129–133.