Proceedings of the International Conference on Big Data Cloud and Applications
Tetuan, Morocco, May 25 - 26, 2015


     Architecture optimization model for the probabilistic
                    self-organizing maps

          EN-NAIMANI Zakariae                               Mohamed LAZAAR                                Mohamed ETTAOUIL
      Modeling and Scientific Computing              National School of Applied Sciences             Modeling and Scientific Computing
      Laboratory, Faculty of Science and               Abdelmalek Essaadi University                 Laboratory, Faculty of Science and
        Technology, Fez, MOROCCO                            Tetouan, MOROCCO                           Technology, Fez, MOROCCO
          z.ennaimani@gmail.com                             lazaarmd@gmail.com                           mohamedettaouil@yahoo.fr


         Abstract— The PRobabilistic Self-Organizing Maps                 SOM, associated with a given problem, is one of the most
     (PRSOM) become more and more interesting in many fields such
                                                                          important research problems in the neural network research.
     as: pattern recognition, clustering, classification, speech
     recognition, data compression, medical diagnosis, etc. The           More precisely, the choice of components (neurons) number,
     PRSOM give an estimation of the density probability function of      the initial weights and covariances matrix has a great impact
     the data, which depends on the parameters of the PRSOM, such         on the convergence of learning methods. The optimization of
     as the architecture of the network. When we take a random
                                                                          the artificial neural networks architectures, particularly
     PRSOM architecture choice (the number of neurons or
     components), we could have degenerated solutions, called also        PRSOM networks, is a recent problem. The first techniques
     singular solutions. Associated with a given problem, it is one of    consist in building the map in an evolutionary way: allowing,
     the most important research problems in the neural network           adding neurons and deleting some others. Some methods that
     research. In the present paper we describe a recent approach of
                                                                          have been proposed in the literature can be broadly classified
     probabilistic self-organizing maps (PRSOM) trying to propose a
     solution to this problem. We propose a speech compression            into two categories: the first fixes a priori the size of the map
     technique based on vector quantization. The main innovation is       in an evolutionary way [24]; the second category allows the
     the use of an optimal probabilistic self-organizing map to           data themselves to choose the dimension of the map. Recently,
     determine the optimal codebook, unlike in classical PRSOM.
     Also, we give an implementation and an evaluation of the
                                                                          another method is introduced to determine the network
     proposed method; the numerical results are powerful and show         parameters, in the supervised learning and in the Kohonen
     the practical interest of our approach.                              networks [8,9,10]. The mean purpose of this work is to model
                                                                          this choice problem of neural architecture in terms of a mixed-
        Keywords— Neural Network ; self-organization; classification;
     unsupervized learning; compression.                                  integer nonlinear problem with linear constraints. Because of
                                                                          its effectiveness in solving the optimization problems, the
                           I.    INTRODUCTION                             genetic algorithm approach is used to solve this nonlinear
        Artificial Neural Network (ANN) often called Neural               problem. It should be noted that a good local optimum of the
     Network (NN) is a computational model or mathematical                obtained model permits to improve the performance of the
     model based on biological neural networks.                           PRSOM learning algorithm.
         Teuvo Kohonen has introduced the very interesting                    This paper is organized as follows: The section 2 presents
     concept of self-organizing topological feature maps [18], The        the formalism of probabilistic self-organizing maps and vector
     central property of this formalism is that it forms a nonlinear      quantization. In section 3 we introduce the model to optimize
     projection of a high-dimensional data manifold on a regular,         the probabilistic Self Organizing architecture Maps. And
     low-dimensional (usually 2D) grid. In the display, the               before concluding, experimental results are given in the
     clustering of the data space as well as the metric-topological       section 5.
     relations of the data items are clearly visible[17,19].
         In the following we introduce the probabilistic Self-               II.   PROBABILISTIC SELF ORGANIZING MAP AND VECTOR
                                                                                                  QUANTIZATION
     Organizing Maps (PRSOM) using a probabilistic
     formalism[1,2]. This algorithm gives a maximum                       A. Probabilistic Self-Organizing Map
     approximation of the density distribution obtained by the               In this section, we will briefly introduce the formal PRSOM
     learning phase. Since the training stage is very important in        model. It allows not only the quantification of data space, but
     the    probabilistic     Self-Organizing      Maps      (PRSOM)      also it does local densities estimation.
     performance, the selection of the architecture of PRobabilistic


                                                                                                                                         8
       As the standard Self-Organizing Maps (SOM) [17,19,18],                               p( x / ci1 )  f c1 ( x, wc1 , c1 ) Where f c1 is the i th Gaussian
                                                                                                                  i               i          i                            i
PRSOM consists of a discrete set C of formal neurons, which
associates to each neuron c  (C ) a spherical Gaussian density
                                                                                     density           with       mean vector                                 wc1 and covariance                                          matrix
                                                                                                                                                                 i


function f c [5], which is defined by its mean (referent vector)                     c1   c1 2  .
                                                                                        i          i

wc ∈ ℝn and its covariance matrix. Thus we denote by
W  {wc ; c  C} and   { c ; c  C} the two sets of parameters                           Then
defining the PRSOM model [1].                                                                                           K    KT (d (c 2j , ci1 ))
       In this probabilistic formalism presented in Figure 1, the                                           pc2 ( x)   K                        f c1 ( x, wc1 , c1 )
                                                                                                                             KT (d (c j , ck ))
                                                                                                              j                                      i        i     i
                                                                                 1                                     i 1               2     1
classical map C is duplicated into two similar maps C and
    2                                                                                                                                 k 1
 C provided with the same topology as C. It is assumed that
the model satisfies the Markov chain hypothesis [7], thus for                                               K         K    KT (d (c 2j , ci1 ))
                                                                                                Or p( x)   p(c 2j ) K                           f c1 ( x, wc1 , c1 )
every input data x  D and every pair of neurons
                                                                                                                           T j k
                                                                                                                                                      i        i     i
                                                                                                           j 1      i 1               2     1
                                                                                                                            K  ( d ( c    , c   ))
 (ci1 , c 2j )  C1  C 2 :                                                                                                                          k 1
        p(c2j / x, ci1 )  p(c 2j / ci1 ) and p( x / ci1 , c 2j )  p( x / ci1 )
                                                                                         The curve of this likelihood is a very complicated shape,
                                                                                     which often has very numerous local maxima. Practically, it is
                                                                                     impossible to maximize directly this likelihood, even to reach
                                                                                     a local maximum [5].
                                                                                         The following algorithm ensures the convergence into a
                                                                                     local maximum of data probability.
                                                                                         PRSOM learning algorithm:
                                                                                         - Initialization : k=0
                                                                                            -     Initial parameters W 0 and  0 , and the maximum
                                                                                                  number of iterations T_max is chosen.
                                                                                            -     Let’s compute  0 ( x)  arg max
                                                                                                                                2
                                                                                                                                   pc2 ( x) j  1,..., K
                                                                                                                                                                     cj              j


                                                                                            -     Iterative step k
                                                                                                                  N                                                            f ci ( xl , wcki 1 , cki1 )
        Figure 1: Probabilistic Self Organizing Map (PRSOM)
                                                                                                                  x K ( d (c , 
                                                                                                                  l 1
                                                                                                                             l                   i
                                                                                                                                                      k 1
                                                                                                                                                             ( xl )))
                                                                                                                                                                                         p k 1 ( x ) ( xl )
                                                                                                       wcki                                                                                             l

                                                                                                                      N                                                       f ci ( xl , wcki 1 , cki1 )
    It is thus possible to compute the probability of any pattern                                                   K ( d (c , 
                                                                                                                      l 1
                                                                                                                                         i
                                                                                                                                                     k 1
                                                                                                                                                            ( xl )))
                                                                                                                                                                                     p k 1 ( x ) ( xl )
x                                                                                                                                                                                                l

                                                                                                                                                     (1)
                                         K
                             p( x)   p(c 2j ) pc2 ( x)                                                N                                                                                    f ci ( xl , wcki 1 , cki1 )
                                                                                                        || wcki 1  xl ||2 K (d (ci ,  k 1 ( xl )))
                                                            j
                                         j 1

                                                                                                       l 1                                                                                              p k 1 ( x ) ( xl )
   Where K is the number of neurons for the two maps C 1                              ( cki ) 2                                                                                                k 1
                                                                                                                                                                                                                   l

                                                                                                                       N                                                      f ci ( xl , w                  , cki1 )
and C 2                                                                                                          n K (d (ci ,  k 1 ( xl )))
                                                                                                                                                                                                 ci

                                                    K                                                                 l 1                                                           p k 1 ( x ) ( xl )
                 pc2 ( x)  p( x / c )   p(c / c ) p( x / c )
                                                                                                                                                                                                     l
                                     2                  1       2   1
                   j
                                     j                  i       j   i                                                                                (2)
                                                i 1
                                                                                            With              ci  1,..., K
   The probability density pc2 ( x) is a mixture of densities
                                                                                                                                  k ( x)  arg max pc ( x)
                                       j
completely defined from the map given the conditional                                                                                                                            2
                                                                                                                                                                c2j              j
probability p(ci1 1/ c 2j ) on the map and the conditional
                                                                                                                                                     (3)
probability p( x / ci ) on the data. In the following we deal with
Gaussian densities and assume that:                                                         While (k>T_max)
                                         KT (d (c 2j , ci1 ))                            The expression (1) is used to update the neurons weights
                      p(ci1 / c 2j )  K                                             (referents).
                                       KT (d (c 2j , ci1 ))
                                             i 1


                                                                                                                                                                                                                                9
   The expression (2) is used to update the neurons standard               To overcome this problem we propose in this paper a new
deviations.
   The expression (3) is used to partition the data space.
B. Vector quantization
   Vector quantization (VQ) is defined as follows: given a
set of feature vectors  , find a partitioning of the feature
vector space into the predefined K number of regions
              K
which             i with i           j   . Every vector inside
             i 1                i j

such region is represented by the corresponding centroid.
These regions are called clusters and a set of centroids, which
represents the whole vector space, is called a codebook[7].
    In addition, vector quantization is considered as a data
compression technique in the speech coding [9] [11]. Vector
quantization has also been increasingly applied to reduce
complexity problem like pattern recognition.                 The
quantization method using the Artificial Neural Network,
particularly in Probabilistic Self Organizing Maps, is more
suitable in this case than the statistical distribution of the
original data that changes with time, since it supports the
adaptive data learning [11]. Also, the neural network has a                  Figure 2: illustration of the three classes neurons of (PRSOM)
huge parallel structure and the possibility for high speed
processing.                                                             mathematical model of PRSOM that controls the size of the
    But the main problems encountered in the probabilistic              map. In this section, we will describe the construction
SOM formalism are:                                                      steps of our model. The first one consists in integrating
     - The risk to find degenerated solutions that present at           the special term which controls the size of the map. The
          least one neuron non adjusted to any input. But the           second step gives the constraints which ensure the
          likelihood of such Gaussian cannot be infinite, i.e. we       allocation of each data to only one neuron (component).
          get closer from a peak of Dirac.
                                                                        B. Modeling of PRSOM architecture optimization
    -     The problem of the network architecture choice, i.e.
          the number of neurons in the map and the                        We propose a new modeling of neural architecture
          initialization parameters.                                    optimization problem of probabilistic self-organizing maps as
                                                                        an optimization problem in terms of a mixed-integer nonlinear
   III.   PROPOSED MODEL TO OPTIMIZE THE PROBABILISTIC                  problem with linear constraints. To formulate this model we
           SELF-ORGANIZING ARCHITECTURE MAPS
                                                                        need to define some parameters as follows:
A. Problem description                                                  Parameters
    Generally, if the size of the probabilistic self-organizing          n : number of data set observation,
map is chosen randomly, the PRSOM learning algorithm gives               N : Optimal number of neurons (components) in the
three classes of neurons as showing in Figure 2. The first class            topology map of PRSOM,
(red neurons) doesn’t represent any observation (empty class),           Nmax : Maximal number of neurons in the topology map
the second class (green) represents the neurons that contain                of PRSOM.
few information data and the third class represents the
important information data (blue).                                      Variables
                                                                         X = (xij )1≤i≤n : Matrix of Training base elements;
    In the above remark, we noticed that there exists a strong
                                                                         U = (uij )1≤j≤p 1≤i≤n :matrix of the binary variables
relation between the two problems mentioned in the previous              W = (wij1≤j≤N )1≤i≤Nmax Matrix of referent vectors
                                                                                               max
section. In other words, we cannot distinguish between the               σ = i 1≤i≤Nmax matrix of covariance
                                                                                 (σ   )    1≤j≤p
two cases. When we take a random PRSOM architecture                        A general formulation for the (MINLP) is given by (𝑃𝑀𝑎𝑥 )
choice (the number of neurons or components), we could have             then (𝑃𝑀𝑖𝑛 ).
degenerated solutions, called also singular solutions. More, the
neurons (components) of the first class have a negative effect
because they make the learning process heavier.


                                                                                                                                          10
                                            n N max N max           telecommunication,            routing, scheduling, and it proves its
                   Max p(U , W ,  )   ( j *(  K ( ( j, k )) efficiency
                                                                     k (x i , wk ,  kto
                                                           T                              uij
                                                                                       ))) obtain
                                                                                              (1) good solutions [24].
                                          i 1 j 1 k 1
                    Subject to :                                       Each solution represents an individual who is coded in
                                                                    one or several chromosomes. These chromosomes represent
                    N max                                          the problem’s variables.
        ( PMax )    uij  1;...;1  i  n (2)
                     j 1                                              First, an initial population composed by a fixed number
                    U  {0,1}n N max                               of individuals is generated, then operators of reproduction
                                                                    are applied to a number of individuals selected according
                    W  N max  p                                   to their fitness. This procedure is repeated until the
                    
                                                                 maximum number of iterations is attained.
                            N max


                                                                        The relevant steps of GA are:
  The mathematical problem Pmax is equivalent to the                 Step 1: Coding individuals
problem P’max
                                                                     Step 2: Randomly generate an initial population.
                                        n N max              N max
                                                                                Step 3: Evaluate the fitness of each individual in the current
             Max ln( p (U ,W ,  ))    uij [ln( j )  ln(  K ( ( j , k ))population.
                                                                                 k (xi , wk , k ))] (1)
                                                                    T

                                      i 1 j 1               k 1
             Subject to :                                                      Step 4: Execute genetic operators including selection,
            N                                                                  crossover and mutation.
             max
 ( PMax )    uij  1;...;1  i  n (2)
    '
                                                                                Step 5: Generate the next population using genetic operators.
              j 1
             U  {0,1}n N max                                                 Step 6: Return to Step 2 until the maximum of the fitness
                                                                               function is obtained.
             W  N max  p
                                                                                    2) Solving the optimized model
               max
                    N
                                                                                     A specially designed genetic algorithm is applied to solve
                                                                                the optimization problem of the Architecture optimization
    The research for a maximum can always be transformed to                     model of the probabilistic self-organizing maps described in
the research of a minimum.                                                      Section 3.2.
                                                                                            Encoding
                                        n N max                   N max

             Min  E (U , W ,  )  [      ij u [ln( j )  ln(      K T
                                                                             ( ( j , k )) k (x iIn   ,  k ))]]
                                                                                                  , wkour    model, we have encoded an individual by three
                                      i 1 j 1                   k 1
                                                                                            chromosomes see Figure 3 , the first one (a) represent the
            Subject to :                                                                   matrix of decision variables U, the second one (b)
           N                                                                               represents the matrix of weights W and the last one (c)
           
( PMin )    uij  1;...;1  i  n                                                        represents the vector of variances  .
               max


             j 1
            U  {0,1}n N max
            
            W  Nmax  p                                                                                 0.2       0.8      …       …        0.5
                                                                                                                   …        …       …
              max
                    N                                                                                     0.1                                 0.6
                                                                                                           …        …        …       …         …
                                                                                          0.9          …           …            …          0.3
                                                                                                                      (a)
  In the following section, we study the resolution of the last                           1        0       0        …       0         0      0
mathematical program.                                                                     0        0       1        0       …         0      0
                                                                                          0        0       1        0       …         0      0
C. Resolution of the obtained nonlinear model                                             …        …       …        …       …         …      …
                                                                                          0        0       …        …       …         0      1
  We use the Genetic Algorithm approach to solve this                                     1        0       0        …       0         0      0
mathematical model.                                                                       0        1       0        0       …         0      0
  1) Genetic algorithm                                                                                                (b)
                                                                                        1200     780     395      …         …       702   2000
  Genetic Algorithm belongs to a class of stochastic
methods called “evolutionary algorithms”. Introduced by J.                                                            (c)
HOLLAND [16], they are efficient and robust adaptive search
techniques based on the idea of natural evolution (Darwin
theory). This algorithm has been applied in a large number                      Figure 3: Genetic representation of an individual Initial Population
of optimization problems in several               domains:


                                                                                                                                                         11
   An initial population is built such that each individual must                      Linear constraints associated with this problem are defined
at least be possible solution, i.e., every component (U ,W ,  )
                                                                                    by the following statement:

                                                                                      Each element xi ; i  1,..., n is affected to a single neuron j.
in the initial population must be feasible solution. The initial
population could be randomly generated, but there exist other
ways to generate the initial population like applying other                         These constraints are given by:
heuristics. In our case, we do not use the random initialization
of the variable U. When we set the variables W and Sigma in                                            N max
 ( PMin ) , we find a linear model of binary variables under linear                                    
                                                                                                       j 1
                                                                                                            uij  1;...;1  i  n  AX  b
constraints. Thus, the initialization of the variable U is                                                           n Nmax
                                                                                      The matrix A {0,1}
                                                                                                                                                           n
obtained by the resolution of the model ( PU ) , with W and                                                                      and the vector b             are
Sigma randomly initialized.                                                         defined by:
                                                                                                  1             1     0       0          0        0
  The obtained model ( PU ) is defined by:                                                                                                         
                                                                                                    0            0     1                  1   0    0
                                                                                               A
                                    n N max
                                                                                                                                                   
                                                          N max

           Min    E (U  )              u ij ln[   j  K ( ( j , k )) k (x i , wk ,  k )](1)
                                                                   T
                                                                                                                                                   
                                 i 1 j 1               k 1
                                                                                                  0                     0         0            0 11
          Subject to :
( PU )   N                                                                                                                            1
                                                                                                                                        
           u  1;...;1  i  n (2)                                                                                                b 
            
             max


           j 1 ij                                                                                                                     1
                                                                                                                                       
                         n N max
          U  {0,1}                                                                                Finally we obtain a linear program with variables 0-1, and
                                                                                                 with linear constraints.
   The matrix U can be transformed into a vector X of size m,
                                                                                                                                Min E (X)  C , X 
with m=n*Nmax                                                                                                                   Subject to :
                                                                                                                               
X   u1,1 u1,2                                                                        unk                           ( PU )  
                                                                                                                               AX  b
                               u1, k             ui,1           ui, k    un1
                                                                                                                                X  {0,1}nNmax
                                                                                                                               
   Afterwards we can define the objective function as follows:                                   Evaluating individuals
                                                                                                     In this step, to each individual is assigned a numerical value
                                      E (X)  C t X                                              called fitness which corresponds to its performance; it
   With:                                                                                         depends essentially on the value of objective function
                                                                                                 corresponding to this individual. An individual who has a
                         N
                                                                         
                           
                           max

                 ln[          K  T
                                      (  (1, k  ))   (x   , w  ,   )]                        great fitness is the most adapted to the problem.
                       1                      k   1    k   k
                       k 1                                                          The fitness suggested in our work is the following function:
                      N max                                         
            ln[ 2  K ( (2, k )) k (x1 , wk ,  k )]
                                T
                                                                     
                                                                                                                   1
      
                        k 1
                                                                                                                     fi 
                                                                                                                 Ei  1
                     N max
                                                                                    Minimize the value of the objective function is equivalent to
        ln[ N max  K ( ( N max , k )) k (x1 , wk ,  k )] 
                               T

                      k 1                                                        maximizing the value of the fitness function.
                                                                    
                                                                    
                                                                    
                        N max
                                                                                    Selection
   C 
              ln[  1       K  T
                                   ( (1, k )) k (x i , wk ,  k )] 
                       k 1
                                                                                    The application of the fitness criterion is intended to select
                                                                                  which individuals from a population will go on to reproduce.
                     N max                                          
        ln[                                                                     Where:
                N max  K ( ( N max , k )) k (x i , wk ,  k )]
                               T
                      k 1
                                                                     
                                                                                                                         f
                                                                                                                   Pi  n i
                                                                    
                                                                                                                               f
                       N max
            ln[ 1  K T ( (1, k )) k (x n , wk ,  k )]                                                                         j
                       k 1                                                                                                  j 1
                                                                    
                                                                    
                     N                                                            Crossover
        ln[ N max  K ( ( N max , k )) k (x n , wk ,  k )] 
                        max
                               T

                     k 1                                           
                                                                                      The crossover is a very important phase in the genetic
                                                                                    algorithm. In this step, new individuals called children are


                                                                                                                                                                 12
created by individuals selected from the population called          Input:
parents. Children are constructed as follows see Figure 4 :              n, p, X, N iter , N max ;
  We fix three points of crossover, the parents are cut from
these points, the first part of parent 1 and the second of parent
                                                                         [Tmin , Tmax ] the interval of the parameter T;
2 goes to child 1 and the rest goes to child 2.
                                                                    Output:
  In the crossover that we adopted, we choose 4 different                     Optimal probabilistic topological map
crossover points: the first one corresponds to the matrix of
weights, the second one is for vector U and the last one              Initialization:
corresponds to the vector of variances  .
                                                                          w1 (0),..., wNmax (0) Randomly initialized
Mutation
                                                                           1 (0),..., N (0)
                                                                                          max
                                                                                                      Randomly initialized with the
   The rule of mutation is to keep the diversity of solutions in    great values
order to avoid local optimums. It corresponds to changing the
                                                                          U initialized via resolution of the model ( PU ) .
values of one (or several) value (s) of the individuals who are
(s) chosen randomly.                                                         T  Tmax t  0

                                                                    Step 1:
   IV.    PROPOSED MODEL TO OPTIMIZE THE PROBABILISTIC
                                                                         Construction the model of PRSOM ( PMin )
           SELF-ORGANIZING ARCHITECTURE MAPS
                                                                    Step 2:
  This algorithm is probabilistic self-organizing based on
                                                                         - Solving the model of PRSOM via Genetic algorithm.
solving the optimization problem
                                        (P )
                                         Min    that gives in            - Outcome : the optimal number of neurons N used.
output: weights initialization (vectors referents), covariance           - Initial weights matrix Initial variances vector.
matrix and the optimal neurons number. This is summarized in        Step 3:
the following scheme Fig 5:                                              - Optimized model outputs, constructed in the
                                                                    initialization phase of OPRSOM.
                                                                         - Training phase of OPRSOM.
                                   Optimal probabilistic                 - Assignment-decision phase (Equation 3).
     Training                        Kohonen Model                       - Minimization phase (Equation 1 and Equation 2).
        set                                                         Return
                                                                           Optimal parameters of OPRSOM.


                                                                        V.     PROPOSED MODEL TO OPTIMIZE THE PROBABILISTIC
       Optimal                     Solving via genetic                          SELF-ORGANIZING ARCHITECTURE MAPS
      Codebook                         algorithm
                                                                    A. Data set Description
                                                                      The experiments were performed using the Arabic digit
                                                                    corpus collected by the laboratory of automatic and signals,
                                                                    University of Badji-Mokhtar - Annaba, Algeria. A number of
                                                                    88 individuals (44 males and 44 females), Arabic native
                                Optimal neurons number              speakers were asked to utter all digits ten times [27].
                                + initialization of weights
                                                                    Depending on this, the database consists of 8800 tokens (10
                                    matrix and vectors
                                                                    digits x 10 repetitions x 88 speakers). In this experiment, the
                                     variances in the
                                 Probabilistic Kohonen              data set is divided into two parts: a training set with 75% of
                                     topological map                the samples and test set with 25% of the samples

                                                                                            Table 1. Arabic Digits

                                                                                        Arabic       English   Symbol
                                                                                        ‫صفر‬           ZERO       ‘0’
                                                                                        ‫واحد‬           ONE       ‘1’
                Figure 4: Training Model OPRSOM                                         ‫اثنان‬         TWO        ‘2’
                                                                                        ‫ثالثة‬        THREE       ‘3’
  To more understand the previous scheme, we explain it                                 ‫أربعه‬         FOUR       ‘4’
using the following iterative algorithm                                                 ‫خمسه‬          FIVE       ‘5’
                                                                                         ‫ستة‬           SIX       ‘6’
                                                                                        ‫سبعه‬         SEVEN       ‘7’


                                                                                                                                 13
                   ‫ثمانية‬     EIGHT           ‘8’                 PSNR and the MSE calculated by classical approach
                   ‫تسعه‬       NINE            ‘9’                 (PRSOM) a map of T=20 (Because for the other choices of
                                                                  map we find degenerated solutions) neurons and the MSE
                                                                  calculated by a new size of map (mean of N for each digit) for
  Table 1 shows the Arabic digits, the first column presents      example N=5 for WAHID (1), N=3 for ARBAA (4) neurons
the digits in Arabic language, the second column presents the     which determined by the proposed approach.
digits in English language and the last column shows the
symbol of each digit.                                                Table 3. MSE and PSNR obtained for Arabic digit by PRSOM and
                                                                                              OPRSOM
B. Experiments and discussion
                                                                  ARABIC            PSNR            PSNR           MSE         MSE
  In this section, we extensively study the performance of the    DIGITS          OPRSOM           PRSOM        OPRSOM       PRSOM
proposed approach of speech compression using OPRSOM
algorithm, Arabic digits set is considered.                            0           17.95            20.36         0.95        0.85
                                                                       1           17.65            17.80         0.98        0.95
  The evaluation of the proposed approach in speech data               2           14.91            15.36         1.64        1.50
compression was performed using the following measure,                 3           16.60            17.18         1.25        1.10
                                                                       4           15.55            16.22         1.45        1.25
  Peak Signal-to-Noise Ratio (PSNR) is given by:                       5           19.77            19.72         0.73        0.74
                                                                       6           18.44            18.69         1.26        1.20
                                           nX 2
                   PSNR = 10 log10 (            )                      7           20.00            21.00         1.00        0.79
                                           MSE                         8           16.09            15.68         1.20        1.31
                                                                       9           18.80            18.80         1.09        1.09
  Where n is the length of the reconstructed signal, X is the
maximum absolute square value of the signal x, and Mean
                                                                    Figure 6 and Figure 7 show the MSE and the PSNR
Squared Error (MSE) is defined as follows:
                                                                  comparison of digits Arabic between both approaches PRSOM
                                n
                       1                                          and OPRSOM. We can see that the MSE and PSNR very close
                  MSE = ∑(x̂(i) − x(i))2                          between both approaches. But proposed method can reduce
                       n                                          the training time and the number of neurons, from the 20 to 5
                               i=1
                                                                  neurons, rate of reduction is about 75%.
  Where x̂ is the original speech signal, and x is the
reconstructed speech signal.
  To choice of optimal neural network (N), we tried five
different sizes of topological maps (Nmax). In each map, we
compute the optimal size by our model (P). Numerical results
obtained on dataset of Arabic digits are presented in the
Table2. We note that the optimal size is between 3 and 7
neurons whatever the initial size is. For example, for a map of
50 neurons on digits 1,2,3,7 the optimal size is 5 neurons.

              Table 2. Optimal results of topological map


                   N max            20          30          50
                                                                     Figure 5: Comparison between both approaches for the MSE
  SYFR 0          N                 7            6          7
  WAHID 1         N                 5            5          5
  ITNAN 2         N                 5            5          5
THALATA 3         N                 5            5          5
 ARBAA 4          N                 3            4          3
 KHAMSA 5         N                 7            7          6
   SITA 6         N                 6            6          6
  SABAA 7         N                 5            5          5
THAMANIA
                  N                 5            5          5
   8
  TISAA 9         N                 7            6          7

                                                                    Figure 6: Comparison between both approaches for the PSNR
  The compression numerical results using optimal size are
presented in Table 3. This table list all Arabic digits, The


                                                                                                                                     14
  Recall that the proposed method contains an additional                    Science Issues (IJCSI), Volume 9, Issue 2, No 1, pp. 197-205,
phase; this phase consists on solving the proposed model in                 2012.
order to remove the unnecessary neurons from the initial map.           10. Ettaouil, M. ; Lazaar, M. ; En-Naimani, Z. "A hybrid
For example, for a map with 50 neurons we get a map of 5, the               ANN/HMM models for arabic speech recognition using optimal
proposed approach can thus remove about 90% neurons from                    codebook”, Intelligent Systems: Theories and Applications
initial map to construct the optimal PRSOM.                                 (SITA), 2013 8th International Conference on Mai 5-6.
                                                                        11. M. Ettaouil, M. Lazaar, K. Elmoutaouakil, K. Haddouch, A New
                                                                            Algorithm for Optimization of the Kohonen Network
   VI.   PROPOSED MODEL TO OPTIMIZE THE PROBABILISTIC                       Architectures Using the Continuous Hopfield Networks, WSEAS
          SELF-ORGANIZING ARCHITECTURE MAPS                                 TRANSACTIONS on COMPUTERS,Issue 4, Volume 12, April
                                                                            2013.
    In this paper, we have presented an approach to determine           12. R. Fletcher and S. Leyffer. "Solving Mixed Integer Programs by
the optimal codebook and covariance matrix by the Optimal                   Outer Approximation", Math. Program. 66, 1994, 327–349.
Probabilistic Self Organizing Maps (OPSOM). As a first step             13. Gascuel O. Canu S. Thiria .S and Lechevallier Y. Statistique et
we construct a mathematical model, after we solve via genetic               méthodes neuronales. 1997.
algorithm, therefore we obtain the optimal number used in the           14. D.E Goldberg, “Genetic Algorithms in Search, Optimization,
card and the best initialization parameters of the network.                 and Machine Learning”. Addison-Wesley, 1989.
    This approach has been compared to speech compression               15. O.K. Gupta and A. Ravindran. "Branch and Bound Experiments
problem using a datasets of Arabic digit. The obtained results              in Convex Nonlinear Integer Programming", Manage Sci., 31
demonstrate the performance of our proposed method.                         (12) , 1985, pp. 1533–1546.
     In the future works, we will use exact approaches or               16. Holland J. ”Adaptation in natural and artificial systems”. Ann
others heuristics methods to resolve this problem and                       Arbor,MI: University of Michigan Press, 1992
determine the optimal solution for the optimization of neural           17. T. Kohonen, S. Kaski, K. Lagus, J. Salojr , J. Honkela, V.
networks architectures. The proposed method can be applied                  Paatero, A. Saarela. "Self organization of a massive document
to solve the pattern recognition problems, speech recognition               collection". IEEE transaction on neural networks, 11, No. 3,
problems and image compression problems.                                    2000.
                                                                        18. T. Kohonen. "Self Organizing Maps". Springer, 3th edition,
                          REFERENCES                                        2001.
                                                                        19. T. Kohonen. "The Self Organizing Maps". Proceedingsof IEEE,
 1. F. ANOUAR, F.BADRAN and S.THIRIA, Self Organized Map, A
                                                                            78, No. 9, 1990, pp. 1464-1480.
    Probabilistic Approach proceedings of the Workshop on Self-
                                                                        20. S.P Luttrel. A bayesian analysis of self- organizing maps. Neural
    Organized Maps, Helsinki University of Technology, Espoo,
                                                                            Computing 6: 767-794, 1994.
    Finland, June 4-6,1997.
                                                                        21. S. Manuel, M. L. José, M. B. Victor, M.R. José, GENES, “a
 2. F. Anouar. Modélisation probabiliste des auto-organisées :
                                                                            Genetic Algorithms and Fast Time Simulation’’, 3nd ATM
    Application en classification et en régression. Thèse de doctorat
                                                                            R&D Symposium, Spain, 2002.
    soutenue au conservatoires national des arts et métiers. 1996.
                                                                        22. I. Quesada and I.E. Grossmann. "An LP/NLP Based Branch and
 3. M. Ayache, M. Khalil and F. Tranquart. "Artificial Neural
                                                                            Bound Algorithm for Convex MINLP Optimization Problems",
    Network for Transfer Function Placental Development: DCT
                                                                            Computers Chem. Eng., 16 (10/11), 1992, pp. 937–947.
    and DWT Approach". IJCSI International Journal of Computer
                                                                        23. N. Rogovschi. Classification à la base de modèles de mélanges
    Science Issues, Vol. 8, Issue 5, No 3, September 2011.
                                                                            topologiques des données catégorielles et continues. Thèse de
 4. C.M. Bishop. Pattern Recognition and Machine Learning,
                                                                            doctorat soutenue à l’université Paris 13- Insitut Galilée. 2009.
    Springer,2006
                                                                        24. E. Taillard, J. Dréo, A.Pétrowski, P. Siarry. Métaheuristiques
 5. P. Bruneau, Marc Gelgon and F. Picarougne, Parameter-based
                                                                            pour l’optimisation difficile, Eyrolles,2003.
    reduction of Gaussian mixture models with a variational-Bayes
                                                                        25. D. Wang. "Fast Constructive-Coverting Algorithm forneural
    approach, IEEE, 2008.
                                                                            networks and its implement in classification".Applied Soft
 6. M. Duran, I. E. Grossmann. "An outer-approximation algorithm
                                                                            Computing, 8, 2008, pp. 166-173.
    for a class of mixed-integer non linear programs".
                                                                        26. M. YACOUB and al, Clustering and classification based on
    Mathematical Programming, 1986, pp. 307-339.
                                                                            expert knowledge propagation using Probabilistic Self
 7. Z. En-Naimani, M. Lazaar, M. Ettaouil ‘Hybrid System of
                                                                            Organizing Map: application to geophysics, Data
    Optimal Self Organizing Maps and Hidden Markov Model for
                                                                            AnalysisStudies in Classification, Data Analysis, and Knowledge
    Arabic Digits Recognition”. Journal of WSEAS Transactions on
                                                                            Organization 2000, pp 67-78 .
    Systems, Volume 13, pp. 606-616, 2014.
                                                                        27. http://archive.ics.uci.edu/ml/datasets/Spoken+Arabic+Digit
 8. Z. En-Naimani, M. Lazaar,            M. Ettaouil, “Architecture
    optimization model for the probabilistic self-organizing maps
    and classification”, 9th International Conference on Intelligent
    Systems: Theories and Applications, Rabat (Morocco), May 07-
    08, 2014.
 9. M. Ettaouil, M. Lazaar, "Improved Self-Organizing Maps and
    Speech Compression", International Journal of Computer


                                                                                                                                            15