Proceedings of the International Conference on Big Data Cloud and Applications Tetuan, Morocco, May 25 - 26, 2015 Architecture optimization model for the probabilistic self-organizing maps EN-NAIMANI Zakariae Mohamed LAZAAR Mohamed ETTAOUIL Modeling and Scientific Computing National School of Applied Sciences Modeling and Scientific Computing Laboratory, Faculty of Science and Abdelmalek Essaadi University Laboratory, Faculty of Science and Technology, Fez, MOROCCO Tetouan, MOROCCO Technology, Fez, MOROCCO z.ennaimani@gmail.com lazaarmd@gmail.com mohamedettaouil@yahoo.fr Abstract— The PRobabilistic Self-Organizing Maps SOM, associated with a given problem, is one of the most (PRSOM) become more and more interesting in many fields such important research problems in the neural network research. as: pattern recognition, clustering, classification, speech recognition, data compression, medical diagnosis, etc. The More precisely, the choice of components (neurons) number, PRSOM give an estimation of the density probability function of the initial weights and covariances matrix has a great impact the data, which depends on the parameters of the PRSOM, such on the convergence of learning methods. The optimization of as the architecture of the network. When we take a random the artificial neural networks architectures, particularly PRSOM architecture choice (the number of neurons or components), we could have degenerated solutions, called also PRSOM networks, is a recent problem. The first techniques singular solutions. Associated with a given problem, it is one of consist in building the map in an evolutionary way: allowing, the most important research problems in the neural network adding neurons and deleting some others. Some methods that research. In the present paper we describe a recent approach of have been proposed in the literature can be broadly classified probabilistic self-organizing maps (PRSOM) trying to propose a solution to this problem. We propose a speech compression into two categories: the first fixes a priori the size of the map technique based on vector quantization. The main innovation is in an evolutionary way [24]; the second category allows the the use of an optimal probabilistic self-organizing map to data themselves to choose the dimension of the map. Recently, determine the optimal codebook, unlike in classical PRSOM. Also, we give an implementation and an evaluation of the another method is introduced to determine the network proposed method; the numerical results are powerful and show parameters, in the supervised learning and in the Kohonen the practical interest of our approach. networks [8,9,10]. The mean purpose of this work is to model this choice problem of neural architecture in terms of a mixed- Keywords— Neural Network ; self-organization; classification; unsupervized learning; compression. integer nonlinear problem with linear constraints. Because of its effectiveness in solving the optimization problems, the I. INTRODUCTION genetic algorithm approach is used to solve this nonlinear Artificial Neural Network (ANN) often called Neural problem. It should be noted that a good local optimum of the Network (NN) is a computational model or mathematical obtained model permits to improve the performance of the model based on biological neural networks. PRSOM learning algorithm. Teuvo Kohonen has introduced the very interesting This paper is organized as follows: The section 2 presents concept of self-organizing topological feature maps [18], The the formalism of probabilistic self-organizing maps and vector central property of this formalism is that it forms a nonlinear quantization. In section 3 we introduce the model to optimize projection of a high-dimensional data manifold on a regular, the probabilistic Self Organizing architecture Maps. And low-dimensional (usually 2D) grid. In the display, the before concluding, experimental results are given in the clustering of the data space as well as the metric-topological section 5. relations of the data items are clearly visible[17,19]. In the following we introduce the probabilistic Self- II. PROBABILISTIC SELF ORGANIZING MAP AND VECTOR QUANTIZATION Organizing Maps (PRSOM) using a probabilistic formalism[1,2]. This algorithm gives a maximum A. Probabilistic Self-Organizing Map approximation of the density distribution obtained by the In this section, we will briefly introduce the formal PRSOM learning phase. Since the training stage is very important in model. It allows not only the quantification of data space, but the probabilistic Self-Organizing Maps (PRSOM) also it does local densities estimation. performance, the selection of the architecture of PRobabilistic 8 As the standard Self-Organizing Maps (SOM) [17,19,18], p( x / ci1 )  f c1 ( x, wc1 , c1 ) Where f c1 is the i th Gaussian i i i i PRSOM consists of a discrete set C of formal neurons, which associates to each neuron c  (C ) a spherical Gaussian density density with mean vector wc1 and covariance matrix i function f c [5], which is defined by its mean (referent vector) c1   c1 2  . i i wc ∈ ℝn and its covariance matrix. Thus we denote by W  {wc ; c  C} and   { c ; c  C} the two sets of parameters Then defining the PRSOM model [1]. K KT (d (c 2j , ci1 )) In this probabilistic formalism presented in Figure 1, the pc2 ( x)   K f c1 ( x, wc1 , c1 )  KT (d (c j , ck )) j i i i 1 i 1 2 1 classical map C is duplicated into two similar maps C and 2 k 1 C provided with the same topology as C. It is assumed that the model satisfies the Markov chain hypothesis [7], thus for K K KT (d (c 2j , ci1 )) Or p( x)   p(c 2j ) K f c1 ( x, wc1 , c1 ) every input data x  D and every pair of neurons  T j k i i i j 1 i 1 2 1 K ( d ( c , c )) (ci1 , c 2j )  C1  C 2 : k 1 p(c2j / x, ci1 )  p(c 2j / ci1 ) and p( x / ci1 , c 2j )  p( x / ci1 ) The curve of this likelihood is a very complicated shape, which often has very numerous local maxima. Practically, it is impossible to maximize directly this likelihood, even to reach a local maximum [5]. The following algorithm ensures the convergence into a local maximum of data probability. PRSOM learning algorithm: - Initialization : k=0 - Initial parameters W 0 and  0 , and the maximum number of iterations T_max is chosen. - Let’s compute  0 ( x)  arg max 2 pc2 ( x) j  1,..., K cj j - Iterative step k N f ci ( xl , wcki 1 , cki1 ) Figure 1: Probabilistic Self Organizing Map (PRSOM)  x K ( d (c ,  l 1 l i k 1 ( xl ))) p k 1 ( x ) ( xl ) wcki  l N f ci ( xl , wcki 1 , cki1 ) It is thus possible to compute the probability of any pattern  K ( d (c ,  l 1 i k 1 ( xl ))) p k 1 ( x ) ( xl ) x l (1) K p( x)   p(c 2j ) pc2 ( x) N f ci ( xl , wcki 1 , cki1 )  || wcki 1  xl ||2 K (d (ci ,  k 1 ( xl ))) j j 1 l 1 p k 1 ( x ) ( xl ) Where K is the number of neurons for the two maps C 1 ( cki ) 2  k 1 l N f ci ( xl , w , cki1 ) and C 2 n K (d (ci ,  k 1 ( xl ))) ci K l 1 p k 1 ( x ) ( xl ) pc2 ( x)  p( x / c )   p(c / c ) p( x / c ) l 2 1 2 1 j j i j i (2) i 1 With ci  1,..., K The probability density pc2 ( x) is a mixture of densities  k ( x)  arg max pc ( x) j completely defined from the map given the conditional 2 c2j j probability p(ci1 1/ c 2j ) on the map and the conditional (3) probability p( x / ci ) on the data. In the following we deal with Gaussian densities and assume that: While (k>T_max) KT (d (c 2j , ci1 )) The expression (1) is used to update the neurons weights p(ci1 / c 2j )  K (referents).  KT (d (c 2j , ci1 )) i 1 9 The expression (2) is used to update the neurons standard To overcome this problem we propose in this paper a new deviations. The expression (3) is used to partition the data space. B. Vector quantization Vector quantization (VQ) is defined as follows: given a set of feature vectors  , find a partitioning of the feature vector space into the predefined K number of regions K which   i with i  j   . Every vector inside i 1 i j such region is represented by the corresponding centroid. These regions are called clusters and a set of centroids, which represents the whole vector space, is called a codebook[7]. In addition, vector quantization is considered as a data compression technique in the speech coding [9] [11]. Vector quantization has also been increasingly applied to reduce complexity problem like pattern recognition. The quantization method using the Artificial Neural Network, particularly in Probabilistic Self Organizing Maps, is more suitable in this case than the statistical distribution of the original data that changes with time, since it supports the adaptive data learning [11]. Also, the neural network has a Figure 2: illustration of the three classes neurons of (PRSOM) huge parallel structure and the possibility for high speed processing. mathematical model of PRSOM that controls the size of the But the main problems encountered in the probabilistic map. In this section, we will describe the construction SOM formalism are: steps of our model. The first one consists in integrating - The risk to find degenerated solutions that present at the special term which controls the size of the map. The least one neuron non adjusted to any input. But the second step gives the constraints which ensure the likelihood of such Gaussian cannot be infinite, i.e. we allocation of each data to only one neuron (component). get closer from a peak of Dirac. B. Modeling of PRSOM architecture optimization - The problem of the network architecture choice, i.e. the number of neurons in the map and the We propose a new modeling of neural architecture initialization parameters. optimization problem of probabilistic self-organizing maps as an optimization problem in terms of a mixed-integer nonlinear III. PROPOSED MODEL TO OPTIMIZE THE PROBABILISTIC problem with linear constraints. To formulate this model we SELF-ORGANIZING ARCHITECTURE MAPS need to define some parameters as follows: A. Problem description Parameters Generally, if the size of the probabilistic self-organizing  n : number of data set observation, map is chosen randomly, the PRSOM learning algorithm gives  N : Optimal number of neurons (components) in the three classes of neurons as showing in Figure 2. The first class topology map of PRSOM, (red neurons) doesn’t represent any observation (empty class),  Nmax : Maximal number of neurons in the topology map the second class (green) represents the neurons that contain of PRSOM. few information data and the third class represents the important information data (blue). Variables  X = (xij )1≤i≤n : Matrix of Training base elements; In the above remark, we noticed that there exists a strong  U = (uij )1≤j≤p 1≤i≤n :matrix of the binary variables relation between the two problems mentioned in the previous  W = (wij1≤j≤N )1≤i≤Nmax Matrix of referent vectors max section. In other words, we cannot distinguish between the  σ = i 1≤i≤Nmax matrix of covariance (σ ) 1≤j≤p two cases. When we take a random PRSOM architecture A general formulation for the (MINLP) is given by (𝑃𝑀𝑎𝑥 ) choice (the number of neurons or components), we could have then (𝑃𝑀𝑖𝑛 ). degenerated solutions, called also singular solutions. More, the neurons (components) of the first class have a negative effect because they make the learning process heavier. 10  n N max N max telecommunication, routing, scheduling, and it proves its Max p(U , W ,  )   ( j *(  K ( ( j, k )) efficiency k (x i , wk ,  kto T uij ))) obtain (1) good solutions [24].  i 1 j 1 k 1  Subject to : Each solution represents an individual who is coded in  one or several chromosomes. These chromosomes represent  N max the problem’s variables. ( PMax )    uij  1;...;1  i  n (2)  j 1 First, an initial population composed by a fixed number U  {0,1}n N max of individuals is generated, then operators of reproduction  are applied to a number of individuals selected according W  N max  p to their fitness. This procedure is repeated until the    maximum number of iterations is attained. N max The relevant steps of GA are: The mathematical problem Pmax is equivalent to the Step 1: Coding individuals problem P’max Step 2: Randomly generate an initial population.  n N max N max Step 3: Evaluate the fitness of each individual in the current  Max ln( p (U ,W ,  ))    uij [ln( j )  ln(  K ( ( j , k ))population.  k (xi , wk , k ))] (1) T  i 1 j 1 k 1  Subject to : Step 4: Execute genetic operators including selection, N crossover and mutation.  max ( PMax )    uij  1;...;1  i  n (2) ' Step 5: Generate the next population using genetic operators.  j 1 U  {0,1}n N max Step 6: Return to Step 2 until the maximum of the fitness  function is obtained. W  N max  p  2) Solving the optimized model   max N A specially designed genetic algorithm is applied to solve the optimization problem of the Architecture optimization The research for a maximum can always be transformed to model of the probabilistic self-organizing maps described in the research of a minimum. Section 3.2. Encoding  n N max N max  Min E (U , W ,  )  [   ij u [ln( j )  ln(  K T ( ( j , k )) k (x iIn ,  k ))]] , wkour model, we have encoded an individual by three  i 1 j 1 k 1 chromosomes see Figure 3 , the first one (a) represent the  Subject to : matrix of decision variables U, the second one (b) N represents the matrix of weights W and the last one (c)  ( PMin )    uij  1;...;1  i  n represents the vector of variances  . max  j 1 U  {0,1}n N max  W  Nmax  p 0.2 0.8 … … 0.5  … … …   max N 0.1 0.6 … … … … … 0.9 … … … 0.3 (a) In the following section, we study the resolution of the last 1 0 0 … 0 0 0 mathematical program. 0 0 1 0 … 0 0 0 0 1 0 … 0 0 C. Resolution of the obtained nonlinear model … … … … … … … 0 0 … … … 0 1 We use the Genetic Algorithm approach to solve this 1 0 0 … 0 0 0 mathematical model. 0 1 0 0 … 0 0 1) Genetic algorithm (b) 1200 780 395 … … 702 2000 Genetic Algorithm belongs to a class of stochastic methods called “evolutionary algorithms”. Introduced by J. (c) HOLLAND [16], they are efficient and robust adaptive search techniques based on the idea of natural evolution (Darwin theory). This algorithm has been applied in a large number Figure 3: Genetic representation of an individual Initial Population of optimization problems in several domains: 11 An initial population is built such that each individual must Linear constraints associated with this problem are defined at least be possible solution, i.e., every component (U ,W ,  ) by the following statement: Each element xi ; i  1,..., n is affected to a single neuron j. in the initial population must be feasible solution. The initial population could be randomly generated, but there exist other ways to generate the initial population like applying other These constraints are given by: heuristics. In our case, we do not use the random initialization of the variable U. When we set the variables W and Sigma in N max ( PMin ) , we find a linear model of binary variables under linear  j 1 uij  1;...;1  i  n  AX  b constraints. Thus, the initialization of the variable U is n Nmax The matrix A {0,1} n obtained by the resolution of the model ( PU ) , with W and and the vector b are Sigma randomly initialized. defined by: 1 1 0 0 0 0 The obtained model ( PU ) is defined by:   0 0 1 1 0 0  A n N max   N max  Min E (U )     u ij ln[  j  K ( ( j , k )) k (x i , wk ,  k )](1) T    i 1 j 1 k 1 0 0 0 0 11  Subject to : ( PU )   N  1    u  1;...;1  i  n (2) b   max  j 1 ij  1    n N max U  {0,1} Finally we obtain a linear program with variables 0-1, and with linear constraints. The matrix U can be transformed into a vector X of size m,  Min E (X)  C , X  with m=n*Nmax  Subject to :  X   u1,1 u1,2 unk  ( PU )   AX  b u1, k ui,1 ui, k un1  X  {0,1}nNmax  Afterwards we can define the objective function as follows: Evaluating individuals In this step, to each individual is assigned a numerical value E (X)  C t X called fitness which corresponds to its performance; it With: depends essentially on the value of objective function corresponding to this individual. An individual who has a  N   max   ln[  K T ( (1, k )) (x , w ,  )]  great fitness is the most adapted to the problem. 1 k 1 k k  k 1  The fitness suggested in our work is the following function:  N max    ln[ 2  K ( (2, k )) k (x1 , wk ,  k )] T    1  k 1  fi    Ei  1  N max  Minimize the value of the objective function is equivalent to   ln[ N max  K ( ( N max , k )) k (x1 , wk ,  k )]  T  k 1  maximizing the value of the fitness function.       N max Selection C   ln[  1  K T ( (1, k )) k (x i , wk ,  k )]   k 1  The application of the fitness criterion is intended to select   which individuals from a population will go on to reproduce.  N max    ln[  Where: N max  K ( ( N max , k )) k (x i , wk ,  k )] T  k 1    f   Pi  n i   f N max   ln[ 1  K T ( (1, k )) k (x n , wk ,  k )]  j  k 1  j 1      N  Crossover   ln[ N max  K ( ( N max , k )) k (x n , wk ,  k )]  max T  k 1  The crossover is a very important phase in the genetic algorithm. In this step, new individuals called children are 12 created by individuals selected from the population called Input: parents. Children are constructed as follows see Figure 4 : n, p, X, N iter , N max ; We fix three points of crossover, the parents are cut from these points, the first part of parent 1 and the second of parent [Tmin , Tmax ] the interval of the parameter T; 2 goes to child 1 and the rest goes to child 2. Output: In the crossover that we adopted, we choose 4 different Optimal probabilistic topological map crossover points: the first one corresponds to the matrix of weights, the second one is for vector U and the last one Initialization: corresponds to the vector of variances  . w1 (0),..., wNmax (0) Randomly initialized Mutation  1 (0),..., N (0) max Randomly initialized with the The rule of mutation is to keep the diversity of solutions in great values order to avoid local optimums. It corresponds to changing the U initialized via resolution of the model ( PU ) . values of one (or several) value (s) of the individuals who are (s) chosen randomly. T  Tmax t  0 Step 1: IV. PROPOSED MODEL TO OPTIMIZE THE PROBABILISTIC Construction the model of PRSOM ( PMin ) SELF-ORGANIZING ARCHITECTURE MAPS Step 2: This algorithm is probabilistic self-organizing based on - Solving the model of PRSOM via Genetic algorithm. solving the optimization problem (P ) Min that gives in - Outcome : the optimal number of neurons N used. output: weights initialization (vectors referents), covariance - Initial weights matrix Initial variances vector. matrix and the optimal neurons number. This is summarized in Step 3: the following scheme Fig 5: - Optimized model outputs, constructed in the initialization phase of OPRSOM. - Training phase of OPRSOM. Optimal probabilistic - Assignment-decision phase (Equation 3). Training Kohonen Model - Minimization phase (Equation 1 and Equation 2). set Return Optimal parameters of OPRSOM. V. PROPOSED MODEL TO OPTIMIZE THE PROBABILISTIC Optimal Solving via genetic SELF-ORGANIZING ARCHITECTURE MAPS Codebook algorithm A. Data set Description The experiments were performed using the Arabic digit corpus collected by the laboratory of automatic and signals, University of Badji-Mokhtar - Annaba, Algeria. A number of 88 individuals (44 males and 44 females), Arabic native Optimal neurons number speakers were asked to utter all digits ten times [27]. + initialization of weights Depending on this, the database consists of 8800 tokens (10 matrix and vectors digits x 10 repetitions x 88 speakers). In this experiment, the variances in the Probabilistic Kohonen data set is divided into two parts: a training set with 75% of topological map the samples and test set with 25% of the samples Table 1. Arabic Digits Arabic English Symbol ‫صفر‬ ZERO ‘0’ ‫واحد‬ ONE ‘1’ Figure 4: Training Model OPRSOM ‫اثنان‬ TWO ‘2’ ‫ثالثة‬ THREE ‘3’ To more understand the previous scheme, we explain it ‫أربعه‬ FOUR ‘4’ using the following iterative algorithm ‫خمسه‬ FIVE ‘5’ ‫ستة‬ SIX ‘6’ ‫سبعه‬ SEVEN ‘7’ 13 ‫ثمانية‬ EIGHT ‘8’ PSNR and the MSE calculated by classical approach ‫تسعه‬ NINE ‘9’ (PRSOM) a map of T=20 (Because for the other choices of map we find degenerated solutions) neurons and the MSE calculated by a new size of map (mean of N for each digit) for Table 1 shows the Arabic digits, the first column presents example N=5 for WAHID (1), N=3 for ARBAA (4) neurons the digits in Arabic language, the second column presents the which determined by the proposed approach. digits in English language and the last column shows the symbol of each digit. Table 3. MSE and PSNR obtained for Arabic digit by PRSOM and OPRSOM B. Experiments and discussion ARABIC PSNR PSNR MSE MSE In this section, we extensively study the performance of the DIGITS OPRSOM PRSOM OPRSOM PRSOM proposed approach of speech compression using OPRSOM algorithm, Arabic digits set is considered. 0 17.95 20.36 0.95 0.85 1 17.65 17.80 0.98 0.95 The evaluation of the proposed approach in speech data 2 14.91 15.36 1.64 1.50 compression was performed using the following measure, 3 16.60 17.18 1.25 1.10 4 15.55 16.22 1.45 1.25 Peak Signal-to-Noise Ratio (PSNR) is given by: 5 19.77 19.72 0.73 0.74 6 18.44 18.69 1.26 1.20 nX 2 PSNR = 10 log10 ( ) 7 20.00 21.00 1.00 0.79 MSE 8 16.09 15.68 1.20 1.31 9 18.80 18.80 1.09 1.09 Where n is the length of the reconstructed signal, X is the maximum absolute square value of the signal x, and Mean Figure 6 and Figure 7 show the MSE and the PSNR Squared Error (MSE) is defined as follows: comparison of digits Arabic between both approaches PRSOM n 1 and OPRSOM. We can see that the MSE and PSNR very close MSE = ∑(x̂(i) − x(i))2 between both approaches. But proposed method can reduce n the training time and the number of neurons, from the 20 to 5 i=1 neurons, rate of reduction is about 75%. Where x̂ is the original speech signal, and x is the reconstructed speech signal. To choice of optimal neural network (N), we tried five different sizes of topological maps (Nmax). In each map, we compute the optimal size by our model (P). Numerical results obtained on dataset of Arabic digits are presented in the Table2. We note that the optimal size is between 3 and 7 neurons whatever the initial size is. For example, for a map of 50 neurons on digits 1,2,3,7 the optimal size is 5 neurons. Table 2. Optimal results of topological map N max 20 30 50 Figure 5: Comparison between both approaches for the MSE SYFR 0 N 7 6 7 WAHID 1 N 5 5 5 ITNAN 2 N 5 5 5 THALATA 3 N 5 5 5 ARBAA 4 N 3 4 3 KHAMSA 5 N 7 7 6 SITA 6 N 6 6 6 SABAA 7 N 5 5 5 THAMANIA N 5 5 5 8 TISAA 9 N 7 6 7 Figure 6: Comparison between both approaches for the PSNR The compression numerical results using optimal size are presented in Table 3. This table list all Arabic digits, The 14 Recall that the proposed method contains an additional Science Issues (IJCSI), Volume 9, Issue 2, No 1, pp. 197-205, phase; this phase consists on solving the proposed model in 2012. order to remove the unnecessary neurons from the initial map. 10. Ettaouil, M. ; Lazaar, M. ; En-Naimani, Z. "A hybrid For example, for a map with 50 neurons we get a map of 5, the ANN/HMM models for arabic speech recognition using optimal proposed approach can thus remove about 90% neurons from codebook”, Intelligent Systems: Theories and Applications initial map to construct the optimal PRSOM. (SITA), 2013 8th International Conference on Mai 5-6. 11. M. Ettaouil, M. Lazaar, K. Elmoutaouakil, K. Haddouch, A New Algorithm for Optimization of the Kohonen Network VI. PROPOSED MODEL TO OPTIMIZE THE PROBABILISTIC Architectures Using the Continuous Hopfield Networks, WSEAS SELF-ORGANIZING ARCHITECTURE MAPS TRANSACTIONS on COMPUTERS,Issue 4, Volume 12, April 2013. In this paper, we have presented an approach to determine 12. R. Fletcher and S. Leyffer. "Solving Mixed Integer Programs by the optimal codebook and covariance matrix by the Optimal Outer Approximation", Math. Program. 66, 1994, 327–349. Probabilistic Self Organizing Maps (OPSOM). As a first step 13. Gascuel O. Canu S. Thiria .S and Lechevallier Y. Statistique et we construct a mathematical model, after we solve via genetic méthodes neuronales. 1997. algorithm, therefore we obtain the optimal number used in the 14. D.E Goldberg, “Genetic Algorithms in Search, Optimization, card and the best initialization parameters of the network. and Machine Learning”. Addison-Wesley, 1989. This approach has been compared to speech compression 15. O.K. Gupta and A. Ravindran. "Branch and Bound Experiments problem using a datasets of Arabic digit. The obtained results in Convex Nonlinear Integer Programming", Manage Sci., 31 demonstrate the performance of our proposed method. (12) , 1985, pp. 1533–1546. In the future works, we will use exact approaches or 16. Holland J. ”Adaptation in natural and artificial systems”. Ann others heuristics methods to resolve this problem and Arbor,MI: University of Michigan Press, 1992 determine the optimal solution for the optimization of neural 17. T. Kohonen, S. Kaski, K. Lagus, J. Salojr , J. Honkela, V. networks architectures. The proposed method can be applied Paatero, A. Saarela. "Self organization of a massive document to solve the pattern recognition problems, speech recognition collection". IEEE transaction on neural networks, 11, No. 3, problems and image compression problems. 2000. 18. T. Kohonen. "Self Organizing Maps". Springer, 3th edition, REFERENCES 2001. 19. T. Kohonen. "The Self Organizing Maps". Proceedingsof IEEE, 1. F. ANOUAR, F.BADRAN and S.THIRIA, Self Organized Map, A 78, No. 9, 1990, pp. 1464-1480. Probabilistic Approach proceedings of the Workshop on Self- 20. S.P Luttrel. A bayesian analysis of self- organizing maps. Neural Organized Maps, Helsinki University of Technology, Espoo, Computing 6: 767-794, 1994. Finland, June 4-6,1997. 21. S. Manuel, M. L. José, M. B. Victor, M.R. José, GENES, “a 2. F. Anouar. Modélisation probabiliste des auto-organisées : Genetic Algorithms and Fast Time Simulation’’, 3nd ATM Application en classification et en régression. Thèse de doctorat R&D Symposium, Spain, 2002. soutenue au conservatoires national des arts et métiers. 1996. 22. I. Quesada and I.E. Grossmann. "An LP/NLP Based Branch and 3. M. Ayache, M. Khalil and F. Tranquart. "Artificial Neural Bound Algorithm for Convex MINLP Optimization Problems", Network for Transfer Function Placental Development: DCT Computers Chem. Eng., 16 (10/11), 1992, pp. 937–947. and DWT Approach". IJCSI International Journal of Computer 23. N. Rogovschi. Classification à la base de modèles de mélanges Science Issues, Vol. 8, Issue 5, No 3, September 2011. topologiques des données catégorielles et continues. Thèse de 4. C.M. Bishop. Pattern Recognition and Machine Learning, doctorat soutenue à l’université Paris 13- Insitut Galilée. 2009. Springer,2006 24. E. Taillard, J. Dréo, A.Pétrowski, P. Siarry. Métaheuristiques 5. P. Bruneau, Marc Gelgon and F. Picarougne, Parameter-based pour l’optimisation difficile, Eyrolles,2003. reduction of Gaussian mixture models with a variational-Bayes 25. D. Wang. "Fast Constructive-Coverting Algorithm forneural approach, IEEE, 2008. networks and its implement in classification".Applied Soft 6. M. Duran, I. E. Grossmann. "An outer-approximation algorithm Computing, 8, 2008, pp. 166-173. for a class of mixed-integer non linear programs". 26. M. YACOUB and al, Clustering and classification based on Mathematical Programming, 1986, pp. 307-339. expert knowledge propagation using Probabilistic Self 7. Z. En-Naimani, M. Lazaar, M. Ettaouil ‘Hybrid System of Organizing Map: application to geophysics, Data Optimal Self Organizing Maps and Hidden Markov Model for AnalysisStudies in Classification, Data Analysis, and Knowledge Arabic Digits Recognition”. Journal of WSEAS Transactions on Organization 2000, pp 67-78 . Systems, Volume 13, pp. 606-616, 2014. 27. http://archive.ics.uci.edu/ml/datasets/Spoken+Arabic+Digit 8. Z. En-Naimani, M. Lazaar, M. Ettaouil, “Architecture optimization model for the probabilistic self-organizing maps and classification”, 9th International Conference on Intelligent Systems: Theories and Applications, Rabat (Morocco), May 07- 08, 2014. 9. M. Ettaouil, M. Lazaar, "Improved Self-Organizing Maps and Speech Compression", International Journal of Computer 15