Population Initialization Methods for the Swallow Swarm Algorithm in Solving the Problem of Fuzzy Classifier Parameter Optimization

Artem Slezkin and Ilya Hodashinsky
Faculty of Security, Tomsk State University of Control Systems and Radioelectronics, 40 Lenin Ave., 634050 Tomsk, Russia

Abstract
This paper considers and compares population initialization methods for the swallow swarm algorithm applied to the problem of fuzzy classifier parameter optimization. Population initialization is important in swarm and evolutionary optimization algorithms, where a lack of diversity in the population can lead to premature convergence and to getting trapped in a local optimum. Methods based on quasi-random sequences, chaotic maps, and distributions of random variables were considered. The hybrid initialization method based on the normal and uniform distributions showed the lowest classification error; the fastest convergence was shown by the method based on the beta distribution.

Keywords
Population initialization, optimization, swallow swarm algorithm, fuzzy rule-based classifier

SibDATA 2021: The 2nd Siberian Scientific Workshop on Data Analysis Technologies with Applications, June 25, 2021, Krasnoyarsk, Russia
EMAIL: hodashn@gmail.com (I. Hodashinsky)
ORCID: 0000-0002-8644-9698 (A. Slezkin); 0000-0002-9355-7638 (I. Hodashinsky)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Stochastic methods, called metaheuristics, make it possible to find satisfactory solutions to high-dimensional problems in reasonable time. Metaheuristics that produce a new solution from a single previous solution belong to the trajectory methods; population metaheuristics produce a solution based on previous experience and on information about the best solutions in the population. Among the many population metaheuristics, two classes are the most common: evolutionary and swarm algorithms [1].

The swallow swarm algorithm is a population swarm algorithm with features absent in other swarm algorithms, such as the division of the population into several subpopulations with their own local leaders, and the use of three types of particles, each performing certain functions in the population. Due to these features, the algorithm has a good convergence rate and copes well with escaping local optima [2, 3]. The algorithm has been used for node control in sensor networks [4], for optimizing digital filter parameters [5], and for selecting tests for fault diagnostics [6].

Population initialization is important in swarm and evolutionary optimization algorithms, where a lack of diversity in the population can lead to premature convergence and to getting trapped in a local optimum. The efficiency of optimization methods can be improved by developing methods for generating promising initial solutions. However, to date there are no systematic studies of the initialization process and of the impact of initial solution distributions on the efficiency of swarm and evolutionary algorithms in optimization problems [1, 7].

Many real-world problems can be reduced to the problem of classification. Fuzzy classifiers use fuzzy rules to describe the relationship between the values of the features of an object and the class to which the object belongs. The main advantages of fuzzy classifiers are their interpretability and their tolerance to inaccurate and missing data [8]. The simple and clear architecture of fuzzy classifiers, as well as a large number of training schemes, including gradient, swarm, and evolutionary methods, has allowed fuzzy classification to play one of the central roles in data mining [9, 10, 11].
The purpose of this study is to compare population initialization methods for the swallow swarm algorithm when solving the problem of optimizing fuzzy classifier parameters.

2. Literature review

The initialization procedure includes specifying the initial population size and actually creating the initial population. The population size affects the reliability and the computational cost of the algorithm: a small population can lead to fast convergence to a local optimum, while a large population increases the computational cost and may lead to slow convergence [12]. The population size can be related to the dimensionality of the search space; most often, however, it is chosen experimentally. Another approach is to introduce an adaptive method of forming the population size, which makes the algorithm less sensitive to the initial choice of the population size [1].

Several approaches to creating the initial population can be distinguished: stochastic methods, quasi-random sequences, opposition-based learning, the multistep method, and chaotic maps [1, 13]. The stochastic methods can be divided into two groups: pseudorandom number generators and chaotic number generators. Along with the uniform distribution, the Levy, beta, normal, and lognormal distributions, among others, are used to create the initial population [7]. The main advantage of methods based on pseudorandom number generation is the simplicity of their implementation. Chaotic number generators are based on the Gauss map, the logistic map, the sinusoidal map, the tent map, etc. [13, 14]. Chaotic initialization methods can improve the efficiency of metaheuristic optimization algorithms in terms of population diversity and convergence rate [15].
Quasi-random sequences are neither random nor pseudorandom: their generation algorithms use no random elements, and the sequences are fully deterministic. The advantage of methods based on quasi-random sequences is the homogeneity of the generated initial population, which increases the exploration capability of the algorithm at the early stages of its operation [13]. In [16], the gravitational search algorithm is improved by generating the initial population with the Sobol quasi-random number generator. To initialize the population of the particle swarm algorithm, quasi-random Van der Corput sequences were used in [17] and Faure sequences in [18]. In [15], to improve the convergence of the artificial bee colony algorithm, the initial population is created by combining chaotic systems with opposition-based learning. The idea of opposition-based initialization is to generate simultaneously both the current solutions and the solutions opposite to them, using opposite numbers for this purpose [19]. On several test problems with the differential evolution algorithm, the authors of [20] showed that the efficiency of metaheuristics can be improved by using opposition-based learning both for initialization and during the optimization process.

3. Fuzzy classifier

The fuzzy classifier rules have the following form [21]:

IF x1 is A1j AND x2 is A2j AND ... AND xn is Anj THEN class = cj,

where j = 1, ..., m, m is the number of rules, Aij is the term of the j-th rule for the i-th variable, and cj is the class label of the j-th rule. The fuzzy classifier itself can then be represented as

C = f(x, θ),   (1)

where x is the input vector and θ is the vector of parameters of the antecedent membership functions. The optimization task for the membership function parameters is then to find a parameter vector θ that maximizes the quality of classification of the input data.
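To make the rule form of Eq. (1) concrete, a minimal inference sketch is given below. Gaussian membership functions, the product as the AND operation, and a winner-takes-all class decision are illustrative assumptions; the text above fixes only the rule structure, not these choices.

```python
import numpy as np

def gauss_mf(x, c, sigma):
    # Gaussian membership function with center c and width sigma
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def classify(x, rules):
    # Each rule is (centers, sigmas, class_label); the firing strength is
    # the product (AND) of the per-feature membership degrees, and the
    # class of the strongest rule is returned
    best_class, best_strength = None, -1.0
    for centers, sigmas, label in rules:
        strength = float(np.prod(gauss_mf(np.asarray(x), centers, sigmas)))
        if strength > best_strength:
            best_class, best_strength = label, strength
    return best_class

# Two toy rules over a 2-feature space (parameters are illustrative only)
rules = [
    (np.array([0.0, 0.0]), np.array([1.0, 1.0]), 0),
    (np.array([3.0, 3.0]), np.array([1.0, 1.0]), 1),
]
print(classify([0.2, -0.1], rules))  # near the first rule's centers -> 0
```

In this view the vector θ of Eq. (1) concatenates the centers and widths of all membership functions, and the optimizer searches over that vector.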
However, population metaheuristic optimization algorithms require not a single initial solution but a set of them.

4. Creation of initial populations

This study considers the most common approaches to creating the initial population: quasi-random sequences, chaotic maps, and methods based on distributions of random variables. From these approaches, the methods with the best results in numerous experiments were chosen. The following notation is introduced: N is the size of the initialized population, and θ0 is the vector obtained after the formation of the initial base of fuzzy rules.

4.1. Methods based on quasi-random sequences

Halton's method. The Halton construction of quasi-random vectors [22] is based on the Van der Corput transformation, which maps a natural number n, written in the base-r number system as

n = Σ_{k=1}^{m} a_k r^{k−1},

to the radical inverse

φ_r(n) = Σ_{k=1}^{m} a_k r^{−k}.   (2)

Faure's method. In the Faure sequence [23], the n-th element can be expressed as

(φ_b^(0)(n), φ_b^(1)(n), ..., φ_b^(s−1)(n)),   (3)

where φ_b^(i) denotes the radical inverse applied after transforming the base-b digits of n by the i-th power of the Pascal matrix P modulo b; b is a prime number with b ≥ s, and the element (i, j) of P is the binomial coefficient C(j−1, i−1), i ≥ 1, j ≥ 1.

4.2. Methods based on chaotic maps

To initialize the population with a chaotic map, the initial vector θ0 is given as input at the first iteration, and at each of the following N−2 iterations the result of the previous iteration is given as input. The initial vector θ0 is added to all the generated vectors.

The method based on the chaotic Gauss iterated map generates the initial population according to the formula

θ_{n+1} = exp(−α θ_n²) + β,   (4)

where α and β are real coefficients and 0 ≤ n ≤ N−2.

The method based on the chaotic tent map.
This method initializes the population according to the formula

θ_{n+1} = f_μ(θ_n) = μ θ_n, if θ_n < 0.5; μ (1 − θ_n), if 0.5 ≤ θ_n,   (5)

where μ is a positive integer and 0 ≤ n ≤ N−2.

4.3. Methods based on distributions of random variables

To initialize the population, N−1 vectors of random variables are generated, each multiplied element-wise with the initial vector θ0; the initial vector θ0 itself is added to the population as the last vector. The normal, uniform, gamma, and beta distributions are used.

5. Experiment description

A computational experiment was performed on 34 datasets from the KEEL repository (http://keel.es/) using tenfold cross-validation. The fuzzy classifier was used as the classifier, and the swallow swarm algorithm was used to optimize the parameters of the membership functions [2, 24]. Ten initialization methods, listed in Table 1, were analyzed; their parameters are shown in Table 2. The following parameters of the swallow swarm algorithm were set: population size 40, number of local leaders 3, number of aimless particles 6, number of iterations 100. These values were chosen based on the experiments conducted earlier [24]. For each dataset, the experiment was repeated 30 times and the results were averaged. Two quantities were evaluated: the classification error on the test data and the number of iterations performed until convergence.
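The generation schemes of Sections 4.1–4.3 can be sketched as follows. The element-wise application of the chaotic map of Eq. (4), the additive shift by θ0, and the element-wise multiplication of Section 4.3 follow the text; the concrete coefficient values and the prime bases are illustrative choices within the ranges of Table 2, not the paper's exact configuration.

```python
import numpy as np

def van_der_corput(n, base=2):
    # Radical inverse phi_base(n) of Eq. (2)
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom
    return q

def halton_point(i, dim):
    # Halton point: one Van der Corput sequence per coordinate,
    # with a distinct prime base for each dimension
    primes = [2, 3, 5, 7, 11, 13, 17, 19][:dim]
    return np.array([van_der_corput(i, b) for b in primes])

def gauss_map_init(theta0, n, alpha=2.0, beta=-0.5):
    # Chaotic Gauss iterated map of Eq. (4), applied element-wise;
    # theta0 is added to every generated vector (Section 4.2)
    t0 = np.asarray(theta0, dtype=float)
    theta, pop = t0.copy(), []
    for _ in range(n - 1):
        theta = np.exp(-alpha * theta ** 2) + beta
        pop.append(t0 + theta)
    pop.append(t0)
    return np.array(pop)

def distribution_init(theta0, n, rng=None):
    # Section 4.3: n-1 random vectors multiplied element-wise with theta0,
    # with theta0 itself appended as the last member of the population
    rng = np.random.default_rng(1) if rng is None else rng
    t0 = np.asarray(theta0, dtype=float)
    pop = [t0 * rng.normal(1.0, 1.0, size=t0.size) for _ in range(n - 1)]
    pop.append(t0)
    return np.array(pop)
```

Replacing `rng.normal` with `rng.uniform`, `rng.gamma`, or `rng.beta` (or mixing them across the population) yields the remaining distribution-based and hybrid variants of Table 1.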
Table 1
Population initialization methods

Method                                                                      Abbreviation
Faure's method                                                              Faure
Halton's method                                                             Halton
Chaotic Gauss iterated map                                                  Gauss Map
Chaotic tent map                                                            Tent Map
Method based on the normal distribution                                     Normal
Method based on the uniform distribution                                    Uniform
Method based on the gamma distribution                                      Gamma
Method based on the beta distribution                                       Beta
Hybrid method based on the uniform and normal distributions                 UN
Hybrid method based on the uniform, normal, beta and gamma distributions    BGNU

Table 2
Parameters of the population initialization methods

Method abbreviation    Parameters
Gauss Map              α = rand(1; 3), β = rand(−1; 1)
Tent Map               μ = 2
Uniform                a = 0.5; b = 2
Normal                 μ = 1; σ = 1
Gamma                  k = 1; θ = 1
Beta                   α = 0.5; β = 0.5

The description of the datasets and the results of the initialization methods are presented in [25]. The Friedman two-way analysis of variance for related samples (significance level 0.05) was used for the statistical evaluation of the initialization methods. Tables 3 and 4 show the ranks of each population initialization method for the two evaluated quantities. The p-value for the classification error is 2.74E-07, and the p-value for convergence is 0.021.

Table 3
Ranks of the population initialization methods with regard to the classification error

Faure    Halton    Normal    Uniform    UN      BGNU    Beta    Gamma    Gauss Map    Tent Map
6.76     7.29      4.71      5.00       3.96    4.34    4.44    5.43     6.25         6.82

Table 4
Ranks of the population initialization methods with regard to convergence

Faure    Halton    Normal    Uniform    UN      BGNU    Beta    Gamma    Gauss Map    Tent Map
6.37     6.16      4.62      6.44       5.1     5.22    4.5     4.65     5.9          6.04

All the p-values are lower than the significance level of 0.05; thus, the results are statistically significant, and the null hypotheses of equal classification errors and of an equal number of iterations to convergence are rejected. In Figure 1, asterisks mark the elements that belong to the Pareto set.
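The statistical procedure above can be reproduced in outline with `scipy.stats.friedmanchisquare`. The per-dataset error values below are synthetic stand-ins (only the aggregated ranks are reported here; the actual measurements are in [25]), so the printed statistic does not match the paper's numbers.

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Synthetic per-dataset test errors for three hypothetical initialization
# methods over 34 datasets (illustrative numbers, not the paper's data)
rng = np.random.default_rng(42)
base_err = rng.normal(0.20, 0.03, 34)                    # reference method
un_err   = base_err - 0.02 + rng.normal(0.0, 0.005, 34)  # consistently better
beta_err = base_err - 0.01 + rng.normal(0.0, 0.005, 34)  # in between

# Friedman two-way ANOVA on ranks for related samples
stat, p = friedmanchisquare(base_err, un_err, beta_err)
print(f"Friedman statistic = {stat:.2f}, p-value = {p:.2e}")
# A p-value below 0.05 rejects the null hypothesis of equal performance
```

With real data, each argument would be the vector of 34 averaged per-dataset errors of one initialization method, giving the kind of p-values (2.74E-07 and 0.021) reported above.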
Regarding the classification error and convergence considered as separate criteria, the UN method has the minimum classification error, while the Beta method shows the best convergence.

Figure 1: Graphical representation of the results of the initialization methods

6. Conclusion

In this study, ten methods of initializing the swallow swarm algorithm population were compared on the problem of optimizing fuzzy classifier parameters over 34 datasets. The hybrid initialization method based on the normal and uniform distributions showed the lowest classification error; this method is recommended when minimizing the classification error is the priority. The fastest convergence was shown by the method based on the beta distribution; this method is recommended when the speed of the algorithm is the priority. In future studies, it is planned to test the considered population initialization methods on other classifiers and other metaheuristic algorithms.

7. Acknowledgements

This work was supported by the Ministry of Science and Higher Education of the Russian Federation (Project No. FEWM-2020-0042).

8. References

[1] I. A. Hodashinsky, Methods for Improving the Efficiency of Swarm Optimization Algorithms. A Survey, Automation and Remote Control 82(6) (2021) 935–967.
[2] M. Neshat, G. Sepidnam, M. Sargolzaei, Swallow swarm optimization algorithm: a new method to optimization, Neural Computing and Applications 23(2) (2013) 429–454.
[3] M. Neshat, G. Sepidnam, A new hybrid optimization method inspired from swarm intelligence: Fuzzy adaptive swallow swarm optimization algorithm (FASSO), Egyptian Informatics Journal 16(3) (2015) 339–350.
[4] A. Nithya, A. Kavitha, Node Weight Swallow Swarm Optimization Convex Node Segmentation (Nws2cns) Algorithm for Distributed 3-D Localization in Wireless Sensor Networks (Wsns), International Journal of Scientific & Technology Research 9(2) (2020) 5130–5137.
[5] S. K. Sarangi, R. Panda, A. Abraham, Design of optimal low-pass filter by a new Levy swallow swarm algorithm, Soft Computing 24 (2020) 18113–18128.
[6] Z. Yao, L. Zhu, T. Zhang, J. Wang, Optimal Selection of Tests for Fault Diagnosis in Multi-Path System with Time-delay, Journal of Electronic Testing 36 (2020) 75–86.
[7] Q. Li, S.-Y. Liu, X.-S. Yang, Influence of initialization on the performance of metaheuristic optimizers, Applied Soft Computing 91 (2020) 106193.
[8] P. Carmona, J. L. Castro, Fuzzy Feature Rank. Bringing order into fuzzy classifiers through fuzzy expressions, Fuzzy Sets and Systems 401 (2020) 78–90.
[9] X. Hu, W. Pedrycz, X. Wang, Fuzzy classifiers with information granules in feature space and logic-based computing, Pattern Recognition 80 (2018) 156–167.
[10] I. A. Hodashinsky, I. V. Gorbunov, Algorithms of the tradeoff between accuracy and complexity in the design of fuzzy approximators, Optoelectronics, Instrumentation and Data Processing 49 (2013) 569–577.
[11] A. Lavygina, I. Hodashinsky, Hybrid algorithm for fuzzy model parameter estimation based on genetic algorithm and derivative based methods, in: ECTA 2011 FCTA 2011, Proceedings of the International Conference on Evolutionary Computation Theory and Applications and International Conference on Fuzzy Computation Theory and Applications, Portugal, 2011, pp. 513–515.
[12] D. B. Chen, C. X. Zhao, Particle Swarm Optimization with Adaptive Population Size and Its Application, Applied Soft Computing 9(1) (2009) 39–48.
[13] B. Kazimipour, X. Li, A. K. Qin, A review of Population Initialization Techniques for Evolutionary Algorithms, in: Proceedings of IEEE Congress on Evolutionary Computation, 2014, pp. 2585–2592.
[14] B. Alatas, Chaotic bee colony algorithms for global numerical optimization, Expert Systems with Applications 37 (2010) 5682–5687.
[15] W.-f. Gao, S.-y. Liu, A modified artificial bee colony algorithm, Computers & Operations Research 39(3) (2012) 687–697.
[16] O. T. Altinoz, A. E. Yilmaz, G. Weber, Improvement of the Gravitational Search Algorithm by Means of Low-Discrepancy Sobol Quasi Random-Number Sequence Based Initialization, Advances in Electrical and Computer Engineering 14(3) (2014) 55–62.
[17] M. Pant, R. Thangaraj, A. Abraham, Low Discrepancy Initialized Particle Swarm Optimization for Solving Constrained Optimization Problems, Fundamenta Informaticae 95 (2009) 511–531.
[18] R. Brits, A. P. Engelbrecht, F. Van den Bergh, A Niching Particle Swarm Optimizer, in: Proc. 4th Asia-Pacific Conference on Simulated Evolution and Learning, 2002, pp. 692–696.
[19] S. Mahdavi, S. Rahnamayan, K. Deb, Opposition Based Learning: A Literature Review, Swarm and Evolutionary Computation 39 (2018) 1–23.
[20] S. Rahnamayan, H. R. Tizhoosh, M. M. A. Salama, Opposition-Based Differential Evolution, IEEE Transactions on Evolutionary Computation 12(1) (2008) 64–79.
[21] M. Bardamova, A. Konev, I. Hodashinsky, A. Shelupanov, A fuzzy classifier with feature selection based on the gravitational search algorithm, Symmetry 10(11) (2018) 609.
[22] S. M. Ermakov, On the Halton quasi-random sequences randomization, Vestnik of St Petersburg University. Mathematics. Mechanics. Astronomy 4(62) (2017) 570–576.
[23] H. Faure, Good permutations for extreme discrepancy, Journal of Number Theory 42 (1992) 47–56.
[24] I. Hodashinsky, K. Sarin, A. Shelupanov, A. Slezkin, Feature selection based on swallow swarm optimization for fuzzy classification, Symmetry 11(11) (2019) 1423.
[25] A. Slezkin, Population Initialization Methods for Swallow Swarm Algorithm when Solving Fuzzy Classifier Parameters Optimization Problem, Mendeley Data 2 (2021). doi: 10.17632/zwntkycn83.2.