<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Inductive Modeling of Amylolytic Microorganisms Quantity in Copper Polluted Soils</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olha Moroz</string-name>
          <email>lhahryhmoroz@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymyr Stepashko</string-name>
          <email>stepashko@irtc.org.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department for Information Technologies of Inductive Modelling, International Research and Training Centre for Information Technologies and Systems of the NASU and MESU</institution>
          ,
          <addr-line>Glushkov Avenue 40, Kyiv, 03680</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>1</fpage>
      <lpage>3</lpage>
      <abstract>
        <p>The article presents the results of applying the combinatorial-genetic algorithm COMBI-GA to model, from experimental data, the quantity of amylolytic microorganisms in a soil plot contaminated by copper. The constructed nonlinear mathematical models describe the dependence of the microorganism concentration in soil on basic vital environmental factors and on the copper concentration in the soil.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        One of the main components of most ecosystems is soil,
in which microorganisms play an important role in the
evolution and formation of fertility. Anthropogenic pollution
of the biosphere affects all living components of
biogeocenoses, including soil microorganisms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Modern agroecosystems are subject to considerable
technogenic influence, which frequently results in pollution of
arable soils. Pollutants negatively affect the soil
microbiota, which makes long-term observation of its current
state necessary. In parallel with monitoring, there is the task
of determining critical deviations and predicting the state of
the microbiota depending on the pollutant concentration in
soil. This task can be solved only by formalizing the monitoring
data in the form of mathematical models [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Investigating the effect of specific anthropogenic
factors, such as heavy metals, on the functioning of the microbial
community is very important. The negative influence of heavy
metals on the microbial cenosis, the soil and its biological
activity is well known. In this connection, organizing and carrying
out regular control of the soil state in order to detect and
forecast critical situations is a topical task.</p>
      <p>For this reason, the dynamics of the microorganism quantity
in soil is studied under the influence of different ecological
factors. Modeling from observation data is a necessary
condition of ecological monitoring, as it allows prompt
estimation of current ecological situations and forecasting of
their development.</p>
      <p>In this research we build models of the dependence of
microorganism functioning on the accompanying weather
conditions, that is, models whose inputs are
hydrometeorological variables and whose outputs are ecological
indexes. Such models can be used both for restoring
missing data and for forecasting the development of the
controlled ecological processes on the basis of weather
predictions. To construct such models we use the Group
Method of Data Handling (GMDH) as one of the most effective
methods for the analysis, modeling and forecasting of complex
processes from experimental data under conditions of
incomplete a priori information and short data
samples.</p>
      <p>
        To analyse the soil microbiota of dark-gray podzolic soil
(Kyiv region) polluted with heavy metals, the automated
simulation system ASTRID [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] with different GMDH algorithms
was initially used.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the hybrid combinatorial-genetic algorithm COMBI-GA [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] was applied to find optimal
linear models from observations of the change in
the number of amylolytic microorganisms in soil
contaminated by copper. Prediction tasks, however, require
a more accurate approximation. Therefore, this paper presents
new results of applying the COMBI-GA algorithm
to build optimal nonlinear models.
      </p>
      <p>Section II of this paper briefly describes the modeling
task. Section III considers the hybrid combinatorial-genetic
GMDH algorithm COMBI-GA and its features. Section IV
presents the modeling results.</p>
    </sec>
    <sec id="sec-2">
      <title>II. TASK OF MODELING</title>
      <p>
        Experiments on the functioning of amylolytic
microorganisms under copper contamination were carried out
on small plots of dark-gray podzolic soil (Kyiv region). The
traditional scheme [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] of experiments was used: several
plots of the same soil type were selected for the experiments, and
one plot was kept as a non-contaminated control.
      </p>
      <p>Model contamination of the soil was carried out by
applying solutions of Cu<sup>2+</sup> salts to the soil once a year, at the
beginning of the vegetation season. The amount of applied
metal (computed as the content of its ions) corresponds to
a contamination dose of 2 maximum permissible
concentrations (MPC). Soil samples for the analyses were
taken during the vegetation periods from 1993 to 1996 from the
arable layer (depth 0-20 cm) on approximately the 2nd, 30th and
90th day after application of the metal salt. This gave
three measuring points per year over four years, or 12
points in total.</p>
      <p>The amount of amylolytic microorganisms in the control
and polluted soils was determined by plating
a soil suspension on a nutrient medium consisting of
starch-ammonia agar.</p>
      <p>
        Based on observation data for the variables listed below,
linear mathematical models were built in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to describe
the changes in the quantity of amylolytic bacteria in the control
soil and in the soil polluted by the heavy metal.
      </p>
      <p>The following input (independent) variables (factors) were used
to construct the models: concentration of mobile forms of Cu<sup>2+</sup>,
decade (ten-day) average values of temperature and of soil and
air humidity, and the number of microorganisms in the soil of the
unpolluted control plot. The output (dependent) variable was the
quantity of amylolytic microorganisms in the plots with model
pollution of the soil by copper salt. Based on the obtained data,
models of the number of microorganisms in soil were built.</p>
      <p>To model the changing quantity of
amylolytic microorganisms, the following list of input variables was
formed: x1 – quantity of microorganisms in the control plot
(millions per 1 g of dry soil); x2 – concentration of copper
(mg/kg soil); x3 – number of days since the date of pollution;
x4 – current decade average air temperature (°C); x5 –
previous decade average air temperature (°C); x6 – current
decade average air humidity (%); x7 – previous decade
average air humidity (%).</p>
      <p>The inductive modeling problem in this
task can be stated as follows. Given is a data set of n
observations of 7 input variables x1, x2, …, x7 and one output
variable y. The GMDH task is to find a model y = f(x1, x2, …,
x7, θ) with the minimum value of a given model quality criterion
C(f), where θ is an unknown vector of model parameters. The
optimal model is defined as f<sup>*</sup> = arg min<sub>f∈Φ</sub> C(f), where Φ is a set
of models of various complexity.</p>
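      <p>A minimal sketch of this selection rule, assuming for illustration only that Φ is the set of all non-empty subsets of the seven inputs and that the quality criterion is supplied as an ordinary function. The names enumerate_structures, select_best and toy_criterion are hypothetical and are not taken from the cited algorithms:</p>
      <preformat><![CDATA[
```python
from itertools import combinations


def enumerate_structures(n_inputs):
    """Yield every non-empty subset of input indices as a candidate structure."""
    for size in range(1, n_inputs + 1):
        yield from combinations(range(n_inputs), size)


def select_best(structures, criterion):
    """Return the structure with the minimum criterion value (the arg min)."""
    return min(structures, key=criterion)


# Toy criterion, assumed purely for illustration: distance to a "true" subset.
TRUE_SUBSET = {0, 4, 6}


def toy_criterion(structure):
    return len(set(structure) ^ TRUE_SUBSET)


n_structures = sum(1 for _ in enumerate_structures(7))
best = select_best(enumerate_structures(7), toy_criterion)
print(n_structures, best)
```
]]></preformat>
      <p>Even for linear structures over seven inputs the exhaustive set already contains 127 candidates, and the count grows quickly once polynomial terms are added to the base set.</p>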
    </sec>
    <sec id="sec-3">
      <title>III. HYBRID GMDH-GA ALGORITHM</title>
      <p>
        The genetic algorithm [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is a meta-heuristic
procedure of global optimization that generalizes and simulates
in artificial systems such properties of living nature as
natural selection, adaptability to changing environmental
conditions, and inheritance of vital properties by offspring
from their parents.
      </p>
      <p>Since GA is based on the principles of biological
evolution and genetics, biological terms are actively
(and sometimes incorrectly) used to describe it. Here are some
of these terms. An individual is a potential solution to the
problem; a population is a set of individuals; an offspring is
usually an improved copy of a potential solution (its parent); fitness is
usually a quality characteristic of a solution. A chromosome
is the encoded data structure of an individual in the form of an
array of fixed length; in the simplest case it is a binary string
of fixed length. A gene is an element of this array.</p>
      <p>Formally, GA can be represented in such a way:</p>
      <p>GA = {P<sub>0</sub>, M, L, F, G, s},
where P<sub>0</sub> = (a<sub>1</sub><sup>0</sup>, ..., a<sub>M</sub><sup>0</sup>) is the initial population; a<sub>i</sub><sup>0</sup> is an
individual of this population, treated as a candidate
solution of the optimization problem and represented as
a chromosome; M is the population size (an integer); L
is the length of each chromosome of the population (an
integer); F is the fitness function of an individual; G is the set of
genetic operators; s is the stopping rule of the algorithm.</p>
      <p>The input of any GA is the initial population P<sub>0</sub>, a finite set
of chromosomes, each of which represents a potential
solution of the problem. The first population of offspring
P<sub>1</sub> is then formed from the parent chromosomes of P<sub>0</sub> using
genetic operators; similarly, the next population P<sub>2</sub> is formed
from the population P<sub>1</sub>, and so on. The process continues until
the specified stopping rule of the algorithm is satisfied.</p>
      <p>An important feature of GA is that with each
step the mean fitness function (FF) value of the current population improves
and tends toward the solution of the optimization problem.</p>
      <p>The effectiveness of GA depends on the method of
encoding genes, the composition of the initial population,
the genetic operators used, and the GA parameters, such as the population
size, the number of chromosomes chosen during selection and
for crossover, and the probabilities of applying the genetic operators. The
most important genetic operators are
selection, which keeps a certain number of chromosomes
with the best fitness values at each GA iteration, and the
operators that create new offspring chromosomes,
namely crossover and mutation. The crossover operator creates
offspring by exchanging genetic material between the parent
chromosomes, and the mutation operator creates them by changing a single
chromosome in accordance with certain rules.</p>
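      <p>The three operators can be sketched for binary chromosomes as follows. This is only an illustration under stated assumptions: truncation selection with a minimized fitness value, one-point crossover and independent bit-flip mutation are common textbook choices, not necessarily the variants used in COMBI-GA, and all function names are hypothetical:</p>
      <preformat><![CDATA[
```python
import random


def truncation_selection(population, fitness, keep):
    """Keep the `keep` chromosomes with the smallest fitness (criterion) values."""
    return sorted(population, key=fitness)[:keep]


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    cut = rng.randint(1, len(parent_a) - 1)
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each binary gene independently with probability `rate`."""
    return [1 - gene if rate > rng.random() else gene for gene in chromosome]


rng = random.Random(0)  # fixed seed so that the run is reproducible
child_1, child_2 = one_point_crossover([0] * 7, [1] * 7, rng)
mutant = bit_flip_mutation(child_1, 0.1, rng)
print(child_1, child_2, mutant)
```
]]></preformat>
      <p>Each offspring of the crossover takes every gene from exactly one parent, so the two children together always contain all genes of both parents.</p>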
      <p>
        Formally, the hybrid of the COMBI [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and GA algorithms
can be defined as follows:
      </p>
      <p>COMBI-GA = 〈Z, y, f, X, D, CR, P<sub>0</sub>, H, M, G, k, F〉,
where Z[n×r] is the measurement matrix of the input variables of
an object, r and n being the numbers of inputs and measurements
respectively; y[n×1] is the vector of measurements of the output
variable; f[m×1] is the vector of m given base functions of the
input variables; X[n×m] is the measurement matrix of the base
set of arguments; D is a given rule for dividing the matrix X[n×m]
and the vector y[n×1] into training (A) and checking (B) parts; CR is an
external selection criterion (used as the fitness function) based on
dividing the sample (X, y); P<sub>0</sub> is the set of model structures of
the initial GA population, consisting of binary chromosomes
(encoded structures of partial models); H is the size of the initial
population of models, H&lt;m; M is the size of any subsequent population,
M&gt;H; G is the set of genetic operators; k is the stopping rule of GA;
F is the number of best partial models (freedom of choice)
tracked during all iterations of the algorithm, 1&lt;F≤H.</p>
      <p>This algorithm consists of the following steps:</p>
      <p>Step 1. Calculating the matrix of the base set of arguments
X[n×m] from the input matrix Z and the vector of base
functions f, and dividing it and the output vector of
measurements y[n×1] according to the rule D into training
X<sub>A</sub>[n<sub>A</sub>×m] and checking X<sub>B</sub>[n<sub>B</sub>×m] submatrices (n<sub>A</sub> + n<sub>B</sub> = n).
Obviously, in the case of a linear polynomial, the matrices X and Z
are identical (m = r).</p>
      <p>Step 2. Randomly generating the initial population P<sub>0</sub> of the
genetic algorithm.</p>
      <p>Step 3. Calculating the coefficients of each partial model
by the least squares method (LSM) or another method using the training matrix of base
arguments X<sub>A</sub> and the output vector y<sub>A</sub>.</p>
      <p>Step 4. Calculating the value of the external criterion CR
(as the GA fitness function) for each partial model using the
checking matrix X<sub>B</sub> and the output vector y<sub>B</sub>.</p>
      <p>Step 5. Forming the current population of partial models
(chromosomes) of size H with the best criterion values to
produce the next offspring. In addition, selecting the best F
partial models, which are potential solutions of the model
building task.</p>
      <p>Step 6. Forming a new population of M individuals by applying
the genetic operators of crossover and mutation to the individuals of
the current population.</p>
      <p>Step 7. Checking the given GA stopping rule. If it is
satisfied, go to Step 8; otherwise go to Step 3.</p>
      <p>Step 8. Choosing the F best models from the current
population of size H.</p>
      <p>Step 9. The end.</p>
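      <p>The steps above can be sketched in plain Python as follows. This is a simplified illustration, not the authors' implementation: the base matrix X is assumed to be already computed and divided (Step 1), least squares is solved through the normal equations, the external criterion is taken to be the squared error on the checking part, and one-point crossover, a 0.1 bit-flip mutation rate, retention of the best models between iterations and a fixed iteration budget as the stopping rule are all assumptions. All function names and the toy data are hypothetical:</p>
      <preformat><![CDATA[
```python
import random


def solve_linear_system(a, b):
    """Solve a*x = b by Gauss-Jordan elimination; return None if singular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if 1e-12 > abs(m[pivot][col]):
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(x_rows, y, terms):
    """Step 3: estimate coefficients of the selected terms on the training part."""
    cols = [[row[j] for j in terms] for row in x_rows]
    k = len(terms)
    ata = [[sum(r[i] * r[j] for r in cols) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(cols, y)) for i in range(k)]
    return solve_linear_system(ata, aty)


def regularity_criterion(chromosome, xa, ya, xb, yb):
    """Step 4: squared error on checking part B of a model fitted on part A."""
    terms = [j for j, gene in enumerate(chromosome) if gene]
    theta = least_squares(xa, ya, terms) if terms else None
    if theta is None:
        return float("inf")  # empty or unfittable structure
    return sum((yi - sum(t * row[j] for t, j in zip(theta, terms))) ** 2
               for row, yi in zip(xb, yb))


def combi_ga(xa, ya, xb, yb, h=4, m_size=8, f_best=3, iterations=20, seed=0):
    """Search binary model structures; return up to f_best best chromosomes."""
    rng = random.Random(seed)
    n_terms = len(xa[0])  # must be at least 2 for one-point crossover

    def rank(candidates):
        """Sort by criterion value and drop duplicate chromosomes."""
        unique = []
        for c in sorted(candidates,
                        key=lambda c: regularity_criterion(c, xa, ya, xb, yb)):
            if c not in unique:
                unique.append(c)
        return unique

    population = [[rng.randint(0, 1) for _ in range(n_terms)]
                  for _ in range(h)]                       # Step 2
    best = []
    for _ in range(iterations):                            # Step 7 (fixed budget)
        ranked = rank(population + best)                   # Steps 3-4
        parents, best = ranked[:h], ranked[:f_best]        # Step 5
        population = []
        while m_size > len(population):                    # Step 6
            pa, pb = rng.choice(parents), rng.choice(parents)
            cut = rng.randint(1, n_terms - 1)
            child = pa[:cut] + pb[cut:]
            population.append([1 - g if 0.1 > rng.random() else g for g in child])
    return rank(population + best)[:f_best]                # Step 8


# Toy data, assumed for illustration: y = 2*x0 - x3 over five candidate terms.
rows = [[1, 0, 0, 1, 2], [0, 1, 0, 2, 1], [0, 0, 1, 1, 1],
        [1, 1, 0, 0, 3], [2, 0, 1, 1, 0], [1, 2, 1, 0, 1],
        [3, 1, 2, 1, 1], [0, 2, 1, 3, 2], [1, 1, 1, 1, 1]]
target = [2 * r[0] - r[3] for r in rows]
best_models = combi_ga(rows[:6], target[:6], rows[6:], target[6:])
print(best_models)
```
]]></preformat>
      <p>In this sketch a chromosome is a list of 0/1 genes, one per column of X, so decoding a chromosome amounts to selecting the columns whose gene equals 1.</p>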
    </sec>
    <sec id="sec-4">
      <title>IV. MODELING RESULTS</title>
      <p>Based on the experimental data, models of the quantity of
amylolytic microorganisms were built for the control soil as well
as for the copper-polluted soil. In all cases we use the
COMBI-GA algorithm with the following division of the whole data
sample (12 observation points over 4 years, the
vegetation periods of 1993-1996): 6 points (2 years) as training
set A, 3 points (1 year) as checking set B, and 3 points (1
year) as validation set C.</p>
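      <p>The division can be sketched as below. The sketch assumes that the 12 points are ordered chronologically and that the parts are taken in the order A, B, C; the text gives only the sizes of the parts, so this ordering, like the function name split_sample, is an assumption:</p>
      <preformat><![CDATA[
```python
def split_sample(points, n_train=6, n_check=3, n_valid=3):
    """Split an ordered sample into training (A), checking (B), validation (C)."""
    if len(points) != n_train + n_check + n_valid:
        raise ValueError("sample size does not match the requested division")
    part_a = points[:n_train]
    part_b = points[n_train:n_train + n_check]
    part_c = points[n_train + n_check:]
    return part_a, part_b, part_c


# Indices 0..11 stand in for the 12 observation points (3 per year, 4 years).
part_a, part_b, part_c = split_sample(list(range(12)))
print(part_a, part_b, part_c)
```
]]></preformat>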
      <p>
        Results in the class of linear models. In the case of linear
modeling, the quantity of amylolytic microorganisms in the control soil, Ycontr =
x1, is described by the model obtained in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
      </p>
      <p>Ycontr = 0.2136x4 – 0.7149x5 + 0.5412x7.</p>
      <p>Models for the quantity of microorganisms in polluted
soil were built taking into account the quantity of amylolytic
microorganisms observed in the control soil, x1.</p>
      <p>
        The linear model for quantity of amylolytic
microorganisms in the copper polluted soil [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
      </p>
      <p>Y = 0.743x1 + 1.6516x2 – 0.9182x5 – 0.2845x7</p>
      <p>The corresponding graphs of experimental and model data are given
in Fig. 2. The accuracy characteristics of these models are
presented in the table below. This accuracy level is
insufficient for the needs of quality monitoring. That is why we
decided to build more complex nonlinear (polynomial)
models. The results of this stage of modeling are presented
below.</p>
      <p>Results in the class of nonlinear models. In the case of
nonlinear modeling, the quantity of amylolytic microorganisms in the control
soil is described by the model:</p>
      <p>x1 = – 0.19297x5 – 2.0976x7 + 0.130x4x5 + 0.111x5x7</p>
      <p>Fig. 3 shows graphs of the measured and modeled data.</p>
      <p>The model of the dynamics of the quantity of amylolytic
microorganisms in soil polluted by copper is:</p>
      <p>Y = 1.1759x5 + 0.2159x7 + 0.1638x1x5 – 0.0261x2x7 –
0.1252x4x5</p>
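      <p>For readers who want to evaluate the two models of the polluted soil, they can be written as ordinary functions. The coefficients are those quoted in the text; the function names and the input values in the example call are hypothetical and are not measurements from the experiments:</p>
      <preformat><![CDATA[
```python
def linear_model(x1, x2, x5, x7):
    """Linear model for the copper-polluted soil, coefficients as quoted."""
    return 0.743 * x1 + 1.6516 * x2 - 0.9182 * x5 - 0.2845 * x7


def nonlinear_model(x1, x2, x4, x5, x7):
    """Nonlinear (polynomial) model for the copper-polluted soil, as quoted."""
    return (1.1759 * x5 + 0.2159 * x7 + 0.1638 * x1 * x5
            - 0.0261 * x2 * x7 - 0.1252 * x4 * x5)


# Hypothetical input values, chosen only to show the call.
sample = dict(x1=1.0, x2=2.0, x4=15.0, x5=14.0, x7=70.0)
y_nonlinear = nonlinear_model(**sample)
y_linear = linear_model(sample["x1"], sample["x2"], sample["x5"], sample["x7"])
print(y_linear, y_nonlinear)
```
]]></preformat>
      <p>The nonlinear model uses x1, x2 and x4 only through their products with x5 and x7, which is why they do not appear as separate linear terms.</p>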
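      <p>The quality characteristics of all obtained models are
presented in the table; they are calculated according to the following formulas:
<disp-formula><tex-math>\mathrm{MSE} = \frac{1}{12}\sum_{i=1}^{12} (x_i - \hat{x}_i)^2, \qquad AR_B = \| y_B - X_B \hat{\theta}_A \|^2 .</tex-math></disp-formula></p>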
      <p>The designation “Valid. err.” means the error on the
independent validation set C, calculated in the same way as AR<sub>B</sub>.</p>
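      <p>These characteristics can be computed as sketched below, assuming that MSE is the mean squared deviation between measured and model values over all points and that AR on part B (and likewise the validation error on part C) is the squared norm of the residual of a model whose coefficients were estimated on part A. The function names and the toy numbers are hypothetical:</p>
      <preformat><![CDATA[
```python
def mse(measured, predicted):
    """Mean squared error between measured and model values over all points."""
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)


def squared_error_norm(y, x_rows, theta):
    """Squared Euclidean norm of y - X*theta (AR on B, or the error on C)."""
    return sum((yi - sum(t * xij for t, xij in zip(theta, row))) ** 2
               for row, yi in zip(x_rows, y))


# Toy numbers, assumed only for illustration (not data from the experiments).
theta_a = [2.0, -1.0]              # coefficients estimated on training part A
x_b = [[1.0, 1.0], [3.0, 2.0]]     # checking part B
y_b = [1.5, 4.0]
ar_b = squared_error_norm(y_b, x_b, theta_a)
print(ar_b)
```
]]></preformat>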
      <sec id="sec-4-1">
        <title>Linear case</title>
        <p>[Graph legend: Measured, Predicted]</p>
      </sec>
      <sec id="sec-4-2">
        <title>Nonlinear case</title>
        <p>[Graph legend: Measured, Predicted]</p>
        <p>[Table columns: MSE, AR, Valid. err.]</p>
        <p>The corresponding graphs of experimental and model data are given
in Fig. 4. These graphs show that at most points the
measured data and the data predicted by the model coincide, that is, the
models adequately represent the change of the microorganism
quantity. The last three (validation) points on the graphs
testify to good results of model verification in the forecasting
mode. Some discrepancies can be accounted for by the spatial
heterogeneity of the soil and by other conditions that could cause
irregular variability of the quantity.</p>
        <p>Fig. 3. Graphs of change of the quantity of amylolytic microorganisms
on the control plot (nonlinear model)</p>
        <p>As is evident from the results, the functioning of
amylolytic bacteria in soil is substantially influenced by the
temperature and humidity of the air.</p>
        <p>In the nonlinear case we obtain much more accurate results,
which can further help to effectively solve different ecological
tasks based on microbial monitoring.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>V. CONCLUSION</title>
      <p>The research carried out demonstrates the possibility of
formalizing the given ecological observations by
constructing mathematical models. For modeling the
quantity of microorganisms in soil, the inductive
approach based on GMDH is effective.</p>
      <p>The obtained nonlinear models agree closely
with the experimental data, which makes it possible to use them in the system
of experiments for estimating the degree of soil
contamination, restoring intermediate or missing data, and
promptly forecasting the dynamics of microorganisms under
various ecological conditions. These models will also be
helpful for data restoration with the purpose of
obtaining uniform series of ecological observations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.I.</given-names>
            <surname>Andreyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.O.</given-names>
            <surname>Iutynska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.F.</given-names>
            <surname>Antypchuk</surname>
          </string-name>
          et al,
          <article-title>"The functioning of soil microbial communities under conditions of anthropogenic load,"</article-title>
          K.:
          <publisher-name>Oberehy</publisher-name>
          ,
          <year>2001</year>
          , 240 p.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H. G.</given-names>
            <surname>Schlegel</surname>
          </string-name>
          ,
          <source>"General microbiology,"</source>
          7th edition, Cambridge University Press,
          <year>1993</year>
          , 655 p.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.S.</given-names>
            <surname>Stepashko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yu.V.</given-names>
            <surname>Koppa</surname>
          </string-name>
          ,
          <article-title>"Experience of the ASTRID system application for the modeling of economic processes from statistical data,"</article-title>
          <source>Cybernetics and computing technique</source>
          , vol.
          <volume>117</volume>
          , pp.
          <fpage>24</fpage>
          -
          <lpage>31</lpage>
          ,
          <year>1998</year>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Iutynska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Moroz</surname>
          </string-name>
          ,
          <article-title>"Inductive modeling of changes of amilolitic microorganisms on polished surface tuber,"</article-title>
          <source>Inductive modeling of complex systems</source>
          , IRTC ITS NASU, Kyiv
          ,
          <year>2017</year>
          , vol.
          <volume>9</volume>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Moroz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stepashko</surname>
          </string-name>
          ,
          <article-title>"Hybrid Sorting-Out Algorithm COMBI-GA with Evolutionary Growth of Model Complexity,"</article-title>
          <source>Advances in Intelligent Systems and Computing II</source>
          , N. Shakhovska, V. Stepashko, Editors, AISC book series, Berlin: Springer Verlag, vol.
          <volume>689</volume>
          , pp.
          <fpage>346</fpage>
          -
          <lpage>360</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>O.H.</given-names>
            <surname>Moroz</surname>
          </string-name>
          ,
          <article-title>"Sorting-Out GMDH algorithm with genetic search of optimal mode,"</article-title>
          <source>Control Systems and Machines</source>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>79</lpage>
          ,
          <year>2016</year>
          . (In Russian)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.I.</given-names>
            <surname>Andreyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.A.</given-names>
            <surname>Iutynska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.V.</given-names>
            <surname>Petrusha</surname>
          </string-name>
          ,
          <article-title>"Homeostasis of microbial of soils polluted by heavy metals,"</article-title>
          <source>Mіkrobіol. Journ</source>
          , vol.
          <volume>61</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>21</lpage>
          ,
          <year>1991</year>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Ivakhnenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.S.</given-names>
            <surname>Stepashko</surname>
          </string-name>
          ,
          <article-title>"Noise-Immunity of Modeling,"</article-title>
          <source>Kiev: Naukova Dumka</source>
          ,
          <year>1985</year>
          , 216 p. (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Holland</surname>
          </string-name>
          ,
          <source>"Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence,"</source>
          University of Michigan, Computers,
          <year>1975</year>
          , 183 p.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.S.</given-names>
            <surname>Stepashko</surname>
          </string-name>
          ,
          <article-title>"Combinatorial Algorithm of the Group Method of Data Handling with Optimal Model Scanning Scheme,"</article-title>
          <source>Soviet Automatic Control</source>
          , vol.
          <volume>14</volume>
          , no. 3, pp.
          <fpage>24</fpage>
          -
          <lpage>28</lpage>
          ,
          <year>1981</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>