<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fractional Genetic Programming with Probability Density Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Artur Rataj</string-name>
          <email>arataj@iitis.gliwice.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Theoretical and Applied Computer Science</institution>
          ,
          <addr-line>Baltycka 5, Gliwice</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We extend the fractional genetic programming scheme with data elements that are no longer scalar, but instead resemble probability density functions. The extension fits straightforwardly into fractional programming, in which data elements are blended from several values. In our previous work, the blend produced a single scalar value; the extension instead builds an approximate probability density function out of the blended elements. The extension turned out to be very effective in an unexpected way: when a data element, despite being intended to approximate a probability density, represented a single-dimensional image of spatial data.</p>
      </abstract>
      <kwd-group>
        <kwd>genetic programming</kwd>
        <kwd>real-coded genetic algorithm</kwd>
        <kwd>evolutionary method</kwd>
        <kwd>probability density</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Fractional genetic programming (FGP), which we presented in [10], was built around the notion of gradualism – an instruction could be added to a program gradually, by increasing that instruction's strength, beginning with a very small value. A weak instruction had only a minimal influence on the program, as it affected the fractional data structures in a correspondingly minimal way. Tests have shown that this graduality may give the programming scheme a substantial advantage, similar to that observed in real-coded genetic algorithms [5, 11]. In order to take a data element e_out out of a fractional structure, a possibly stark data loss was required, though – e_out was a weighted mean of M data elements e_i^in, i = 1, . . . M, each having a weight w_i related to the fraction of e_i^in actually removed from the data structure in order to read e_out. We will further call this variant FGPs, where S stands for scalar data elements. The discussed extension attempts to alleviate the data loss – the mixture of values e_i^in, w_i is approximately preserved in e_out as a probability density function-like (PDF) set of N pairs (value, probability). Operations on these data elements are similar to those on PDFs, and thus require costly convolutions. To limit the complexity, N typically has a small value. The extended variant will further be referred to as FGPd, with D standing for density. Making genetic algorithms able to process stochastic data has been tried before, see e.g. [6], yet we are not aware of a solution similar to ours in the subfield of genetic programming.</p>
      <p>The extension has been introduced to alleviate yet another problem – the processing of vectors whose subsequent elements are related to each other, e.g. when the index represents a spatial coordinate. Consider, for example, a program that processes a single-dimensional image consisting of, say, 10 pixels. In FGPs, to hold these elements on a fractional stack, pushing these 10 elements one on top of another would be required. In a program that does not feature a loop, this may seem like a quite unwieldy way of presenting the data. A program would need to separately evolve distinct sets of instructions for popping and for the initial processing of the subsequent input values, one set per input value. Yet it might be hypothesised that the subsequent values would require similar processing, and thus a much simpler program might perform the same task if the input values were presented in a compact form, apt for parallel processing by a single instruction. A similar reasoning might be applied to vectors of output data and, obviously, to the possible internal vectors used by the program for storing intermediate results. Taking into account that the computational complexity of evolving a program may strongly increase with that program's length, the advantage could be clear. And here is the second purpose of the presented extension. While the elements are processed similarly to PDFs, they need not be PDFs – the genetic program is simply expected to use the elements in the way most suitable for the task to realise. The formalism of combining data using operators like addition or division of PDFs might simply be a good mechanism to reuse:
– if a PDF has a zero or low variance, the combining works similarly to the corresponding operations on scalars, so the mechanism provides basic arithmetic operations;
– the operands are convolved, and convolution is one of the more popular operations in the processing of mathematical data;
– last but not least, the similarity of the data elements to PDFs provides an approximate way of processing probabilistic data.</p>
      <p>The paper is organised as follows. Section 2 describes the handling of data in FGPd. Sections 3 and 4 show examples, in which, respectively, the training data contain approximations of PDFs, and the training data are images of a parameterised space. Finally, there is a concluding discussion in Sec. 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Approximated PDFs</title>
      <p>Let a data element be a set of N pairs (v_i, p_i), i = 1 . . . N, v_i &lt; v_{i+1}, p_i ∈ ⟨0, 1⟩, where v_i defines a value, and p_i is the probability of occurrence of a certain region ⟨r_i, r_{i+1}), r_i &lt; v_i, r_{i+1} &gt; v_i, around v_i. Let the region further be called a layer. Because the number of layers N is typically small, let the data element be called a sandwich. A sandwich, even though it approximates a continuous PDF, contains only a set of discrete points. This is done for computational performance, but also for a convenient representation of the subsequent elements of input or output vectors. Larger values of N provide a better approximation, but also a greater computational complexity.</p>
      <p>Fig. 1. An example sandwich and a continuous PDF representation, approximated by the sandwich (a); (b) creating a sandwich out of a cloud of points.</p>
      <sec id="sec-2-1">
        <title>A corresponding continuous PDF</title>
        <p>The values r_i define a corresponding continuous PDF, approximated by the sandwich, as illustrated in Fig. 1(a). They are computed as follows:</p>
        <p>r_i = v_i − (r_{i+1} − v_i) for i = 1, r_i = (v_{i−1} + v_i)/2 for 2 ≤ i ≤ N, r_i = v_{i−1} + (v_{i−1} − r_{i−1}) for i = N + 1.</p>
        <p>Thus, a border of 'influence' of a probability value is halfway between two neighbouring values v_i, and the two extreme regions are symmetric relative to the respective values. The reason for choosing such a corresponding PDF was computational speed and the lack of any concrete statistical model behind the sandwich – as will be seen in the example in Sec. 4, no such model is generally enforced on the input/output data, to the extent that p_i may not represent a probability at all.</p>
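        <p>For illustration, the borders can be computed in two passes: the interior borders first, then the two symmetric extremes. A minimal Python sketch, where the function name and the list-based representation are ours and not part of the method's specification:</p>

```python
def region_borders(v):
    """Borders r_1..r_{N+1} of the layers of a sandwich with values v_1..v_N."""
    N = len(v)
    assert N >= 2 and all(a < b for a, b in zip(v, v[1:]))
    r = [0.0] * (N + 1)
    for i in range(1, N):                   # interior: halfway between neighbours
        r[i] = (v[i - 1] + v[i]) / 2.0
    r[0] = v[0] - (r[1] - v[0])             # extreme regions symmetric around v_1 and v_N
    r[N] = v[N - 1] + (v[N - 1] - r[N - 1])
    return r
```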
      </sec>
      <sec id="sec-2-2">
        <title>Unary operators</title>
        <p>An instruction W = f(U), representing a unary operator, involves only a single sandwich, and thus its definition is trivial:</p>
        <p>v_i^W = f(v_i^U), p_i^W = p_i^U, i = 1, . . . N,</p>
        <p>where U consists of (v_i^U, p_i^U) and W consists of (v_i^W, p_i^W). As we see, the definition is a generalisation of a unary operation on a scalar – U, and in effect W, would contain only a single non-zero value p_i^U in such a special case.</p>
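        <p>A sketch of the unary operation, with the sandwich held as a list of (value, probability) pairs. The re-sorting that restores the v_i &lt; v_{i+1} invariant for non-monotonic f is our assumption; the ordering is not discussed above:</p>

```python
def unary_op(sandwich, f):
    """Apply a scalar function f to each layer value; probabilities are kept.

    sandwich: list of (value, probability) pairs with increasing values.
    """
    return sorted((f(v), p) for v, p in sandwich)
```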
      </sec>
      <sec id="sec-2-3">
        <title>Binary operators</title>
        <p>As p_i represents the probability of occurrence of an event of value v_i, an instruction representing a binary operator on two sandwiches requires the computation of a joint probability. Thus, up to N^2 events of a non-zero probability may represent the outcome of the operation. Combining PDFs is a complex problem in general – two stochastic variables might depend on each other in various ways, and involved methods of tracking the dependence might be necessary [7]. Assuming that two PDFs are always independent in a computer language meant to be used by a human may lead to counter-intuitive results. Yet, in our case, it is an evolutionary method that designs the program, and thus we can maintain the merely loose connection of sandwiches to PDFs – it is never explicitly stated in the method that the sandwiches are PDFs anyway. Let us thus assume that events in separate sandwiches are independent – not having to track the possible dependencies saves time, and temporal performance is very often of essential importance in evolutionary computing [4].</p>
        <p>Let (v_i^L, p_i^L) and (v_i^R, p_i^R) represent, respectively, the left and the right operand, and let an arbitrary binary operator, which combines two scalar values into a third scalar value, be symbolised by ⊙. Then:</p>
        <p>w_{i,j} = v_i^L ⊙ v_j^R, p_{i,j} = p_i^L p_j^R, i = 1, . . . N, j = 1, . . . N. (3)</p>
        <p>Following the set of instructions defined in [10], the further presented tests employ the binary operators +, −, ∗, /.</p>
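        <p>Under the independence assumption, the joint computation is a simple Cartesian product of layer pairs. A minimal sketch, with illustrative names of our choosing:</p>

```python
def binary_op(left, right, op):
    """All pairwise events of two independent sandwiches: up to N^2 pairs
    (op(v_l, v_r), p_l * p_r), before they are fitted back into N layers."""
    return [(op(vl, vr), pl * pr) for vl, pl in left for vr, pr in right]
```

        <p>For two normalised operands the resulting probabilities again sum to 1, since the p_l p_r terms factorise.</p>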
        <p>There are too many resulting pairs (w_{i,j}, p_{i,j}) to fit into the sandwich representing the result – to avoid an explosion of complexity, data loss might be necessary. The problem of creating a sandwich out of M points, M &gt; N, occurs not only when the result of a binary operation needs to be parsed – as the example in Sec. 3 will illustrate, fitting a cloud of points into a sandwich may also occur when preparing data for FGPd. The next section proposes a fitting method to be used in both cases.</p>
        <p>To create a sandwich based on a set of M points, M being an arbitrary value, we could use one of the many clustering methods [2, 3]. Yet in genetic programming, a fast rough tool might be better than a precise yet slow one – the latter may slow down the optimisation process, and in effect decrease, rather than increase, the quality of the best candidate. Using a rough, fast tool might make special sense in our case, as the sandwiches, as discussed, are not strictly semantically defined, so it is hard to estimate what would be meant by a precise tool.</p>
        <p>Let the cloud consist of points (v_i^C, p_i^C), i = 1 . . . M, where v_i^C is a scalar value and p_i^C is the probability of its occurrence. The value p_i^C is introduced mainly to deal with the pairs (w_{i,j}, p_{i,j}) defined in (3), but it might also represent e.g. the importance of some element in a training set. Let the sandwich to be created out of the cloud consist of (v_k^Q, p_k^Q), k = 1 . . . N. Let the cloud be split, with regard to v_i^C, into N subsets S_k:</p>
        <p>t_j = min_i v_i^C + (j − 1)/N · (max_i v_i^C − min_i v_i^C), j = 1 . . . N + 1, (4)</p>
        <p>(v_i^C, p_i^C) ∈ S_k ⇐⇒ [ (v_i^C = t_k ∧ k = 1) ∨ v_i^C &gt; t_k ] ∧ v_i^C ≤ t_{k+1}.</p>
        <p>The mean value of the points within S_k, weighted by the probabilities, becomes v_k^Q, and the probabilities p_k^Q maintain the same ratios to each other as the sums of probabilities within the respective S_k do:</p>
        <p>v_k^Q = Σ_{S_k} v_i^C p_i^C / Σ_{S_k} p_i^C, p_k^Q = Σ_{S_k} p_i^C / Σ_i p_i^C. (5)</p>
        <p>In particular, choosing the absolute difference as ⊙ in (3) yields</p>
        <p>w_{i,j} = |v_i^L − v_j^R|, p_{i,j} = p_i^L p_j^R, i = 1, . . . N, j = 1, . . . N. (6)</p>
        <p>An example fitting is illustrated in Fig. 1(b).</p>
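        <p>The fitting defined by (4)–(5) can be sketched as follows; skipping empty subsets S_k, so that fewer than N layers may be returned, is our assumption:</p>

```python
def fit_sandwich(cloud, N):
    """Fit a cloud of (value, probability) points into an N-layer sandwich."""
    vs = [v for v, _ in cloud]
    lo, hi = min(vs), max(vs)
    t = [lo + k * (hi - lo) / N for k in range(N + 1)]   # borders t_1..t_{N+1}
    subsets = [[] for _ in range(N)]
    for v, p in cloud:
        for k in range(N):
            # membership condition of S_k; the k == 0 case admits the minimum value
            if ((v == t[k] and k == 0) or v > t[k]) and v <= t[k + 1]:
                subsets[k].append((v, p))
                break
    total = sum(p for _, p in cloud)
    sandwich = []
    for S in subsets:
        ps = sum(p for _, p in S)
        if ps > 0:
            # weighted mean of values; probability proportional to the subset's mass
            sandwich.append((sum(v * p for v, p in S) / ps, ps / total))
    return sandwich
```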
      </sec>
      <sec id="sec-2-4">
        <title>Comparing sandwiches</title>
        <p>Comparing two sandwiches, just like comparing two PDFs, can have many meanings. Consider applying the absolute difference operator (6) to two sandwiches T and Y. Intuitively, it is a stochastic equivalent of d = |t − y|, where d, t, y ∈ R. The outcome d, in turn, can be a base for computing the mean square error (MSE) of fitting, if t is a desired value in a supervised training and y is an actual value computed by a candidate. Yet, conversely, the mean value m = |T − Y| might be a bad base for the MSE if T and Y are PDFs. It is because, even if an ideal outcome T is exactly the same as the outcome Y produced by a candidate, m ≠ 0 if Var(T) = Var(Y) ≠ 0. Obviously, the stochastic nature of the desired outcome would creep into the MSE. Thus, even though (6) might represent a helpful instruction in a program, we will use another method here for comparing two sandwiches, one that assesses their goodness of fit. A typical approach is the Kolmogorov–Smirnov statistic [8], but we will rather use the Cramér–von Mises criterion [1, 9], as the former conveys only the most different part of the empirical distribution functions of T and Y, hiding any improvements or regressions in the other parts.</p>
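        <p>One way such a Cramér–von Mises-style comparison of two sandwiches can be realised is sketched below, treating each sandwich as a discrete distribution and summing squared differences of the empirical CDFs over the layer values; the exact discretisation used in the tests is not reproduced here, so this is an assumption:</p>

```python
def cvm_distance(T, Y):
    """Sum of squared empirical-CDF differences of two sandwiches,
    evaluated over the union of their layer values."""
    xs = sorted({v for v, _ in T} | {v for v, _ in Y})
    def cdf(s, x):
        return sum(p for v, p in s if v <= x)
    return sum((cdf(T, x) - cdf(Y, x)) ** 2 for x in xs)
```

        <p>Unlike the Kolmogorov–Smirnov supremum, every point of the CDFs contributes, so improvements outside the single most different region are still rewarded.</p>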
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Extrapolation of a stochastic function</title>
      <p>Let there be a stochastic function</p>
      <p>y = sin(2x)/2 + e^{3 sin(x)} u(0, 1)/50, (7)</p>
      <p>where u(0, 1) is a random variable with a uniform distribution spanning the values ⟨0, 1⟩. A cloud of points, consisting of 10000 observations of (7), is shown in Fig. 3(a). Let the training sandwiches be computed as depicted in Fig. 3(b). A supervised learning, using the evolutionary method described in [10], has been performed, with MSE_max = 0.005 and the maximum allowed time t_max = 10000 s. The mean x value of the observations fitted to a sandwich Q was input to a candidate, and the response of the candidate was compared to Q. The better candidate in Fig. 4 was one of the best programs obtained. We see a poor to moderate fitting quality. The asymmetric nature of the noise applied to the scaled sine function is not visible in the candidate's response. There is an extrapolation of a moderate quality at x ∈ ⟨2, 3.5⟩, yet it might be a coincidence, as the same cannot be said about the extrapolation at x &lt; −2.</p>
      <p>Fig. 3. A cloud generated by the extrapolated stochastic equation (a), training observations marked black, and (b) the corresponding fitted sandwiches for x ∈ ⟨0, 2⟩, 60 observations having subsequent x per sandwich, N = 5; the circle area denotes the probability within a sandwich. The common horizontal position of a set of circles defines a single sandwich, and is equal to the mean x value within the respective observations.</p>
    </sec>
    <sec id="sec-4">
      <title>Navigation in a parameterised space</title>
      <p>Let us now try a very different task, to assess the universality of the method. There is a turtle that starts somewhere in a two-dimensional space, parameterised by a function v = s(x, y), belonging to the family</p>
      <p>x_r = x + A sin x + B cos y, y_r = y + C cos x + D sin y, r = √(x_r^2 + y_r^2), v = 10 cos r/(10 + r), x ∈ ⟨−1, 1⟩, y ∈ ⟨−1, 1⟩. (8)</p>
      <p>Fig. 5. The turtle's senses. In the FGPd mode, a value f inside a circle denotes a pair (v_f, p_f) in one of the two input sandwiches.</p>
      <p>The turtle has a position t_x ∈ ⟨−1, 1⟩, t_y ∈ ⟨−1, 1⟩, and an orientation α ∈ ⟨0, 2π). The turtle has two eyes. One is looking 45° to the left, relative to α, acquiring F parameters of the surrounding space e_{−1}^f, f = 1 . . . F. Analogously, the other eye looks to the right and collects the parameters e_1^f. If d = 0.05 is the step length along α, which the turtle performs in a single iteration, then the points (e_{D,x}^f, e_{D,y}^f), D = −1, 1, at which the parameters are collected, are as follows:</p>
      <p>g = (f − 1)/(F − 1) · d/5, e_{D,x}^f = t_x + g(D sin α + cos α), e_{D,y}^f = t_y + g(−D cos α + sin α). (9)</p>
      <p>The way in which a turtle can see its surroundings for F = 4 is illustrated in Fig. 5.</p>
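      <p>The sampling geometry of (9) can be sketched as follows; the dictionary-based return value is our illustrative choice:</p>

```python
import math

def eye_points(tx, ty, alpha, F=4, d=0.05):
    """Sample points of both eyes per eq. (9); D = -1 is the left eye, 1 the right."""
    pts = {}
    for D in (-1, 1):
        for f in range(1, F + 1):
            g = (f - 1) / (F - 1) * d / 5
            pts[(D, f)] = (tx + g * (D * math.sin(alpha) + math.cos(alpha)),
                           ty + g * (-D * math.cos(alpha) + math.sin(alpha)))
    return pts
```

      <p>For f = 1 both eyes sample the turtle's own position (g = 0); the farthest samples lie d/5 away, 45° to either side of α.</p>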
      <p>The turtle's goal is to find the global maximum value of the parameter (8), using only its eyes. According to (8), the maximum value is located at a point H, at which r = 0.</p>
      <p>The training has been performed as follows. A space has been parameterised by choosing A = 0.5, B = 0.6, C = 0.4, D = 0.3. A number of fixed starting locations and orientations has been selected in the space. Both the parameters and the starting locations are shown in Fig. 6(a). To estimate a candidate's quality, a single turtle starts from each of these locations. Each turtle has 100 steps at its disposal, and thus can travel 100d at most. If a turtle either travels the maximum distance, or walks to a position closer than a single step d to H, it stops. What the turtle sees is fed to the estimated candidate, and the mean of the returned value modifies α. The modified α is used in the next step. So, the candidate's role is to determine where a turtle turns after each step, depending on what the turtle saw in its direct surroundings. After all the turtles stop, their mean distance to H becomes the candidate's error. The same evolutionary method as in Sec. 3 has been used, with MSE_max = 0.05 (a misnomer in this case) and the maximum allowed time t_max = 10000 s.</p>
      <p>The test in Sec. 3 did not allow for a comparison of FGPs and FGPd, because the former was not able to process stochastic data. Yet, in this test the data is not stochastic, so let us compare the efficiency of the two versions. In FGPs, let the sensed parameters, or the 'ground's height', be pushed onto the stack in the following order: the left eye's f = 1, . . . 4, then the right eye's f = 1, . . . 4, the numbers f as shown in the circles in Fig. 5. In FGPd, let F = N, and let us use only two sandwiches, (v_{−1}^f, p_{−1}^f) and (v_1^f, p_1^f), for the left and the right eye respectively, to avoid the problematic stacking of many values discussed in Sec. 1. These sandwiches won't be used like PDFs at all – let</p>
      <p>v_D^f = (f − 1)/(F − 1), p_D^f = s(e_{D,x}^f, e_{D,y}^f)/n, (10)</p>
      <p>where n is a normalising constant, so that the p_D^f sum to 1. See that p_D^f now conveys the 'ground's height', hardly related to a probability.</p>
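      <p>The encoding (10) of one eye's readings into a sandwich can be sketched as below. Per-eye normalisation and positive heights are our assumptions here; v of (8) can be negative, in which case a shifted normalisation would be needed:</p>

```python
def eye_sandwich(heights):
    """Encode one eye's F sensed heights as a sandwich, per eq. (10):
    the layer value indexes the sample, the 'probability' carries the height."""
    F = len(heights)
    n = sum(heights)                       # normalising constant
    return [((f - 1) / (F - 1), h / n) for f, h in enumerate(heights, start=1)]
```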
      <p>Let a test be a single run of the evolutionary method, which explores and exploits candidates until a good one is found or the maximum time is reached, as described in [10]. In the FGPs version, we ran 60 such tests for F = 3, of which only 4 produced a candidate meeting the quality criterion given by MSE_max, and 60 tests for F = 5, which produced only one such candidate. In the FGPd version, 120 tests were run for each value of F = 2, 3, 5. Fig. 8 summarises the results. We see in Fig. 8(a) that the mean number of evaluations of candidates, required to produce a successful candidate by a test with a given probability, decreases with F, and thus with N. Yet, it does not mean that increasing the precision N of the sandwiches brings a successful candidate faster. When compared with Fig. 8(b), we see that a single evaluation predictably lasts longer for larger N, and in effect the optimal value of N is 3 here.</p>
      <p>The number of tests that produced a successful candidate in the FGPd version is 111, 119 and 108 for F = 2, 3, 5, respectively. Thus, more than 93% of the tests brought candidates which were good enough, vs 7% in the case of the FGPs version. The mean time in the FGPd version, as seen in Fig. 8(b), was shortest for F = 3, reaching about 3000 s. For the FGPs version it obviously was much longer, as the unsuccessful 93% of the tests ran for t_max = 10000 s.</p>
      <p>Fig. 6. Tracks of turtles, decided by an example candidate, FGPd, N = 3: (a) starting at the training locations, all succeeded; (b) starting at the testing positions, with the parameters computed using A = 0.4, B = 0.7, C = −0.6, D = 0.5; solid line: succeeded within 100 steps, dashed line: succeeded within 200 steps, dotted line: unsuccessful within 300 steps.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>An introduction of sandwiches into the fractional programs produced mixed results in the test of an extrapolation of a stochastic equation. We hypothesise that the candidates might be missing some special instructions that directly operate on probabilities. We plan to search for such instructions in the future. Employing the sandwiches in passing a sequence of training data showed a large improvement, even though the sandwiches did not necessarily operate as PDF approximations in that case. It might thus be worthwhile to introduce instructions which treat the sandwiches as general data containers, and not merely as approximations of PDFs.</p>
      <p>Fig. 8. Efficiency of FGPd: (a) CDF of the probability of producing a successful candidate by a test, vs the number of required evaluations of the candidate; (b) mean CPU time required to produce a successful candidate for different t_max.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cramér</surname>
          </string-name>
          , H.:
          <article-title>On the composition of elementary errors</article-title>
          .
          <source>Scandinavian Actuarial Journal</source>
          (
          <year>1928</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Davé,
          <string-name>
            <given-names>R.N.</given-names>
            ,
            <surname>Krishnapuram</surname>
          </string-name>
          , R.:
          <article-title>Robust clustering methods: a unified view</article-title>
          .
          <source>Fuzzy Systems, IEEE Transactions on 5(2)</source>
          ,
          <fpage>270</fpage>
          -
          <lpage>293</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Filippone</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Camastra</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masulli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rovetta</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A survey of kernel and spectral methods for clustering</article-title>
          .
          <source>Pattern recognition 41(1)</source>
          ,
          <fpage>176</fpage>
          -
          <lpage>190</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fok</surname>
            ,
            <given-names>K.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
          </string-name>
          , T.T.,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          :
          <article-title>Evolutionary computing on consumer-level graphics hardware</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>22</volume>
          (
          <issue>2</issue>
          ),
          <fpage>69</fpage>
          -
          <lpage>78</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          :
          <article-title>Real-coded genetic algorithms, virtual alphabets, and blocking</article-title>
          .
          <source>Complex Systems</source>
          <volume>5</volume>
          ,
          <fpage>139</fpage>
          -
          <lpage>167</lpage>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Iwamura</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A genetic algorithm for chance constrained programming</article-title>
          .
          <source>Journal of Information and Optimization Sciences</source>
          <volume>17</volume>
          (
          <issue>2</issue>
          ),
          <fpage>409</fpage>
          -
          <lpage>422</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hyman</surname>
            ,
            <given-names>J.M.:</given-names>
          </string-name>
          <article-title>Computer arithmetic for probability distribution variables</article-title>
          .
          <source>Rel. Eng. &amp; Sys. Safety</source>
          <volume>85</volume>
          (
          <issue>1-3</issue>
          ),
          <fpage>191</fpage>
          -
          <lpage>209</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Marsaglia</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsang</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Evaluating Kolmogorov's distribution</article-title>
          .
          <source>Journal of Statistical Software</source>
          <volume>8</volume>
          (
          <issue>18</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>von</surname>
            <given-names>Mises</given-names>
          </string-name>
          , R.E.: Wahrscheinlichkeit, Statistik und Wahrheit (
          <year>1928</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Rataj</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Fractional genetic programming for a more gradual evolution</article-title>
          . In: CS&amp;P. pp.
          <fpage>371</fpage>
          -
          <lpage>382</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Wright</surname>
            ,
            <given-names>A.H.</given-names>
          </string-name>
          , et al.:
          <article-title>Genetic algorithms for real parameter optimization</article-title>
          .
          <source>In: Foundations of Genetic Algorithms</source>
          . pp.
          <fpage>205</fpage>
          -
          <lpage>218</lpage>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>