<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>The Fourth International Workshop on Computer Modeling and Intelligent Systems, April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/s00354-017-0027-x</article-id>
      <title-group>
        <article-title>Homogeneity hypothesis in discriminant analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dmitriy Klyushin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Taras Shevchenko National University of Kyiv, Ukraine</institution>
          ,
          <addr-line>03680, Kyiv, Akademika Glushkova Avenue 4D</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>27</volume>
      <issue>2021</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>One of the most important properties of a machine learning algorithm is its ability to generalize the results of learning on finite training sets. This property is based on the compactness hypothesis, which states that objects of the same class are, as a rule, located closer to each other in the feature space than to objects of other classes. The compactness hypothesis is geometric in nature and uses the concept of proximity in the feature space, which is most often expressed in terms of a metric. However, this hypothesis does not fully take into account the probabilistic nature of the features. It is quite suitable for data with unimodal distributions that have compact support, but in the general case it may not hold, leading to incorrect generalizations. In this paper, an alternative approach is described in which the homogeneity hypothesis is used instead of the compactness hypothesis. Within this approach, objects are called homogeneous if their features follow identical distributions. As measures of homogeneity we propose Petunin's p-statistics and its versions, which are highly efficient in recognizing both disjoint and significantly overlapping samples that violate the compactness assumption. This approach has a rigorous mathematical foundation and high efficiency in practical applications.</p>
      </abstract>
      <kwd-group>
        <kwd>Discriminant analysis</kwd>
        <kwd>relational analysis</kwd>
        <kwd>featureless pattern recognition</kwd>
        <kwd>compactness hypothesis</kwd>
        <kwd>homogeneity hypothesis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction </title>
      <p>The complexity of machine learning significantly depends on the compactness hypothesis, which
allows generalizations based on finite training samples. Intuitively, the hypothesis states that in the
feature space, similar objects should be closer to each other than to dissimilar ones. This definition
appeals to the geometric concept of proximity and implicitly uses a metric. A typical example of a
method based on such principles is the nearest neighbor method, which recognizes test objects by
their closeness to training objects.</p>
      <p>The compactness hypothesis ignores the probabilistic nature of random training data. More
precisely, it is acceptable only for classifying random data with unimodal distributions having
compact support. In practice, this condition is too restrictive. Therefore, it is necessary to
develop a method that estimates the proximity between random samples on different principles.
To solve the problem, let us introduce the concept of object homogeneity, which means that objects
are drawn from the same population, i.e., their features obey the same distribution. This allows objects
to be classified using criteria for testing statistical hypotheses of homogeneity.</p>
      <p>The aim of the article is to describe a new approach to machine learning based on the homogeneity
hypothesis as an alternative to the compactness hypothesis. Using a measure of homogeneity, not just
a metric, allows generalizing relational discriminant analysis and increasing its efficiency.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Distance‐based machine learning techniques </title>
      <p>Duin, Pekalska et al. [1–4], Mottl, Seredin et al. [5–8] and others proposed the concept of
featureless discriminant analysis. They suggested replacing the feature vector of an object with an
estimate of its proximity to some training set using a metric. Unfortunately, this approach is poorly
suited to problems often encountered in biomedical research. Suppose a researcher studies
the parameters of a set of cells. In this case, the researcher obtains samples of real numbers rather
than ordered vectors. In such cases, a metric is not applicable and the only useful tool is a
homogeneity (similarity) measure. Among the numerous statistical tests for the homogeneity of samples,
only the Kolmogorov–Smirnov test, the Wilcoxon test and the Klyushin–Petunin test allow us to assess
robustly the homogeneity of samples in the form of the probability of belonging to the same general
population [9].</p>
      <p>As noted above, in featureless discriminant analysis objects are represented not as
vectors in a feature space, but by measures of proximity to a training set. As a result, the starting
point of featureless analysis is a distance matrix filled with distances or labels characterizing the
similarity between objects in the training set (reference points). As basic distances, Euclidean
and pseudo-Euclidean distances are usually used, coupled with the kernel trick. This approach leads
to problems with generalization power and to a strong dependence on the training set. In addition, it is
not valid for samples consisting of independent identically distributed (i.i.d.) random values.</p>
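      <p>The distance-matrix representation described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name dissimilarity_representation, the choice of plain Euclidean distance and the toy reference points are hypothetical and are not taken from the cited works.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def dissimilarity_representation(objects, references):
    """Replace each feature vector by its Euclidean distances to the reference objects."""
    return [[math.dist(obj, ref) for ref in references] for obj in objects]


# Toy data (hypothetical): two reference points and two objects in the plane.
references = [(0.0, 0.0), (3.0, 4.0)]
objects = [(0.0, 0.0), (3.0, 0.0)]

# Each row is the featureless description of one object:
# its distances to the first and to the second reference point.
D = dissimilarity_representation(objects, references)
for row in D:
    print(row)
```
]]></preformat>
      <p>Each object is now described only by its row of distances, which is why the representation depends strongly on the chosen reference set and presupposes that the objects are vectors.</p>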
      <p>Recently, this approach was revived in machine learning techniques such as the Minimal Learning
Machine [10] and the Extreme Minimal Learning Machine [11]. The main tool in these methods is
nonlinear distance regression, which estimates the dissimilarity between observations. Nowadays,
various metrics and learning techniques are used in this field [12–21]. Excellent surveys of these
methods may be found in [22, 23]. These methods have some useful advantages, but they use
probability distributions constructed from Euclidean distances. Thus, they fail in situations where
reference points are not vectors in some vector space but samples of i.i.d. random values.</p>
      <p>The problem we address is to extend the field of application of distance-based machine
learning techniques by estimating distances between objects not with metrics but with the homogeneity
(similarity) measures described below, and to propose an alternative way of similarity-based
classification. These homogeneity measures do not depend on the underlying distributions of the training
samples and have useful generalization properties. For example, in contrast to their standard
counterparts (the Kolmogorov–Smirnov statistics and the Wilcoxon statistics), they work successfully
both with samples following distributions with different means and identical variances and with
samples following distributions with identical means and different variances.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Two‐sample homogeneity measure </title>
      <p>Consider training samples <inline-formula><tex-math>x = (x_1, x_2, \ldots, x_n) \in G_1</tex-math></inline-formula> and <inline-formula><tex-math>y = (y_1, y_2, \ldots, y_n) \in G_2</tex-math></inline-formula> from populations <inline-formula><tex-math>G_1</tex-math></inline-formula>
and <inline-formula><tex-math>G_2</tex-math></inline-formula> following absolutely continuous distribution functions <inline-formula><tex-math>F_1</tex-math></inline-formula> and <inline-formula><tex-math>F_2</tex-math></inline-formula>. We reduce the
classification of a test sample <inline-formula><tex-math>z = (z_1, z_2, \ldots, z_n)</tex-math></inline-formula> to testing the homogeneity of z and x on the one hand,
and of z and y on the other. There are many nonparametric tests for two-sample homogeneity: the
Kolmogorov–Smirnov test, the Wilcoxon signed-rank test, etc. (see, for example, [24]). However, as will
be shown, the most effective tool for testing the homogeneity of two samples is Petunin's p-statistics
[25], because the p-statistics retains high significance and sensitivity both when the samples are
disjoint and when they almost completely overlap.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Original Klyushin–Petunin test </title>
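      <p>The Klyushin–Petunin test [25] is nonparametric and assumes only that the distribution
functions are absolutely continuous. The test relies on Hill's assumption A(n) [26], which states that for
exchangeable random values <inline-formula><tex-math>x_1, x_2, \ldots, x_n \in G</tex-math></inline-formula> following an absolutely continuous distribution
function we have</p>
      <p>
        <disp-formula id="eq1"><label>(1)</label><tex-math><![CDATA[P\left(x \in (x_{(i)}, x_{(j)})\right) = p_{ij} = \frac{j - i}{n + 1}, \quad j > i,]]></tex-math></disp-formula>
where <inline-formula><tex-math>x_{(i)}</tex-math></inline-formula> and <inline-formula><tex-math>x_{(j)}</tex-math></inline-formula> are the i-th and j-th order statistics. Hill's assumption has been proved both for i.i.d.
random values [27] and for exchangeable identically distributed random values [28]. Finding the relative frequency <inline-formula><tex-math>h_{ij}</tex-math></inline-formula>
of the event <inline-formula><tex-math>z_m \in (x_{(i)}, x_{(j)})</tex-math></inline-formula> for the elements of z, we can estimate the proximity between <inline-formula><tex-math>h_{ij}</tex-math></inline-formula> and
<inline-formula><tex-math>p_{ij} = (j - i)/(n + 1)</tex-math></inline-formula>. This may be done using any of the numerous confidence intervals for a binomial proportion. For
definiteness, we use the Wilson confidence interval <inline-formula><tex-math>I_{ij}^{(n)} = (p_{ij}^{(1)}, p_{ij}^{(2)})</tex-math></inline-formula>, where
        <disp-formula><tex-math><![CDATA[p_{ij}^{(1)} = \frac{h_{ij} n + g^{2}/2 - g\sqrt{h_{ij}(1 - h_{ij})n + g^{2}/4}}{n + g^{2}}, \qquad p_{ij}^{(2)} = \frac{h_{ij} n + g^{2}/2 + g\sqrt{h_{ij}(1 - h_{ij})n + g^{2}/4}}{n + g^{2}}.]]></tex-math></disp-formula>
      </p>
      <p>The significance level of this interval is a function of g. When g = 3, the significance level of <inline-formula><tex-math>I_{ij}^{(n)}</tex-math></inline-formula>
does not exceed 0.05 [25]. The p-statistics, which estimates the homogeneity of samples x and z, is defined by
the equation
        <disp-formula id="eq2"><label>(2)</label><tex-math><![CDATA[h = \frac{2}{n(n - 1)}\,\#\left\{(i, j) : p_{ij} = \frac{j - i}{n + 1} \in I_{ij}^{(n)},\ 1 \le i < j \le n\right\}.]]></tex-math></disp-formula>
      </p>
      <p>
        As we see, the p-statistics is an estimate of the probability that the samples are homogeneous.
Therefore, using (2), we may formulate the following test: the null hypothesis is accepted if h is greater
than 0.95; otherwise, it is rejected.
      </p>
      <p>
When the null hypothesis is true, the events <inline-formula><tex-math>\{p_{ij} \in I_{ij}^{(n)}\}</tex-math></inline-formula> generate a generalized Bernoulli
scheme [29, 30]. When the alternative hypothesis is true, they generate a modified Bernoulli scheme.
When the null hypothesis can be either true or false, they generate the Matveichuk–Petunin scheme
[31]. If the null hypothesis holds and, as <inline-formula><tex-math>n \to \infty</tex-math></inline-formula>, the ratios <inline-formula><tex-math>(j - i)/(n + 1)</tex-math></inline-formula> and <inline-formula><tex-math>i/(n + 1)</tex-math></inline-formula> converge to limits in
(0, 1), then the asymptotic significance level of the sequence of confidence intervals <inline-formula><tex-math>I_{ij}^{(n)}</tex-math></inline-formula> is less than 0.05 [25].</p>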
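      <p>The computation of the p-statistics with Wilson intervals can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names wilson_interval and p_statistic, the closed-interval membership check and the restriction to a reference sample without ties are assumptions made for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from bisect import bisect_left, bisect_right


def wilson_interval(h, n, g=3.0):
    """Wilson confidence limits for a proportion h observed in n trials."""
    centre = h * n + g * g / 2.0
    spread = g * math.sqrt(h * (1.0 - h) * n + g * g / 4.0)
    denom = n + g * g
    return (centre - spread) / denom, (centre + spread) / denom


def p_statistic(x, z, g=3.0):
    """Share of intervals (x_(i), x_(j)) whose Wilson interval covers (j - i)/(n + 1).

    Assumes that x has no ties; tied samples need the modified test.
    """
    xs, zs = sorted(x), sorted(z)
    n, m = len(xs), len(zs)
    hits = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # number of z values strictly between x_(i) and x_(j)
            inside = max(0, bisect_left(zs, xs[j]) - bisect_right(zs, xs[i]))
            h_ij = inside / m
            lo, hi = wilson_interval(h_ij, m, g)
            p_ij = (j - i) / (n + 1)  # Hill's probability for this interval
            if lo <= p_ij <= hi:
                hits += 1
    return hits / (n * (n - 1) / 2)


x = [0.1 * k for k in range(40)]
shifted = [v + 1000.0 for v in x]  # a sample lying entirely to the right of x
print(p_statistic(x, x), p_statistic(x, shifted))
```
]]></preformat>
      <p>In this sketch a sample compared with itself gives h = 1, while a completely separated sample still gives a value well above zero, because short intervals with a small expected probability remain consistent with an observed frequency of zero.</p>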
    </sec>
    <sec id="sec-5">
      <title>3.2. Modified Klyushin–Petunin test </title>
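      <p>
        In practice, samples usually contain rounded numbers and duplicates (ties). Thus, we must
distinguish a hypothetical sample drawn from a hypothetical population <inline-formula><tex-math>G</tex-math></inline-formula> of precise measurements
from an empirical sample drawn from an empirical population <inline-formula><tex-math>\tilde{G}</tex-math></inline-formula> of rounded measurements. Let us
introduce a sample <inline-formula><tex-math>\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)</tex-math></inline-formula> approximating a hypothetical sample <inline-formula><tex-math>x = (x_1, x_2, \ldots, x_n)</tex-math></inline-formula>, and let
<inline-formula><tex-math>x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}</tex-math></inline-formula> and <inline-formula><tex-math>\tilde{x}_{(1)} \lt \tilde{x}_{(2)} \lt \ldots \lt \tilde{x}_{(m)}</tex-math></inline-formula> be the variational series of the hypothetical
and empirical samples, respectively.
      </p>
      <p>
        If a number <inline-formula><tex-math>\tilde{x}</tex-math></inline-formula> is drawn from <inline-formula><tex-math>\tilde{G}</tex-math></inline-formula> independently of the sample <inline-formula><tex-math>\tilde{x}</tex-math></inline-formula>, then the probability
        <disp-formula id="eq3"><label>(3)</label><tex-math>p\left(\tilde{x} \in (\tilde{x}_{(k)}, \tilde{x}_{(k+1)})\right)</tex-math></disp-formula>
depends on the multiplicities <inline-formula><tex-math>t_l = t(\tilde{x}_{(l)})</tex-math></inline-formula> of the values <inline-formula><tex-math>\tilde{x}_{(l)}</tex-math></inline-formula>. If the sample does not contain ties, the tie-dependent term of (3) equals zero.
      </p>
      <p>
        Suppose that the hypothetical population <inline-formula><tex-math>G</tex-math></inline-formula> follows a hypothetical absolutely continuous
distribution function <inline-formula><tex-math>F</tex-math></inline-formula>. Then (4) holds. Consider empirical samples <inline-formula><tex-math>\tilde{x} = (\tilde{x}_1, \ldots, \tilde{x}_n)</tex-math></inline-formula> and
<inline-formula><tex-math>\tilde{z} = (\tilde{z}_1, \ldots, \tilde{z}_n)</tex-math></inline-formula>. Using the Wilson confidence interval <inline-formula><tex-math>I_{ij} = (p_{ij}^{(1)}, p_{ij}^{(2)})</tex-math></inline-formula> for the probability (5) of the
event <inline-formula><tex-math>\tilde{z}_k \in (\tilde{x}_{(i)}, \tilde{x}_{(j)})</tex-math></inline-formula>, we find the observed relative frequency of this event. Let us denote by <inline-formula><tex-math>N = n(n - 1)/2</tex-math></inline-formula>
the number of intervals <inline-formula><tex-math>I_{ij}</tex-math></inline-formula> and compute the empirical p-statistics h as the fraction of these N intervals
that contain the corresponding probability. Then we can formulate the following test: the
null hypothesis is accepted if h (the probability that the samples are homogeneous) is greater than
0.95; otherwise, the null hypothesis is rejected.
      </p>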
    </sec>
    <sec id="sec-6">
      <title>3.3. Exact Klyushin–Petunin test </title>
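      <p>As we see, the versions of the Klyushin–Petunin test based on the Wilson confidence interval
depend on the parameter g, which varies from 1.96 for the normal distribution to 3 for a general
unimodal distribution. To avoid this uncertainty, we propose to use the exact confidence interval for
the unknown probability p on the basis of the proportion h in the Bernoulli model consisting of n
trials [32]. To do this, consider two functions of <inline-formula><tex-math>p \in [0, 1]</tex-math></inline-formula>:</p>
      <p>
        <disp-formula id="eq4"><label>(4)</label><tex-math>\varphi(p) = |h - p|</tex-math></disp-formula>
and
        <disp-formula id="eq5"><label>(5)</label><tex-math>\psi(p) = \frac{\lambda}{n}\sqrt{np(1 - p) + \frac{1}{12}} + \frac{1}{2n},</tex-math></disp-formula>
where <inline-formula><tex-math>\lambda > \sqrt{8/3}</tex-math></inline-formula> is the parameter of the Vysochansky–Petunin inequality [33]
        <disp-formula><tex-math><![CDATA[P\left\{|y - m(y)| \ge \lambda\sigma(y)\right\} \le \frac{4}{9\lambda^{2}}.]]></tex-math></disp-formula>
      </p>
      <p>Denote <inline-formula><tex-math>\chi(p) = \sqrt{np(1 - p) + 1/12}</tex-math></inline-formula>. The graph of <inline-formula><tex-math>\chi(p)</tex-math></inline-formula> is the upper half of an ellipse E
with the center <inline-formula><tex-math>(1/2, 0)</tex-math></inline-formula>. The graph of <inline-formula><tex-math>\psi(p)</tex-math></inline-formula> is the restriction of the graph of <inline-formula><tex-math>\chi(p)</tex-math></inline-formula> to the segment
[0, 1], stretched or shrunk by the factor <inline-formula><tex-math>\lambda/n</tex-math></inline-formula> and shifted by <inline-formula><tex-math>1/(2n)</tex-math></inline-formula>.</p>
      <p>Therefore, the graph of the function <inline-formula><tex-math>\psi(p)</tex-math></inline-formula>, which does not depend on h, is an arc of an ellipse
passing through the points <inline-formula><tex-math>(0, \psi(0))</tex-math></inline-formula>, <inline-formula><tex-math>(1/2, \psi(1/2))</tex-math></inline-formula> and <inline-formula><tex-math>(1, \psi(1))</tex-math></inline-formula>, such that <inline-formula><tex-math>\psi(p)</tex-math></inline-formula> reaches its
maximum at the point <inline-formula><tex-math>p = 1/2</tex-math></inline-formula> and is symmetric with respect to this point.</p>
      <p>The lower confidence limit <inline-formula><tex-math>p_1</tex-math></inline-formula> is a root of the quadratic equation
        <disp-formula id="eq6"><label>(6)</label><tex-math>\left(1 + \frac{\lambda^{2}}{n}\right)p^{2} - \left(\frac{\lambda^{2}}{n} - \frac{1}{n} + 2h\right)p + h^{2} - \frac{h}{n} + \frac{1}{4n^{2}}\left(1 - \frac{\lambda^{2}}{3}\right) = 0.</tex-math></disp-formula>
If <inline-formula><tex-math>h > \psi(0)</tex-math></inline-formula>, then the lower confidence limit <inline-formula><tex-math>p_1</tex-math></inline-formula> is the least root of (6). If <inline-formula><tex-math>h \le \psi(0)</tex-math></inline-formula>,
then <inline-formula><tex-math>p_1 = 0</tex-math></inline-formula>.
      </p>
      <p>Similarly, the upper confidence limit <inline-formula><tex-math>p_2</tex-math></inline-formula> is a root of the quadratic equation
        <disp-formula id="eq7"><label>(7)</label><tex-math>\left(1 + \frac{\lambda^{2}}{n}\right)p^{2} - \left(\frac{\lambda^{2}}{n} + \frac{1}{n} + 2h\right)p + h^{2} + \frac{h}{n} + \frac{1}{4n^{2}}\left(1 - \frac{\lambda^{2}}{3}\right) = 0.</tex-math></disp-formula>
If <inline-formula><tex-math>1 - h > \psi(1)</tex-math></inline-formula>, then the upper confidence limit <inline-formula><tex-math>p_2</tex-math></inline-formula> is the largest root of (7). If <inline-formula><tex-math>1 - h \le \psi(1)</tex-math></inline-formula>, then
<inline-formula><tex-math>p_2 = 1</tex-math></inline-formula>.
      </p>
      <p>Remark. Since <inline-formula><tex-math>p_1 \le h \le p_2</tex-math></inline-formula>, the proportion of successes always lies in the confidence interval
<inline-formula><tex-math>[p_1, p_2]</tex-math></inline-formula>.</p>
      <p>For the generalized Bernoulli model, similar reasoning gives the following quadratic equation for the
lower confidence limit:
        <disp-formula id="eq8"><label>(8)</label><tex-math>\left(1 + \lambda^{2}\frac{m + n + 1}{m(n + 2)}\right)p^{2} - \left(\lambda^{2}\frac{m + n + 1}{m(n + 2)} - \frac{1}{m} + 2h\right)p + h^{2} - \frac{h}{m} + \frac{1}{4m^{2}}\left(1 - \frac{\lambda^{2}}{3}\right) = 0.</tex-math></disp-formula>
If <inline-formula><tex-math>h > \frac{1}{2m} + \frac{\lambda}{m\sqrt{12}} = \delta</tex-math></inline-formula>, then the lower confidence limit <inline-formula><tex-math>p_1</tex-math></inline-formula> for the generalized Bernoulli model is
the least root of (8). If <inline-formula><tex-math>h \le \delta</tex-math></inline-formula>, then <inline-formula><tex-math>p_1 = 0</tex-math></inline-formula>.
      </p>
      <p>Similarly, the upper confidence limit <inline-formula><tex-math>p_2</tex-math></inline-formula> for the generalized Bernoulli model is a root of the
equation
        <disp-formula id="eq9"><label>(9)</label><tex-math>\left(1 + \lambda^{2}\frac{m + n + 1}{m(n + 2)}\right)p^{2} - \left(\lambda^{2}\frac{m + n + 1}{m(n + 2)} + \frac{1}{m} + 2h\right)p + h^{2} + \frac{h}{m} + \frac{1}{4m^{2}}\left(1 - \frac{\lambda^{2}}{3}\right) = 0.</tex-math></disp-formula>
If <inline-formula><tex-math>1 - h > \delta</tex-math></inline-formula>, then the upper confidence limit <inline-formula><tex-math>p_2</tex-math></inline-formula> is the largest root of (9). If <inline-formula><tex-math>1 - h \le \delta</tex-math></inline-formula>, then <inline-formula><tex-math>p_2 = 1</tex-math></inline-formula>.
By virtue of the previous results, the significance level of the confidence interval does not exceed
<inline-formula><tex-math>4/(9\lambda^{2})</tex-math></inline-formula> (in particular, 0.05 for <inline-formula><tex-math>\lambda = 3</tex-math></inline-formula>).
      </p>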
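      <p>The exact limits for the simple Bernoulli model can be sketched numerically. The sketch rests on an assumption that should be checked against [32]: the limits are taken as the solutions of |h - p| = psi(p) with psi(p) = (lambda/n) sqrt(n p (1 - p) + 1/12) + 1/(2n), which leads to one quadratic equation for each limit. The names psi and exact_limits are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def psi(p, n, lam=3.0):
    """Assumed half-width function: scaled half-ellipse plus continuity shift."""
    return lam / n * math.sqrt(n * p * (1.0 - p) + 1.0 / 12.0) + 1.0 / (2.0 * n)


def exact_limits(h, n, lam=3.0):
    """Confidence limits (p1, p2) for a proportion h observed in n trials."""
    a = 1.0 + lam ** 2 / n
    corr = (1.0 - lam ** 2 / 3.0) / (4.0 * n ** 2)

    def roots(sign):
        # sign = -1 gives the lower-limit equation, sign = +1 the upper one
        b = -(lam ** 2 / n + sign / n + 2.0 * h)
        c = h * h + sign * h / n + corr
        d = math.sqrt(b * b - 4.0 * a * c)
        return (-b - d) / (2.0 * a), (-b + d) / (2.0 * a)

    p1 = roots(-1.0)[0] if h > psi(0.0, n, lam) else 0.0
    p2 = roots(+1.0)[1] if 1.0 - h > psi(1.0, n, lam) else 1.0
    return p1, p2


p1, p2 = exact_limits(0.5, 40)
print(round(p1, 4), round(p2, 4))

# Vysochansky-Petunin bound on the significance level for lambda = 3
print(4.0 / (9.0 * 3.0 ** 2))
```
]]></preformat>
      <p>For lambda = 3 the bound 4/(9 lambda^2) equals 4/81, which is slightly below 0.05 and agrees with the significance level stated above.</p>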
    </sec>
    <sec id="sec-7">
      <title>4. Experiments and results </title>
      <p>To assess the true positive and true negative rates of the proposed tests, we performed numerical
experiments using samples from the normal distribution <inline-formula><tex-math>N(\mu, \sigma)</tex-math></inline-formula> with various degrees of overlap.
We considered 100 samples of 40 random numbers having different means and the same variance
(location shift), as well as 100 samples of 40 random numbers having the same mean and
different variances (scale shift). We calculated the average p-statistics and its lower and upper
confidence limits, the average Kolmogorov–Smirnov statistics and its p-value, and the average
Wilcoxon statistics and its p-value. To estimate the true positive rate of the Klyushin–Petunin test, we
used the relative frequency of the event that the p-statistic is less than 0.95 when the distributions are different.
The true positive rate of the Kolmogorov–Smirnov and Wilcoxon signed-rank tests is the relative
frequency of the event that the corresponding p-value is less than 0.05 when the distributions are
different. The true negative rate of the Klyushin–Petunin test is the relative frequency of the event
that the upper confidence limit of the p-statistic is greater than 0.95 when the distributions are identical. The
true negative rate of the Kolmogorov–Smirnov and Wilcoxon signed-rank tests is the relative
frequency of the event that the p-value is not less than 0.05 when the distributions are identical. Thus, we
tested two statistical hypotheses: location shift and scale shift. The null location shift hypothesis means that
the mathematical expectations of both distributions are identical. The null scale hypothesis means that
the variances of both distributions are identical. The alternative hypothesis, in contrast, asserts that the
distribution functions are different. The results are presented in Tables 1-11.</p>
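      <p>A simplified version of the location-shift experiment can be sketched as follows. It is an illustration only: the seed, the number of trials, the helper names and the compact reimplementation of the p-statistic (Wilson interval with g = 3, samples assumed free of ties) are assumptions of this sketch, and it estimates only the true positive rate defined above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
import random
from bisect import bisect_left, bisect_right


def p_statistic(x, z, g=3.0):
    """Compact p-statistic: share of intervals whose Wilson interval covers (j - i)/(n + 1)."""
    xs, zs = sorted(x), sorted(z)
    n, m = len(xs), len(zs)
    hits = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            h = max(0, bisect_left(zs, xs[j]) - bisect_right(zs, xs[i])) / m
            spread = g * math.sqrt(h * (1.0 - h) * m + g * g / 4.0)
            lo = (h * m + g * g / 2.0 - spread) / (m + g * g)
            hi = (h * m + g * g / 2.0 + spread) / (m + g * g)
            if lo <= (j - i) / (n + 1) <= hi:
                hits += 1
    return hits / (n * (n - 1) / 2)


def true_positive_rate(shift, trials=100, n=40, seed=0):
    """Relative frequency of h < 0.95 for N(0, 1) against N(shift, 1)."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [rng.gauss(shift, 1.0) for _ in range(n)]
        if p_statistic(x, z) < 0.95:
            rejected += 1
    return rejected / trials


for shift in (1.0, 2.0, 4.0):
    print(shift, true_positive_rate(shift, trials=20))
```
]]></preformat>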
      <sec id="sec-7-1">
        <title>Table 1 </title>
        <p>P‐statistics for the location shift hypothesis without ties </p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Distribution</th><th>N(0,1)</th><th>N(1,1)</th><th>N(2,1)</th><th>N(3,1)</th><th>N(4,1)</th></tr>
            </thead>
            <tbody>
              <tr><td>N(0,1)</td><td>1.000</td><td>0.752</td><td>0.680</td><td>0.457</td><td>0.389</td></tr>
              <tr><td>N(1,1)</td><td>–</td><td>1.000</td><td>0.846</td><td>0.584</td><td>0.424</td></tr>
              <tr><td>N(2,1)</td><td>–</td><td>–</td><td>1.000</td><td>0.680</td><td>0.442</td></tr>
              <tr><td>N(3,1)</td><td>–</td><td>–</td><td>–</td><td>1.000</td><td>0.570</td></tr>
              <tr><td>N(4,1)</td><td>–</td><td>–</td><td>–</td><td>–</td><td>1.000</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-7-2">
        <title>Table 2 </title>
        <p>Exact p‐statistics for the location shift hypothesis without ties </p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Distribution</th><th>N(0,1)</th><th>N(1,1)</th><th>N(2,1)</th><th>N(3,1)</th></tr>
            </thead>
            <tbody>
              <tr><td>N(0,1)</td><td>1.000</td><td>0.646</td><td>0.459</td><td>0.376</td></tr>
              <tr><td>N(1,1)</td><td>–</td><td>1.000</td><td>0.990</td><td>0.522</td></tr>
              <tr><td>N(2,1)</td><td>–</td><td>–</td><td>1.000</td><td>0.859</td></tr>
              <tr><td>N(3,1)</td><td>–</td><td>–</td><td>–</td><td>1.000</td></tr>
              <tr><td>N(4,1)</td><td>–</td><td>–</td><td>–</td><td>–</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Note that the p-statistic decreases monotonically as the location shift increases. As expected, in
this case the Kolmogorov–Smirnov and Wilcoxon signed-rank tests work well. However, when the
distribution functions largely overlap, the discrepancy between them is not very significant.
Moreover, the Wilcoxon signed-rank test poorly recognizes the inversions between largely overlapping
samples. These statements are supported by the following results (Tables 5–8).</p>
      </sec>
      <sec id="sec-7-3">
        <title>Table 5 </title>
        <p>P‐statistics for the scale shift hypothesis without ties </p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Distribution</th><th>N(0,1)</th><th>N(0,2)</th><th>N(0,3)</th><th>N(0,4)</th><th>N(0,5)</th></tr>
            </thead>
            <tbody>
              <tr><td>N(0,1)</td><td>1.000</td><td>0.726</td><td>0.641</td><td>0.581</td><td>0.427</td></tr>
              <tr><td>N(0,2)</td><td>–</td><td>1.000</td><td>0.819</td><td>0.753</td><td>0.620</td></tr>
              <tr><td>N(0,3)</td><td>–</td><td>–</td><td>1.000</td><td>0.979</td><td>0.976</td></tr>
              <tr><td>N(0,4)</td><td>–</td><td>–</td><td>–</td><td>1.000</td><td>0.998</td></tr>
              <tr><td>N(0,5)</td><td>–</td><td>–</td><td>–</td><td>–</td><td>1.000</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
          When the samples largely overlap, the Kolmogorov–Smirnov test fails in about half of the cases,
and the Wilcoxon signed-rank test fails completely. The Klyushin–Petunin test fails
in almost a third of the cases for strongly overlapping samples following the distributions N(0,3), N(0,4) and
N(0,5).
        </p>
        <p>To simulate ties in the samples, we rounded the samples from the previous experiments to two decimal
digits. As a result, every sample contained four ties. The results are provided in Tables 9–12. The
construction of the Kolmogorov–Smirnov and Wilcoxon signed-rank tests does not depend on ties.
Thus, we provide only the results for the p-statistics.</p>
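        <p>The way rounding produces ties can be illustrated with a short Python sketch. The helper count_ties and the convention of counting every repeated value beyond its first occurrence as one tie are assumptions of this example; the number of ties obtained for a particular random sample will generally differ from the value reported above.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random
from collections import Counter


def count_ties(sample, digits=2):
    """Round a sample and count the values that repeat an earlier rounded value."""
    rounded = [round(v, digits) for v in sample]
    counts = Counter(rounded)
    return sum(c - 1 for c in counts.values())


rng = random.Random(1)  # arbitrary seed, chosen only for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(40)]
print(count_ties(sample))
```
]]></preformat>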
      </sec>
      <sec id="sec-7-4">
        <title>Table 9 </title>
        <p>P‐statistics for the location shift hypothesis with ties </p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Distribution</th><th>N(0,1)</th><th>N(1,1)</th><th>N(2,1)</th><th>N(3,1)</th><th>N(4,1)</th></tr>
            </thead>
            <tbody>
              <tr><td>N(0,1)</td><td>1.000</td><td>0.672</td><td>0.505</td><td>0.355</td><td>0.309</td></tr>
              <tr><td>N(1,1)</td><td>–</td><td>1.000</td><td>0.831</td><td>0.305</td><td>0.323</td></tr>
              <tr><td>N(2,1)</td><td>–</td><td>–</td><td>1.000</td><td>0.705</td><td>0.424</td></tr>
              <tr><td>N(3,1)</td><td>–</td><td>–</td><td>–</td><td>1.000</td><td>0.573</td></tr>
              <tr><td>N(4,1)</td><td>–</td><td>–</td><td>–</td><td>–</td><td>1.000</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
          The p-statistics tends to decrease as the difference between the means increases. The
Klyushin–Petunin test, like the Kolmogorov–Smirnov test, does not distinguish between the
distributions N(0,3), N(0,4) and N(0,5). At the same time, it turned out to be effective in cases
where the Kolmogorov–Smirnov and Wilcoxon signed-rank tests do not work. Thus, the p-statistics
has an advantage over the Kolmogorov–Smirnov and Wilcoxon signed-rank tests.
        </p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>5. Conclusions </title>
      <p>Correct generalization based on finite training sets depends on correctly chosen underlying
hypotheses. Traditional discriminant analysis is based on the compactness hypothesis, which states
that objects of one class in the feature space are located closer to each other than to objects from
another class. This geometric hypothesis does not work when classifying random samples that differ
from feature vectors. For samples, the concept of distance is meaningless. It should be replaced by the
concept of homogeneity, meaning that features of objects have the same distribution function. The
evaluation of the homogeneity of the samples is provided by the Petunin p-statistics and its variants,
which demonstrate high sensitivity and specificity in experiments both when testing the hypothesis of
a shift in the mean and in testing the hypothesis of a shift in the scale. The proposed method has a
rigorous mathematical justification and high efficiency in practical applications.</p>
    </sec>
    <sec id="sec-9">
      <title>6. References </title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.P.W.</given-names>
            <surname>Duin</surname>
          </string-name>
          , D. de Ridder,
          <string-name>
            <given-names>D.N.J.</given-names>
            <surname>Tax</surname>
          </string-name>
          ,
          <article-title>Experiments with a featureless approach to pattern recognition</article-title>
          ,
          <source>Pattern Recognition Letters</source>
          <volume>18</volume>
          (
          <year>1997</year>
          )
          <fpage>1159</fpage>
          -
          <lpage>1166</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/S0167-8655(97)00138-4</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.P.W.</given-names>
            <surname>Duin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pekalska</surname>
          </string-name>
          , D. de Ridder,
          <article-title>Relational discriminant analysis</article-title>
          ,
          <source>Pattern Recognition Letters</source>
          <volume>20</volume>
          (
          <year>1999</year>
          )
          <fpage>1175</fpage>
          -
          <lpage>1181</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/S0167-8655(99)00085-9</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Pekalska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.P.W.</given-names>
            <surname>Duin</surname>
          </string-name>
          ,
          <article-title>On combining dissimilarity representations</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Kittler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Roli</surname>
          </string-name>
          (Eds.),
          <source>Multiple Classifier Systems, LNCS</source>
          , vol.
          <source>2096</source>
          , Springer-Verlag,
          <year>2001</year>
          , pp.
          <fpage>359</fpage>
          -
          <lpage>368</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/3-540-48219-9_36</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Pekalska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.P.W.</given-names>
            <surname>Duin</surname>
          </string-name>
          ,
          <source>The Dissimilarity Representation for Pattern Recognition: Foundations and Applications</source>
          , World Scientific, Singapore,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mottl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dvoenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Seredin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kulikowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Muchnik</surname>
          </string-name>
          ,
          <article-title>Featureless pattern recognition in an imaginary Hilbert space and its application to protein fold classification</article-title>
          .
          <source>Machine Learning and Data Mining in Pattern Recognition, Lecture Notes in Computer Science</source>
          ,
          <volume>2123</volume>
          (
          <year>2001</year>
          )
          <fpage>322</fpage>
          -
          <lpage>336</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/3-540-44596-X_26</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mottl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Seredin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dvoenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kulikowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Muchnik</surname>
          </string-name>
          ,
          <article-title>Featureless pattern recognition in an imaginary Hilbert space</article-title>
          , in:
          <source>Object recognition supported by user interaction for service robots</source>
          , Quebec City, QC, Canada,
          <year>2002</year>
          , pp.
          <fpage>88</fpage>
          -
          <lpage>91</lpage>
          , vol.
          <volume>2</volume>
          . doi:10.1109/ICPR.2002.1048244.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Seredin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mottl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tatarchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Razin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Windridge</surname>
          </string-name>
          ,
          <article-title>Convex support and Relevance Vector Machines for selective multimodal pattern recognition</article-title>
          ,
          <source>in: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012)</source>
          , Tsukuba, Japan,
          <year>2012</year>
          , pp.
          <fpage>1647</fpage>
          -
          <lpage>1650</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mottl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Seredin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Krasotkina</surname>
          </string-name>
          ,
          <article-title>Compactness Hypothesis, Potential Functions, and Rectifying Linear Space in Machine Learning</article-title>
          ,
          <source>International Conference Commemorating the 40th Anniversary of Emmanuil Braverman's Decease</source>
          , Boston, MA, USA, April 28-30,
          <year>2017</year>
          , Invited Talks
          . doi:10.1007/978-3-319-99492-5_3.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.I.</given-names>
            <surname>Andrushkiw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.V.</given-names>
            <surname>Boroday</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.A.</given-names>
            <surname>Klyushin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.I.</given-names>
            <surname>Petunin</surname>
          </string-name>
          ,
          <article-title>Computer-aided cytogenetic method of cancer diagnosis</article-title>
          , New York, Nova Publishers,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kulis</surname>
          </string-name>
          ,
          <article-title>Metric learning: A survey</article-title>
          .
          <source>Foundations and Trends in Machine Learning</source>
          ,
          <volume>5</volume>
          (
          <year>2013</year>
          )
          <fpage>287</fpage>
          -
          <lpage>364</lpage>
          . doi:10.1561/2200000019.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A. H.</given-names>
            <surname>de Souza Junior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Corona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Barreto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Miche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lendasse</surname>
          </string-name>
          ,
          <article-title>Minimal Learning Machine: A novel supervised distance-based approach for regression and classification</article-title>
          .
          <source>Neurocomputing</source>
          ,
          <volume>164</volume>
          (
          <year>2015</year>
          )
          <fpage>34</fpage>
          -
          <lpage>44</lpage>
          . doi:10.1016/j.neucom.2014.11.073.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D. P. P.</given-names>
            <surname>Mesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P. P.</given-names>
            <surname>Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>de Souza Junior</surname>
          </string-name>
          ,
          <article-title>Ensemble of efficient minimal learning machines for classification and regression</article-title>
          ,
          <source>Neural Processing Letters</source>
          ,
          <volume>46</volume>
          (
          <year>2017</year>
          )
          <fpage>751</fpage>
          -
          <lpage>766</lpage>
          . doi:10.1007/s11063-017-9587-5.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Maia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L. D.</given-names>
            <surname>Dias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P. P.</given-names>
            <surname>Gomes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. R.</given-names>
            <surname>da Rocha Neto</surname>
          </string-name>
          ,
          <article-title>Optimally selected minimal learning machine</article-title>
          , in:
          <string-name>
            <given-names>H.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Camacho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Novais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Tallón-Ballesteros</surname>
          </string-name>
          (Eds.),
          <source>Intelligent Data Engineering and Automated Learning - IDEAL</source>
          , Springer International Publishing, Cham,
          <year>2018</year>
          , pp.
          <fpage>670</fpage>
          -
          <lpage>678</lpage>
          . doi:10.1007/978-3-030-33617-2.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>